
Implementation Guide: Conduct full due diligence review of a document package and produce a risk report

Step-by-step implementation guide for deploying AI to conduct full due diligence review of a document package and produce a risk report for Legal Services clients.

Hardware Procurement

Attorney Workstation

Dell | Latitude 5550 (i7-1365U, 16GB RAM, 512GB SSD) | Qty: 10

$1,350 per unit MSP cost / $1,750 suggested resale

Primary attorney workstation for accessing AI due diligence tools, reviewing documents side-by-side with AI risk analysis, and approving human-in-the-loop checkpoints. 16GB RAM minimum required for smooth operation of Microsoft 365 Copilot, Spellbook Word add-in, and browser-based AI dashboards simultaneously.

Dual Monitor Setup

Dell | P2423D (24-inch QHD IPS) | Qty: 20

$280 per unit MSP cost / $370 suggested resale

Two monitors per attorney workstation. Left monitor displays source documents from VDR/DMS, right monitor displays AI review pane and risk report. QHD resolution essential for reading dense legal documents without excessive scrolling.

Document Scanner

Fujitsu | fi-8170 ADF Scanner | Qty: 2

$550 per unit MSP cost / $750 suggested resale

High-speed duplex document scanner for digitizing paper documents that arrive as part of due diligence packages. 70 ppm scan speed handles large document sets efficiently. TWAIN/ISIS drivers integrate with Adobe Acrobat and DMS ingest workflows. Two units provide redundancy and serve separate office areas.

Network Switch Upgrade

Ubiquiti | USW-Pro-24-PoE | Qty: 1

$400 MSP cost / $550 suggested resale

Ensures sufficient bandwidth for uploading large document packages (100+ MB per transaction) to cloud AI services and VDRs. PoE supports VoIP phones and access points without separate power infrastructure. Replaces aging consumer-grade switches common in SMB law firms.

Wireless Access Point

Ubiquiti | UniFi U6 Pro | Qty: 2

$150 per unit MSP cost / $220 suggested resale

Provides reliable Wi-Fi 6 coverage for attorneys working from conference rooms during deal closings. Ensures stable connectivity to cloud AI services from anywhere in the office.

Software Procurement

Microsoft 365 E3

Microsoft | per-seat SaaS | Qty: 10 users

$36/user/month x 10 users = $360/month MSP cost / $450/month suggested resale

Foundation platform providing Exchange Online, SharePoint Online, Teams, Word, and Azure AD/Entra ID for SSO. SharePoint serves as interim document staging area for AI ingest. Word is the primary contract review surface for Spellbook integration. Azure AD provides identity and access management for all AI tools.

Microsoft 365 Copilot

Microsoft | per-seat SaaS add-on | Qty: 10 users

$30/user/month x 10 users = $300/month MSP cost / $400/month suggested resale

AI assistant embedded in Word, Outlook, Teams, and SharePoint. Used for summarizing email threads related to transactions, drafting correspondence about due diligence findings, searching across SharePoint document libraries, and generating meeting notes from deal team calls. Complements but does not replace the specialized legal AI tools.

Clio Manage (Complete Plan)

Clio (Themis Solutions) | Complete Plan | Qty: 10 users

$139/user/month x 10 users = $1,390/month MSP cost / $1,750/month suggested resale

Practice management system providing matter management, time tracking, client communication, and billing. Manage AI (formerly Clio Duo) provides built-in AI capabilities for summarizing matter timelines, drafting client communications, and searching firm knowledge. Critical for logging AI-assisted work hours and generating LEDES billing exports. All due diligence matters are tracked here.

Spellbook

Rally Legal | per-seat SaaS | Qty: 5 power users

$100–$179/user/month = $500–$895/month MSP cost / $700–$1,100/month suggested resale

Word-native AI contract review tool that provides clause-level analysis, risk flagging, missing clause detection, and suggested language. Licensed for the 5 attorneys who most frequently handle due diligence matters. Integrates directly into Microsoft Word ribbon for seamless workflow. Uses GPT-5, Claude, and other leading LLMs. Lowest barrier to entry for SMB firms.

Azure OpenAI Service (GPT-5.4)

Microsoft Azure | GPT-5.4 | usage-based API

$2.50/million input tokens + $10.00/million output tokens; estimated $500–$1,500/month based on 3–5 DD transactions/month / suggested resale with 25% markup = $625–$1,875/month

Core LLM API powering the custom due diligence orchestration agent. GPT-5.4 provides 128K context window for processing large contract sections. Azure deployment ensures data residency within chosen region, BAA availability for HIPAA-adjacent matters, and compliance with firm security policies. All API calls route through Azure Virtual Network for network isolation.

Azure AI Document Intelligence

Microsoft Azure | usage-based API

$1.50–$15.00 per 1,000 pages; estimated $50–$200/month / suggested resale $75–$275/month

OCR and document parsing service that converts scanned PDFs, images, and complex multi-column documents into structured text. Extracts tables, key-value pairs, and form fields. Critical for processing paper documents and poorly formatted PDFs common in due diligence packages. Pre-built models for invoices, receipts, and contracts accelerate processing.

Pinecone Vector Database

Pinecone | SaaS, usage-based

Free tier for development; Standard plan $70–$200/month for production / suggested resale $100–$275/month

Stores vector embeddings of all documents in the due diligence package, enabling semantic search across the entire corpus. The custom DD agent queries Pinecone to find related clauses across different contracts, identify contradictions between documents, and retrieve relevant precedent from the firm's historical DD reports. Serverless deployment eliminates infrastructure management.

iManage Work 10 Cloud

iManage | Work 10 Cloud | Qty: 10 users

$39–$50/user/month x 10 users = $390–$500/month MSP cost / $500–$650/month suggested resale

Legal-grade document management system serving as the system of record for all due diligence documents, work product, and final reports. Provides ethical walls, matter-centric organization, version control, and audit trails required for legal compliance. AI agent reads from and writes to iManage via REST API. If the client already has NetDocuments, substitute accordingly.

Adobe Acrobat Pro

Adobe | per-seat SaaS | Qty: 10 users

$23/user/month x 10 users = $230/month MSP cost / $300/month suggested resale

PDF manipulation, OCR, redaction, and Bates stamping for due diligence documents. Attorneys use it to review AI-flagged sections in context, apply redactions before sharing, and prepare final DD report PDFs. Built-in OCR supplements Azure AI Document Intelligence for complex layouts.

Veeam Backup for Microsoft 365

Veeam | per-seat SaaS | Qty: 10 users

$2–$4/user/month x 10 users = $20–$40/month MSP cost / $50–$80/month suggested resale

Backs up all SharePoint, OneDrive, Exchange, and Teams data including AI-generated reports and client communications stored in Microsoft 365. Legal hold and granular restore capabilities support litigation readiness and data retention obligations.

Prerequisites

  • Stable internet connection with minimum 100 Mbps symmetric bandwidth and less than 50ms latency to Azure East US or nearest region
  • Active Microsoft 365 E3 or E5 tenant with Azure AD/Entra ID configured for all attorney and staff accounts
  • Azure subscription with billing configured and OpenAI Service access approved (may require application at https://aka.ms/oai/access)
  • Active iManage Work 10 Cloud or NetDocuments subscription with API access enabled and matter structure configured
  • Clio Manage account provisioned with Complete plan and Manage AI enabled for all users
  • Client engagement letters updated to include AI disclosure language per ABA Formal Opinion 512 — client must have informed consent for AI use on their matters
  • Firm AI usage policy drafted and signed by all attorneys covering: permitted uses, confidentiality obligations, supervision requirements, and prohibited uses per ABA Model Rules 1.1, 1.6, and 5.3
  • Data Processing Agreement (DPA) executed with every AI vendor (Azure, Spellbook, Pinecone) specifying: no training on firm data, data residency requirements, breach notification within 72 hours, and right to deletion
  • Python 3.11+ runtime environment on the MSP's deployment workstation or Azure VM for custom agent deployment
  • Domain name or subdomain for the DD report portal (e.g., dd-reports.firmname.com) with DNS access
  • SSL/TLS certificates for any custom-hosted endpoints
  • Administrative access to firm's firewall/router for allowlisting Azure and SaaS endpoints
  • Fujitsu PaperStream drivers installed on scanning workstations
  • All attorneys who will use the system have completed a baseline AI ethics training (provided by MSP in Phase 4)

Installation Steps

Step 1: Azure Environment Setup and OpenAI Service Provisioning

Create the Azure resource group, configure networking, and provision the Azure OpenAI Service instance that will power the custom due diligence agent. This establishes the AI compute foundation with proper security boundaries.

bash
az login
az group create --name rg-legal-dd-agent --location eastus
az network vnet create --resource-group rg-legal-dd-agent --name vnet-dd-agent --address-prefix 10.0.0.0/16 --subnet-name subnet-ai --subnet-prefix 10.0.1.0/24
az cognitiveservices account create --name oai-legal-dd --resource-group rg-legal-dd-agent --kind OpenAI --sku S0 --location eastus --custom-domain oai-legal-dd
az cognitiveservices account deployment create --name oai-legal-dd --resource-group rg-legal-dd-agent --deployment-name gpt-5.4-dd --model-name gpt-5.4 --model-version 2024-08-06 --model-format OpenAI --sku-capacity 80 --sku-name Standard
az cognitiveservices account deployment create --name oai-legal-dd --resource-group rg-legal-dd-agent --deployment-name text-embedding-3-large --model-name text-embedding-3-large --model-version 1 --model-format OpenAI --sku-capacity 120 --sku-name Standard
az cognitiveservices account keys list --name oai-legal-dd --resource-group rg-legal-dd-agent
Note

Request Azure OpenAI access in advance — approval can take 1–5 business days. The SKU capacity of 80K tokens per minute for GPT-5.4 should handle 3–5 concurrent DD transactions. Increase capacity if the firm processes more than 5 deals simultaneously. Store the API key securely in Azure Key Vault (configured in step 3). Choose the Azure region closest to the firm's office for lowest latency.
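To sanity-check whether the 80K tokens-per-minute (TPM) quota covers the firm's workload, a rough back-of-the-envelope calculation helps. The workload numbers below (chunks analyzed per minute per review, average tokens per chunk) are illustrative assumptions, not measured values — substitute the firm's actual figures.

```python
# Rough capacity check against an 80K TPM Azure OpenAI quota.
# All per-review workload numbers are illustrative assumptions.

def required_tpm(concurrent_reviews: int,
                 chunks_per_minute_per_review: int = 6,
                 avg_input_tokens_per_chunk: int = 1500,
                 avg_output_tokens_per_chunk: int = 500) -> int:
    """Estimate total tokens consumed per minute across concurrent DD reviews."""
    per_review = chunks_per_minute_per_review * (
        avg_input_tokens_per_chunk + avg_output_tokens_per_chunk)
    return concurrent_reviews * per_review

QUOTA_TPM = 80_000
for n in (3, 5, 10):
    need = required_tpm(n)
    status = 'OK' if need <= QUOTA_TPM else 'exceeds quota'
    print(f'{n} concurrent reviews: ~{need:,} TPM ({status})')
```

Under these assumptions, 3–5 concurrent reviews fit comfortably while 10 would exceed the quota, matching the guidance above.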

Step 2: Azure AI Document Intelligence Provisioning

Deploy the document parsing service that converts scanned PDFs and images into structured text for the AI agent to process. This is critical because due diligence packages frequently contain scanned documents, faxes, and poorly formatted legacy files.

bash
az cognitiveservices account create --name doc-intel-legal-dd --resource-group rg-legal-dd-agent --kind FormRecognizer --sku S0 --location eastus
az cognitiveservices account keys list --name doc-intel-legal-dd --resource-group rg-legal-dd-agent
Note

The S0 tier supports up to 15 concurrent requests. For large DD packages (500+ documents), consider queuing document processing to stay within rate limits. Azure AI Document Intelligence replaces the older Form Recognizer branding but uses the same API endpoints.
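One way to stay under the 15-concurrent-request limit is to gate calls behind a semaphore while still fanning work out to a thread pool. This is a sketch; `analyze_document` below is a placeholder for the real `begin_analyze_document(...).result()` SDK call.

```python
# Cap concurrent Document Intelligence calls at the S0 limit (15).
# `analyze_document` is a stand-in for the real Azure SDK call.
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 15
_slots = threading.Semaphore(MAX_CONCURRENT)

def analyze_document(doc_name: str) -> str:
    # Placeholder: doc_intel_client.begin_analyze_document(...).result()
    return f'text of {doc_name}'

def analyze_with_limit(doc_name: str) -> str:
    with _slots:  # blocks while 15 calls are already in flight
        return analyze_document(doc_name)

def process_package(doc_names):
    # More workers than slots is fine; the semaphore enforces the API limit.
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(analyze_with_limit, doc_names))

results = process_package([f'doc-{i}.pdf' for i in range(40)])
print(len(results))  # 40
```

`pool.map` preserves input order, so extracted text stays aligned with the source document list even when calls complete out of order.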

Step 3: Azure Key Vault and Security Configuration

Create a centralized secret store for all API keys, connection strings, and certificates. Configure managed identities so the DD agent can access services without embedding credentials in code. This is essential for legal compliance and audit readiness.

Provision Key Vault, store all secrets, assign managed identity, and grant RBAC access
bash
az keyvault create --name kv-legal-dd --resource-group rg-legal-dd-agent --location eastus --enable-rbac-authorization true
az keyvault secret set --vault-name kv-legal-dd --name AzureOpenAIKey --value <YOUR-OPENAI-KEY>
az keyvault secret set --vault-name kv-legal-dd --name DocIntelKey --value <YOUR-DOC-INTEL-KEY>
az keyvault secret set --vault-name kv-legal-dd --name PineconeApiKey --value <YOUR-PINECONE-KEY>
az keyvault secret set --vault-name kv-legal-dd --name iManageClientSecret --value <YOUR-IMANAGE-SECRET>
# Run after the app-dd-agent web app exists (created in Step 7)
az webapp identity assign --resource-group rg-legal-dd-agent --name app-dd-agent
# With --enable-rbac-authorization, access is granted via RBAC roles, not access policies
az role assignment create --assignee <MANAGED-IDENTITY-OID> --role "Key Vault Secrets User" --scope $(az keyvault show --name kv-legal-dd --query id --output tsv)
Note

Never store API keys in source code or environment variables on developer machines. All secrets must be retrieved at runtime from Key Vault. Enable Azure Key Vault logging to capture all access events for compliance audits. Rotate all keys on a 90-day schedule.

Step 4: Pinecone Vector Database Setup

Create the Pinecone index that stores document embeddings for semantic search across the due diligence corpus. The DD agent uses this to find related clauses across different documents, detect contradictions, and retrieve relevant precedent.

Install Pinecone client and create the legal DD vector index
bash
pip install pinecone
python3 -c "
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key='YOUR_PINECONE_API_KEY')
pc.create_index(
    name='legal-dd-corpus',
    dimension=3072,
    metric='cosine',
    spec=ServerlessSpec(
        cloud='aws',
        region='us-east-1'
    )
)
print('Index created successfully')
print(pc.describe_index('legal-dd-corpus'))
"
Note

Dimension 3072 matches the text-embedding-3-large model output. Use cosine similarity for best results with normalized legal text embeddings. The serverless spec on AWS us-east-1 ensures low latency from Azure East US. Each DD matter should use a separate namespace within the index for data isolation: namespace format is 'matter-{CLIO_MATTER_ID}'.
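The per-matter namespace convention and chunk metadata can be captured in two small helpers. The metadata field names here are assumptions for illustration; align them with whatever schema the DD agent actually stores.

```python
# Build the per-matter Pinecone namespace ('matter-{CLIO_MATTER_ID}') and the
# metadata stored alongside each vector. Field names are illustrative.

def matter_namespace(clio_matter_id: str) -> str:
    """Namespace isolating one DD matter's vectors within the shared index."""
    return f'matter-{clio_matter_id}'

def chunk_metadata(doc_id: str, doc_name: str, doc_type: str, chunk_index: int) -> dict:
    """Metadata attached to each vector for filtered semantic search."""
    return {
        'document_id': doc_id,
        'source_filename': doc_name,
        'document_type': doc_type,
        'chunk_index': chunk_index,
    }

ns = matter_namespace('12345')
print(ns)  # matter-12345
# An upsert would then target this namespace, e.g.:
# index.upsert(vectors=vectors, namespace=ns)
```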

Step 5: iManage API Integration Setup

Configure the iManage REST API connection that allows the DD agent to read documents from and write reports to the firm's document management system. This ensures all work product flows through the DMS for proper matter-centric organization and audit trails.

1. Register an application in iManage Control Center
2. Navigate to: Control Center > API Integrations > Register Application
3. Application Name: DD-Agent-Integration
4. Redirect URI: https://dd-reports.firmname.com/callback
5. Scopes: imanage.document.read, imanage.document.write, imanage.folder.read
6. Record the Client ID and Client Secret
Test the connection
bash
curl -X POST https://cloudimanage.com/auth/oauth2/token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&scope=imanage.document.read imanage.document.write'
Verify document access
bash
curl -X GET 'https://cloudimanage.com/work/api/v2/customers/YOUR_CUSTOMER_ID/documents?limit=5' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN'
Note

iManage API access must be requested through the firm's iManage administrator or iManage support. Ensure the integration application is assigned to a service account with appropriate security group membership — do NOT use an individual attorney's credentials. If the firm uses NetDocuments instead of iManage, substitute the NetDocuments REST API (https://api.vault.netvoyage.com) with equivalent OAuth2 configuration.

Step 6: Clio API Integration Setup

Configure the Clio API connection for matter data synchronization, time entry logging, and client/matter validation. The DD agent reads matter metadata from Clio to scope its analysis and writes time entries for AI-assisted review hours.

1. Register application at https://app.clio.com/nc/#/settings/developer_applications
2. Application Name: DD-Agent-Automation
3. Redirect URI: https://dd-reports.firmname.com/clio-callback
4. Scopes: matters:read, activities:write, contacts:read, documents:read
Test OAuth2 flow
bash
curl -X POST https://app.clio.com/oauth/token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=authorization_code&code=AUTH_CODE&redirect_uri=https://dd-reports.firmname.com/clio-callback&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET'
Verify matter access
bash
curl -X GET 'https://app.clio.com/api/v4/matters?fields=id,display_number,description,client' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN'
Note

Clio API rate limit is 600 requests per minute per application. The DD agent should batch API calls and implement exponential backoff. Store the refresh token securely in Key Vault — Clio access tokens expire after 1 hour. Ensure the Clio admin has enabled API access for the firm's subscription tier.
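The exponential backoff recommended above can be sketched as a small retry wrapper. `RateLimitError` stands in for detecting an HTTP 429 response from the Clio API.

```python
# Exponential backoff sketch for rate-limited API calls (e.g. Clio's
# 600 req/min limit). RateLimitError is a stand-in for an HTTP 429.
import time

class RateLimitError(Exception):
    pass

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponentially growing delays: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Example: a call that is rate-limited twice before succeeding
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise RateLimitError('429 Too Many Requests')
    return 'ok'

print(with_backoff(flaky, base_delay=0.01))  # ok
```

In production, honor a `Retry-After` header when the API provides one instead of relying purely on the computed delay.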

Step 7: Deploy Custom DD Agent Application to Azure App Service

Deploy the Python-based due diligence orchestration agent as an Azure Web App. This application coordinates document ingestion, LLM analysis, vector search, and report generation. It exposes a REST API that attorneys trigger from the firm's internal portal.

Create App Service plan, web app, and configure application settings
bash
az appservice plan create --name plan-dd-agent --resource-group rg-legal-dd-agent --sku B2 --is-linux
az webapp create --resource-group rg-legal-dd-agent --plan plan-dd-agent --name app-dd-agent --runtime 'PYTHON:3.11'
az webapp config appsettings set --resource-group rg-legal-dd-agent --name app-dd-agent --settings \
  AZURE_OPENAI_ENDPOINT=https://oai-legal-dd.openai.azure.com/ \
  AZURE_OPENAI_DEPLOYMENT=gpt-5.4-dd \
  AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-large \
  DOC_INTEL_ENDPOINT=https://doc-intel-legal-dd.cognitiveservices.azure.com/ \
  PINECONE_INDEX=legal-dd-corpus \
  PINECONE_ENVIRONMENT=us-east-1 \
  KEY_VAULT_URI=https://kv-legal-dd.vault.azure.net/ \
  CLIO_CLIENT_ID=YOUR_CLIO_CLIENT_ID \
  IMANAGE_CUSTOMER_ID=YOUR_IMANAGE_CUSTOMER_ID \
  FLASK_ENV=production
1. Clone the DD agent repository and deploy
2. Configure custom domain and SSL
Clone repository and deploy application package
bash
git clone https://github.com/your-msp/legal-dd-agent.git
cd legal-dd-agent
az webapp deploy --resource-group rg-legal-dd-agent --name app-dd-agent --src-path . --type zip
Bind custom domain and SSL certificate
bash
az webapp config hostname add --resource-group rg-legal-dd-agent --webapp-name app-dd-agent --hostname dd-reports.firmname.com
az webapp config ssl bind --resource-group rg-legal-dd-agent --name app-dd-agent --certificate-thumbprint YOUR_CERT_THUMBPRINT --ssl-type SNI
Note

The B2 App Service plan (2 vCPU, 3.5 GB RAM) handles typical DD workloads. Scale to P1V3 if processing more than 5 concurrent transactions. Enable Always On to prevent cold starts. Configure deployment slots for zero-downtime updates. All environment variables reference Key Vault — sensitive values like API keys are retrieved at runtime via managed identity, not stored in app settings.

Step 8: Microsoft 365 and Copilot Configuration

Configure Microsoft 365 E3 licenses, enable Copilot add-ons, configure SharePoint document libraries for DD staging, and set up Teams channels for deal team collaboration with AI notifications.

PowerShell: Assign M365 E3 + Copilot licenses
powershell
Connect-MgGraph -Scopes 'User.ReadWrite.All','Organization.Read.All'
$E3Sku = Get-MgSubscribedSku | Where-Object { $_.SkuPartNumber -eq 'SPE_E3' }
$CopilotSku = Get-MgSubscribedSku | Where-Object { $_.SkuPartNumber -eq 'Microsoft_365_Copilot' }
$users = Get-MgUser -Filter "department eq 'Legal'" -All
foreach ($user in $users) {
    Set-MgUserLicense -UserId $user.Id -AddLicenses @(@{SkuId=$E3Sku.SkuId},@{SkuId=$CopilotSku.SkuId}) -RemoveLicenses @()
    Write-Host "Licensed: $($user.DisplayName)"
}
1. Navigate to SharePoint Admin Center > Active Sites > Create > Team Site. Set Site Name: Due Diligence Staging, URL: firmname.sharepoint.com/sites/dd-staging, Privacy: Private, Owners: Lead M&A partner + MSP admin.
2. In Teams Admin Center, create a new team 'Deal Room' with channels per active transaction.
3. Configure incoming webhook for AI agent notifications: Teams > Deal Room > General > Connectors > Incoming Webhook > Name: DD-Agent-Alerts.
Note

Copilot requires a qualifying Microsoft 365 base license — E3 or E5 in this deployment (Microsoft has since extended eligibility to Business Standard/Premium as well). Some firms may prefer E5 for advanced compliance features (eDiscovery Premium, Information Barriers). The SharePoint DD staging site should have sensitivity labels applied for 'Highly Confidential' classification. Configure retention policies to preserve DD documents for 7 years per typical legal retention requirements.

Step 9: Spellbook Installation and Configuration

Deploy Spellbook's Microsoft Word add-in for the 5 power-user attorneys who handle most DD work. Configure firm-specific clause libraries and review templates.

1. Open Microsoft Word on each attorney workstation
2. Navigate to Insert > Get Add-ins > Store
3. Search for 'Spellbook' by Rally Legal
4. Click Add and authorize with firm M365 credentials
5. Sign in to Spellbook with firm-specific license key provided by Rally Legal
  • Configure Spellbook organization settings at https://app.spellbook.legal/settings
  • Enable 'Review Mode' as default for DD workflows
  • Upload firm clause library (standard acceptable clauses)
  • Configure risk sensitivity levels: High/Medium/Low
  • Set data handling to 'Do not train on our data'
  • Enable audit logging for all AI interactions
Note

Spellbook works as a Word ribbon extension — no separate application to install. Ensure Word is updated to build 16.0.17000 or later for full add-in compatibility. License only the 5 power users initially and expand based on adoption metrics. Spellbook's review mode flags non-standard clauses, missing provisions, and potential risks — complementing the custom DD agent's cross-document analysis.

Step 10: Document Scanner Configuration and OCR Pipeline

Install and configure the Fujitsu fi-8170 scanners with PaperStream software, create scan profiles for due diligence documents, and connect the scanning workflow to the Azure AI Document Intelligence OCR pipeline.

1. Download PaperStream IP driver and PaperStream Capture from https://www.fujitsu.com/global/support/products/computing/peripheral/scanners/fi/software/fi-8170.html
2. Run PaperStreamIP_Setup.exe and PaperStreamCapture_Setup.exe
3. Create DD scan profile in PaperStream Capture with the following settings: Profile Name: Due Diligence Intake | Resolution: 300 DPI (optimal for OCR) | Color Mode: Auto Color Detection | File Format: PDF/A (archival) | Destination: SharePoint DD Staging library via WebDAV | Naming Convention: {Date}_{MatterNumber}_{SequenceNumber}.pdf
4. Test scan-to-SharePoint pipeline: place test documents in scanner ADF
5. Select 'Due Diligence Intake' profile
6. Scan and verify files appear in SharePoint DD Staging
7. Verify Azure AI Document Intelligence can parse the output
Note

300 DPI provides the best balance of OCR accuracy and file size. Do not scan at 600 DPI unless documents contain very small print or complex diagrams. PDF/A format ensures long-term archival compliance. Train firm staff on proper document preparation: remove staples, orient pages correctly, separate double-sided documents. The OCR pipeline in the custom agent automatically processes these scanned PDFs.
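The scan profile's naming convention can be generated programmatically when files are renamed or validated downstream. The date format (YYYYMMDD) and four-digit zero-padded sequence are assumptions — match them to the convention actually configured in PaperStream Capture.

```python
# Generate filenames per the scan profile convention
# {Date}_{MatterNumber}_{SequenceNumber}.pdf.
# Date format and zero-padding are assumed; adjust to the configured profile.
from datetime import date

def scan_filename(matter_number: str, sequence: int, scan_date: date) -> str:
    return f'{scan_date:%Y%m%d}_{matter_number}_{sequence:04d}.pdf'

print(scan_filename('TEST-DD-001', 7, date(2025, 1, 15)))
# 20250115_TEST-DD-001_0007.pdf
```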

Step 11: Firewall and Network Security Configuration

Configure the firm's firewall to allowlist all necessary cloud service endpoints while maintaining security. Set up conditional access policies in Azure AD to restrict AI tool access to managed devices and approved locations.

1. Allowlist the following endpoints on the firm's firewall:
  • Azure OpenAI: *.openai.azure.com (TCP 443)
  • Azure Document Intelligence: *.cognitiveservices.azure.com (TCP 443)
  • Pinecone: *.pinecone.io (TCP 443)
  • iManage Cloud: *.cloudimanage.com (TCP 443)
  • Clio: *.clio.com, app.clio.com (TCP 443)
  • Spellbook: *.spellbook.legal (TCP 443)
  • Microsoft 365: per Microsoft's published IP ranges
  • SharePoint: *.sharepoint.com (TCP 443)
2. Configure Conditional Access in Azure AD: navigate to Azure Portal > Azure AD > Security > Conditional Access
3. Policy 1: Require compliant device for AI tool access — Users: All attorneys | Cloud apps: Azure OpenAI, Clio, Spellbook (via Enterprise Apps) | Conditions: All platforms | Grant: Require device to be marked as compliant
4. Policy 2: Block access from untrusted locations — Named locations: Office IP range + approved VPN range | Block all other locations for legal AI applications
Note

Do not use broad wildcard rules — be specific about which endpoints are allowed. Enable TLS inspection for non-privileged traffic but EXCLUDE attorney-client communications from TLS inspection to avoid privilege waiver issues. Document all firewall rules in the firm's security policy. Review conditional access policies with the managing partner to ensure they do not impede legitimate remote work during deal closings.

Step 12: Veeam Backup Configuration

Configure Veeam Backup for Microsoft 365 to protect all SharePoint DD staging content, OneDrive attorney files, Exchange mailboxes with deal communications, and Teams deal room conversations.

1. Add Microsoft 365 organization via modern app-only authentication
2. Create backup job: 'Legal DD Backup' — Scope: All licensed users + SharePoint DD Staging site, Schedule: Every 4 hours, Retention: 7 years (legal hold compliant)
3. Configure backup repository: Azure Blob Storage (Hot tier) — Container: veeam-legal-backup, Encryption: AES-256 with firm-managed key

Test Restore

1. Delete a test document from SharePoint DD Staging
2. In Veeam console, explore the latest restore point
3. Locate and restore the deleted document
4. Verify restoration in SharePoint
Note

7-year retention aligns with most state bar record retention requirements. For matters involving SEC-regulated entities, consider extending to 10 years. Enable backup encryption with a key stored separately from the backup repository. Test restores quarterly and document results for compliance audits.

Step 13: End-to-End Integration Testing

Run a complete test transaction through the entire pipeline: upload documents to iManage, trigger the DD agent, verify vector indexing, review AI analysis, and validate the final risk report output.

1. Upload test DD package to iManage: create matter folder TEST-DD-001, upload 15-20 sample contracts (NDAs, purchase agreements, IP assignments), and include at least one scanned/image PDF to test the OCR pipeline.
2. Trigger DD agent via API:
3. Monitor processing:
4. Retrieve report:
Trigger DD agent via API for matter TEST-DD-001
bash
curl -X POST https://dd-reports.firmname.com/api/v1/review \
  -H 'Authorization: Bearer YOUR_JWT_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "matter_id": "TEST-DD-001",
    "clio_matter_id": "12345",
    "imanage_folder_id": "FOLDER_ID",
    "review_type": "full_due_diligence",
    "risk_categories": ["change_of_control", "ip_assignment", "indemnification", "termination", "non_compete", "data_privacy", "governing_law"],
    "report_format": "pdf_and_json",
    "notify_teams_channel": true
  }'
Monitor processing status for TEST-DD-001
bash
curl https://dd-reports.firmname.com/api/v1/review/TEST-DD-001/status
Retrieve final DD report as PDF
bash
curl https://dd-reports.firmname.com/api/v1/review/TEST-DD-001/report -o test-dd-report.pdf
Note

Use real but anonymized documents for testing — synthetic documents may not trigger the same edge cases as actual legal documents. The test should include at least one document with: a change-of-control clause, a non-standard indemnification provision, a missing governing law clause, and a data privacy clause referencing GDPR. Verify that the Teams notification was received in the Deal Room channel. Check that a time entry was created in Clio for the AI-assisted review.
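A small script can sanity-check the JSON half of the `pdf_and_json` output before attorneys review the PDF. The report schema below (a `findings` list with `risk_category` and `source_document` fields) is an assumption for illustration; adapt it to the agent's actual output.

```python
# Validate the JSON DD report: every requested risk category should be
# covered, and every finding should cite a source document.
# The schema shown is an assumed example, not the agent's fixed format.

REQUESTED = ['change_of_control', 'ip_assignment', 'indemnification',
             'termination', 'non_compete', 'data_privacy', 'governing_law']

def validate_report(report: dict, requested: list) -> list:
    """Return a list of validation problems (empty list means pass)."""
    problems = []
    covered = {f['risk_category'] for f in report.get('findings', [])}
    for cat in requested:
        if cat not in covered:
            problems.append(f'no findings or explicit clearance for: {cat}')
    for f in report.get('findings', []):
        if not f.get('source_document'):
            problems.append(f"finding without source document: {f.get('risk_category')}")
    return problems

sample = {'findings': [{'risk_category': c, 'source_document': 'doc.pdf'}
                       for c in REQUESTED]}
print(validate_report(sample, REQUESTED))  # []
```

Run this against the TEST-DD-001 output so gaps (an uncovered risk category, a finding with no citation) surface before the report reaches the client.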

Custom AI Components

Document Ingestion and Preprocessing Pipeline

Type: workflow

Automated pipeline that retrieves documents from iManage or SharePoint, performs OCR on scanned documents via Azure AI Document Intelligence, extracts clean text, chunks documents semantically, generates embeddings via text-embedding-3-large, and stores them in Pinecone with metadata including document type, matter ID, date, and source filename. This pipeline runs automatically when a new DD review is triggered.

Implementation

document_ingestion.py
python
# document_ingestion.py
import os
import io
import hashlib
from typing import List, Dict, Any
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI
from pinecone import Pinecone
import requests
import tiktoken

# Configuration
KEY_VAULT_URI = os.environ['KEY_VAULT_URI']
credential = DefaultAzureCredential()
kv_client = SecretClient(vault_url=KEY_VAULT_URI, credential=credential)

AZURE_OPENAI_ENDPOINT = os.environ['AZURE_OPENAI_ENDPOINT']
AZURE_OPENAI_KEY = kv_client.get_secret('AzureOpenAIKey').value
DOC_INTEL_ENDPOINT = os.environ['DOC_INTEL_ENDPOINT']
DOC_INTEL_KEY = kv_client.get_secret('DocIntelKey').value
PINECONE_API_KEY = kv_client.get_secret('PineconeApiKey').value
PINECONE_INDEX = os.environ['PINECONE_INDEX']

# Initialize clients
oai_client = AzureOpenAI(
    api_key=AZURE_OPENAI_KEY,
    api_version='2024-06-01',
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)
doc_intel_client = DocumentAnalysisClient(
    endpoint=DOC_INTEL_ENDPOINT,
    credential=AzureKeyCredential(DOC_INTEL_KEY)
)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX)
try:
    encoding = tiktoken.encoding_for_model('gpt-5.4')
except KeyError:
    # tiktoken may not map this model name yet; fall back to a known encoding
    encoding = tiktoken.get_encoding('o200k_base')

def fetch_documents_from_imanage(access_token: str, customer_id: str, folder_id: str) -> List[Dict]:
    """Retrieve all documents from an iManage folder."""
    headers = {'Authorization': f'Bearer {access_token}'}
    url = f'https://cloudimanage.com/work/api/v2/customers/{customer_id}/folders/{folder_id}/documents'
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    documents = response.json()['data']
    result = []
    for doc in documents:
        doc_url = f'https://cloudimanage.com/work/api/v2/customers/{customer_id}/documents/{doc["id"]}/download'
        doc_response = requests.get(doc_url, headers=headers)
        result.append({
            'id': doc['id'],
            'name': doc['name'],
            'extension': doc.get('extension', 'pdf'),
            'content': doc_response.content,
            'metadata': {
                'author': doc.get('author', 'Unknown'),
                'create_date': doc.get('create_date', ''),
                'document_class': doc.get('class', 'General')
            }
        })
    return result

def ocr_document(content: bytes, filename: str) -> str:
    """Extract text from a document using Azure AI Document Intelligence."""
    poller = doc_intel_client.begin_analyze_document(
        'prebuilt-layout',
        document=io.BytesIO(content)
    )
    result = poller.result()
    full_text = ''
    for page in result.pages:
        for line in page.lines:
            full_text += line.content + '\n'
    # Also extract tables
    for table in result.tables:
        full_text += '\n[TABLE]\n'
        for cell in sorted(table.cells, key=lambda c: (c.row_index, c.column_index)):
            full_text += f'Row {cell.row_index}, Col {cell.column_index}: {cell.content}\n'
        full_text += '[/TABLE]\n'
    return full_text

def semantic_chunk(text: str, max_tokens: int = 1500, overlap_tokens: int = 200) -> List[str]:
    """Split text into paragraph-aligned chunks with token-bounded overlap."""
    paragraphs = [p for p in text.split('\n\n') if p.strip()]
    chunks = []
    current_paras: List[str] = []
    current_tokens = 0
    for para in paragraphs:
        para_tokens = len(encoding.encode(para))
        if current_tokens + para_tokens > max_tokens and current_paras:
            chunks.append('\n\n'.join(current_paras))
            # Overlap: carry trailing paragraphs totaling up to overlap_tokens
            overlap = []
            overlap_total = 0
            for prev in reversed(current_paras):
                prev_tokens = len(encoding.encode(prev))
                if overlap_total + prev_tokens > overlap_tokens:
                    break
                overlap.insert(0, prev)
                overlap_total += prev_tokens
            current_paras = overlap + [para]
            current_tokens = overlap_total + para_tokens
        else:
            current_paras.append(para)
            current_tokens += para_tokens
    if current_paras:
        chunks.append('\n\n'.join(current_paras))
    return chunks

def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generate embeddings using text-embedding-3-large."""
    response = oai_client.embeddings.create(
        input=texts,
        model='text-embedding-3-large'
    )
    return [item.embedding for item in response.data]

def ingest_document_package(matter_id: str, imanage_token: str, customer_id: str, folder_id: str) -> Dict[str, Any]:
    """Full ingestion pipeline for a DD document package."""
    documents = fetch_documents_from_imanage(imanage_token, customer_id, folder_id)
    stats = {'total_documents': len(documents), 'total_chunks': 0, 'total_pages_ocr': 0}
    
    for doc in documents:
        # OCR and text extraction
        text = ocr_document(doc['content'], doc['name'])
        stats['total_pages_ocr'] += text.count('\n') // 50  # Rough page estimate
        
        # Chunk the document
        chunks = semantic_chunk(text)
        stats['total_chunks'] += len(chunks)
        
        # Generate embeddings in batches of 20
        for i in range(0, len(chunks), 20):
            batch = chunks[i:i+20]
            embeddings = generate_embeddings(batch)
            
            # Upsert to Pinecone
            vectors = []
            for j, (chunk, embedding) in enumerate(zip(batch, embeddings)):
                chunk_id = hashlib.sha256(f'{doc["id"]}_{i+j}'.encode()).hexdigest()[:32]
                vectors.append({
                    'id': chunk_id,
                    'values': embedding,
                    'metadata': {
                        'matter_id': matter_id,
                        'document_id': doc['id'],
                        'document_name': doc['name'],
                        'chunk_index': i + j,
                        'text': chunk[:8000],  # Pinecone metadata limit
                        'author': doc['metadata']['author'],
                        'create_date': doc['metadata']['create_date'],
                        'document_class': doc['metadata']['document_class']
                    }
                })
            index.upsert(vectors=vectors, namespace=f'matter-{matter_id}')
    
    return stats
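A design note on the vector IDs used above: because each Pinecone ID is derived deterministically from the document ID and chunk index, re-running ingestion for the same folder overwrites existing vectors rather than duplicating them, so the upsert is idempotent. A minimal sketch of that property (the `DOC-123` ID is a made-up example):

```python
import hashlib

def chunk_id(document_id: str, chunk_index: int) -> str:
    # Same derivation as the ingestion loop: stable hash, truncated to 32 hex chars
    return hashlib.sha256(f'{document_id}_{chunk_index}'.encode()).hexdigest()[:32]

a = chunk_id('DOC-123', 0)
b = chunk_id('DOC-123', 0)
c = chunk_id('DOC-123', 1)
assert a == b          # stable across ingestion runs
assert a != c          # distinct per chunk
assert len(a) == 32    # truncated digest keeps IDs compact
```

If a document's text changes between runs, its chunk boundaries may shift, so stale trailing chunks can linger; deleting the namespace before re-ingesting a revised package avoids that.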

Due Diligence Risk Analysis Agent

Type: agent

The core autonomous agent that orchestrates the multi-step due diligence analysis. It implements a ReAct (Reasoning + Acting) loop that: (1) classifies each document by type, (2) extracts key clauses and provisions using targeted prompts, (3) performs cross-document consistency checks via vector search, (4) identifies risks and anomalies, (5) categorizes findings by severity, and (6) generates the structured risk report. The agent uses tool-calling to access vector search, document retrieval, and classification functions. It includes human-in-the-loop checkpoints where attorneys must approve critical risk assessments before the final report is generated.
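The control flow described above can be reduced to a small skeleton: call the model, execute any tool it requests, feed the result back, and repeat until the model stops. This sketch stubs out the model entirely (the real loop uses the Azure OpenAI tool-calling API shown in the implementation below; `stub_model` and the `NDA.pdf` argument are illustrative stand-ins):

```python
def stub_model(messages):
    # Stand-in for the chat-completions call: one tool call, then stop
    if any(m.get('role') == 'tool' for m in messages):
        return {'finish': 'stop'}
    return {'finish': 'tool_calls', 'tool': ('classify_document', {'document_name': 'NDA.pdf'})}

def react_loop(max_iterations=5):
    messages = [{'role': 'user', 'content': 'Review the package.'}]
    log = []
    for _ in range(max_iterations):
        step = stub_model(messages)
        if step['finish'] == 'stop':
            log.append('stop')
            break
        tool_name, tool_args = step['tool']
        log.append(f'tool:{tool_name}')
        # Execute the tool and feed its result back to the model
        messages.append({'role': 'tool', 'content': '{"status": "classified"}'})
    return log

print(react_loop())  # ['tool:classify_document', 'stop']
```

The `max_iterations` bound is the same safety valve the full agent uses to guarantee termination even if the model never emits a final answer.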

Implementation

dd_risk_agent.py
python
# dd_risk_agent.py
import json
import logging
from typing import List, Dict, Any, Optional
from dataclasses import dataclass, field, asdict
from enum import Enum
from openai import AzureOpenAI
from pinecone import Pinecone
import datetime

logger = logging.getLogger(__name__)

class RiskSeverity(str, Enum):
    CRITICAL = 'critical'
    HIGH = 'high'
    MEDIUM = 'medium'
    LOW = 'low'
    INFO = 'informational'

class ReviewStatus(str, Enum):
    PENDING = 'pending'
    IN_PROGRESS = 'in_progress'
    AWAITING_HUMAN_REVIEW = 'awaiting_human_review'
    COMPLETED = 'completed'
    FAILED = 'failed'

@dataclass
class RiskFinding:
    finding_id: str
    category: str
    severity: RiskSeverity
    title: str
    description: str
    affected_documents: List[str]
    relevant_clauses: List[str]
    recommendation: str
    confidence_score: float
    requires_human_review: bool = False
    human_approved: Optional[bool] = None

@dataclass
class DDReviewState:
    matter_id: str
    status: ReviewStatus = ReviewStatus.PENDING
    documents_classified: List[Dict] = field(default_factory=list)
    clause_extractions: List[Dict] = field(default_factory=list)
    cross_doc_findings: List[Dict] = field(default_factory=list)
    risk_findings: List[RiskFinding] = field(default_factory=list)
    human_checkpoints: List[Dict] = field(default_factory=list)
    processing_log: List[str] = field(default_factory=list)

RISK_CATEGORIES = [
    'change_of_control',
    'ip_assignment_and_ownership',
    'indemnification_and_liability',
    'termination_and_expiration',
    'non_compete_and_non_solicit',
    'data_privacy_and_security',
    'governing_law_and_jurisdiction',
    'consent_and_notice_requirements',
    'material_adverse_change',
    'representations_and_warranties',
    'payment_and_financial_terms',
    'insurance_requirements',
    'environmental_compliance',
    'employment_and_labor'
]

SYSTEM_PROMPT = """You are a senior legal due diligence analyst AI agent. Your role is to systematically review documents in a transaction and identify risks, anomalies, missing provisions, and inconsistencies.

You have access to the following tools:
1. classify_document - Classify a document by type and relevance
2. extract_clauses - Extract specific clause types from a document
3. search_related_clauses - Search across all documents for related or contradictory clauses
4. assess_risk - Evaluate a specific finding and assign risk severity
5. request_human_review - Flag a finding for attorney review (use for CRITICAL findings)

IMPORTANT RULES:
- Always cite specific document names and clause text in findings
- Never fabricate clause text - only reference text you have actually retrieved
- Assign confidence scores honestly (0.0-1.0)
- Flag ANY finding rated CRITICAL for human review
- Consider cross-border implications for international transactions
- Check for consistency of defined terms across all documents
- Identify missing standard provisions (e.g., no governing law clause)
- Note any unusual or non-market terms

Risk severity guidelines:
- CRITICAL: Deal-breaker issues requiring immediate attorney attention (e.g., undisclosed litigation, missing IP assignments, change of control triggers that would terminate key contracts)
- HIGH: Significant issues that may affect deal terms or valuation (e.g., broad indemnification obligations, restrictive non-competes, unfavorable termination provisions)
- MEDIUM: Notable issues requiring negotiation or disclosure (e.g., non-standard payment terms, ambiguous definitions, consent requirements for assignment)
- LOW: Minor issues or standard deviations from market norms (e.g., slightly non-standard notice periods, minor formatting inconsistencies)
- INFORMATIONAL: Observations for awareness (e.g., contracts nearing expiration, standard renewal provisions)"""

TOOLS = [
    {
        'type': 'function',
        'function': {
            'name': 'classify_document',
            'description': 'Classify a document by type (contract, corporate filing, financial statement, IP record, etc.) and assess its relevance to the transaction.',
            'parameters': {
                'type': 'object',
                'properties': {
                    'document_name': {'type': 'string'},
                    'document_text_preview': {'type': 'string', 'description': 'First 2000 characters of the document'},
                    'classification': {'type': 'string', 'enum': ['contract', 'amendment', 'corporate_filing', 'financial_statement', 'ip_record', 'real_estate', 'employment_agreement', 'regulatory_filing', 'litigation_record', 'insurance_policy', 'other']},
                    'relevance_score': {'type': 'number', 'minimum': 0, 'maximum': 1},
                    'key_parties': {'type': 'array', 'items': {'type': 'string'}},
                    'effective_date': {'type': 'string'},
                    'expiration_date': {'type': 'string'}
                },
                'required': ['document_name', 'classification', 'relevance_score']
            }
        }
    },
    {
        'type': 'function',
        'function': {
            'name': 'extract_clauses',
            'description': 'Extract specific clause types from a document chunk. Call this for each risk category you need to analyze.',
            'parameters': {
                'type': 'object',
                'properties': {
                    'document_name': {'type': 'string'},
                    'clause_category': {'type': 'string', 'enum': RISK_CATEGORIES},
                    'extracted_text': {'type': 'string', 'description': 'The exact clause text found'},
                    'clause_summary': {'type': 'string'},
                    'is_standard': {'type': 'boolean', 'description': 'Whether this is a market-standard provision'},
                    'notable_deviations': {'type': 'string'}
                },
                'required': ['document_name', 'clause_category', 'extracted_text', 'clause_summary', 'is_standard']
            }
        }
    },
    {
        'type': 'function',
        'function': {
            'name': 'search_related_clauses',
            'description': 'Search the full document corpus for clauses related to a specific topic or that might contradict a given clause.',
            'parameters': {
                'type': 'object',
                'properties': {
                    'query': {'type': 'string', 'description': 'Semantic search query for related clauses'},
                    'category_filter': {'type': 'string'},
                    'top_k': {'type': 'integer', 'default': 10}
                },
                'required': ['query']
            }
        }
    },
    {
        'type': 'function',
        'function': {
            'name': 'assess_risk',
            'description': 'Record a risk finding with severity assessment and recommendation.',
            'parameters': {
                'type': 'object',
                'properties': {
                    'category': {'type': 'string', 'enum': RISK_CATEGORIES},
                    'severity': {'type': 'string', 'enum': ['critical', 'high', 'medium', 'low', 'informational']},
                    'title': {'type': 'string'},
                    'description': {'type': 'string'},
                    'affected_documents': {'type': 'array', 'items': {'type': 'string'}},
                    'relevant_clause_text': {'type': 'array', 'items': {'type': 'string'}},
                    'recommendation': {'type': 'string'},
                    'confidence_score': {'type': 'number', 'minimum': 0, 'maximum': 1}
                },
                'required': ['category', 'severity', 'title', 'description', 'affected_documents', 'recommendation', 'confidence_score']
            }
        }
    },
    {
        'type': 'function',
        'function': {
            'name': 'request_human_review',
            'description': 'Flag a critical finding for mandatory attorney review before including in the final report.',
            'parameters': {
                'type': 'object',
                'properties': {
                    'finding_title': {'type': 'string'},
                    'reason': {'type': 'string'},
                    'urgency': {'type': 'string', 'enum': ['immediate', 'before_report', 'informational']}
                },
                'required': ['finding_title', 'reason', 'urgency']
            }
        }
    }
]

class DDRiskAgent:
    def __init__(self, oai_client: AzureOpenAI, pinecone_index, deployment_name: str = 'gpt-5.4-dd', embedding_deployment: str = 'text-embedding-3-large'):
        self.oai_client = oai_client
        self.index = pinecone_index
        self.deployment = deployment_name
        self.embedding_deployment = embedding_deployment
    
    def _vector_search(self, query: str, matter_id: str, top_k: int = 10, category_filter: Optional[str] = None) -> List[Dict]:
        """Search the document corpus for relevant chunks."""
        embedding = self.oai_client.embeddings.create(
            input=[query], model=self.embedding_deployment
        ).data[0].embedding
        
        filter_dict = {'matter_id': matter_id}
        if category_filter:
            filter_dict['document_class'] = category_filter
        
        results = self.index.query(
            vector=embedding,
            top_k=top_k,
            include_metadata=True,
            namespace=f'matter-{matter_id}',
            filter=filter_dict
        )
        return [{'score': m.score, 'text': m.metadata.get('text', ''), 'document_name': m.metadata.get('document_name', ''), 'chunk_index': m.metadata.get('chunk_index', 0)} for m in results.matches]
    
    def _execute_tool(self, tool_name: str, arguments: Dict, state: DDReviewState) -> str:
        """Execute a tool call and update state."""
        if tool_name == 'classify_document':
            state.documents_classified.append(arguments)
            return json.dumps({'status': 'classified', 'document': arguments.get('document_name'), 'type': arguments.get('classification')})
        
        elif tool_name == 'extract_clauses':
            state.clause_extractions.append(arguments)
            return json.dumps({'status': 'extracted', 'document': arguments.get('document_name'), 'category': arguments.get('clause_category')})
        
        elif tool_name == 'search_related_clauses':
            results = self._vector_search(
                query=arguments.get('query', ''),
                matter_id=state.matter_id,
                top_k=arguments.get('top_k', 10),
                category_filter=arguments.get('category_filter')
            )
            state.cross_doc_findings.append({'query': arguments.get('query'), 'results_count': len(results)})
            return json.dumps({'results': results})
        
        elif tool_name == 'assess_risk':
            finding = RiskFinding(
                finding_id=f'RF-{len(state.risk_findings)+1:04d}',
                category=arguments.get('category', 'other'),
                severity=RiskSeverity(arguments.get('severity', 'medium')),
                title=arguments.get('title', ''),
                description=arguments.get('description', ''),
                affected_documents=arguments.get('affected_documents', []),
                relevant_clauses=arguments.get('relevant_clause_text', []),
                recommendation=arguments.get('recommendation', ''),
                confidence_score=arguments.get('confidence_score', 0.5),
                requires_human_review=(arguments.get('severity') == 'critical')
            )
            state.risk_findings.append(finding)
            return json.dumps({'status': 'recorded', 'finding_id': finding.finding_id, 'severity': finding.severity.value})
        
        elif tool_name == 'request_human_review':
            checkpoint = {
                'finding_title': arguments.get('finding_title'),
                'reason': arguments.get('reason'),
                'urgency': arguments.get('urgency'),
                'timestamp': datetime.datetime.utcnow().isoformat(),
                'resolved': False
            }
            state.human_checkpoints.append(checkpoint)
            return json.dumps({'status': 'flagged_for_review', 'checkpoint': checkpoint})
        
        return json.dumps({'error': f'Unknown tool: {tool_name}'})
    
    def run_analysis(self, matter_id: str, document_summaries: List[Dict], risk_categories: List[str] = None) -> DDReviewState:
        """Run the full DD analysis agent loop."""
        state = DDReviewState(matter_id=matter_id, status=ReviewStatus.IN_PROGRESS)
        categories = risk_categories or RISK_CATEGORIES
        
        # Build initial context with document summaries
        doc_context = 'DOCUMENTS IN THIS DUE DILIGENCE PACKAGE:\n'
        for doc in document_summaries:
            doc_context += f"- {doc['name']} (first 2000 chars): {doc['preview'][:2000]}\n\n"
        
        user_prompt = f"""Conduct a comprehensive due diligence review of the following document package for matter {matter_id}.

{doc_context}

Analyze these risk categories: {', '.join(categories)}

Procedure:
1. First, classify each document by type and relevance
2. For each relevant document, extract key clauses in each risk category
3. Use search_related_clauses to find cross-document inconsistencies and contradictions
4. Assess each finding with appropriate severity
5. Flag any CRITICAL findings for human review
6. Ensure you check for: missing standard provisions, non-standard terms, cross-document term definition consistency, and assignment/change-of-control implications

Be thorough and systematic. Process every document."""
        
        messages = [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': user_prompt}
        ]
        
        max_iterations = 50  # Safety limit
        iteration = 0
        
        while iteration < max_iterations:
            iteration += 1
            state.processing_log.append(f'Iteration {iteration}')
            
            response = self.oai_client.chat.completions.create(
                model=self.deployment,
                messages=messages,
                tools=TOOLS,
                tool_choice='auto',
                temperature=0.1,
                max_tokens=4096
            )
            
            choice = response.choices[0]
            messages.append(choice.message)
            
            if choice.finish_reason == 'stop':
                state.processing_log.append('Agent completed analysis')
                break
            
            if choice.finish_reason == 'tool_calls':
                for tool_call in choice.message.tool_calls:
                    fn_name = tool_call.function.name
                    fn_args = json.loads(tool_call.function.arguments)
                    state.processing_log.append(f'Tool call: {fn_name}')
                    
                    result = self._execute_tool(fn_name, fn_args, state)
                    messages.append({
                        'role': 'tool',
                        'tool_call_id': tool_call.id,
                        'content': result
                    })
        
        # Determine final status
        if any(not cp['resolved'] for cp in state.human_checkpoints):
            state.status = ReviewStatus.AWAITING_HUMAN_REVIEW
        else:
            state.status = ReviewStatus.COMPLETED
        
        return state
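The terminal-status rule at the end of `run_analysis` is worth stating on its own: a review only reaches COMPLETED once every attorney checkpoint is resolved; any open checkpoint parks it in AWAITING_HUMAN_REVIEW until the portal clears it. A sketch using plain strings in place of the `ReviewStatus` enum:

```python
def final_status(checkpoints):
    # Mirrors the rule in run_analysis: one unresolved checkpoint blocks completion
    return 'awaiting_human_review' if any(not cp['resolved'] for cp in checkpoints) else 'completed'

assert final_status([]) == 'completed'
assert final_status([{'resolved': True}]) == 'completed'
assert final_status([{'resolved': True}, {'resolved': False}]) == 'awaiting_human_review'
```

This is the property that makes the ABA Model Rule 5.3 gate enforceable: report generation keys off the status, so no final report can be produced while a checkpoint remains open.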

Risk Report Generator

Type: workflow

Generates a structured PDF and JSON risk report from the DDReviewState produced by the analysis agent. The report includes an executive summary, document inventory, findings organized by severity and category, cross-reference matrix, and recommended next steps. Outputs both a formatted PDF for attorney review and a structured JSON for programmatic consumption and integration with Clio matter notes.

Implementation

report_generator.py
python
# report_generator.py
import json
import datetime
from typing import List, Dict, Any
from dataclasses import asdict
from openai import AzureOpenAI
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.colors import HexColor
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle, PageBreak
from reportlab.lib.units import inch
from io import BytesIO

class DDReportGenerator:
    def __init__(self, oai_client: AzureOpenAI, deployment: str = 'gpt-5.4-dd'):
        self.oai_client = oai_client
        self.deployment = deployment
    
    def _generate_executive_summary(self, state) -> str:
        """Use LLM to generate a natural-language executive summary."""
        findings_summary = json.dumps([{
            'id': f.finding_id, 'severity': f.severity.value,
            'title': f.title, 'category': f.category
        } for f in state.risk_findings], indent=2)
        
        prompt = f"""Based on the following due diligence findings, write a concise executive summary (3-4 paragraphs) suitable for a senior partner reviewing this DD report.

Matter ID: {state.matter_id}
Total documents reviewed: {len(state.documents_classified)}
Total findings: {len(state.risk_findings)}
Critical findings: {sum(1 for f in state.risk_findings if f.severity.value == 'critical')}
High findings: {sum(1 for f in state.risk_findings if f.severity.value == 'high')}

Findings overview:
{findings_summary}

Write in professional legal tone. Highlight deal-breaker issues first. Note areas that require further investigation. Do NOT provide legal advice — present findings for attorney evaluation."""
        
        response = self.oai_client.chat.completions.create(
            model=self.deployment,
            messages=[{'role': 'user', 'content': prompt}],
            temperature=0.3, max_tokens=2000
        )
        return response.choices[0].message.content
    
    def generate_json_report(self, state) -> Dict[str, Any]:
        """Generate structured JSON report."""
        exec_summary = self._generate_executive_summary(state)
        severity_order = {'critical': 0, 'high': 1, 'medium': 2, 'low': 3, 'informational': 4}
        sorted_findings = sorted(state.risk_findings, key=lambda f: severity_order.get(f.severity.value, 5))
        
        report = {
            'report_metadata': {
                'matter_id': state.matter_id,
                'generated_at': datetime.datetime.utcnow().isoformat(),
                'report_version': '1.0',
                'agent_version': 'dd-agent-v1.0',
                'status': state.status.value,
                'disclaimer': 'This report was generated by an AI due diligence agent and must be reviewed by a licensed attorney before reliance. AI-generated findings may contain errors or omissions. This report does not constitute legal advice.'
            },
            'executive_summary': exec_summary,
            'statistics': {
                'total_documents': len(state.documents_classified),
                'total_findings': len(state.risk_findings),
                'by_severity': {
                    'critical': sum(1 for f in state.risk_findings if f.severity.value == 'critical'),
                    'high': sum(1 for f in state.risk_findings if f.severity.value == 'high'),
                    'medium': sum(1 for f in state.risk_findings if f.severity.value == 'medium'),
                    'low': sum(1 for f in state.risk_findings if f.severity.value == 'low'),
                    'informational': sum(1 for f in state.risk_findings if f.severity.value == 'informational')
                },
                'human_review_required': len([cp for cp in state.human_checkpoints if not cp['resolved']])
            },
            'document_inventory': state.documents_classified,
            'findings': [asdict(f) for f in sorted_findings],
            'human_review_checkpoints': state.human_checkpoints,
            'processing_log_summary': {
                'total_iterations': len(state.processing_log),
                'tool_calls': len([l for l in state.processing_log if 'Tool call' in l])
            }
        }
        return report
    
    def generate_pdf_report(self, state, output_path: str = None) -> bytes:
        """Generate formatted PDF report."""
        buffer = BytesIO()
        doc = SimpleDocTemplate(buffer, pagesize=letter, topMargin=0.75*inch, bottomMargin=0.75*inch)
        styles = getSampleStyleSheet()
        
        # Custom styles
        title_style = ParagraphStyle('DDTitle', parent=styles['Title'], fontSize=18, textColor=HexColor('#1a237e'))
        heading_style = ParagraphStyle('DDHeading', parent=styles['Heading2'], textColor=HexColor('#1a237e'), spaceAfter=12)
        severity_colors = {'critical': '#d32f2f', 'high': '#f57c00', 'medium': '#fbc02d', 'low': '#388e3c', 'informational': '#1976d2'}
        
        story = []
        
        # Title page
        story.append(Paragraph('DUE DILIGENCE RISK REPORT', title_style))
        story.append(Spacer(1, 12))
        story.append(Paragraph(f'Matter: {state.matter_id}', styles['Normal']))
        story.append(Paragraph(f'Generated: {datetime.datetime.utcnow().strftime("%B %d, %Y at %H:%M UTC")}', styles['Normal']))
        story.append(Paragraph('CONFIDENTIAL — ATTORNEY WORK PRODUCT', ParagraphStyle('Confidential', parent=styles['Normal'], textColor=HexColor('#d32f2f'), fontSize=12, spaceAfter=24)))
        story.append(Paragraph('DISCLAIMER: This report was generated by an AI system and must be reviewed and validated by a licensed attorney before any reliance. Findings may contain errors or omissions. This does not constitute legal advice.', ParagraphStyle('Disclaimer', parent=styles['Normal'], backColor=HexColor('#fff3e0'), fontSize=9, spaceAfter=24)))
        
        # Executive summary (note: this re-invokes the LLM; when producing both
        # the JSON and PDF reports, generate the summary once and reuse it)
        exec_summary = self._generate_executive_summary(state)
        story.append(Paragraph('EXECUTIVE SUMMARY', heading_style))
        for para in exec_summary.split('\n\n'):
            story.append(Paragraph(para, styles['Normal']))
            story.append(Spacer(1, 6))
        
        # Statistics table
        story.append(PageBreak())
        story.append(Paragraph('FINDINGS SUMMARY', heading_style))
        stats_data = [['Severity', 'Count']]
        for sev in ['critical', 'high', 'medium', 'low', 'informational']:
            count = sum(1 for f in state.risk_findings if f.severity.value == sev)
            stats_data.append([sev.upper(), str(count)])
        stats_table = Table(stats_data, colWidths=[3*inch, 2*inch])
        stats_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), HexColor('#1a237e')),
            ('TEXTCOLOR', (0, 0), (-1, 0), HexColor('#ffffff')),
            ('GRID', (0, 0), (-1, -1), 0.5, HexColor('#cccccc')),
            ('FONTSIZE', (0, 0), (-1, -1), 10),
            ('PADDING', (0, 0), (-1, -1), 8)
        ]))
        story.append(stats_table)
        story.append(Spacer(1, 24))
        
        # Detailed findings
        story.append(Paragraph('DETAILED FINDINGS', heading_style))
        severity_order = {'critical': 0, 'high': 1, 'medium': 2, 'low': 3, 'informational': 4}
        for finding in sorted(state.risk_findings, key=lambda f: severity_order.get(f.severity.value, 5)):
            color = severity_colors.get(finding.severity.value, '#000000')
            story.append(Paragraph(f'<font color="{color}"><b>[{finding.severity.value.upper()}]</b></font> {finding.finding_id}: {finding.title}', styles['Heading3']))
            story.append(Paragraph(f'<b>Category:</b> {finding.category.replace("_", " ").title()}', styles['Normal']))
            story.append(Paragraph(f'<b>Description:</b> {finding.description}', styles['Normal']))
            story.append(Paragraph(f'<b>Affected Documents:</b> {", ".join(finding.affected_documents)}', styles['Normal']))
            story.append(Paragraph(f'<b>Recommendation:</b> {finding.recommendation}', styles['Normal']))
            story.append(Paragraph(f'<b>Confidence:</b> {finding.confidence_score:.0%}', styles['Normal']))
            if finding.requires_human_review:
                story.append(Paragraph('<font color="#d32f2f"><b>⚠ REQUIRES ATTORNEY REVIEW BEFORE FINALIZATION</b></font>', styles['Normal']))
            story.append(Spacer(1, 16))
        
        doc.build(story)
        pdf_bytes = buffer.getvalue()
        
        if output_path:
            with open(output_path, 'wb') as f:
                f.write(pdf_bytes)
        
        return pdf_bytes
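The per-severity statistics in `generate_json_report` are computed with five separate `sum(...)` passes over the findings; a single `collections.Counter` pass is a more compact alternative, and the same severity-rank dictionary drives the sort order. A sketch with plain strings standing in for `RiskFinding.severity.value`:

```python
from collections import Counter

SEVERITIES = ['critical', 'high', 'medium', 'low', 'informational']

findings = ['high', 'critical', 'high', 'informational']

# One pass over the findings, then fill in zero counts for unused severities
rollup = Counter(findings)
by_severity = {sev: rollup.get(sev, 0) for sev in SEVERITIES}
print(by_severity)  # {'critical': 1, 'high': 2, 'medium': 0, 'low': 0, 'informational': 1}

# The same list doubles as a sort key, most severe first
order = {sev: i for i, sev in enumerate(SEVERITIES)}
assert sorted(set(findings), key=order.get) == ['critical', 'high', 'informational']
```

Keeping `SEVERITIES` as the single source of truth also avoids the duplicated `severity_order` dictionaries in the JSON and PDF paths drifting apart.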

Human-in-the-Loop Review Portal

Type: integration

A lightweight Flask web application that serves as the attorney review interface. Attorneys receive Teams/email notifications when the DD agent flags critical findings for human review. They access this portal to review flagged findings, approve or modify risk assessments, add comments, and authorize final report generation. This satisfies ABA Model Rule 5.3 supervisory obligations.

Implementation

review_portal.py
python
# review_portal.py
from flask import Flask, request, jsonify, render_template_string
from functools import wraps
import jwt
import datetime
import json
from typing import Dict
import requests

app = Flask(__name__)
app.config['SECRET_KEY'] = 'loaded-from-key-vault-at-startup'  # placeholder: load the real secret from Azure Key Vault at startup

# In-memory store (replace with Azure Cosmos DB or PostgreSQL in production)
review_queue: Dict[str, dict] = {}

REVIEW_PAGE_TEMPLATE = """
<!DOCTYPE html>
<html>
<head><title>DD Review Portal - {{ matter_id }}</title>
<style>
  body { font-family: 'Segoe UI', sans-serif; max-width: 900px; margin: 0 auto; padding: 20px; }
  .finding { border: 1px solid #ddd; border-radius: 8px; padding: 16px; margin: 12px 0; }
  .critical { border-left: 4px solid #d32f2f; }
  .high { border-left: 4px solid #f57c00; }
  .badge { display: inline-block; padding: 2px 8px; border-radius: 4px; color: white; font-size: 12px; }
  .badge-critical { background: #d32f2f; }
  .badge-high { background: #f57c00; }
  .btn { padding: 8px 16px; border: none; border-radius: 4px; cursor: pointer; margin: 4px; }
  .btn-approve { background: #388e3c; color: white; }
  .btn-modify { background: #1976d2; color: white; }
  .btn-reject { background: #d32f2f; color: white; }
  textarea { width: 100%; min-height: 60px; margin: 8px 0; }
  .disclaimer { background: #fff3e0; padding: 12px; border-radius: 4px; margin-bottom: 20px; }
</style></head>
<body>
  <h1>Due Diligence Review Portal</h1>
  <p>Matter: <strong>{{ matter_id }}</strong></p>
  <div class='disclaimer'>Items below have been flagged by the AI agent as requiring attorney review per ABA Model Rule 5.3. Please review each finding, verify accuracy against source documents, and approve, modify, or reject.</div>
  {% for finding in findings %}
  <div class='finding {{ finding.severity }}'>
    <span class='badge badge-{{ finding.severity }}'>{{ finding.severity | upper }}</span>
    <h3>{{ finding.finding_id }}: {{ finding.title }}</h3>
    <p><strong>Category:</strong> {{ finding.category }}</p>
    <p><strong>Description:</strong> {{ finding.description }}</p>
    <p><strong>Affected Documents:</strong> {{ finding.affected_documents | join(', ') }}</p>
    <p><strong>AI Recommendation:</strong> {{ finding.recommendation }}</p>
    <p><strong>AI Confidence:</strong> {{ (finding.confidence_score * 100) | int }}%</p>
    <form method='POST' action='/api/v1/review/{{ matter_id }}/findings/{{ finding.finding_id }}'>
      <textarea name='attorney_notes' placeholder='Attorney notes (required for modifications)'></textarea>
      <select name='modified_severity'>
        <option value=''>Keep current severity</option>
        <option value='critical'>Critical</option>
        <option value='high'>High</option>
        <option value='medium'>Medium</option>
        <option value='low'>Low</option>
        <option value='informational'>Informational</option>
      </select>
      <br>
      <button type='submit' name='action' value='approve' class='btn btn-approve'>✓ Approve Finding</button>
      <button type='submit' name='action' value='modify' class='btn btn-modify'>✎ Modify & Approve</button>
      <button type='submit' name='action' value='reject' class='btn btn-reject'>✗ Reject Finding</button>
    </form>
  </div>
  {% endfor %}
  <form method='POST' action='/api/v1/review/{{ matter_id }}/finalize'>
    <button type='submit' class='btn btn-approve' style='font-size:16px; padding:12px 24px; margin-top:24px;'>Generate Final Report</button>
  </form>
</body></html>
"""

def require_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('Authorization', '').replace('Bearer ', '')
        if not token:
            token = request.cookies.get('auth_token', '')
        try:
            payload = jwt.decode(token, app.config['SECRET_KEY'], algorithms=['HS256'])
            request.user = payload
        except jwt.InvalidTokenError:
            return jsonify({'error': 'Authentication required'}), 401
        return f(*args, **kwargs)
    return decorated

@app.route('/api/v1/review/<matter_id>', methods=['GET'])
@require_auth
def get_review_page(matter_id):
    if matter_id not in review_queue:
        return jsonify({'error': 'Matter not found'}), 404
    state = review_queue[matter_id]
    flagged = [f for f in state['findings'] if f.get('requires_human_review', False) and not f.get('human_approved')]
    return render_template_string(REVIEW_PAGE_TEMPLATE, matter_id=matter_id, findings=flagged)

@app.route('/api/v1/review/<matter_id>/findings/<finding_id>', methods=['POST'])
@require_auth
def review_finding(matter_id, finding_id):
    action = request.form.get('action')
    if action not in ('approve', 'modify', 'reject'):
        return jsonify({'error': 'Invalid action'}), 400
    notes = request.form.get('attorney_notes', '')
    modified_severity = request.form.get('modified_severity', '')
    state = review_queue.get(matter_id)
    if not state:
        return jsonify({'error': 'Matter not found'}), 404
    finding = next((f for f in state['findings'] if f['finding_id'] == finding_id), None)
    if finding is None:
        return jsonify({'error': 'Finding not found'}), 404
    finding['human_approved'] = (action in ('approve', 'modify'))
    finding['attorney_notes'] = notes
    finding['reviewed_by'] = request.user.get('email', 'unknown')
    finding['reviewed_at'] = datetime.datetime.utcnow().isoformat()
    if action == 'modify' and modified_severity:
        finding['severity'] = modified_severity
    if action == 'reject':
        finding['rejected'] = True
    return jsonify({'status': 'updated', 'finding_id': finding_id, 'action': action})

@app.route('/api/v1/review/<matter_id>/finalize', methods=['POST'])
@require_auth
def finalize_report(matter_id):
    state = review_queue.get(matter_id)
    if not state:
        return jsonify({'error': 'Matter not found'}), 404
    unreviewed = [f for f in state['findings'] if f.get('requires_human_review') and not f.get('human_approved') and not f.get('rejected')]
    if unreviewed:
        return jsonify({'error': f'{len(unreviewed)} findings still require review', 'unreviewed_ids': [f['finding_id'] for f in unreviewed]}), 400
    # Trigger report generation (calls report_generator)
    state['finalized'] = True
    state['finalized_by'] = request.user.get('email')
    state['finalized_at'] = datetime.datetime.utcnow().isoformat()
    return jsonify({'status': 'finalized', 'message': 'Report generation initiated'})

def send_teams_notification(webhook_url: str, matter_id: str, critical_count: int, total_count: int):
    """Send notification to Teams deal room when review is needed."""
    card = {
        '@type': 'MessageCard',
        'themeColor': 'd32f2f' if critical_count > 0 else 'f57c00',
        'summary': f'DD Review Required: {matter_id}',
        'sections': [{
            'activityTitle': f'Due Diligence Review Required: {matter_id}',
            'facts': [
                {'name': 'Total Findings', 'value': str(total_count)},
                {'name': 'Critical (Needs Review)', 'value': str(critical_count)},
                {'name': 'Generated', 'value': datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')}
            ],
            'markdown': True
        }],
        'potentialAction': [{
            '@type': 'OpenUri',
            'name': 'Open Review Portal',
            'targets': [{'os': 'default', 'uri': f'https://dd-reports.firmname.com/api/v1/review/{matter_id}'}]
        }]
    }
    try:
        response = requests.post(webhook_url, json=card, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        # A failed notification should not block the review pipeline
        app.logger.warning('Teams notification failed for %s: %s', matter_id, exc)

if __name__ == '__main__':
    # Flask's built-in server is for development only; deploy behind a
    # production WSGI server (e.g., gunicorn) with TLS termination.
    app.run(host='0.0.0.0', port=8080)
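Note that `review_queue` is held in process memory, so review state is lost on restart, while the disaster-recovery test later in this guide expects mid-analysis state to survive one. A minimal sketch of atomic JSON-file persistence (the `STATE_PATH` location is an assumption, and a production build would more likely use a database):

```python
import json
import os
import tempfile

STATE_PATH = '/var/lib/dd-agent/review_queue.json'  # assumed location

def save_state(review_queue: dict, path: str = STATE_PATH) -> None:
    """Write review state to a temp file, then atomically swap it into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(review_queue, f)
        os.replace(tmp, path)  # atomic rename, so readers never see a partial file
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)

def load_state(path: str = STATE_PATH) -> dict:
    """Reload persisted state on startup; empty queue if nothing saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

Calling `save_state(review_queue)` at the end of each POST handler and `load_state()` at startup would be enough for this single-process portal.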

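The security test in the Testing & Validation section expects unauthenticated portal requests to be rejected with a 401. That behavior can be smoke-tested in isolation by reproducing the `require_auth` gate on a throwaway app (a self-contained sketch, not the portal module itself):

```python
# Self-contained sketch: replicates the require_auth gate so the 401 path
# can be exercised with Flask's test client, without the full portal.
from functools import wraps

import jwt
from flask import Flask, jsonify, request

app = Flask(__name__)
app.config['SECRET_KEY'] = 'test-secret'  # throwaway key for the smoke test only

def require_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('Authorization', '').replace('Bearer ', '')
        if not token:
            token = request.cookies.get('auth_token', '')
        try:
            request.user = jwt.decode(token, app.config['SECRET_KEY'], algorithms=['HS256'])
        except jwt.InvalidTokenError:
            return jsonify({'error': 'Authentication required'}), 401
        return f(*args, **kwargs)
    return decorated

@app.route('/api/v1/review/<matter_id>')
@require_auth
def get_review_page(matter_id):
    return jsonify({'matter': matter_id})
```

Running `app.test_client().get('/api/v1/review/TEST-DD-001')` with no credentials should return a 401, and the same request with a valid `Bearer` token should return 200.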
Clio Time Entry and Matter Sync Integration

Type: integration

Automatically logs time entries in Clio for AI-assisted due diligence work, syncs matter metadata for agent context, and posts DD report summaries as matter notes. This ensures proper billing attribution and matter documentation.

Implementation

clio_integration.py
python
# Clio API integration for time entry logging, matter note posting, and DD
# completion sync

# clio_integration.py
import requests
import datetime
from typing import Dict, Optional

class ClioIntegration:
    BASE_URL = 'https://app.clio.com/api/v4'
    
    def __init__(self, access_token: str, refresh_token: str, client_id: str, client_secret: str):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.client_id = client_id
        self.client_secret = client_secret
    
    def _headers(self) -> Dict:
        return {'Authorization': f'Bearer {self.access_token}', 'Content-Type': 'application/json'}
    
    def _refresh_token_if_needed(self):
        """Refresh the OAuth access token using the stored refresh token."""
        # Clio's token endpoint lives at /oauth/token, not under /api/v4;
        # a literal '..' path segment is not normalized by requests.
        response = requests.post('https://app.clio.com/oauth/token', data={
            'grant_type': 'refresh_token',
            'refresh_token': self.refresh_token,
            'client_id': self.client_id,
            'client_secret': self.client_secret
        })
        if response.ok:
            data = response.json()
            self.access_token = data['access_token']
            self.refresh_token = data['refresh_token']
    
    def get_matter(self, matter_id: str) -> Dict:
        """Retrieve matter details for agent context."""
        response = requests.get(
            f'{self.BASE_URL}/matters/{matter_id}',
            headers=self._headers(),
            params={'fields': 'id,display_number,description,client,practice_area,status,custom_field_values'}
        )
        response.raise_for_status()
        return response.json()['data']
    
    def log_time_entry(self, matter_id: str, user_id: str, duration_seconds: int, description: str, activity_description_id: Optional[str] = None) -> Dict:
        """Log a time entry for AI-assisted DD work."""
        entry = {
            'data': {
                'date': datetime.date.today().isoformat(),
                'quantity': duration_seconds,  # Clio expects TimeEntry quantity in seconds
                'type': 'TimeEntry',
                'note': f'[AI-Assisted] {description}',
                'matter': {'id': int(matter_id)},
                'user': {'id': int(user_id)}
            }
        }
        if activity_description_id:
            entry['data']['activity_description'] = {'id': int(activity_description_id)}
        
        response = requests.post(f'{self.BASE_URL}/activities', headers=self._headers(), json=entry)
        response.raise_for_status()
        return response.json()['data']
    
    def post_matter_note(self, matter_id: str, subject: str, detail: str) -> Dict:
        """Post a note to the matter with DD report summary."""
        note = {
            'data': {
                'subject': subject,
                'detail': detail,
                'type': 'Note',
                'matter': {'id': int(matter_id)},
                'date': datetime.date.today().isoformat()
            }
        }
        response = requests.post(f'{self.BASE_URL}/notes', headers=self._headers(), json=note)
        response.raise_for_status()
        return response.json()['data']
    
    def sync_dd_completion(self, matter_id: str, report_summary: Dict, processing_time_seconds: int, reviewing_attorney_user_id: str):
        """Complete sync: log time, post note, update matter."""
        # Log AI processing time
        self.log_time_entry(
            matter_id=matter_id,
            user_id=reviewing_attorney_user_id,
            duration_seconds=processing_time_seconds,
            description=f'AI due diligence review - {report_summary["statistics"]["total_documents"]} documents analyzed, {report_summary["statistics"]["total_findings"]} findings identified'
        )
        
        # Post summary note
        findings_text = f"""Due Diligence AI Review Completed

Documents Reviewed: {report_summary['statistics']['total_documents']}
Total Findings: {report_summary['statistics']['total_findings']}
Critical: {report_summary['statistics']['by_severity'].get('critical', 0)}
High: {report_summary['statistics']['by_severity'].get('high', 0)}
Medium: {report_summary['statistics']['by_severity'].get('medium', 0)}
Low: {report_summary['statistics']['by_severity'].get('low', 0)}

Executive Summary:
{report_summary['executive_summary'][:2000]}

Full report saved to iManage matter folder."""
        
        self.post_matter_note(
            matter_id=matter_id,
            subject=f'DD Risk Report - {report_summary["statistics"]["total_findings"]} Findings',
            detail=findings_text
        )
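One gap worth noting: `_refresh_token_if_needed` is defined above but never called by the request methods, so an expired access token simply raises. A transport-agnostic retry wrapper (an illustrative pattern, not part of any Clio SDK) could close that gap:

```python
from typing import Callable

def with_token_refresh(do_request: Callable[[], object],
                       refresh: Callable[[], None]):
    """Run a request callable; on a 401, refresh the token and retry once.

    `do_request` must build its headers freshly on each call so the retry
    picks up the refreshed access token.
    """
    response = do_request()
    if getattr(response, 'status_code', None) == 401:
        refresh()
        response = do_request()
    response.raise_for_status()
    return response
```

Inside `ClioIntegration`, each method could then issue requests as, e.g., `with_token_refresh(lambda: requests.get(url, headers=self._headers()), self._refresh_token_if_needed)`.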

DD Agent Orchestrator Prompt Library

Type: prompt

A versioned library of specialized prompts used by the DD agent for different analysis stages. Each prompt is optimized for legal due diligence accuracy and includes grounding instructions to prevent hallucination. Prompts are stored as configuration files and can be updated without redeploying the agent application.

Implementation

prompts/dd_prompts.yaml
yaml
# Versioned prompt library for DD agent analysis stages

# prompts/dd_prompts.yaml
# Version: 1.0.0
# Last Updated: 2025-07-30
# Reviewed By: [MSP Legal AI Lead]

prompt_library:
  document_classification:
    version: '1.0'
    system: |
      You are a legal document classifier specializing in M&A due diligence.
      Classify the document into one of these categories:
      - contract: Any binding agreement between parties
      - amendment: Modification to an existing contract
      - corporate_filing: Articles of incorporation, bylaws, board resolutions, annual reports
      - financial_statement: Balance sheets, income statements, audited financials
      - ip_record: Patents, trademarks, copyrights, IP assignments, license agreements
      - real_estate: Leases, deeds, environmental reports
      - employment_agreement: Offer letters, employment contracts, severance agreements
      - regulatory_filing: Government filings, permits, licenses
      - litigation_record: Court filings, settlement agreements, demand letters
      - insurance_policy: Insurance certificates, policy documents
      - other: Documents that don't fit above categories
      
      Also extract: key parties, effective date, expiration date, governing law.
      If you cannot determine a field with confidence, respond with 'UNDETERMINED' rather than guessing.
    temperature: 0.1
    max_tokens: 1000

  clause_extraction:
    version: '1.0'
    system: |
      You are a senior contract analyst extracting specific clause types from legal documents.
      
      CRITICAL RULES:
      1. Only extract text that is ACTUALLY PRESENT in the provided document chunk
      2. Never generate, infer, or fabricate clause text
      3. If a clause type is not present in this chunk, respond with 'NOT FOUND IN THIS SECTION'
      4. Quote the exact language - do not paraphrase
      5. Note any defined terms that are referenced but defined elsewhere
      6. Identify if the clause is mutual or one-sided
      7. Flag any carve-outs or exceptions within the clause
      
      For each extracted clause, assess:
      - Is this market-standard language? (Yes/No/Partially)
      - What specific deviations from market standard exist?
      - Are there any ambiguities in the language?
      - Does this clause reference or depend on other clauses/documents?
    temperature: 0.0
    max_tokens: 2000

  cross_document_analysis:
    version: '1.0'
    system: |
      You are analyzing clauses across multiple documents in a due diligence package to identify:
      1. CONTRADICTIONS: Clauses in different documents that conflict with each other
      2. INCONSISTENCIES: Different defined terms or standards across documents
      3. GAPS: Standard provisions present in some documents but missing from others
      4. DEPENDENCIES: Clauses that reference or are contingent on provisions in other documents
      5. CUMULATIVE RISK: Individual clauses that are acceptable alone but create risk when combined
      
      For each finding, cite the EXACT document names and clause text.
      Rate the significance: Critical / High / Medium / Low / Informational
      Explain the practical business impact of each finding.
    temperature: 0.1
    max_tokens: 3000

  change_of_control_analysis:
    version: '1.0'
    system: |
      You are analyzing change-of-control provisions across all contracts in a due diligence package.
      
      For each contract, determine:
      1. Is there a change-of-control clause? (Quote exact text)
      2. What triggers the clause? (Merger, acquisition, asset sale, board change, >50% ownership change, other)
      3. What are the consequences? (Termination right, consent required, acceleration, price adjustment, other)
      4. Is consent required for assignment? (Quote exact text)
      5. Is there an anti-assignment clause that could prevent the transaction?
      6. Are there any exceptions or carve-outs for affiliated entities?
      7. What is the notice period required?
      8. Would the contemplated transaction trigger this clause?
      
      Assess aggregate risk: How many key contracts would be affected by the transaction?
      Identify any contracts that could be deal-breakers if consent is not obtained.
    temperature: 0.0
    max_tokens: 3000

  ip_ownership_analysis:
    version: '1.0'
    system: |
      You are analyzing intellectual property ownership and assignment provisions.
      
      For each document, determine:
      1. Are there IP assignment clauses? (work-for-hire, assignment of inventions)
      2. Are assignments present-tense ('hereby assigns') or future ('agrees to assign')?
      3. Are there any retained rights or licenses back to the assignor?
      4. Do employment agreements contain invention assignment clauses?
      5. Are there any third-party IP licenses that may not be transferable?
      6. Are open-source software obligations disclosed?
      7. Is there a complete chain of title from creator to current owner?
      8. Are there any IP-related representations and warranties?
      
      Flag any gaps in the IP ownership chain as HIGH or CRITICAL risk.
      Flag any licenses that contain anti-assignment or change-of-control provisions.
    temperature: 0.0
    max_tokens: 3000

  risk_report_executive_summary:
    version: '1.0'
    system: |
      You are drafting the executive summary section of a legal due diligence risk report.
      
      RULES:
      1. Write in professional, objective legal tone
      2. Lead with the most significant findings (Critical and High severity)
      3. Quantify: number of documents reviewed, number of findings by severity
      4. Identify the top 3-5 risks that could affect deal terms or valuation
      5. Note any areas where further investigation is recommended
      6. Do NOT provide legal advice or opinions on whether to proceed
      7. Do NOT use marketing language or superlatives
      8. Include a statement that this is AI-generated and requires attorney review
      9. Keep to 3-4 concise paragraphs
      10. Use specific document references, not vague generalizations
    temperature: 0.3
    max_tokens: 2000

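To make the "updated without redeploying" claim concrete, the agent can read this file at run time. A minimal loader sketch (assumes `pyyaml`; the function name is illustrative):

```python
import yaml

def load_prompt(library_path: str, stage: str) -> dict:
    """Return one stage's config: system text, temperature, max_tokens, version."""
    with open(library_path) as f:
        library = yaml.safe_load(f)['prompt_library']
    if stage not in library:
        raise KeyError(f'Unknown prompt stage: {stage!r}')
    return library[stage]
```

The agent would pass `cfg['system']` as the system message and `cfg['temperature']` / `cfg['max_tokens']` through to the model call, so a prompt edit takes effect on the next analysis run with no code change.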
Testing & Validation

  • DOCUMENT INGESTION TEST: Upload a package of 20 mixed-format documents (10 native PDFs, 5 scanned PDFs, 3 Word documents, 2 image files) to iManage test matter folder. Trigger ingestion pipeline. Verify all 20 documents are processed, OCR produces readable text for scanned documents, and all chunks are indexed in Pinecone namespace 'matter-TEST-DD-001'. Expected: 100% document ingestion success rate, OCR accuracy >95% on scanned documents.
  • VECTOR SEARCH ACCURACY TEST: After ingesting test documents, perform 10 semantic searches for known clause types (e.g., 'change of control provision', 'indemnification cap'). Verify that the top-3 results for each query contain the correct document chunks. Expected: At least 8 out of 10 queries return the correct document in the top-3 results.
  • AGENT ANALYSIS COMPLETENESS TEST: Run the full DD agent on a test package containing documents with known planted issues: (1) a contract with a change-of-control termination trigger, (2) a missing governing law clause in one agreement, (3) contradictory indemnification caps across two contracts, (4) an IP assignment with only future-tense language. Verify the agent identifies all 4 issues. Expected: 100% detection of planted issues.
  • HUMAN-IN-THE-LOOP CHECKPOINT TEST: Trigger a DD review that generates at least one CRITICAL finding. Verify: (1) Teams notification is sent to the Deal Room channel within 2 minutes, (2) the review portal URL in the notification is accessible, (3) the portal displays the flagged finding with approve/modify/reject options, (4) approving the finding updates the state correctly, (5) the final report cannot be generated until all critical findings are reviewed. Expected: All 5 verification points pass.
  • RISK REPORT GENERATION TEST: After completing a test analysis, generate both PDF and JSON reports. Verify: (1) PDF is properly formatted with all sections (executive summary, statistics, findings, disclaimer), (2) JSON schema validates correctly, (3) findings are sorted by severity, (4) all finding IDs are unique, (5) confidence scores are between 0 and 1, (6) the AI disclaimer appears on the first page of the PDF. Expected: Both reports generate without errors and pass all verification points.
  • CLIO TIME ENTRY TEST: After completing a DD review, verify: (1) a time entry is created in Clio for the reviewing attorney with the correct matter ID, (2) the time entry description includes '[AI-Assisted]' prefix, (3) a matter note is posted with the DD summary statistics, (4) the note contains the executive summary text. Expected: All Clio entries are created and visible in the attorney's Clio dashboard.
  • IMANAGE WRITE-BACK TEST: After report generation, verify: (1) the PDF report is saved to the correct iManage matter folder, (2) the document is classified with the correct document class, (3) the document metadata includes the generation timestamp and agent version, (4) the document is accessible to attorneys with matter access. Expected: Report appears in iManage within 30 seconds of generation.
  • SECURITY AND ACCESS CONTROL TEST: Attempt to access the review portal without authentication (should return 401). Attempt to access a matter belonging to a different practice group (should return 403). Verify that API keys are not exposed in application logs. Verify that TLS 1.2+ is enforced on all endpoints. Check that Azure OpenAI audit logs capture all API calls. Expected: All security controls function correctly.
  • PERFORMANCE AND SCALABILITY TEST: Process a large DD package of 100 documents (approximately 2,000 pages total). Measure: (1) total ingestion time, (2) total agent analysis time, (3) report generation time. Expected: Ingestion completes within 30 minutes, analysis completes within 60 minutes, report generation completes within 5 minutes. Total end-to-end under 2 hours for 100 documents.
  • DATA ISOLATION TEST: Create two separate test matters (TEST-DD-001 and TEST-DD-002) with different document packages. Run DD analysis on both. Verify: (1) vector search for matter 001 does not return results from matter 002, (2) the review portal for matter 001 does not display findings from matter 002, (3) Clio entries are posted to the correct respective matters. Expected: Complete data isolation between matters.
  • DISASTER RECOVERY TEST: Simulate failure scenarios: (1) Azure OpenAI rate limit exceeded — verify graceful retry with exponential backoff, (2) Pinecone timeout — verify partial results are preserved and agent can resume, (3) iManage connection failure — verify documents are cached locally and retried, (4) Mid-analysis application restart — verify state is persisted and analysis can resume from last checkpoint. Expected: All failure scenarios handled gracefully without data loss.
  • END-TO-END USER ACCEPTANCE TEST: Have a participating attorney run a complete DD workflow on a real (but low-stakes) matter: upload documents to iManage, trigger DD review from the portal, receive Teams notification, review flagged findings, approve/modify findings, generate final report, verify Clio time entry. Collect attorney feedback on: report quality, finding accuracy, ease of use, time saved vs. manual review. Expected: Attorney confirms the system is usable and findings are directionally accurate, with specific feedback documented for iteration.

Client Handoff

The client handoff meeting should be scheduled as a 2-hour session with the managing partner, lead M&A attorney, IT administrator (if any), and all attorneys who will use the system. Cover the following topics:

1. SYSTEM OVERVIEW (20 min): Walk through the complete architecture — what each component does, where data flows, and where it is stored. Show the data flow diagram. Explain that Azure OpenAI does NOT train on their data per the DPA.

2. WORKFLOW DEMONSTRATION (30 min): Run a live DD review on a sample matter from start to finish. Show document upload to iManage, triggering the review, monitoring progress, receiving the Teams notification, reviewing findings in the portal, approving/rejecting findings, and generating the final report.

3. AI ETHICS AND COMPLIANCE OBLIGATIONS (20 min): Review ABA Formal Opinion 512 requirements. Emphasize: (a) attorneys must review all AI-generated findings before reliance, (b) the AI disclaimer must remain on all reports, (c) informed client consent must be obtained per the updated engagement letter template, (d) attorneys remain personally responsible for all work product. Distribute printed copies of the firm's AI Usage Policy.

4. SPELLBOOK AND COPILOT TRAINING (20 min): Demonstrate Spellbook's contract review features in Word. Show how to use Copilot for summarizing deal correspondence. Provide the Spellbook quick-start guide.

5. TROUBLESHOOTING AND SUPPORT (15 min): Review common issues: slow processing (check document count), failed OCR (check scan quality), authentication errors (token refresh). Provide the MSP support contact, escalation path, and SLA terms. Show how to check system status at the monitoring dashboard.

6. SUCCESS CRITERIA REVIEW (15 min): Review together: (a) system processes a DD package of 50+ documents within 2 hours, (b) AI identifies at least 80% of risk issues compared to a manual review benchmark, (c) attorneys report time savings of at least 30% on DD matters within 90 days, (d) all compliance checkpoints are functioning. Schedule 30-day and 90-day review meetings.

Documentation to Leave Behind

  • System Architecture Diagram (PDF)
  • Attorney Quick-Start Guide (laminated desk reference)
  • AI Usage Policy (firm-customized, signed by all attorneys)
  • Updated Engagement Letter Template with AI Disclosure
  • Vendor DPA Summary Sheet (what each vendor can/cannot do with data)
  • Troubleshooting Runbook with screenshots
  • MSP Support Contact Card with escalation tiers
  • Spellbook Quick Reference Card
  • Copilot Tips Sheet for Legal Professionals
  • 90-Day Adoption Roadmap with milestones

Maintenance

ONGOING MSP MAINTENANCE RESPONSIBILITIES:

1. Weekly (30 min/week)
2. Monthly (2 hrs/month)
3. Quarterly (4 hrs/quarter)
4. Annually

SLA Considerations

  • Response time for system outages: 1 hour during business hours, 4 hours after hours
  • Response time for non-critical issues: 4 business hours
  • Scheduled maintenance window: Sundays 2:00–6:00 AM local time
  • Uptime target: 99.5% during business hours (Mon–Sat 7 AM–9 PM)
  • Maximum acceptable DD processing time: 3 hours for packages under 200 documents

Model Retraining / Prompt Update Triggers

  • Attorney feedback indicates >20% false positive rate on risk findings
  • New regulation or ABA opinion affects DD analysis requirements
  • Firm begins handling a new transaction type (e.g., healthcare M&A requiring HIPAA analysis)
  • Azure OpenAI releases a new model version with significant capability improvements
  • Spellbook or Clio releases major feature updates affecting integration points

Escalation Path

  • Tier 1 (MSP Help Desk): Password resets, Spellbook/Copilot basic issues, scanner problems
  • Tier 2 (MSP Cloud Engineer): API integration failures, Azure service issues, Pinecone connectivity
  • Tier 3 (MSP AI Specialist): Agent behavior issues, prompt engineering, report quality concerns
  • Tier 4 (Vendor Support): Platform-specific bugs — contact Harvey/Spellbook/Clio/Microsoft directly
  • Emergency: Managing partner contacts MSP account manager directly for deal-critical system failures

Alternatives

Harvey AI Agent Builder (Enterprise Turnkey)

Instead of building a custom DD agent on Azure OpenAI + LangChain, deploy Harvey AI's Agent Builder platform. Harvey provides a fully managed legal AI environment where attorneys can create custom DD workflows without code. Harvey's platform already has legal-specific training, built-in compliance features, and handles all LLM infrastructure. The MSP's role shifts from building the AI to managing the Harvey deployment, integration, and training.

Kira by Litera (ML-Based Extraction)

Replace the custom GPT-5.4 agent with Kira's established ML-based contract review platform. Kira uses purpose-trained machine learning models (not general LLMs) for clause extraction and has 18+ years of training data from top global law firms. The new Kira experience includes generative AI capabilities at no additional cost. Integrates natively with Litera's document management and transaction management tools.

Microsoft Copilot Studio Custom Agents (Low-Code)

Instead of the Python-based custom agent, build the DD orchestration workflow using Microsoft Copilot Studio's visual agent builder. This creates agents that run within Microsoft Teams and can be triggered by attorneys directly from their collaboration environment. Uses Azure OpenAI under the hood but with a low-code configuration interface. Agents can call external APIs (iManage, Clio, Pinecone) via custom connectors.

Note

Recommend this for firms wanting a lighter DD capability (e.g., contract review for non-M&A transactions) or as a Phase 1 proof of concept before investing in the full custom agent.

Self-Hosted Open Source LLM (On-Premises)

For firms with extreme confidentiality requirements (e.g., national security matters, pre-announcement M&A for public companies), deploy an open-source LLM (DeepSeek-R1 or Qwen3-235B) on on-premises GPU servers. No data ever leaves the firm's network. The MSP procures, installs, and maintains the GPU infrastructure and manages model updates.

Warning

Recommend this ONLY for firms with documented regulatory or client requirements prohibiting cloud AI, and only after confirming the firm's facilities can support the power and cooling requirements.

Emma Legal (M&A-Focused DD Platform)

For firms focused specifically on M&A due diligence (rather than general contract review), deploy Emma Legal as the primary AI platform integrated directly with Intralinks or Datasite virtual data rooms. Emma is purpose-built for M&A DD and provides pre-configured workflows for analyzing deal documents, flagging clause-level risks, and generating structured DD reports that can be shared with counterparties.

Note

Recommend this for boutique M&A firms that do high-volume deal work and want the fastest, most focused solution with lower cost of entry.
