Reinventing Corporate Document Management with Generative AI

Forge builds a GenAI document processing solution that reads Certificates of Incorporation, extracts the data, and hands managers a reviewable spreadsheet, so corporate docs stop blocking document-heavy workflows.


Client profile

A fintech company providing liquidity in private markets

Industry

Financial Services

Region

North America, EMEA

4 mos.

From idea to production document processing solution

5%

Of documents now reviewed by humans on first run


Forge Global has provided liquidity in private markets since 2014 through a platform for trading shares in pre-IPO companies globally. Private-market transactions generate a lot of unstructured corporate paperwork. Forge’s managers process every page of it.

01 The Challenge

Corporate documents do not fit inside pre-defined data models

Certificates of Incorporation and related corporate documents arrive in large volumes, in variable formats, and in jurisdiction-specific jargon. Legal and financial terminology is specialized. Structure shifts by company, jurisdiction, and time period. None of it fits a clean schema.

The work could not be offshored, because security and compliance required Forge’s own managers to handle the data in-house. That meant a dedicated team spending its hours on manual data processing rather than on the higher-value judgement and decision-making parts of the workflow.

02 The Approach

A measured target: 95% automation, 5% human review, at 70%+ model accuracy

Provectus started the engagement with a discovery session, then broke the work into four focused phases: a data lake, ML pipelines, deep-learning data extraction, and LLMs for text understanding.

The goal was clear: 95% automation, 5% human review at production quality. The accuracy target for the ML model was 70% or higher before operationalization. Further improvements would be driven by evidence gathered during model development, not by scope creep mid-build.
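One way to picture the 95/5 split is confidence-based routing: documents whose extracted fields all score above a cutoff pass straight through, and the rest go to a manager. The sketch below is illustrative only. The per-field confidence scores, the `ExtractedDocument` type, and the 0.8 cutoff are assumptions, not details of the Forge solution, and the cutoff is unrelated to the 70% accuracy target.

```python
from dataclasses import dataclass


@dataclass
class ExtractedDocument:
    doc_id: str
    # Hypothetical per-field confidence scores in [0, 1].
    field_confidences: dict[str, float]


def needs_human_review(doc: ExtractedDocument, threshold: float = 0.8) -> bool:
    """Flag a document when its weakest field falls below the threshold.

    A document with no extracted fields is always flagged.
    """
    weakest = min(doc.field_confidences.values(), default=0.0)
    return weakest < threshold


def review_rate(docs: list[ExtractedDocument], threshold: float = 0.8) -> float:
    """Share of documents routed to a human reviewer."""
    if not docs:
        return 0.0
    flagged = sum(needs_human_review(d, threshold) for d in docs)
    return flagged / len(docs)


# Usage: 19 confident documents and 1 uncertain one.
batch = [
    ExtractedDocument(f"doc-{i}", {"company_name": 0.95, "authorized_shares": 0.90})
    for i in range(19)
]
batch.append(ExtractedDocument("doc-19", {"company_name": 0.97, "authorized_shares": 0.40}))
print(review_rate(batch))  # 1 of 20 documents is flagged
```

Tracking this rate against a labeled sample is one way to check whether a given cutoff actually delivers the targeted split.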

03 The Build

Data lake plus accurate LLM plus reviewer UI

Infrastructure runs on AWS. The pipeline ingests PDF and Excel files, runs passage classification and field extraction as integrated stages, and uses a selected LLM to extract specific values from unstructured text.

The outputs land as user-friendly spreadsheets that managers review. Every manager correction feeds back into filtering and model calibration. Infrastructure improvements, observability, and security were treated as in-scope from the start.

The initial ML model was trained on a Forge-supplied dataset of PDFs and spreadsheets, with a Provectus subject-matter expert validating markup accuracy before production runs.

04 The Results

Corporate documentation no longer the bottleneck

Forge’s managers now upload corporate documents, review an automatically generated spreadsheet, and approve. The volume the team handles scales with infrastructure, not with headcount. Manager time that used to go into manual document work goes into higher-value review and decision work.

4 months

From idea to an in-production document processing solution

The engagement also built a robust foundation (AWS infrastructure, observability, CI/CD, and security components) that Forge can reuse when the next document type is added to the workflow.

05 What’s Next

Forge owns the platform; extensions are incremental

The solution’s pipeline is architected for reuse: add a new document type, retrain against it, adjust the reviewer UI. Provectus continues the engagement on the extension path as Forge’s private-market operations grow.
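A reuse-oriented design of this kind often comes down to treating each document type as configuration rather than code. The sketch below shows one possible shape, under stated assumptions: the `DocumentTypeConfig` fields, the registry, and the example type and model identifiers are hypothetical and are not taken from the Forge platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentTypeConfig:
    name: str
    fields: tuple[str, ...]  # values the pipeline should extract for this type
    model_id: str            # hypothetical identifier of the model retrained for this type


REGISTRY: dict[str, DocumentTypeConfig] = {}


def register_document_type(config: DocumentTypeConfig) -> None:
    """Add a document type; refuse silent overwrites of an existing one."""
    if config.name in REGISTRY:
        raise ValueError(f"document type already registered: {config.name}")
    REGISTRY[config.name] = config


def review_columns(doc_type: str) -> list[str]:
    """Columns the reviewer spreadsheet would show for a document type."""
    return [*REGISTRY[doc_type].fields, "reviewer_correction"]


# Usage: the existing type plus a hypothetical new one.
register_document_type(
    DocumentTypeConfig(
        name="certificate_of_incorporation",
        fields=("company_name", "authorized_shares"),
        model_id="coi-extractor-v1",
    )
)
register_document_type(
    DocumentTypeConfig(
        name="bylaws",
        fields=("company_name", "board_size"),
        model_id="bylaws-extractor-v1",
    )
)
print(review_columns("bylaws"))
```

With this shape, adding a document type touches three things that mirror the steps above: a new registry entry, a retrained model behind `model_id`, and the reviewer columns derived from `fields`.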
