Reinventing Corporate Documentation Management with Generative AI
Forge empowers managers with Generative AI and IDP to process certificates of incorporation and generate incorporation reports, faster and at scale
Forge Global, Inc. is a leading fintech company that has been democratizing access to private market investments since 2014. The company's pioneering platform provides liquidity solutions for private markets, enabling participants to trade shares in the most prominent pre-IPO companies globally. Leveraging advanced technology, a robust network of relationships, and extensive market expertise, Forge provides clients with efficient, transparent, and fair access to private market transactions. Committed to innovation, Forge adapts its solutions and services to meet clients' needs, shaping the future of private market investing.
Challenge
Forge wanted its managers to be able to handle corporate documentation more quickly and efficiently, and at scale. Starting with incorporation documents, the leaders of Forge planned to use AI to automatically extract data, to streamline document-centric operations across the organization. They also wanted to use Generative AI to quickly generate comprehensive incorporation reports. They expected the implementation of intelligent document processing (IDP) and Generative AI technologies to optimize time spent on manual review and preparation of corporate documentation, improve throughput rates, and free up managers’ time to take on higher-priority tasks.
Solution
Provectus approached the project through a series of engagements, ranging from discovery sessions (project overview, data assessment, metrics and KPIs) to solution development, implementation, and testing in real-world scenarios. A combination of AWS services, Deep Learning and Natural Language Processing (NLP) algorithms, the PyTorch/TensorFlow and NLTK frameworks, and Generative AI technology (GPT-3) was used to extract data from various incorporation documents. Provectus’ Intelligent Document Processing (IDP) solution was employed as a foundation and a platform to handle the results of document processing and report generation.
Outcome
Forge received a sophisticated platform for document processing and report generation, underpinned by the latest advancements in Deep Learning, NLP, and Generative AI. By augmenting the Provectus IDP solution with GPT-3, Provectus was able to deliver the expected platform in four months. The platform enabled Forge to rapidly extract data from a variety of corporate documents in PDF format and build spreadsheets featuring all the extracted data points for simplified document review by managers. Forge’s adoption of the Generative AI-enabled platform helps to automate routine tasks, optimize costs, and empower managers to utilize their time more effectively.
A Generative AI-powered platform delivered in four months
Intelligent Document Processing (IDP) for critical document-centric operations
At-scale automation of routine corporate documentation management tasks
Augmenting Document Management Operations with Automated Data Extraction and Generative AI
Forge Global, Inc. is a leading financial technology company, and a trusted trading and settlement partner for private companies and investors worldwide. Driven by a passion for innovation, Forge continually enhances its offerings to meet the evolving needs of its clients and the wider private market ecosystem. The company plays a key role in shaping the future of private market investing.
As a fintech company, Forge relies on document- and data-centric operations. It is crucial that its employees can quickly and accurately collect, store, process, manage, and leverage the massive volumes of documents and data that the company’s partners share with Forge.
The task of processing clients’ corporate documentation cannot be outsourced due to security concerns. These documents often contain sensitive information, and any data extraction process must comply with privacy laws and regulations. There are also numerous industry-specific regulations that must be adhered to. To meet security and compliance requirements, Forge dedicates an entire team of managers to extract information and insights from corporate documents, which are then used to generate comprehensive reports.
Document processing can be exceedingly tedious and challenging for the following reasons:
- Corporate documents come in large volumes and are often complex, containing specialized language, legal terminology, and financial jargon.
- Lack of a standardized document structure adds to the complexity. Even within the same type of document, the format and structure can vary, depending on variables like company, jurisdiction, time period, etc.
- Corporate documents are often unstructured and do not fit into pre-defined data models or database tables. Extracting data from such documents requires advanced techniques like NLP and NLU.
- To be valuable, extracted data must be accurate. Any errors or inconsistencies in the data can lead to misguided decisions and potential regulatory issues.
Forge’s leaders recognized that their existing manual and semi-automated document processing pipelines, handled by a team of managers, were unsustainable. They lacked speed, cost-efficiency, and scalability. The need for a more advanced and automation-focused solution was evident.
Forge’s leadership chose to employ AI/ML to develop an intelligent document processing (IDP) solution that could expedite the processing of incorporation documents, automatically extract data from them, and generate incorporation reports.
They expected the solution to be able to present the extracted data in user-friendly spreadsheets, enabling managers to quickly review them for accuracy. This approach would reduce time spent on document review and substantially increase the throughput rate.
The task presented a formidable challenge to Forge’s engineering team, necessitating the integration of Cloud technology, Deep Learning and NLP algorithms, and the PyTorch/TensorFlow and NLTK frameworks, to guarantee superior results. The potential application of Generative AI for report generation was considered. Forge sought to partner with an AI solutions provider and technology consultancy to design and build the desired platform, while maintaining the security and compliance of its document- and data-centric operations.
Provectus, an AWS Premier Consulting Partner with competencies in Machine Learning and Data & Analytics, was chosen to deliver the project. Our established track record of building intelligent document processing (IDP) solutions in the Cloud bolstered our position, enabling a swift transition from ideation to development and deployment with Forge.
Combining Advanced Document Processing Techniques with Generative AI Technology
Following the initial discovery session, the project was divided into several key phases. These included the development of a data lake using Amazon S3, the development and implementation of machine learning pipelines using Amazon SageMaker and AWS Step Functions, the application of various Deep Learning and NLP/NLU algorithms and frameworks for enhanced data extraction, and the utilization of Large Language Models (LLMs) for text understanding and data extraction.
It was decided that Forge would supply Provectus with an extensive dataset comprising PDF files and Excel spreadsheets, to be used for model training. The parties also agreed that the scaling of the document processing pipeline would be driven by the volume of loaded documents. The pipeline would be designed to extract meaningful information from each type of document using both ML models and regular expressions.
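To illustrate the regular-expression side of such a pipeline, here is a minimal sketch. The field names, patterns, and sample sentence are assumptions made for illustration; the source does not describe the actual rules used in the project, and real certificates vary widely by company and jurisdiction.

```python
import re

# Hypothetical patterns for two fields often found in certificates of incorporation.
# These are illustrative assumptions, not the project's actual rules.
PATTERNS = {
    "authorized_shares": re.compile(
        r"authorized to issue\s+([\d,]+)\s+shares", re.IGNORECASE
    ),
    "par_value": re.compile(
        r"par value (?:of )?\$\s?(\d+(?:\.\d+)?)", re.IGNORECASE
    ),
}


def extract_with_regex(text):
    """Return the first match for each pattern, or None where nothing matches."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


sample = (
    "The Corporation is authorized to issue 10,000,000 shares of Common Stock, "
    "par value $0.0001 per share."
)
fields = extract_with_regex(sample)
```

Fields that no pattern matches come back as `None`, which makes it easy to hand them to an ML model or to a human reviewer instead of guessing.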
A brief overview of the platform’s development:
- Infrastructure setup using AWS services, and the deployment of Provectus’ IDP solution (used as a foundation for document processing).
- Initial review and testing of the provided dataset to discover how the extracted data would be presented in a generated datasheet report.
- Dataset work, which included: generation of a text-format dataset from markup documents; evaluation of the desired metrics for text extraction; markup accuracy review of the provided documents by Provectus’ Subject Matter Expert; preparation and review of training and validation datasets.
- Setup of passage classification and field extraction pipelines, and their integration as a single pipeline, to test the impact of increased context on the quality of the extracted information. Pipeline improvements.
- Development of a data extraction pipeline, which included: rigorous processing and testing of the provided dataset; handling of the required inputs and outputs from the documents; work on extracted passages and data values; use of the GPT-3 model to extract data from unstructured text; and combination of the results in a datasheet format.
- Quality control of the deployed intelligent document processing (IDP) solution, to check for model accuracy, minor performance bugs, etc.
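The flow from passage classification to field extraction to a datasheet can be sketched as follows. This is a simplified illustration under stated assumptions: the keyword lookup stands in for a trained passage classifier, `extract_value` stands in for the field extraction model (or an LLM call), and the labels, document ID, and sample passages are all hypothetical.

```python
import csv
import io

# Stand-in for a trained passage classifier: a keyword lookup (an assumption).
LABEL_KEYWORDS = {
    "company_name": "name of the corporation",
    "registered_agent": "registered agent",
}


def classify_passage(passage):
    """Return the label of the first keyword found in the passage, else None."""
    lowered = passage.lower()
    for label, keyword in LABEL_KEYWORDS.items():
        if keyword in lowered:
            return label
    return None


def extract_value(passage):
    """Stand-in for the field extraction model: take the text after ' is '."""
    _, _, value = passage.partition(" is ")
    return value.strip().rstrip(".")


def build_row(document_id, passages):
    """Classify each passage, extract one value per label, and build a row."""
    row = {"document_id": document_id, **{label: "" for label in LABEL_KEYWORDS}}
    for passage in passages:
        label = classify_passage(passage)
        if label and not row[label]:
            row[label] = extract_value(passage)
    return row


def to_datasheet(rows):
    """Serialize rows as CSV text, one line per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document_id", *LABEL_KEYWORDS])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


passages = [
    "FIRST: The name of the corporation is Example Holdings Corporation.",
    "SECOND: The registered agent is Example Agent Services LLC.",
    "THIRD: The purpose is to engage in any lawful act.",
]
row = build_row("doc-001", passages)
datasheet = to_datasheet([row])
```

Keeping classification and extraction as separate functions mirrors the two pipelines described above, so either stage can be swapped for a real model without touching the datasheet step.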
In the initial stage of the project, Provectus committed to providing the source code for both the document classification and processing components and custom models, along with the necessary documentation. This included the source code for data preparation, training, and evaluation pipelines associated with the custom models. Additionally, Provectus provided the infrastructure as code, artifacts of the trained models, and data storage components to facilitate future model re-training.
In the later stages of the project, Provectus refined the detection mechanism for the correct page, passage, or text block for subsequent data extraction. The team also upgraded the data filtering approach for the extracted values, based on the model knowledge. Additionally, a user interface was developed to enable a team of managers to more easily handle the documents, spreadsheets, and extracted data. The deliverables at this stage included source code for the document classification and processing components, custom models, their data preparation, training, and evaluation pipelines, along with corresponding documentation. Infrastructure as code, artifacts of trained models, datasets, and the source code of components used to gather them were also provided.
The Provectus Intelligent Document Processing (IDP) solution was used as a foundation for the document processing and report generation platform.
During the solution’s customization, Provectus took advantage of industry best practices to:
- Enhance platform resilience to guarantee seamless and continuous pipeline operations
- Refine the infrastructure for increased stability and reliability
- Strengthen security measures in line with AWS best practices
- Document both current and future infrastructure and practices used, to facilitate any future development work
- Improve platform observability and traceability for better understanding and management
Note: The initial goal of Forge was to develop an automated solution with a human review component, aiming for a review ratio of 95% automation and 5% human review upon completion of ML model training. The accuracy target for the ML model was set at 70% or higher. Further model improvements would be influenced by factors and hypotheses validated during the Exploratory Data Analysis (EDA) and ML model development phase.
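One way to reason about these targets is sketched below. It assumes that the model emits a confidence score per document and that documents below a threshold go to human review; the source does not say how review was actually routed, so the threshold, scores, and field names are hypothetical.

```python
def route_documents(confidences, threshold):
    """Split document IDs into auto-accepted and human-review queues."""
    auto, review = [], []
    for doc_id, confidence in confidences.items():
        (auto if confidence >= threshold else review).append(doc_id)
    return auto, review


def automation_ratio(auto, review):
    """Share of documents that were processed without human review."""
    total = len(auto) + len(review)
    return len(auto) / total if total else 0.0


def field_accuracy(predicted, expected):
    """Share of expected fields whose predicted value matches exactly."""
    if not expected:
        return 0.0
    correct = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return correct / len(expected)


# Hypothetical scores: 19 confident documents and 1 uncertain one.
confidences = {f"doc-{i}": 0.9 for i in range(19)}
confidences["doc-19"] = 0.4
auto, review = route_documents(confidences, threshold=0.8)
ratio = automation_ratio(auto, review)

# Hypothetical extraction result checked against a labeled example.
predicted = {"a": "1", "b": "2", "c": "x", "d": "4"}
expected = {"a": "1", "b": "2", "c": "3", "d": "4"}
accuracy = field_accuracy(predicted, expected)
```

In this toy data, 19 of 20 documents are auto-accepted and 3 of 4 fields match, so both the 95% automation target and the 70% accuracy target would be met.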
Strengthening Document- and Data-Centric Operations with IDP and Generative AI
By partnering with Provectus, Forge revolutionized its document-centric operations, unlocking new levels of speed, efficiency, and scalability.
By combining the Provectus Intelligent Document Processing (IDP) solution with the GPT-3 model, Forge can now rapidly extract data from various corporate documents (e.g., Certificates of Incorporation). The extracted data can then be automatically compiled into user-friendly spreadsheets, dramatically simplifying the review process for the managerial team.
The Generative AI capabilities of the delivered document processing and report generation platform empower Forge to automate routine tasks and optimize costs, while freeing up managers’ time to focus on higher-priority tasks, thus driving productivity and strategic decision-making.
The successful implementation of the platform within a four-month timeframe is a testament to Provectus’ expertise in Deep Learning, Natural Language Processing (NLP), Generative AI, and cloud engineering.
By incorporating AI, Forge has successfully modernized its document processing workflows, marking a pivotal step in its journey towards digital and AI transformation. The significant reduction in manual review and preparation of corporate documentation has bolstered throughput rates, demonstrating the immense potential of Generative AI to enhance operational efficiency.
The Forge project provides a prime example of how innovative AI/ML solutions and Generative AI technologies can reinvent business processes, setting a new benchmark in corporate documentation management. Provectus looks forward to the next steps of cooperation with Forge, to help them meet the evolving needs of their clients and the demands of the private market ecosystem.
Moving Forward
- Learn more about the Provectus Intelligent Document Processing (IDP) solution
- Check out Provectus’ latest research: A Comparison of Large Language Models (LLMs) in Biomedical Domain
- Watch the webinar on Choosing the right document processing solution
- Apply for the Intelligent Document Processing Solution Discovery Program to get started