Document Classification Automation for an Investment Management Software Provider
Dynamo automates the classification of various investment portfolio documents with AI, and taps into insights for advanced analytics
Dynamo Software Inc. (formerly Netage Solutions) is one of the world's leading cloud providers of alternative investment management software. Dynamo specializes in premium, industry-specific, configurable asset management and reporting software for the alternative assets industry. Its products and services cover markets such as private equity and venture capital funds, real estate investment firms, hedge funds, funds of funds, prime brokers, foundations, endowments, pension funds, and family offices.
The Dynamo platform is an intuitive and highly configurable, end-to-end cloud solution that improves the productivity of fundraising, deal, research, investor relationship, and portfolio management teams worldwide. The platform integrates all of Dynamo’s modules so that teams can focus on providing a human touch and insights that make their firms succeed, rather than spending precious time on repetitive and manual processes.
Dynamo was looking to enhance its document classification platform through AI and automation. The platform was designed to store various documents, and to classify and transfer the information and metadata it scrapes from PDFs and emails to appropriate investments. Dynamo wanted to improve the accuracy of document classification, and to gain the ability to make predictions based on a document's content. By making those improvements, the leaders of Dynamo hoped to reduce the amount of repetitive manual work performed by their data team, to lower operational costs, increase performance, and minimize the time needed for making decisions on client investment portfolios.
Provectus has a proven record of developing AI/ML-powered document processing solutions in the cloud, and follows best practices for data security and privacy to ensure data confidentiality. Provectus applied this combination of skills to build a new document classification solution for Dynamo. We set up an infrastructure for their development and management environments, as well as an experimentation infrastructure for document classification. We conducted exploratory data analysis (EDA) on Dynamo’s datasets, developed a testing dataset and data extraction datasets, and built a baseline classification model. Finally, we implemented a robust pipeline for document classification.
Provectus was able to develop a new document classification solution for Dynamo in six weeks. As agreed by Provectus and Dynamo, the initial implementation would handle PDF documents using four labels: capital calls, distributions, capital account statements, and tax documents. When run on a test dataset, the solution returned an F1 score of 95%, which exceeded the requested threshold of 85%. This level of accuracy demonstrated to Dynamo that it was possible to continually enhance their existing document classification platform. That success incentivized the leaders of Dynamo to move forward into the next stage of cooperation with Provectus.
95% F1 score on a customer test dataset of PDF documents
New document classification solution delivered in six weeks
Improvements to Dynamo’s document classification pipeline
Automating Document Classification with AI for More Accurate Predictions, Faster Decision-Making
Dynamo Software Inc. has been revolutionizing the alternative investments industry and optimizing processes for over 1,000 clients, including fund managers, institutional investors, and service providers, since 1998. With a diverse portfolio of configurable, cloud-based, automated and versatile software, Dynamo helps its clients to identify and solve key challenges in the alternative investments ecosystem, thus improving the productivity of fundraising, deal, research, investor relationships, and portfolio management teams worldwide.
Data collection and processing is one of many business-critical pipelines that ensure the success of Dynamo’s clients. The faster the documents are collected, classified, and transferred to appropriate investments, the higher the return on investment that Dynamo’s clients can expect over time.
The Dynamo teams themselves can benefit from faster, more accurate, and more efficient document processing. Benefits include:
- Minimized amount of repetitive manual work
- Reduced operational expenses
- Increased performance and productivity
- Rapid decision-making on client investment portfolios
With this understanding, the leaders of Dynamo were looking to enhance and modernize their document classification platform through AI/ML and automation.
The existing platform received thousands of various types of documents every month. Some of them were sent directly to the platform, while others were emailed, to be manually added by managers. Once added, the documents would be properly stored and classified, either manually or by a machine learning tool.
The leaders of Dynamo wanted to significantly improve the accuracy of their existing ML tool, and to automate a manual portion of the data processing pipeline. Their expectation was that a new AI-powered document classification solution would be at least 85% accurate on new data. On top of that, they wanted to gain the ability to make predictions, based on a document’s contents.
The document collection and processing pipeline is only a part of the Dynamo platform that enables them to track contacts and transactions, view account balances, and make decisions on client portfolios with advanced analytics. In light of the scope of their activities, Dynamo was also considering other opportunities for adopting AI & ML.
Provectus, an AWS Premier Consulting Partner with competencies in Machine Learning and Data & Analytics, was selected to join the Dynamo project through the AWS Fast Start Program. Our proven track record of developing intelligent document processing (IDP) solutions in the cloud positioned us favorably, and helped us to move forward into the next stage of cooperation with Dynamo.
Developing a Document Classification Model and a Training Pipeline for Dynamo
Provectus approached the opportunity, bearing in mind that Dynamo was ready to deploy the enhanced document classification platform only after the model had exceeded the required accuracy threshold of 85% on new data. Thus, the team’s primary focus was on developing a highly accurate and precise model, and a robust training pipeline, to justify further cooperation.
Overall, our work consisted of a series of steps that encompassed:
- Setting up an infrastructure for the development and management environments
- Building and implementing an experimentation infrastructure for document classification
- Conducting an exploratory data analysis (EDA) of various datasets provided by Dynamo
- Developing a testing dataset, and datasets for data extraction
- Training a baseline model based on unstructured data
- Developing a training pipeline for the document classification model
- Training and tuning the document classification model for the extraction of data from PDF documents (four labels)
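To make the "baseline model" step above more concrete, here is a minimal sketch of what a text classification baseline can look like. The case study does not say which algorithm or features were used, so the multinomial Naive Bayes classifier below is a stand-in chosen for brevity, not the Provectus model. The label names, the `NaiveBayesBaseline` class, and the four training sentences are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: one invented sentence per label.
texts = [
    "Capital call notice: please fund your commitment by the due date",
    "Distribution notice: proceeds will be wired to your account",
    "Capital account statement: beginning balance, contributions, ending balance",
    "Schedule K-1 tax form for the partnership tax year",
]
labels = [
    "capital_call",
    "distribution",
    "capital_account_statement",
    "tax_document",
]

model = NaiveBayesBaseline().fit(texts, labels)
print(model.predict("Notice of capital call, fund commitment"))
print(model.predict("K-1 tax form"))
```

A baseline like this is mainly useful as a reference point: any more elaborate model trained later has to beat it on the same test dataset to justify its extra cost.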
The document classification model designed and built by Provectus exceeded the expectations of Dynamo, returning an F1 score of 95% on a test dataset.
The training pipeline was built using AWS best practices for data security and privacy, to ensure data confidentiality. The model building pipeline was developed with the Amazon SageMaker suite of services. In parallel, the Provectus IDP team built and improved the inference pipeline on AWS Step Functions.
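For readers unfamiliar with AWS Step Functions, the sketch below shows roughly how an inference flow can be expressed as a state machine definition in Amazon States Language. The case study does not describe the actual pipeline, so every state name, the `$.confidence` field, the 0.9 routing threshold, and the placeholder ARNs are assumptions made for illustration only.

```python
import json

# Assumption: documents below this model confidence go to manual review.
CONFIDENCE_THRESHOLD = 0.9


def build_definition(extract_arn, classify_arn, store_arn, review_arn):
    """Return a hypothetical Amazon States Language definition as a dict."""
    return {
        "Comment": "Hypothetical document classification inference flow",
        "StartAt": "ExtractText",
        "States": {
            "ExtractText": {
                "Type": "Task",
                "Resource": extract_arn,
                "Next": "ClassifyDocument",
            },
            "ClassifyDocument": {
                "Type": "Task",
                "Resource": classify_arn,
                "Next": "CheckConfidence",
            },
            "CheckConfidence": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.confidence",
                        "NumericGreaterThanEquals": CONFIDENCE_THRESHOLD,
                        "Next": "StoreResult",
                    }
                ],
                "Default": "SendToManualReview",
            },
            "StoreResult": {"Type": "Task", "Resource": store_arn, "End": True},
            "SendToManualReview": {
                "Type": "Task",
                "Resource": review_arn,
                "End": True,
            },
        },
    }


def referenced_states(definition):
    """Collect every state name that the definition transitions to."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets


# Placeholder ARNs; REGION and ACCOUNT_ID are deliberately left unfilled.
definition = build_definition(
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:extract-text",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:classify-document",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:store-result",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:manual-review",
)

# Sanity check: every transition must point at a state that exists.
missing = referenced_states(definition) - set(definition["States"])
print("undefined states:", sorted(missing))
print(json.dumps(definition, indent=2))
```

Routing low-confidence documents to a manual review step is one common way to combine automation with human oversight; whether the Dynamo pipeline does this is not stated in the source.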
Note: The solution developed by Provectus for Dynamo is an essential and customizable part of the Intelligent Document Processing (IDP) platform. This enabled us to easily integrate the solution into our platform, and to provide Dynamo with state-of-the-art accuracy and quality, and the best cost per document on the market from day one.
Realizing the Benefits of Automated AI-powered Document Classification and Next Steps
The first milestone of the Provectus team was to develop a document classification model that could confidently and correctly classify at least 85% of the documents, to incentivize the leaders of Dynamo to proceed to the next steps. We were able to build and train the model, and to develop its training pipeline, in less than six weeks.
When run on a test dataset, the model returned an F1 score of 95%, which exceeded the requested threshold of 85%. The initial implementation handled PDF documents using four labels: capital calls, distributions, capital account statements, and tax documents.
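As a rough illustration of how such a result is checked, the sketch below computes a per-label F1 score, averages it across the four labels, and compares it to the 85% threshold. The case study does not say whether the reported figure is a macro, micro, or weighted average, so the macro average here is an assumption. The label identifiers and the eight toy predictions are also hypothetical and are not Dynamo data.

```python
from collections import Counter

# Hypothetical identifiers for the four labels named in the case study.
LABELS = [
    "capital_call",
    "distribution",
    "capital_account_statement",
    "tax_document",
]
THRESHOLD = 0.85  # acceptance threshold from the case study


def per_label_f1(y_true, y_pred, labels):
    """Return {label: F1}, where F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores (an assumed averaging)."""
    scores = per_label_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy example: eight documents, one distribution misread as a capital call.
y_true = [
    "capital_call", "capital_call",
    "distribution", "distribution",
    "capital_account_statement", "capital_account_statement",
    "tax_document", "tax_document",
]
y_pred = [
    "capital_call", "capital_call",
    "distribution", "capital_call",
    "capital_account_statement", "capital_account_statement",
    "tax_document", "tax_document",
]

scores = per_label_f1(y_true, y_pred, LABELS)
score = macro_f1(y_true, y_pred, LABELS)
print(scores)
print(f"macro F1 = {score:.3f}, meets threshold: {score >= THRESHOLD}")
```

F1 combines precision and recall, so a single misclassification lowers the score of two labels at once: the true label loses recall and the predicted label loses precision.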
The model’s success demonstrated the potential of AI and automation to Dynamo. Not only could they automatically classify over 90% of various documents with AI; they would also be able to make predictions based on extracted document data.
The leaders of Dynamo saw the opportunity to continually enhance their document classification platform with Provectus, to reduce the amount of repetitive manual work performed by their data team, lower operational costs, increase performance and productivity, and minimize the time needed for decision-making on client investment portfolios.
Dynamo is now ready to move forward into the next stage of cooperation with Provectus and AWS.
- Learn more about the Provectus Intelligent Document Processing (IDP) solution
- Watch the webinar on Choosing the right document processing solution
- Apply for Intelligent Document Processing Solution Discovery Program to get started