Scaling AI Training Data Operations to Meet Global Demand

Appen has spent three decades building the training data behind the world’s largest AI systems. Today, 80% of leading LLM builders are Appen customers. The company processes text, image, audio, and video data across 180+ languages.

`01` The Challenge

A monolithic platform serving a market growing at 22% CAGR

The AI training data market was valued at $3.6B in 2025 and is projected to exceed $16B by 2033. Appen sat at the center of that growth. Enterprise clients needed training datasets faster, in more modalities, and at higher volumes.

However, Appen’s platform ran as a single monolithic application. Every deployment touched the entire codebase. Every scale event affected the whole system. Engineers faced ambiguity in build processes, and coordinating releases across teams slowed the delivery cycle.

New services could take weeks to stand up. A single code merge could ripple across unrelated features. The monolith was limiting how fast Appen could grow.

Three goals drove the decision to modernize:

Accelerate delivery speed of data tasks for clients
Standardize build and deployment for engineering teams
Enable independent scaling of platform components

In April 2019, Appen acquired Figure Eight, a human-in-the-loop platform for data transformation. The acquisition added further pressure on the architecture to absorb new workloads. Appen needed a partner with production experience in large-scale AWS modernization.

`02` The Approach

Decompose the monolith without stopping the business

Provectus partnered with Appen’s engineering team to plan, standardize, and execute the migration from monolith to microservices on AWS. The constraint was clear: the platform served global clients 24/7, so downtime during the transition was not an option.

The approach followed a strangler pattern. Provectus extracted services one at a time, validated each in production, then retired the corresponding monolith component. Nothing was decommissioned until the replacement proved stable.
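As a rough illustration of the strangler pattern described above, the following minimal Python sketch routes each request to an extracted service when one exists and falls back to the monolith otherwise. The class `StranglerRouter`, its method names, and the component names are hypothetical, chosen for illustration only; they are not taken from Appen’s or Provectus’s codebase.

```python
from typing import Callable, Dict, Set

Handler = Callable[[str], str]


class StranglerRouter:
    """Hypothetical router: new services shadow monolith components one by one."""

    def __init__(self, monolith_components: Dict[str, Handler]) -> None:
        self.monolith: Dict[str, Handler] = dict(monolith_components)
        self.services: Dict[str, Handler] = {}
        self.stable: Set[str] = set()

    def extract(self, name: str, handler: Handler) -> None:
        # Register a replacement service; the monolith component stays in place.
        self.services[name] = handler

    def mark_stable(self, name: str) -> None:
        # Record that the replacement has been validated in production.
        if name not in self.services:
            raise KeyError(f"no extracted service named {name!r}")
        self.stable.add(name)

    def retire(self, name: str) -> None:
        # Decommission the monolith component only after the replacement is stable.
        if name not in self.stable:
            raise RuntimeError(f"{name!r} has not been validated yet")
        self.monolith.pop(name, None)

    def route(self, name: str, request: str) -> str:
        # Prefer the extracted service; otherwise fall back to the monolith.
        handler = self.services.get(name) or self.monolith[name]
        return handler(request)
```

In this sketch, calling `retire` before `mark_stable` raises an error, which mirrors the rule that nothing is decommissioned until its replacement has proved stable.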

Discovery sessions mapped Appen’s domain boundaries, deployment dependencies, and scaling profiles. From that work came a target architecture: containerized services on managed orchestration, automated infrastructure provisioning, and a standardized path for standing up new services.

For on-premises clients, Provectus deployed services on Kubernetes alongside the AWS-hosted stack, ensuring consistent delivery across environments.

`03` The Build

50+ containerized services, each independently deployable and owned

The migration touched every layer of the platform.

Compute and orchestration. Legacy applications were redesigned, containerized, and deployed on managed container orchestration on AWS. Each service owns its own release cycle, monitoring, and scaling rules.

Messaging and caching. Legacy messaging and caching layers were replaced with managed AWS equivalents. Operational overhead dropped. Reliability went up.

Content delivery. A global CDN backed by cloud storage replaced the prior content-serving layer. Users worldwide get fast access regardless of region.

Infrastructure as code. Automated provisioning made adding a new microservice a repeatable, standardized process. No custom effort per service.

Serverless automation. Functions handle notifications, UX automation, and user-profile processing. A managed batch-processing service analyzes application logs at scale. Appen uses the output to calculate client refunds based on error-rate analysis.
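To make the error-rate analysis concrete, here is a minimal Python sketch of how a refund might be derived from application logs. Everything specific in it is an assumption made for illustration: the `client status` log format, the `allowed` error-rate threshold, and the refund rule itself are hypothetical and do not describe Appen’s actual pipeline or contract terms.

```python
from collections import Counter
from typing import Dict, Iterable


def error_rates(log_lines: Iterable[str]) -> Dict[str, float]:
    """Compute the per-client error rate from lines shaped like 'client STATUS'.

    The log format is a hypothetical stand-in for real application logs.
    """
    totals: Counter = Counter()
    errors: Counter = Counter()
    for line in log_lines:
        client, status = line.split()
        totals[client] += 1
        if status == "ERROR":
            errors[client] += 1
    return {client: errors[client] / totals[client] for client in totals}


def refund(fee: float, error_rate: float, allowed: float = 0.01) -> float:
    """Refund the share of the fee matching the error rate above `allowed`.

    Both the threshold and the proportional rule are assumptions.
    """
    excess = max(0.0, error_rate - allowed)
    return round(fee * excess, 2)
```

For example, a client with one error in four logged tasks has an error rate of 0.25. With a 0.05 allowance and a fee of 1,000, this sketch would refund 200.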

The result: 50+ independent services where one monolith used to stand. Each team owns its service from development through deployment.

`04` The Results

From weeks to stand up a new service to one day

The operating model changed. Engineering teams release independently. The coordinated, cross-team release cycle that once gated every update is gone.

50+ microservices, deployed from a single monolith

New services to production in one day

Platform uptime reached 99.99%. Appen’s global clients got the reliability they expected from a partner processing millions of annotations per day. AWS infrastructure costs dropped by more than 10%. Resources could now be allocated precisely where demand existed.
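For a sense of scale, the short Python sketch below converts an availability figure into a yearly downtime budget. It assumes a 365-day year and treats 99.99% as an annual average, which the source does not specify; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime, in minutes per year, that an availability level allows."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


budget = downtime_minutes_per_year(0.9999)
```

Under these assumptions, 99.99% uptime leaves a budget of roughly 53 minutes of downtime per year.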

Technical debt and source-code dependencies shrank. What once required coordinated releases now ships on each team’s own schedule.

`05` What’s Next

A platform architecture that compounds with the market

Appen now operates on an architecture designed to absorb growth. New modalities and client workloads slot into a standardized service framework. So do acquisitions. The engineering team spends its time building product capabilities, not managing deployment complexity.
