Appen automates fraud detection across its crowdsourcing platform, monitoring 20x more annotation jobs per day and reducing scammer activity by 25%.
Client profile: A global AI training data company serving enterprise ML teams
Industry: Other, AI & Data Services
Region: Global
20x more annotation jobs monitored per day
25% reduction in scammer activity on the platform
Appen uses a global crowd of over one million contributors across 180+ languages to label, annotate, and categorize text, image, audio, and video data into training datasets for enterprise AI teams. In 2020, Appen integrated Figure Eight, a human-in-the-loop platform for data transformation, expanding its annotation capacity and the surface area that needed monitoring.
01 The Challenge
Crowdsourcing platforms carry a structural risk: bad actors exploit the open model. They sell accounts, manage multiple identities, misrepresent qualifications, and submit low-effort annotations designed to pass basic checks. When those contributions reach a training dataset uncaught, they poison the model downstream. For enterprise clients spending six and seven figures on training data, a single contaminated batch can invalidate weeks of work.
Appen’s fraud detection system ran on manually triggered scripts. Capacity: roughly 50 jobs monitored per day. The team could catch known patterns, but the approach required constant hands-on effort and could not keep pace with platform growth. Churned judgments (annotations discarded after quality failures) were climbing. Each churned judgment represented wasted contributor time, wasted compute, and eroded client trust.
The hire-versus-build decision was straightforward. Scaling the manual approach meant 20+ new data analysts. Building an ML-powered system meant automating detection at the volume the platform actually required. Appen partnered with Provectus to build it.
02 The Approach
Before writing code, the Provectus team reviewed published research on crowdsourcing fraud: contributor behavior modeling, spammer classification taxonomies, and adversarial annotation attacks. The review ensured the detection models would reflect the state of the field, not just Appen’s historical heuristics.
From research, the team scoped four workstreams:
The design principle: ML handles volume, humans handle judgment. Analysts review edge cases, contested flags, and policy decisions. The system does not replace the trust and safety team; it gives them 20x the coverage with the same headcount.
03 The Build
The platform runs as an automated pipeline. Contributor activity flows in. ML models score each contribution against learned fraud patterns. Flagged items route to the analyst interface.
Detection models. Multiple ML models form the scoring core. They identify behavioral signals that manual scripts missed: unusual submission velocity, copy-paste patterns, time-on-task anomalies, and cross-account coordination. Each model outputs a confidence score that determines routing.
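The routing step described above can be illustrated with a minimal sketch. The thresholds, the route names, and the `Score` type are all hypothetical; the case study does not state how Appen's models map confidence scores to routes.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real values are not published.
AUTO_CLEAR_BELOW = 0.20
AUTO_FLAG_AT_OR_ABOVE = 0.90


@dataclass
class Score:
    contribution_id: str
    fraud_confidence: float  # 0.0 (looks clean) to 1.0 (looks fraudulent)


def route(score: Score) -> str:
    """Map a model confidence score to one of three assumed routes."""
    if not 0.0 <= score.fraud_confidence <= 1.0:
        raise ValueError("fraud_confidence must be in [0, 1]")
    if score.fraud_confidence >= AUTO_FLAG_AT_OR_ABOVE:
        return "auto_flag"
    if score.fraud_confidence < AUTO_CLEAR_BELOW:
        return "auto_clear"
    # Everything in between is ambiguous and goes to a human analyst.
    return "analyst_review"
```

Under these assumptions, only mid-range scores reach a person, which is one plausible way to keep analyst workload small while the models handle clear-cut cases.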
Analyst interface. A purpose-built web application gives fraud analysts a queue sorted by severity and confidence. Analysts confirm, dismiss, or escalate. Every decision feeds back into model calibration, so the system improves with use.
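A queue sorted by severity and confidence could be ordered roughly as follows. This is a sketch under assumptions: the `Flag` fields, the three severity levels, and the tie-breaking rule (higher confidence first within a severity level) are illustrative, not details taken from the Appen system.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity levels; lower rank sorts first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Flag:
    flag_id: str
    severity: str      # one of the keys in SEVERITY_RANK
    confidence: float  # model confidence that the flag is real fraud


def build_queue(flags: List[Flag]) -> List[Flag]:
    """Order flags by severity first, then by descending confidence."""
    return sorted(
        flags,
        key=lambda f: (SEVERITY_RANK[f.severity], -f.confidence),
    )


flags = [
    Flag("f1", "low", 0.99),
    Flag("f2", "high", 0.60),
    Flag("f3", "high", 0.85),
    Flag("f4", "medium", 0.70),
]
queue_ids = [f.flag_id for f in build_queue(flags)]
```

Here `queue_ids` puts the two high-severity flags first, with the more confident one ahead, so analysts see the most urgent and most certain cases at the top.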
Automation layer. Ingestion, scoring, alerting, and routing run without manual triggers. 97% of jobs process with no human intervention. Analysts see only what the models cannot resolve alone.
Monitoring. Model performance metrics are visible to engineers and business stakeholders. Drift detection flags when fraud patterns shift and models need recalibration.
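One common way to detect this kind of drift is to compare the distribution of recent model scores against a baseline using the Population Stability Index (PSI). The sketch below is an assumption about how such a check might look, not the method Appen uses; the bin count and the 0.2 alert threshold (a widely used rule of thumb) are illustrative.

```python
import math
from typing import Sequence

DRIFT_THRESHOLD = 0.2  # assumed alert level; a common rule of thumb for PSI


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp so that 1.0 and out-of-range values land in an edge bin.
            idx = max(0, min(int(v * bins), bins - 1))
            counts[idx] += 1
        # Floor each proportion at eps to avoid log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def needs_recalibration(baseline: Sequence[float],
                        current: Sequence[float]) -> bool:
    """Flag drift when the score distribution has shifted past the threshold."""
    return psi(baseline, current) > DRIFT_THRESHOLD
```

Identical score distributions give a PSI of zero, while a large shift in where scores concentrate pushes the index well above the threshold and would prompt a recalibration review.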
04 The Results
The platform replaced a script-driven process with continuous, ML-powered monitoring at the scale the business requires.
50 → 1,000+ jobs/day monitored for fraud
97% automated
Scammer activity dropped 25%. Churned judgments fell fivefold. Bad actors are caught before their contributions reach client-facing datasets, not after.
Appen avoided hiring 20+ data analysts. The existing team shifted from manual monitoring to oversight and policy work. For enterprise clients, the result is verified contributor integrity behind every training dataset delivered.
05 What’s Next
Provectus and Appen are building contributor reputation models that learn from every annotation cycle and extend fraud detection into multimodal workflows as Appen’s platform grows.