NinthDecimal: ML-powered Data & Analytics Platform
NinthDecimal increases visibility into location data and extracts actionable insights from consumers’ behavior in the real world
$1M increase in ROI per year
$100K increase in monthly savings
10x increase in cluster throughput
99% completion rate of data processing events
50% increase in the data team’s productivity
NinthDecimal’s legacy data platform could not accommodate the growing amount of real-time location data collected from multiple sources. With more than 5 billion events to be processed daily, the platform was built using 50 AWS nodes and 400 bare metal nodes managed by Apache Mesos, an architecture that caused delays, bottlenecks, and inefficiencies.
NinthDecimal needed to update its legacy data platform, since:
- It was estimated that it could take up to twelve months to hand off a data pipeline from data scientists to data engineers, and then to operations, to deploy it in production. Given the specifics of NinthDecimal’s business model, that was unacceptable.
- The existing platform suffered from development delays, which led to implementation bugs and highly inaccurate timeline projections. That cut into NinthDecimal’s ability to attract marquee brands, which slowed down revenue growth.
- The platform’s performance lagged, with a job success rate of only 40%. Because 60% of Apache Spark jobs were aborted at random, data scientists had to spend cluster resources re-running jobs multiple times, which was inefficient.
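To get a rough sense of what a 40% success rate costs, each run of a job can be modeled as an independent attempt that succeeds with a fixed probability p. That independence is an assumption made here for illustration, not something measured on NinthDecimal’s platform. Under it, the expected number of runs per completed job is 1/p. The function name below is hypothetical.

```python
def expected_runs(success_rate: float) -> float:
    """Expected runs until a job first succeeds.

    Assumes each run is independent and succeeds with the same
    probability (a geometric model), which is a simplification.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


legacy_runs = expected_runs(0.40)    # success rate of the legacy platform
improved_runs = expected_runs(0.99)  # completion rate reported for the new platform

print(f"legacy: {legacy_runs:.2f} runs per completed job")
print(f"improved: {improved_runs:.2f} runs per completed job")
```

Under this simplified model, a completed job on the legacy platform needs about 2.5 runs on average, compared with just over one run at a 99% completion rate.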
NinthDecimal reached out to the Provectus team to design and build a modern and robust ML- and data-driven platform that could cope with processing over 5bn events per day and supporting 10 specific data products.
Provectus designed and built an ML-powered data & analytics platform that can scale to tens of thousands of analytics operations. The platform makes it possible to rapidly and efficiently build, test, deploy, and monitor predictive algorithms and models.
The platform’s data pipeline was implemented using Apache Spark running on Amazon EMR. Amazon S3 was used to store RTB (real-time bidding) logs from partners.
AWS Lambda was used to process S3 events and to notify the Kinesis producers via SQS for initial Kinesis processing. Data landed in Kinesis streams, triggering Spark Streaming jobs that cleaned the data and performed the required transformations and aggregations (by location, time, etc.).
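The S3-to-SQS step can be sketched as a small Lambda-style handler. This is a minimal illustration, not NinthDecimal’s actual code: the function names, the message format, and the sample bucket and key are assumptions. The `send_message` callable is injected so the sketch runs without AWS credentials; in a real Lambda it could wrap the boto3 SQS client’s `send_message(QueueUrl=..., MessageBody=...)` call.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull the bucket and key out of each record of an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return objects


def handle_event(event: Dict[str, Any],
                 send_message: Callable[[str], Any]) -> int:
    """Send one JSON message per new S3 object and return the number sent."""
    objects = extract_s3_objects(event)
    for obj in objects:
        send_message(json.dumps(obj, sort_keys=True))
    return len(objects)


# Usage with a hypothetical event and an in-memory list standing in for the queue.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "rtb-logs"},
                "object": {"key": "partner+a/2020/log%3D1.gz"}}}
    ]
}
queue: List[str] = []
sent_count = handle_event(sample_event, queue.append)
```

Injecting the sender keeps the parsing logic testable on its own; the downstream producers would then read these messages from the queue and write to Kinesis.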
Once the data was aggregated and cleaned, it was uploaded to the output S3 bucket and loaded into the Snowflake data warehouse (DWH) for BI analytics.
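The cleaning and aggregation that precede this upload can be illustrated with a small plain-Python sketch. It is not Spark code and not the platform’s actual logic: the field names (`lat`, `lon`, `ts`), the coordinate rounding used as a location grid, and the hourly time bucket are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


def clean(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Drop events with missing fields or out-of-range coordinates."""
    for e in events:
        lat, lon, ts = e.get("lat"), e.get("lon"), e.get("ts")
        if lat is None or lon is None or ts is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        yield e


def aggregate(events: Iterable[Dict[str, Any]], precision: int = 2) -> Counter:
    """Count cleaned events per (rounded lat, rounded lon, UTC hour) bucket."""
    counts: Counter = Counter()
    for e in clean(events):
        hour = datetime.fromtimestamp(e["ts"], tz=timezone.utc).strftime("%Y-%m-%dT%H:00Z")
        counts[(round(e["lat"], precision), round(e["lon"], precision), hour)] += 1
    return counts


# Hypothetical sample events; ts is a Unix timestamp in seconds.
events = [
    {"lat": 37.7749, "lon": -122.4194, "ts": 0},
    {"lat": 37.7712, "lon": -122.4189, "ts": 1800},
    {"lat": 37.7749, "lon": -122.4194, "ts": 3600},
    {"lat": 95.0, "lon": 0.0, "ts": 0},             # invalid latitude, dropped
    {"lat": None, "lon": -122.4194, "ts": 0},       # missing field, dropped
]
counts = aggregate(events)
```

In a Spark Streaming job the same idea would typically be expressed as a filter followed by a grouped count over each batch.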
A microservices architecture was used to build the platform’s real-time and interactive production clusters. Microservices were also used to optimize development, testing, and deployment operations. Apache Kafka served as the main message bus and as a durable store for distributed events.
The solution was deployed across four main clusters:
- Data cluster powered by Amazon EMR and AWS Lambda, used to ingest and filter initial data
- Kubernetes cluster with stateful microservices powered by Apache Kafka, used to process and manage on-premises data and operations
- Batch data processing cluster with Apache Spark and Apache Mesos, used for large-scale ad campaign analytics jobs
- Snowflake cluster, used for custom research and analytics as well as for storing final data shapes for partners
The delivered solution was aimed at accelerating the performance of NinthDecimal’s data team, improving the quality of location data analysis, and processing multiple RTB logs at scale.
NinthDecimal received a robust data & analytics platform powered by machine learning. The platform is capable of processing and managing over 5bn events per day while supporting 10 specific data products.
The ML-powered data & analytics platform allowed NinthDecimal to bring their data analysis to a whole new level. Specifically, the completion rate for Apache Spark jobs, which previously did not exceed 40%, reached 99%. That resulted in a 50% boost in the productivity of the company’s data team, which gave NinthDecimal a $1M yearly increase in ROI.
The platform also demonstrated a 10x increase in cluster throughput, which made it possible to optimize resources and save up to $100K per month.
The platform gave NinthDecimal unique competitive advantages, allowing the company not only to cost-efficiently collect and process billions of events on a daily basis, but also to deliver actionable insights to their clients in real time. That improved NinthDecimal’s appeal to marquee brands and strengthened the company’s leading position among omnichannel marketing platforms for Fortune 500 brands.