InMarket: ML-powered Data & Analytics Platform
InMarket increases visibility into location data and extracts actionable insights from consumers’ behavior in the real world
InMarket is the omnichannel marketing platform helping Fortune 500 brands identify new prospects and customers, drive store visits, and increase sales using AI- and data-driven consumer intelligence.
InMarket’s legacy data platform was inefficient, with a job success rate of 40%, and suffered from data deployment and development delays
Provectus built an ML-powered data & analytics platform on AWS, capable of processing 5bn events daily and supporting 10 data products
InMarket’s advanced ML platform demonstrated a 99% job success rate and a 50% increase in productivity, which increased savings and bolstered ROI
$1M increase in ROI per year
Up to $100K increase in monthly savings
10x increase in cluster throughput
99% completion rate of data processing events
50% increase in team’s productivity
InMarket’s legacy data platform could not accommodate the growing amount of real-time location data collected from multiple sources. The platform, which had to process more than 5 billion events daily, ran on 50 AWS nodes and 400 bare-metal nodes managed by Apache Mesos, an architecture that caused delays, bottlenecks, and inefficiencies.
InMarket needed to replace its legacy data platform for several reasons:
- Handing off a data pipeline from data scientists to data engineers, and then to operations for production deployment, was estimated to take up to twelve months. Given the specifics of InMarket’s business model, that was unacceptable.
- The existing platform suffered from development delays, which led to implementation bugs and highly inaccurate timeline projections. That undercut InMarket’s ability to attract marquee brands and slowed revenue growth.
- The platform’s performance lagged, with a job success rate of only 40%. Because 60% of Apache Spark jobs were randomly aborted, data scientists had to re-run jobs repeatedly, wasting cluster resources.
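The cost of a low success rate can be made concrete with a back-of-the-envelope calculation. Assuming each Spark job attempt fails independently and every attempt consumes the same cluster resources (simplifications for illustration, not figures from the project), the number of runs needed per successful job follows a geometric distribution with mean 1/p:

```python
def expected_attempts(success_rate: float) -> float:
    """Expected runs per successful job, assuming each attempt succeeds
    independently with probability success_rate (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


legacy = expected_attempts(0.40)  # 40% success rate: 2.5 runs per successful job
modern = expected_attempts(0.99)  # 99% success rate: about 1.01 runs per successful job

print(f"legacy platform: {legacy:.2f} runs per successful job")
print(f"new platform:    {modern:.2f} runs per successful job")
# Under these assumptions, total compute per successful job drops by roughly 2.5x.
print(f"reduction factor: {legacy / modern:.2f}")
```

Under this simple model, the jump from 40% to 99% alone cuts the compute spent per useful result by roughly two and a half times, before any other optimization.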
InMarket reached out to the Provectus team to design and build a modern, robust ML- and data-driven platform that could process over 5bn events per day and support 10 specific data products.
Provectus designed and built an ML-powered data & analytics platform that can scale to tens of thousands of analytics operations. The platform allows teams to rapidly and efficiently build, test, deploy, and monitor predictive algorithms and models.
The platform’s data pipeline was implemented with Apache Spark running on Amazon EMR. Amazon S3 stored the RTB (real-time bidding) logs received from partners.
AWS Lambda functions processed S3 events and sent notifications to Amazon SQS, where Kinesis producers picked them up for initial processing. Data then landed in Kinesis streams, triggering Spark Streaming jobs that cleaned it and performed the required transformations and aggregations (by location, time, etc.).
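A minimal Python sketch of the S3-to-SQS step, assuming the standard S3 event notification format: the handler extracts each new object's bucket and key and forwards one message per object. The environment variable `RTB_QUEUE_URL`, the message fields, and the helper names are hypothetical, and the Kinesis producer side is not shown.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[dict]:
    """Pull bucket/key pairs out of an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue  # ignore records that did not come from S3
        s3 = record["s3"]
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return objects


def _sqs_sender(queue_url: str):
    """Build a sender backed by boto3 (bundled with the Lambda Python runtime)."""
    import boto3

    sqs = boto3.client("sqs")

    def send(body: str) -> None:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)

    return send


def handler(event, context, send_message=None):
    """Lambda entry point: forward one SQS message per new S3 object.

    send_message can be injected for local testing; otherwise messages go
    to the queue named by the hypothetical RTB_QUEUE_URL variable.
    """
    if send_message is None:
        send_message = _sqs_sender(os.environ["RTB_QUEUE_URL"])
    objects = extract_s3_objects(event)
    for obj in objects:
        send_message(json.dumps(obj))
    return {"forwarded": len(objects)}
```

Injecting the sender keeps the parsing logic testable without AWS credentials; in a deployed function, the default boto3 path is used.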
Once the data was aggregated and cleaned, it was uploaded to the output S3 bucket and loaded into the Snowflake DWH for BI Analytics.
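The cleaning and aggregation logic can be illustrated in plain Python (this is not the Spark Streaming API). The sketch assumes hypothetical event fields `event_id`, `location_id`, and `ts` (Unix seconds): it drops incomplete records and duplicates, then counts events per location and UTC hour, producing flat rows of the kind that could be written to an output bucket and loaded into a warehouse table.

```python
from collections import defaultdict
from datetime import datetime, timezone


def clean(events):
    """Drop records missing an ID, location, or timestamp, plus duplicate IDs."""
    seen_ids = set()
    for e in events:
        if not e.get("event_id") or not e.get("location_id") or e.get("ts") is None:
            continue  # incomplete record
        if e["event_id"] in seen_ids:
            continue  # duplicate delivery of the same event
        seen_ids.add(e["event_id"])
        yield e


def aggregate_by_location_hour(events):
    """Count cleaned events per (location_id, UTC hour) bucket."""
    counts = defaultdict(int)
    for e in clean(events):
        hour = datetime.fromtimestamp(e["ts"], tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(e["location_id"], hour.isoformat())] += 1
    # Flat, sorted rows suitable for writing out as files or table rows.
    return [
        {"location_id": loc, "hour": hour, "events": n}
        for (loc, hour), n in sorted(counts.items())
    ]
```

In a streaming job the same logic would run per micro-batch or window, with deduplication state bounded in time rather than kept in an unbounded set.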
A microservices architecture was used to build the platform’s real-time and interactive production clusters, and to streamline development, testing, and deployment. Apache Kafka served as the main message bus and as durable storage for distributed events.
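To illustrate why Kafka can act both as a message bus and as durable event storage, here is a toy in-memory model (not the Kafka client API): records with the same key go to the same partition, each record gets an offset, and because records are retained rather than deleted on read, consumers can replay from any offset. The class name and hashing choice are illustrative assumptions.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """In-memory model of a Kafka-style topic: append-only partitions with offsets."""
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Kafka's Java client default partitioner hashes key bytes with murmur2;
        # crc32 is used here only as a simple, deterministic stand-in.
        return zlib.crc32(key.encode()) % self.num_partitions

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition: int, offset: int) -> list:
        """Read from an offset; records are retained, so reads can be replayed."""
        return self.partitions[partition][offset:]
```

Because ordering is kept per partition, keying events by an entity such as a device ID keeps that entity's events in order for every consumer.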
The solution was deployed across four main clusters:
- Data cluster powered by Amazon EMR and AWS Lambda and used to ingest and filter initial data
- Kubernetes cluster with stateful microservices powered by Apache Kafka to process and manage on-premises data and operations
- Batch data processing cluster with Apache Spark and Apache Mesos used for large-scale ad campaign analytics jobs
- Snowflake cluster used for custom research and analytics as well as for storing final data shapes for partners
The solution was designed to accelerate the work of InMarket’s data team, improve the quality of location data analysis, and process multiple RTB logs at scale.
InMarket received a robust data & analytics platform powered by machine learning. The platform is capable of processing and managing over 5bn events per day while supporting 10 specific data products.
The ML-powered data & analytics platform took InMarket’s data analysis to a new level. The completion rate of Apache Spark jobs, previously no higher than 40%, reached 99%. This raised the productivity of the company’s data team by 50% and brought InMarket a $1M yearly increase in ROI.
The platform also delivered a 10x increase in cluster throughput, which made it possible to optimize resource usage and save up to $100K per month.
The platform gave InMarket unique competitive advantages, allowing the company not only to cost-efficiently collect and process billions of events daily, but also to deliver actionable insights to its clients in real time. That improved InMarket’s appeal to marquee brands and strengthened the company’s leading position among omnichannel marketing platforms for Fortune 500 brands.