Data Infrastructure Migration and Modernization
IMVU uses their new re-architected data platform for data streaming and analytics to generate faster critical insights on customer lifetime behavior at scale on AWS
IMVU is the world’s largest avatar-based social network where shared experiences build deeper friendships and foster creativity, and all relationships matter. At IMVU, over 7 million users monthly customize their avatars, chat with friends, shop, hang out at cool parties, and earn real money by creating virtual products.
IMVU wanted to enhance and re-architect their aging on-premise data platform, to support advanced analytics and Machine Learning use cases. With exponentially growing data volume and monolithic Hadoop architecture, the IMVU team was challenged to innovate and take advantage of user-generated data more efficiently.
The IMVU data platform was re-architected for the AWS cloud. Apache Hadoop clusters were migrated to Amazon EMR, Hive/Spark jobs were optimized, and Apache Kafka data streams were mirrored to the cloud, to enable data analytics.
IMVU received a scalable, cloud-native infrastructure for its modernized data platform. Through data streaming and advanced analytics, IMVU generates insights into customer lifetime behavior and improves customer retention with ML. The platform was a driving factor of IMVU’s 50% growth amid the Covid-19 pandemic.
On-Premises Infrastructure Limits Innovation and Capacity for Advanced Analytics
IMVU was looking to enhance and re-architect their platform by augmenting it through advanced analytics and data streaming. The company was one of the pioneers and early adopters of Apache Hadoop, taking advantage of Big Data technologies before they became mainstream. Though IMVU maintained deep internal expertise, it was challenging for them to support and upgrade their 90-node Hadoop cluster and in-house built tooling.
For IMVU, it was critical that their new platform could support comprehensive analytics. IMVU’s analysts did not have the tools to rapidly generate a range of business-critical reports on customer in-game behavior at scale. They had to work with historical data in batches (run analytics jobs every night) instead of generating reports and taking action in real time. That made analytics more complex and created multiple bottlenecks. For instance, late reports resulted in inaccurate assumptions about customer in-game purchases, which caused slower sales and loss of profit.
The analytics team also did not have a test environment to efficiently check analytics assumptions.
That is why, IMVU sought to modernize their platform’s data architecture by introducing CI/CD, Infrastructure as Code (IaC), and other best practices, to achieve faster analytics iterations, better maintainability, and lower TCO.
Technology-wise, the IMVU platform was powered by a 90-node on-prem Hadoop cluster. Compute was coupled with storage, with 300-400 Hive/Spark jobs used to continually reprocess data from scratch, which was not cost-efficient. IMVU needed a new solution to run the cluster only when required.
- Tightly coupled compute and storage required purchasing excess capacity
- Hadoop cluster was over-utilized during peak hours and under-utilized at other times
- Solution as such resulted in high costs and low efficiency
IMVU teamed up with Provectus to drive innovation as well as implement streaming architecture, enable advanced analytics, and build several AI-powered apps (recommendations, abuse detection, etc.) to improve customer retention in the long term.
Building a New Data Platform: Migration, Re-Architecture, and Enhancements
IMVU has partnered with Provectus and AWS and came up with a comprehensive migration and modernization strategy towards NextGen architecture. Multi-phase approach was designed to ensure business continuity and address migration risks as well as security best practices.
To ensure the effectiveness and cost-efficiency of data analysis on the IMVU platform, their data pipelines were modernized and re-architected to meet the requirements for a modern Data Platform. For that, open source solutions and AWS services were utilized.
Data Platform was designed to use Airflow for both job scheduling and monitoring, to run on Amazon EKS. Data pipelines were optimized to utilize Amazon’s EMR Autoscaling policies, to account for increasing data volumes without sacrificing time of delivery for reports. Along the way, Provectus optimized Hive/Spark jobs, decoupled compute and storage layers, introduced a new query engine based on Apache Presto as well as the concept of ephemeral purposely instantiated clusters for data processing. We implemented concise and self-documented delivery pipelines covering all aspects of the Modernized Data Platform using Terraform, Helm, Jenkins, Prometheus, and OpsGenie. Every delivery pipeline was integrated with Bitbucket, Jira, and Slack.
The Business Intelligence (BI) layer was optimized to serve the needs of IMVU’s data analytics team and their growing demand for cluster resources. With modernized architecture and data decoupling, Provectus deployed and built a custom solution using PrestoSQL for data access and Apache Ranger for managing the security aspects of the Data Platform.
As part of data platform modernization and re-architecture, Provectus migrated Hadoop clusters to Amazon EMR with Data and Compute decoupling.
Advanced Data Platform Helps Generate Insights Faster, on a Larger Scale
IMVU received a scalable, cloud-native infrastructure for its data platform, which unlocked the full potential of AWS cloud and IMVU unique user generated data and created a foundation for near real time analytical and machine learning use cases.
The IMVU team can now take advantage of vast amounts of data generated by users to look into customer lifetime behavior and generate a range of comprehensive reports, including but not limited to in-game purchases, customer engagement, and avatar preferences. The analysts of IMVU can also generate reports much faster, whenever they need to.
Infrastructure-wise, IMVU has overcome significant technological barriers to fast and seamless code deployments. By having their platform re-architected for AWS, IMVU has created a solid foundation for their cloud-based products, as well as introduced the potential for using Artificial Intelligence and Machine Learning for customer retention.
The innovation was timely. More efficient analytics on a sophisticated data platform paved the way for IMVU’s success during the pandemic of Covid-19. The company managed to grow by no less than 50% while reducing operational expenses.
IMVU plans to continue to collaborate with Provectus to further improve their platform’s analytics capabilities through Machine Learning. The company aims to increase its analytics potential to closely monitor customer-to-customer engagements (e.g. detect abuse in chat rooms) using AI.
- Learn more about the Provectus Amazon EMR Migration and NextGen Data Platform
- Watch the webinar on Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
- Apply for Amazon EMR Migration Acceleration Program to get started