Streamlining Microbiome Research on a Secure Data Platform

Second Genome enhances its data platform to accelerate and scale biomarker research, drug trials and development in a secure and compliant environment


Second Genome is a biotechnology company that draws on microbial genetic insights to create transformational precision therapies and biomarkers, and to advance them through clinical development and commercialization. It uses machine-learning analytics, customized protein engineering techniques, phage library screening, mass spectrometry analysis and CRISPR, coupled with traditional drug development approaches, to build a proprietary microbiome-based drug discovery and development platform. The company collaborates with industry, academia and government to optimize its microbiome platform. Gilead Sciences, Inc., one of Second Genome’s strategic partners, is using the platform and its comprehensive data sets to identify novel biomarkers for clinical responses to its investigational medicines.


Second Genome wanted to accelerate and scale microbiome drug discovery and development by improving data ingestion and staging, and by refining the codebase of its data platform. Operating in the highly regulated pharmaceutical industry, the company needed to raise the bar for data security compliance in order to create a safe drug research and development environment for its clients and partners.


Second Genome partnered with Provectus to revamp the data ingestion and staging stages of its data pipeline, and to improve its existing codebase. A new secure, cloud-native data infrastructure on AWS was built in close collaboration with the Second Genome team. It was designed to be fully automated, with CI/CD in place, and in line with AWS security guidelines for healthcare data.


The new data infrastructure for the Microbiome Drug Discovery and Development Platform enabled Second Genome to run petabyte-scale projects for biomarker research, drug trials, and drug development in a secure, compliant environment. A fast, scalable data platform makes it easier for Second Genome to partner with premium biopharma companies to develop novel therapeutics.

At-scale drug trials and development enabled

AWS requirements for data security followed

New data infrastructure built in record time


Efficient Handling of Microbiome Data Speeds Up Microbial Research, Drug Trials and Discovery

In recent decades, the healthcare industry has been undergoing a rapid transformation from a one-drug-fits-all approach to more personalized medicine. The customization of healthcare, with medical decisions, treatments, practices, and products tailored to a subgroup of patients, is reported to offer advantages such as:

  • Efficient preemptive disease detection and treatment
  • Considerable reductions in time, cost, and failure rates of pharmaceutical clinical trials
  • Fewer trial-and-error inefficiencies that inflate healthcare costs and undermine patient care

As a biopharmaceutical company with 10 years of experience in microbiome research, Second Genome is at the forefront of the “right patient, right therapy” transformation, helping to identify responder/non-responder populations and determine the optimal approach to therapy.

Second Genome has developed a proprietary data platform, the Microbiome Drug Discovery and Development Platform, that uses machine learning to rapidly translate clinical and molecular data into actionable insights for novel biomarkers and drug discovery programs. With more than 400 structured databases of public and proprietary microbiome datasets available on SGKnowledgeBase™, the platform helps pharmaceutical companies augment their research on genetic variations and enable novel drug discovery by identifying precision biomarkers.

Second Genome joined forces with Provectus in an effort to enhance its data platform by making it faster, more scalable, secure and compliant. It was critical for Second Genome to raise the bar for data security compliance, to encourage premium pharmaceutical companies, as well as government agencies and academia, to conduct their drug research and development on its data platform.

Provectus collaborated with Second Genome on the design and development of a new secure, cloud-native data infrastructure on AWS, starting with the improvement of the existing data ingestion and staging components of its platform’s data pipeline. The teams also partnered to build a Machine Learning solution, to contextualize the IO biomarkers of specific therapeutic programs.


Building a Secure and Compliant Data Infrastructure for Industrial-Scale Drug Candidate Discovery

Provectus reviewed the data ingestion and staging portions of the data pipeline of Second Genome’s data platform. Along the way, the team looked into such issues as data quality, error monitoring and logging, handling of hard-coded variables, and running API tests on sample data.
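One common way to address hard-coded variables in a pipeline is to read them from the environment and validate them at startup. The following Python sketch illustrates the idea only; it is not taken from Second Genome's codebase. The variable names (PIPELINE_STAGING_BUCKET, PIPELINE_BATCH_SIZE), the PipelineConfig class, and the default batch size are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Settings that would otherwise be hard-coded in pipeline scripts (hypothetical)."""
    staging_bucket: str
    batch_size: int


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Build the config from environment variables, failing fast on missing values."""
    env = os.environ if env is None else env
    required = ("PIPELINE_STAGING_BUCKET",)  # hypothetical variable name
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"Missing required settings: {', '.join(missing)}")
    return PipelineConfig(
        staging_bucket=env["PIPELINE_STAGING_BUCKET"],
        # Optional setting with an assumed default of 500 records per batch.
        batch_size=int(env.get("PIPELINE_BATCH_SIZE", "500")),
    )
```

Passing a plain dictionary instead of the real environment, as in load_config({"PIPELINE_STAGING_BUCKET": "example-bucket"}), keeps the function easy to exercise in tests on sample data.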

To deliver a new secure cloud-native data infrastructure for Second Genome’s data platform, Provectus applied DevOps best practices to automate patch management, centralized logging and disaster recovery, and to introduce and improve CI/CD. The CI/CD work helped drive efficiencies in the handling of features and feature environments, deployment to production and delivery pipelines, and monitoring and reporting.

From a security standpoint, data was protected in transit with TLS and secured at rest in Amazon S3. Access logs were stored securely in Amazon S3, and Amazon VPC flow logs were collected as well. All open security groups were eliminated, and MFA was enforced for most users. Penetration tests were run to confirm that the delivered infrastructure meets AWS security requirements.
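As an illustration of what checking for open security groups can look like, the Python sketch below scans security group descriptions for ingress rules that allow traffic from any address. It assumes input shaped like the SecurityGroups list in the Amazon EC2 DescribeSecurityGroups response; the function name find_open_ingress and the sample data are hypothetical, and this is not a description of the tooling Provectus actually used.

```python
from typing import Any, Dict, List, Optional, Tuple

# CIDR blocks that match every IPv4 or IPv6 address.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

Finding = Tuple[str, str, Optional[int], Optional[int]]


def find_open_ingress(security_groups: List[Dict[str, Any]]) -> List[Finding]:
    """Return (group_id, cidr, from_port, to_port) for each world-open ingress rule."""
    findings: List[Finding] = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append(
                        (group["GroupId"], cidr, perm.get("FromPort"), perm.get("ToPort"))
                    )
    return findings


# Hypothetical sample data: one group open to the world on SSH, one restricted.
sample_groups = [
    {
        "GroupId": "sg-open",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        ],
    },
    {
        "GroupId": "sg-internal",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []},
        ],
    },
]

open_rules = find_open_ingress(sample_groups)
```

In a real account, the input would typically come from an AWS SDK call rather than a hand-written list, and a check like this could run as part of the CI/CD pipeline so that newly opened groups are flagged early.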

On the development side, Provectus helped anonymize Neo4J Data and optimize Django settings. The pipeline was thoroughly tested and made ready for new features to be added as the platform develops.

As part of the ongoing data infrastructure work, Provectus and Second Genome are now revamping the ML infrastructure and MLOps, so that Second Genome can use machine learning to contextualize its IO biomarker program against its therapeutic program.


Microbiome Data Available on a Safe and Secure Data Platform Accelerates and Scales Drug Trials, Advances Precision Medicine

The new data infrastructure enabled Second Genome to enhance its Microbiome Drug Discovery and Development Platform, making it possible to run petabyte-scale projects for biomarker research, drug trials, and drug development.

The data infrastructure meets the strict requirements and high standards of AWS for security and operational efficiency in the cloud. Data pipelines are fully automated, with CI/CD in place, to ensure that Second Genome’s microbiome datasets are collected, stored, and used in a compliant manner.

Thanks to the new secure cloud-native data infrastructure on AWS, Second Genome is now able to more easily onboard premium pharmaceutical companies to its platform. The partners of Second Genome can now discover, develop, and test drugs much faster and on an industrial scale, without having to worry about the security of their research. With its new platform, the company expects to quickly build new collaborations across industry, government, and academia, to advance its biomarker and drug discovery programs.

Moving Forward

  1. Learn more about the Provectus NextGen Data Platform and Machine Learning Infrastructure solutions
  2. Watch the webinar on MLOps and reproducible ML on AWS
  3. Apply for the NextGen Data Platform Acceleration Program or the Machine Learning Infrastructure Acceleration Program to get started

