ON-DEMAND WORKSHOP

Serverless Data Lake Immersion Day

Learn how to build a Serverless Data Lake solution on AWS

On-demand

The Data Lake Immersion Day is a hands-on workshop for learning and practicing how to use Amazon Kinesis, Amazon S3, Amazon Athena, AWS Glue, and Amazon QuickSight to design and build a serverless data lake. The event combines hands-on labs and modules covering data ingestion, storage, cataloging, ETL, processing, analysis, and visualization in a data lake on AWS.

Why Attend?

  • Find out how to use AWS to take advantage of all data types for advanced analytics
  • Explore best practices around data lake architecture
  • Learn AWS data lake services through hands-on experience
  • Address your data lake questions to experts from AWS and Provectus

NOTE: You will need an AWS account and a laptop to participate in lab exercises.

Agenda

Welcome & Introductions

Data Ingestion & Central Storage [speaking session]

Data ingestion happens in both a speed layer and a batch layer, usually in parallel. In the batch layer, historical data can be ingested at any desired interval. In the speed layer, fast-moving data must be captured as it is produced and streamed for analysis; Amazon Kinesis Data Streams is the recommended service for ingesting this streaming data into AWS. Customers can use the Amazon Kinesis Agent, a pre-built application, to collect and send data to a Kinesis stream, or use the Amazon Kinesis Producer Library (KPL) as part of a custom application. For batch ingestion, customers can use AWS Glue or AWS Database Migration Service to read from source systems such as relational databases, data warehouses, and NoSQL databases.
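
To make the speed-layer path concrete, here is a minimal sketch of a custom producer. It calls the Kinesis PutRecord API directly through an injected client (for example `boto3.client("kinesis")`) rather than using the KPL or the Kinesis Agent; the `send_events` helper, the stream name, and the partition-key field are hypothetical, chosen only for illustration.

```python
import json
from typing import Any, Iterable, Mapping, Protocol


class KinesisClient(Protocol):
    """Structural type for the one boto3 Kinesis client method used here."""

    def put_record(self, **kwargs: Any) -> Mapping[str, Any]: ...


def send_events(client: KinesisClient, stream_name: str,
                events: Iterable[Mapping[str, Any]], key_field: str) -> int:
    """Send each event to a Kinesis data stream as one JSON record.

    `key_field` names the event attribute used as the partition key
    (a hypothetical choice, e.g. "user_id"). Returns the number of records sent.
    """
    sent = 0
    for event in events:
        client.put_record(
            StreamName=stream_name,
            # Kinesis stores each record as opaque bytes; JSON keeps it readable downstream.
            Data=json.dumps(event).encode("utf-8"),
            # Records that share a partition key are routed to the same shard.
            PartitionKey=str(event[key_field]),
        )
        sent += 1
    return sent
```

A call might look like `send_events(boto3.client("kinesis"), "clickstream", events, "user_id")`, where "clickstream" is a placeholder stream name. Passing the client in keeps the function testable without AWS credentials. For higher throughput, the PutRecords API (up to 500 records per request) or the KPL can batch records instead of sending one per request.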

Ingestion & Storage [hands-on lab]

A data lake is a centralized repository for both structured and unstructured data, where you store data as-is, in open file formats, so that it can be analyzed directly. Implementing a data lake architecture requires a broad set of tools and technologies to serve an increasingly diverse set of applications and use cases.
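
Storing data as-is in Amazon S3 usually comes with a predictable object-key layout. One common convention is Hive-style `key=value` prefixes, which AWS Glue crawlers and Amazon Athena can recognize as partitions. The sketch below builds such a key; the "raw/" zone prefix, the partition names, and the `raw_object_key` helper are illustrative assumptions, not a layout the workshop prescribes.

```python
from datetime import datetime, timezone


def raw_object_key(source: str, event_time: datetime, filename: str) -> str:
    """Build an S3 object key in Hive-style partition layout (key=value folders).

    The "raw/" zone prefix and the partition names are hypothetical choices.
    """
    # Partition by UTC so that producers in different time zones agree on the folder.
    t = event_time.astimezone(timezone.utc)
    return (
        f"raw/source={source}/"
        f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/"
        f"{filename}"
    )
```

Because the partition values live in the key itself, a query that filters on `year` and `month` can skip every object outside the matching prefixes instead of scanning the whole bucket.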

Break

Data Cataloging and ETL [speaking session]

Data cataloging makes it possible to understand what data is in the lake by crawling, cataloging, and indexing it. ETL (extract, transform, load) is where data engineering is performed on that data.
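
To illustrate what "crawling" produces, the toy function below reads sample JSON-lines records and infers a column-to-type mapping, roughly the kind of table definition a crawler writes to a catalog. This is a simplified stand-in for the idea, not how AWS Glue crawlers or classifiers are implemented; the type names and widening rules are assumptions.

```python
import json
from typing import Dict, Iterable


def _type_name(value: object) -> str:
    # Map Python values to Hive-style type names (simplified; nested values become string).
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(lines: Iterable[str]) -> Dict[str, str]:
    """Toy 'crawler': read JSON lines and record one type per column.

    bigint/double conflicts widen to double; any other conflict widens to string.
    Columns that only ever hold null are left out.
    """
    schema: Dict[str, str] = {}
    for line in lines:
        record = json.loads(line)
        for column, value in record.items():
            if value is None:
                continue  # a null says nothing about the column's type
            seen = _type_name(value)
            previous = schema.get(column)
            if previous is None:
                schema[column] = seen
            elif previous != seen:
                numeric = {previous, seen} == {"bigint", "double"}
                schema[column] = "double" if numeric else "string"
    return schema
```

Sampling records and widening conflicting types keeps the inferred schema permissive, which mirrors the goal of cataloging: describe what is actually in the lake rather than reject data that does not fit a predefined shape.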

Catalog, ETL, and Process Data [hands-on lab]

Now that you have data in your data lake, this lab introduces AWS Glue, a fully managed, serverless extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between data stores.
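
The "clean" and "enrich" steps of an ETL job can be sketched in plain Python. Glue jobs typically run on Apache Spark, so this is not Glue's API; it only shows the shape of the transformation logic. The field names (`order_id`, `country`, `ts`), the country lookup table, and the `clean_and_enrich` helper are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


def clean_and_enrich(records: Iterable[Dict[str, Any]],
                     country_names: Dict[str, str]) -> Iterator[Dict[str, Any]]:
    """Clean: drop rows without an order_id and normalize country codes.
    Enrich: add a country name from a lookup table and a UTC order date."""
    for record in records:
        if record.get("order_id") is None:
            continue  # clean: an order without an id cannot be joined later
        code = str(record.get("country", "")).strip().upper()
        yield {
            **record,
            "country": code,
            # enrich: join against a small reference table
            "country_name": country_names.get(code, "unknown"),
            # enrich: derive a date column from an epoch-seconds timestamp (UTC)
            "order_date": datetime.fromtimestamp(record["ts"], tz=timezone.utc).date().isoformat(),
        }
```

Writing the transform as a generator keeps it streaming-friendly; in a real job the same per-record logic would run inside the engine's map step before the results are written back to another data store.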

Lunch

Data Analysis and Visualization [hands-on lab]

In the previous labs, you worked with an extremely small dataset (under 10 MB) from a single data source. In this lab, you use a larger public dataset with more tables and see several AWS services in action.

Break

Q&A

Who Should Attend

The labs, modules, and exercises are geared toward business intelligence analysts, database administrators, infrastructure administrators, developers, and architects. Basic familiarity with AWS is preferred.

Presented by

Stepan Pushkarev

CTO at Provectus

Nirav Shah

AWS Solutions Architect

Patrick McDermott

Business Development Leader

Ready to build a Serverless Data Lake on AWS?