
Odessa, July 21, 2018

Provectus organized the international technical Data Science and Big Data conference — Data Summer Conf.

We invited speakers from such established brands as Amazon Web Services, Google, NVIDIA, Hydrosphere.io, Spotify, and SoftServe, and they were excited to share their vision and experience with more than 400 attendees.

The event was held in three tracks. The first track was dedicated to Big Data, and the second was all about Data Science. The third track consisted of three workshops, where the speakers Fedor Navruzov, Akmal Chaudhri, and Jonathan Taws shared helpful use cases, tech details, and insights.

After the tracks, we held a panel discussion where the speakers gave detailed answers to challenging questions from the audience.

Knowledge sharing, networking, workshops, keynotes, and the Afterparty at the seaside — that was all about Data Summer Conf.

Speakers

14

Participants

400

Agenda

9:00

Registration & Welcome Coffee

9:50

Welcome speech

10:00

Akmal Chaudhri (Technical Evangelist) (BIG DATA & IOT)

Apache Ignite + Apache Spark RDDs and DataFrames integration (ENG)

Dmitry Korobchenko (Deep Learning R&D Engineer, NVIDIA Ltd) (DATA SCIENCE)

How to accelerate your neural net inference with TensorRT (ENG)

Fedor Navruzov (Data Scientist, Speak With a Geek) (WORKSHOPS)

Automated feature engineering with FeatureTools (RUS)

11:00

Jacek Laskowski (Spark Consultant) (BIG DATA & IOT)

Deep Dive into Query Execution in Spark SQL 2.3 (ENG)

Javier Rodriguez Zaurin (Data Scientist, Simply Business) (DATA SCIENCE)

From the math to the business value: machine learning in the real world (ENG)

12:00

Coffee break

12:30

Vadim Chelyshov (Software Engineer at Provectus) (BIG DATA & IOT)

Mist – Serverless proxy for Apache Spark (RUS)

Giuseppe Angelo Porcelli (Solutions Architect at Amazon Web Services) (DATA SCIENCE)

Build, train, and deploy machine learning models at scale with Amazon SageMaker (ENG)

Akmal Chaudhri (Technical Evangelist) (WORKSHOPS)

Hands-on with Apache Spark for Beginners (ENG)

13:30

Rudradeb Mitra (Product Mentor, Google Developers) (BIG DATA & IOT)

Architecting IoT system with Machine Learning (ENG)

Sri Sri (Sr. Data Scientist, Spotify) (DATA SCIENCE)

Multi-touch Attribution: Key challenge around designing a solution (ENG)

14:30

Lunch

15:30

Giorgi Jvaridze (Senior Software Engineer, Zalando) (BIG DATA & IOT)

Analysing Billion Node Graphs (ENG)

Stepan Pushkarev (CTO, Hydrosphere.io) (DATA SCIENCE)

Monitoring AI with AI (RUS)

Jonathan Taws (Data Scientist at Amazon Web Services) (WORKSHOPS)

Hands-on using Apache Spark with Amazon SageMaker (ENG)

16:30

Oleksandr Saienko (Tech Leader / Senior Software Engineer at SoftServe) (BIG DATA & IOT)

Building unified Batch and Stream processing pipeline with Apache Beam (RUS)

Roman Storchak (CTO, DatAI) (DATA SCIENCE)

How we build Computer vision as a service (ENG)

17:30

Panel Discussion

18:30

Conference Closing

19:00

Afterparty (True Man Hot Boat bar)

19:30


9:00

Registration & Welcome Coffee

9:50

Welcome speech

10:00

Akmal Chaudhri (Technical Evangelist)

Apache Ignite + Apache Spark RDDs and DataFrames integration (ENG)

This session will explain how Apache Spark and Ignite are integrated, and how they are used together for analytics, stream processing and machine learning. By the end of this session attendees will understand:

  • How Apache Ignite’s native RDD and new native DataFrame APIs work
  • How to use Ignite as an in-memory database and massively parallel processing (MPP) style collocated processing for preparing and managing data for Spark
  • How to leverage Ignite to easily share state across Spark jobs using mutable RDDs and DataFrames
  • How to leverage Ignite distributed SQL and advanced indexing in memory to improve SQL performance

11:00

Jacek Laskowski (Spark Consultant)

Deep Dive into Query Execution in Spark SQL 2.3 (ENG)

If you want to get even slightly better performance out of your structured queries (regardless of whether they are batch or streaming), you have to peek at the foundations of the Dataset API, starting with QueryExecution. That’s where any structured query ends up and where my talk starts. The talk will show you what stages a structured query has to go through before execution in Spark SQL. I’ll be talking about the different phases of query execution and the logical and physical optimizations. I’ll show the different optimizations in Spark SQL 2.3 and how to write one yourself (in Scala).
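To give a feel for what a logical optimization is, here is a minimal sketch of a single rewrite rule on a toy query plan. It is written in Python for brevity, while the talk itself uses Scala. The classes `Scan`, `Filter`, and `Project` and the function `combine_filters` are hypothetical names invented for this illustration; they are not Spark's own classes or API.

```python
from dataclasses import dataclass

# Toy logical plan nodes (hypothetical; not Spark SQL classes).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    condition: str   # kept as plain text for readability
    child: object

@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object

def combine_filters(plan):
    """Rewrite Filter(a, Filter(b, x)) into Filter('(b) AND (a)', x)."""
    if isinstance(plan, Filter):
        # Optimize the subtree first, then try to merge with it.
        child = combine_filters(plan.child)
        if isinstance(child, Filter):
            merged = f"({child.condition}) AND ({plan.condition})"
            return Filter(merged, child.child)
        return Filter(plan.condition, child)
    if isinstance(plan, Project):
        return Project(plan.columns, combine_filters(plan.child))
    return plan  # Scan nodes have nothing to rewrite

plan = Project(
    ("name",),
    Filter("age > 21", Filter("country = 'UA'", Scan("users"))),
)
optimized = combine_filters(plan)
print(optimized)
```

The rule walks the plan tree bottom-up and replaces two stacked filters with one, so the data is checked in a single pass. The result of the query stays the same; only the shape of the plan changes.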

Dmitry Korobchenko (Deep Learning R&D Engineer, NVIDIA Ltd)

How to accelerate your neural net inference with TensorRT (ENG)

Modern neural networks are based on high-load computing. Both hardware and software are important for fast training and inference. Modern high-level frameworks, used to build and train a neural net, can sacrifice performance in favor of greater flexibility. Therefore, deploying a trained neural net in production requires performance optimization. In the talk I will demonstrate how such optimization is possible and how it enables fast inference on GPU using TensorRT.

Javier Rodriguez Zaurin (Data Scientist, Simply Business)

From the math to the business value: machine learning in the real world (ENG)

I will illustrate the “life-cycle” of a machine learning project, particularly the building of a recommender system, from the technical design (business problem and detailed algorithmic solution) to deployment. I will also briefly mention other examples (e.g. marketing multi-channel attribution models using Markov-Chain models, or boosted methods and deep learning for risk modelling), trying to emphasise the journey from the code to the business with a couple of stories of success and failure.

Fedor Navruzov (Data Scientist, Speak With a Geek)

Automated feature engineering with FeatureTools (RUS)

The workshop is dedicated to the featuretools framework, which enables automated feature engineering. Real data from the Kaggle competition “Home Credit Default Risk Prediction” (https://www.kaggle.com/c/home-credit-default-risk / leaderboard) will be used as an example. Participants will get hands-on experience in representing data as entities and the relationships among them, using built-in and user-defined aggregations and transformations, parallelizing feature-set calculations, and exporting and importing the resulting data. The pros and cons of the library will be briefly highlighted, as well as the process of preparing a competition submission.*
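The idea of aggregating across a relationship between entities can be sketched in plain Python. This is only an illustration of the concept and does not use the featuretools API; the tables `clients` and `loans`, their columns, and the helper `aggregate_children` are hypothetical and are not taken from the competition data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent entity (clients) and a child entity (loans).
clients = [{"client_id": 1}, {"client_id": 2}, {"client_id": 3}]
loans = [
    {"loan_id": 10, "client_id": 1, "amount": 100.0},
    {"loan_id": 11, "client_id": 1, "amount": 300.0},
    {"loan_id": 12, "client_id": 2, "amount": 50.0},
]

# Aggregation primitives applied across the client -> loans relationship.
AGGREGATIONS = {"count": len, "mean": mean, "max": max}

def aggregate_children(parents, children, key, value):
    """Build one feature row per parent from the values of its child rows."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        row = dict(parent)
        for name, func in AGGREGATIONS.items():
            if values:
                row[f"{name}_{value}"] = func(values)
            else:
                # A parent without children: count is 0, other features are missing.
                row[f"{name}_{value}"] = 0 if name == "count" else None
        features.append(row)
    return features

client_features = aggregate_children(clients, loans, "client_id", "amount")
for row in client_features:
    print(row)
```

Each aggregation turns the many loan rows of a client into a single column on the client row. Automated feature engineering tools generate such columns systematically for many primitives and relationships instead of by hand.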

12:00

Coffee break

12:30

Vadim Chelyshov (Software Engineer at Provectus)

Mist – Serverless proxy for Apache Spark (RUS)

In this demo-based talk with live coding, we’ll present a functional typeful framework for developing Apache Spark applications. We’ll walk through the following key topics:

  • turning unmanageable Spark scripts into typeful Spark Functions
  • serverless deployment of Spark functions into the cloud
  • unit testing Spark functions to save cluster resources and developers’ time
  • seamless Spark session management between concurrent Spark jobs in exclusive or shared modes

13:30

Rudradeb Mitra (Product Mentor, Google Developers)

Architecting IoT system with Machine Learning (ENG)

In this talk, the speaker will share his experiences from building successful IoT systems. He will also explain why many IoT systems fail to get traction and how Machine Learning can help in that. Finally, he will talk about the right system architecture and touch upon some of the ML algorithms for IoT systems.

Giuseppe Angelo Porcelli (Solutions Architect at Amazon Web Services)

Build, train, and deploy machine learning models at scale with Amazon SageMaker (ENG)

Machine learning often feels a lot harder than it should be to most developers because the process to build and train models, and then deploy them into production is too complicated and too slow. Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Apache MXNet and TensorFlow are pre-installed, and Amazon SageMaker offers a range of built-in, high-performance machine learning algorithms. If you want to train with an alternative framework or algorithm, you can bring your own in a Docker container.

Sri Sri (Sr. Data Scientist, Spotify)

Multi-touch Attribution: Key challenge around designing a solution (ENG)

When I worked at Skyscanner, before Spotify, I worked on a popular marketing problem called Multi-touch Attribution. In this talk, I’ll explain the Multi-touch Attribution problem space and how a Data Science solution was designed for this problem. I focus on a key challenge of designing a Multi-touch Attribution solution: identifying which methodology is best for this problem. This is a thorny issue, as the variable we are trying to study isn’t directly observable. I’ll discuss a novel approach to tackling this problem through a simulated marketing environment.
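To show why the choice of methodology matters, here is a minimal sketch that applies two common heuristic rules, last-touch and linear attribution, to the same customer journeys. It is not the speaker's approach. The journeys are hypothetical hand-written data, and every journey is assumed to end in one conversion.

```python
from collections import Counter

# Hypothetical journeys: ordered marketing touches before a conversion.
journeys = [
    ["search", "email"],
    ["social"],
    ["social", "search", "email"],
    ["email"],
]

def last_touch(paths):
    """Give the full conversion credit to the final touch of each journey."""
    credit = Counter()
    for path in paths:
        credit[path[-1]] += 1.0
    return credit

def linear(paths):
    """Split the conversion credit equally among all touches of a journey."""
    credit = Counter()
    for path in paths:
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return credit

for name, rule in [("last-touch", last_touch), ("linear", linear)]:
    result = rule(journeys)
    print(name, {channel: round(value, 2) for channel, value in sorted(result.items())})
```

Both rules hand out the same total credit, one unit per conversion, yet they rank the channels differently. Since the true contribution of a channel is not observable in real data, there is no direct way to tell which rule is closer to the truth, which is the difficulty the abstract describes.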

Akmal Chaudhri (Technical Evangelist)

Hands-on with Apache Spark for Beginners (ENG)

In this workshop, attendees will use Apache Spark to undertake some simple calculations and solve some data manipulation problems. Through Python programming exercises, attendees will be able to get some hands-on experience with Spark using a cloud-based environment. The goal is to show the power of Spark, without needing to understand its complexity. Preparation for workshop: Detailed instructions will be provided closer to the event. However, essentially, attendees would need to create a free Databricks Community Edition (CE) account and bring their laptop with them. The workshop would require good wifi to connect to CE, hosted in the USA.*

14:30

Lunch

15:30

Giorgi Jvaridze (Senior Software Engineer, Zalando)

Analysing Billion Node Graphs (ENG)

Many important data science problems can be approached using graphs. The data that can be represented as graphs are everywhere: Social networks, Economic networks, Biomedical networks, Network of Neurons, the Internet itself. Some of those graphs can become very large. There are many challenges that we have to deal with when we want to process, analyse and visualize large graphs. In his talk Giorgi Jvaridze will talk about experiences he had working with multi-billion node graphs.

16:30

Oleksandr Saienko (Tech Leader/ Senior Software Engineer at SoftServe)

Building unified Batch and Stream processing pipeline with Apache Beam (RUS)

Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing pipelines, as well as data ingestion and integration flows, supporting both batch and streaming use cases. In the presentation I will give a general overview of Apache Beam and compare the programming models of Apache Beam and Apache Spark.

Stepan Pushkarev (CTO, Hydrosphere.io)

Monitoring AI with AI (RUS)

In this demo-based talk we discuss a solution, tooling and architecture that allow machine learning engineers to be involved in the delivery phase and take ownership of the deployment and monitoring of machine learning pipelines. It allows data scientists to safely deploy early results as end-to-end AI applications in a self-serve mode without assistance from engineering and operations teams. It shifts experimentation and even training phases from offline datasets to live production and closes the feedback loop between research and production.

Roman Storchak (CTO, DatAI)

How we build Computer vision as a service (ENG)

In this talk, we will look at different versions of SaaS architectures built on ML / computer vision:

  • The advantages and disadvantages of different service design patterns
  • Modes of “serving” models (in most cases, TF)
  • The influence of the architecture and the way it is implemented on product development

Bonus: does a data scientist need to know anything other than data science?

Jonathan Taws (Data Scientist at Amazon Web Services)

Hands-on using Apache Spark with Amazon SageMaker (ENG)

Amazon SageMaker is a fully-managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker also provides an Apache Spark library, in both Python and Scala, that you can use to easily train models in Amazon SageMaker from your Spark clusters. Once a model has been trained, you can also deploy it using Amazon SageMaker hosting services. After a brief recap on Amazon SageMaker, this code-level workshop will show you how to integrate your Apache Spark application with Amazon SageMaker, including how to start training jobs from Spark, integrating them in Spark pipelines, and more.*

17:30

Panel Discussion

18:30

Conference Closing

19:00

Afterparty (True Man Hot Boat bar)

19:30

*Workshop registration is now open! Registration forms are available after buying a ticket to the conference.


Price & Dates:

till 16 June – 1300 UAH

till 8 July – 1800 UAH

9-21 July – 2100 UAH

Group tickets & discounts

3+ tickets – 15% off

5+ tickets – 20% off

For students we have special offers.

Please feel free to contact us at aspodarets@provectus.com
Group discounts and promo codes cannot be combined.

Speakers

Please meet our first international guest speakers!
We work hard to add more star speakers to our pipeline.
In the coming weeks see our full agenda and finalised workshops program.
Stay tuned!


Dmitry Korobchenko

“Data Summer Conference in Odessa was amazing: thanks to professional organization of the event, smart and interesting speakers and passionate audience. It was my first time giving a talk in Ukraine, and I discovered a huge interest in Data Science field among the society there. That’s why conducting such kind of events is very important for connecting interested people with world experts, networking and skill sharing.”


Sri Sri

“Data Summer Conf was a lot of fun, very thoughtful scientific questions and the energy exhibited by the enthusiastic Data Science community was outstanding. Odessa at that is a great location to chill right after in the summer.”


Oleksandr Saienko

“Really cool event, great speakers presentations and live knowledge sharing. Especially networking and after party on sea beach).”


Javier Rodriguez Zaurin

“An entertaining, hands on conference in a country full of talented people. Definitely a must-go summer event.”


Jonathan Taws

“I really enjoyed the Data Summer Conference in Odessa. I'm always amazed how the Ukrainian Data Science community comes together to learn and dive deep into technical challenges, this conference was no different and a great catalyst for the community.”


Rudradeb Mitra

“I had a great time in Data summer conference in Odessa. The organizers were super helpful, and made my trip and stay very comfortable. The venue was one of the best I have been and everything was very well organized. About the talk, I found the audience knowledgeable and enthusiastic. Met a lot of smart people and had very interesting discussions. Would be glad to be back.”


Stepan Pushkarev

“Top level speakers that were compared only to best conferences in the US. Networking at the beach till 5am? Yes, we did it. The discussions and talks about AI, ML, and Big Data were exceptional.”


Fedor Navruzov

“This event was awesome, both the audience and the speakers showed mutual interest in each other, while the organizing team worked so hard to make it happen! I was very pleased to be a part of what was happening at the conference, and to contribute to the development and popularization of data science in Ukraine, specifically, in Odessa. Well done, guys!”


Vadim Chelyshov

“I was delighted to attend and be part of this conference. The more such event we have - the more chances to see the rise of big data and machine learning technologies adoption level.”


Giorgi Jvaridze

“Data Summer Conf was awesome! Some super interesting talks were given by great speakers from all around the world. Great audience as well, really enjoyed discussions after my talk. I believe this kind of events are crucial for supporting local data communities. Knowledge sharing, networking and just having fun together is what makes conferences like this super valuable. Very well organised event, kudos to the organisers!”

In good company

Industry-defining brands that have helped make
Data Summer Conference:


What does my ticket include?

  • Conference Welcome package
  • Access to tech talks on Data Science stream
  • Access to tech talks on BIG DATA & IOT stream
  • Access to Workshops
  • Lunch and Coffee Breaks
  • Visiting afterparty
  • Video recordings of tech talks + Presentations

Every ticket helps mentor a child

Your contribution goes to helping kids get a better start in life at Atom Space.


Diana Tereshchenko

dtereshchenko@provectus.com


Victoriia Korobkina

vkorobkina@provectus.com


Maxim Tereschenko

mtereschenko@provectus.com


Marina Nikitchuk

mnikitchuk@provectus.com


Anna Derevyanko

aderevyanko@provectus.com


Dmytro Spodarets

dspodarets@provectus.com


Ukraine, Odessa

Sady Pobedy

vul. Varlamova, 28

datasummer@provectus.com
