Who Should Attend?

This Data Engineering for Data Scientists course is designed for data scientists who want a complete and detailed understanding of a big data solution and of the dynamics of data.

Course Prerequisites

Attendees should be familiar with the basic concepts of a big data/analytics solution and have experience with Python or R.

Course Duration

Three days.

Course Learnings

  • What are the different components of a big data/analytics solution?
  • What are the key big data technologies?
  • What is the added value of a data lake for analytics?
  • How should I approach my analytics projects for maximum operability?
  • How can I explore the data in my lake?
  • How do I make analytics on a data lake work?
  • What are useful programming languages and libraries for running analytics on a cluster?
  • How can I use advanced Python and Spark features (packaging, generators, multi-CPU performance, etc.)?

Hands-on Exercises

Throughout the course, hands-on exercises reinforce the topics being discussed.