How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications

  • BaseDS by Vaidehi Joshi, on distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python.

  • Apache Spark is a unified analytics engine for large-scale data processing.

  • Apache Kafka is a distributed streaming platform.

  • Luigi is a Python package that helps you build complex pipelines of batch jobs.

  • Dagster.io is a system for building modern data applications.

  • Prefect includes everything you need to create and run data applications.

  • Metaflow helps you build and manage real-life data science projects with ease.

  • lakeFS lets you build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics.
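Workflow tools such as Airflow and Luigi model a pipeline as a directed acyclic graph (DAG) of tasks, where each task runs only after its upstream tasks have finished. This is a minimal sketch of that core idea using Python's standard-library `graphlib` module (Python 3.9+). It is not the API of any tool listed above. `run_pipeline` and the `extract`/`transform`/`load` tasks are hypothetical names chosen for illustration. Real orchestrators add scheduling, retries, parallel execution and state tracking on top of this.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 upstream: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical helper: run each task after all of its upstream tasks.

    Returns the names of the tasks in the order they were executed.
    """
    # Every task becomes a node; tasks missing from `upstream` have no dependencies.
    graph = {name: upstream.get(name, set()) for name in tasks}
    order: List[str] = []
    # static_order() yields each node only after all of its predecessors
    # and raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical toy ETL job: an in-memory dict stands in for real storage.
store: Dict[str, list] = {}


def extract() -> None:
    store["raw"] = [" alice ", "BOB", " carol"]


def transform() -> None:
    store["clean"] = [s.strip().lower() for s in store["raw"]]


def load() -> None:
    store["table"] = sorted(store["clean"])


# Tasks are registered out of order; the dependency graph fixes the run order.
order = run_pipeline(
    {"load": load, "transform": transform, "extract": extract},
    {"transform": {"extract"}, "load": {"transform"}},
)
print(order)           # ['extract', 'transform', 'load']
print(store["table"])  # ['alice', 'bob', 'carol']
```

`static_order()` checks the graph before yielding its first node. A pipeline with circular dependencies therefore fails with `graphlib.CycleError` before any task runs.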

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests