How To Become a Data Engineer
Useful articles
A Beginner's Guide to Data Engineering
Functional Data Engineering - a modern paradigm for batch data processing
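A central idea of functional data engineering is to write each task as a pure function of its inputs and to overwrite whole partitions rather than append to them. That way, retries and backfills are idempotent. Below is a minimal stdlib-only Python sketch of that idea. The in-memory "table", the `daily_revenue` task and the sample rows are hypothetical illustrations, not code from the article.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-memory "table": partition key (a date) -> rows in that partition.
Table = dict[date, list[dict]]


def daily_revenue(source_rows: list[dict], ds: date) -> list[dict]:
    """Pure task: the output depends only on the source rows for partition `ds`."""
    totals: dict[str, float] = defaultdict(float)
    for row in source_rows:
        if row["ds"] == ds:
            totals[row["customer"]] += row["amount"]
    # Sorting makes the output deterministic regardless of input order.
    return [{"ds": ds, "customer": c, "revenue": t} for c, t in sorted(totals.items())]


def overwrite_partition(target: Table, ds: date, rows: list[dict]) -> None:
    """Replace the whole partition, so running the same day twice gives the same state."""
    target[ds] = list(rows)


def append_partition(target: Table, ds: date, rows: list[dict]) -> None:
    """Non-idempotent alternative, for contrast: a retry duplicates rows."""
    target.setdefault(ds, []).extend(rows)


source = [
    {"ds": date(2024, 1, 1), "customer": "a", "amount": 10.0},
    {"ds": date(2024, 1, 1), "customer": "b", "amount": 5.0},
    {"ds": date(2024, 1, 1), "customer": "a", "amount": 2.5},
    {"ds": date(2024, 1, 2), "customer": "b", "amount": 7.0},
]
run_day = date(2024, 1, 1)

functional: Table = {}
appended: Table = {}
for _attempt in range(2):  # simulate a retry (or backfill) of the same day
    overwrite_partition(functional, run_day, daily_revenue(source, run_day))
    append_partition(appended, run_day, daily_revenue(source, run_day))

print(len(functional[run_day]), len(appended[run_day]))  # prints: 2 4
```

The overwrite version ends up with the same two rows no matter how many times the day is re-run. The append version doubles the partition on every retry.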
Talks
Data Engineering Principles - Build frameworks not pipelines by Gatis Seja
Functional Data Engineering - A Set of Best Practices by Maxime Beauchemin
Advanced Data Engineering Patterns with Apache Airflow by Maxime Beauchemin
Creating a Data Engineering Culture by Jesse Anderson
Streaming 101: Hello Streaming by Josh Fischer
Algorithms & Data Structures
Algorithmic Toolbox in Russian
Data Structures in Russian
Data Structures & Algorithms Specialization on Coursera
Algorithms Specialization from Stanford on Coursera
SQL
Comprehensive SQL Tutorial by Mode Analytics
SQL Practice on Leetcode
Modern SQL: a website about modern SQL syntax
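The sketch below illustrates the kind of "modern" syntax these SQL resources cover. It uses a common table expression (`WITH`) and a window function (`SUM(...) OVER (PARTITION BY ... ORDER BY ...)`) to compute a per-customer running total. The query runs against an in-memory SQLite database through Python's standard `sqlite3` module. It assumes the bundled SQLite is version 3.25 or newer, the first release with window functions. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one row per order.
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL);
INSERT INTO orders VALUES
    ('a', '2024-01-01', 10.0),
    ('a', '2024-01-02', 2.5),
    ('b', '2024-01-01', 5.0),
    ('b', '2024-01-03', 7.0);
""")

query = """
WITH daily AS (                 -- common table expression: daily totals per customer
    SELECT customer, order_day, SUM(amount) AS amount
    FROM orders
    GROUP BY customer, order_day
)
SELECT customer,
       order_day,
       amount,
       -- window function: running total within each customer, ordered by day
       SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total
FROM daily
ORDER BY customer, order_day
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Unlike a plain `GROUP BY`, the window function keeps every input row and adds the aggregate alongside it.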
Programming
Scala School by Twitter
Fluent Python: an intermediate-level book about Python
Intro to Scala in Russian on Stepik by Tinkoff Bank
The Hitchhiker's Guide to Python by Kenneth Reitz & Tanya Schlusser
Learn Python 3 The Hard Way by Zed A. Shaw
Databases
Intro to Database Systems by Carnegie Mellon University
Advanced Database Systems by Carnegie Mellon University
On Disk IO
Distributed Systems
Distributed systems for fun and profit by Mikito Takada
Distributed Systems by Maarten van Steen & Andrew S. Tanenbaum
CSE138: Distributed Systems by Lindsey Kuper
CS 436: Distributed Computer Systems by University of Waterloo
MIT 6.824: Distributed Systems by Robert Morris
Distributed consensus reading list maintained by Heidi Howard from University of Cambridge
Books
Designing Data-Intensive Applications by Martin Kleppmann
Fundamentals of Data Engineering: Plan and Build Robust Data Systems by Joe Reis & Matt Housley
Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest & Clifford Stein
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Database Internals: A Deep Dive into How Distributed Data Systems Work
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
Grokking Streaming Systems by Josh Fischer & Ning Wang
Guide to High Performance Distributed Computing by K.G. Srinivasa & Anil Kumar Muppalla
Data Pipelines with Apache Airflow by Bas P. Harenslak & Julian Rutger de Ruiter
Blogs
Martin Kleppmann, author of Designing Data-Intensive Applications
BaseDS by Vaidehi Joshi, about distributed systems
Tools
Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
Apache Spark is a unified analytics engine for large-scale data processing
Apache Kafka is a distributed streaming platform
Luigi is a Python package that helps you build complex pipelines of batch jobs.
Dagster.io is a system for building modern data applications.
Prefect includes everything you need to create and run data applications.
Metaflow helps you build and manage real-life data science projects with ease
lakeFS lets you build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics.
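Workflow tools such as Airflow and Luigi describe a pipeline as a graph of tasks with declared dependencies. They run each task only after its upstream tasks have finished. The toy sketch below shows just that core idea, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not the API of any tool listed above. The task functions and the `run_pipeline` helper are hypothetical. Real orchestrators add scheduling, retries, parallel execution and persisted state on top of this.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


# Hypothetical tasks for a tiny batch pipeline.
def extract_orders() -> list[dict]:
    return [{"customer_id": 1, "amount": 10.0}, {"customer_id": 2, "amount": 5.0}]


def extract_customers() -> dict[int, str]:
    return {1: "alice", 2: "bob"}


def join(orders: list[dict], customers: dict[int, str]) -> list[dict]:
    return [{"customer": customers[o["customer_id"]], "amount": o["amount"]} for o in orders]


def load(rows: list[dict]) -> str:
    return f"loaded {len(rows)} rows"


# Task name -> (callable, upstream task names whose results become its arguments).
Pipeline = dict[str, tuple[Callable[..., Any], list[str]]]

PIPELINE: Pipeline = {
    "extract_orders": (extract_orders, []),
    "extract_customers": (extract_customers, []),
    "join": (join, ["extract_orders", "extract_customers"]),
    "load": (load, ["join"]),
}


def run_pipeline(pipeline: Pipeline) -> tuple[list[str], dict[str, Any]]:
    """Run every task once, each only after all of its upstream tasks have finished."""
    # TopologicalSorter expects a mapping of node -> set of its predecessors.
    graph = {name: set(upstream) for name, (_, upstream) in pipeline.items()}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on a cycle
    results: dict[str, Any] = {}
    for name in order:
        func, upstream = pipeline[name]
        results[name] = func(*(results[dep] for dep in upstream))
    return order, results


order, results = run_pipeline(PIPELINE)
print(order)
print(results["load"])  # prints: loaded 2 rows
```

The two extract tasks have no dependency on each other, so a real orchestrator could run them in parallel. `join` has to wait for both.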
Cloud Platforms
Communities
Data Engineering - Telegram chat about data engineering
Data Engineering Subreddit - subreddit about data engineering