Phase 2: Data Science Foundations
This folder is the practical base layer for the rest of the repo. If you are not yet comfortable with arrays, DataFrames, plots, train/test splits, and the fit/predict/transform workflow, later LLM and deep learning phases will feel harder than they need to.
What This Phase Covers
NumPy for numerical thinking
pandas for messy real-world data
matplotlib for basic visualization
scikit-learn for classical ML workflows
broader data science examples for exploratory and applied practice
Folder Map
1-numpy-examples/: array operations, broadcasting, indexing, exercises
2-pandas-examples/: cleaning, joins, grouping, time-series handling, projects
3-data-science-examples/: broader learning material and reference notebooks
4-matplotlib/: plotting fundamentals
5-scikit-learn/: a very large example library across major model families
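As a taste of what the first two folders practice, here is a minimal sketch of broadcasting, boolean indexing, and a fill-then-group aggregation. It is not taken from the notebooks; the array values and the `orders` table with its `region` and `amount` columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Broadcasting: subtract each column's mean from a (3, 2) array.
# The (2,) mean vector is stretched across the 3 rows automatically.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
col_means = data.mean(axis=0)   # shape (2,)
centered = data - col_means     # shape (3, 2)

# Boolean indexing: keep rows whose first column is above its mean.
above_mean = data[data[:, 0] > col_means[0]]

# pandas: fill a missing value, then aggregate per group.
# Hypothetical data; a real dataset would be loaded from a file.
orders = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100.0, np.nan, 50.0, 30.0],
})
orders["amount"] = orders["amount"].fillna(0.0)
totals = orders.groupby("region")["amount"].sum()

print(centered)
print(above_mean)
print(totals)
```

Filling the missing amount with zero is only one possible cleaning choice; whether it is appropriate depends on what a missing value means in the data at hand.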
Recommended First Pass
Work through 1-numpy-examples/
Move to 2-pandas-examples/
Use 4-matplotlib/ for core plotting habits
In 5-scikit-learn/, focus first on:
    linear_model/
    model_selection/
    preprocessing/
    ensemble/
    cluster/
Use 3-data-science-examples/ as breadth and reinforcement, not as a strict sequential course
Study Advice
Do not try to complete every scikit-learn notebook on the first pass.
Prefer one full workflow over broad shallow browsing: data loading -> cleaning -> feature work -> split -> train -> evaluate -> explain results.
Keep notes on leakage, validation mistakes, and metric choice. Those habits matter more than memorizing APIs.
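The workflow described above can be sketched end to end in a few lines. This is one possible shape, not the repo's reference solution: the data is synthetic, and the `height`, `weight`, `ratio`, and `label` columns are hypothetical. The point to notice is that imputation and scaling live inside the pipeline, so their statistics are learned only from the training rows of each fold, which is one common way to avoid leakage.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data loading: a synthetic stand-in for a real dataset (assumption).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 12, n),
})
df["label"] = (df["height"] + df["weight"] > 240).astype(int)

# 2. Cleaning: simulate messy data by blanking 10% of one column.
#    The gaps are filled later, inside the pipeline.
missing_rows = df.sample(frac=0.1, random_state=0).index
df.loc[missing_rows, "weight"] = np.nan

# 3. Feature work: a derived column (hypothetical feature).
df["ratio"] = df["weight"] / df["height"]

X = df[["height", "weight", "ratio"]]
y = df["label"]

# 4. Split: hold out a test set before any statistics are fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 5. Train: imputer and scaler are fitted on training data only.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(),
)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)

# 6. Evaluate: touch the held-out test set once, at the end.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("CV accuracy per fold:", cv_scores.round(3))
print("Held-out accuracy:", round(test_accuracy, 3))
```

Accuracy is used here only because the synthetic classes are roughly balanced; on imbalanced data a different metric would likely be a better choice, which is exactly the kind of note worth keeping.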
Suggested Projects
Iris or wine classification with proper validation
Housing-price regression with feature engineering
Customer segmentation with clustering and a short business write-up
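For the segmentation project, a possible starting point is sketched below. It assumes synthetic customers with two hypothetical features (annual spend and visits per month), scales them, and compares a few cluster counts by silhouette score. A real project would substitute actual customer data and would likely weigh business interpretability alongside the score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic customers (assumption): columns are
# [annual_spend, visits_per_month], drawn around three made-up centers.
rng = np.random.default_rng(42)
centers = [[500, 2], [1500, 6], [2500, 10]]
customers = np.vstack([
    rng.normal(loc=c, scale=[100, 0.5], size=(50, 2)) for c in centers
])

# Scale first: k-means uses Euclidean distance, so an unscaled
# spend column would dominate the visits column.
scaled = StandardScaler().fit_transform(customers)

# Compare candidate cluster counts by silhouette score (higher is better).
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)
best_k = max(scores, key=scores.get)

# Summarize each segment in original units for the write-up.
final_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(scaled)
for segment in range(best_k):
    members = customers[final_labels == segment]
    spend, visits = members.mean(axis=0)
    print(f"segment {segment}: n={len(members)}, "
          f"mean spend={spend:.0f}, mean visits={visits:.1f}")
```

The per-segment means are the raw material for the short business write-up: each segment needs a plain-language description and a suggested action, not just a cluster index.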
What Comes Next
After this phase, move to: