Phase 2: Data Science Foundations

This folder is the practical base layer for the rest of the repo. If you are not yet comfortable with arrays, DataFrames, plots, train/test splits, and the fit/predict/transform workflow, later LLM and deep learning phases will feel harder than they need to.
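For readers who have not met the fit/predict/transform workflow yet, here is a minimal sketch of what it tends to look like in scikit-learn. The synthetic data, the choice of `StandardScaler` and `LogisticRegression`, and all variable names are assumptions made for illustration; they are not taken from the notebooks in this folder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 200 rows, 2 features.
# The label is 1 when the two features sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(int)

# Hold out 25% of the rows before any preprocessing is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformers: fit on the training split, then transform both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuses the train statistics

# Estimators: fit on the training split, then predict on unseen rows.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
accuracy = model.score(X_test_scaled, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The point to notice is that `fit` only ever sees training rows; the test split is touched by `transform`, `predict`, and `score` alone.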

What This Phase Covers

  • NumPy for numerical thinking

  • pandas for messy real-world data

  • matplotlib for basic visualization

  • scikit-learn for classical ML workflows

  • broader data science examples for exploratory and applied practice

Folder Map

  • 1-numpy-examples/: array operations, broadcasting, indexing, exercises

  • 2-pandas-examples/: cleaning, joins, grouping, time-series handling, projects

  • 3-data-science-examples/: broader learning material and reference notebooks

  • 4-matplotlib/: plotting fundamentals

  • 5-scikit-learn/: a large library of examples covering the major model families
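As a taste of the first two folders, the sketch below shows NumPy broadcasting followed by a small pandas clean, join, and group step. The `orders` and `customers` tables are hypothetical and exist only for this example; the notebooks themselves use their own datasets.

```python
import numpy as np
import pandas as pd

# Broadcasting: the per-column means have shape (3,) and are stretched
# across all 4 rows of the (4, 3) array during subtraction.
data = np.arange(12, dtype=float).reshape(4, 3)
centered = data - data.mean(axis=0)

# Hypothetical tables (assumed for illustration).
orders = pd.DataFrame(
    {"customer_id": [1, 2, 2, 3], "amount": [20.0, None, 35.0, 15.0]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["north", "south", "north"]}
)

# Cleaning: drop orders with a missing amount.
orders_clean = orders.dropna(subset=["amount"])

# Join: attach each customer's region to their orders.
merged = orders_clean.merge(customers, on="customer_id", how="left")

# Grouping: total order amount per region.
revenue_by_region = merged.groupby("region")["amount"].sum()
print(revenue_by_region)
```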

Study Advice

  • Do not try to complete every scikit-learn notebook on the first pass.

  • Prefer one full workflow over broad shallow browsing: data loading -> cleaning -> feature work -> split -> train -> evaluate -> explain results.

  • Keep notes on leakage, validation mistakes, and metric choice. Those habits matter more than memorizing APIs.
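The workflow described above can be sketched end to end on the Iris dataset bundled with scikit-learn. This is one possible arrangement, not the structure of any particular notebook here: the scaler, the logistic regression model, and the 80/20 split are assumptions. Wrapping preprocessing in a pipeline is a common way to limit leakage, because the scaler is refitted inside each cross-validation fold rather than on the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data loading: Iris ships with scikit-learn, so nothing is downloaded.
X, y = load_iris(return_X_y=True)

# Split first, so the test rows never influence preprocessing or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature work and model live in one pipeline; each CV fold fits its own scaler.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validate on the training split only.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
print(f"cross-validation accuracy: {cv_scores.mean():.2f}")

# Train on the full training split, then evaluate once on the held-out rows.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")

# Per-class precision and recall help when explaining results.
report = classification_report(y_test, pipeline.predict(X_test))
print(report)
```

Accuracy is a reasonable metric here only because the three Iris classes are balanced; on imbalanced data the per-class figures in the report are usually more informative.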

Suggested Projects

  • Iris or wine classification with proper validation

  • Housing-price regression with feature engineering

  • Customer segmentation with clustering and a short business write-up

What Comes Next

After this phase, move to:

  1. 03-maths/README.md

  2. 04-token/

  3. 05-embeddings/