Setup: Install Required Packages

Before diving into ML code, we need to install the core Python libraries that form the backbone of every machine learning project. NumPy provides fast array operations, Matplotlib and Seaborn handle visualization, SciPy offers scientific computing utilities, and scikit-learn delivers production-ready ML algorithms. These libraries are so fundamental that virtually every ML paper, tutorial, and production system relies on them.

# Run this cell if packages are not installed
# !pip install numpy matplotlib seaborn scipy scikit-learn jupyter

1. NumPy: Numerical Python

Why NumPy? Pure Python lists are too slow for ML workloads involving millions of data points. NumPy provides ndarray – a contiguous block of memory that supports vectorized operations executed in optimized C code, often 10-100x faster than equivalent Python loops.
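That speed claim is easy to check on your own machine. Here is a quick (machine-dependent) timing sketch comparing a pure Python loop with the vectorized equivalent; the exact speedup will vary by hardware:

```python
import time
import numpy as np

n = 1_000_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Pure Python: element-by-element addition in a loop
t0 = time.perf_counter()
loop_result = [a[i] + b[i] for i in range(n)]
t_loop = time.perf_counter() - t0

# Vectorized NumPy: one call, executed in optimized C
t0 = time.perf_counter()
vec_result = a + b
t_vec = time.perf_counter() - t0

print(f"Loop: {t_loop:.3f}s, vectorized: {t_vec:.4f}s, speedup: ~{t_loop / t_vec:.0f}x")
```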

Core concept: Everything in ML revolves around tensors (multi-dimensional arrays). A 1D array is a vector, a 2D array is a matrix, and higher-dimensional arrays represent batches of images, sequences of embeddings, and more. Understanding np.array shapes and operations is the single most important skill for reading and writing ML code.

Mathematical foundation: NumPy maps directly to linear algebra. When you write A @ B, you are computing the matrix product \(C_{ij} = \sum_k A_{ik} B_{kj}\), the same operation that powers every layer of a neural network.
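To confirm that `@` computes exactly the sum above, here is a small check of the formula against an explicit triple loop:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Explicit triple loop: C_ij = sum_k A_ik * B_kj
C_loop = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            C_loop[i, j] += A[i, k] * B[k, j]

print(C_loop)
print("Matches A @ B:", np.allclose(A @ B, C_loop))
```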

import numpy as np

# Creating arrays
arr1d = np.array([1, 2, 3, 4, 5])
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

print("1D array:", arr1d)
print("Shape:", arr1d.shape)
print("\n2D array:")
print(arr2d)
print("Shape:", arr2d.shape)
# Common array creation functions
zeros = np.zeros((3, 4))  # 3x4 array of zeros
ones = np.ones((2, 3))    # 2x3 array of ones
eye = np.eye(3)           # 3x3 identity matrix
arange = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1

print("Zeros:")
print(zeros)
print("\nLinspace:")
print(linspace)
# Array operations (element-wise)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("a + b =", a + b)
print("a * b =", a * b)  # Element-wise multiplication
print("a ** 2 =", a ** 2)
print("sqrt(a) =", np.sqrt(a))
print("exp(a) =", np.exp(a))
# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix multiplication A @ B:")
print(A @ B)

print("\nTranspose A.T:")
print(A.T)

print("\nInverse:")
print(np.linalg.inv(A))

print("\nDeterminant:", np.linalg.det(A))
# Indexing and slicing
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print("Full array:")
print(arr)
print("\nFirst row:", arr[0, :])  # or arr[0]
print("Second column:", arr[:, 1])
print("Subarray (top-left 2x2):")
print(arr[:2, :2])
# Aggregation functions
data = np.array([1, 2, 3, 4, 5])

print("Sum:", np.sum(data))
print("Mean:", np.mean(data))
print("Std:", np.std(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
print("Argmax:", np.argmax(data))  # Index of max value
# Random numbers (crucial for ML)
np.random.seed(42)  # For reproducibility

rand_uniform = np.random.rand(5)  # Uniform [0, 1)
rand_normal = np.random.randn(5)  # Normal (mean=0, std=1)
rand_int = np.random.randint(0, 10, 5)  # Random integers

print("Uniform random:", rand_uniform)
print("Normal random:", rand_normal)
print("Random integers:", rand_int)
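One shape-related feature the cells above don't cover, but that ML code relies on constantly, is broadcasting – NumPy's rules for combining arrays of different shapes. A short sketch, including the very common pattern of standardizing feature columns:

```python
import numpy as np

# A (3, 1) column and a (1, 4) row broadcast into a (3, 4) grid
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)
grid = col + row                   # shape (3, 4)
print("Grid shape:", grid.shape)

# Common ML pattern: standardize each feature column of a data matrix.
# (100, 5) minus (5,) broadcasts the per-column means across all rows.
X = np.random.randn(100, 5)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("Column means after standardizing:", X_std.mean(axis=0).round(6))
```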

2. Matplotlib: Visualization Basics

Why Matplotlib? Visualization is not optional in ML – it is how you debug models, communicate results, and build intuition about data. Matplotlib is the foundational plotting library in Python; nearly every other visualization tool (Seaborn, Plotly, even TensorBoard) builds on top of it.

Key patterns you will use constantly: line plots for training loss curves, scatter plots for exploring feature relationships, histograms for understanding data distributions, and subplots for comparing experiments side by side. Mastering plt.figure, plt.plot, and plt.subplot gives you the vocabulary to visualize anything from a simple regression line to a complex multi-panel experiment dashboard.

import matplotlib.pyplot as plt

# Simple line plot
x = np.linspace(0, 2*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y1, label='sin(x)', linewidth=2)
plt.plot(x, y2, label='cos(x)', linewidth=2)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine and Cosine Functions')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Scatter plot
np.random.seed(42)
x = np.random.randn(100)
y = 2*x + np.random.randn(100)*0.5

plt.figure(figsize=(8, 6))
plt.scatter(x, y, alpha=0.6)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter Plot Example')
plt.grid(True, alpha=0.3)
plt.show()
# Subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: Line plot (x was overwritten with random scatter data above,
# so regenerate an evenly spaced x for a smooth curve)
x_line = np.linspace(0, 2*np.pi, 100)
axes[0, 0].plot(x_line, np.sin(x_line))
axes[0, 0].set_title('Line Plot')

# Top-right: Scatter
axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Scatter Plot')

# Bottom-left: Histogram
axes[1, 0].hist(np.random.randn(1000), bins=30)
axes[1, 0].set_title('Histogram')

# Bottom-right: Bar chart
categories = ['A', 'B', 'C', 'D']
values = [23, 45, 56, 78]
axes[1, 1].bar(categories, values)
axes[1, 1].set_title('Bar Chart')

plt.tight_layout()
plt.show()
# 3D plotting
# (projection='3d' is registered automatically in modern Matplotlib;
# this import is only needed on very old versions)
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Create data
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Surface plot
surf = ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D Surface Plot')
fig.colorbar(surf)
plt.show()
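The section intro mentioned line plots for training loss curves, but the cells above never draw one. Here is a minimal sketch of that pattern using simulated (made-up) loss values – the exponential-decay numbers are purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated loss curves (hypothetical values, just to show the plotting pattern)
epochs = np.arange(1, 51)
train_loss = 2.0 * np.exp(-epochs / 10) + 0.05
val_loss = 2.0 * np.exp(-epochs / 12) + 0.15

plt.figure(figsize=(10, 6))
plt.plot(epochs, train_loss, label='Training loss', linewidth=2)
plt.plot(epochs, val_loss, label='Validation loss', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Typical Training Curve (simulated)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```

The growing gap between the two curves is the classic visual signature of overfitting.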

3. Seaborn: Beautiful Statistical Plots

Why Seaborn? While Matplotlib gives you full control, Seaborn provides high-level abstractions specifically designed for statistical visualization. With a single function call, you can produce distribution plots with kernel density estimates, correlation heatmaps, or categorical comparisons – tasks that would require dozens of Matplotlib lines.

ML relevance: During exploratory data analysis (EDA), you need to quickly assess feature distributions, spot correlations between variables, and identify class separability. Functions like sns.histplot with kde=True overlay a smooth density estimate on your histogram, giving you an immediate sense of whether your data follows a normal distribution – a common assumption in many ML algorithms.

import seaborn as sns

# Set style
sns.set_style('darkgrid')
sns.set_palette('husl')

# Generate sample data
np.random.seed(42)
data = np.random.randn(100)
# Distribution plot
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, bins=20)
plt.title('Distribution with KDE')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
# Heatmap for correlation matrices
# Create sample data
data_matrix = np.random.randn(50, 4)
correlation = np.corrcoef(data_matrix.T)

plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', 
            xticklabels=['Var1', 'Var2', 'Var3', 'Var4'],
            yticklabels=['Var1', 'Var2', 'Var3', 'Var4'])
plt.title('Correlation Matrix Heatmap')
plt.show()

4. SciPy: Scientific Computing

Why SciPy? SciPy extends NumPy with specialized modules for optimization, statistics, signal processing, and linear algebra. In ML, the two most critical subpackages are scipy.stats for working with probability distributions and scipy.optimize for finding function minima.

Connection to ML: When you evaluate a model’s predictions against a normal distribution using stats.norm.pdf, you are computing the probability density function \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\). When you call optimize.minimize, you are running the same class of algorithms (like L-BFGS) that train logistic regression and other classical ML models under the hood.
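As a sanity check, the PDF formula above can be evaluated by hand and compared with `stats.norm.pdf`:

```python
import numpy as np
from scipy import stats

# Evaluate the Gaussian PDF formula manually and via scipy
x = np.linspace(-3, 3, 7)
mu, sigma = 0.0, 1.0
manual = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
library = stats.norm(loc=mu, scale=sigma).pdf(x)

print("Manual:", manual.round(4))
print("SciPy: ", library.round(4))
print("Match:", np.allclose(manual, library))
```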

from scipy import stats

# Statistical distributions
normal_dist = stats.norm(loc=0, scale=1)  # mean=0, std=1

# Probability density function (PDF)
x = np.linspace(-4, 4, 100)
pdf = normal_dist.pdf(x)

plt.figure(figsize=(10, 6))
plt.plot(x, pdf, linewidth=2)
plt.fill_between(x, pdf, alpha=0.3)
plt.title('Normal Distribution PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True, alpha=0.3)
plt.show()

# Calculate probabilities
print("P(X <= 0):", normal_dist.cdf(0))
print("P(X > 1.96):", 1 - normal_dist.cdf(1.96))
# Sampling from distributions
samples = normal_dist.rvs(size=1000)

plt.figure(figsize=(10, 6))
plt.hist(samples, bins=30, density=True, alpha=0.6, label='Samples')
plt.plot(x, pdf, 'r-', linewidth=2, label='True PDF')
plt.title('Sampling from Normal Distribution')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Optimization example
from scipy.optimize import minimize

# Function to minimize: f(x) = (x-3)^2 + 5
def objective(x):
    return (x - 3)**2 + 5

# Minimize starting from x=0
result = minimize(objective, x0=0)

print("Minimum found at x =", result.x[0])
print("Minimum value f(x) =", result.fun)

# Visualize
x = np.linspace(-2, 8, 100)
y = objective(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, linewidth=2)
plt.plot(result.x[0], result.fun, 'r*', markersize=20, label='Minimum')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Function Optimization')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

5. scikit-learn: Machine Learning Library

Why scikit-learn? It is the industry-standard library for classical machine learning, providing a consistent API across dozens of algorithms for classification, regression, clustering, and dimensionality reduction. The unified .fit() / .predict() / .score() interface means that once you learn one model, you can swap in any other with minimal code changes.

The ML workflow in four lines: scikit-learn encodes the canonical ML pipeline – split your data with train_test_split, train with model.fit(X_train, y_train), predict with model.predict(X_test), and evaluate with accuracy_score. Every production ML project, no matter how complex, follows this same fundamental pattern.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                          n_redundant=5, random_state=42)

print("Dataset shape:")
print("X:", X.shape)
print("y:", y.shape)
print("Class distribution:", np.bincount(y))
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training set:", X_train.shape)
print("Test set:", X_test.shape)
# Train a model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)
# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
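To see the "swap in any other model" claim from the section intro in action, here is a sketch that trains three different classifiers through the identical fit/score interface. The specific model choices are arbitrary examples:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The same two calls work for every estimator
scores = {}
for clf in [LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(random_state=0),
            KNeighborsClassifier()]:
    clf.fit(X_train, y_train)
    scores[clf.__class__.__name__] = clf.score(X_test, y_test)
    print(f"{clf.__class__.__name__}: {scores[clf.__class__.__name__]:.3f}")
```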

Quick Reference Cheat Sheet

NumPy

np.array([1, 2, 3])           # Create array
np.zeros((3, 4))              # Array of zeros
np.random.randn(100)          # Random normal
arr.shape                      # Dimensions
arr @ arr2                     # Matrix multiplication
np.mean(arr)                   # Average
np.linalg.inv(A)              # Matrix inverse

Matplotlib

plt.plot(x, y)                # Line plot
plt.scatter(x, y)             # Scatter plot
plt.hist(data, bins=30)       # Histogram
plt.xlabel('label')           # Axis labels
plt.legend()                  # Show legend
plt.show()                    # Display plot

Seaborn

sns.histplot(data, kde=True)  # Histogram with density
sns.heatmap(matrix)           # Heatmap
sns.set_style('darkgrid')     # Set style

SciPy

stats.norm(0, 1)              # Normal distribution
dist.pdf(x)                   # Probability density
dist.cdf(x)                   # Cumulative probability
optimize.minimize(f, x0)      # Find minimum

scikit-learn

train_test_split(X, y)        # Split data
model.fit(X_train, y_train)   # Train model
model.predict(X_test)         # Make predictions
accuracy_score(y_true, y_pred) # Evaluate

Practice Exercise

The best way to internalize these libraries is to combine them in a realistic workflow. The exercise below walks through a complete mini-project: generate synthetic data with NumPy, visualize it with Matplotlib, fit a linear model with scikit-learn, and plot the predictions. This mirrors the exact sequence you would follow when tackling a Kaggle competition or prototyping a new model at work.

# Exercise: Create a synthetic dataset, visualize it, and fit a simple model

# 1. Generate data: y = 2x + 3 + noise
np.random.seed(42)
X_exercise = np.random.randn(100, 1)
y_exercise = 2 * X_exercise.squeeze() + 3 + np.random.randn(100) * 0.5

# 2. Visualize with scatter plot
# Your code here

# 3. Fit a linear model using scikit-learn
# from sklearn.linear_model import LinearRegression
# Your code here

# 4. Plot predictions vs actual
# Your code here
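If you get stuck, here is one possible solution sketch – try the exercise yourself first:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# 1. Generate data: y = 2x + 3 + noise (same as the exercise cell)
np.random.seed(42)
X_exercise = np.random.randn(100, 1)
y_exercise = 2 * X_exercise.squeeze() + 3 + np.random.randn(100) * 0.5

# 2. Visualize with a scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(X_exercise, y_exercise, alpha=0.6, label='Data')

# 3. Fit a linear model
lin = LinearRegression()
lin.fit(X_exercise, y_exercise)
print(f"Learned slope: {lin.coef_[0]:.2f} (true: 2)")
print(f"Learned intercept: {lin.intercept_:.2f} (true: 3)")

# 4. Plot predictions vs actual
x_line = np.linspace(X_exercise.min(), X_exercise.max(), 100).reshape(-1, 1)
plt.plot(x_line, lin.predict(x_line), 'r-', linewidth=2, label='Fitted line')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```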

Bonus: Next Steps Examples

The cells below preview more advanced tools and techniques you will encounter as you progress through the curriculum. Each example is self-contained and designed to give you a working taste of what lies ahead – from the mathematical foundations in the upcoming notebooks to real-world dataset exploration, Pandas data wrangling, and deep learning with PyTorch.

1. Preview: Math Concepts (Notebooks 01-04)

The next four notebooks cover the mathematical pillars of machine learning: linear algebra (eigenvalues, matrix decompositions), calculus (derivatives, gradients), probability (Bayes’ theorem, distributions), and optimization (gradient descent). The snippets below give you a hands-on preview – eigenvalue decomposition reveals the principal axes of data variance, gradient descent shows how models learn iteratively, and Bayes’ theorem demonstrates how prior beliefs update with evidence.

# Linear Algebra Preview: Eigenvalues & Eigenvectors
# (Deep dive in notebook 01_linear_algebra.ipynb)

A = np.array([[4, 2],
              [1, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print("\nEigenvalues:", eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)

# Visualize eigenvectors
plt.figure(figsize=(8, 8))
plt.quiver(0, 0, eigenvectors[0, 0], eigenvectors[1, 0], 
           angles='xy', scale_units='xy', scale=1, color='r', width=0.006, label='Eigenvector 1')
plt.quiver(0, 0, eigenvectors[0, 1], eigenvectors[1, 1], 
           angles='xy', scale_units='xy', scale=1, color='b', width=0.006, label='Eigenvector 2')
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.axhline(0, color='gray', linewidth=0.5)
plt.axvline(0, color='gray', linewidth=0.5)
plt.grid(True, alpha=0.3)
plt.legend()
plt.title('Eigenvectors of Matrix A')
plt.axis('equal')
plt.show()
# Calculus Preview: Gradient Descent
# (Deep dive in notebook 02_calculus.ipynb)

def f(x):
    """Function: f(x) = x^2 + 2x + 1"""
    return x**2 + 2*x + 1

def gradient(x):
    """Derivative: f'(x) = 2x + 2"""
    return 2*x + 2

# Gradient descent
x = 5.0  # Start point
learning_rate = 0.1
history = [x]

for i in range(20):
    grad = gradient(x)
    x = x - learning_rate * grad  # Update rule
    history.append(x)

# Visualize
x_plot = np.linspace(-6, 6, 100)
y_plot = f(x_plot)

plt.figure(figsize=(10, 6))
plt.plot(x_plot, y_plot, 'b-', linewidth=2, label='f(x) = x² + 2x + 1')
plt.plot(history, [f(x) for x in history], 'ro-', alpha=0.6, label='Gradient Descent Path')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent Optimization')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Minimum found at x = {x:.4f}")
print(f"Minimum value f(x) = {f(x):.4f}")
# Probability Preview: Bayes' Theorem
# (Deep dive in notebook 03_probability.ipynb)

# Example: Medical test accuracy
# P(Disease) = 0.01 (1% of population has disease)
# P(Positive|Disease) = 0.99 (99% sensitivity)
# P(Positive|No Disease) = 0.05 (5% false positive rate)

# Question: If you test positive, what's the probability you have the disease?

P_disease = 0.01
P_no_disease = 0.99
P_pos_given_disease = 0.99
P_pos_given_no_disease = 0.05

# Bayes' Theorem
P_pos = P_pos_given_disease * P_disease + P_pos_given_no_disease * P_no_disease
P_disease_given_pos = (P_pos_given_disease * P_disease) / P_pos

print("Medical Test Example:")
print(f"Prior probability of disease: {P_disease*100:.1f}%")
print(f"Test sensitivity: {P_pos_given_disease*100:.1f}%")
print(f"False positive rate: {P_pos_given_no_disease*100:.1f}%")
print(f"\nIf you test positive:")
print(f"Posterior probability of disease: {P_disease_given_pos*100:.1f}%")
print("\n⚠️ Even with 99% sensitivity, only ~16.7% of positive tests indicate disease!")
print("This is due to the low base rate (1% prevalence).")

2. Working with Real Datasets (Kaggle Style)

Moving from synthetic data to real-world datasets is a critical step. Here we use scikit-learn’s built-in Iris dataset – 150 samples of three flower species described by four measurements – as a stand-in for the kind of tabular data you would download from Kaggle. The workflow covers loading data, performing exploratory data analysis (EDA) with histograms and scatter plots, training a Random Forest classifier, and evaluating with a full classification report including precision, recall, and F1-score.

# Using sklearn's built-in datasets (similar to Kaggle datasets)
from sklearn.datasets import load_iris, load_wine, load_breast_cancer

# Load famous Iris dataset
iris = load_iris()

print("Dataset: Iris Flowers")
print("=" * 50)
print(f"Samples: {iris.data.shape[0]}")
print(f"Features: {iris.data.shape[1]}")
print(f"Feature names: {iris.feature_names}")
print(f"Target classes: {iris.target_names}")
print(f"\nFirst 5 samples:")
print(iris.data[:5])
print(f"\nFirst 5 labels: {iris.target[:5]}")
# Exploratory Data Analysis (EDA)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Feature distribution
axes[0].hist(iris.data[:, 0], bins=20, alpha=0.7, label='Sepal Length')
axes[0].hist(iris.data[:, 1], bins=20, alpha=0.7, label='Sepal Width')
axes[0].set_xlabel('Measurement (cm)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Feature Distributions')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Scatter plot: Sepal Length vs Sepal Width
for i, target_name in enumerate(iris.target_names):
    mask = iris.target == i
    axes[1].scatter(iris.data[mask, 0], iris.data[mask, 1], 
                   label=target_name, alpha=0.6, s=50)

axes[1].set_xlabel(iris.feature_names[0])
axes[1].set_ylabel(iris.feature_names[1])
axes[1].set_title('Sepal Dimensions by Species')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
# Train a simple model on the dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd  # needed below for the feature-importance table

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Train Random Forest
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predict
y_pred = rf_model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': iris.feature_names,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance:")
print(feature_importance)

3. Pandas for Data Manipulation

Pandas is the de facto standard for working with tabular data in Python. Its DataFrame object gives you labeled rows and columns, SQL-like groupby operations, and seamless integration with both NumPy arrays and Seaborn plots. In practice, you will spend more time cleaning, filtering, and feature-engineering data in Pandas than training models – a well-prepared dataset often matters more than a sophisticated algorithm.

import pandas as pd

# Create DataFrame from Iris dataset
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

print("DataFrame Info:")
df.info()  # info() prints directly and returns None, so don't wrap it in print()
print("\nFirst 5 rows:")
print(df.head())
print("\nSummary statistics:")
print(df.describe())
# Common pandas operations

# Filter rows
setosa = df[df['species'] == 'setosa']
print("Setosa samples:", len(setosa))

# Group by and aggregate
species_stats = df.groupby('species').agg({
    'sepal length (cm)': ['mean', 'std'],
    'petal length (cm)': ['mean', 'std']
})
print("\nStatistics by species:")
print(species_stats)

# Create new columns
df['sepal_area'] = df['sepal length (cm)'] * df['sepal width (cm)']
df['petal_area'] = df['petal length (cm)'] * df['petal width (cm)']

print("\nDataFrame with new features:")
print(df[['sepal_area', 'petal_area', 'species']].head())
# Pandas + Seaborn: Powerful combination
plt.figure(figsize=(12, 5))

# Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='species', y='petal length (cm)')
plt.title('Petal Length Distribution by Species')
plt.xticks(rotation=45)

# Violin plot
plt.subplot(1, 2, 2)
sns.violinplot(data=df, x='species', y='sepal length (cm)')
plt.title('Sepal Length Distribution by Species')
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

4. Deep Learning with PyTorch

PyTorch is the leading framework for deep learning research and increasingly for production systems. Unlike scikit-learn’s high-level API, PyTorch gives you explicit control over every detail: tensors (GPU-accelerated arrays), automatic differentiation (loss.backward() computes all gradients via the chain rule), and modular nn.Module layers you compose into custom architectures. The example below builds a small feed-forward network for Iris classification, illustrating the core training loop of forward pass, loss computation, backpropagation, and parameter update.
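Before running the framework code below, it helps to see what `loss.backward()` actually computes. Here is a framework-free sketch of the chain rule for a one-parameter toy model, checked against a finite difference; the specific numbers are made up for illustration:

```python
# What loss.backward() computes, by hand: gradients via the chain rule
# for a tiny model  y_hat = w * x + b,  loss = (y_hat - y)^2
x, y = 2.0, 7.0
w, b = 1.5, 0.5

y_hat = w * x + b          # forward pass
loss = (y_hat - y) ** 2

# Backward pass (chain rule):
# dL/dy_hat = 2 * (y_hat - y);  dy_hat/dw = x;  dy_hat/db = 1
grad_w = 2 * (y_hat - y) * x
grad_b = 2 * (y_hat - y) * 1

# Sanity check with a finite difference on w
eps = 1e-6
fd_w = (((w + eps) * x + b - y) ** 2 - loss) / eps

print("Analytic dL/dw:", grad_w)
print("Finite-diff dL/dw:", fd_w)
print("Analytic dL/db:", grad_b)
```

PyTorch's autograd does exactly this bookkeeping automatically, for millions of parameters at once.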

# PyTorch basics
try:
    import torch
    import torch.nn as nn
    
    pytorch_available = True
    print("✅ PyTorch is installed!")
    print(f"Version: {torch.__version__}")
    
except ImportError:
    pytorch_available = False
    print("❌ PyTorch not installed.")
    print("Install with: pip install torch torchvision")
    print("\nShowing example code (won't run without PyTorch):")
if pytorch_available:
    # Create tensors (PyTorch's version of arrays)
    x = torch.tensor([1.0, 2.0, 3.0])
    y = torch.tensor([4.0, 5.0, 6.0])
    
    print("Tensor x:", x)
    print("Tensor y:", y)
    print("x + y:", x + y)
    print("x * y:", x * y)
    
    # Matrix operations
    A = torch.randn(3, 3)
    B = torch.randn(3, 3)
    
    print("\nMatrix A:")
    print(A)
    print("\nA @ B (matrix multiplication):")
    print(A @ B)
else:
    print("""
# PyTorch tensor example (requires installation):
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
print("x + y:", x + y)
    """)
if pytorch_available:
    # Simple neural network for Iris classification
    class IrisNet(nn.Module):
        def __init__(self):
            super(IrisNet, self).__init__()
            self.fc1 = nn.Linear(4, 16)  # 4 input features -> 16 hidden units
            self.fc2 = nn.Linear(16, 8)  # 16 -> 8
            self.fc3 = nn.Linear(8, 3)   # 8 -> 3 output classes
            self.relu = nn.ReLU()
            
        def forward(self, x):
            x = self.relu(self.fc1(x))
            x = self.relu(self.fc2(x))
            x = self.fc3(x)
            return x
    
    # Create model
    model = IrisNet()
    print("Neural Network Architecture:")
    print(model)
    
    # Convert data to PyTorch tensors
    X_train_torch = torch.FloatTensor(X_train)
    y_train_torch = torch.LongTensor(y_train)
    X_test_torch = torch.FloatTensor(X_test)
    
    # Training setup
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    
    # Training loop
    epochs = 100
    losses = []
    
    for epoch in range(epochs):
        # Forward pass
        outputs = model(X_train_torch)
        loss = criterion(outputs, y_train_torch)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        losses.append(loss.item())
        
        if (epoch + 1) % 20 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
    
    # Evaluate
    with torch.no_grad():
        outputs = model(X_test_torch)
        _, predicted = torch.max(outputs, 1)
        accuracy = (predicted.numpy() == y_test).mean()
        print(f'\nTest Accuracy: {accuracy*100:.2f}%')
    
    # Plot training loss
    plt.figure(figsize=(10, 5))
    plt.plot(losses)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training Loss Over Time')
    plt.grid(True, alpha=0.3)
    plt.show()
    
else:
    print("""
# PyTorch Neural Network example (requires installation):

class IrisNet(nn.Module):
    def __init__(self):
        super(IrisNet, self).__init__()
        self.fc1 = nn.Linear(4, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 3)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = IrisNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop...
    """)

🎓 Your Learning Journey

You’ve now seen:

  1. ✅ Math Preview: Eigenvalues, gradient descent, Bayes’ theorem

  2. ✅ Real Datasets: Loading, exploring, and modeling Iris dataset

  3. ✅ Pandas: DataFrames, grouping, filtering, feature engineering

  4. ✅ PyTorch: Tensors, neural networks, training loops

Practice Resources:


Next notebook: Open 01_linear_algebra.ipynb to continue learning! 🎯