# Install required packages
!pip install -q streamlit pandas plotly scikit-learn

Streamlit Basics: Python Scripts that Become Web Apps

Streamlit takes a fundamentally different approach from Gradio: instead of wrapping a function, you write a top-to-bottom Python script that Streamlit re-executes on every user interaction. Display calls like st.write() render output, while widget calls like st.slider() or st.button() both render a control and return its current value, creating an implicit reactive data flow. The script written below with %%writefile can be launched from the terminal with streamlit run streamlit_basics.py, which starts a local server and opens the app in your browser.

Streamlit vs. Gradio tradeoffs: Streamlit excels at data dashboards with rich layouts (columns, tabs, expanders, sidebars) and interactive exploration where multiple widgets control a single visualization. Gradio excels at single-function ML demos with typed input/output contracts. Streamlit’s re-execution model means every widget interaction reruns the entire script, which is why caching (covered below) is essential for any operation that takes more than a fraction of a second.

%%writefile streamlit_basics.py
import streamlit as st
import pandas as pd
import numpy as np

# Title and headers
st.title("🚀 My First Streamlit App")
st.header("Welcome to Streamlit!")
st.subheader("Building data apps made easy")

# Text elements
st.write("This is a simple text using st.write()")
st.markdown("**Bold** and *italic* text with Markdown")
st.caption("This is a caption")

# Display data
st.write("### Sample DataFrame")
df = pd.DataFrame({
    'Column A': [1, 2, 3, 4],
    'Column B': [10, 20, 30, 40]
})
st.dataframe(df)  # Interactive table
# st.table(df)     # Static table

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric("Temperature", "70 °F", "1.2 °F")
col2.metric("Wind", "9 mph", "-8%")
col3.metric("Humidity", "86%", "4%")

# Input widgets
st.write("### Input Widgets")
text_input = st.text_input("Enter your name", "John Doe")
st.write(f"Hello, {text_input}!")

number = st.number_input("Pick a number", min_value=0, max_value=100, value=50)
st.write(f"You picked: {number}")

slider_val = st.slider("Select a range", 0, 100, 25)
st.write(f"Slider value: {slider_val}")

# Selectbox and multiselect
option = st.selectbox("Choose an option", ['Option 1', 'Option 2', 'Option 3'])
st.write(f"You selected: {option}")

options = st.multiselect(
    "Choose multiple",
    ['A', 'B', 'C', 'D'],
    default=['A', 'B']
)
st.write(f"You selected: {options}")

# Checkbox and radio
agree = st.checkbox("I agree to the terms")
if agree:
    st.write("Great! You agreed.")

choice = st.radio("Pick one", ['Choice 1', 'Choice 2', 'Choice 3'])
st.write(f"You chose: {choice}")

# Button
if st.button("Click me!"):
    st.write("Button was clicked!")
    st.balloons()  # Fun celebration!

# File uploader
uploaded_file = st.file_uploader("Upload a file", type=['csv', 'txt'])
if uploaded_file is not None:
    st.write(f"Filename: {uploaded_file.name}")

print("To run this app: streamlit run streamlit_basics.py")

Running the App

To run the above app:

streamlit run streamlit_basics.py

This will open a browser window with your app!
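If the browser doesn't open automatically (for example on a remote machine), `streamlit run` accepts server flags on the command line. A typical invocation might look like the following fragment (the port value is just an example; the same options can also be set in `.streamlit/config.toml`):

```shell
# Run on an explicit port without auto-opening a browser
streamlit run streamlit_basics.py --server.port 8501 --server.headless true
```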

ML Model Deployment App: Interactive Classification with Sidebar Controls

Deploying a trained model as a Streamlit app follows a standard pattern: train once (cached), expose model inputs as sidebar widgets, and display predictions in the main panel. The sidebar (st.sidebar) keeps input controls visually separated from results, and st.set_page_config(layout="wide") uses the full browser width for side-by-side columns. The st.columns([1, 2]) call creates unequal-width columns, placing compact prediction text on the left and detailed visualizations on the right.

Key architectural pattern: @st.cache_resource wraps the model training function so it runs once and persists across reruns – without this decorator, the Random Forest would retrain on every slider change (hundreds of milliseconds each time). The st.metric() component displays the model accuracy with a clean KPI card format, and st.expander() hides the full dataset behind a collapsible section to keep the primary interface focused on predictions.

%%writefile ml_model_app.py
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import plotly.express as px

# Page config
st.set_page_config(
    page_title="Iris Classifier",
    page_icon="🌸",
    layout="wide"
)

# Title
st.title("🌸 Iris Flower Classifier")
st.markdown("Predict iris species based on flower measurements")

# Sidebar for user input
st.sidebar.header("Input Features")

def user_input_features():
    sepal_length = st.sidebar.slider('Sepal Length (cm)', 4.0, 8.0, 5.5)
    sepal_width = st.sidebar.slider('Sepal Width (cm)', 2.0, 4.5, 3.0)
    petal_length = st.sidebar.slider('Petal Length (cm)', 1.0, 7.0, 4.0)
    petal_width = st.sidebar.slider('Petal Width (cm)', 0.1, 2.5, 1.3)
    
    # Column names must match the names the model was trained on
    # (iris.feature_names), or scikit-learn rejects the input.
    data = {
        'sepal length (cm)': sepal_length,
        'sepal width (cm)': sepal_width,
        'petal length (cm)': petal_length,
        'petal width (cm)': petal_width
    }
    return pd.DataFrame(data, index=[0])

# Get user input
input_df = user_input_features()

# Load and prepare data
@st.cache_data
def load_data():
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y = pd.Series(iris.target, name='species')
    y = y.map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
    return X, y, iris.target_names

X, y, target_names = load_data()

# Train model
@st.cache_resource
def train_model(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    return model, accuracy

model, accuracy = train_model(X, y)

# Main content
col1, col2 = st.columns([1, 2])

with col1:
    st.subheader("Your Input")
    st.write(input_df)
    
    # Make prediction
    prediction = model.predict(input_df)
    prediction_proba = model.predict_proba(input_df)
    
    st.subheader("Prediction")
    st.success(f"**{prediction[0].upper()}**")
    
    st.subheader("Prediction Probability")
    proba_df = pd.DataFrame(
        prediction_proba,
        columns=['Setosa', 'Versicolor', 'Virginica']
    )
    st.dataframe(proba_df.style.highlight_max(axis=1))
    
    st.info(f"Model Accuracy: {accuracy:.2%}")

with col2:
    st.subheader("Prediction Confidence")
    
    # Create probability bar chart
    proba_long = pd.DataFrame({
        'Species': ['Setosa', 'Versicolor', 'Virginica'],
        'Probability': prediction_proba[0]
    })
    
    fig = px.bar(
        proba_long,
        x='Species',
        y='Probability',
        color='Species',
        title='Prediction Probabilities'
    )
    st.plotly_chart(fig, use_container_width=True)
    
    # Feature importance
    st.subheader("Feature Importance")
    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': model.feature_importances_
    }).sort_values('Importance', ascending=False)
    
    fig2 = px.bar(
        importance_df,
        x='Importance',
        y='Feature',
        orientation='h',
        title='Feature Importance in Prediction'
    )
    st.plotly_chart(fig2, use_container_width=True)

# Expandable section with dataset info
with st.expander("📊 View Dataset"):
    st.write("### Iris Dataset")
    df_combined = pd.concat([X, y], axis=1)
    st.dataframe(df_combined)
    
    st.write("### Dataset Statistics")
    st.write(df_combined.describe())

print("To run: streamlit run ml_model_app.py")

Session State: Persisting Data Across Reruns

Streamlit’s rerun-on-interaction model means local variables reset every time a user clicks a button or moves a slider. st.session_state is a dictionary-like object that persists across reruns within a single browser session, enabling counters, form submissions, multi-step workflows, and shopping-cart-like accumulation patterns. Without session state, a counter button would always show “1” because the variable resets before each render.

How it works: st.session_state is stored server-side per WebSocket connection. The pattern if 'key' not in st.session_state: st.session_state.key = default initializes values on first load, and subsequent reruns access the stored value. st.rerun() forces an immediate script rerun after a state mutation, ensuring the UI reflects the latest state. For form inputs, st.form() batches multiple widget changes into a single submission, avoiding the “rerun per keystroke” problem and creating a more traditional form experience.
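The rerun model can be sketched in plain Python, with no Streamlit involved. Here `session_state` is an ordinary dict standing in for `st.session_state`, and `run_script()` represents one top-to-bottom execution of the app script:

```python
# Conceptual sketch of Streamlit's rerun model (not real Streamlit code).
session_state = {}  # stands in for st.session_state: survives across "reruns"

def run_script():
    """One top-to-bottom execution of the app script."""
    local_counter = 0                      # local variable: reset on every rerun
    if 'counter' not in session_state:
        session_state['counter'] = 0       # init-once pattern
    local_counter += 1
    session_state['counter'] += 1
    return local_counter, session_state['counter']

print(run_script())  # (1, 1)
print(run_script())  # (1, 2)  <- the local resets, session_state persists
```

Every "rerun" starts the script from the top, so only what lives in `session_state` accumulates.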

%%writefile session_state_app.py
import streamlit as st
import pandas as pd

st.title("📝 Session State Demo")

# Initialize session state
if 'counter' not in st.session_state:
    st.session_state.counter = 0

if 'todos' not in st.session_state:
    st.session_state.todos = []

# Counter example
st.header("Counter Example")
st.write(f"Current count: {st.session_state.counter}")

col1, col2, col3 = st.columns(3)
with col1:
    if st.button("Increment"):
        st.session_state.counter += 1
        st.rerun()

with col2:
    if st.button("Decrement"):
        st.session_state.counter -= 1
        st.rerun()

with col3:
    if st.button("Reset"):
        st.session_state.counter = 0
        st.rerun()

# Todo list example
st.header("Todo List Example")

with st.form("todo_form"):
    todo_input = st.text_input("Add a todo item")
    submitted = st.form_submit_button("Add")
    
    if submitted and todo_input:
        st.session_state.todos.append({
            'task': todo_input,
            'completed': False
        })

# Display todos
if st.session_state.todos:
    st.write("### Your Todos:")
    for i, todo in enumerate(st.session_state.todos):
        col1, col2, col3 = st.columns([0.1, 0.7, 0.2])
        
        with col1:
            done = st.checkbox(
                "Done", key=f"check_{i}",
                value=todo['completed'],
                label_visibility="collapsed"
            )
            # Write the checkbox state back so items can be un-checked too
            st.session_state.todos[i]['completed'] = done
        
        with col2:
            if todo['completed']:
                st.markdown(f"~~{todo['task']}~~")
            else:
                st.write(todo['task'])
        
        with col3:
            if st.button("Delete", key=f"del_{i}"):
                st.session_state.todos.pop(i)
                st.rerun()
    
    if st.button("Clear All Completed"):
        st.session_state.todos = [
            t for t in st.session_state.todos if not t['completed']
        ]
        st.rerun()
else:
    st.info("No todos yet! Add one above.")

# Show session state (for debugging)
with st.expander("🔍 View Session State"):
    st.write(st.session_state)

print("To run: streamlit run session_state_app.py")

Caching for Performance: Avoiding Redundant Computation

Caching is the single most important optimization for Streamlit apps, because every user interaction triggers a full script rerun. Streamlit provides two caching decorators: @st.cache_data serializes return values (via pickling) and returns a fresh copy on cache hits – use this for DataFrames, arrays, and computation results where mutations should not affect the cache. @st.cache_resource returns the same object reference without copying – use this for ML models, database connections, and large objects where copying would be wasteful or break functionality.

Cache invalidation: both decorators use function arguments and source code as cache keys. If you change a function’s code, the cache automatically invalidates. You can also set ttl=3600 (time-to-live in seconds) for data that should refresh periodically, or call st.cache_data.clear() programmatically. For ML apps, the typical pattern is: cache data loading with @st.cache_data, cache model loading with @st.cache_resource, and leave prediction calls uncached since they depend on user input and are typically fast.
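The copy-vs-reference distinction can be illustrated with a rough, simplified stand-in for the two decorators in plain Python (this is not Streamlit's actual implementation, which also hashes the function's source code and handles concurrency):

```python
import copy
import functools

def cache_data_like(fn):
    """Memoize, but return a deep copy on each hit (like st.cache_data)."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return copy.deepcopy(cache[args])  # mutations can't poison the cache
    return wrapper

def cache_resource_like(fn):
    """Memoize and return the same object every time (like st.cache_resource)."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]                 # shared reference, no copy
    return wrapper

@cache_data_like
def load_rows(n):
    return list(range(n))

@cache_resource_like
def load_model_obj():
    return {"weights": [0.1, 0.2]}

a, b = load_rows(3), load_rows(3)
print(a == b, a is b)              # True False -> equal values, distinct copies
m1, m2 = load_model_obj(), load_model_obj()
print(m1 is m2)                    # True       -> one shared object
```

This is why mutating a `cache_data` result is safe while mutating a `cache_resource` result affects every caller.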

%%writefile caching_demo.py
import streamlit as st
import pandas as pd
import numpy as np
import time

st.title("⚡ Caching Demo")

# Without caching (slow)
def load_data_slow():
    """Simulate slow data loading without caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })

# With caching (fast)
@st.cache_data
def load_data_cached():
    """Simulate slow data loading WITH caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })

# Cache for machine learning models
@st.cache_resource
def load_model():
    """Simulate loading an ML model (use cache_resource for models)."""
    time.sleep(2)
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(n_estimators=100)

# Demo
st.header("Without Caching")
if st.button("Load Data (No Cache)"):
    with st.spinner("Loading... (3 seconds)"):
        start = time.time()
        df = load_data_slow()
        elapsed = time.time() - start
    st.success(f"Loaded in {elapsed:.2f} seconds")
    st.dataframe(df.head())
    st.warning("Click again - it will take 3 seconds every time!")

st.header("With Caching")
if st.button("Load Data (Cached)"):
    with st.spinner("Loading... (3 seconds first time, instant after)"):
        start = time.time()
        df = load_data_cached()
        elapsed = time.time() - start
    
    if elapsed < 1:
        st.success(f"Loaded from cache in {elapsed:.3f} seconds! ⚡")
    else:
        st.success(f"Loaded in {elapsed:.2f} seconds (cached for next time)")
    
    st.dataframe(df.head())
    st.info("Click again - it will be instant!")

st.header("Caching ML Models")
if st.button("Load Model"):
    with st.spinner("Loading model..."):
        start = time.time()
        model = load_model()
        elapsed = time.time() - start
    
    if elapsed < 1:
        st.success(f"Model loaded from cache in {elapsed:.3f} seconds!")
    else:
        st.success(f"Model loaded in {elapsed:.2f} seconds")
    
    st.write(f"Model: {type(model).__name__}")
    st.write(f"Parameters: {model.n_estimators} estimators")

# Caching best practices
with st.expander("📚 Caching Best Practices"):
    st.markdown("""
    ### When to use @st.cache_data:
    - Loading data from files/databases
    - Expensive computations
    - API calls
    - Data transformations
    
    ### When to use @st.cache_resource:
    - Machine learning models
    - Database connections
    - Large objects that should persist
    
    ### Key differences:
    - `cache_data`: Creates a new copy each time (safe for mutable data)
    - `cache_resource`: Returns the same object (for models, connections)
    
    ### Clear cache:
    - Press 'C' in the app
    - Or use st.cache_data.clear() or st.cache_resource.clear()
    """)

print("To run: streamlit run caching_demo.py")

Interactive Data Dashboard: Filters, KPIs, and Multi-Tab Visualization

A complete data dashboard brings together all Streamlit concepts: sidebar filters for data selection, st.metric() KPI cards for at-a-glance summaries, st.tabs() for organizing different visualization perspectives, and st.download_button() for data export. The dashboard below uses Plotly for interactive charts (zoom, hover, pan) instead of Matplotlib’s static images, which is the standard choice for production Streamlit apps.

Architecture pattern: the sidebar controls a global filtered_df variable that all downstream visualizations reference. This “filter once, display everywhere” pattern ensures consistency across tabs and prevents the common bug where different charts show data from different filter states. The st.plotly_chart(fig, use_container_width=True) call makes charts responsive to the browser width, and custom CSS via st.markdown() with unsafe_allow_html=True enables visual polish beyond Streamlit’s default styling.
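The filtering core of this pattern is just combined pandas boolean masks, sketched here with made-up data (in the app, `categories` and `regions` would come from `st.sidebar.multiselect`):

```python
import pandas as pd

# "Filter once, display everywhere": combine every sidebar selection into
# one boolean mask, then reuse the single filtered frame for all charts.
df = pd.DataFrame({
    'Category': ['Electronics', 'Food', 'Books', 'Food'],
    'Region':   ['North', 'South', 'North', 'West'],
    'Sales':    [120, 340, 90, 210],
})
categories = ['Food', 'Books']            # hypothetical multiselect values
regions = ['South', 'North', 'West']

mask = df['Category'].isin(categories) & df['Region'].isin(regions)
filtered_df = df[mask]
print(filtered_df['Sales'].sum())         # 640
```

Every tab then reads `filtered_df`, so a filter change is reflected everywhere at once.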

%%writefile data_dashboard.py
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Page config
st.set_page_config(
    page_title="Data Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS
st.markdown("""
    <style>
    .main { padding: 0rem 1rem; }
    .stMetric { background-color: #f0f2f6; padding: 10px; border-radius: 5px; }
    </style>
""", unsafe_allow_html=True)

# Title
st.title("📊 Interactive Data Dashboard")

# Load data
@st.cache_data
def load_sample_data():
    """Generate sample sales data."""
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
    
    df = pd.DataFrame({
        'Date': dates,
        'Sales': np.random.randint(100, 1000, len(dates)) + 
                 np.sin(np.arange(len(dates)) * 2 * np.pi / 365) * 200,
        'Customers': np.random.randint(10, 100, len(dates)),
        'Category': np.random.choice(['Electronics', 'Clothing', 'Food', 'Books'], len(dates)),
        'Region': np.random.choice(['North', 'South', 'East', 'West'], len(dates))
    })
    
    df['Revenue'] = df['Sales'] * np.random.uniform(10, 50, len(dates))
    return df

# Load data
df = load_sample_data()

# Sidebar filters
st.sidebar.header("Filters")

# Date range
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df['Date'].min(), df['Date'].max()],
    min_value=df['Date'].min(),
    max_value=df['Date'].max()
)

# Category filter
categories = st.sidebar.multiselect(
    "Select Categories",
    options=df['Category'].unique(),
    default=df['Category'].unique()
)

# Region filter
regions = st.sidebar.multiselect(
    "Select Regions",
    options=df['Region'].unique(),
    default=df['Region'].unique()
)

# Filter data
if len(date_range) == 2:
    mask = (
        (df['Date'] >= pd.to_datetime(date_range[0])) &
        (df['Date'] <= pd.to_datetime(date_range[1])) &
        (df['Category'].isin(categories)) &
        (df['Region'].isin(regions))
    )
    filtered_df = df[mask]
else:
    filtered_df = df

# KPIs
st.header("Key Metrics")
col1, col2, col3, col4 = st.columns(4)

with col1:
    total_revenue = filtered_df['Revenue'].sum()
    st.metric("Total Revenue", f"${total_revenue:,.0f}")

with col2:
    total_sales = filtered_df['Sales'].sum()
    st.metric("Total Sales", f"{total_sales:,.0f}")

with col3:
    avg_customers = filtered_df['Customers'].mean()
    st.metric("Avg Customers/Day", f"{avg_customers:.0f}")

with col4:
    total_days = len(filtered_df)
    st.metric("Days in Range", f"{total_days}")

# Charts
tab1, tab2, tab3, tab4 = st.tabs(["📈 Trends", "📊 Distribution", "🗺️ Regional", "📋 Data"])

with tab1:
    st.subheader("Sales and Revenue Over Time")
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Sales'],
        name='Sales',
        line=dict(color='blue')
    ))
    
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Revenue'],
        name='Revenue',
        yaxis='y2',
        line=dict(color='green')
    ))
    
    fig.update_layout(
        yaxis=dict(title='Sales'),
        yaxis2=dict(title='Revenue', overlaying='y', side='right'),
        hovermode='x unified',
        height=400
    )
    
    st.plotly_chart(fig, use_container_width=True)

with tab2:
    col1, col2 = st.columns(2)
    
    with col1:
        st.subheader("Sales by Category")
        category_sales = filtered_df.groupby('Category')['Sales'].sum().reset_index()
        fig = px.pie(
            category_sales,
            values='Sales',
            names='Category',
            title='Sales Distribution by Category'
        )
        st.plotly_chart(fig, use_container_width=True)
    
    with col2:
        st.subheader("Revenue by Category")
        category_revenue = filtered_df.groupby('Category')['Revenue'].sum().reset_index()
        fig = px.bar(
            category_revenue,
            x='Category',
            y='Revenue',
            title='Revenue by Category'
        )
        st.plotly_chart(fig, use_container_width=True)

with tab3:
    st.subheader("Regional Performance")
    
    regional_data = filtered_df.groupby('Region').agg({
        'Sales': 'sum',
        'Revenue': 'sum',
        'Customers': 'mean'
    }).reset_index()
    
    fig = px.bar(
        regional_data,
        x='Region',
        y=['Sales', 'Revenue'],
        barmode='group',
        title='Sales and Revenue by Region'
    )
    st.plotly_chart(fig, use_container_width=True)
    
    # Regional metrics table
    st.dataframe(
        regional_data.style.format({
            'Sales': '{:.0f}',
            'Revenue': '${:.2f}',
            'Customers': '{:.0f}'
        }),
        use_container_width=True
    )

with tab4:
    st.subheader("Raw Data")
    
    # Search
    search = st.text_input("🔍 Search in data")
    if search:
        mask = filtered_df.astype(str).apply(
            lambda x: x.str.contains(search, case=False)
        ).any(axis=1)
        display_df = filtered_df[mask]
    else:
        display_df = filtered_df
    
    st.dataframe(
        display_df,
        use_container_width=True,
        height=400
    )
    
    # Download button
    csv = display_df.to_csv(index=False)
    st.download_button(
        label="📥 Download Data as CSV",
        data=csv,
        file_name='filtered_data.csv',
        mime='text/csv'
    )

# Footer
st.markdown("---")
st.caption("Built with Streamlit • Data is randomly generated for demonstration")

print("To run: streamlit run data_dashboard.py")

🎯 Key Takeaways

  1. Streamlit is Python-native - Write apps in pure Python

  2. Automatic reruns - the whole script re-executes on every user interaction

  3. Session state - Maintain state across reruns

  4. Caching is critical - Use @st.cache_data and @st.cache_resource

  5. Rich layouts - Columns, tabs, expanders for organization

  6. Interactive widgets - Easy user input collection

πŸ“ Practice ExercisesΒΆ

  1. Build a Stock Price Dashboard

    • Fetch real stock data (yfinance)

    • Show price charts

    • Calculate moving averages

    • Display key metrics

  2. Create a Model Comparison App

    • Train multiple ML models

    • Compare accuracies

    • Show confusion matrices

    • Allow parameter tuning

  3. Build a Text Analysis Dashboard

    • Upload text files

    • Show word frequencies

    • Sentiment analysis

    • Generate word clouds

🔗 Resources

Next: Notebook 3 - Hugging Face Spaces