# Install required packages
!pip install -q streamlit pandas plotly scikit-learn

Streamlit Basics: Python Scripts that Become Web Apps

Streamlit takes a fundamentally different approach from Gradio: instead of wrapping a function, you write a top-to-bottom Python script that Streamlit re-executes on every user interaction. Display calls like st.write() render output, while widget calls like st.slider() or st.button() both render a control and return its current value, creating an implicit reactive data flow. The script written below with %%writefile can be launched from the terminal with streamlit run streamlit_basics.py, which starts a local server and opens the app in your browser.

Streamlit vs. Gradio tradeoffs: Streamlit excels at data dashboards with rich layouts (columns, tabs, expanders, sidebars) and interactive exploration where multiple widgets control a single visualization. Gradio excels at single-function ML demos with typed input/output contracts. Streamlit’s re-execution model means every widget interaction reruns the entire script, which is why caching (covered below) is essential for any operation that takes more than a fraction of a second.

%%writefile streamlit_basics.py
import streamlit as st
import pandas as pd
import numpy as np

# Title and headers
st.title("🚀 My First Streamlit App")
st.header("Welcome to Streamlit!")
st.subheader("Building data apps made easy")

# Text elements
st.write("This is a simple text using st.write()")
st.markdown("**Bold** and *italic* text with Markdown")
st.caption("This is a caption")

# Display data
st.write("### Sample DataFrame")
df = pd.DataFrame({
    'Column A': [1, 2, 3, 4],
    'Column B': [10, 20, 30, 40]
})
st.dataframe(df)  # Interactive table
# st.table(df)     # Static table

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric("Temperature", "70 °F", "1.2 °F")
col2.metric("Wind", "9 mph", "-8%")
col3.metric("Humidity", "86%", "4%")

# Input widgets
st.write("### Input Widgets")
text_input = st.text_input("Enter your name", "John Doe")
st.write(f"Hello, {text_input}!")

number = st.number_input("Pick a number", min_value=0, max_value=100, value=50)
st.write(f"You picked: {number}")

slider_val = st.slider("Select a range", 0, 100, 25)
st.write(f"Slider value: {slider_val}")

# Selectbox and multiselect
option = st.selectbox("Choose an option", ['Option 1', 'Option 2', 'Option 3'])
st.write(f"You selected: {option}")

options = st.multiselect(
    "Choose multiple",
    ['A', 'B', 'C', 'D'],
    default=['A', 'B']
)
st.write(f"You selected: {options}")

# Checkbox and radio
agree = st.checkbox("I agree to the terms")
if agree:
    st.write("Great! You agreed.")

choice = st.radio("Pick one", ['Choice 1', 'Choice 2', 'Choice 3'])
st.write(f"You chose: {choice}")

# Button
if st.button("Click me!"):
    st.write("Button was clicked!")
    st.balloons()  # Fun celebration!

# File uploader
uploaded_file = st.file_uploader("Upload a file", type=['csv', 'txt'])
if uploaded_file is not None:
    st.write(f"Filename: {uploaded_file.name}")

print("To run this app: streamlit run streamlit_basics.py")

Running the App

To run the above app:

streamlit run streamlit_basics.py

This will open a browser window with your app!
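If the browser doesn't open automatically (for example on a remote machine), `streamlit run` accepts server flags on the command line. A typical invocation might look like the following fragment (the port value is just an example; the same options can also be set in `.streamlit/config.toml`):

```shell
# Run on an explicit port without auto-opening a browser
streamlit run streamlit_basics.py --server.port 8501 --server.headless true
```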

ML Model Deployment App: Interactive Classification with Sidebar Controls

Deploying a trained model as a Streamlit app follows a standard pattern: train once (cached), expose model inputs as sidebar widgets, and display predictions in the main panel. The sidebar (st.sidebar) keeps input controls visually separated from results, and st.set_page_config(layout="wide") uses the full browser width for side-by-side columns. The st.columns([1, 2]) call creates unequal-width columns, placing compact prediction text on the left and detailed visualizations on the right.

Key architectural pattern: @st.cache_resource wraps the model training function so it runs once and persists across reruns – without this decorator, the Random Forest would retrain on every slider change (hundreds of milliseconds each time). The st.metric() component displays the model accuracy with a clean KPI card format, and st.expander() hides the full dataset behind a collapsible section to keep the primary interface focused on predictions.

%%writefile ml_model_app.py
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import plotly.express as px

# Page config
st.set_page_config(
    page_title="Iris Classifier",
    page_icon="🌸",
    layout="wide"
)

# Title
st.title("🌸 Iris Flower Classifier")
st.markdown("Predict iris species based on flower measurements")

# Sidebar for user input
st.sidebar.header("Input Features")

def user_input_features():
    sepal_length = st.sidebar.slider('Sepal Length (cm)', 4.0, 8.0, 5.5)
    sepal_width = st.sidebar.slider('Sepal Width (cm)', 2.0, 4.5, 3.0)
    petal_length = st.sidebar.slider('Petal Length (cm)', 1.0, 7.0, 4.0)
    petal_width = st.sidebar.slider('Petal Width (cm)', 0.1, 2.5, 1.3)
    
    # Column names must match the names the model was trained on
    # (iris.feature_names), or scikit-learn rejects the input.
    data = {
        'sepal length (cm)': sepal_length,
        'sepal width (cm)': sepal_width,
        'petal length (cm)': petal_length,
        'petal width (cm)': petal_width
    }
    return pd.DataFrame(data, index=[0])

# Get user input
input_df = user_input_features()

# Load and prepare data
@st.cache_data
def load_data():
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y = pd.Series(iris.target, name='species')
    y = y.map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
    return X, y, iris.target_names

X, y, target_names = load_data()

# Train model
@st.cache_resource
def train_model(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    return model, accuracy

model, accuracy = train_model(X, y)

# Main content
col1, col2 = st.columns([1, 2])

with col1:
    st.subheader("Your Input")
    st.write(input_df)
    
    # Make prediction
    prediction = model.predict(input_df)
    prediction_proba = model.predict_proba(input_df)
    
    st.subheader("Prediction")
    st.success(f"**{prediction[0].upper()}**")
    
    st.subheader("Prediction Probability")
    proba_df = pd.DataFrame(
        prediction_proba,
        columns=['Setosa', 'Versicolor', 'Virginica']
    )
    st.dataframe(proba_df.style.highlight_max(axis=1))
    
    st.info(f"Model Accuracy: {accuracy:.2%}")

with col2:
    st.subheader("Prediction Confidence")
    
    # Create probability bar chart
    proba_long = pd.DataFrame({
        'Species': ['Setosa', 'Versicolor', 'Virginica'],
        'Probability': prediction_proba[0]
    })
    
    fig = px.bar(
        proba_long,
        x='Species',
        y='Probability',
        color='Species',
        title='Prediction Probabilities'
    )
    st.plotly_chart(fig, use_container_width=True)
    
    # Feature importance
    st.subheader("Feature Importance")
    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': model.feature_importances_
    }).sort_values('Importance', ascending=False)
    
    fig2 = px.bar(
        importance_df,
        x='Importance',
        y='Feature',
        orientation='h',
        title='Feature Importance in Prediction'
    )
    st.plotly_chart(fig2, use_container_width=True)

# Expandable section with dataset info
with st.expander("📊 View Dataset"):
    st.write("### Iris Dataset")
    df_combined = pd.concat([X, y], axis=1)
    st.dataframe(df_combined)
    
    st.write("### Dataset Statistics")
    st.write(df_combined.describe())

print("To run: streamlit run ml_model_app.py")

Session State: Persisting Data Across Reruns

Streamlit’s rerun-on-interaction model means local variables reset every time a user clicks a button or moves a slider. st.session_state is a dictionary-like object that persists across reruns within a single browser session, enabling counters, form submissions, multi-step workflows, and shopping-cart-like accumulation patterns. Without session state, a counter button would always show “1” because the variable resets before each render.

How it works: st.session_state is stored server-side per WebSocket connection. The pattern if 'key' not in st.session_state: st.session_state.key = default initializes values on first load, and subsequent reruns access the stored value. st.rerun() forces an immediate script rerun after a state mutation, ensuring the UI reflects the latest state. For form inputs, st.form() batches multiple widget changes into a single submission, avoiding the “rerun per keystroke” problem and creating a more traditional form experience.
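The rerun model can be sketched in plain Python, with no Streamlit involved. Here `session_state` is an ordinary dict standing in for `st.session_state`, and `run_script()` represents one top-to-bottom execution of the app script:

```python
# Conceptual sketch of Streamlit's rerun model (not real Streamlit code).
session_state = {}  # stands in for st.session_state: survives across "reruns"

def run_script():
    """One top-to-bottom execution of the app script."""
    local_counter = 0                      # local variable: reset on every rerun
    if 'counter' not in session_state:
        session_state['counter'] = 0       # init-once pattern
    local_counter += 1
    session_state['counter'] += 1
    return local_counter, session_state['counter']

print(run_script())  # (1, 1)
print(run_script())  # (1, 2)  <- the local resets, session_state persists
```

Every "rerun" starts the script from the top, so only what lives in `session_state` accumulates.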

%%writefile session_state_app.py
import streamlit as st
import pandas as pd

st.title("📝 Session State Demo")

# Initialize session state
if 'counter' not in st.session_state:
    st.session_state.counter = 0

if 'todos' not in st.session_state:
    st.session_state.todos = []

# Counter example
st.header("Counter Example")
st.write(f"Current count: {st.session_state.counter}")

col1, col2, col3 = st.columns(3)
with col1:
    if st.button("Increment"):
        st.session_state.counter += 1
        st.rerun()

with col2:
    if st.button("Decrement"):
        st.session_state.counter -= 1
        st.rerun()

with col3:
    if st.button("Reset"):
        st.session_state.counter = 0
        st.rerun()

# Todo list example
st.header("Todo List Example")

with st.form("todo_form"):
    todo_input = st.text_input("Add a todo item")
    submitted = st.form_submit_button("Add")
    
    if submitted and todo_input:
        st.session_state.todos.append({
            'task': todo_input,
            'completed': False
        })

# Display todos
if st.session_state.todos:
    st.write("### Your Todos:")
    for i, todo in enumerate(st.session_state.todos):
        col1, col2, col3 = st.columns([0.1, 0.7, 0.2])
        
        with col1:
            done = st.checkbox(
                "Done", key=f"check_{i}",
                value=todo['completed'],
                label_visibility="collapsed"
            )
            # Write the checkbox state back so items can be un-checked too
            st.session_state.todos[i]['completed'] = done
        
        with col2:
            if todo['completed']:
                st.markdown(f"~~{todo['task']}~~")
            else:
                st.write(todo['task'])
        
        with col3:
            if st.button("Delete", key=f"del_{i}"):
                st.session_state.todos.pop(i)
                st.rerun()
    
    if st.button("Clear All Completed"):
        st.session_state.todos = [
            t for t in st.session_state.todos if not t['completed']
        ]
        st.rerun()
else:
    st.info("No todos yet! Add one above.")

# Show session state (for debugging)
with st.expander("🔍 View Session State"):
    st.write(st.session_state)

print("To run: streamlit run session_state_app.py")

Caching for Performance: Avoiding Redundant Computation

Caching is the single most important optimization for Streamlit apps, because every user interaction triggers a full script rerun. Streamlit provides two caching decorators: @st.cache_data serializes return values (via pickling) and returns a fresh copy on cache hits – use this for DataFrames, arrays, and computation results where mutations should not affect the cache. @st.cache_resource returns the same object reference without copying – use this for ML models, database connections, and large objects where copying would be wasteful or break functionality.

Cache invalidation: both decorators use function arguments and source code as cache keys. If you change a function’s code, the cache automatically invalidates. You can also set ttl=3600 (time-to-live in seconds) for data that should refresh periodically, or call st.cache_data.clear() programmatically. For ML apps, the typical pattern is: cache data loading with @st.cache_data, cache model loading with @st.cache_resource, and leave prediction calls uncached since they depend on user input and are typically fast.
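The copy-vs-reference distinction can be illustrated with a rough, simplified stand-in for the two decorators in plain Python (this is not Streamlit's actual implementation, which also hashes the function's source code and handles concurrency):

```python
import copy
import functools

def cache_data_like(fn):
    """Memoize, but return a deep copy on each hit (like st.cache_data)."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return copy.deepcopy(cache[args])  # mutations can't poison the cache
    return wrapper

def cache_resource_like(fn):
    """Memoize and return the same object every time (like st.cache_resource)."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]                 # shared reference, no copy
    return wrapper

@cache_data_like
def load_rows(n):
    return list(range(n))

@cache_resource_like
def load_model_obj():
    return {"weights": [0.1, 0.2]}

a, b = load_rows(3), load_rows(3)
print(a == b, a is b)              # True False -> equal values, distinct copies
m1, m2 = load_model_obj(), load_model_obj()
print(m1 is m2)                    # True       -> one shared object
```

This is why mutating a `cache_data` result is safe while mutating a `cache_resource` result affects every caller.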

%%writefile caching_demo.py
import streamlit as st
import pandas as pd
import numpy as np
import time

st.title("⚡ Caching Demo")

# Without caching (slow)
def load_data_slow():
    """Simulate slow data loading without caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })

# With caching (fast)
@st.cache_data
def load_data_cached():
    """Simulate slow data loading WITH caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })

# Cache for machine learning models
@st.cache_resource
def load_model():
    """Simulate loading an ML model (use cache_resource for models)."""
    time.sleep(2)
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(n_estimators=100)

# Demo
st.header("Without Caching")
if st.button("Load Data (No Cache)"):
    with st.spinner("Loading... (3 seconds)"):
        start = time.time()
        df = load_data_slow()
        elapsed = time.time() - start
    st.success(f"Loaded in {elapsed:.2f} seconds")
    st.dataframe(df.head())
    st.warning("Click again - it will take 3 seconds every time!")

st.header("With Caching")
if st.button("Load Data (Cached)"):
    with st.spinner("Loading... (3 seconds first time, instant after)"):
        start = time.time()
        df = load_data_cached()
        elapsed = time.time() - start
    
    if elapsed < 1:
        st.success(f"Loaded from cache in {elapsed:.3f} seconds! ⚡")
    else:
        st.success(f"Loaded in {elapsed:.2f} seconds (cached for next time)")
    
    st.dataframe(df.head())
    st.info("Click again - it will be instant!")

st.header("Caching ML Models")
if st.button("Load Model"):
    with st.spinner("Loading model..."):
        start = time.time()
        model = load_model()
        elapsed = time.time() - start
    
    if elapsed < 1:
        st.success(f"Model loaded from cache in {elapsed:.3f} seconds!")
    else:
        st.success(f"Model loaded in {elapsed:.2f} seconds")
    
    st.write(f"Model: {type(model).__name__}")
    st.write(f"Parameters: {model.n_estimators} estimators")

# Caching best practices
with st.expander("📚 Caching Best Practices"):
    st.markdown("""
    ### When to use @st.cache_data:
    - Loading data from files/databases
    - Expensive computations
    - API calls
    - Data transformations
    
    ### When to use @st.cache_resource:
    - Machine learning models
    - Database connections
    - Large objects that should persist
    
    ### Key differences:
    - `cache_data`: Creates a new copy each time (safe for mutable data)
    - `cache_resource`: Returns the same object (for models, connections)
    
    ### Clear cache:
    - Press 'C' in the app
    - Or use st.cache_data.clear() or st.cache_resource.clear()
    """)

print("To run: streamlit run caching_demo.py")

Interactive Data Dashboard: Filters, KPIs, and Multi-Tab Visualization

A complete data dashboard brings together all Streamlit concepts: sidebar filters for data selection, st.metric() KPI cards for at-a-glance summaries, st.tabs() for organizing different visualization perspectives, and st.download_button() for data export. The dashboard below uses Plotly for interactive charts (zoom, hover, pan) instead of Matplotlib’s static images, which is the standard choice for production Streamlit apps.

Architecture pattern: the sidebar controls a global filtered_df variable that all downstream visualizations reference. This “filter once, display everywhere” pattern ensures consistency across tabs and prevents the common bug where different charts show data from different filter states. The st.plotly_chart(fig, use_container_width=True) call makes charts responsive to the browser width, and custom CSS via st.markdown() with unsafe_allow_html=True enables visual polish beyond Streamlit’s default styling.
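The filtering core of this pattern is just combined pandas boolean masks, sketched here with made-up data (in the app, `categories` and `regions` would come from `st.sidebar.multiselect`):

```python
import pandas as pd

# "Filter once, display everywhere": combine every sidebar selection into
# one boolean mask, then reuse the single filtered frame for all charts.
df = pd.DataFrame({
    'Category': ['Electronics', 'Food', 'Books', 'Food'],
    'Region':   ['North', 'South', 'North', 'West'],
    'Sales':    [120, 340, 90, 210],
})
categories = ['Food', 'Books']            # hypothetical multiselect values
regions = ['South', 'North', 'West']

mask = df['Category'].isin(categories) & df['Region'].isin(regions)
filtered_df = df[mask]
print(filtered_df['Sales'].sum())         # 640
```

Every tab then reads `filtered_df`, so a filter change is reflected everywhere at once.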

%%writefile data_dashboard.py
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Page config
st.set_page_config(
    page_title="Data Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS
st.markdown("""
    <style>
    .main { padding: 0rem 1rem; }
    .stMetric { background-color: #f0f2f6; padding: 10px; border-radius: 5px; }
    </style>
""", unsafe_allow_html=True)

# Title
st.title("📊 Interactive Data Dashboard")

# Load data
@st.cache_data
def load_sample_data():
    """Generate sample sales data."""
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
    
    df = pd.DataFrame({
        'Date': dates,
        'Sales': np.random.randint(100, 1000, len(dates)) + 
                 np.sin(np.arange(len(dates)) * 2 * np.pi / 365) * 200,
        'Customers': np.random.randint(10, 100, len(dates)),
        'Category': np.random.choice(['Electronics', 'Clothing', 'Food', 'Books'], len(dates)),
        'Region': np.random.choice(['North', 'South', 'East', 'West'], len(dates))
    })
    
    df['Revenue'] = df['Sales'] * np.random.uniform(10, 50, len(dates))
    return df

# Load data
df = load_sample_data()

# Sidebar filters
st.sidebar.header("Filters")

# Date range
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df['Date'].min(), df['Date'].max()],
    min_value=df['Date'].min(),
    max_value=df['Date'].max()
)

# Category filter
categories = st.sidebar.multiselect(
    "Select Categories",
    options=df['Category'].unique(),
    default=df['Category'].unique()
)

# Region filter
regions = st.sidebar.multiselect(
    "Select Regions",
    options=df['Region'].unique(),
    default=df['Region'].unique()
)

# Filter data
if len(date_range) == 2:
    mask = (
        (df['Date'] >= pd.to_datetime(date_range[0])) &
        (df['Date'] <= pd.to_datetime(date_range[1])) &
        (df['Category'].isin(categories)) &
        (df['Region'].isin(regions))
    )
    filtered_df = df[mask]
else:
    filtered_df = df

# KPIs
st.header("Key Metrics")
col1, col2, col3, col4 = st.columns(4)

with col1:
    total_revenue = filtered_df['Revenue'].sum()
    st.metric("Total Revenue", f"${total_revenue:,.0f}")

with col2:
    total_sales = filtered_df['Sales'].sum()
    st.metric("Total Sales", f"{total_sales:,.0f}")

with col3:
    avg_customers = filtered_df['Customers'].mean()
    st.metric("Avg Customers/Day", f"{avg_customers:.0f}")

with col4:
    total_days = len(filtered_df)
    st.metric("Days in Range", f"{total_days}")

# Charts
tab1, tab2, tab3, tab4 = st.tabs(["📈 Trends", "📊 Distribution", "🗺️ Regional", "📋 Data"])

with tab1:
    st.subheader("Sales and Revenue Over Time")
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Sales'],
        name='Sales',
        line=dict(color='blue')
    ))
    
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Revenue'],
        name='Revenue',
        yaxis='y2',
        line=dict(color='green')
    ))
    
    fig.update_layout(
        yaxis=dict(title='Sales'),
        yaxis2=dict(title='Revenue', overlaying='y', side='right'),
        hovermode='x unified',
        height=400
    )
    
    st.plotly_chart(fig, use_container_width=True)

with tab2:
    col1, col2 = st.columns(2)
    
    with col1:
        st.subheader("Sales by Category")
        category_sales = filtered_df.groupby('Category')['Sales'].sum().reset_index()
        fig = px.pie(
            category_sales,
            values='Sales',
            names='Category',
            title='Sales Distribution by Category'
        )
        st.plotly_chart(fig, use_container_width=True)
    
    with col2:
        st.subheader("Revenue by Category")
        category_revenue = filtered_df.groupby('Category')['Revenue'].sum().reset_index()
        fig = px.bar(
            category_revenue,
            x='Category',
            y='Revenue',
            title='Revenue by Category'
        )
        st.plotly_chart(fig, use_container_width=True)

with tab3:
    st.subheader("Regional Performance")
    
    regional_data = filtered_df.groupby('Region').agg({
        'Sales': 'sum',
        'Revenue': 'sum',
        'Customers': 'mean'
    }).reset_index()
    
    fig = px.bar(
        regional_data,
        x='Region',
        y=['Sales', 'Revenue'],
        barmode='group',
        title='Sales and Revenue by Region'
    )
    st.plotly_chart(fig, use_container_width=True)
    
    # Regional metrics table
    st.dataframe(
        regional_data.style.format({
            'Sales': '{:.0f}',
            'Revenue': '${:.2f}',
            'Customers': '{:.0f}'
        }),
        use_container_width=True
    )

with tab4:
    st.subheader("Raw Data")
    
    # Search
    search = st.text_input("🔍 Search in data")
    if search:
        mask = filtered_df.astype(str).apply(
            lambda x: x.str.contains(search, case=False)
        ).any(axis=1)
        display_df = filtered_df[mask]
    else:
        display_df = filtered_df
    
    st.dataframe(
        display_df,
        use_container_width=True,
        height=400
    )
    
    # Download button
    csv = display_df.to_csv(index=False)
    st.download_button(
        label="📥 Download Data as CSV",
        data=csv,
        file_name='filtered_data.csv',
        mime='text/csv'
    )

# Footer
st.markdown("---")
st.caption("Built with Streamlit • Data is randomly generated for demonstration")

print("To run: streamlit run data_dashboard.py")

🎯 Key Takeaways

  1. Streamlit is Python-native - Write apps in pure Python

  2. Automatic reruns - the whole script re-executes on every user interaction

  3. Session state - Maintain state across reruns

  4. Caching is critical - Use @st.cache_data and @st.cache_resource

  5. Rich layouts - Columns, tabs, expanders for organization

  6. Interactive widgets - Easy user input collection

πŸ“ Practice ExercisesΒΆ

  1. Build a Stock Price Dashboard

    • Fetch real stock data (yfinance)

    • Show price charts

    • Calculate moving averages

    • Display key metrics

  2. Create a Model Comparison App

    • Train multiple ML models

    • Compare accuracies

    • Show confusion matrices

    • Allow parameter tuning

  3. Build a Text Analysis Dashboard

    • Upload text files

    • Show word frequencies

    • Sentiment analysis

    • Generate word clouds

🔗 Resources

Next: Notebook 3 - Hugging Face Spaces