# Install required packages
!pip install -q streamlit pandas plotly scikit-learn
Streamlit Basics: Python Scripts that Become Web Apps
Streamlit takes a fundamentally different approach from Gradio: instead of wrapping a function, you write a top-to-bottom Python script that Streamlit re-executes on every user interaction. Display calls such as st.write() render output, while input widgets such as st.slider() and st.button() both render a control and return its current value, creating an implicit reactive data flow. The script written below with %%writefile can be launched from the terminal with streamlit run streamlit_basics.py, which starts a local server and opens the app in your browser.
Streamlit vs. Gradio tradeoffs: Streamlit excels at data dashboards with rich layouts (columns, tabs, expanders, sidebars) and interactive exploration where multiple widgets control a single visualization. Gradio excels at single-function ML demos with typed input/output contracts. Streamlit's re-execution model means every widget interaction reruns the entire script, which is why caching (covered in Part 4) is essential for any operation that takes more than a fraction of a second.
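The re-execution model can be pictured with a toy simulation in plain Python. This is a sketch of the idea only, not how Streamlit is implemented: `FakeUI`, `script`, and `run` are hypothetical names invented for illustration. It shows that the whole script function runs again after each simulated interaction, and that a widget call simply returns whatever value the front end currently holds.

```python
class FakeUI:
    """Toy stand-in for a UI layer (hypothetical, not a Streamlit class)."""

    def __init__(self):
        self.widget_values = {}  # survives between runs, like values held by the browser
        self.rendered = []       # output of the most recent run only

    def slider(self, label, default):
        # A widget call returns its current value, or the default on the first run.
        return self.widget_values.setdefault(label, default)

    def write(self, text):
        self.rendered.append(text)


def script(ui):
    """The 'app': executed top to bottom on every run."""
    value = ui.slider("Select a value", default=25)
    ui.write(f"Slider value: {value}, squared: {value ** 2}")


def run(ui):
    ui.rendered = []  # each run redraws the page from scratch
    script(ui)
    return list(ui.rendered)


ui = FakeUI()
first = run(ui)                          # initial page load
ui.widget_values["Select a value"] = 40  # the user drags the slider
second = run(ui)                         # the interaction triggers a full rerun

print(first)
print(second)
```

Nothing in `script` reacts to the slider explicitly. The derived text changes only because the whole function ran again with a new widget value, which is the implicit data flow described above.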
%%writefile streamlit_basics.py
import streamlit as st
import pandas as pd
import numpy as np
# Title and headers
st.title("My First Streamlit App")
st.header("Welcome to Streamlit!")
st.subheader("Building data apps made easy")
# Text elements
st.write("This is a simple text using st.write()")
st.markdown("**Bold** and *italic* text with Markdown")
st.caption("This is a caption")
# Display data
st.write("### Sample DataFrame")
df = pd.DataFrame({
    'Column A': [1, 2, 3, 4],
    'Column B': [10, 20, 30, 40]
})
st.dataframe(df)  # Interactive table
# st.table(df)  # Static table
# Metrics
col1, col2, col3 = st.columns(3)
col1.metric("Temperature", "70 °F", "1.2 °F")
col2.metric("Wind", "9 mph", "-8%")
col3.metric("Humidity", "86%", "4%")
# Input widgets
st.write("### Input Widgets")
text_input = st.text_input("Enter your name", "John Doe")
st.write(f"Hello, {text_input}!")
number = st.number_input("Pick a number", min_value=0, max_value=100, value=50)
st.write(f"You picked: {number}")
slider_val = st.slider("Select a range", 0, 100, 25)
st.write(f"Slider value: {slider_val}")
# Selectbox and multiselect
option = st.selectbox("Choose an option", ['Option 1', 'Option 2', 'Option 3'])
st.write(f"You selected: {option}")
options = st.multiselect(
    "Choose multiple",
    ['A', 'B', 'C', 'D'],
    default=['A', 'B']
)
st.write(f"You selected: {options}")
# Checkbox and radio
agree = st.checkbox("I agree to the terms")
if agree:
    st.write("Great! You agreed.")
choice = st.radio("Pick one", ['Choice 1', 'Choice 2', 'Choice 3'])
st.write(f"You chose: {choice}")
# Button
if st.button("Click me!"):
    st.write("Button was clicked!")
    st.balloons()  # Fun celebration!
# File uploader
uploaded_file = st.file_uploader("Upload a file", type=['csv', 'txt'])
if uploaded_file is not None:
    st.write(f"Filename: {uploaded_file.name}")
print("To run this app: streamlit run streamlit_basics.py")
Running the App
To run the above app:
streamlit run streamlit_basics.py
This will open a browser window with your app!
Session State: Persisting Data Across Reruns
Streamlit's rerun-on-interaction model means local variables reset every time a user clicks a button or moves a slider. st.session_state is a dictionary-like object that persists across reruns within a single browser session, enabling counters, form submissions, multi-step workflows, and shopping-cart-like accumulation patterns. Without session state, a counter kept in a local variable would never get past 1, because the variable is reset to its initial value at the start of each rerun.
How it works: st.session_state is stored server-side, one instance per browser session (that is, per WebSocket connection). The pattern if 'key' not in st.session_state: st.session_state.key = default initializes values on first load, and subsequent reruns access the stored value. st.rerun() forces an immediate script rerun after a state mutation, ensuring the UI reflects the latest state. For form inputs, st.form() batches multiple widget changes into a single submission, avoiding a rerun on every individual widget change and creating a more traditional form experience.
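The difference between a local variable and session state can be sketched with a toy model in plain Python. This is not Streamlit's implementation: `session_state` here is an ordinary dict and `run_script` is a hypothetical stand-in for one execution of the app script, with `clicked` playing the role of a button press.

```python
def run_script(session_state, clicked):
    """One simulated rerun; `clicked` says whether the button triggered it."""
    # Local variable: recreated at the start of every run.
    local_counter = 0

    # Initialization pattern from the text: set a default only if the key is missing.
    if "counter" not in session_state:
        session_state["counter"] = 0

    if clicked:
        local_counter += 1
        session_state["counter"] += 1

    return local_counter, session_state["counter"]


session_state = {}  # persists across reruns for one simulated session
results = [run_script(session_state, clicked=True) for _ in range(3)]
print(results)  # the local value stays at 1 while the stored counter accumulates

other_session = {}  # a second browser session starts with its own empty state
print(run_script(other_session, clicked=True))
```

The second dict mirrors the per-session scope described above: state accumulated in one session is invisible to another.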
%%writefile session_state_app.py
import streamlit as st
import pandas as pd
st.title("Session State Demo")
# Initialize session state
if 'counter' not in st.session_state:
    st.session_state.counter = 0
if 'todos' not in st.session_state:
    st.session_state.todos = []
# Counter example
st.header("Counter Example")
st.write(f"Current count: {st.session_state.counter}")
col1, col2, col3 = st.columns(3)
with col1:
    if st.button("Increment"):
        st.session_state.counter += 1
        st.rerun()
with col2:
    if st.button("Decrement"):
        st.session_state.counter -= 1
        st.rerun()
with col3:
    if st.button("Reset"):
        st.session_state.counter = 0
        st.rerun()
# Todo list example
st.header("Todo List Example")
with st.form("todo_form"):
    todo_input = st.text_input("Add a todo item")
    submitted = st.form_submit_button("Add")
    if submitted and todo_input:
        st.session_state.todos.append({
            'task': todo_input,
            'completed': False
        })
# Display todos
if st.session_state.todos:
    st.write("### Your Todos:")
    for i, todo in enumerate(st.session_state.todos):
        col1, col2, col3 = st.columns([0.1, 0.7, 0.2])
        with col1:
            # Store the checkbox value directly so unchecking also updates the todo
            todo['completed'] = st.checkbox(
                "Done", key=f"check_{i}", value=todo['completed'],
                label_visibility="collapsed"
            )
        with col2:
            if todo['completed']:
                st.markdown(f"~~{todo['task']}~~")
            else:
                st.write(todo['task'])
        with col3:
            if st.button("Delete", key=f"del_{i}"):
                st.session_state.todos.pop(i)
                st.rerun()
    if st.button("Clear All Completed"):
        st.session_state.todos = [
            t for t in st.session_state.todos if not t['completed']
        ]
        st.rerun()
else:
    st.info("No todos yet! Add one above.")
# Show session state (for debugging)
with st.expander("View Session State"):
    st.write(st.session_state)
print("To run: streamlit run session_state_app.py")
Caching for Performance: Avoiding Redundant Computation
Caching is the single most important optimization for Streamlit apps, because every user interaction triggers a full script rerun. Streamlit provides two caching decorators. @st.cache_data serializes return values (via pickling) and returns a fresh copy on cache hits; use this for DataFrames, arrays, and computation results where mutations should not affect the cache. @st.cache_resource returns the same object reference without copying; use this for ML models, database connections, and large objects where copying would be wasteful or break functionality.
Cache invalidation: both decorators use function arguments and source code as cache keys. If you change a function's code, the cache automatically invalidates. You can also set ttl=3600 (time-to-live in seconds) for data that should refresh periodically, or call st.cache_data.clear() programmatically. For ML apps, the typical pattern is: cache data loading with @st.cache_data, cache model loading with @st.cache_resource, and leave prediction calls uncached since they depend on user input and are typically fast.
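The copy-versus-reference distinction and the time-to-live behavior can be illustrated with a toy decorator in plain Python. This is a simplified sketch, not Streamlit's cache: `toy_cache`, its `copy_on_hit` flag, and the injectable `clock` are hypothetical, and the toy keys only on positional arguments, whereas the text above notes that the real decorators also take the function's source code into account.

```python
import pickle
import time


def toy_cache(copy_on_hit, ttl=None, clock=time.monotonic):
    """Toy cache keyed on positional args, with an optional time-to-live in seconds."""
    def decorator(func):
        store = {}  # maps args -> (timestamp, stored value)

        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            expired = entry is not None and ttl is not None and now - entry[0] > ttl
            if entry is None or expired:
                value = func(*args)
                # "data" mode stores a pickled snapshot; "resource" mode stores the object itself.
                store[args] = (now, pickle.dumps(value) if copy_on_hit else value)
                entry = store[args]
            # "data" mode unpickles a fresh copy; "resource" mode returns the same object.
            return pickle.loads(entry[1]) if copy_on_hit else entry[1]

        wrapper.clear = store.clear
        return wrapper
    return decorator


calls = {"rows": 0, "model": 0, "quote": 0}
fake_now = [0.0]  # controllable clock so the TTL demo needs no real waiting


@toy_cache(copy_on_hit=True)
def load_rows(n):
    calls["rows"] += 1
    return list(range(n))


@toy_cache(copy_on_hit=False)
def load_model():
    calls["model"] += 1
    return {"weights": [0.1, 0.2]}


@toy_cache(copy_on_hit=True, ttl=3600, clock=lambda: fake_now[0])
def load_quote():
    calls["quote"] += 1
    return calls["quote"]


rows = load_rows(3)
rows.append(99)            # mutating the returned copy...
rows_again = load_rows(3)  # ...leaves the cached value untouched
model_a, model_b = load_model(), load_model()

load_quote()               # computed at t=0
fake_now[0] = 1800.0
load_quote()               # still fresh: served from the cache
fake_now[0] = 4000.0
load_quote()               # older than ttl: recomputed

print(rows_again, calls["rows"])
print(model_a is model_b, calls["model"])
print(calls["quote"])
```

Returning copies protects cached data from accidental mutation at the cost of a deserialization step on every hit. Returning the same object avoids that cost but means every caller shares, and can mutate, one instance.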
%%writefile caching_demo.py
import streamlit as st
import pandas as pd
import numpy as np
import time
st.title("⚡ Caching Demo")
# Without caching (slow)
def load_data_slow():
    """Simulate slow data loading without caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })
# With caching (fast)
@st.cache_data
def load_data_cached():
    """Simulate slow data loading WITH caching."""
    time.sleep(3)  # Simulate slow operation
    return pd.DataFrame({
        'A': np.random.randn(1000),
        'B': np.random.randn(1000)
    })
# Cache for machine learning models
@st.cache_resource
def load_model():
    """Simulate loading a ML model (use cache_resource for models)."""
    time.sleep(2)
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(n_estimators=100)
# Demo
st.header("Without Caching")
if st.button("Load Data (No Cache)"):
    with st.spinner("Loading... (3 seconds)"):
        start = time.time()
        df = load_data_slow()
        elapsed = time.time() - start
    st.success(f"Loaded in {elapsed:.2f} seconds")
    st.dataframe(df.head())
    st.warning("Click again - it will take 3 seconds every time!")
st.header("With Caching")
if st.button("Load Data (Cached)"):
    with st.spinner("Loading... (3 seconds first time, instant after)"):
        start = time.time()
        df = load_data_cached()
        elapsed = time.time() - start
    if elapsed < 1:
        st.success(f"Loaded from cache in {elapsed:.3f} seconds! ⚡")
    else:
        st.success(f"Loaded in {elapsed:.2f} seconds (cached for next time)")
    st.dataframe(df.head())
    st.info("Click again - it will be instant!")
st.header("Caching ML Models")
if st.button("Load Model"):
    with st.spinner("Loading model..."):
        start = time.time()
        model = load_model()
        elapsed = time.time() - start
    if elapsed < 1:
        st.success(f"Model loaded from cache in {elapsed:.3f} seconds!")
    else:
        st.success(f"Model loaded in {elapsed:.2f} seconds")
    st.write(f"Model: {type(model).__name__}")
    st.write(f"Parameters: {model.n_estimators} estimators")
# Caching best practices
with st.expander("Caching Best Practices"):
    st.markdown("""
### When to use `@st.cache_data`:
- Loading data from files/databases
- Expensive computations
- API calls
- Data transformations

### When to use `@st.cache_resource`:
- Machine learning models
- Database connections
- Large objects that should persist

### Key differences:
- `cache_data`: Creates a new copy each time (safe for mutable data)
- `cache_resource`: Returns the same object (for models, connections)

### Clear cache:
- Press 'C' in the app
- Or use `st.cache_data.clear()` or `st.cache_resource.clear()`
""")
print("To run: streamlit run caching_demo.py")
Interactive Data Dashboard: Filters, KPIs, and Multi-Tab Visualization
A complete data dashboard brings together all Streamlit concepts: sidebar filters for data selection, st.metric() KPI cards for at-a-glance summaries, st.tabs() for organizing different visualization perspectives, and st.download_button() for data export. The dashboard below uses Plotly for interactive charts (zoom, hover, pan) instead of Matplotlib's static images, a common choice for Streamlit apps that need interactivity.
Architecture pattern: the sidebar controls a global filtered_df variable that all downstream visualizations reference. This "filter once, display everywhere" pattern ensures consistency across tabs and prevents the common bug where different charts show data from different filter states. The st.plotly_chart(fig, use_container_width=True) call makes charts responsive to the browser width, and custom CSS via st.markdown() with unsafe_allow_html=True enables visual polish beyond Streamlit's default styling.
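One way to keep the "filter once, display everywhere" step easy to test is to isolate it in a pure function that takes the widget values as arguments. The sketch below assumes pandas is installed and uses the same column names as the dashboard; `apply_filters` and the four-row `sample` frame are hypothetical, introduced only for illustration.

```python
import pandas as pd


def apply_filters(df, date_range, categories, regions):
    """Return the rows matching every active filter (hypothetical helper)."""
    mask = df["Category"].isin(categories) & df["Region"].isin(regions)
    # Assumption: a range picker can yield fewer than two dates while the user
    # is still choosing, so the date filter is applied only when both ends exist.
    if len(date_range) == 2:
        start, end = (pd.to_datetime(d) for d in date_range)
        mask &= df["Date"].between(start, end)
    return df[mask]


sample = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]),
    "Category": ["Food", "Books", "Food", "Books"],
    "Region": ["North", "South", "North", "West"],
    "Sales": [100, 200, 300, 400],
})

filtered_df = apply_filters(
    sample,
    date_range=("2023-01-01", "2023-01-03"),
    categories=["Food"],
    regions=["North", "South"],
)

# Every downstream view reads the same filtered_df, so they cannot disagree.
total_sales = filtered_df["Sales"].sum()
sales_by_region = filtered_df.groupby("Region")["Sales"].sum().to_dict()
print(total_sales, sales_by_region)
```

In an app, the three arguments would come from the sidebar widgets, and the KPI cards, charts, and download button would all consume the single returned frame.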
%%writefile data_dashboard.py
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
# Page config
st.set_page_config(
    page_title="Data Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)
# Custom CSS
st.markdown("""
<style>
.main { padding: 0rem 1rem; }
.stMetric { background-color: #f0f2f6; padding: 10px; border-radius: 5px; }
</style>
""", unsafe_allow_html=True)
# Title
st.title("Interactive Data Dashboard")
# Load data
@st.cache_data
def load_sample_data():
    """Generate sample sales data."""
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
    df = pd.DataFrame({
        'Date': dates,
        'Sales': np.random.randint(100, 1000, len(dates)) +
                 np.sin(np.arange(len(dates)) * 2 * np.pi / 365) * 200,
        'Customers': np.random.randint(10, 100, len(dates)),
        'Category': np.random.choice(['Electronics', 'Clothing', 'Food', 'Books'], len(dates)),
        'Region': np.random.choice(['North', 'South', 'East', 'West'], len(dates))
    })
    df['Revenue'] = df['Sales'] * np.random.uniform(10, 50, len(dates))
    return df
# Load data
df = load_sample_data()
# Sidebar filters
st.sidebar.header("Filters")
# Date range
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df['Date'].min(), df['Date'].max()],
    min_value=df['Date'].min(),
    max_value=df['Date'].max()
)
# Category filter
categories = st.sidebar.multiselect(
    "Select Categories",
    options=df['Category'].unique(),
    default=df['Category'].unique()
)
# Region filter
regions = st.sidebar.multiselect(
    "Select Regions",
    options=df['Region'].unique(),
    default=df['Region'].unique()
)
# Filter data
# Category and region filters always apply; the date filter applies only
# once both ends of the range have been selected
mask = df['Category'].isin(categories) & df['Region'].isin(regions)
if len(date_range) == 2:
    mask &= (
        (df['Date'] >= pd.to_datetime(date_range[0])) &
        (df['Date'] <= pd.to_datetime(date_range[1]))
    )
filtered_df = df[mask]
# KPIs
st.header("Key Metrics")
col1, col2, col3, col4 = st.columns(4)
with col1:
    total_revenue = filtered_df['Revenue'].sum()
    st.metric("Total Revenue", f"${total_revenue:,.0f}")
with col2:
    total_sales = filtered_df['Sales'].sum()
    st.metric("Total Sales", f"{total_sales:,.0f}")
with col3:
    avg_customers = filtered_df['Customers'].mean()
    st.metric("Avg Customers/Day", f"{avg_customers:.0f}")
with col4:
    total_days = len(filtered_df)
    st.metric("Days in Range", f"{total_days}")
# Charts
tab1, tab2, tab3, tab4 = st.tabs(["Trends", "Distribution", "Regional", "Data"])
with tab1:
    st.subheader("Sales and Revenue Over Time")
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Sales'],
        name='Sales',
        line=dict(color='blue')
    ))
    fig.add_trace(go.Scatter(
        x=filtered_df['Date'],
        y=filtered_df['Revenue'],
        name='Revenue',
        yaxis='y2',
        line=dict(color='green')
    ))
    fig.update_layout(
        yaxis=dict(title='Sales'),
        yaxis2=dict(title='Revenue', overlaying='y', side='right'),
        hovermode='x unified',
        height=400
    )
    st.plotly_chart(fig, use_container_width=True)
with tab2:
    col1, col2 = st.columns(2)
    with col1:
        st.subheader("Sales by Category")
        category_sales = filtered_df.groupby('Category')['Sales'].sum().reset_index()
        fig = px.pie(
            category_sales,
            values='Sales',
            names='Category',
            title='Sales Distribution by Category'
        )
        st.plotly_chart(fig, use_container_width=True)
    with col2:
        st.subheader("Revenue by Category")
        category_revenue = filtered_df.groupby('Category')['Revenue'].sum().reset_index()
        fig = px.bar(
            category_revenue,
            x='Category',
            y='Revenue',
            title='Revenue by Category'
        )
        st.plotly_chart(fig, use_container_width=True)
with tab3:
    st.subheader("Regional Performance")
    regional_data = filtered_df.groupby('Region').agg({
        'Sales': 'sum',
        'Revenue': 'sum',
        'Customers': 'mean'
    }).reset_index()
    fig = px.bar(
        regional_data,
        x='Region',
        y=['Sales', 'Revenue'],
        barmode='group',
        title='Sales and Revenue by Region'
    )
    st.plotly_chart(fig, use_container_width=True)
    # Regional metrics table
    st.dataframe(
        regional_data.style.format({
            'Sales': '{:.0f}',
            'Revenue': '${:.2f}',
            'Customers': '{:.0f}'
        }),
        use_container_width=True
    )
with tab4:
    st.subheader("Raw Data")
    # Search
    search = st.text_input("Search in data")
    if search:
        # regex=False treats the input as literal text, so characters
        # such as "(" or "+" cannot raise a regex error
        mask = filtered_df.astype(str).apply(
            lambda x: x.str.contains(search, case=False, regex=False)
        ).any(axis=1)
        display_df = filtered_df[mask]
    else:
        display_df = filtered_df
    st.dataframe(
        display_df,
        use_container_width=True,
        height=400
    )
    # Download button
    csv = display_df.to_csv(index=False)
    st.download_button(
        label="📥 Download Data as CSV",
        data=csv,
        file_name='filtered_data.csv',
        mime='text/csv'
    )
# Footer
st.markdown("---")
st.caption("Built with Streamlit • Data is randomly generated for demonstration")
print("To run: streamlit run data_dashboard.py")
🎯 Key Takeaways
1. Streamlit is Python-native - Write apps in pure Python
2. Automatic reruns - The script reruns on every widget interaction
3. Session state - Maintain state across reruns
4. Caching is critical - Use @st.cache_data and @st.cache_resource
5. Rich layouts - Columns, tabs, expanders for organization
6. Interactive widgets - Easy user input collection
Practice Exercises
1. Build a Stock Price Dashboard
   - Fetch real stock data (yfinance)
   - Show price charts
   - Calculate moving averages
   - Display key metrics
2. Create a Model Comparison App
   - Train multiple ML models
   - Compare accuracies
   - Show confusion matrices
   - Allow parameter tuning
3. Build a Text Analysis Dashboard
   - Upload text files
   - Show word frequencies
   - Run sentiment analysis
   - Generate word clouds