05: Advanced Techniques & Applications
"The best way to predict the future is to create it." - Peter Drucker
Welcome to the advanced techniques and practical applications of time series forecasting! This notebook covers ensemble methods, uncertainty quantification, and real-world deployment considerations.
Learning Objectives
By the end of this notebook, you'll be able to:
Implement ensemble forecasting methods
Quantify and visualize prediction uncertainty
Handle multivariate time series forecasting
Deploy models for production use
Monitor and maintain forecasting systems
Apply time series methods to real-world problems
Ensemble Methods
Ensemble forecasting combines multiple models to improve accuracy and robustness.
Why Ensembles Work:
Bias-Variance Tradeoff: Different models have different strengths
Error Diversity: Models make different mistakes
Robustness: Less sensitive to individual model failures
Uncertainty: Better uncertainty quantification
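The variance-reduction argument above can be seen in a tiny simulation: five hypothetical forecasters that each add independent noise to the same signal. Averaging them cuts squared error roughly by the number of models. This is an idealized sketch; real models have correlated errors, so the gains are smaller in practice.

```python
import numpy as np

rng = np.random.default_rng(42)
truth = np.sin(np.linspace(0, 10, 200))  # the signal we are forecasting

# Five "models": truth plus independent noise (i.e. diverse error patterns)
forecasts = [truth + rng.normal(0, 0.5, truth.shape) for _ in range(5)]

# Average MSE of the individual models vs. MSE of their simple average
individual_mse = np.mean([np.mean((f - truth) ** 2) for f in forecasts])
ensemble = np.mean(forecasts, axis=0)
ensemble_mse = np.mean((ensemble - truth) ** 2)

print(f"avg individual MSE: {individual_mse:.3f}")
print(f"ensemble MSE:       {ensemble_mse:.3f}")  # roughly 1/5 of the individual MSE
```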
Ensemble Types:
Simple Averaging: Equal weights for all models
Weighted Averaging: Performance-based weights
Stacking: Meta-model learns optimal combination
Bagging: Bootstrap aggregation
Boosting: Sequential model improvement
Time Series Specific:
Temporal validation: Respect time ordering
Rolling ensembles: Update with new data
Online learning: Continuous model adaptation
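As a concrete instance of the weighted-averaging idea above, weights can be set inversely proportional to each model's validation error so that better models contribute more. A minimal sketch; the MAE numbers below are made-up placeholders, not outputs of any model in this notebook:

```python
import numpy as np

# Hypothetical validation MAEs for three models (placeholders for real scores)
val_mae = {'arima': 2.0, 'prophet': 1.0, 'rf': 4.0}

# Inverse-error weights, normalized to sum to 1
inv = {name: 1.0 / mae for name, mae in val_mae.items()}
total = sum(inv.values())
weights = {name: w / total for name, w in inv.items()}

# Combine per-model forecasts with those weights
preds = {
    'arima':   np.array([10.0, 11.0]),
    'prophet': np.array([9.0, 10.0]),
    'rf':      np.array([12.0, 13.0]),
}
weighted = sum(weights[name] * preds[name] for name in preds)
print(weights)   # prophet (lowest MAE) gets the largest weight
print(weighted)
```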
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
import torch
import torch.nn as nn
import warnings
warnings.filterwarnings('ignore')
# Set random seeds
np.random.seed(42)
torch.manual_seed(42)
# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12
print("Ensemble Forecasting Libraries Loaded!")
def generate_ensemble_data(n_days=500, noise_level=0.1):
"""Generate time series data for ensemble testing"""
dates = pd.date_range('2020-01-01', periods=n_days, freq='D')
t = np.arange(n_days)
# Complex trend with changes
trend = 0.002 * t + 0.000005 * t**2
trend_changes = np.zeros(n_days)
trend_changes[150:250] += 5 # Positive shock
trend_changes[350:400] -= 8 # Negative shock
# Multiple seasonal patterns
weekly = 3 * np.sin(2 * np.pi * t / 7)
monthly = 4 * np.sin(2 * np.pi * t / 30.44)
quarterly = 2 * np.sin(2 * np.pi * t / 91.25)
# External factors
external = np.random.choice([-3, -1, 0, 1, 3], n_days, p=[0.1, 0.2, 0.4, 0.2, 0.1])
external_smooth = np.convolve(external, np.ones(14)/14, mode='same')
# Combine components
y = (trend + trend_changes + weekly + monthly + quarterly + external_smooth)
# Add noise
noise = np.random.normal(0, noise_level * np.std(y), n_days)
y += noise
# Create DataFrame
df = pd.DataFrame({
'ds': dates,
'y': y,
'trend': trend,
'weekly': weekly,
'monthly': monthly,
'quarterly': quarterly,
'external': external_smooth,
'noise': noise
})
return df
# Generate ensemble data
ensemble_data = generate_ensemble_data()
print(f"Generated {len(ensemble_data)} days of ensemble test data")
print(f"Date range: {ensemble_data['ds'].min()} to {ensemble_data['ds'].max()}")
print(f"Value range: {ensemble_data['y'].min():.2f} to {ensemble_data['y'].max():.2f}")
# Plot the data
plt.figure(figsize=(15, 8))
plt.plot(ensemble_data['ds'], ensemble_data['y'], 'b-', linewidth=1.5, alpha=0.8)
plt.title('Time Series Data for Ensemble Testing')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.show()
# Show data structure
print("\nData Structure:")
print(ensemble_data.head())
print("\nData Info:")
print(ensemble_data.info())
class TimeSeriesEnsemble:
"""Ensemble forecasting class combining multiple models"""
def __init__(self, models=None):
if models is None:
self.models = {
'arima': None,
'prophet': None,
'rf': None,
'linear': None,
'lstm': None
}
else:
self.models = models
self.scaler = MinMaxScaler()
self.is_trained = False
def _prepare_features(self, data, lookback=30):
"""Prepare features for ML models"""
df = data.copy()
# Lag features
for i in range(1, lookback + 1):
df[f'lag_{i}'] = df['y'].shift(i)
# Rolling statistics
df['rolling_mean_7'] = df['y'].rolling(7).mean()
df['rolling_std_7'] = df['y'].rolling(7).std()
df['rolling_mean_30'] = df['y'].rolling(30).mean()
df['rolling_std_30'] = df['y'].rolling(30).std()
# Time features
df['day_of_week'] = df['ds'].dt.dayofweek
df['month'] = df['ds'].dt.month
df['quarter'] = df['ds'].dt.quarter
# Drop NaN values
df = df.dropna()
return df
def fit(self, train_data):
"""Fit all models in the ensemble"""
print("Training ensemble models...")
# Prepare features for ML models
train_features = self._prepare_features(train_data)
X = train_features.drop(['ds', 'y'], axis=1)
y = train_features['y']
# Scale features
X_scaled = self.scaler.fit_transform(X)
# Train ARIMA
print(" Training ARIMA...")
try:
arima_model = ARIMA(train_data['y'], order=(5, 1, 2))
self.models['arima'] = arima_model.fit()
except Exception:
print(" ARIMA training failed, skipping")
self.models['arima'] = None
# Train Prophet
print(" Training Prophet...")
try:
prophet_df = train_data[['ds', 'y']].copy()
prophet_model = Prophet(
changepoint_prior_scale=0.05,
seasonality_mode='additive',
daily_seasonality=False
)
prophet_model.fit(prophet_df)
self.models['prophet'] = prophet_model
except Exception:
print(" Prophet training failed, skipping")
self.models['prophet'] = None
# Train Random Forest
print(" Training Random Forest...")
try:
rf_model = RandomForestRegressor(
n_estimators=100,
max_depth=10,
random_state=42
)
rf_model.fit(X_scaled, y)
self.models['rf'] = rf_model
except Exception:
print(" Random Forest training failed, skipping")
self.models['rf'] = None
# Train Linear Regression
print(" Training Linear Regression...")
try:
linear_model = LinearRegression()
linear_model.fit(X_scaled, y)
self.models['linear'] = linear_model
except Exception:
print(" Linear Regression training failed, skipping")
self.models['linear'] = None
# Placeholder for LSTM (would need proper implementation)
self.models['lstm'] = None
self.is_trained = True
print("Ensemble training completed!")
def predict(self, test_data, forecast_horizon=1):
"""Generate ensemble predictions"""
if not self.is_trained:
raise ValueError("Ensemble must be trained before prediction")
predictions = {}
# ARIMA predictions
if self.models['arima'] is not None:
try:
# Convert to a plain array so positions align with the other models
arima_pred = np.asarray(self.models['arima'].forecast(steps=len(test_data)))
predictions['arima'] = arima_pred
except Exception:
predictions['arima'] = np.full(len(test_data), np.nan)
# Prophet predictions
if self.models['prophet'] is not None:
try:
future = self.models['prophet'].make_future_dataframe(periods=len(test_data), freq='D')
prophet_forecast = self.models['prophet'].predict(future)
prophet_pred = prophet_forecast['yhat'].tail(len(test_data)).values
predictions['prophet'] = prophet_pred
except Exception:
predictions['prophet'] = np.full(len(test_data), np.nan)
# ML model predictions (lag/rolling features drop the first rows, so pad
# the front with NaN to keep every prediction the same length as test_data)
test_features = self._prepare_features(test_data)
n_missing = len(test_data) - len(test_features)
if len(test_features) > 0:
X_test = test_features.drop(['ds', 'y'], axis=1)
X_test_scaled = self.scaler.transform(X_test)
# Random Forest
if self.models['rf'] is not None:
try:
rf_pred = self.models['rf'].predict(X_test_scaled)
predictions['rf'] = np.concatenate([np.full(n_missing, np.nan), rf_pred])
except Exception:
predictions['rf'] = np.full(len(test_data), np.nan)
# Linear Regression
if self.models['linear'] is not None:
try:
linear_pred = self.models['linear'].predict(X_test_scaled)
predictions['linear'] = np.concatenate([np.full(n_missing, np.nan), linear_pred])
except Exception:
predictions['linear'] = np.full(len(test_data), np.nan)
# LSTM placeholder
predictions['lstm'] = np.full(len(test_data), np.nan)
return predictions
def ensemble_predict(self, predictions, method='simple_average'):
"""Combine predictions using ensemble method"""
# Stack predictions; NaN entries (failed models or padded rows) are
# skipped row by row instead of dropping entire model columns
pred_df = pd.DataFrame(predictions)
if method == 'simple_average':
# Simple average of the predictions available for each row
ensemble_pred = pred_df.mean(axis=1, skipna=True)
elif method == 'median':
# Median of the predictions available for each row
ensemble_pred = pred_df.median(axis=1, skipna=True)
else:
raise ValueError(f"Unknown ensemble method: {method}")
return ensemble_pred.values
# Split data for ensemble testing
train_size = int(len(ensemble_data) * 0.7)
train_data = ensemble_data[:train_size]
test_data = ensemble_data[train_size:]
print(f"Train set: {len(train_data)} samples")
print(f"Test set: {len(test_data)} samples")
# Create and train ensemble
ensemble = TimeSeriesEnsemble()
ensemble.fit(train_data)
# Generate predictions
individual_predictions = ensemble.predict(test_data)
ensemble_pred = ensemble.ensemble_predict(individual_predictions, method='simple_average')
# Calculate accuracies
accuracies = {}
for model_name, preds in individual_predictions.items():
if not np.isnan(preds).all():
valid_idx = ~np.isnan(preds)
if np.sum(valid_idx) > 0:
mae = mean_absolute_error(test_data['y'].values[valid_idx], preds[valid_idx])
accuracies[model_name] = mae
# Ensemble accuracy
ensemble_mae = mean_absolute_error(test_data['y'], ensemble_pred)
accuracies['ensemble'] = ensemble_mae
print("\nModel Accuracies (MAE):")
for model, mae in accuracies.items():
print(f" {model}: {mae:.4f}")
# Plot ensemble results
plt.figure(figsize=(15, 8))
# Plot actual data
plt.plot(test_data['ds'], test_data['y'], 'k-', linewidth=2, label='Actual', alpha=0.8)
# Plot individual predictions
colors = ['tab:blue', 'tab:green', 'tab:orange', 'tab:purple', 'tab:brown']  # reserve red for the ensemble line
for i, (model_name, preds) in enumerate(individual_predictions.items()):
if not np.isnan(preds).all():
plt.plot(test_data['ds'], preds, color=colors[i % len(colors)],
linewidth=1.5, alpha=0.7, label=f'{model_name.upper()}')
# Plot ensemble prediction
plt.plot(test_data['ds'], ensemble_pred, 'r-', linewidth=3, label='Ensemble')
plt.title('Ensemble Forecasting Results')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Uncertainty Quantification
Uncertainty quantification is crucial for reliable forecasting and decision making.
Types of Uncertainty:
Aleatoric: Inherent randomness in data (irreducible)
Epistemic: Model uncertainty (reducible with more data/knowledge)
Prediction intervals: Range of likely future values
Confidence intervals: Uncertainty in parameter estimates
Quantification Methods:
Bootstrap: Resampling-based uncertainty
Bayesian: Probabilistic parameter estimation
Ensemble: Model disagreement as uncertainty
Quantile regression: Direct quantile prediction
Conformal prediction: Distribution-free uncertainty
Applications:
Risk assessment: Decision making under uncertainty
Inventory management: Safety stock calculations
Financial planning: Portfolio optimization
Supply chain: Demand planning with uncertainty
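Of the methods listed above, conformal prediction is the simplest to sketch: calibrate an interval half-width from held-out residuals so that the interval covers the target fraction of future points. Note that the standard split-conformal guarantee assumes exchangeable data, which time series violate; in practice the calibration residuals should come from a temporally held-out window. A toy version on synthetic Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(1)

# A "forecaster" that always predicts 0, calibrated against N(0, 1) actuals
y_cal = rng.normal(0, 1, 500)
resid = np.abs(y_cal - 0.0)

# Split conformal: half-width is the adjusted (1 - alpha) quantile of residuals
alpha = 0.05
n = len(resid)
q = np.quantile(resid, np.ceil((n + 1) * (1 - alpha)) / n)

# Check empirical coverage on fresh data from the same distribution
y_new = rng.normal(0, 1, 2000)
covered = np.mean(np.abs(y_new) <= q)
print(f"half-width: {q:.2f}")        # near the N(0,1) 97.5th percentile (~1.96)
print(f"coverage:   {covered:.3f}")  # near the 0.95 target
```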
class UncertaintyQuantifier:
"""Class for quantifying prediction uncertainty"""
def __init__(self, n_bootstraps=100, confidence_level=0.95):
self.n_bootstraps = n_bootstraps
self.confidence_level = confidence_level
self.lower_percentile = (1 - confidence_level) / 2 * 100
self.upper_percentile = (1 + confidence_level) / 2 * 100
def bootstrap_uncertainty(self, model_func, data, n_forecasts):
"""Calculate uncertainty using bootstrapping"""
predictions = []
for i in range(self.n_bootstraps):
# Bootstrap sample (note: i.i.d. row resampling ignores temporal order;
# a block or residual bootstrap is preferable for real time series)
bootstrap_idx = np.random.choice(len(data), len(data), replace=True)
bootstrap_sample = data.iloc[bootstrap_idx]
# Fit model and predict
try:
pred = model_func(bootstrap_sample, n_forecasts)
predictions.append(pred)
except Exception:
continue
if len(predictions) == 0:
return None, None, None
predictions = np.array(predictions)
# Calculate statistics
mean_pred = np.mean(predictions, axis=0)
lower_bound = np.percentile(predictions, self.lower_percentile, axis=0)
upper_bound = np.percentile(predictions, self.upper_percentile, axis=0)
return mean_pred, lower_bound, upper_bound
def ensemble_uncertainty(self, predictions_dict):
"""Calculate uncertainty from ensemble predictions"""
# Stack predictions, skipping models that produced no output at all
pred_list = [np.asarray(pred) for pred in predictions_dict.values()
if not np.isnan(pred).all()]
if len(pred_list) == 0:
return None, None, None
pred_array = np.column_stack(pred_list)
# NaN-aware statistics, since some models may be missing early rows
mean_pred = np.nanmean(pred_array, axis=1)
std_pred = np.nanstd(pred_array, axis=1)
# Use a normal approximation for the uncertainty bounds
lower_bound = mean_pred - 1.96 * std_pred
upper_bound = mean_pred + 1.96 * std_pred
return mean_pred, lower_bound, upper_bound
def prediction_interval_coverage(self, actuals, lower_bounds, upper_bounds):
"""Calculate prediction interval coverage"""
coverage = np.mean((actuals >= lower_bounds) & (actuals <= upper_bounds))
return coverage
def interval_width(self, lower_bounds, upper_bounds):
"""Calculate average prediction interval width"""
width = np.mean(upper_bounds - lower_bounds)
return width
# Initialize uncertainty quantifier
uq = UncertaintyQuantifier(n_bootstraps=50, confidence_level=0.95)
# Function for ARIMA bootstrap
def arima_bootstrap_func(data, n_forecasts):
try:
model = ARIMA(data['y'], order=(5, 1, 2))
fitted = model.fit()
forecast = fitted.forecast(steps=n_forecasts)
return forecast
except Exception:
return np.full(n_forecasts, np.nan)
# Calculate bootstrap uncertainty for ARIMA
print("Calculating bootstrap uncertainty for ARIMA...")
arima_mean, arima_lower, arima_upper = uq.bootstrap_uncertainty(
arima_bootstrap_func, train_data, len(test_data)
)
# Calculate ensemble uncertainty
print("Calculating ensemble uncertainty...")
ensemble_mean, ensemble_lower, ensemble_upper = uq.ensemble_uncertainty(individual_predictions)
# Calculate coverage and width
if arima_mean is not None:
arima_coverage = uq.prediction_interval_coverage(
test_data['y'].values, arima_lower, arima_upper
)
arima_width = uq.interval_width(arima_lower, arima_upper)
print(f"ARIMA Bootstrap - Coverage: {arima_coverage:.3f}, Width: {arima_width:.4f}")
if ensemble_mean is not None:
ensemble_coverage = uq.prediction_interval_coverage(
test_data['y'].values, ensemble_lower, ensemble_upper
)
ensemble_width = uq.interval_width(ensemble_lower, ensemble_upper)
print(f"Ensemble - Coverage: {ensemble_coverage:.3f}, Width: {ensemble_width:.4f}")
# Plot uncertainty comparison
plt.figure(figsize=(15, 10))
# Plot 1: Bootstrap uncertainty
plt.subplot(2, 1, 1)
plt.plot(test_data['ds'], test_data['y'], 'k-', linewidth=2, label='Actual')
if arima_mean is not None:
plt.plot(test_data['ds'], arima_mean, 'b-', linewidth=2, label='ARIMA Mean')
plt.fill_between(test_data['ds'], arima_lower, arima_upper,
color='blue', alpha=0.3, label='95% Bootstrap CI')
plt.title('Bootstrap Uncertainty Quantification (ARIMA)')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
# Plot 2: Ensemble uncertainty
plt.subplot(2, 1, 2)
plt.plot(test_data['ds'], test_data['y'], 'k-', linewidth=2, label='Actual')
if ensemble_mean is not None:
plt.plot(test_data['ds'], ensemble_mean, 'r-', linewidth=2, label='Ensemble Mean')
plt.fill_between(test_data['ds'], ensemble_lower, ensemble_upper,
color='red', alpha=0.3, label='95% Ensemble CI')
plt.title('Ensemble Uncertainty Quantification')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Uncertainty analysis
print("\n=== Uncertainty Analysis ===")
if arima_mean is not None:
print(f"ARIMA Bootstrap Intervals:")
print(f" Average width: {arima_width:.4f}")
print(f" Coverage: {arima_coverage:.1%}")
print(f" Target coverage: {uq.confidence_level:.1%}")
if ensemble_mean is not None:
print(f"\nEnsemble Intervals:")
print(f" Average width: {ensemble_width:.4f}")
print(f" Coverage: {ensemble_coverage:.1%}")
print(f" Target coverage: {uq.confidence_level:.1%}")
# Reliability diagram
if ensemble_mean is not None:
plt.figure(figsize=(8, 6))
# Calculate coverage by bins
errors = np.abs(test_data['y'].values - ensemble_mean)
uncertainty = (ensemble_upper - ensemble_lower) / 2
# Bin by uncertainty
bins = np.linspace(uncertainty.min(), uncertainty.max(), 10)
bin_centers = (bins[:-1] + bins[1:]) / 2
coverage_by_bin = []
for i in range(len(bins) - 1):
mask = (uncertainty >= bins[i]) & (uncertainty < bins[i+1])
if np.sum(mask) > 0:
coverage = np.mean(errors[mask] <= uncertainty[mask])
coverage_by_bin.append(coverage)
else:
coverage_by_bin.append(0)
plt.plot(bin_centers, coverage_by_bin, 'bo-', linewidth=2, markersize=8)
plt.plot([bins.min(), bins.max()], [0.95, 0.95], 'r--', label='Target Coverage (95%)')
plt.xlabel('Uncertainty (Interval Half-Width)')
plt.ylabel('Observed Coverage')
plt.title('Uncertainty Calibration (Reliability Diagram)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Production Deployment
Deploying forecasting models to production requires careful consideration of scalability, reliability, and maintenance.
Deployment Architecture:
Batch processing: Periodic forecast generation
Real-time scoring: On-demand predictions
Model serving: REST APIs, microservices
Containerization: Docker, Kubernetes
Cloud platforms: AWS SageMaker, Azure ML, GCP AI Platform
Key Considerations:
Scalability: Handle varying loads
Latency: Response time requirements
Reliability: Fault tolerance and monitoring
Security: Data protection and access control
Cost: Computational resource optimization
MLOps Pipeline:
Data ingestion: Automated data pipelines
Model training: Scheduled retraining
Model validation: Automated testing
Model deployment: CI/CD pipelines
Monitoring: Performance tracking and alerting
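The monitoring step above can start very small: compare a rolling window of live forecast errors against the error observed at validation time and raise an alert when it degrades. A minimal sketch; the baseline MAE and the 2x threshold below are illustrative assumptions, not standards:

```python
from collections import deque

class DriftMonitor:
    """Alert when rolling forecast error exceeds a multiple of a baseline MAE."""

    def __init__(self, baseline_mae, window=30, threshold=2.0):
        self.baseline_mae = baseline_mae
        self.threshold = threshold
        self.errors = deque(maxlen=window)  # keep only the most recent errors

    def update(self, actual, predicted):
        self.errors.append(abs(actual - predicted))
        rolling_mae = sum(self.errors) / len(self.errors)
        return {
            'rolling_mae': rolling_mae,
            'alert': rolling_mae > self.threshold * self.baseline_mae,
        }

monitor = DriftMonitor(baseline_mae=1.0)
print(monitor.update(10.0, 9.5)['alert'])  # False: error in line with baseline
print(monitor.update(10.0, 4.0)['alert'])  # True: rolling MAE jumps past 2x baseline
```

In a real pipeline the alert flag would feed an alerting system and possibly trigger the scheduled retraining mentioned above.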
class ForecastingService:
"""Production-ready forecasting service"""
def __init__(self, model, scaler=None):
self.model = model
self.scaler = scaler
self.performance_history = []
self.last_training_date = None
def predict(self, input_data, forecast_horizon=1):
"""Generate predictions with error handling"""
try:
# Validate input
if not self._validate_input(input_data):
raise ValueError("Invalid input data")
# Generate prediction
if hasattr(self.model, 'predict'):
prediction = self.model.predict(input_data)
elif hasattr(self.model, 'forecast'):
prediction = self.model.forecast(steps=forecast_horizon)
else:
raise ValueError("Model does not have prediction method")
# Log prediction
self._log_prediction(prediction, input_data)
return {
'status': 'success',
'prediction': prediction,
'timestamp': pd.Timestamp.now(),
'model_version': self._get_model_version()
}
except Exception as e:
return {
'status': 'error',
'error': str(e),
'timestamp': pd.Timestamp.now()
}
def _validate_input(self, input_data):
"""Validate input data format and quality"""
# Check if data is not empty
if input_data is None or len(input_data) == 0:
return False
# Check for required columns (if DataFrame)
if isinstance(input_data, pd.DataFrame):
required_cols = ['ds', 'y']
if not all(col in input_data.columns for col in required_cols):
return False
# Check for missing values
if input_data.isnull().any().any():
return False
# Check date column
if not pd.api.types.is_datetime64_any_dtype(input_data['ds']):
return False
return True
def _log_prediction(self, prediction, input_data):
"""Log prediction for monitoring"""
log_entry = {
'timestamp': pd.Timestamp.now(),
'input_shape': input_data.shape if hasattr(input_data, 'shape') else len(input_data),
'prediction_shape': prediction.shape if hasattr(prediction, 'shape') else len(prediction),
'model_type': type(self.model).__name__
}
self.performance_history.append(log_entry)
# Keep only last 1000 entries
if len(self.performance_history) > 1000:
self.performance_history = self.performance_history[-1000:]
def _get_model_version(self):
"""Get model version information"""
return f"v1.0_{pd.Timestamp.now().strftime('%Y%m%d')}"
def update_model(self, new_model, new_scaler=None):
"""Update model with new version"""
self.model = new_model
if new_scaler is not None:
self.scaler = new_scaler
self.last_training_date = pd.Timestamp.now()
print(f"Model updated at {self.last_training_date}")
def get_performance_stats(self):
"""Get service performance statistics"""
if not self.performance_history:
return {}
df = pd.DataFrame(self.performance_history)
stats = {
'total_predictions': len(df),
# Average spacing between logged predictions (not per-call latency)
'avg_prediction_time': (df['timestamp'].max() - df['timestamp'].min()).total_seconds() / len(df) if len(df) > 1 else 0,
'last_prediction': df['timestamp'].max(),
'model_type': df['model_type'].iloc[0] if len(df) > 0 else 'Unknown'
}
return stats
def health_check(self):
"""Perform health check"""
health_status = {
'status': 'healthy',
'model_loaded': self.model is not None,
'last_training': self.last_training_date,
'performance_stats': self.get_performance_stats()
}
# Check if model is recent (within 30 days)
if self.last_training_date:
days_since_training = (pd.Timestamp.now() - self.last_training_date).days
if days_since_training > 30:
health_status['status'] = 'warning'
health_status['message'] = f'Model is {days_since_training} days old'
return health_status
# Create forecasting service
service = ForecastingService(ensemble)
# Test service with sample data
test_input = test_data[['ds', 'y']].head(10)
result = service.predict(test_input, forecast_horizon=5)
print("Service Prediction Result:")
print(f"Status: {result['status']}")
if result['status'] == 'success':
# The ensemble's predict() returns a dict of per-model prediction arrays
print(f"Models in prediction: {list(result['prediction'].keys())}")
print(f"Timestamp: {result['timestamp']}")
print(f"Model version: {result['model_version']}")
# Test health check
health = service.health_check()
print(f"\nHealth Status: {health['status']}")
print(f"Model loaded: {health['model_loaded']}")
print(f"Performance stats: {health['performance_stats']}")
# Simulate multiple predictions for monitoring
print("\nSimulating service usage...")
for i in range(10):
sample_input = test_data[['ds', 'y']].sample(5, random_state=i)
service.predict(sample_input)
# Get updated performance stats
stats = service.get_performance_stats()
print(f"\nService Statistics:")
print(f"Total predictions: {stats['total_predictions']}")
print(f"Average prediction time: {stats['avg_prediction_time']:.4f} seconds")
print(f"Last prediction: {stats['last_prediction']}")
print(f"Model type: {stats['model_type']}")
Real-World Applications
Time series forecasting powers many critical business applications.
Business Applications:
Demand forecasting: Retail sales, inventory management
Financial forecasting: Stock prices, economic indicators
Supply chain: Procurement, logistics optimization
Energy: Load forecasting, renewable energy prediction
Healthcare: Patient volume, epidemic forecasting
Transportation: Traffic prediction, ride-sharing demand
Industry Examples:
Retail: Walmart uses forecasting for inventory optimization
Finance: Hedge funds use high-frequency trading forecasts
Manufacturing: Predictive maintenance scheduling
E-commerce: Amazon's recommendation and inventory systems
Weather: Meteorological forecasting for agriculture
Success Metrics:
Accuracy improvement: Percentage reduction in forecast error
Business impact: Cost savings, revenue increase
Operational efficiency: Reduced stockouts, better resource utilization
Customer satisfaction: Better service levels
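Accuracy improvement in the first bullet is usually reported as the percentage reduction in error against a naive baseline (for example a seasonal-naive forecast). A small helper, with made-up example numbers:

```python
def error_reduction_pct(baseline_mae, model_mae):
    """Percentage reduction in forecast error relative to a baseline model."""
    return (baseline_mae - model_mae) / baseline_mae * 100

# Hypothetical: seasonal-naive MAE of 8.0 units vs. ensemble MAE of 6.0 units
print(f"{error_reduction_pct(8.0, 6.0):.0f}% error reduction")  # 25% error reduction
```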
def retail_demand_forecasting():
"""Example: Retail demand forecasting application"""
print("=== Retail Demand Forecasting Example ===")
# Generate retail sales data
dates = pd.date_range('2023-01-01', periods=365, freq='D')
# Base demand with seasonality
t = np.arange(365)
base_demand = 100 + 20 * np.sin(2 * np.pi * t / 365) # Yearly seasonality
weekly_pattern = 10 * np.sin(2 * np.pi * t / 7) # Weekly pattern
# Add promotional effects
promotions = np.zeros(365)
promo_days = [30, 91, 152, 213, 274, 335] # Monthly promotions
for day in promo_days:
if day < 365:
promotions[day-2:day+3] = 50 # 5-day promotion effect
# Holiday effects
holidays = np.zeros(365)
holiday_dates = [0, 359, 360, 361] # New Year period
holidays[holiday_dates] = 80
# Combine and add noise
sales = base_demand + weekly_pattern + promotions + holidays
sales += np.random.normal(0, 5, 365) # Random variation
# Create DataFrame
retail_data = pd.DataFrame({
'ds': dates,
'y': sales,
'promotion': promotions,
'holiday': holidays
})
# Split data
train_data = retail_data[:-30]
test_data = retail_data[-30:]
# Build the holiday/promotion calendar first, since Prophet expects
# holidays to be supplied through its constructor
holidays_df = pd.concat([
pd.DataFrame({
'holiday': 'promotion',
'ds': dates[promotions > 0],
'lower_window': -2,
'upper_window': 2
}),
pd.DataFrame({
'holiday': 'holiday',
'ds': dates[holidays > 0],
'lower_window': -1,
'upper_window': 1
})
])
# Fit Prophet model with holidays and promotions
model = Prophet(
holidays=holidays_df,
seasonality_mode='multiplicative',
yearly_seasonality=True,
weekly_seasonality=True,
daily_seasonality=False
)
# Add custom seasonalities
model.add_seasonality(name='monthly', period=30.44, fourier_order=5)
model.fit(train_data[['ds', 'y']])
# Generate forecast
future = model.make_future_dataframe(periods=30, freq='D')
forecast = model.predict(future)
# Calculate accuracy
test_forecast = forecast[forecast['ds'].isin(test_data['ds'])]
mae = mean_absolute_error(test_data['y'], test_forecast['yhat'])
mape = np.mean(np.abs((test_data['y'] - test_forecast['yhat']) / test_data['y'])) * 100
print(f"Retail Demand Forecasting Results:")
print(f"MAE: {mae:.2f} units")
print(f"MAPE: {mape:.2f}%")
print(f"Average daily sales: {retail_data['y'].mean():.2f} units")
# Plot results
plt.figure(figsize=(15, 8))
# Plot sales data
plt.plot(retail_data['ds'], retail_data['y'], 'b-', alpha=0.7, linewidth=1, label='Actual Sales')
# Plot forecast
plt.plot(forecast['ds'], forecast['yhat'], 'r-', linewidth=2, label='Forecast')
plt.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
color='red', alpha=0.2, label='95% Confidence Interval')
# Highlight promotions and holidays
promo_dates = retail_data[retail_data['promotion'] > 0]['ds']
holiday_dates = retail_data[retail_data['holiday'] > 0]['ds']
for date in promo_dates:
plt.axvline(x=date, color='green', alpha=0.5, linewidth=1)
for date in holiday_dates:
plt.axvline(x=date, color='orange', alpha=0.7, linewidth=2)
plt.axvline(x=train_data['ds'].iloc[-1], color='k', linestyle='--', alpha=0.7, label='Train/Test Split')
plt.title('Retail Demand Forecasting with Promotions and Holidays')
plt.xlabel('Date')
plt.ylabel('Sales Units')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Business impact analysis
print(f"\nBusiness Impact Analysis:")
print(f"Total forecasted sales (30 days): {test_forecast['yhat'].sum():.0f} units")
print(f"Total actual sales (30 days): {test_data['y'].sum():.0f} units")
print(f"Forecast accuracy: {(1 - mae/test_data['y'].mean()):.1%}")
# Inventory recommendations
safety_stock = test_forecast['yhat_upper'] - test_forecast['yhat']
print(f"Recommended safety stock (average): {safety_stock.mean():.1f} units")
print(f"Stockout risk: {(test_data['y'] > test_forecast['yhat_upper']).mean():.1%}")
return model, forecast
# Run retail forecasting example
retail_model, retail_forecast = retail_demand_forecasting()
Key Takeaways
Ensemble methods improve accuracy by combining multiple models
Uncertainty quantification is crucial for decision making
Production deployment requires robust engineering practices
Business applications drive forecasting model selection
Monitoring and maintenance ensure long-term performance
When to Use Advanced Techniques
Good For:
High-stakes decisions: Financial forecasting, inventory management
Complex patterns: Multiple seasonalities, external factors
Large datasets: Sufficient data for ensemble methods
Production systems: Reliable, scalable deployment needed
Business impact: Direct revenue/cost implications
Less Ideal For:
Simple problems: Basic ARIMA may suffice
Limited data: Deep learning needs substantial training data
Real-time constraints: Some methods are computationally expensive
Interpretability required: Ensemble methods can be black-box
Pro Tips
Start simple: Use traditional methods first, add complexity as needed
Ensemble wisely: Combine diverse models for best results
Quantify uncertainty: Always provide confidence intervals
Monitor performance: Set up automated monitoring and alerts
Domain knowledge: Incorporate business rules and expert knowledge
Data quality: Clean, consistent data is more important than fancy models
Iterate quickly: Test, measure, improve in short cycles
Scale gradually: Start small, expand as confidence grows
Next Steps
Now that you understand advanced techniques, you're ready for:
Bayesian Forecasting: Probabilistic approaches with uncertainty
Multivariate Methods: Forecasting multiple related time series
Deep Time Series: Advanced neural architectures
Reinforcement Learning: Sequential decision making
MLOps for Time Series: Production deployment and monitoring
Further Reading
"Forecasting: Principles and Practice": Comprehensive forecasting textbook
"Time Series Analysis and Its Applications": Statistical foundations
MLOps Zoomcamp: Production ML engineering
"Practical Time Series Forecasting": Business applications
Congratulations! You've completed the comprehensive time series forecasting curriculum!
From basic concepts to advanced production systems, you now have the knowledge to tackle real-world forecasting challenges. Remember: the best model is the one that solves your business problem reliably and efficiently.