01: Time Series Fundamentals
"Time is nature's way of keeping everything from happening at once." - Woody Allen
Welcome to the world of time series analysis! This notebook covers the fundamental concepts that form the foundation of all time series forecasting methods.
Learning Objectives
By the end of this notebook, you'll understand:
What makes time series data special
Components of time series (trend, seasonality, noise)
Stationarity and why it matters
Autocorrelation and cross-correlation
Time series decomposition techniques
What is Time Series Data?
Time series data is a sequence of data points collected or recorded at regular time intervals. Unlike cross-sectional data, time series data has a temporal ordering that carries important information.
Key Characteristics:
Temporal ordering: Observations are ordered by time
Dependency: Future values often depend on past values
Irregular patterns: Trends, seasonality, cycles, noise
Non-stationarity: Statistical properties may change over time
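These characteristics can be demonstrated numerically. The sketch below (assuming only NumPy; the variable names are illustrative) shows how temporal ordering carries information: an AR(1) series has a strong lag-1 correlation, which vanishes once the observations are shuffled.

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) process: each value depends on the previous one
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()

def lag1_corr(x):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Ordered: strong dependency (typically near 0.8); shuffled: near zero
shuffled = rng.permutation(y)
print(f"Lag-1 correlation (ordered):  {lag1_corr(y):.3f}")
print(f"Lag-1 correlation (shuffled): {lag1_corr(shuffled):.3f}")
```

Shuffling a cross-sectional dataset would change nothing of substance; here it destroys the dependency entirely.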
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
import warnings
warnings.filterwarnings('ignore')
# Plot styling
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12
# Set random seed for reproducibility
np.random.seed(42)
print("Time Series Analysis Libraries Loaded!")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
Time Series Components
Time series can be decomposed into four main components:
Trend: Long-term movement (increasing/decreasing)
Seasonality: Regular patterns that repeat over time
Cycles: Long-term oscillations not tied to calendar
Noise: Random fluctuations
Mathematical Model:
\(y_t = T_t + S_t + C_t + \epsilon_t\)
Where:
\(y_t\): Observed value at time t
\(T_t\): Trend component
\(S_t\): Seasonal component
\(C_t\): Cyclical component
\(\epsilon_t\): Random noise
def generate_time_series(n_points=365, trend_slope=0.1, seasonal_amplitude=10, noise_std=2):
    """Generate a synthetic time series with trend, seasonality, and noise"""
    # Time index
    t = np.arange(n_points)
    # Trend component
    trend = trend_slope * t
    # Seasonal component (yearly cycle)
    seasonality = seasonal_amplitude * np.sin(2 * np.pi * t / 365)
    # Add monthly seasonality
    monthly_seasonal = 3 * np.sin(2 * np.pi * t / 30)
    # Cyclical component (business cycle)
    cycle = 5 * np.sin(2 * np.pi * t / 180)  # ~6 month cycle
    # Random noise
    noise = np.random.normal(0, noise_std, n_points)
    # Combine components
    y = 50 + trend + seasonality + monthly_seasonal + cycle + noise
    # Create DataFrame
    dates = pd.date_range('2020-01-01', periods=n_points, freq='D')
    df = pd.DataFrame({
        'date': dates,
        'value': y,
        'trend': 50 + trend,
        'seasonality': seasonality + monthly_seasonal,
        'cycle': cycle,
        'noise': noise
    })
    return df.set_index('date')
# Generate sample time series
ts_data = generate_time_series()
# Plot components
fig, axes = plt.subplots(3, 2, figsize=(15, 12))
# Original time series
axes[0,0].plot(ts_data.index, ts_data['value'], 'b-', linewidth=1.5)
axes[0,0].set_title('Original Time Series')
axes[0,0].grid(True, alpha=0.3)
# Individual components
axes[0,1].plot(ts_data.index, ts_data['trend'], 'r-', label='Trend')
axes[0,1].set_title('Trend Component')
axes[0,1].grid(True, alpha=0.3)
axes[1,0].plot(ts_data.index, ts_data['seasonality'], 'g-', label='Seasonality')
axes[1,0].set_title('Seasonal Component')
axes[1,0].grid(True, alpha=0.3)
axes[1,1].plot(ts_data.index, ts_data['cycle'], 'orange', label='Cycle')
axes[1,1].set_title('Cyclical Component')
axes[1,1].grid(True, alpha=0.3)
axes[2,0].plot(ts_data.index, ts_data['noise'], 'purple', alpha=0.7, label='Noise')
axes[2,0].set_title('Noise Component')
axes[2,0].grid(True, alpha=0.3)
# Component breakdown
axes[2,1].plot(ts_data.index, ts_data['trend'], 'r-', alpha=0.7, label='Trend')
axes[2,1].plot(ts_data.index, ts_data['trend'] + ts_data['seasonality'], 'g-', alpha=0.7, label='Trend + Seasonality')
axes[2,1].plot(ts_data.index, ts_data['value'], 'b-', alpha=0.7, label='Full Series')
axes[2,1].set_title('Component Accumulation')
axes[2,1].legend()
axes[2,1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Time Series Components:")
print(f"Mean: {ts_data['value'].mean():.2f}")
print(f"Std: {ts_data['value'].std():.2f}")
print(f"Trend slope: {0.1:.3f} per day")
print(f"Seasonal amplitude: {10:.1f}")
print(f"Noise std: {2:.1f}")
Stationarity
Stationarity is a crucial concept in time series analysis. A stationary time series has statistical properties that do not change over time.
Types of Stationarity:
Strict Stationarity: Joint distribution of any set of observations is invariant to time shifts
Weak Stationarity (Covariance Stationarity):
Constant mean: \(E[y_t] = \mu\) for all t
Constant variance: \(Var(y_t) = \sigma^2\) for all t
Constant autocovariance: \(Cov(y_t, y_{t+k}) = \gamma_k\) for all t
Why Stationarity Matters:
Many statistical methods assume stationarity
Non-stationary data can lead to spurious regressions
Stationary data is easier to model and forecast
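The spurious-regression point is easy to demonstrate: two independent random walks (the canonical non-stationary series) frequently show sizable level correlations despite having no relationship at all, while their stationary first differences do not. A minimal sketch using only NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two completely independent random walks (non-stationary)
n = 500
walk_a = np.cumsum(rng.normal(size=n))
walk_b = np.cumsum(rng.normal(size=n))

# Correlation of the raw levels can be large purely by chance
level_corr = np.corrcoef(walk_a, walk_b)[0, 1]

# First differences recover the underlying (independent) noise
diff_corr = np.corrcoef(np.diff(walk_a), np.diff(walk_b))[0, 1]

print(f"Correlation of levels:      {level_corr:.3f}")
print(f"Correlation of differences: {diff_corr:.3f}")
```

Regressing one walk on the other would report an impressive fit with no predictive meaning; the differenced series tell the honest story.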
def check_stationarity(timeseries, window=30):
    """Check stationarity using rolling statistics and the Augmented Dickey-Fuller test"""
    # Rolling statistics
    rolling_mean = timeseries.rolling(window=window).mean()
    rolling_std = timeseries.rolling(window=window).std()
    # Plot rolling statistics
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    ax1.plot(timeseries.index, timeseries.values, label='Original')
    ax1.plot(rolling_mean.index, rolling_mean.values, label=f'Rolling Mean ({window} periods)')
    ax1.set_title('Rolling Mean')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax2.plot(timeseries.index, timeseries.values, label='Original')
    ax2.plot(rolling_std.index, rolling_std.values, label=f'Rolling Std ({window} periods)', color='orange')
    ax2.set_title('Rolling Standard Deviation')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    # Augmented Dickey-Fuller test
    print('Augmented Dickey-Fuller Test:')
    adf_result = adfuller(timeseries.dropna())
    print(f'ADF Statistic: {adf_result[0]:.4f}')
    print(f'p-value: {adf_result[1]:.4f}')
    print('Critical Values:')
    for key, value in adf_result[4].items():
        print(f'\t{key}: {value:.4f}')
    if adf_result[1] < 0.05:
        print("\n✅ Series is stationary (reject null hypothesis)")
    else:
        print("\n❌ Series is non-stationary (fail to reject null hypothesis)")
    return adf_result
# Test stationarity on different series
print("=== Testing Stationarity on Original Series ===")
adf_original = check_stationarity(ts_data['value'])
print("\n=== Testing Stationarity on Trend Component ===")
adf_trend = check_stationarity(ts_data['trend'])
print("\n=== Testing Stationarity on Noise Component ===")
adf_noise = check_stationarity(ts_data['noise'])
Making Time Series Stationary
Non-stationary time series can often be made stationary through transformations:
Common Transformations:
Differencing: Subtract consecutive observations
First difference: \(\nabla y_t = y_t - y_{t-1}\)
Second difference: \(\nabla^2 y_t = \nabla y_t - \nabla y_{t-1}\)
Log transformation: Stabilize variance
\(y_t' = \log(y_t)\)
Seasonal differencing: Remove seasonal patterns
\(y_t' = y_t - y_{t-s}\) where s is seasonal period
Detrending: Remove trend component
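As a quick sanity check on the first transformations above (a sketch assuming only NumPy; names like `trend_series` are illustrative): differencing a linear trend yields a constant equal to the slope, and seasonal differencing annihilates an exactly repeating pattern.

```python
import numpy as np

# A deterministic linear trend with slope 0.3
t = np.arange(100)
trend_series = 5.0 + 0.3 * t

# First difference y_t - y_{t-1}: a linear trend collapses to its slope
first_diff = np.diff(trend_series)
assert np.allclose(first_diff, 0.3)

# Seasonal difference y_t - y_{t-s}: an exact period-7 pattern collapses to zero
s = 7
seasonal = np.tile(np.arange(s, dtype=float), 20)
seasonal_diff = seasonal[s:] - seasonal[:-s]
assert np.allclose(seasonal_diff, 0.0)

print("Both transformations removed their target component.")
```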
def make_stationary(timeseries, method='difference', lag=1):
    """Apply transformations to make a time series stationary"""
    if method == 'difference':
        # Regular differencing
        stationary = timeseries.diff(lag).dropna()
        title = f'First Difference (lag={lag})'
    elif method == 'log':
        # Log transformation followed by differencing
        stationary = np.log(timeseries).diff().dropna()
        title = 'Log Difference'
    elif method == 'seasonal_diff':
        # Seasonal differencing (e.g. lag=7 for daily data with weekly seasonality)
        stationary = timeseries.diff(lag).dropna()
        title = f'Seasonal Difference (lag={lag})'
    else:
        stationary = timeseries
        title = 'Original'
    return stationary, title
# Apply different transformations
transformations = [
    ('difference', 1),
    ('seasonal_diff', 7),
    ('log', 1)
]
fig, axes = plt.subplots(len(transformations), 2, figsize=(15, 12))
for i, (method, lag) in enumerate(transformations):
    # Apply transformation
    stationary, title = make_stationary(ts_data['value'], method, lag)
    # Plot transformed series
    axes[i, 0].plot(stationary.index, stationary.values, 'b-', alpha=0.7)
    axes[i, 0].set_title(title)
    axes[i, 0].grid(True, alpha=0.3)
    axes[i, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    # Check stationarity
    adf_result = adfuller(stationary)
    p_value = adf_result[1]
    # Plot ACF
    lag_acf = acf(stationary, nlags=50)
    axes[i, 1].plot(lag_acf, 'o-')
    axes[i, 1].axhline(y=0, linestyle='--', color='gray')
    axes[i, 1].axhline(y=1.96/np.sqrt(len(stationary)), linestyle='--', color='red', alpha=0.7)
    axes[i, 1].axhline(y=-1.96/np.sqrt(len(stationary)), linestyle='--', color='red', alpha=0.7)
    axes[i, 1].set_title(f'ACF - {title} (p-value: {p_value:.4f})')
    axes[i, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Compare stationarity tests
print("Stationarity Test Results:")
print("-" * 50)
for method, lag in transformations:
    stationary, title = make_stationary(ts_data['value'], method, lag)
    adf_result = adfuller(stationary)
    p_value = adf_result[1]
    is_stationary = "✅ Stationary" if p_value < 0.05 else "❌ Non-stationary"
    print(f"{title:20} | p-value: {p_value:.4f} | {is_stationary}")
Autocorrelation
Autocorrelation measures the correlation between a time series and its lagged versions.
Autocorrelation Function (ACF):
\(\rho_k = \frac{Cov(y_t, y_{t-k})}{Var(y_t)}\)
Partial Autocorrelation Function (PACF):
Direct correlation between \(y_t\) and \(y_{t-k}\), with the effects of intermediate lags removed.
Interpretation:
ACF: Shows all correlations up to lag k
PACF: Shows direct correlation at lag k
Significance bounds: \(\pm 1.96/\sqrt{n}\) (95% confidence)
Patterns: Slow decay suggests non-stationarity, sharp cutoff suggests AR process
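One way to build intuition for these patterns: for an AR(1) process with coefficient \(\phi\), the theoretical ACF decays geometrically as \(\phi^k\), while the PACF cuts off after lag 1. A sketch (NumPy only, illustrative names) comparing sample and theoretical ACF values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) process: y_t = phi * y_{t-1} + noise
phi, n = 0.7, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

def sample_acf(x, k):
    """Sample autocorrelation at lag k."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

# Theory for AR(1): ACF at lag k is phi**k (geometric decay)
for k in (1, 2, 3):
    print(f"lag {k}: sample {sample_acf(y, k):.3f}  vs  theory {phi ** k:.3f}")
```

The sharp-cutoff vs. slow-decay distinction is exactly what the ACF/PACF plots below are read for when identifying AR and MA orders.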
def plot_acf_pacf(timeseries, lags=50, title=""):
    """Plot ACF and PACF with significance bounds"""
    # Calculate ACF and PACF
    acf_values = acf(timeseries, nlags=lags)
    pacf_values = pacf(timeseries, nlags=lags)
    # Confidence bounds
    n = len(timeseries)
    conf_bound = 1.96 / np.sqrt(n)
    # Create plots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    # ACF plot
    ax1.stem(range(len(acf_values)), acf_values, linefmt='b-', markerfmt='bo', basefmt='r-')
    ax1.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    ax1.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax1.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax1.set_title(f'Autocorrelation Function (ACF) - {title}')
    ax1.set_xlabel('Lag')
    ax1.set_ylabel('Autocorrelation')
    ax1.grid(True, alpha=0.3)
    ax1.legend()
    # PACF plot
    ax2.stem(range(len(pacf_values)), pacf_values, linefmt='g-', markerfmt='go', basefmt='r-')
    ax2.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    ax2.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax2.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax2.set_title(f'Partial Autocorrelation Function (PACF) - {title}')
    ax2.set_xlabel('Lag')
    ax2.set_ylabel('Partial Autocorrelation')
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    plt.tight_layout()
    plt.show()
    return acf_values, pacf_values
# Analyze autocorrelation for different series
print("=== ACF/PACF for Original Series ===")
acf_orig, pacf_orig = plot_acf_pacf(ts_data['value'], title="Original Series")
print("\n=== ACF/PACF for Stationary Series (First Difference) ===")
stationary_series, _ = make_stationary(ts_data['value'], 'difference', 1)
acf_stat, pacf_stat = plot_acf_pacf(stationary_series, title="Stationary Series")
print("\n=== ACF/PACF for Noise Component ===")
acf_noise, pacf_noise = plot_acf_pacf(ts_data['noise'].dropna(), title="Noise Component")
Time Series Decomposition
Decomposition breaks down a time series into its constituent components. There are two main approaches:
1. Additive Decomposition:
\(y_t = T_t + S_t + \epsilon_t\)
2. Multiplicative Decomposition:
\(y_t = T_t \times S_t \times \epsilon_t\)
Methods:
Classical: Moving averages for trend, seasonal indices
STL: Seasonal and Trend decomposition using LOESS
X-13ARIMA-SEATS: Advanced seasonal adjustment
Applications:
Remove seasonality for forecasting
Identify unusual patterns
Understand component contributions
def decompose_time_series(timeseries, model='additive', period=None):
    """Decompose a time series into trend, seasonal, and residual components"""
    if period is None:
        # Try to infer a seasonal period from the data length
        if len(timeseries) > 365:
            period = 365  # Daily data with yearly seasonality
        elif len(timeseries) > 30:
            period = 7    # Weekly seasonality
        else:
            period = 12   # Monthly seasonality
    # Perform decomposition
    decomposition = seasonal_decompose(timeseries, model=model, period=period)
    # Plot components
    fig, axes = plt.subplots(4, 1, figsize=(15, 12))
    axes[0].plot(timeseries.index, timeseries.values, 'b-', linewidth=1.5)
    axes[0].set_title(f'Original Time Series (Model: {model})')
    axes[0].grid(True, alpha=0.3)
    axes[1].plot(decomposition.trend.index, decomposition.trend.values, 'r-', linewidth=2)
    axes[1].set_title('Trend Component')
    axes[1].grid(True, alpha=0.3)
    axes[2].plot(decomposition.seasonal.index, decomposition.seasonal.values, 'g-', linewidth=2)
    axes[2].set_title('Seasonal Component')
    axes[2].grid(True, alpha=0.3)
    axes[3].plot(decomposition.resid.index, decomposition.resid.values, 'purple', alpha=0.7)
    axes[3].set_title('Residual Component')
    axes[3].grid(True, alpha=0.3)
    axes[3].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    plt.tight_layout()
    plt.show()
    # Print statistics
    print(f"\nDecomposition Statistics ({model} model):")
    print("-" * 40)
    print(f"Trend component range: [{decomposition.trend.min():.2f}, {decomposition.trend.max():.2f}]")
    print(f"Seasonal component range: [{decomposition.seasonal.min():.2f}, {decomposition.seasonal.max():.2f}]")
    print(f"Residual std: {decomposition.resid.std():.4f}")
    print(f"Residual autocorrelation (lag 1): {decomposition.resid.autocorr(lag=1):.4f}")
    return decomposition
# Decompose the time series
# Note: seasonal_decompose needs at least two full cycles, and one year of daily
# data covers the yearly cycle only once, so we decompose the monthly pattern.
print("=== Additive Decomposition ===")
decomp_add = decompose_time_series(ts_data['value'], model='additive', period=30)
# Create multiplicative series for comparison
multiplicative_series = ts_data['value'] + 100  # Ensure positive values
multiplicative_series = multiplicative_series * (1 + 0.1 * np.sin(2 * np.pi * np.arange(len(multiplicative_series)) / 365))
print("\n=== Multiplicative Decomposition ===")
decomp_mult = decompose_time_series(multiplicative_series, model='multiplicative', period=30)
Cross-Correlation
Cross-correlation measures the relationship between two time series at different lags.
Cross-Correlation Function (CCF):
\(\rho_{xy}(k) = \frac{Cov(x_t, y_{t+k})}{\sigma_x \sigma_y}\)
Applications:
Leading indicators: Find variables that predict others
Causality analysis: Direction of influence
Feature selection: Identify relevant predictors
Portfolio analysis: Asset relationships
from scipy import signal
def plot_cross_correlation(x, y, max_lags=50, title=""):
    """Plot the cross-correlation between two time series"""
    # Calculate cross-correlation
    corr = signal.correlate(x - np.mean(x), y - np.mean(y), mode='full')
    lags = signal.correlation_lags(len(x), len(y), mode='full')
    # Normalize to correlation scale
    corr = corr / (np.std(x) * np.std(y) * len(x))
    # Focus on a reasonable lag range
    center = len(lags) // 2
    start_idx = center - max_lags
    end_idx = center + max_lags + 1
    lags_subset = lags[start_idx:end_idx]
    corr_subset = corr[start_idx:end_idx]
    # Plot
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    # Time series
    ax1.plot(x.index, x.values, 'b-', alpha=0.7, label='Series X')
    ax1.plot(y.index, y.values, 'r-', alpha=0.7, label='Series Y')
    ax1.set_title(f'Time Series Comparison - {title}')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    # Cross-correlation
    ax2.stem(lags_subset, corr_subset, linefmt='g-', markerfmt='go', basefmt='r-')
    ax2.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    conf_bound = 1.96 / np.sqrt(len(x))
    ax2.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax2.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax2.set_title(f'Cross-Correlation Function - {title}')
    ax2.set_xlabel('Lag')
    ax2.set_ylabel('Cross-Correlation')
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    plt.tight_layout()
    plt.show()
    # Find the maximum correlation
    max_corr_idx = np.argmax(np.abs(corr_subset))
    max_corr_lag = lags_subset[max_corr_idx]
    max_corr_value = corr_subset[max_corr_idx]
    print(f"Maximum cross-correlation: {max_corr_value:.4f} at lag {max_corr_lag}")
    return corr_subset, lags_subset
# Create related time series
np.random.seed(123)
n_points = 200
# Series X: Base signal
t = np.arange(n_points)
x = 10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 50) + np.random.normal(0, 2, n_points)
# Series Y: Lagged version of X with some noise
lag = 10
y = np.roll(x, lag) + np.random.normal(0, 3, n_points)
y[:lag] = x[:lag] # Handle beginning
# Create DataFrame
cross_corr_data = pd.DataFrame({
    'X': x,
    'Y': y
}, index=pd.date_range('2020-01-01', periods=n_points, freq='D'))
# Plot cross-correlation
print("=== Cross-Correlation Analysis ===")
print(f"Series Y is Series X shifted by {lag} periods with added noise")
corr_values, lag_values = plot_cross_correlation(cross_corr_data['X'], cross_corr_data['Y'], title="Lagged Relationship")
# Real-world example: Temperature vs Energy consumption
print("\n=== Real-world Example: Temperature vs Energy Consumption ===")
# Simulate temperature (seasonal) and energy consumption (lagged response)
temp = 20 + 10 * np.sin(2 * np.pi * t / 365) + np.random.normal(0, 2, n_points)
energy_lag = 7 # Energy consumption lags temperature by about a week
energy = 100 + 0.8 * np.roll(temp, energy_lag) + np.random.normal(0, 5, n_points)
energy[:energy_lag] = 100 + 0.8 * temp[:energy_lag]
weather_data = pd.DataFrame({
    'temperature': temp,
    'energy_consumption': energy
}, index=pd.date_range('2020-01-01', periods=n_points, freq='D'))
corr_weather, lags_weather = plot_cross_correlation(weather_data['temperature'],
                                                    weather_data['energy_consumption'],
                                                    title="Temperature vs Energy Consumption")
Key Takeaways
Time series components: Trend, seasonality, cycles, and noise
Stationarity: Critical for most statistical methods
Transformations: Differencing, logging, seasonal adjustment
Autocorrelation: ACF shows all correlations, PACF shows direct correlations
Decomposition: Break down series into interpretable components
Cross-correlation: Find relationships between different time series
Diagnostic Checklist
Before modeling any time series, ask:
Is the series stationary?
What are the main components (trend, seasonality)?
How strong is the autocorrelation?
Are there missing values or outliers?
What's the appropriate forecasting horizon?
What evaluation metrics matter for this use case?
Next Steps
Now that you understand time series fundamentals, you're ready for:
Classical statistical methods (ARIMA, exponential smoothing)
Facebook Prophet for automated forecasting
Deep learning approaches (LSTM, Transformer)
Real-world applications and deployment
Pro Tips
Always visualize first: Plot the data before any analysis
Check stationarity: Most methods assume stationarity
Domain knowledge matters: Understand the business context
Start simple: Don't jump to complex models immediately
Validate thoroughly: Use proper cross-validation for time series
Monitor assumptions: Check if model assumptions hold
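The validation tip deserves a concrete illustration: ordinary shuffled k-fold leaks future information into the training set, so time series need splits where every test block lies strictly after its training data. A minimal expanding-window splitter (a sketch; `expanding_window_splits` and its parameters are illustrative, not a standard API):

```python
import numpy as np

def expanding_window_splits(n_obs, n_splits, min_train=50):
    """Yield (train_idx, test_idx) pairs that never let the model see the future."""
    fold_size = (n_obs - min_train) // n_splits
    for i in range(n_splits):
        train_end = min_train + i * fold_size
        test_end = min(train_end + fold_size, n_obs)
        yield np.arange(train_end), np.arange(train_end, test_end)

# Each fold trains on the past and tests on the block immediately after it
for train_idx, test_idx in expanding_window_splits(200, n_splits=3):
    print(f"train: 0..{train_idx[-1]}, test: {test_idx[0]}..{test_idx[-1]}")
```

scikit-learn's `TimeSeriesSplit` implements the same idea if that dependency is available.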
Further Reading
"Forecasting: Principles and Practice" by Hyndman & Athanasopoulos
"Time Series Analysis and Its Applications" by Shumway & Stoffer
"Automatic Time Series Forecasting: The forecast Package for R" by Hyndman & Khandakar
Ready to dive into classical forecasting methods? Let's move to the next notebook!