01: Time Series Fundamentals

"Time is nature's way of keeping everything from happening at once." - Woody Allen

Welcome to the world of time series analysis! This notebook covers the fundamental concepts that form the foundation of all time series forecasting methods.

🎯 Learning Objectives

By the end of this notebook, you'll understand:

  • What makes time series data special

  • Components of time series (trend, seasonality, noise)

  • Stationarity and why it matters

  • Autocorrelation and cross-correlation

  • Time series decomposition techniques

📊 What is Time Series Data?

Time series data is a sequence of data points collected or recorded at regular time intervals. Unlike cross-sectional data, time series data has a temporal ordering that carries important information.

Key Characteristics:

  • Temporal ordering: Observations are ordered by time

  • Dependency: Future values often depend on past values

  • Structured patterns: Trend, seasonality, and cycles, mixed with noise

  • Non-stationarity: Statistical properties may change over time
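These properties are easy to see in pandas. A minimal sketch with an invented 5-day series (values are illustrative only), where `shift(1)` turns the temporal dependency into a usable lagged feature:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-day series (values invented for illustration)
idx = pd.date_range('2024-01-01', periods=5, freq='D')
s = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0], index=idx)

# Temporal ordering lets us build lagged features:
# yesterday's value becomes a predictor for today's
lagged = pd.DataFrame({'today': s, 'yesterday': s.shift(1)})
print(lagged)
```

Shuffling the rows of `lagged` would destroy exactly the information that makes time series modeling possible.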

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose

# Plot styling
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12

# Set random seed for reproducibility
np.random.seed(42)

print("Time Series Analysis Libraries Loaded!")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")

๐Ÿ” Time Series Componentsยถ

Time series can be decomposed into four main components:

  1. Trend: Long-term movement (increasing/decreasing)

  2. Seasonality: Regular patterns that repeat over time

  3. Cycles: Long-term oscillations not tied to calendar

  4. Noise: Random fluctuations

Mathematical Model:

\[y_t = T_t + S_t + C_t + \epsilon_t\]

Where:

  • \(y_t\): Observed value at time t

  • \(T_t\): Trend component

  • \(S_t\): Seasonal component

  • \(C_t\): Cyclical component

  • \(\epsilon_t\): Random noise

def generate_time_series(n_points=365, trend_slope=0.1, seasonal_amplitude=10, noise_std=2):
    """Generate synthetic time series with trend, seasonality, and noise"""
    
    # Time index
    t = np.arange(n_points)
    
    # Trend component
    trend = trend_slope * t
    
    # Seasonal component (yearly cycle)
    seasonality = seasonal_amplitude * np.sin(2 * np.pi * t / 365)
    
    # Add monthly seasonality
    monthly_seasonal = 3 * np.sin(2 * np.pi * t / 30)
    
    # Cyclical component (business cycle)
    cycle = 5 * np.sin(2 * np.pi * t / 180)  # ~6 month cycle
    
    # Random noise
    noise = np.random.normal(0, noise_std, n_points)
    
    # Combine components
    y = 50 + trend + seasonality + monthly_seasonal + cycle + noise
    
    # Create DataFrame
    dates = pd.date_range('2020-01-01', periods=n_points, freq='D')
    df = pd.DataFrame({
        'date': dates,
        'value': y,
        'trend': 50 + trend,
        'seasonality': seasonality + monthly_seasonal,
        'cycle': cycle,
        'noise': noise
    })
    
    return df.set_index('date')

# Generate sample time series
ts_data = generate_time_series()

# Plot components
fig, axes = plt.subplots(3, 2, figsize=(15, 12))

# Original time series
axes[0,0].plot(ts_data.index, ts_data['value'], 'b-', linewidth=1.5)
axes[0,0].set_title('Original Time Series')
axes[0,0].grid(True, alpha=0.3)

# Individual components
axes[0,1].plot(ts_data.index, ts_data['trend'], 'r-', label='Trend')
axes[0,1].set_title('Trend Component')
axes[0,1].grid(True, alpha=0.3)

axes[1,0].plot(ts_data.index, ts_data['seasonality'], 'g-', label='Seasonality')
axes[1,0].set_title('Seasonal Component')
axes[1,0].grid(True, alpha=0.3)

axes[1,1].plot(ts_data.index, ts_data['cycle'], 'orange', label='Cycle')
axes[1,1].set_title('Cyclical Component')
axes[1,1].grid(True, alpha=0.3)

axes[2,0].plot(ts_data.index, ts_data['noise'], 'purple', alpha=0.7, label='Noise')
axes[2,0].set_title('Noise Component')
axes[2,0].grid(True, alpha=0.3)

# Component breakdown
axes[2,1].plot(ts_data.index, ts_data['trend'], 'r-', alpha=0.7, label='Trend')
axes[2,1].plot(ts_data.index, ts_data['trend'] + ts_data['seasonality'], 'g-', alpha=0.7, label='Trend + Seasonality')
axes[2,1].plot(ts_data.index, ts_data['value'], 'b-', alpha=0.7, label='Full Series')
axes[2,1].set_title('Component Accumulation')
axes[2,1].legend()
axes[2,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Time Series Components:")
print(f"Mean: {ts_data['value'].mean():.2f}")
print(f"Std: {ts_data['value'].std():.2f}")
print("Trend slope: 0.1 per day")
print("Seasonal amplitude: 10.0")
print("Noise std: 2.0")

📈 Stationarity

Stationarity is a crucial concept in time series analysis. A stationary time series has statistical properties that do not change over time.

Types of Stationarity:

  1. Strict Stationarity: Joint distribution of any set of observations is invariant to time shifts

  2. Weak Stationarity (Covariance Stationarity):

    • Constant mean: \(E[y_t] = \mu\) for all t

    • Constant variance: \(Var(y_t) = \sigma^2\) for all t

    • Constant autocovariance: \(Cov(y_t, y_{t+k}) = \gamma_k\) for all t
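To make the weak-stationarity conditions concrete, compare white noise (constant variance) with a random walk, where \(Var(y_t) = t\sigma^2\) grows without bound. A small Monte Carlo sketch, independent of the notebook's data:

```python
import numpy as np

rng = np.random.default_rng(0)
reps, n = 2000, 500

# White noise e_t ~ N(0, 1): mean and variance are constant in t (stationary)
noise = rng.normal(0, 1, size=(reps, n))

# Random walk y_t = y_{t-1} + e_t: variance accumulates, Var(y_t) = t * sigma^2
walk = noise.cumsum(axis=1)

for t in (50, 450):
    print(f"t={t}: noise var ~ {noise[:, t - 1].var():.2f}, "
          f"walk var ~ {walk[:, t - 1].var():.1f}")
```

The white-noise variance stays near 1 at both time points, while the random walk's variance scales roughly linearly with t, violating the constant-variance condition.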

Why Stationarity Matters:

  • Many statistical methods assume stationarity

  • Non-stationary data can lead to spurious regressions

  • Stationary data is easier to model and forecast

def check_stationarity(timeseries, window=30):
    """Check stationarity using rolling statistics and Augmented Dickey-Fuller test"""
    
    # Rolling statistics
    rolling_mean = timeseries.rolling(window=window).mean()
    rolling_std = timeseries.rolling(window=window).std()
    
    # Plot rolling statistics
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
    ax1.plot(timeseries.index, timeseries.values, label='Original')
    ax1.plot(rolling_mean.index, rolling_mean.values, label=f'Rolling Mean ({window} periods)')
    ax1.set_title('Rolling Mean')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    ax2.plot(timeseries.index, timeseries.values, label='Original')
    ax2.plot(rolling_std.index, rolling_std.values, label=f'Rolling Std ({window} periods)', color='orange')
    ax2.set_title('Rolling Standard Deviation')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Augmented Dickey-Fuller test
    print('Augmented Dickey-Fuller Test:')
    adf_result = adfuller(timeseries.dropna())
    
    print(f'ADF Statistic: {adf_result[0]:.4f}')
    print(f'p-value: {adf_result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in adf_result[4].items():
        print(f'\t{key}: {value:.4f}')
    
    if adf_result[1] < 0.05:
        print("\n✅ Series is stationary (reject null hypothesis)")
    else:
        print("\n❌ Series is non-stationary (fail to reject null hypothesis)")
    
    return adf_result

# Test stationarity on different series
print("=== Testing Stationarity on Original Series ===")
adf_original = check_stationarity(ts_data['value'])

print("\n=== Testing Stationarity on Trend Component ===")
adf_trend = check_stationarity(ts_data['trend'])

print("\n=== Testing Stationarity on Noise Component ===")
adf_noise = check_stationarity(ts_data['noise'])

🔄 Making Time Series Stationary

Non-stationary time series can often be made stationary through transformations:

Common Transformations:

  1. Differencing: Subtract consecutive observations

    • First difference: \(\nabla y_t = y_t - y_{t-1}\)

    • Second difference: \(\nabla^2 y_t = \nabla y_t - \nabla y_{t-1}\)

  2. Log transformation: Stabilize variance

    • \(y_t' = \log(y_t)\)

  3. Seasonal differencing: Remove seasonal patterns

    • \(y_t' = y_t - y_{t-s}\) where s is seasonal period

  4. Detrending: Remove trend component

def make_stationary(timeseries, method='difference', lag=1):
    """Apply transformations to make time series stationary"""
    
    if method == 'difference':
        # Regular differencing
        stationary = timeseries.diff(lag).dropna()
        title = f'First Difference (lag={lag})'
    elif method == 'log':
        # Log transformation
        stationary = np.log(timeseries).diff().dropna()
        title = 'Log Difference'
    elif method == 'seasonal_diff':
        # Seasonal differencing (assuming daily data with weekly seasonality)
        seasonal_lag = 7
        stationary = timeseries.diff(seasonal_lag).dropna()
        title = f'Seasonal Difference (lag={seasonal_lag})'
    else:
        stationary = timeseries
        title = 'Original'
    
    return stationary, title

# Apply different transformations
transformations = [
    ('difference', 1),
    ('seasonal_diff', 7),
    ('log', 1)
]

fig, axes = plt.subplots(len(transformations), 2, figsize=(15, 12))

for i, (method, lag) in enumerate(transformations):
    # Apply transformation
    stationary, title = make_stationary(ts_data['value'], method, lag)
    
    # Plot transformed series
    axes[i, 0].plot(stationary.index, stationary.values, 'b-', alpha=0.7)
    axes[i, 0].set_title(f'{title}')
    axes[i, 0].grid(True, alpha=0.3)
    axes[i, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    
    # Check stationarity
    adf_result = adfuller(stationary)
    p_value = adf_result[1]
    
    # Plot ACF
    lag_acf = acf(stationary, nlags=50)
    axes[i, 1].plot(lag_acf, 'o-')
    axes[i, 1].axhline(y=0, linestyle='--', color='gray')
    axes[i, 1].axhline(y=1.96/np.sqrt(len(stationary)), linestyle='--', color='red', alpha=0.7)
    axes[i, 1].axhline(y=-1.96/np.sqrt(len(stationary)), linestyle='--', color='red', alpha=0.7)
    axes[i, 1].set_title(f'ACF - {title} (p-value: {p_value:.4f})')
    axes[i, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare stationarity tests
print("Stationarity Test Results:")
print("-" * 50)

for method, lag in transformations:
    stationary, title = make_stationary(ts_data['value'], method, lag)
    adf_result = adfuller(stationary)
    p_value = adf_result[1]
    is_stationary = "✅ Stationary" if p_value < 0.05 else "❌ Non-stationary"
    print(f"{title:20} | p-value: {p_value:.4f} | {is_stationary}")

🔗 Autocorrelation

Autocorrelation measures the correlation between a time series and its lagged versions.

Autocorrelation Function (ACF):

\[\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{Cov(y_t, y_{t-k})}{Var(y_t)}\]

Partial Autocorrelation Function (PACF):

Direct correlation between \(y_t\) and \(y_{t-k}\), removing effects of intermediate lags.

Interpretation:

  • ACF: Shows all correlations up to lag k

  • PACF: Shows direct correlation at lag k

  • Significance bounds: ±1.96/√n (95% confidence)

  • Patterns: Slow decay suggests non-stationarity, sharp cutoff suggests AR process
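The ACF formula above is easy to verify directly. A sketch that simulates an AR(1) process with coefficient 0.7 (so the true \(\rho_1\) is 0.7) and applies the sample estimator by hand:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# AR(1): y_t = 0.7 * y_{t-1} + e_t, whose true lag-1 autocorrelation is 0.7
y = np.zeros(n)
e = rng.normal(0, 1, n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + e[t]

# Sample estimator of rho_1 = gamma_1 / gamma_0: demeaned lagged products
# over the demeaned sum of squares (the same convention statsmodels uses)
y_c = y - y.mean()
rho1 = (y_c[1:] * y_c[:-1]).sum() / (y_c ** 2).sum()
print(f"estimated lag-1 autocorrelation: {rho1:.3f}")  # close to 0.7
```

The estimate lands near the true coefficient; the `acf` plots below compute the same quantity for every lag at once.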

def plot_acf_pacf(timeseries, lags=50, title=""):
    """Plot ACF and PACF with significance bounds"""
    
    # Calculate ACF and PACF
    acf_values = acf(timeseries, nlags=lags)
    pacf_values = pacf(timeseries, nlags=lags)
    
    # Confidence bounds
    n = len(timeseries)
    conf_bound = 1.96 / np.sqrt(n)
    
    # Create plots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
    # ACF plot
    ax1.stem(range(len(acf_values)), acf_values, linefmt='b-', markerfmt='bo', basefmt='r-')
    ax1.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    ax1.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax1.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax1.set_title(f'Autocorrelation Function (ACF) - {title}')
    ax1.set_xlabel('Lag')
    ax1.set_ylabel('Autocorrelation')
    ax1.grid(True, alpha=0.3)
    ax1.legend()
    
    # PACF plot
    ax2.stem(range(len(pacf_values)), pacf_values, linefmt='g-', markerfmt='go', basefmt='r-')
    ax2.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    ax2.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax2.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax2.set_title(f'Partial Autocorrelation Function (PACF) - {title}')
    ax2.set_xlabel('Lag')
    ax2.set_ylabel('Partial Autocorrelation')
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    
    plt.tight_layout()
    plt.show()
    
    return acf_values, pacf_values

# Analyze autocorrelation for different series
print("=== ACF/PACF for Original Series ===")
acf_orig, pacf_orig = plot_acf_pacf(ts_data['value'], title="Original Series")

print("\n=== ACF/PACF for Stationary Series (First Difference) ===")
stationary_series, _ = make_stationary(ts_data['value'], 'difference', 1)
acf_stat, pacf_stat = plot_acf_pacf(stationary_series, title="Stationary Series")

print("\n=== ACF/PACF for Noise Component ===")
acf_noise, pacf_noise = plot_acf_pacf(ts_data['noise'].dropna(), title="Noise Component")

🧩 Time Series Decomposition

Decomposition breaks down a time series into its constituent components. There are two main approaches:

1. Additive Decomposition:

\[y_t = T_t + S_t + R_t\]

2. Multiplicative Decomposition:

\[y_t = T_t \times S_t \times R_t\]
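The two models are linked by a logarithm: if \(y_t = T_t \times S_t \times R_t\) with positive components, then \(\log y_t = \log T_t + \log S_t + \log R_t\), so an additive decomposition of the logged series recovers a multiplicative structure. A quick numeric check with synthetic components (invented for illustration, not the notebook's `ts_data`):

```python
import numpy as np

t = np.arange(200)
trend = 100 + 0.5 * t                                    # T_t > 0
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 50)          # S_t oscillates around 1
resid = np.exp(np.random.default_rng(2).normal(0, 0.01, 200))  # R_t near 1

y_mult = trend * seasonal * resid                        # multiplicative model

# Taking logs turns the product into a sum of component logs
lhs = np.log(y_mult)
rhs = np.log(trend) + np.log(seasonal) + np.log(resid)
print(np.allclose(lhs, rhs))  # True
```

This is why log-transforming before an additive decomposition is a common alternative to calling a multiplicative model directly.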

Methods:

  • Classical: Moving averages for trend, seasonal indices

  • STL: Seasonal and Trend decomposition using LOESS

  • X-13ARIMA-SEATS: Advanced seasonal adjustment

Applications:

  • Remove seasonality for forecasting

  • Identify unusual patterns

  • Understand component contributions

def decompose_time_series(timeseries, model='additive', period=None):
    """Decompose time series into trend, seasonal, and residual components"""
    
    if period is None:
        # Heuristic: seasonal_decompose needs at least two full cycles of the period
        if len(timeseries) >= 2 * 365:
            period = 365  # Daily data with yearly seasonality
        elif len(timeseries) >= 2 * 7:
            period = 7    # Weekly seasonality
        else:
            period = 2    # Minimal fallback for very short series
    
    # Perform decomposition
    decomposition = seasonal_decompose(timeseries, model=model, period=period)
    
    # Plot components
    fig, axes = plt.subplots(4, 1, figsize=(15, 12))
    
    axes[0].plot(timeseries.index, timeseries.values, 'b-', linewidth=1.5)
    axes[0].set_title(f'Original Time Series (Model: {model})')
    axes[0].grid(True, alpha=0.3)
    
    axes[1].plot(decomposition.trend.index, decomposition.trend.values, 'r-', linewidth=2)
    axes[1].set_title('Trend Component')
    axes[1].grid(True, alpha=0.3)
    
    axes[2].plot(decomposition.seasonal.index, decomposition.seasonal.values, 'g-', linewidth=2)
    axes[2].set_title('Seasonal Component')
    axes[2].grid(True, alpha=0.3)
    
    axes[3].plot(decomposition.resid.index, decomposition.resid.values, 'purple', alpha=0.7)
    axes[3].set_title('Residual Component')
    axes[3].grid(True, alpha=0.3)
    axes[3].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Print statistics
    print(f"\nDecomposition Statistics ({model} model):")
    print("-" * 40)
    print(f"Trend component range: [{decomposition.trend.min():.2f}, {decomposition.trend.max():.2f}]")
    print(f"Seasonal component range: [{decomposition.seasonal.min():.2f}, {decomposition.seasonal.max():.2f}]")
    print(f"Residual std: {decomposition.resid.std():.4f}")
    print(f"Residual autocorrelation (lag 1): {decomposition.resid.autocorr(lag=1):.4f}")
    
    return decomposition

# Decompose the time series (use two years of data: seasonal_decompose
# requires at least two full cycles of the chosen period, so period=365
# needs 730+ observations)
ts_long = generate_time_series(n_points=730)

print("=== Additive Decomposition ===")
decomp_add = decompose_time_series(ts_long['value'], model='additive', period=365)

# Create multiplicative series for comparison
multiplicative_series = ts_long['value'] + 100  # Ensure positive values
multiplicative_series = multiplicative_series * (1 + 0.1 * np.sin(2 * np.pi * np.arange(len(multiplicative_series)) / 365))

print("\n=== Multiplicative Decomposition ===")
decomp_mult = decompose_time_series(multiplicative_series, model='multiplicative', period=365)

📊 Cross-Correlation

Cross-correlation measures the relationship between two time series at different lags.

Cross-Correlation Function (CCF):

\[\rho_{xy}(k) = \frac{\gamma_{xy}(k)}{\sqrt{\gamma_{xx}(0)\gamma_{yy}(0)}}\]

Applications:

  • Leading indicators: Find variables that predict others

  • Causality analysis: Direction of influence

  • Feature selection: Identify relevant predictors

  • Portfolio analysis: Asset relationships

from scipy import signal

def plot_cross_correlation(x, y, max_lags=50, title=""):
    """Plot cross-correlation between two time series"""
    
    # Calculate cross-correlation
    corr = signal.correlate(x - np.mean(x), y - np.mean(y), mode='full')
    lags = signal.correlation_lags(len(x), len(y), mode='full')
    
    # Normalize
    corr = corr / (np.std(x) * np.std(y) * len(x))
    
    # Focus on reasonable lag range
    center = len(lags) // 2
    start_idx = center - max_lags
    end_idx = center + max_lags + 1
    
    lags_subset = lags[start_idx:end_idx]
    corr_subset = corr[start_idx:end_idx]
    
    # Plot
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
    # Time series
    ax1.plot(x.index, x.values, 'b-', alpha=0.7, label='Series X')
    ax1.plot(y.index, y.values, 'r-', alpha=0.7, label='Series Y')
    ax1.set_title(f'Time Series Comparison - {title}')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Cross-correlation
    ax2.stem(lags_subset, corr_subset, linefmt='g-', markerfmt='go', basefmt='r-')
    ax2.axhline(y=0, linestyle='-', color='gray', alpha=0.7)
    conf_bound = 1.96 / np.sqrt(len(x))
    ax2.axhline(y=conf_bound, linestyle='--', color='red', alpha=0.7, label=f'95% conf. bound (±{conf_bound:.3f})')
    ax2.axhline(y=-conf_bound, linestyle='--', color='red', alpha=0.7)
    ax2.set_title(f'Cross-Correlation Function - {title}')
    ax2.set_xlabel('Lag')
    ax2.set_ylabel('Cross-Correlation')
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    
    plt.tight_layout()
    plt.show()
    
    # Find maximum correlation
    max_corr_idx = np.argmax(np.abs(corr_subset))
    max_corr_lag = lags_subset[max_corr_idx]
    max_corr_value = corr_subset[max_corr_idx]
    
    print(f"Maximum cross-correlation: {max_corr_value:.4f} at lag {max_corr_lag}")
    
    return corr_subset, lags_subset

# Create related time series
np.random.seed(123)
n_points = 200

# Series X: Base signal
t = np.arange(n_points)
x = 10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 50) + np.random.normal(0, 2, n_points)

# Series Y: Lagged version of X with some noise
lag = 10
y = np.roll(x, lag) + np.random.normal(0, 3, n_points)
y[:lag] = x[:lag]  # Handle beginning

# Create DataFrame
cross_corr_data = pd.DataFrame({
    'X': x,
    'Y': y
}, index=pd.date_range('2020-01-01', periods=n_points, freq='D'))

# Plot cross-correlation
print("=== Cross-Correlation Analysis ===")
print(f"Series Y is Series X shifted by {lag} periods with added noise")
corr_values, lag_values = plot_cross_correlation(cross_corr_data['X'], cross_corr_data['Y'], 
                                                title="Lagged Relationship")

# Real-world example: Temperature vs Energy consumption
print("\n=== Real-world Example: Temperature vs Energy Consumption ===")
# Simulate temperature (seasonal) and energy consumption (lagged response)
temp = 20 + 10 * np.sin(2 * np.pi * t / 365) + np.random.normal(0, 2, n_points)
energy_lag = 7  # Energy consumption lags temperature by about a week
energy = 100 + 0.8 * np.roll(temp, energy_lag) + np.random.normal(0, 5, n_points)
energy[:energy_lag] = 100 + 0.8 * temp[:energy_lag]

weather_data = pd.DataFrame({
    'temperature': temp,
    'energy_consumption': energy
}, index=pd.date_range('2020-01-01', periods=n_points, freq='D'))

corr_weather, lags_weather = plot_cross_correlation(weather_data['temperature'], 
                                                   weather_data['energy_consumption'],
                                                   title="Temperature vs Energy Consumption")

🎯 Key Takeaways

  1. Time series components: Trend, seasonality, cycles, and noise

  2. Stationarity: Critical for most statistical methods

  3. Transformations: Differencing, logging, seasonal adjustment

  4. Autocorrelation: ACF shows all correlations, PACF shows direct correlations

  5. Decomposition: Break down series into interpretable components

  6. Cross-correlation: Find relationships between different time series

๐Ÿ” Diagnostic Checklistยถ

Before modeling any time series, ask:

  • ✅ Is the series stationary?

  • ✅ What are the main components (trend, seasonality)?

  • ✅ How strong is the autocorrelation?

  • ✅ Are there missing values or outliers?

  • ✅ What's the appropriate forecasting horizon?

  • ✅ What evaluation metrics matter for this use case?

🚀 Next Steps

Now that you understand time series fundamentals, you're ready for:

  • Classical statistical methods (ARIMA, exponential smoothing)

  • Facebook Prophet for automated forecasting

  • Deep learning approaches (LSTM, Transformer)

  • Real-world applications and deployment

💡 Pro Tips

  1. Always visualize first: Plot the data before any analysis

  2. Check stationarity: Most methods assume stationarity

  3. Domain knowledge matters: Understand the business context

  4. Start simple: Don't jump to complex models immediately

  5. Validate thoroughly: Use proper cross-validation for time series

  6. Monitor assumptions: Check if model assumptions hold

📚 Further Reading

  • "Forecasting: Principles and Practice" by Hyndman & Athanasopoulos

  • "Time Series Analysis and Its Applications" by Shumway & Stoffer

  • "Practical Time Series Forecasting with R" by Shmueli & Lichtendahl

Ready to dive into classical forecasting methods? Let's move to the next notebook! 📈🔮