# Install all required packages
!pip install -q gradio streamlit pycaret flaml pandas plotly scikit-learn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import gradio as gr
import pickle
from pathlib import Path
# Set style
sns.set_style('whitegrid')  # seaborn whitegrid theme for every matplotlib figure below
np.random.seed(42)  # global seed for reproducibility (generate_churn_data re-seeds internally too)
Data Preparation: Building a Realistic Churn Dataset
Customer churn prediction is one of the most common ML applications in industry because retaining an existing customer costs 5-7x less than acquiring a new one. The synthetic dataset below encodes realistic churn dynamics: month-to-month contracts have higher churn (no lock-in), short-tenure customers leave more often (haven't built loyalty), and frequent customer service calls signal dissatisfaction. These rule-based probabilities create a dataset where the signal is learnable but noisy — mimicking real-world conditions where churn depends on observable features plus unobservable factors like competitor offers.
Exploratory Data Analysis (EDA) before modeling reveals which features carry the strongest churn signal, guiding both feature engineering and model interpretation. The visualizations below confirm the expected patterns (contract type, tenure, and service calls are strong predictors), establishing domain-knowledge priors that we can validate against the AutoML model's feature importance rankings later.
def generate_churn_data(n_samples=2000):
    """Build a synthetic customer-churn dataset with a learnable, noisy signal.

    Args:
        n_samples: Number of customer rows to generate.

    Returns:
        pandas.DataFrame with 11 feature columns plus a binary ``churn`` target.
    """
    # Re-seed locally so the output is deterministic regardless of caller state.
    np.random.seed(42)

    # Draw every feature; the order of these RNG calls fixes the random stream.
    frame = pd.DataFrame({
        'tenure': np.random.randint(1, 73, n_samples),  # months with the company
        'monthly_charges': np.random.uniform(20, 120, n_samples),
        'total_charges': np.random.uniform(100, 8000, n_samples),
        'contract': np.random.choice(['Month-to-month', 'One year', 'Two year'], n_samples, p=[0.5, 0.3, 0.2]),
        'payment_method': np.random.choice(['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'], n_samples),
        'internet_service': np.random.choice(['DSL', 'Fiber optic', 'No'], n_samples, p=[0.4, 0.4, 0.2]),
        'tech_support': np.random.choice(['Yes', 'No'], n_samples),
        'online_security': np.random.choice(['Yes', 'No'], n_samples),
        'customer_service_calls': np.random.poisson(2, n_samples),
        'streaming_tv': np.random.choice(['Yes', 'No'], n_samples),
        'paperless_billing': np.random.choice(['Yes', 'No'], n_samples, p=[0.6, 0.4]),
    })

    # Rule-based churn probability: 0.2 baseline plus bumps for known risk factors.
    risk = np.full(n_samples, 0.2)
    risk = np.where(frame['contract'] == 'Month-to-month', risk + 0.3, risk)   # no lock-in
    risk = np.where(frame['tenure'] < 12, risk + 0.25, risk)                   # new customers
    risk = np.where(frame['tech_support'] == 'No', risk + 0.15, risk)
    risk = np.where(frame['customer_service_calls'] > 3, risk + 0.2, risk)     # dissatisfaction signal
    risk = np.where(frame['monthly_charges'] > 80, risk + 0.1, risk)

    # Keep probabilities valid, then sample the binary outcome.
    risk = np.clip(risk, 0, 1)
    frame['churn'] = (np.random.random(n_samples) < risk).astype(int)

    return frame
# Generate dataset and show its class balance before modeling.
df = generate_churn_data(2000)
print(f"Dataset shape: {df.shape}")
print(f"\nChurn distribution:")
print(df['churn'].value_counts(normalize=True))
print(f"\nFirst few rows:")
df.head()  # NOTE: bare expression — renders only as a notebook cell output; no effect as a script
# Save dataset for downstream cells / deployment artifacts.
df.to_csv('customer_churn.csv', index=False)
print("Dataset saved as 'customer_churn.csv'")
# --- Exploratory data analysis: four views of the churn signal ---
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
ax_contract, ax_tenure, ax_charges, ax_calls = axes.ravel()

# Churn rate per contract type, sorted so the riskiest contract sits on top.
contract_churn = df.groupby('contract')['churn'].mean().sort_values()
ax_contract.barh(contract_churn.index, contract_churn.values, color='steelblue')
ax_contract.set_xlabel('Churn Rate')
ax_contract.set_title('Churn Rate by Contract Type')

# Tenure histograms overlaid for churned vs retained customers.
df[df['churn'] == 0]['tenure'].hist(bins=30, ax=ax_tenure, alpha=0.7, label='No Churn', color='green')
df[df['churn'] == 1]['tenure'].hist(bins=30, ax=ax_tenure, alpha=0.7, label='Churn', color='red')
ax_tenure.set_xlabel('Tenure (months)')
ax_tenure.set_ylabel('Count')
ax_tenure.set_title('Tenure Distribution by Churn')
ax_tenure.legend()

# Monthly-charge histograms, same overlay treatment.
df[df['churn'] == 0]['monthly_charges'].hist(bins=30, ax=ax_charges, alpha=0.7, label='No Churn', color='green')
df[df['churn'] == 1]['monthly_charges'].hist(bins=30, ax=ax_charges, alpha=0.7, label='Churn', color='red')
ax_charges.set_xlabel('Monthly Charges ($)')
ax_charges.set_ylabel('Count')
ax_charges.set_title('Monthly Charges Distribution')
ax_charges.legend()

# Churn rate as a function of the number of customer-service calls.
service_churn = df.groupby('customer_service_calls')['churn'].mean()
ax_calls.plot(service_churn.index, service_churn.values, marker='o', color='coral', linewidth=2)
ax_calls.set_xlabel('Customer Service Calls')
ax_calls.set_ylabel('Churn Rate')
ax_calls.set_title('Churn Rate vs Service Calls')
ax_calls.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Model Training with AutoML: From Data to Tuned Model in Minutes
PyCaret's classification workflow automates the full model development pipeline. The setup() call handles one-hot encoding of categorical features (contract, payment method, internet service), normalization of numerical features, and train/test splitting. Setting fix_imbalance=True applies SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic examples of the minority class (churned customers), preventing the model from achieving high accuracy by simply predicting "no churn" for everyone.
The compare_models -> tune_model -> finalize_model workflow represents PyCaret's three-stage pipeline: (1) broad search across 15+ algorithms to identify the best model family, (2) focused hyperparameter optimization on the winner using Bayesian search, and (3) retraining on the full dataset (train + test) for maximum predictive power at deployment. The finalize_model() step is critical and often forgotten — without it, the deployed model was trained on only 80% of available data.
from pycaret.classification import *

# --- AutoML pipeline: setup -> compare -> tune -> finalize -> save ---
# NOTE: the "✅" status strings below were mojibake-split across lines in the
# exported source (a SyntaxError); they are restored to single literals here.
print("Setting up PyCaret environment...")
clf_setup = setup(
    data=df,
    target='churn',
    session_id=42,        # reproducible train/test split
    verbose=False,
    normalize=True,       # scale numeric features
    train_size=0.8,
    fix_imbalance=True    # SMOTE oversampling of the minority (churn) class
)
print("\n✅ Setup complete!")

# Broad search across candidate algorithms, ranked by ROC-AUC.
print("Comparing models (this takes a minute)...\n")
best_models = compare_models(n_select=3, sort='AUC')
print("\n✅ Top 3 models identified!")

# Focused hyperparameter search on the single best candidate.
best_model = best_models[0]
print(f"Tuning {best_model}...\n")
tuned_model = tune_model(best_model, optimize='AUC', n_iter=10)
print("\n✅ Model tuned!")

# Interactive evaluation dashboard (renders in the notebook).
print("Evaluating model...\n")
evaluate_model(tuned_model)

# Retrain on the full dataset (train + test) before deployment.
final_model = finalize_model(tuned_model)

# Persist the fitted preprocessing pipeline + model to 'churn_model.pkl'.
save_model(final_model, 'churn_model')
print("\n✅ Model saved as 'churn_model.pkl'")
Build Gradio Interface: Making the Model Accessible
The Gradio interface translates the trained model into a tool that business users (customer success managers, retention teams) can use without writing code. Each input widget maps to a feature in the training data: sliders for continuous values (tenure, charges), dropdowns for categorical values (contract type, payment method). The prediction function constructs a single-row DataFrame matching the training schema, calls predict_model(), and formats the output as an actionable recommendation rather than a raw probability.
UX design for ML interfaces: presenting a churn probability alone (e.g., "73.2%") is less useful than a risk categorization with specific next steps ("HIGH CHURN RISK — offer retention incentives, improve service"). The examples parameter pre-loads two contrasting profiles: a high-risk customer (short tenure, month-to-month, no tech support) and a low-risk customer (long tenure, two-year contract, full support), letting users immediately understand the model's behavior before entering their own data.
# Load the finalized PyCaret pipeline saved during training.
loaded_model = load_model('churn_model')


def predict_churn(tenure, monthly_charges, total_charges, contract,
                  payment_method, internet_service, tech_support,
                  online_security, customer_service_calls,
                  streaming_tv, paperless_billing):
    """
    Predict churn risk for one customer and format an actionable message.

    Parameters mirror the training-data columns; categorical arguments must
    use the exact category strings seen during training.

    Returns:
        str: Markdown-formatted risk verdict plus a recommendation block.
    """
    # Single-row frame whose columns match the training schema exactly.
    input_data = pd.DataFrame([{
        'tenure': tenure,
        'monthly_charges': monthly_charges,
        'total_charges': total_charges,
        'contract': contract,
        'payment_method': payment_method,
        'internet_service': internet_service,
        'tech_support': tech_support,
        'online_security': online_security,
        'customer_service_calls': customer_service_calls,
        'streaming_tv': streaming_tv,
        'paperless_billing': paperless_billing
    }])

    prediction = predict_model(loaded_model, data=input_data)

    # PyCaret appends prediction_label / prediction_score columns.
    churn_prob = prediction['prediction_score'].iloc[0]
    prediction_label = prediction['prediction_label'].iloc[0]

    # The "⚠️"/"✅" literals below were mojibake-broken in the exported source
    # (the LOW-risk one was split across lines — a SyntaxError); restored here.
    if prediction_label == 1:
        result = f"⚠️ **HIGH CHURN RISK** ({churn_prob:.1%})"
        recommendation = """\n\n**Recommended Actions:**
1. Offer retention incentives
2. Improve customer service
3. Consider contract upgrade offers
"""
    else:
        result = f"✅ **LOW CHURN RISK** ({churn_prob:.1%})"
        recommendation = "\n\n**Status:** Customer likely to stay."

    return result + recommendation
# Assemble the Gradio interface: one widget per model feature, in the same
# order as predict_churn's parameters. (Title emoji restored from mojibake.)
demo = gr.Interface(
    fn=predict_churn,
    inputs=[
        gr.Slider(1, 72, value=12, step=1, label="Tenure (months)"),
        gr.Slider(20, 120, value=50, step=5, label="Monthly Charges ($)"),
        gr.Slider(100, 8000, value=1000, step=100, label="Total Charges ($)"),
        gr.Dropdown(['Month-to-month', 'One year', 'Two year'], label="Contract Type", value="Month-to-month"),
        gr.Dropdown(['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'], label="Payment Method", value="Electronic check"),
        gr.Dropdown(['DSL', 'Fiber optic', 'No'], label="Internet Service", value="Fiber optic"),
        gr.Dropdown(['Yes', 'No'], label="Tech Support", value="No"),
        gr.Dropdown(['Yes', 'No'], label="Online Security", value="No"),
        gr.Slider(0, 10, value=2, step=1, label="Customer Service Calls"),
        gr.Dropdown(['Yes', 'No'], label="Streaming TV", value="Yes"),
        gr.Dropdown(['Yes', 'No'], label="Paperless Billing", value="Yes")
    ],
    outputs=gr.Markdown(label="Prediction"),
    title="🎯 Customer Churn Predictor",
    description="Predict the likelihood of customer churn based on their profile",
    # Two contrasting presets: a high-risk and a low-risk customer profile.
    examples=[
        [6, 85, 500, 'Month-to-month', 'Electronic check', 'Fiber optic', 'No', 'No', 4, 'Yes', 'Yes'],
        [48, 45, 2000, 'Two year', 'Bank transfer', 'DSL', 'Yes', 'Yes', 1, 'No', 'No']
    ],
    theme=gr.themes.Soft()
)

# Launch the app (blocks in a script; renders inline in a notebook).
demo.launch()
Create Deployable App Files: From Notebook to Production
Transitioning from a notebook prototype to a deployable application requires extracting the model loading, prediction, and UI code into standalone files. The app.py below uses gr.Blocks for a two-column layout (customer info on the left, service details on the right) that is more organized than a single vertical stack of inputs. The model file (churn_model.pkl) is copied into the app directory so the entire folder can be deployed as a self-contained unit to Hugging Face Spaces.
The Blocks layout pattern: gr.Row() places components side by side, gr.Column() stacks them vertically within a row, and gr.Examples provides clickable preset inputs. The variant="primary" argument on the predict button gives it visual prominence. This structured layout transforms a technical ML output into a professional-looking application suitable for stakeholder demos and internal tools.
!mkdir -p churn_predictor_app
%%writefile churn_predictor_app/app.py
import gradio as gr
import pandas as pd
from pycaret.classification import load_model, predict_model
# Load model
model = load_model('churn_model')
def predict_churn(tenure, monthly_charges, total_charges, contract,
payment_method, internet_service, tech_support,
online_security, customer_service_calls,
streaming_tv, paperless_billing):
"""
Predict customer churn probability.
"""
# Create input dataframe
input_data = pd.DataFrame([{
'tenure': tenure,
'monthly_charges': monthly_charges,
'total_charges': total_charges,
'contract': contract,
'payment_method': payment_method,
'internet_service': internet_service,
'tech_support': tech_support,
'online_security': online_security,
'customer_service_calls': customer_service_calls,
'streaming_tv': streaming_tv,
'paperless_billing': paperless_billing
}])
# Make prediction
prediction = predict_model(model, data=input_data)
# Get probability
churn_prob = prediction['prediction_score'].iloc[0]
prediction_label = prediction['prediction_label'].iloc[0]
# Format output
if prediction_label == 1:
result = f"β οΈ **HIGH CHURN RISK** ({churn_prob:.1%})"
recommendation = """\n\n**Recommended Actions:**
1. Offer retention incentives
2. Improve customer service experience
3. Consider contract upgrade offers
4. Provide tech support access
"""
else:
result = f"β
**LOW CHURN RISK** ({churn_prob:.1%})"
recommendation = "\n\n**Status:** Customer likely to stay. Continue excellent service!"
return result + recommendation
# Create interface
with gr.Blocks(theme=gr.themes.Soft()) as demo:
gr.Markdown("# π― Customer Churn Prediction System")
gr.Markdown("Predict customer churn risk and get actionable recommendations")
with gr.Row():
with gr.Column():
gr.Markdown("### Customer Information")
tenure = gr.Slider(1, 72, value=12, step=1, label="Tenure (months)")
monthly_charges = gr.Slider(20, 120, value=50, step=5, label="Monthly Charges ($)")
total_charges = gr.Slider(100, 8000, value=1000, step=100, label="Total Charges ($)")
customer_service_calls = gr.Slider(0, 10, value=2, step=1, label="Customer Service Calls")
with gr.Column():
gr.Markdown("### Service Details")
contract = gr.Dropdown(['Month-to-month', 'One year', 'Two year'],
label="Contract Type", value="Month-to-month")
payment_method = gr.Dropdown(['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'],
label="Payment Method", value="Electronic check")
internet_service = gr.Dropdown(['DSL', 'Fiber optic', 'No'],
label="Internet Service", value="Fiber optic")
tech_support = gr.Dropdown(['Yes', 'No'], label="Tech Support", value="No")
online_security = gr.Dropdown(['Yes', 'No'], label="Online Security", value="No")
streaming_tv = gr.Dropdown(['Yes', 'No'], label="Streaming TV", value="Yes")
paperless_billing = gr.Dropdown(['Yes', 'No'], label="Paperless Billing", value="Yes")
predict_btn = gr.Button("Predict Churn Risk", variant="primary")
output = gr.Markdown(label="Prediction Result")
# Examples
gr.Examples(
examples=[
[6, 85, 500, 'Month-to-month', 'Electronic check', 'Fiber optic', 'No', 'No', 4, 'Yes', 'Yes'],
[48, 45, 2000, 'Two year', 'Bank transfer', 'DSL', 'Yes', 'Yes', 1, 'No', 'No'],
[12, 95, 1200, 'One year', 'Credit card', 'Fiber optic', 'No', 'Yes', 3, 'Yes', 'Yes']
],
inputs=[tenure, monthly_charges, total_charges, contract, payment_method,
internet_service, tech_support, online_security, customer_service_calls,
streaming_tv, paperless_billing]
)
predict_btn.click(
fn=predict_churn,
inputs=[tenure, monthly_charges, total_charges, contract, payment_method,
internet_service, tech_support, online_security, customer_service_calls,
streaming_tv, paperless_billing],
outputs=output
)
gr.Markdown("""\n---\n**Model Info:** Trained on 2000+ customer records using AutoML (PyCaret)""")
if __name__ == "__main__":
demo.launch()
%%writefile churn_predictor_app/requirements.txt
gradio
pycaret
pandas
scikit-learn
%%writefile churn_predictor_app/README.md
---
title: Customer Churn Predictor
emoji: 🎯
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---
# Customer Churn Prediction System
Predict customer churn risk using machine learning and get actionable recommendations.
## Features
- Real-time churn prediction
- Risk assessment with confidence scores
- Actionable retention recommendations
- Built with AutoML (PyCaret)
## Usage
1. Enter customer information
2. Click "Predict Churn Risk"
3. Review risk level and recommendations
## Model
- Trained on 2000+ customer records
- AutoML model selection and tuning
- Optimized for ROC-AUC
## Deployment
Ready to deploy on Hugging Face Spaces!
# Copy the trained model next to app.py so the whole folder deploys as one unit.
# (The "✅" status string was mojibake-split across lines — a SyntaxError —
# in the exported source; restored to a single literal here.)
import shutil

shutil.copy('churn_model.pkl', 'churn_predictor_app/churn_model.pkl')
print("✅ Model copied to app directory")
print("\nApp is ready for deployment!")
print("\nTo deploy to Hugging Face Spaces:")
print("1. Create a new Space at huggingface.co/new-space")
print("2. Choose Gradio SDK")
print("3. Upload all files from churn_predictor_app/")
print("4. Wait for build to complete")
print("5. Share your app!")
Production Considerations: From Demo to Reliable System
Deploying an ML application to production introduces concerns beyond model accuracy: input validation prevents crashes from unexpected data (null values, out-of-range numbers, wrong types), error handling provides meaningful messages instead of stack traces, and monitoring tracks prediction distributions to detect model drift. A model that was 95% accurate at launch can degrade silently as customer behavior evolves, making continuous monitoring essential.
The production checklist below covers five domains: model quality (test edge cases, document limitations), reliability (validate inputs, catch prediction errors), user experience (fast response times, mobile-friendly layouts), security (rate limiting, no sensitive data exposure), and maintenance (versioning strategy, retraining schedule, performance dashboards). Even for internal tools, addressing these concerns prevents the common failure mode where a successful demo becomes an unreliable "production" system that nobody trusts.
# Print the production-readiness checklist. The original export garbled the
# Unicode markers (✅ / • / ❌ rendered as mojibake, some split onto their own
# lines); the characters are restored here.
print("""
Production Checklist for ML Apps:

✅ Model Development
   • Train on representative data
   • Validate performance metrics
   • Test edge cases
   • Document model limitations

✅ Error Handling
   • Validate all inputs
   • Handle missing values
   • Catch prediction errors
   • Provide meaningful error messages

✅ User Experience
   • Clear input labels
   • Example inputs provided
   • Fast response time (< 2s)
   • Mobile-friendly interface

✅ Security
   • Input validation
   • Rate limiting
   • No sensitive data exposure
   • Secure API endpoints

✅ Monitoring
   • Track prediction distribution
   • Monitor error rates
   • Log user interactions
   • Alert on anomalies

✅ Documentation
   • README with usage
   • Model card with metrics
   • API documentation
   • Troubleshooting guide

✅ Maintenance
   • Versioning strategy
   • Retraining schedule
   • Performance monitoring
   • Feedback collection

Common Pitfalls to Avoid:
❌ No input validation
❌ Slow model loading
❌ No error handling
❌ Overfitting to training data
❌ Ignoring model drift
❌ Poor mobile experience
❌ Unclear predictions
❌ No monitoring
""")
🎯 Key Takeaways
Complete workflow - Data → AutoML → Interface → Deployment
Low-code power - Built entire app with minimal custom code
AutoML baseline - Quick model development with PyCaret
User-friendly UI - Gradio makes complex models accessible
Production-ready - Proper error handling and documentation
Easy deployment - Ready for Hugging Face Spaces
🚀 Project Extensions
Add Streamlit Dashboard
Historical churn trends
Feature importance visualization
Batch predictions
Model performance metrics
Improve Model
Feature engineering
Ensemble methods
Hyperparameter tuning
Cross-validation
Add Features
Batch upload (CSV)
Download predictions
Confidence intervals
SHAP explanations
Deploy Variations
Streamlit version
REST API
Docker container
Cloud deployment
📝 Final Exercise
Build Your Own Complete ML App:
Choose a dataset (classification or regression)
Train models with AutoML
Create Gradio interface
Build Streamlit dashboard
Deploy to Hugging Face Spaces
Share with the community!
Suggested Projects:
House price predictor
Sentiment analyzer
Image classifier
Fraud detector
Recommendation system
🎉 Congratulations!
You've completed Phase 17: Low-Code AI Tools!
You now know how to:
✅ Build ML demos with Gradio
✅ Create dashboards with Streamlit
✅ Deploy to Hugging Face Spaces
✅ Use AutoML for rapid prototyping
✅ Build production-ready ML applications
Next Steps:
Deploy your own apps
Build a portfolio of projects
Share with the ML community
Continue to advanced deployment topics
Phase 17 Complete! 🎉