
Group by and Aggregating

The split-apply-combine pattern is one of the most powerful concepts in data analysis. Pandas implements it through .groupby(), which splits a DataFrame into groups based on the values in one or more columns, applies a function to each group (typically an aggregation such as mean(), sum(), or count()), and combines the per-group results into a new DataFrame indexed by the group keys.
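To make the three steps concrete, here is a minimal sketch that performs them by hand and then with a single .groupby() call. The toy DataFrame and its values are hypothetical; only the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data, not the real Flavors.csv.
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate"],
    "Flavor Rating": [8.0, 9.0, 6.0, 7.0],
})

# Split: iterating a groupby yields (key, sub-DataFrame) pairs.
# Apply: reduce each sub-DataFrame's ratings to a single mean.
manual = {
    key: part["Flavor Rating"].mean()
    for key, part in toy.groupby("Base Flavor")
}

# Combine: groupby performs all three steps and returns one indexed result.
grouped = toy.groupby("Base Flavor")["Flavor Rating"].mean()
```

Both approaches give the same per-flavor means; the groupby version is shorter and lets pandas handle the bookkeeping.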

This notebook demonstrates grouping on a flavor ratings dataset: computing single aggregations with .mean(), .count(), and .sum(); applying multiple aggregations simultaneously with .agg() and a dictionary of column-to-function mappings; grouping by multiple columns for hierarchical analysis; and using .describe() for a comprehensive statistical summary per group. These techniques are the Pandas equivalent of SQL’s GROUP BY and are essential for summarizing data before visualization or model training.

import pandas as pd

# Path from the author's machine; point this at your own copy of Flavors.csv.
df = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\Flavors.csv")
df

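# groupby() returns a lazy DataFrameGroupBy object; nothing is computed yet.
group_by_frame = df.groupby('Base Flavor')

# numeric_only=True skips text columns, which raise a TypeError in pandas 2.0+.
group_by_frame.mean(numeric_only=True)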

# Same result in one chained call.
df.groupby('Base Flavor').mean(numeric_only=True)
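Since pandas 2.0, .mean() on a grouped frame raises a TypeError if any non-key column holds text. Two common ways around this are passing numeric_only=True or selecting the columns of interest before aggregating. The sketch below uses a hypothetical toy frame (the "Flavor" column and all values are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data with one text column that cannot be averaged.
toy = pd.DataFrame({
    "Flavor": ["Mint Chip", "French Vanilla", "Rocky Road"],
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10.0, 6.0, 8.0],
})

# Option 1: let pandas drop every non-numeric column.
means_all = toy.groupby("Base Flavor").mean(numeric_only=True)

# Option 2: pick the columns explicitly, then aggregate.
means_one = toy.groupby("Base Flavor")[["Flavor Rating"]].mean()
```

Explicit selection is usually the safer habit, because the result does not change if someone later adds another numeric column to the file.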

df.groupby('Base Flavor').count()
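Note that .count() reports the number of non-null values in each column, not the number of rows in the group. When the row count is what you want, .size() is the usual choice. A small sketch on hypothetical data with one missing rating:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one Vanilla rating is missing.
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [8.0, np.nan, 9.0],
})

# count() ignores the NaN, so Vanilla has only one counted rating.
counts = toy.groupby("Base Flavor")["Flavor Rating"].count()

# size() counts rows regardless of missing values.
sizes = toy.groupby("Base Flavor").size()
```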

# sum() concatenates text columns; pass numeric_only=True to leave them out.
df.groupby('Base Flavor').sum()

df.groupby('Base Flavor').agg({
    'Flavor Rating': ['mean', 'max', 'count', 'sum'],
    'Texture Rating': ['mean', 'max', 'count', 'sum'],
})
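Passing a dictionary of lists to .agg() produces two-level (MultiIndex) column labels such as ('Flavor Rating', 'mean'). If flat column names are easier to work with downstream, pandas also supports named aggregation, sketched here on hypothetical toy data. The output names (flavor_mean and so on) are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data, not the real Flavors.csv.
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [8.0, 6.0, 9.0],
    "Texture Rating": [5.0, 7.0, 6.0],
})

# Dictionary form: columns come back as a two-level MultiIndex.
nested = toy.groupby("Base Flavor").agg({"Flavor Rating": ["mean", "max"]})

# Named aggregation: output_name=(input_column, function) gives flat names.
summary = toy.groupby("Base Flavor").agg(
    flavor_mean=("Flavor Rating", "mean"),
    flavor_max=("Flavor Rating", "max"),
    texture_mean=("Texture Rating", "mean"),
)
```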

df.groupby(['Base Flavor','Liked']).agg({'Flavor Rating': ['mean','max','count','sum']})
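Grouping by a list of columns returns a result indexed by a MultiIndex with one level per key. The sketch below, on hypothetical toy data (the "Yes"/"No" values for Liked are an assumption), shows two common ways to reshape that result: .reset_index() for a flat table and .unstack() for a wide one.

```python
import pandas as pd

# Hypothetical toy data; the Liked values are assumed for illustration.
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate", "Chocolate"],
    "Liked": ["Yes", "No", "Yes", "Yes"],
    "Flavor Rating": [8.0, 4.0, 9.0, 7.0],
})

# One row per (Base Flavor, Liked) combination that actually occurs.
by_two = toy.groupby(["Base Flavor", "Liked"])["Flavor Rating"].mean()

# Turn the index levels back into ordinary columns.
flat = by_two.reset_index()

# Pivot the Liked level into columns; missing combinations become NaN.
wide = by_two.unstack("Liked")
```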

df.groupby('Base Flavor').describe()
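On a grouped DataFrame, .describe() returns eight statistics for every numeric column, so the output gets wide quickly. A minimal sketch on hypothetical toy data shows how to narrow it to a single column, either before or after calling .describe():

```python
import pandas as pd

# Hypothetical toy data, not the real Flavors.csv.
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate", "Chocolate"],
    "Flavor Rating": [8.0, 6.0, 9.0, 7.0],
    "Texture Rating": [5.0, 7.0, 6.0, 8.0],
})

# Select first: one row per group, one column per statistic.
stats = toy.groupby("Base Flavor")["Flavor Rating"].describe()

# Or describe everything, then index the two-level columns with a tuple.
full = toy.groupby("Base Flavor").describe()
flavor_means = full[("Flavor Rating", "mean")]
```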