Merge, Join, and ConcatenateΒΆ

Combining data from multiple sources is a fundamental data engineering task. Pandas provides three primary methods: merge() for SQL-style joins on shared key columns, join() for index-based combination, and concat() for stacking DataFrames along an axis.

This notebook demonstrates all major join types using Lord of the Rings character data: inner joins (only matching records), outer joins (all records with NaN for gaps), left and right joins (preserving one side completely), cross joins (Cartesian product), index-based joining with join() and suffix handling, horizontal concatenation with concat(), and row appending with append(). Mastering these operations is essential for assembling analytical datasets from normalized database tables or disparate data sources.

import pandas as pd
df1 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR.csv")
df1
df2 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR 2.csv")
df2
df1.merge(df2, how = 'inner', on = ['FellowshipID', 'FirstName'])
df1.merge(df2, how = 'outer')
df1.merge(df2, how = 'left')
df1.merge(df2, how = 'right')
df1.merge(df2, how = 'cross')
df1.join(df2, on = 'FellowshipID', how = 'outer', lsuffix = '_Left',rsuffix = '_Right')
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'), lsuffix = '_Left',rsuffix = '_Right', how = 'outer')
df4
pd.concat([df1,df2], join = 'outer', axis = 1)
df1.append(df2)

image.png