Run this notebook: Open in Colab Open in Kaggle

Merge, Join, and Concatenate¶

Combining data from multiple sources is a cornerstone of data engineering and analysis. Pandas provides three primary methods: merge() for SQL-style joins on shared columns, join() for index-based combinations, and concat() for stacking DataFrames vertically or horizontally.

This notebook demonstrates all major join types (inner, outer, left, right, cross) using merge(), index-based joining with join() including suffix handling for overlapping column names, horizontal concatenation with concat(), and row appending with append(). Understanding the differences between these methods – and when to use each – is critical for building data pipelines that pull together information from databases, APIs, CSV files, and other disparate sources into a unified analytical dataset.

import pandas as pd

df1 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR.csv")
df1

df2 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR 2.csv")
df2

df1.merge(df2, how = 'inner', on = ['FellowshipID', 'FirstName'])

df1.merge(df2, how = 'outer')

df1.merge(df2, how = 'left')

df1.merge(df2, how = 'right')

df1.merge(df2, how = 'cross')

df1.join(df2, on = 'FellowshipID', how = 'outer', lsuffix = '_Left',rsuffix = '_Right')

df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'), lsuffix = '_Left',rsuffix = '_Right', how = 'outer')
df4

pd.concat([df1,df2], join = 'outer', axis = 1)

df1.append(df2)