Merge, Join, and ConcatenateΒΆ

Combining data from multiple sources is a cornerstone of data engineering and analysis. Pandas provides three primary methods: merge() for SQL-style joins on shared columns, join() for index-based combinations, and concat() for stacking DataFrames vertically or horizontally.

This notebook demonstrates all major join types (inner, outer, left, right, cross) using merge(), index-based joining with join() including suffix handling for overlapping column names, horizontal concatenation with concat(), and row appending with append(). Understanding the differences between these methods – and when to use each – is critical for building data pipelines that pull together information from databases, APIs, CSV files, and other disparate sources into a unified analytical dataset.

import pandas as pd
df1 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR.csv")
df1
df2 = pd.read_csv(r"C:\Users\alexf\OneDrive\Documents\Pandas Tutorial\LOTR 2.csv")
df2
df1.merge(df2, how = 'inner', on = ['FellowshipID', 'FirstName'])
df1.merge(df2, how = 'outer')
df1.merge(df2, how = 'left')
df1.merge(df2, how = 'right')
df1.merge(df2, how = 'cross')
df1.join(df2, on = 'FellowshipID', how = 'outer', lsuffix = '_Left',rsuffix = '_Right')
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'), lsuffix = '_Left',rsuffix = '_Right', how = 'outer')
df4
pd.concat([df1,df2], join = 'outer', axis = 1)
df1.append(df2)

image.png