Loading data into Pandas

Pandas can load data from many popular formats. This notebook shows how to load data from both local and remote sources.

Create a Pandas DataFrame from an online CSV

Note that this example requires a publicly available GitHub repository. A private repository requires authentication.

import pandas as pd
csv_url = "https://raw.githubusercontent.com/paiml/wine-ratings/main/wine-ratings.csv"
# set index_col to 0 to tell pandas that the first column is the index
df = pd.read_csv(csv_url, index_col=0)
df.head(10)

Load a CSV from a local file

Reading data from local files is a common starting point for data analysis. By default, pd.read_csv() expects comma-separated values, treats the first row as the header, and infers column types automatically; pass the sep argument for other delimiters. For local files, pass the filename or a relative path as a string.

import pandas as pd
df = pd.read_csv("world-championship-qualifier.csv")
print(df)
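When a file does not use commas, the separator has to be stated or sniffed. The following is a minimal sketch, assuming a small semicolon-delimited sample; the sample text and the variable names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a local semicolon-delimited file.
raw = "name;country;score\nAna;ES;9.5\nBo;SE;8.0\n"

# State the separator explicitly, because the default is a comma.
df_semicolon = pd.read_csv(io.StringIO(raw), sep=";")

# Alternatively, sep=None with the Python engine asks pandas to sniff the delimiter.
df_sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine="python")

print(df_semicolon.dtypes)
print(df_sniffed.columns.tolist())
```

Stating sep explicitly is usually the safer choice when the format is known, since sniffing relies on a heuristic and on the slower Python parsing engine.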

Load JSON from a local file

JSON (JavaScript Object Notation) is widely used in web APIs and configuration files. The pd.read_json() function parses JSON arrays or objects into a DataFrame, automatically mapping keys to column names. This makes it straightforward to load API responses or NoSQL database exports directly into Pandas for analysis.

df = pd.read_json("world-championship-qualifier.json")
df
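The layout of the JSON matters. As a rough sketch, assuming two common layouts (an array of records, and newline-delimited JSON with one object per line), the same rows can be loaded as follows. The sample strings are invented for illustration, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample: a JSON array of records.
raw_records = '[{"name": "Ana", "score": 9.5}, {"name": "Bo", "score": 8.0}]'
df_records = pd.read_json(io.StringIO(raw_records))

# Hypothetical sample: newline-delimited JSON, one object per line.
raw_lines = '{"name": "Ana", "score": 9.5}\n{"name": "Bo", "score": 8.0}\n'
df_lines = pd.read_json(io.StringIO(raw_lines), lines=True)

print(df_records)
print(df_lines)
```

Both calls produce the same two-row frame; lines=True is only needed when each line of the input is a separate JSON document.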

You can read from many formats

The pandas module (imported here as pd) provides reader functions for many formats, including your clipboard!

  • read_clipboard

  • read_csv

  • read_excel

  • read_feather

  • read_fwf

  • read_gbq

  • read_hdf

  • read_html

  • read_json

  • read_orc

  • read_parquet

  • read_pickle

  • read_sas

  • read_spss

  • read_sql

  • read_sql_query

  • read_sql_table

  • read_stata

  • read_table

  • read_xml
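The exact set of readers depends on the installed pandas version. A small sketch for listing what your installation offers, assuming only that reader functions follow the read_ naming shown above:

```python
import pandas as pd

# Collect every public callable on the pandas module whose name starts with "read_".
readers = sorted(
    name
    for name in dir(pd)
    if name.startswith("read_") and callable(getattr(pd, name))
)

print(readers)
```

Some readers need optional dependencies before they can be called (for example, an Excel engine for read_excel), even though they appear in this list.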