Loading data into Pandas
There are several ways to load data into Pandas. The library is flexible and works with many popular data formats. This notebook shows how to load data from both local and remote sources.
Create a Pandas DataFrame from an online CSV
This example reads the file from a public GitHub repository. A file in a private repository requires authentication.
import pandas as pd
csv_url = "https://raw.githubusercontent.com/paiml/wine-ratings/main/wine-ratings.csv"
# set index_col to 0 to tell pandas that the first column is the index
df = pd.read_csv(csv_url, index_col=0)
df.head(10)
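To see what index_col=0 changes without needing network access, here is a minimal sketch that reads a small in-memory CSV twice. The sample rows and the names csv_text, with_index, and without_index are made up for illustration; the sketch assumes the first column of the file is an unnamed column of row labels, as produced by DataFrame.to_csv().

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the remote CSV. The first
# (unnamed) column holds the row labels.
csv_text = """,name,rating
0,Wine A,90
1,Wine B,87
"""

# With index_col=0 the first column becomes the DataFrame index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, pandas keeps that column as data and names it "Unnamed: 0".
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))
print(list(without_index.columns))
```

If the extra "Unnamed: 0" column shows up in your own data, passing index_col=0 is usually the simplest way to remove it.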
Load a CSV from a local file
Reading data from local files is a common starting point for data analysis. By default, pd.read_csv() assumes a comma delimiter, treats the first row as the header, and infers a type for each column. A different delimiter can be set with the sep argument, or detected by passing sep=None together with engine="python". For local files, pass the filename or a relative path as a string.
import pandas as pd
df = pd.read_csv("world-championship-qualifier.csv")
print(df)
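Not every CSV file uses commas. The sketch below shows how the sep argument affects parsing, using a small semicolon-separated sample held in memory. The data and variable names are hypothetical and are not taken from world-championship-qualifier.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample data.
semicolon_text = "team;points\nAlpha;3\nBeta;1\n"

# The default separator is a comma, so each line lands in a single column.
default_df = pd.read_csv(io.StringIO(semicolon_text))

# Naming the separator explicitly splits the line into two columns.
explicit_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# sep=None with the Python engine asks pandas to detect the separator.
sniffed_df = pd.read_csv(io.StringIO(semicolon_text), sep=None, engine="python")

print(default_df.shape)
print(list(explicit_df.columns))
print(list(sniffed_df.columns))
```

Naming the separator explicitly is the more predictable choice when the file format is known; detection is mainly useful for exploring unfamiliar files.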
Load JSON from a local file
JSON (JavaScript Object Notation) is widely used in web APIs and configuration files. The pd.read_json() function parses JSON arrays or objects into a DataFrame, automatically mapping keys to column names. This makes it straightforward to load API responses or NoSQL database exports directly into Pandas for analysis.
df = pd.read_json("world-championship-qualifier.json")
df
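As a rough illustration of how keys map to column names, this sketch parses a small JSON array of objects from memory. The records are invented for the example, and orient="records" is passed explicitly on the assumption that the data is a list of row objects; the layout of world-championship-qualifier.json may differ.

```python
import io

import pandas as pd

# Hypothetical JSON: an array with one object per row.
json_text = '[{"team": "Alpha", "points": 3}, {"team": "Beta", "points": 1}]'

# Wrapping the text in StringIO lets read_json treat it like a file.
records_df = pd.read_json(io.StringIO(json_text), orient="records")

# Each key of the objects becomes a column name.
print(list(records_df.columns))
print(records_df.shape)
```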
You can read from many formats
The pandas module (imported here as pd) can read many different formats, including your clipboard:
read_clipboard
read_csv
read_excel
read_feather
read_fwf
read_gbq
read_hdf
read_html
read_json
read_orc
read_parquet
read_pickle
read_sas
read_spss
read_sql
read_sql_query
read_sql_table
read_stata
read_table
read_xml
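The exact set of readers depends on the installed pandas version, so a list like the one above can drift over time. One way to check what your own installation offers is to inspect the module's public names, as in this short sketch (the variable name readers is just for illustration):

```python
import pandas as pd

# Collect every top-level pandas name that starts with "read_".
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

for name in readers:
    print(name)
```

Some readers need optional dependencies before they can be used; for example, read_excel relies on a separate Excel engine package such as openpyxl.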