Create a DataFrame

A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, an Excel spreadsheet, or a SQL query.

Each column has a name, which is usually a string. Each row has an index label, which by default is an integer starting at 0. A DataFrame can hold many different data types: strings, ints, floats, tuples, and so on.
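As a quick check of these points, the following sketch builds a small DataFrame and inspects its row index, column names, and per-column data types. The sample values are borrowed from the examples later in this section, and the variable name df is an arbitrary choice for illustration.

```python
import pandas as pd

# Small example DataFrame; the values mirror the examples later in this section.
df = pd.DataFrame(
    [["John Smith", "123 Main St.", 34], ["Jane Doe", "456 Maple Ave.", 28]],
    columns=["name", "address", "age"],
)

print(list(df.index))    # row labels: the default integer index
print(list(df.columns))  # column names, in the order given
print(df.dtypes)         # one data type per column
```

The exact type reported for the text columns depends on the pandas version, so it is worth running this on your own installation.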

Convert from list

We can add data using lists. For example, we can pass in a list of lists, where each one represents a row of data. Use the keyword argument columns to pass a list of column names.

import pandas as pd

df2 = pd.DataFrame(
    [
        ['John Smith', '123 Main St.', 34],
        ['Jane Doe', '456 Maple Ave.', 28],
        ['Joe Schmo', '789 Broadway', 51],
    ],
    columns=['name', 'address', 'age'])

This command produces a DataFrame df2 that looks like this:

name        address         age
John Smith  123 Main St.    34
Jane Doe    456 Maple Ave.  28
Joe Schmo   789 Broadway    51

In this example, the columns appear in the same order as the names we passed to columns.
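For comparison, here is a minimal sketch of what happens when the columns argument is left out: pandas falls back to integer column labels, which can be replaced afterwards. The names rows, df_unnamed, and default_labels are made up for this illustration.

```python
import pandas as pd

rows = [
    ["John Smith", "123 Main St.", 34],
    ["Jane Doe", "456 Maple Ave.", 28],
]

# Without columns=, the column labels default to integers.
df_unnamed = pd.DataFrame(rows)
default_labels = list(df_unnamed.columns)
print(default_labels)

# Assigning a list of names relabels the columns in place.
df_unnamed.columns = ["name", "address", "age"]
print(df_unnamed)
```

Passing columns up front, as in the example above, is usually clearer than relabeling afterwards.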

Convert from dictionary

We can pass in a dictionary to pd.DataFrame(). Each key is a column name and each value is a list of column values. The columns must all be the same length or you will get an error. Here’s an example:

df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]})

This command creates a DataFrame called df1 that looks like this:

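name        address         age
John Smith  123 Main St.    34
Jane Doe    456 Maple Ave.  28
Joe Schmo   789 Broadway    51

Note that the columns appear in the same order as the dictionary's keys. This holds for pandas 0.23 and later running on Python 3.6 or later, where dictionary insertion order is preserved. Older versions sorted the columns alphabetically, which would give address, age, name here.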
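If the column order matters, one way to avoid relying on version-specific behavior is to pass columns alongside the dictionary. On recent pandas versions the columns follow the dictionary's key order, but an explicit columns list works in either case. The sketch below also shows, as an illustration, the error raised when the lists have different lengths. The names data, df_ordered, and length_error are hypothetical.

```python
import pandas as pd

data = {
    "name": ["John Smith", "Jane Doe", "Joe Schmo"],
    "address": ["123 Main St.", "456 Maple Ave.", "789 Broadway"],
    "age": [34, 28, 51],
}

# With a dictionary, columns= selects the keys to use and fixes their order.
df_ordered = pd.DataFrame(data, columns=["age", "name", "address"])
print(list(df_ordered.columns))

# Lists of unequal length are rejected with a ValueError.
try:
    pd.DataFrame({"name": ["John Smith", "Jane Doe"], "age": [34]})
    length_error = None
except ValueError as exc:
    length_error = exc
print(length_error)
```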

Read in CSV file

When you have data in a CSV file, you can load it into a DataFrame with the pandas function pd.read_csv(), which returns a new DataFrame:

df = pd.read_csv('my-csv-file.csv')

We can also save data to a CSV, using .to_csv().

df.to_csv('new-csv-file.csv')

In the example above, the .to_csv() method is called on df, a DataFrame object, and the name of the CSV file (new-csv-file.csv) is passed in as an argument. Because the name is a relative path, the file is saved in your current working directory.
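The two calls can be combined into a round trip. The following is a minimal sketch, assuming a temporary directory is an acceptable place to write the file; the names tmp_dir, csv_path, and loaded are chosen for illustration. By default .to_csv() also writes the row index as an unnamed first column, so the sketch passes index=False to leave it out.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["John Smith", "Jane Doe"], "age": [34, 28]})

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "new-csv-file.csv")
    # index=False keeps the row index out of the file.
    df.to_csv(csv_path, index=False)
    # Read the file back into a new DataFrame.
    loaded = pd.read_csv(csv_path)

print(loaded)
```

Without index=False, reading the file back would produce an extra column holding the old index values.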
