Create a DataFrame
A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, an Excel spreadsheet, or a SQL query.
Each column has a name, which is usually a string. Each row has an index label; by default the labels are integers starting at 0. A DataFrame can hold many different data types: strings, ints, floats, tuples, etc.
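As a small illustration of the structure described above, the sketch below builds a two-row DataFrame and inspects its column names, row index, and column data types. The variable name people and its contents are hypothetical and are used only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only to inspect the structure.
people = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'age': [34, 28],
})

print(people.columns.tolist())  # column names: ['name', 'age']
print(people.index.tolist())    # default integer row index: [0, 1]
print(people.dtypes)            # data type of each column
```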
Convert from list
We can add data using lists. For example, we can pass in a list of lists, where each inner list represents one row of data. Use the keyword argument columns to pass a list of column names.
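A minimal sketch of this call, assuming the three records shown in the table in this section:

```python
import pandas as pd

# Each inner list is one row; columns= supplies the column names in order.
df2 = pd.DataFrame(
    [
        ['John Smith', '123 Main St.', 34],
        ['Jane Doe', '456 Maple Ave.', 28],
        ['Joe Schmo', '789 Broadway', 51],
    ],
    columns=['name', 'address', 'age'],
)
print(df2)
```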
This command produces a DataFrame df2 that looks like this:

| name | address | age |
| --- | --- | --- |
| John Smith | 123 Main St. | 34 |
| Jane Doe | 456 Maple Ave. | 28 |
| Joe Schmo | 789 Broadway | 51 |
In this example, we were able to control the ordering of the columns because we used lists.
Convert from dictionary
We can pass in a dictionary to pd.DataFrame(). Each key is a column name and each value is a list of column values. The lists must all be the same length, or pandas raises a ValueError. Here’s an example:
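A minimal sketch of such a call, assuming the same three records used elsewhere on this page. The keys are written in alphabetical order here so that the resulting column order is the same under any pandas version.

```python
import pandas as pd

# Each key becomes a column name; each list holds that column's values.
# All lists have the same length (three entries).
df1 = pd.DataFrame({
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51],
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
})
print(df1)
```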
This command creates a DataFrame called df1 that looks like this:

| address | age | name |
| --- | --- | --- |
| 123 Main St. | 34 | John Smith |
| 456 Maple Ave. | 28 | Jane Doe |
| 789 Broadway | 51 | Joe Schmo |
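Note that the columns appear in alphabetical order here. Older versions of pandas (before 0.23) sorted the dictionary keys because dictionaries had no guaranteed order. With pandas 0.23 or later on Python 3.6 or later, the columns follow the order in which the keys appear in the dictionary.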
Read in CSV file
When you have data in a CSV file, you can load it into a DataFrame using pd.read_csv():
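A minimal sketch of loading a CSV file. So that it runs on its own, the example first writes a small file to a temporary directory; the file name my-csv-file.csv and its contents are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV file so the example is self-contained.
# The file name and contents are hypothetical.
csv_path = os.path.join(tempfile.mkdtemp(), 'my-csv-file.csv')
with open(csv_path, 'w') as f:
    f.write('name,address,age\n')
    f.write('John Smith,123 Main St.,34\n')
    f.write('Jane Doe,456 Maple Ave.,28\n')

# Load the file; by default the first line supplies the column names.
df = pd.read_csv(csv_path)
print(df)
```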
We can also save data to a CSV, using .to_csv().
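A minimal sketch of saving a DataFrame. The contents of df are hypothetical sample data; the file name matches the one discussed in this section.

```python
import pandas as pd

# Hypothetical sample data to save.
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'age': [34, 28],
})

# Write the DataFrame to a CSV file. By default the row index is written
# as the first column; pass index=False to leave it out.
df.to_csv('new-csv-file.csv')
```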
In the example above, the .to_csv() method is called on df (which represents a DataFrame object). The name of the CSV file (new-csv-file.csv) is passed in as an argument. Because the name is a relative path, the file is saved in your current working directory; pass a full path to save it elsewhere.