Select in DataFrame
Select Columns
Select a Column
There are two possible syntaxes for selecting all values from a column:
table_name['column_name']
table_name.column_name
, if the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.).
When we select a single column, the result is called a Series.
Selecting Multiple Columns
To select two or more columns from a DataFrame, we use a list of the column names.
When we select multiple columns, the result is a new DataFrame.
Select Rows
Select Rows by Index
We can select a row by its index.
When we select a single row, the result is a Series. When we select multiple rows, the result is a new DataFrame.
Select Rows with Logic
You can select a subset of a DataFrame by using logical statements:
Logical statements include:
Equal to,
==
Greater Than,
>
Less Than,
<
Not Equal,
!=
We can also combine multiple logical statements, as long as each statement is in parentheses.
We could use the isin
command to check that df.name
is one of a list of values:
Setting indices
When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant and makes it hard to use .iloc()
.
We can fix this using the method .reset_index()
. For example, here is a DataFrame called df
with non-consecutive indices:
First Name
Last Name
0
John
Smith
4
Jane
Doe
7
Joe
Schmo
If we use the command df.reset_index()
, we get a new DataFrame with a new set of indices:
index
First Name
Last Name
0
0
John
Smith
1
4
Jane
Doe
2
7
Joe
Schmo
Note that the old indices have been moved into a new column called 'index'
. Unless you need those values for something special, it’s probably better to use the keyword drop=True
so that you don’t end up with that extra column. If we run the command df.reset_index(drop=True)
, we get a new DataFrame that looks like this:
First Name
Last Name
0
John
Smith
1
Jane
Doe
2
Joe
Schmo
Using .reset_index()
will return a new DataFrame, but we usually just want to modify our existing DataFrame. If we use the keyword inplace=True
we can just modify our existing DataFrame.
Last updated
Was this helpful?