Modifying DataFrames
Adding a Column
One way to add a new column is to assign a list with the same length as the existing DataFrame, so that each row receives one value.
import pandas as pd
df = pd.DataFrame(
    [
        [1, '3 inch screw', 0.5, 0.75],
        [2, '2 inch nail', 0.10, 0.25],
        [3, 'hammer', 3.00, 5.50],
        [4, 'screwdriver', 2.50, 3.00]
    ],
    columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
# Add one column
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
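The list must contain exactly one value per row. As a minimal sketch (the 'Aisle' column and its values are hypothetical), the snippet below shows that pandas raises a ValueError when the lengths differ, and the column is not added:

```python
import pandas as pd

df = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Description': ['3 inch screw', '2 inch nail', 'hammer', 'screwdriver'],
})

# Hypothetical column with only three values for four rows
try:
    df['Aisle'] = [1, 2, 3]
    error_raised = False
except ValueError:
    # pandas rejects a list whose length differs from the number of rows
    error_raised = True

print(error_raised)  # True
```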
We can also add a new column that has the same value in every row by assigning a single value; pandas repeats it for all rows.
df = pd.DataFrame(
    [
        [1, '3 inch screw', 0.5, 0.75],
        [2, '2 inch nail', 0.10, 0.25],
        [3, 'hammer', 3.00, 5.50],
        [4, 'screwdriver', 2.50, 3.00]
    ],
    columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
# Add one column
df['Sold in Bulk?'] = 'Yes'
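A single value on the right-hand side is broadcast to every row. This minimal sketch (the 'Warehouse' column and the value 'North' are hypothetical) checks that each row receives the same value:

```python
import pandas as pd

df = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Price': [0.75, 0.25, 5.50, 3.00],
})

# A scalar is repeated for every row of the DataFrame
df['Warehouse'] = 'North'  # hypothetical column and value

print(df['Warehouse'].tolist())  # ['North', 'North', 'North', 'North']
```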
Finally, you can add a new column by computing it from existing columns. Arithmetic between columns is performed row by row.
df = pd.DataFrame(
    [
        [1, '3 inch screw', 0.5, 0.75],
        [2, '2 inch nail', 0.10, 0.25],
        [3, 'hammer', 3.00, 5.50],
        [4, 'screwdriver', 2.50, 3.00]
    ],
    columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
# Add a column computed from two existing columns
df['Margin'] = df['Price'] - df['Cost to Manufacture']
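Because column arithmetic is element-wise, any expression built from existing columns can become a new column, including one built from another computed column. As a sketch, the hypothetical 'Margin %' column below expresses the margin as a percentage of the price, rounded to one decimal place:

```python
import pandas as pd

df = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Cost to Manufacture': [0.5, 0.10, 3.00, 2.50],
    'Price': [0.75, 0.25, 5.50, 3.00],
})

# Each row uses its own Price and Cost to Manufacture
df['Margin'] = df['Price'] - df['Cost to Manufacture']

# Hypothetical derived column: margin as a percentage of price, 1 decimal place
df['Margin %'] = (df['Margin'] / df['Price'] * 100).round(1)

print(df[['Product ID', 'Margin', 'Margin %']])
```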
Note: when adding a new column, you must use bracket notation, df['new_column_name']. Attribute notation, df.new_column_name, will not create the column.
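One way to see the difference is to try both forms. In this minimal sketch the 'Discount' column and its values are hypothetical; the attribute-style assignment leaves the columns unchanged (pandas stores an ordinary Python attribute instead), while the bracket form adds the column:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Price': [0.75, 0.25, 5.50, 3.00]})

# Attribute-style assignment to a new name sets a plain attribute, not a column.
# pandas may emit a UserWarning here, which this sketch silences.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.Discount = [0.10, 0.10, 0.00, 0.00]

created_by_attribute = 'Discount' in df.columns
print(created_by_attribute)  # False

# Bracket notation is what actually creates the column
df['Discount'] = [0.10, 0.10, 0.00, 0.00]
print('Discount' in df.columns)  # True
```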
Adding a Column Using apply()
The pandas apply() function calls a function on each value of a column (a Series) and returns the results, which you can assign back to a column. The function you pass to apply() only needs to handle a single value; apply() takes care of calling it on every value in the column. To call a function on each row of a DataFrame instead, call apply() on the whole DataFrame and pass the argument axis=1. Each row is then passed to your function as a Series indexed by column name.
import pandas as pd
# Hypothetical sample data with two numeric columns
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# This function doubles the input value
def double(x):
    return 2 * x
# Apply this function to double every value in a specified column
df['column1'] = df['column1'].apply(double)
# Lambda functions can also be supplied to apply()
df['column2'] = df['column2'].apply(lambda x: 3 * x)
# Applying a function to each row requires calling apply() on the entire DataFrame with axis=1
df['newColumn'] = df.apply(
    lambda row: row['column1'] * 1.5 + row['column2'],
    axis=1
)
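Row-wise apply() is most useful when the new value depends on several columns through logic that is awkward to write as plain column arithmetic. In this sketch, the label_margin function, the 'Margin Level' column, and the 1.00 threshold are all hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, '3 inch screw', 0.5, 0.75],
        [2, '2 inch nail', 0.10, 0.25],
        [3, 'hammer', 3.00, 5.50],
        [4, 'screwdriver', 2.50, 3.00]
    ],
    columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

# Hypothetical rule: a margin above 1.00 is 'High', anything else is 'Low'
def label_margin(row):
    margin = row['Price'] - row['Cost to Manufacture']
    return 'High' if margin > 1.00 else 'Low'

# axis=1 passes each row (a Series indexed by column name) to label_margin
df['Margin Level'] = df.apply(label_margin, axis=1)

print(df['Margin Level'].tolist())  # ['Low', 'Low', 'High', 'Low']
```

For simple arithmetic, a column expression like the Margin calculation earlier is usually faster than apply(axis=1), which calls the Python function once per row.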
Renaming Columns
Rename All Columns
We can change all of the column names at once by setting the .columns property to a new list. The names are assigned by position, so the list must contain exactly one name per column, in the same order as the existing columns.
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.columns = ['First Name', 'Age']
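Since the new names are matched to columns by position, a list of the wrong length is rejected. A minimal sketch, where the extra name 'Extra' is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Two columns but three new names: pandas refuses the assignment
try:
    df.columns = ['First Name', 'Age', 'Extra']
    error_raised = False
except ValueError:
    error_raised = True

print(error_raised)      # True
print(list(df.columns))  # ['name', 'age']
```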
Renaming Individual Columns
You can also rename individual columns with the .rename method. Pass the columns keyword argument a dictionary that maps each old column name to its new name, like the one below:
{'old_column_name1': 'new_column_name1',
'old_column_name2': 'new_column_name2'}
Here’s an example:
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.rename(
    columns={'name': 'First Name', 'age': 'Age'},
    inplace=True
)
The code above renames name to First Name and age to Age.
By default, rename returns a new DataFrame and leaves your original DataFrame unchanged. That’s why we also passed in the keyword argument inplace=True, which edits the original DataFrame directly (in that case rename itself returns None).
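Instead of inplace=True, you can assign the returned DataFrame to a variable. This minimal sketch shows both behaviors; the variable name renamed is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Without inplace=True, rename returns a renamed copy
renamed = df.rename(columns={'name': 'First Name', 'age': 'Age'})
original_columns = list(df.columns)

print(list(renamed.columns))  # ['First Name', 'Age']
print(original_columns)       # ['name', 'age'] (original unchanged)

# Assigning the result back is a common alternative to inplace=True
df = df.rename(columns={'name': 'First Name', 'age': 'Age'})
print(list(df.columns))       # ['First Name', 'Age']
```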
There are several reasons why .rename is often preferable to setting .columns:
You can rename just one column.
You can be specific about which column names are getting changed (with .columns, you can accidentally switch column names if you’re not careful).
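To rename just one column, the dictionary only needs that one entry; every other column keeps its name. A minimal sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Only the keys in the dictionary are renamed; 'name' is left as-is
df = df.rename(columns={'age': 'Age'})

print(list(df.columns))  # ['name', 'Age']
```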
Note: If you misspell one of the original column names, rename won’t raise an error by default. It just leaves that column unchanged.
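If you would rather have a misspelling fail loudly, rename accepts errors='raise' (available in pandas 0.25 and later), which raises a KeyError when a key is not an existing column name. A minimal sketch, using the hypothetical typo 'nmae':

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Misspelled key: by default rename silently ignores it
unchanged = df.rename(columns={'nmae': 'First Name'})
print(list(unchanged.columns))  # ['name', 'age']

# errors='raise' turns the silent no-op into a KeyError
try:
    df.rename(columns={'nmae': 'First Name'}, errors='raise')
    error_raised = False
except KeyError:
    error_raised = True

print(error_raised)  # True
```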