Modifying a DataFrame

Adding a Column

One way to add a new column is to assign a list that contains one value for each row of the existing DataFrame.

import pandas as pd

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 
           'Cost to Manufacture', 'Price']
)

# Add one column
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
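
The list has to match the number of rows. As a minimal sketch of what happens otherwise, the example below uses a hypothetical two-row DataFrame named small (not part of the example above) and tries to assign a three-item list to it:

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for illustration
small = pd.DataFrame({
    'Description': ['hammer', 'screwdriver'],
    'Price': [5.50, 3.00]
})

# A list with one value per row works
small['Sold in Bulk?'] = ['No', 'No']

# A list with the wrong number of values raises a ValueError
try:
    small['In Stock?'] = ['Yes', 'No', 'Yes']
    length_error = None
except ValueError as err:
    length_error = err

print(length_error)
```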

We can also add a new column that is the same for all rows in the DataFrame.

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 
           'Cost to Manufacture', 'Price']
)

# Add a column with the same value in every row
df['Sold in Bulk?'] = 'Yes'

Finally, you can add a new column by performing an operation on existing columns.

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 
           'Cost to Manufacture', 'Price']
)

# Add a column computed from existing columns
df['Margin'] = df['Price'] - df['Cost to Manufacture']

Note: a new column must be created with bracket notation, df['new_column_name']. Assigning to df.new_column_name does not add a column; it only sets an attribute on the DataFrame object.
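
The difference can be checked directly. This is a small sketch using a hypothetical DataFrame named demo; pandas may emit a warning on the attribute assignment, so warnings are silenced here to keep the output clean:

```python
import warnings
import pandas as pd

# Hypothetical DataFrame, used only for illustration
demo = pd.DataFrame({'Price': [0.75, 0.25]})

# Attribute-style assignment sets a plain Python attribute, not a column.
# pandas may warn about this, so warnings are silenced for the demo.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    demo.discounted = [0.60, 0.20]

print('discounted' in demo.columns)   # False

# Bracket notation creates the column
demo['Discounted'] = [0.60, 0.20]
print('Discounted' in demo.columns)   # True
```

Once a column exists, attribute access such as demo.Discounted can read it, but only when the column name is a valid Python identifier. A name like 'Sold in Bulk?' always needs brackets.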

Adding a Column Using the apply() Function

The pandas apply() function applies a function to every value in a column, or to every row of a DataFrame, and returns the resulting values. The function you pass only needs to handle a single element; apply() takes care of calling it on every value in the column. To operate on each row instead, call apply() on the whole DataFrame with the argument axis=1.

# This function doubles the input value
def double(x):
  return 2*x

# Apply this function to double every value in a specified column
df.column1 = df.column1.apply(double)

# Lambda functions can also be supplied to `apply()`
df.column2 = df.column2.apply(lambda x : 3*x)

# Applying a function to each row requires calling apply() on the entire DataFrame
df['newColumn'] = df.apply(lambda row: 
  row['column1'] * 1.5 + row['column2'],
  axis=1
)
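
The snippets above assume a DataFrame that already has column1 and column2. For a self-contained version, here is a sketch with a hypothetical products DataFrame (the Label column is invented for illustration) that uses apply() once on a single column and once across rows:

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration
products = pd.DataFrame({
    'Description': ['hammer', 'screwdriver'],
    'Cost to Manufacture': [3.00, 2.50],
    'Price': [5.50, 3.00],
})

# On a single column, the lambda receives one value at a time
products['Label'] = products['Description'].apply(lambda s: s.upper())

# With axis=1, the lambda receives each row as a Series
products['Margin'] = products.apply(
    lambda row: row['Price'] - row['Cost to Manufacture'],
    axis=1
)

print(products[['Label', 'Margin']])
```

For plain arithmetic like the margin, subtracting the columns directly (products['Price'] - products['Cost to Manufacture']) gives the same result and is usually faster; apply() with axis=1 is most useful when the logic cannot be written as a column operation.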

Renaming Columns

Rename All Columns

We can change all of the column names at once by setting the .columns property to a new list of names. The names are assigned by position, so the list must contain one name per column, in the same order as the existing columns.

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.columns = ['First Name', 'Age']
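
Assigning to .columns replaces every name, so the list needs exactly one entry per column. As a rough sketch, using a hypothetical two-column DataFrame named people, a list that is too short is rejected:

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration
people = pd.DataFrame({
    'name': ['John', 'Jane'],
    'age': [23, 29]
})

# Two columns but only one new name: pandas raises a ValueError
try:
    people.columns = ['First Name']
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(mismatch_error)
print(list(people.columns))
```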

Renaming Individual Columns

You can also rename individual columns by using the .rename method. Pass a dictionary like the one below to the columns keyword argument:

{'old_column_name1': 'new_column_name1', 
'old_column_name2': 'new_column_name2'}

Here’s an example:

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.rename(columns={
    'name': 'First Name',
    'age': 'Age'},
    inplace=True)

The code above will rename name to First Name and age to Age.

Called with only the columns keyword, rename returns a new DataFrame and leaves your original DataFrame unchanged. That’s why we also passed in the keyword argument inplace=True, which modifies the original DataFrame instead.
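
To see the difference, the sketch below calls rename without inplace=True on a hypothetical DataFrame named people and compares the column names before and after. Reassigning the result is a common alternative to inplace=True:

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration
people = pd.DataFrame({
    'name': ['John', 'Jane'],
    'age': [23, 29]
})

# Without inplace=True, rename returns a new DataFrame
renamed = people.rename(columns={'name': 'First Name'})

print(list(people.columns))   # ['name', 'age']
print(list(renamed.columns))  # ['First Name', 'age']

# Reassigning the result updates the variable without inplace=True
people = people.rename(columns={'age': 'Age'})
print(list(people.columns))   # ['name', 'Age']
```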

There are several reasons why .rename is preferable to .columns:

  • You can rename just one column

  • You can be specific about which column names are getting changed (with .columns you can accidentally switch column names if you’re not careful)

Note: If you misspell one of the original column names, rename won’t raise an error. It simply leaves that column unchanged.
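
If a silent no-op is not what you want, rename accepts an errors argument. The sketch below, on a hypothetical DataFrame named people with a deliberately misspelled key, compares the default behavior with errors='raise':

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration
people = pd.DataFrame({
    'name': ['John', 'Jane'],
    'age': [23, 29]
})

# Misspelled key: ignored by default, so nothing changes
unchanged = people.rename(columns={'nmae': 'First Name'})
print(list(unchanged.columns))  # ['name', 'age']

# errors='raise' turns the silent no-op into a KeyError
try:
    people.rename(columns={'nmae': 'First Name'}, errors='raise')
    rename_error = None
except KeyError as err:
    rename_error = err

print(type(rename_error).__name__)  # KeyError
```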
