How to delete a Pandas DataFrame column
For getting, setting, and deleting columns, you can treat a pandas DataFrame as a dict-like collection of Series objects. So, it’s possible to delete columns using the familiar del and pop operations, just as you would with a regular Python dictionary. Note that both modify the DataFrame in place.
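To make the dictionary analogy concrete, here is a minimal sketch of getting, setting, and checking for a column with dict-style syntax. The column name 'four' and the variable names are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]})

# get: indexing with a column name returns that column as a Series
col_one = data['one']

# set: assigning to a new name adds a column ('four' is an illustrative name)
data['four'] = data['one'] * 4

# membership: `in` checks column labels, much as it checks keys on a dict
has_four = 'four' in data
has_five = 'five' in data

print(type(col_one).__name__, list(data.columns), has_four, has_five)
```

Deleting a column follows the same pattern.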
You can use del to delete a column by name:
import pandas as pd
data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})
del data["two"]
data
Or, if you want to return the deleted column, use pop:
import pandas as pd
data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})
two_out = data.pop("two")
data
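As a quick check of what pop hands back, the sketch below inspects the returned object and the remaining columns. It also shows that popping a column that is no longer there raises a KeyError, much like dict.pop without a default. The variable names remaining, popped_values, and second_pop_failed are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})

# pop removes the column from data and returns it as a Series
two_out = data.pop("two")

remaining = list(data.columns)     # columns still in the DataFrame
popped_values = two_out.tolist()   # values of the removed column

# popping the same column again fails because it no longer exists
try:
    data.pop("two")
    second_pop_failed = False
except KeyError:
    second_pop_failed = True

print(remaining, popped_values, second_pop_failed)
```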
A third way to delete columns uses the pandas method drop(). drop() allows you to easily delete one or more columns by name, or by position if you first look up the labels through the DataFrame's columns attribute. It returns a new DataFrame by default, or modifies the existing one in place if you pass inplace=True. Use axis=1 or axis='columns' to specify columns rather than rows.
import pandas as pd
data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})
#drop one column by name, create a new object
data_new = data.drop('one', axis=1)
#drop two columns by name, create a new object
data_three = data.drop(['one', 'two'], axis=1)
#drop columns by index in-place
data.drop(data.columns[[0, 1]], axis=1, inplace=True)
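drop() also accepts a columns keyword, which some readers may find clearer than axis=1. The sketch below compares the two spellings and confirms that the original DataFrame is left untouched when inplace is not set. The names by_keyword and by_axis are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})

# columns= is equivalent to passing the labels together with axis=1
by_keyword = data.drop(columns=['one', 'two'])
by_axis = data.drop(['one', 'two'], axis=1)

# without inplace=True, drop returns a new object and leaves data unchanged
print(list(by_keyword.columns))
print(by_keyword.equals(by_axis))
print(list(data.columns))
```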
Another advantage of drop() is control over what happens when a column is missing. By default, it raises a KeyError if a column you ask it to drop does not exist. To drop only the columns that exist and skip the rest, set the errors argument to 'ignore':
import pandas as pd
data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})
#raises KeyError by default if the column does not exist
try:
    data.drop("four", axis=1, inplace=True)
except KeyError as err:
    print(err)
#no error is raised if the column does not exist
data.drop("four", axis=1, inplace=True, errors='ignore')
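The errors argument is most useful when a list mixes existing and missing names. The sketch below shows that case, along with a more explicit alternative that filters the list before dropping. The list wanted and the other variable names are illustrative, and the filtering approach is just one option, not something drop() requires.

```python
import pandas as pd

data = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30], 'three': [100, 200, 300]})

# 'two' exists and is dropped; 'four' does not exist and is silently skipped
result = data.drop(['two', 'four'], axis=1, errors='ignore')

# alternative: keep only the labels that are actually present, then drop them
wanted = ['two', 'four']
present = [col for col in wanted if col in data.columns]
result_explicit = data.drop(present, axis=1)

print(list(result.columns), present)
```

The explicit version is longer, but it leaves you with a list of which columns were actually removed.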
To wrap up, there are several ways to delete columns from a pandas DataFrame. The del and pop operations are easy to use, especially for users already familiar with dictionary operations. As an alternative, drop() offers the flexibility of deleting multiple columns at once, performing the operation in place or not, and choosing how to handle non-existent columns. Its parameters are also described in detail in the pandas API documentation.
Additional Resources:
How to drop Pandas DataFrame rows with NAs in a specific column