How to drop Pandas DataFrame rows with NAs in a specific column
During the data cleaning process, you may find that you need to discard rows from your pandas DataFrame based on whether or not they have NA values in a certain column. While this task is slightly more complex than dropping rows containing any NA values, there are some quick and easy ways to go about it.
The first is to manually subset your DataFrame, keeping only rows where your column of interest contains non-null values, by calling notna() on that column:
import pandas as pd
import numpy as np
data = pd.DataFrame({'Gene': ["MITF", "MITF", "KIT", "KIT", "KIT"],
'Allele': ["A", "G", "A", "TA", np.nan],
'Count': [2, 7, np.nan, 8, 2]})
data = data[data['Count'].notna()]
data
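To make the subsetting step more concrete, here is a minimal sketch that separates it into two parts: building a boolean mask, then indexing with it. The variable names `mask` and `filtered` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Gene': ["MITF", "MITF", "KIT", "KIT", "KIT"],
                     'Allele': ["A", "G", "A", "TA", np.nan],
                     'Count': [2, 7, np.nan, 8, 2]})

# notna() on a single column returns a boolean Series:
# True where a value is present, False where it is NA.
mask = data['Count'].notna()

# Indexing the DataFrame with the mask keeps only the rows marked True.
filtered = data[mask]
```

Note that the surviving rows keep their original index labels, so the label of the removed row is simply absent from the result.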
While this does exactly what we want, consider using the pandas method DataFrame.dropna() instead. This method is a more explicit way to handle missing data in pandas, and provides a variety of useful options. You can use dropna() to discard rows with any or all NA values, rows with fewer than a given number of non-NA values (the thresh parameter), or rows with NA values in a specific subset of columns. You can also use the axis parameter to discard columns by NA value instead. For our purposes, we can use the subset parameter:
import pandas as pd
import numpy as np
data = pd.DataFrame({'Gene': ["MITF", "MITF", "KIT", "KIT", "KIT"],
'Allele': ["A", "G", "A", "TA", np.nan],
'Count': [2, 7, np.nan, 8, 2]})
data = data.dropna(subset = ['Count'])
data
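For comparison, the other dropna() options mentioned above can be sketched on the same data. This is only an illustration of the how, thresh, axis, and subset parameters; the result variable names are made up for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Gene': ["MITF", "MITF", "KIT", "KIT", "KIT"],
                     'Allele': ["A", "G", "A", "TA", np.nan],
                     'Count': [2, 7, np.nan, 8, 2]})

# Default behavior (how='any'): drop rows containing at least one NA value.
any_dropped = data.dropna()

# how='all': drop a row only if every value in it is NA.
all_dropped = data.dropna(how='all')

# thresh=2: keep rows with at least 2 non-NA values
# (every row in this small example qualifies).
thresh_kept = data.dropna(thresh=2)

# axis=1: drop columns, rather than rows, that contain any NA value.
columns_dropped = data.dropna(axis=1)

# subset can list several columns; a row is dropped if any of them is NA.
subset_dropped = data.dropna(subset=['Allele', 'Count'])
```

In this sketch, none of the calls modify `data` itself; each one returns a new DataFrame.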
Note that by default, dropna() returns a copy of your data. If you’d instead like to modify your data in-place, you can use:
data.dropna(subset = ['Count'], inplace = True)
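The difference between the two calling styles can be checked with a short sketch. The variable names `cleaned`, `rows_after_copy`, and `rows_after_inplace` are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Gene': ["MITF", "MITF", "KIT", "KIT", "KIT"],
                     'Allele': ["A", "G", "A", "TA", np.nan],
                     'Count': [2, 7, np.nan, 8, 2]})

# Default call: the result is a new DataFrame and data is left untouched.
cleaned = data.dropna(subset=['Count'])
rows_after_copy = len(data)

# In-place call: data itself loses the row with a missing Count.
# The return value is deliberately not assigned back to data.
data.dropna(subset=['Count'], inplace=True)
rows_after_inplace = len(data)
```

When using inplace = True, call the method as a standalone statement as shown here, rather than assigning its result back to the variable.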
To wrap up, there are several simple strategies for dropping DataFrame rows depending on NA values in a certain column. While you can certainly subset your data manually, dropna() offers a more explicit and flexible approach for this use case and a variety of others.