How to Filter Pandas DataFrames by Column of Strings
Pandas is a popular library in Python that is used extensively in data science and software engineering. It provides data structures and tools for data manipulation, analysis, and visualization. In this article, we will discuss how to filter Pandas DataFrames by a column of strings.
Introduction
Pandas DataFrames are two-dimensional labeled data structures that can hold data of different types, including strings. Filtering a DataFrame by a column of strings is a common task in data science and software engineering. It is usually done through the str accessor of a DataFrame column (a Series), which exposes vectorized string methods for searching, matching, and transforming the values in that column.
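As a small illustration of the str accessor, the sketch below applies two of its methods, str.startswith() and str.lower(), to a standalone Series. The Series name cities and its values are chosen here only for illustration.

```python
import pandas as pd

# Hypothetical example data, not taken from a real dataset
cities = pd.Series(["New York", "Los Angeles", "Chicago", "Houston"])

# Each .str method is applied element-wise and returns a new Series
starts_with_new = cities.str.startswith("New")  # Boolean mask
lowercased = cities.str.lower()                 # transformed strings

print(cities[starts_with_new])
print(lowercased)
```

A Boolean result such as starts_with_new can be used directly as a row filter, which is the pattern the rest of this article relies on.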
Filtering by a Single String
To filter a DataFrame by a single string value in a column, we can use the str.contains() method. It returns a Boolean mask that is True for every row whose value contains the given string as a substring, and indexing the DataFrame with that mask selects those rows. By default, the argument is interpreted as a regular expression.
import pandas as pd
# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)
# Filter the DataFrame by a string value in the "City" column
filtered_df = df[df["City"].str.contains("Los Angeles")]
print(filtered_df)
The output should be:
   Name         City
1  Jane  Los Angeles
In this example, we filtered the DataFrame by the string value “Los Angeles” in the “City” column using the str.contains() method.
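Real data often contains missing values, mixed casing, or characters that have a special meaning in regular expressions. The following is a minimal sketch of how the case, na, and regex parameters of str.contains() can help in those situations. The modified data (a missing city and the value "Houston (TX)") is invented for this example.

```python
import pandas as pd

# Hypothetical data: one missing city and one value with parentheses
df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "Los Angeles", None, "Houston (TX)"],
})

# regex=False treats "(tx)" as literal text instead of a regex group,
# case=False ignores upper/lower case,
# na=False makes missing values count as "no match" in the mask
mask = df["City"].str.contains("(tx)", case=False, na=False, regex=False)
filtered_df = df[mask]
print(filtered_df)
```

Passing na=False keeps the mask purely Boolean, so it can be used for indexing even when the column has missing entries.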
Filtering by Multiple Strings
To filter a DataFrame by multiple string values in a column, we can use the str.contains() method with a regular expression. The alternatives are joined with the regex or operator (|), so a row is kept if its value contains any one of them.
import pandas as pd
# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)
# Filter the DataFrame by multiple string values in the "City" column
filtered_df = df[df["City"].str.contains("Los Angeles|Chicago")]
print(filtered_df)
The output should be:
   Name         City
1  Jane  Los Angeles
2  Mary      Chicago
In this example, we filtered the DataFrame by the string values Los Angeles and Chicago in the City column using the str.contains() method with a regular expression.
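When the search terms come from a variable rather than being typed by hand, the pattern can be assembled programmatically. The sketch below is one possible approach: it escapes each term with re.escape() from the standard library so that any regex metacharacters are matched literally, then joins the terms with |. The list name search_terms is an assumption made for this example.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Hypothetical list of substrings to look for
search_terms = ["Los Angeles", "Chicago"]

# Escape each term, then join the alternatives with the regex "or" operator
pattern = "|".join(re.escape(term) for term in search_terms)

filtered_df = df[df["City"].str.contains(pattern)]
print(filtered_df)
```

Escaping matters as soon as a term contains characters such as ., (, or +, which would otherwise change what the pattern matches.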
Filtering by a List of Strings
To filter a DataFrame by a list of string values in a column, we can use the isin() method. It returns a Boolean mask that is True for every row whose value exactly equals one of the listed strings. Unlike str.contains(), it does not match substrings.
import pandas as pd
# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)
# Filter the DataFrame by a list of string values in the "City" column
filtered_df = df[df["City"].isin(["Los Angeles", "Chicago"])]
print(filtered_df)
The output should be:
   Name         City
1  Jane  Los Angeles
2  Mary      Chicago
In this example, we filtered the DataFrame by the string values Los Angeles and Chicago in the City column using the isin() method.
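Two details of isin() are easy to overlook: it compares whole values rather than substrings, and its mask can be inverted with the ~ operator to exclude rows. The short sketch below illustrates both, using variable names (partial_match, excluded_df) chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# "Los" is only part of "Los Angeles", so isin() does not select that row
partial_match = df[df["City"].isin(["Los", "Chicago"])]
print(partial_match)

# ~ inverts the Boolean mask: keep rows whose city is NOT in the list
excluded_df = df[~df["City"].isin(["Los Angeles", "Chicago"])]
print(excluded_df)
```

If partial matches are what you need, str.contains() is the better fit. If the values must match exactly, isin() avoids accidental substring hits.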
Conclusion
Filtering Pandas DataFrames by a column of strings is a common task in data science and software engineering. In this article, we discussed how to filter a DataFrame by a single string value with str.contains(), by multiple string values with str.contains() and a regular expression, and by a list of exact string values with isin(). Each method produces a Boolean mask that selects the rows whose values in the column meet the given criteria.