How to count rows in Pandas
You can count the number of rows in a pandas DataFrame using len() or DataFrame.shape. Here’s a quick example:
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})
# three different ways to count rows
len(data)
len(data.index)
data.shape[0]
All three commands above return the row count. If you’re looking to shave milliseconds off your computation time, len(data.index) is the fastest of the three, but the difference is negligible in most cases because all three are constant-time operations. The same approach can also be used to count columns, using len(data.columns) or data.shape[1].
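As a minimal sketch of the column-count variants, reusing the same example frame (the variable names n_rows, n_cols, and n_cols_alt are only illustrative):

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})

# shape is a (rows, columns) tuple, so both counts can be unpacked at once
n_rows, n_cols = data.shape

# the column count is also available from the columns index
n_cols_alt = len(data.columns)

print(n_rows, n_cols, n_cols_alt)  # 5 2 2
```

Unpacking data.shape is convenient when you need both dimensions; len(data.columns) reads more clearly when you only need one.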
If you want only the number of non-null entries, use DataFrame.count(). This method does not count None, NaN, NaT, and optionally numpy.inf, so if you need true row counts, stick with the options outlined above. Because not every column will necessarily contain the same number of non-null values, count() returns a Series with one value for each column:
import pandas as pd
import numpy as np
data = pd.DataFrame({'a': [1, np.nan, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})
data.count()
To count non-null entries per row, you can use data.count(axis=1) or data.count(axis='columns').
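To make the difference between the two axes concrete, here is a small sketch using the same example frame (per_column and per_row are illustrative names, not part of pandas):

```python
import pandas as pd
import numpy as np

data = pd.DataFrame({'a': [1, np.nan, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})

# default axis: one non-null count per column, indexed by column name
per_column = data.count()

# axis=1: one non-null count per row, indexed by row label
per_row = data.count(axis=1)

print(per_column.tolist())  # [4, 5]
print(per_row.tolist())     # [2, 1, 2, 2, 2]
print(len(data))            # 5, the true row count, NaN included
```

Column 'a' reports 4 rather than 5 because its NaN is skipped, which is why count() should not be used as a substitute for len().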
Finally, if you’d like to count rows by group (for example, by the values in a column), you can use DataFrameGroupBy.size() or DataFrameGroupBy.count(). Here, size() returns a Series of true row counts per group, while count() returns a DataFrame of counts of non-null values per group:
import pandas as pd
import numpy as np
data = pd.DataFrame({'a': [1, np.nan, 3, 4, 5], 'b': [10, 20, 30, 40, 50], 'c': "X X X Y Y".split()})
data.groupby('c').size()
data.groupby('c').count()
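The contrast between the two group-level calls can be seen by capturing their results, as in this sketch (group_sizes and group_counts are illustrative names):

```python
import pandas as pd
import numpy as np

data = pd.DataFrame({
    'a': [1, np.nan, 3, 4, 5],
    'b': [10, 20, 30, 40, 50],
    'c': "X X X Y Y".split(),
})

# Series: number of rows in each group, nulls included
group_sizes = data.groupby('c').size()

# DataFrame: non-null values per group, one column per remaining column
group_counts = data.groupby('c').count()

print(group_sizes.tolist())        # [3, 2]
print(group_counts['a'].tolist())  # [2, 2]
print(group_counts['b'].tolist())  # [3, 2]
```

Group X has three rows, but only two non-null values in column 'a', so size() and count() disagree for that group.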
In summary, len() and DataFrame.shape are usually the go-to options for counting rows in Pandas. DataFrame.count() is useful when you need to count non-null values in each column.
Additional Resources:
How to drop Pandas DataFrame rows with NAs in a specific column