How to reorder columns in Pandas
There are several ways to change the order of columns in a pandas DataFrame; which to choose depends on the size of your dataset and the transformation you want to perform.
If your DataFrame has relatively few columns and/or you need a custom column order, you can select the columns in the order you want and assign the result back (note the double brackets: the inner pair is a Python list of column names):
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300], 'd': [4, 5, 6]})
data = data[['b', 'd', 'c', 'a']]
data
You can also accomplish the same thing using the built-in reindex method, which may give a slight performance boost over the solution above (any difference is likely small, so measure on your own data if it matters):
data = data.reindex(columns=['b', 'd', 'c', 'a'])
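One practical difference between the two approaches is how they treat a column name that does not exist. The sketch below uses a deliberately wrong name, 'z', which is an assumption for illustration only: reindex adds an all-NaN column for it, while bracket selection raises a KeyError.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# reindex does not complain about the unknown name 'z';
# it creates a new column filled with NaN instead
reindexed = data.reindex(columns=['b', 'a', 'z'])

# bracket selection with the same list raises KeyError
try:
    data[['b', 'a', 'z']]
    bracket_error = None
except KeyError as exc:
    bracket_error = exc
```

If a typo in a column name should fail loudly, bracket selection is the safer choice of the two.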
Listing every column is manageable for a DataFrame with only a few columns, but it becomes cumbersome for wider ones. If you need to rearrange all columns into a custom order that goes beyond simple sorting, there’s not really a good way around this. However, if you just need to move a single column, the solution is much simpler. Here is one way to go about it:
# move column 'd' to the beginning
data = data[['d'] + [col for col in data.columns if col != 'd']]
# move column 'd' to the end
data = data[[col for col in data.columns if col != 'd'] + ['d']]
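The same list-comprehension pattern extends to moving several columns at once without writing out the rest. This is a minimal sketch; the variable names front, rest, and moved are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30],
                     'c': [100, 200, 300], 'd': [4, 5, 6]})

# columns to move to the beginning, in the order they should appear
front = ['d', 'c']

# every other column, kept in its existing relative order
rest = [col for col in data.columns if col not in front]

moved = data[front + rest]
```

Swapping the concatenation to rest + front would move the chosen columns to the end instead.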
To move a column to a particular index, you can use pop() and insert(); note that both methods modify the DataFrame in place.
# move column 'd' to be second from the left (index 1)
col = data.pop('d')
data.insert(1, col.name, col)
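Because pop() and insert() change the DataFrame they are called on, it can be convenient to wrap them in a function that works on a copy. The helper below, move_column, is a hypothetical name and not part of pandas; it is only a sketch of one way to keep the original untouched.

```python
import pandas as pd

def move_column(df, name, position):
    """Return a copy of df with column `name` moved to integer `position`."""
    out = df.copy()                   # work on a copy so df is not modified
    col = out.pop(name)               # remove the column from the copy
    out.insert(position, name, col)   # put it back at the requested index
    return out

data = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30],
                     'c': [100, 200, 300], 'd': [4, 5, 6]})
result = move_column(data, 'd', 1)
```

Here result has 'd' second from the left, while data keeps its original column order.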
In cases where you simply want to sort columns by name, you can use reindex from above with the axis parameter:
data = data.reindex(sorted(data.columns), axis=1)
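As an alternative sketch, not taken from the approach above, pandas also provides sort_index, which sorts column labels directly when given axis=1 and accepts ascending=False for reverse order.

```python
import pandas as pd

data = pd.DataFrame({'b': [10, 20, 30], 'd': [4, 5, 6],
                     'c': [100, 200, 300], 'a': [1, 2, 3]})

# sort columns alphabetically by label
ascending = data.sort_index(axis=1)

# sort columns in reverse alphabetical order
descending = data.sort_index(axis=1, ascending=False)
```

Both calls return a new DataFrame and leave data unchanged.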
In summary, depending on the size of your dataset, you may be able to simply reassign or reindex DataFrame columns to the desired order. For larger datasets, you may be better off choosing a solution that doesn’t require writing out all column names.
Additional Resources:
How to drop Pandas DataFrame rows with NAs in a specific column