Transform vs. aggregate in Pandas

Consider the DataFrame df:

import pandas as pd
df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))


groupby followed by a reduction such as mean is the standard way to aggregate; the result has one row per group:

df.groupby('A').mean()
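As a quick check of what the aggregation returns, here is a minimal sketch that rebuilds the same df and inspects the result. The variable name means is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

means = df.groupby('A').mean()

# One row per group: the index is now the group key, not the original 0..3.
print(means.index.tolist())  # ['a', 'b']
print(means['B'].tolist())   # [1.5, 3.5]
print(means['C'].tolist())   # [4.5, 4.5]
```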


Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with.
In that case, use transform:

df.groupby('A').transform('mean')


df.set_index('A').groupby(level="A").transform('mean')
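To make the "same index" point concrete, the following sketch checks the index of both transform calls and then uses the broadcast means in a row-wise calculation. The names group_means, by_level, and the B_dev column are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

group_means = df.groupby('A').transform('mean')

# Same index as df, so the result aligns with it row by row.
print(group_means.index.equals(df.index))  # True
print(group_means['B'].tolist())           # [1.5, 1.5, 3.5, 3.5]

# With A moved into the index, transform keeps that index as well.
by_level = df.set_index('A').groupby(level='A').transform('mean')
print(by_level.index.tolist())             # ['a', 'a', 'b', 'b']

# Illustrative use: deviation of each B value from its group mean.
out = df.assign(B_dev=df['B'] - group_means['B'])
print(out['B_dev'].tolist())               # [-0.5, 0.5, -0.5, 0.5]
```

Because the transformed frame lines up with df, it can be assigned back as new columns or combined with the original values directly, which a plain aggregation would not allow without a merge.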


Use agg when you want to apply different functions to different columns, or more than one function to the same column.

df.groupby('A').agg(['mean', 'std'])


df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
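Passing lists to agg produces two-level column labels (column name, function name). The sketch below shows that, and, as an alternative that is not part of the original answer, the keyword-based named aggregation available in pandas 0.25 and later, which gives flat column names. The output names B_sum, C_mean, and C_prod are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Dict form: columns come back as (column, function) pairs.
nested = df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
print(nested.columns.tolist())  # [('B', 'sum'), ('C', 'mean'), ('C', 'prod')]

# Named aggregation (assumes pandas >= 0.25): new_name=(column, function).
flat = df.groupby('A').agg(
    B_sum=('B', 'sum'),
    C_mean=('C', 'mean'),
    C_prod=('C', 'prod'),
)
print(flat.columns.tolist())    # ['B_sum', 'C_mean', 'C_prod']
print(flat['B_sum'].tolist())   # [3, 7]
```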
