Transform vs. aggregate in Pandas

Consider the dataframe df:

    df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

The standard use of groupby is as an aggregator:

    df.groupby('A').mean()

Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with. Use transform:

    df.groupby('A').transform('mean')
    df.set_index('A').groupby(level='A').transform('mean')

agg is used when you have specific …
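A minimal sketch of the shape difference, using the same df as above: the aggregate has one row per group, while transform returns a frame aligned with the original index. The last line (subtracting the group mean) is an illustrative use case of my own, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aabb"), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregate: one row per group, indexed by the group keys 'a' and 'b'
agg = df.groupby("A").mean()

# Transform: each row gets its group's mean, same index as df (0..3);
# the grouping column A itself is not part of the result
trans = df.groupby("A").transform("mean")

# Typical use of the broadcast result: centre B within each group
b_centered = df["B"] - df.groupby("A")["B"].transform("mean")

print(agg)
print(trans)
print(b_centered)
```

Because transform preserves the index, its result can be combined with the original columns directly, as in b_centered, without a merge.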

Python Pandas Conditional Sum with Groupby

First group by the key1 column:

    In [11]: g = df.groupby('key1')

and then, for each group, take the sub-DataFrame where key2 equals 'one' and sum the data1 column:

    In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
    Out[12]:
    key1
    a    0.093391
    b    1.468194
    dtype: float64

To explain what's going on, let's look at the 'a' group:

    In [21]: …
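A small sketch comparing the apply approach with a vectorised alternative. The sample data is hypothetical (the original answer's df is not shown), and the where-based version is my own swap-in, not the answer's method. Selecting only the needed columns before apply avoids the warning newer pandas versions emit when apply touches the grouping column.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the answer
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Approach from the answer: filter each group's rows inside apply
via_apply = (
    df.groupby("key1")[["key2", "data1"]]
      .apply(lambda x: x[x["key2"] == "one"]["data1"].sum())
)

# Vectorised alternative: zero out non-matching rows, then sum per group
via_where = df["data1"].where(df["key2"] == "one", 0).groupby(df["key1"]).sum()

print(via_apply)
print(via_where)
```

Both give a=6.0 and b=3.0 here. The where version keeps every key1 group even when no row matches (its sum is 0), whereas filtering first with df[df["key2"] == "one"].groupby("key1") would drop such groups.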

Python Pandas Group by date using datetime data

You can group by the dates of column Date_Time using dt.date:

    df = df.groupby([df['Date_Time'].dt.date]).mean()

Sample:

    df = pd.DataFrame({'Date_Time': pd.date_range('10/1/2001 10:00:00', periods=3, freq='10H'),
                       'B': [4, 5, 6]})
    print(df)
       B           Date_Time
    0  4 2001-10-01 10:00:00
    1  5 2001-10-01 20:00:00
    2  6 2001-10-02 06:00:00

    print(df['Date_Time'].dt.date)
    0    2001-10-01
    1    2001-10-01
    2    2001-10-02
    Name: Date_Time, dtype: object

    df = df.groupby([df['Date_Time'].dt.date])['B'].mean()
    print(df)
    …
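A self-contained sketch using the same three timestamps as the sample, built with pd.to_datetime instead of date_range. The dt.normalize() variant is an alternative I am adding, not part of the original answer: it floors each timestamp to midnight and keeps a datetime64 index, whereas dt.date yields plain Python date objects (object dtype).

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({
    "Date_Time": pd.to_datetime(
        ["2001-10-01 10:00", "2001-10-01 20:00", "2001-10-02 06:00"]
    ),
    "B": [4, 5, 6],
})

# Group by calendar date: index holds datetime.date objects
by_date = df.groupby(df["Date_Time"].dt.date)["B"].mean()

# Alternative: normalize() floors to midnight and keeps datetime64 dtype
by_day = df.groupby(df["Date_Time"].dt.normalize())["B"].mean()

print(by_date)
print(by_day)
```

The first two rows fall on 2001-10-01, so that day's mean is (4 + 5) / 2 = 4.5; the single row on 2001-10-02 gives 6.0.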

How to drop duplicates based on two or more subset criteria in a Pandas DataFrame

Your syntax is wrong. Here's the correct way:

    df.drop_duplicates(subset=['bio', 'center', 'outcome'])

Or in this specific case, simply:

    df.drop_duplicates()

Both return the following:

       bio center outcome
    0    1    one       f
    2    1    two       f
    3    4  three       f

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column …
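A minimal sketch with hypothetical data (the original question's frame is not shown), chosen so that row 1 duplicates row 0 and the result matches the output above. It also shows the keep parameter, which controls which occurrence of each duplicate survives.

```python
import pandas as pd

# Hypothetical data: row 1 is an exact repeat of row 0
df = pd.DataFrame({
    "bio": [1, 1, 1, 4],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# subset takes a list of column labels; keep='first' is the default
deduped = df.drop_duplicates(subset=["bio", "center", "outcome"])

# With no subset, all columns are compared, so the result is the same here
all_cols = df.drop_duplicates()

# A narrower subset with keep='last': only the last row per 'outcome' value stays
last_per_outcome = df.drop_duplicates(subset=["outcome"], keep="last")

print(deduped)
```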

Group dataframe and get sum AND count?

Try this:

    In [110]: (df.groupby('Company Name')
       .....:    .agg({'Organisation Name': 'count', 'Amount': 'sum'})
       .....:    .reset_index()
       .....:    .rename(columns={'Organisation Name': 'Organisation Count'})
       .....: )
    Out[110]:
              Company Name   Amount  Organisation Count
    0  Vifor Pharma UK Ltd  4207.93                   5

or if you don't want to reset the index:

    df.groupby('Company Name')['Amount'].agg(['sum', 'count'])

or

    df.groupby('Company Name').agg({'Amount': ['sum', 'count']})

Demo:

    In [98]: df.groupby('Company Name')['Amount'].agg(['sum', 'count'])
    Out[98]:
                             sum  count
    Company …
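A sketch of the same result using named aggregation (output_name=(column, func)), which names the output columns directly and makes the rename step unnecessary. The rows here are hypothetical; only the column names come from the answer. Because the output names contain spaces, they are passed via ** dict unpacking.

```python
import pandas as pd

# Hypothetical rows; column names follow the answer
df = pd.DataFrame({
    "Company Name": ["Vifor", "Vifor", "Acme"],
    "Organisation Name": ["Org A", "Org B", "Org C"],
    "Amount": [100.0, 50.0, 10.0],
})

# Named aggregation: output column -> (input column, aggregation function)
summary = (
    df.groupby("Company Name")
      .agg(**{
          "Organisation Count": ("Organisation Name", "count"),
          "Amount": ("Amount", "sum"),
      })
      .reset_index()
)

print(summary)
```

Note that "count" counts non-null values in the chosen column; use "size" instead if rows with a missing Organisation Name should still be counted.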