How to get number of groups in a groupby object in pandas?

Simple, Fast, and Pandaic: `ngroups`

Newer versions of the groupby API (pandas >= 0.23) provide this (undocumented) attribute, which stores the number of groups in a GroupBy object.

```python
import pandas as pd

# setup
df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4
```

Note that this is different from `GroupBy.groups`, which …
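
To see how `ngroups` relates to other ways of counting groups, here is a small sketch on the same toy frame. Treating `len(dfg.groups)` and `nunique()` as equivalent counts is an assumption that holds here because the key column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({"A": list("aabbcccd")})
dfg = df.groupby("A")

# Number of groups, read straight from the GroupBy object
n_groups = dfg.ngroups

# Equivalent counts for this frame: the size of the key -> row-labels mapping,
# and the number of distinct keys in the grouping column
n_from_groups = len(dfg.groups)
n_unique = df["A"].nunique()

# Rows per group, for comparison (this is a Series, not a single count)
sizes = dfg.size()

print(n_groups, n_from_groups, n_unique)  # 4 4 4
```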

Rolling OLS Regressions and Predictions by Group

You should be able to achieve what you want using the groupby/apply pattern. The code below should help. Create example data:

```python
from statsmodels.regression.rolling import RollingOLS
from statsmodels.tools.tools import add_constant
import pandas as pd
import numpy as np

# make some toy data
race_dates = pd.to_datetime(['2020-06-09']*3 + ['2020-12-01']*4 + ['2021-01-21']*4 + ['2021-05-04']*5)
distance …
```
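
A minimal sketch of the groupby/apply pattern for rolling regressions and predictions, assuming a single regressor so the rolling OLS fit has a closed form (slope = rolling covariance / rolling variance, intercept = mean(y) - slope * mean(x)). This swaps in plain pandas rolling statistics for the statsmodels `RollingOLS` imported above; with several regressors `RollingOLS` is the general tool. The column names `group`, `x`, `y`, the helper `rolling_ols`, and the window of 5 are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups, each with an exact linear relation y = 2x + 1
df = pd.DataFrame({
    "group": ["a"] * 10 + ["b"] * 10,
    "x": np.arange(20, dtype=float),
})
df["y"] = 2.0 * df["x"] + 1.0

WINDOW = 5  # assumed rolling window length


def rolling_ols(g: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rolling single-regressor OLS within one group."""
    x, y = g["x"], g["y"]
    # slope = Cov(x, y) / Var(x) over each trailing window (ddof cancels)
    slope = x.rolling(WINDOW).cov(y) / x.rolling(WINDOW).var()
    # intercept = mean(y) - slope * mean(x) over the same window
    const = y.rolling(WINDOW).mean() - slope * x.rolling(WINDOW).mean()
    # One-step-ahead prediction: use the fit from the window ending on the previous row
    pred = const.shift(1) + slope.shift(1) * x
    return pd.DataFrame({"const": const, "slope": slope, "pred": pred}, index=g.index)


# group_keys=False keeps the original row index, so the result aligns with df
out = df.groupby("group", group_keys=False)[["x", "y"]].apply(rolling_ols)
df = df.join(out)
```

Because the shift happens inside the per-group function, a group's first prediction never borrows parameters from the previous group's last window.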

Bar graph from dataframe groupby

Copy the data from the OP and run:

```python
import numpy as np
import pandas as pd

df = pd.read_clipboard()
```

Plot using the pandas plotting API (`dfg` below is a Series, so this is `pandas.Series.plot`); the answer was updated for pandas v1.2.4 and matplotlib v3.3.4. Then, using your code:

```python
df = df.replace(np.nan, 0)
dfg = df.groupby(['home_team'])['arrests'].mean()
dfg.plot(kind='bar', title="Arrests", ylabel="Mean Arrests", xlabel="Home Team", figsize=(6, 5))
```
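
For a self-contained version that runs without the clipboard, here is a sketch with a small made-up frame standing in for the OP's data (the team names and counts are hypothetical). It uses `fillna` on the `arrests` column in place of `df.replace(np.nan, 0)`, which has the same effect here, and selects the non-interactive Agg backend so it can run headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the OP's data
df = pd.DataFrame({
    "home_team": ["Bears", "Bears", "Colts", "Colts", "Jets"],
    "arrests": [3, np.nan, 5, 7, 2],
})

# Treat missing arrest counts as zero, then average per home team
dfg = df.fillna({"arrests": 0}).groupby("home_team")["arrests"].mean()

# One bar per team; dfg is a Series, so this calls Series.plot
ax = dfg.plot(kind="bar", title="Arrests", ylabel="Mean Arrests",
              xlabel="Home Team", figsize=(6, 5))
plt.tight_layout()
```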

pandas groupby where you get the max of one column and the min of another column

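Use `groupby` + `agg` with a dict. The column order of the result is not guaranteed to follow the dict (notably in older pandas versions), so put the columns in the desired order, either by selecting a subset or with `reindex`. Finally, add `reset_index` to convert the index back into a column if needed.

```python
df = a.groupby('user').agg({'num1': 'min', 'num2': 'max'})[['num1', 'num2']].reset_index()
print(df)
```

```
  user  num1  num2
0    a     1     3
1    b     4     5
```

This is the same as the following (the original used `reindex_axis`, which was removed in pandas 1.0; `reindex(columns=...)` replaces it):

```python
df = (a.groupby('user').agg({'num1': 'min', 'num2': 'max'})
       .reindex(columns=['num1', 'num2'])
       .reset_index())
print …
```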
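
An alternative sketch, assuming pandas >= 0.25: named aggregation states each output column's name, source column, and function explicitly, so the column order follows the keyword order and no extra reordering step is needed. The frame `a` is hypothetical data chosen to reproduce the output shown above.

```python
import pandas as pd

# Hypothetical data: min(num1) and max(num2) per user give 1/3 for "a" and 4/5 for "b"
a = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "num1": [1, 2, 4, 6],
    "num2": [3, 1, 5, 2],
})

# Named aggregation: output_column=(input_column, aggregation_function)
df = (a.groupby("user")
        .agg(num1=("num1", "min"), num2=("num2", "max"))
        .reset_index())
print(df)
```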

Transform vs. aggregate in Pandas

Consider the DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))
```

The standard use of `groupby` is with an aggregation, which returns one row per group:

```python
df.groupby('A').mean()
```

Maybe you instead want these values broadcast across each whole group, returning something with the same index as what you started with. Use `transform`:

```python
df.groupby('A').transform('mean')
df.set_index('A').groupby(level='A').transform('mean')
```

`agg` is used when you have specific …
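
A small sketch on the same frame that makes the shape difference concrete: the aggregation returns one row per group, while `transform` returns one row per original row, which is what makes it convenient for group-wise arithmetic such as demeaning. The `B_demeaned` column is an illustrative addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aabb"), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation: one row per group, indexed by the group keys -> shape (2, 2)
agg_result = df.groupby("A").mean()

# Transform: one row per original row, same index as df -> shape (4, 2)
transformed = df.groupby("A").transform("mean")

# Typical transform use: subtract each group's mean from its own members
df["B_demeaned"] = df["B"] - df.groupby("A")["B"].transform("mean")
```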

Python Pandas Conditional Sum with Groupby

First, group by the `key1` column:

```
In [11]: g = df.groupby('key1')
```

Then, for each group, take the sub-DataFrame where `key2` equals `'one'` and sum the `data1` column:

```
In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
Out[12]:
key1
a    0.093391
b    1.468194
dtype: float64
```

To explain what's going on, let's look at the `'a'` group:

```
In [21]: …
```
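
If `apply` is slow on large frames, one vectorized sketch of the same conditional sum is to zero out the non-matching rows with `Series.where` and then sum per group. The data below is hypothetical; group `c` deliberately has no `key2 == 'one'` rows to show how two vectorized variants differ.

```python
import pandas as pd

# Hypothetical data: group "c" has no rows where key2 == "one"
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "three", "one", "two"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0],
})

# Keep data1 where key2 == "one", replace it with 0.0 elsewhere, then sum per key1.
# Every group survives, including ones with no matching rows.
cond_sum = df["data1"].where(df["key2"] == "one", 0.0).groupby(df["key1"]).sum()

# Filtering first is shorter, but groups with no matching rows disappear.
filtered_sum = df[df["key2"] == "one"].groupby("key1")["data1"].sum()
```

`cond_sum` keeps `c` with a sum of 0.0, matching the `apply` version (the sum of an empty selection is 0), while `filtered_sum` drops `c` entirely.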