Python Pandas Group by date using datetime data

You can group by the dates in the Date_Time column using dt.date:

    df = df.groupby([df['Date_Time'].dt.date]).mean()

Sample:

    df = pd.DataFrame({'Date_Time': pd.date_range('10/1/2001 10:00:00', periods=3, freq='10H'),
                       'B': [4, 5, 6]})
    print(df)
       B           Date_Time
    0  4 2001-10-01 10:00:00
    1  5 2001-10-01 20:00:00
    2  6 2001-10-02 06:00:00

    print(df['Date_Time'].dt.date)
    0    2001-10-01
    1    2001-10-01
    2    2001-10-02
    Name: Date_Time, dtype: object

    df = df.groupby([df['Date_Time'].dt.date])['B'].mean()
    print(df)
    …
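A self-contained version of the sample above might look like the following. It is only a sketch: the variable name daily_mean is introduced here for illustration, and the 10-hour step is passed as a pd.Timedelta rather than the '10H' alias so it does not depend on which frequency aliases a given pandas version accepts.

```python
import pandas as pd

# Sample frame mirroring the excerpt: three timestamps, 10 hours apart.
df = pd.DataFrame({
    "Date_Time": pd.date_range("2001-10-01 10:00:00", periods=3,
                               freq=pd.Timedelta(hours=10)),
    "B": [4, 5, 6],
})

# Group rows by the calendar date of each timestamp and average column B.
# daily_mean is an illustrative name, not part of the original answer.
daily_mean = df.groupby(df["Date_Time"].dt.date)["B"].mean()
print(daily_mean)
```

The first two rows share the date 2001-10-01, so they are averaged together, while the third row forms its own group.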

How to drop duplicates based on two or more subsets criteria in Pandas data-frame

Your syntax is wrong. Here's the correct way:

    df.drop_duplicates(subset=['bio', 'center', 'outcome'])

Or, in this specific case, simply:

    df.drop_duplicates()

Both return the following:

       bio center outcome
    0    1    one       f
    2    1    two       f
    3    4  three       f

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column …
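The input frame is not shown in the excerpt, so the example below assumes one: a hypothetical four-row frame in which row 1 repeats row 0. With that assumption, both calls give the result shown above.

```python
import pandas as pd

# Hypothetical input (not from the original post): row 1 duplicates row 0.
df = pd.DataFrame({
    "bio": [1, 1, 1, 4],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# Deduplicate on an explicit list of columns...
by_subset = df.drop_duplicates(subset=["bio", "center", "outcome"])

# ...or on all columns, which is equivalent here because the frame
# has no other columns.
all_cols = df.drop_duplicates()

print(by_subset)
```

By default the first occurrence of each duplicate is kept, which is why the surviving index labels are 0, 2 and 3.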

Group dataframe and get sum AND count?

Try this:

    In [110]: (df.groupby('Company Name')
       .....:    .agg({'Organisation Name':'count', 'Amount': 'sum'})
       .....:    .reset_index()
       .....:    .rename(columns={'Organisation Name':'Organisation Count'})
       .....: )
    Out[110]:
              Company Name   Amount  Organisation Count
    0  Vifor Pharma UK Ltd  4207.93                   5

Or, if you don't want to reset the index:

    df.groupby('Company Name')['Amount'].agg(['sum','count'])

or

    df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

    In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
    Out[98]:
    sum count
    Company …
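To try these calls without the original data, here is a small sketch. The company names, organisation names and amounts are made up for illustration; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical data; only the column names match the original question.
df = pd.DataFrame({
    "Company Name": ["Acme Ltd", "Acme Ltd", "Globex Ltd"],
    "Organisation Name": ["Org A", "Org B", "Org C"],
    "Amount": [100.0, 50.0, 25.0],
})

# Count organisations and sum amounts per company, then flatten the index
# and give the count column a clearer name.
summary = (
    df.groupby("Company Name")
      .agg({"Organisation Name": "count", "Amount": "sum"})
      .reset_index()
      .rename(columns={"Organisation Name": "Organisation Count"})
)
print(summary)

# Alternative: keep 'Company Name' as the index and apply both
# aggregations to the Amount column only.
per_amount = df.groupby("Company Name")["Amount"].agg(["sum", "count"])
print(per_amount)
```

The first form is handy when the count and the sum come from different columns; the second is shorter when both statistics describe the same column.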

Use pandas.shift() within a group

Pandas' grouped objects have a groupby.DataFrameGroupBy.shift method, which shifts a specified column within each group by n periods, just like the regular DataFrame shift method:

    df['prev_value'] = df.groupby('object')['value'].shift()

For the following example dataframe:

    print(df)
       object  period  value
    0       1       1     24
    1       1       2     67
    2       1       4     89
    3       2       4      5
    4       2 …
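As a runnable sketch, the example below uses only the first four rows of the sample frame, since the remaining rows are cut off in the excerpt. The result column name prev_value follows the answer.

```python
import pandas as pd

# First four rows of the sample frame; later rows are truncated in the
# excerpt and are deliberately not reconstructed here.
df = pd.DataFrame({
    "object": [1, 1, 1, 2],
    "period": [1, 2, 4, 4],
    "value": [24, 67, 89, 5],
})

# Shift 'value' down by one row within each 'object' group, so each row
# sees the previous value from its own group only.
df["prev_value"] = df.groupby("object")["value"].shift()
print(df)
```

The first row of every group has no predecessor, so prev_value is NaN there; values never leak from one group into the next.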

How to access pandas groupby dataframe by key

You can use the get_group method:

    In [21]: gb.get_group('foo')
    Out[21]:
         A         B   C
    0  foo  1.624345   5
    2  foo -0.528172  11
    4  foo  0.865408  14

Note: This doesn't require creating an intermediary dictionary / copy of every subdataframe for every group, so it will be much more memory-efficient than creating the naive dictionary with dict(iter(gb)). This …
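The frame behind gb is not shown in the excerpt, so the sketch below assumes a hypothetical one whose 'foo' rows sit at index 0, 2 and 4, as in the output above. The values in columns B and C are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; the original values are not shown in the excerpt.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C": [5, 8, 11, 12, 14],
})

gb = df.groupby("A")

# Fetch the sub-frame for a single key without materialising every group.
foo_rows = gb.get_group("foo")
print(foo_rows)
```

get_group keeps the original index labels, so the returned rows can still be matched back to the source frame.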