dataframe
Pandas DataFrame merge summing column
This solution also works if you want to sum more than one column. Assume data frames
>>> df1
   id name  weight  height
0   1    A       0       5
1   2    B      10      10
2   3    C      10      15
>>> df2
   id name  weight  height
0   2    B      25      20
1   3    C      20      30
You can …
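The excerpt is cut off before it shows the method. As a minimal sketch of one common way to get this result (not necessarily the approach the full answer uses), the two frames can be stacked and then grouped on the key columns so that every remaining numeric column is summed. The choice of `id` and `name` as grouping keys is an assumption based on the sample data.

```python
import pandas as pd

# Sample frames copied from the excerpt above.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"],
                    "weight": [0, 10, 10], "height": [5, 10, 15]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["B", "C"],
                    "weight": [25, 20], "height": [20, 30]})

# Stack the rows, then sum every non-key column per (id, name) pair.
# Assumption: "id" and "name" together identify a record.
summed = pd.concat([df1, df2]).groupby(["id", "name"], as_index=False).sum()
print(summed)
```

Rows that appear in only one frame (here `id` 1) pass through unchanged, because their group contains a single row.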
Find unique values in a Pandas dataframe, irrespective of row or column location
In [1]: df = DataFrame(np.random.randint(0,10,size=100).reshape(10,10))
In [2]: df
Out[2]:
   0  1  2  3  4  5  6  7  8  9
0  2  2  3  2  6  1  9  9  3  3
1  1  2  5  8  5  2  5  0  6  3
2  0  7  0  7  5  5  9  1  0  3
3  5  3 …
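The excerpt ends before the answer itself. A minimal sketch of two ways to collect the distinct values of a whole frame, assuming all cells share one dtype: flatten the underlying array, then de-duplicate it. The small frame below is hypothetical and stands in for the random one above so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame used instead of random data.
df = pd.DataFrame([[2, 2, 3],
                   [1, 2, 5],
                   [0, 7, 0]])

# Flatten row by row into a 1-D array.
flat = df.to_numpy().ravel()

# np.unique returns the distinct values sorted.
sorted_unique = np.unique(flat)

# pd.unique keeps the order of first appearance instead.
in_order = pd.unique(flat)

print(sorted_unique)
print(in_order)
```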
Check if rows in one dataframe exist in another dataframe
You can use merge with the indicator parameter, then remove the Rating column and use numpy.where:
df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
   User  Movie  Exist
0     1    333  False
1     1   1193   True
2     1      3  False
3     2    433  False
4     3     54 …
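The input frames are not shown in the excerpt, so the following is a self-contained sketch of the same idea on hypothetical data: `df1` holds the rows to look up, and `df2` holds the reference rows plus a `Rating` column. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical inputs; only the column names follow the excerpt.
df1 = pd.DataFrame({"User": [1, 1, 2], "Movie": [333, 1193, 433]})
df2 = pd.DataFrame({"User": [1, 2], "Movie": [1193, 999], "Rating": [5, 3]})

# A left merge keeps every row of df1; the indicator column records
# whether a matching (User, Movie) pair was found in df2.
merged = pd.merge(df1, df2, on=["User", "Movie"], how="left", indicator="Exist")
merged = merged.drop(columns="Rating")

# "both" means the row exists in df1 and df2; the comparison already
# yields booleans, so np.where is optional here.
merged["Exist"] = merged["Exist"] == "both"
print(merged)
```

If `df2` contains duplicate `(User, Movie)` pairs, the left merge repeats the matching `df1` rows, so de-duplicating `df2` on the key columns first may be needed.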
Python pandas: how to remove nan and -inf values
Use pd.DataFrame.isin to flag the unwanted values and pd.DataFrame.any to find the rows that contain at least one of them. Finally, use the negated boolean array to slice the dataframe.
df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
             time    X   Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
4        0.037389    3  10     3  0.333333   2.0  0.500000   1.0  1.000000
5        0.037393    4  10     4  0.250000   3.0  0.333333   2.0  0.500000
1030308  9.962213  256 …
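The filter above can be tried on a small frame. The sketch below uses hypothetical values (only the column names echo the excerpt) and also shows an alternative that should give the same rows: turn the infinities into NaN, then drop every row that has a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing one NaN and one -inf.
df = pd.DataFrame({
    "time": [0.1, 0.2, 0.3, 0.4],
    "X":    [1.0, np.nan, 3.0, 4.0],
    "X_t0": [1.0, 2.0, -np.inf, 0.25],
})

# Approach from the excerpt: flag bad cells, keep rows with none.
bad_rows = df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
clean = df[~bad_rows]

# Alternative: map +/-inf to NaN, then drop rows with any NaN.
clean_alt = df.replace([np.inf, -np.inf], np.nan).dropna()

print(clean)
```

`axis=1` is passed by keyword because recent pandas versions no longer accept the axis as a positional argument to `any`.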
Store numpy.array in cells of a Pandas.DataFrame
Use a wrapper around the numpy array, i.e. pass the numpy array inside a list:
a = np.array([5, 6, 7, 8])
df = pd.DataFrame({"a": [a]})
Output:
              a
0  [5, 6, 7, 8]
Or you can use apply(np.array) by creating the tuples, i.e. if you have a dataframe
df = pd.DataFrame({'id': [1, 2, 3, 4], 'a': ['on', …
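A short sketch of the first suggestion, checking that the cell really holds the array object rather than four separate values. Wrapping the array in a list tells pandas to treat it as a single element, which gives the column the `object` dtype.

```python
import numpy as np
import pandas as pd

a = np.array([5, 6, 7, 8])

# The outer list has one element, so the frame has one row
# whose single cell stores the whole array.
df = pd.DataFrame({"a": [a]})

cell = df.loc[0, "a"]
print(df.shape)       # one row, one column
print(type(cell))     # the cell is still a numpy array
print(df["a"].dtype)  # object dtype
```

Without the wrapping list, `pd.DataFrame({"a": a})` would instead spread the four values over four rows.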
Sorting by absolute value without changing the data
UPDATE
Since 0.17.0 order and sort have been deprecated (thanks @Ruggero Turra), you can use sort_values to achieve this now:
In[16]: df.reindex(df.b.abs().sort_values().index)
Out[16]:
   a  b
2  3 -1
3  4  2
0  1 -3
1  2  5
4  5 -9
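For reference, a runnable sketch of the same call. The input frame is reconstructed from the output shown above. The second variant relies on the `key` argument of `sort_values`, which is available from pandas 1.1 onward and is not mentioned in the original answer.

```python
import pandas as pd

# Frame reconstructed from the output in the excerpt.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [-3, 5, -1, 2, -9]})

# Approach from the excerpt: sort the absolute values, then reorder
# the original rows by the resulting index. The data is unchanged.
by_reindex = df.reindex(df.b.abs().sort_values().index)

# Alternative for pandas >= 1.1: sort by a key computed on the fly.
by_key = df.sort_values("b", key=lambda s: s.abs())

print(by_reindex)
```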