dataframe – Page 2 – Make Me Engineer

Check whether values in one data frame column exist in a second data frame

June 12, 2023 by Tarik

Pandas DataFrame merge summing column

June 12, 2023 by Tarik

This solution also works if you want to sum more than one column. Assume the data frames:

    >>> df1
       id name  weight  height
    0   1    A       0       5
    1   2    B      10      10
    2   3    C      10      15
    >>> df2
       id name  weight  height
    0   2    B      25      20
    1   3    C      20      30

You can … Read more
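The excerpt is cut off before the method itself is shown. As a minimal sketch of one way to get such a sum (an assumption, not necessarily the approach the post goes on to describe), the two frames above can be stacked with `pd.concat` and then summed per `id`/`name` group:

```python
import pandas as pd

# The two frames from the excerpt.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"],
                    "weight": [0, 10, 10], "height": [5, 10, 15]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["B", "C"],
                    "weight": [25, 20], "height": [20, 30]})

# Stack both frames, then sum every remaining column within each (id, name) group.
summed = (pd.concat([df1, df2])
            .groupby(["id", "name"], as_index=False)
            .sum())
print(summed)
```

Because the grouping keys are excluded from the sum, any number of numeric columns is summed in one pass.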

Find unique values in a Pandas dataframe, irrespective of row or column location

June 11, 2023 by Tarik

    In [1]: df = DataFrame(np.random.randint(0,10,size=100).reshape(10,10))

    In [2]: df
    Out[2]:
       0  1  2  3  4  5  6  7  8  9
    0  2  2  3  2  6  1  9  9  3  3
    1  1  2  5  8  5  2  5  0  6  3
    2  0  7  0  7  5  5  9  1  0  3
    3  5  3 … Read more
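The excerpt ends before any answer appears. One common way to collect the distinct values of a whole frame, sketched here on a small hypothetical frame rather than the random one above, is to flatten the underlying array first:

```python
import numpy as np
import pandas as pd

# Small fixed frame (hypothetical data) so the result is reproducible.
df = pd.DataFrame([[2, 2, 3], [1, 2, 5], [0, 7, 0]])

# np.unique flattens the 2-D array and returns the distinct values, sorted.
unique_sorted = np.unique(df.to_numpy())

# pd.unique keeps the order of first appearance (row by row) instead of sorting.
unique_in_order = pd.unique(df.to_numpy().ravel())

print(unique_sorted)
print(unique_in_order)
```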

Adding column if it does not exist

June 11, 2023 by Tarik

Check if rows in one dataframe exist in another dataframe

June 11, 2023 by Tarik

You can use merge with the indicator parameter, then remove the Rating column and use numpy.where:

    df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
    df.drop('Rating', inplace=True, axis=1)
    df['Exist'] = np.where(df.Exist == 'both', True, False)
    print (df)

       User  Movie  Exist
    0     1    333  False
    1     1   1193   True
    2     1      3  False
    3     2    433  False
    4     3     54 … Read more
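A self-contained sketch of the same pattern follows. The input frames are not shown in the excerpt, so the sample data below is hypothetical, and the final step uses a plain comparison in place of `numpy.where`, since the comparison already yields booleans:

```python
import pandas as pd

# Hypothetical sample data; the original df1 and df2 are not shown in the excerpt.
df1 = pd.DataFrame({"User": [1, 1, 1, 2], "Movie": [333, 1193, 3, 433]})
df2 = pd.DataFrame({"User": [1, 2], "Movie": [1193, 999], "Rating": [5, 3]})

# A left merge keeps every row of df1; the indicator column records
# whether a matching (User, Movie) pair was found in df2.
merged = pd.merge(df1, df2, on=["User", "Movie"], how="left", indicator="Exist")
merged = merged.drop(columns="Rating")

# 'both' means the pair exists in both frames; 'left_only' means df1 only.
merged["Exist"] = merged["Exist"] == "both"
print(merged)
```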

Python pandas: how to remove nan and -inf values

June 11, 2023 by Tarik

Use pd.DataFrame.isin to flag the unwanted values, then pd.DataFrame.any to find the rows that contain at least one of them. Finally, use the boolean array to slice the dataframe.

    df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

                 time    X   Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
    4        0.037389    3  10     3  0.333333   2.0  0.500000   1.0  1.000000
    5        0.037393    4  10     4  0.250000   3.0  0.333333   2.0  0.500000
    1030308  9.962213  256 … Read more
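To make the filter concrete, here is a minimal runnable sketch on a small hypothetical frame (the column names and values are made up for illustration). It also shows an equivalent alternative that converts infinities to NaN and then calls `dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 holds a NaN, row 2 holds -inf.
df = pd.DataFrame({"X": [1.0, np.nan, 3.0, 4.0],
                   "Y": [10.0, 20.0, -np.inf, 40.0]})

# True for every row that contains at least one NaN or infinite value.
bad_rows = df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
clean = df[~bad_rows]

# Alternative: turn infinities into NaN, then drop all rows containing NaN.
clean_alt = df.replace([np.inf, -np.inf], np.nan).dropna()

print(clean)
```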

Selecting columns in R data frame based on those not in a vector

June 10, 2023 by Tarik

Store numpy.array in cells of a Pandas.DataFrame

June 10, 2023 by Tarik

Use a wrapper around the numpy array, i.e. pass the numpy array inside a list:

    a = np.array([5, 6, 7, 8])
    df = pd.DataFrame({"a": [a]})

Output:

                  a
    0  [5, 6, 7, 8]

Or you can use apply(np.array) by creating the tuples, i.e. if you have a dataframe

    df = pd.DataFrame({'id': [1, 2, 3, 4], 'a': ['on', … Read more
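The effect of the list wrapper is easiest to see next to the unwrapped case. The sketch below covers only the first technique (the second one is truncated in the excerpt); the variable names `wrapped`, `unwrapped`, and `cell` are illustrative:

```python
import numpy as np
import pandas as pd

a = np.array([5, 6, 7, 8])

# Wrapped in a list: the whole array becomes the value of a single cell.
wrapped = pd.DataFrame({"a": [a]})

# Not wrapped: pandas spreads the array over four rows, one element each.
unwrapped = pd.DataFrame({"a": a})

# The stored cell is still a NumPy array.
cell = wrapped.loc[0, "a"]
print(wrapped.shape, unwrapped.shape, type(cell))
```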

write.csv for large data.table

June 10, 2023 by Tarik

Sorting by absolute value without changing the data

June 10, 2023 by Tarik

UPDATE: Since pandas 0.17.0, order and sort have been deprecated (thanks @Ruggero Turra); you can now use sort_values to achieve this:

    In[16]: df.reindex(df.b.abs().sort_values().index)
    Out[16]:
       a  b
    2  3 -1
    3  4  2
    0  1 -3
    1  2  5
    4  5 -9
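For reference, a runnable sketch of the same idea. The frame is reconstructed from the output above; the second variant uses the `key` argument of `sort_values`, which assumes pandas 1.1 or later and is not part of the original answer:

```python
import pandas as pd

# Frame reconstructed from the output shown above.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [-3, 5, -1, 2, -9]})

# Sort the absolute values, then reorder the untouched frame by that index.
by_abs = df.reindex(df["b"].abs().sort_values().index)

# Alternative (assumes pandas >= 1.1): sort on a transformed key directly.
by_abs_key = df.sort_values("b", key=lambda s: s.abs())

print(by_abs)
```

In both variants the values in column b keep their signs; only the row order changes.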