dataframe – Make Me Engineer

pandas extract year from datetime: df['year'] = df['date'].year is not working

June 19, 2023 by Tarik

If you’re running a recent-ish version of pandas then you can use the datetime accessor dt to access the datetime components:

    In [6]: df['date'] = pd.to_datetime(df['date'])
            df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
            df
    Out[6]:
            date  Count  year  month
    0 2010-06-30    525  2010      6
    1 2010-07-30    136  2010      7
    2 2010-08-31    125  2010      8
    3 2010-09-30     84 …
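The form in the title fails because a Series has no year attribute of its own; the components live behind the dt accessor once the column is a datetime dtype. A minimal sketch of that, using a small made-up frame (the values are illustrative, borrowed from the excerpt's first rows):

```python
import pandas as pd

# Hypothetical sample data; the dates start out as plain strings.
df = pd.DataFrame({
    "date": ["2010-06-30", "2010-07-30", "2010-08-31"],
    "Count": [525, 136, 125],
})

# Convert to a datetime dtype so the .dt accessor becomes available.
df["date"] = pd.to_datetime(df["date"])

# Element-wise access to the datetime components.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

print(df)
```

Assigning the two columns separately is equivalent to the tuple assignment in the excerpt; it is only split here for readability.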

How to stream DataFrame using FastAPI without saving the data to csv file?

June 18, 2023 by Tarik

Approach 1 (recommended)

As mentioned in this answer, as well as here and here, when all of the data (a DataFrame in your case) is already loaded into memory, there is no need to use StreamingResponse. StreamingResponse makes sense when you want to transfer real-time data and when you don’t know the size of your output …
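The excerpt is cut off before it shows code, so the following is only a rough sketch of the idea rather than the post's own solution: serialise the in-memory DataFrame to CSV text without touching the disk. The helper name dataframe_to_csv_text and the sample frame are hypothetical. In a FastAPI route, the resulting text could then be returned through a plain Response with a text/csv media type, as noted in the closing comment (that part is not executed here and assumes FastAPI is installed).

```python
import pandas as pd


def dataframe_to_csv_text(df: pd.DataFrame) -> str:
    """Serialise a DataFrame to CSV text entirely in memory (hypothetical helper)."""
    # With no path argument, to_csv returns the CSV content as a string.
    return df.to_csv(index=False)


# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
csv_text = dataframe_to_csv_text(df)
print(csv_text)

# Inside a FastAPI route (assumption: fastapi is installed), this could be returned as:
#     from fastapi import Response
#     return Response(content=csv_text, media_type="text/csv")
```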

DataFrame object has no attribute append

June 18, 2023 by Tarik

As of pandas 2.0, append (previously deprecated) was removed. You need to use concat instead (for most applications):

    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

As noted by @cottontail, it’s also possible to use loc, although this only works if the new index is not already present in the DataFrame (typically, this will be the case if …
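A small self-contained sketch of both options follows. The column names and row values are made up for illustration; the loc variant assumes a default integer index, so that len(df) is a label not yet in use.

```python
import pandas as pd

# Hypothetical starting frame and new row.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
new_row = {"name": "c", "value": 3}

# Replacement for the removed df.append(new_row, ignore_index=True):
# wrap the row in a one-row DataFrame and concatenate.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# Alternative with loc: assigning to a label that does not exist yet
# adds a row. With a default 0..n-1 index, len(df) is such a label.
df.loc[len(df)] = ["d", 4]

print(df)
```

If many rows are added in a loop, collecting them in a list and calling concat once at the end is usually cheaper than growing the frame row by row.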

Select last non-NA value in a row, by row

June 18, 2023 by Tarik

How to calculate mean values grouped on another column

June 17, 2023 by Tarik

You could group by StationID and then take mean() on BiasTemp. To get a DataFrame back, use as_index=False:

    In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
    Out[4]:
      StationID  BiasTemp
    0        BB       5.0
    1     KEOPS       2.5
    2    SS0279      15.0

Without as_index=False, it returns a Series instead:

    In [5]: df.groupby('StationID')['BiasTemp'].mean()
    Out[5]:
    StationID
    BB         5.0
    KEOPS      2.5
    SS0279    15.0
    Name: BiasTemp, dtype: float64

Read …
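The listing does not include the question's input data, so the sketch below uses made-up readings chosen to give the same group means as the output shown. It runs both variants side by side.

```python
import pandas as pd

# Hypothetical readings; only the column names come from the excerpt.
df = pd.DataFrame({
    "StationID": ["BB", "BB", "KEOPS", "KEOPS", "SS0279"],
    "BiasTemp": [4.0, 6.0, 2.0, 3.0, 15.0],
})

# as_index=False keeps StationID as a regular column -> DataFrame.
as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

# Default behaviour: StationID becomes the index -> Series.
as_series = df.groupby("StationID")["BiasTemp"].mean()

print(as_frame)
print(as_series)
```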

How can I change XTS to data.frame and keep Index?

June 14, 2023 by Tarik

Remove row with null value from pandas data frame

June 13, 2023 by Tarik

This should do the work:

    df = df.dropna(how='any', axis=0)

It drops every row (axis=0) that contains at least one null value.

EXAMPLE:

    # Recreate random DataFrame with NaN values
    df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-01-10', freq='1D'))
    # Average speed in miles per hour
    df['A'] = np.random.randint(low=198, high=205, size=len(df.index))
    df['B'] = np.random.random(size=len(df.index))*2
    # Create dummy NaN value on …
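Since the example in the excerpt is truncated, here is a compact, deterministic sketch of the same call on a tiny made-up frame. It also contrasts how='any' with how='all', which is an addition not discussed in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a NaN in two different rows.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [0.5, 0.7, np.nan],
})

# how="any": drop a row if at least one of its values is null.
cleaned = df.dropna(how="any", axis=0)

# how="all": drop a row only if every value in it is null.
all_null_only = df.dropna(how="all", axis=0)

print(cleaned)
print(all_null_only)
```

Note that dropna returns a new DataFrame, which is why the excerpt assigns the result back to df.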

Pretty print a pandas dataframe in VS Code

June 13, 2023 by Tarik

As of the January 2021 release of the Python extension, you can view pandas DataFrames with the built-in data viewer when debugging native Python programs. When the program is halted at a breakpoint, right-click the DataFrame variable in the variables list and select “View Value in Data Viewer”.

Filtering pandas dataframe with multiple Boolean columns

June 13, 2023 by Tarik

    In [82]: d
    Out[82]:
                 A   B      C      D
    0     John Doe  45   True  False
    1   Jane Smith  32  False  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True

Solution 1:

    In [83]: d.loc[d.C | d.D]
    Out[83]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False …
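A runnable version of Solution 1, rebuilt from the frame shown in the excerpt. The & variant is an addition for contrast and is not taken from the truncated post.

```python
import pandas as pd

# The frame from the excerpt, reconstructed by hand.
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Rows where C or D is True (element-wise OR of the two Boolean columns).
either = d.loc[d["C"] | d["D"]]

# Rows where both C and D are True (element-wise AND).
both = d.loc[d["C"] & d["D"]]

print(either)
print(both)
```

The element-wise operators | and & are required here; Python's or and and raise an error on a Series because its truth value is ambiguous.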

Adding a column in pandas df using a function

June 12, 2023 by Tarik

In general, you can use the apply function. If your function requires only one column, you can use:

    df['price'] = df['Symbol'].apply(getquotetoday)

as @EdChum suggested. If your function requires multiple columns, you can use something like:

    df['new_column_name'] = df.apply(lambda x: my_function(x['value_1'], x['value_2']), axis=1)
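Both forms can be sketched with stand-in functions. The helpers shout and total, the quantity column, and the sample values below are hypothetical placeholders for getquotetoday and my_function, whose definitions are not shown in the post.

```python
import pandas as pd


def shout(symbol: str) -> str:
    """Hypothetical single-column function: upper-case a ticker symbol."""
    return symbol.upper()


def total(price: float, quantity: int) -> float:
    """Hypothetical two-column function: price times quantity."""
    return price * quantity


# Hypothetical sample data.
df = pd.DataFrame({
    "Symbol": ["aapl", "msft"],
    "price": [10.0, 20.0],
    "quantity": [3, 2],
})

# One input column: apply the function to each element of the Series.
df["symbol_upper"] = df["Symbol"].apply(shout)

# Several input columns: apply row-wise (axis=1) and pick the values needed.
df["total"] = df.apply(lambda row: total(row["price"], row["quantity"]), axis=1)

print(df)
```

Row-wise apply calls the Python function once per row, so for simple arithmetic like this a vectorised expression such as df["price"] * df["quantity"] would typically be faster.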