I want to plot changes in monthly values from a dataset spanning a few years with matplotlib.pyplot and pandas

It is much easier if you split the Year/Month columns into a separate series for each year:

    import pandas as pd
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(figsize=(6, 4))
    df = pd.read_csv("data.csv")
    # One column per year, one row per month
    df2 = pd.pivot_table(df, index="Month", columns=["Year"])
    # Put the months in calendar order instead of alphabetical order
    df2 = df2.reindex(['January', 'February', 'March', 'April', 'May', 'June',
                       'July', 'August', 'September', 'October', 'November', 'December'])
    df2.plot(ax=axes)
    fig.savefig("plot.png")
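To see what the pivot step produces without needing a CSV file, here is a minimal sketch on in-memory data. The column name `Value` and the numbers are hypothetical; only `Month` and `Year` come from the answer above.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical long-format data: one row per (Year, Month) pair
df = pd.DataFrame({
    "Year": [2020] * 12 + [2021] * 12,
    "Month": months * 2,
    "Value": list(range(12)) + list(range(100, 112)),
})

# One column per year, indexed by month name (sorted alphabetically at first)
wide = df.pivot_table(index="Month", columns="Year", values="Value")

# Restore calendar order so a line plot reads January to December
wide = wide.reindex(months)
print(wide.head())
```

Calling `wide.plot()` on the result should then draw one line per year.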

Find difference between two data frames

By using drop_duplicates:

    pd.concat([df1, df2]).drop_duplicates(keep=False)

Update: The above method only works for data frames that don't already contain duplicates themselves. For example:

    df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
    df2 = pd.DataFrame({'A': [1], 'B': [2]})

It will produce the output below, which is wrong.

Wrong output:

    pd.concat([df1, df2]).drop_duplicates(keep=False)
    Out[655]:
       A  B
    1  2  3

Correct output:

    Out[656]:
       A  B
    1  2  3
    2  3  …
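One way to avoid the duplicate problem is a left merge with `indicator=True`. This is a sketch of a swapped-in technique, not necessarily the approach the truncated answer goes on to describe, and it is one-directional: it keeps the rows of `df1` that are absent from `df2`. The variable names `merged` and `only_in_df1` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 3], "B": [2, 3, 4, 4]})
df2 = pd.DataFrame({"A": [1], "B": [2]})

# Left-merge on all shared columns; the "_merge" column records where each row was found.
# df2 is de-duplicated first so that its own repeats cannot multiply rows of df1.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep rows found only in df1; duplicates inside df1 survive
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Swapping the roles of `df1` and `df2` gives the difference in the other direction.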

How can repetitive rows of data be collected in a single row in pandas?

You can groupby and use agg to get the mean. For the non-numeric columns, let's take the first value:

    df.groupby('Player').agg({k: 'mean' if v in ('int64', 'float64') else 'first'
                              for k, v in df.dtypes[1:].items()})

output:

                  Pos  Age   Tm          G        GS         MP        FG
    Player
    Jarrett Allen   C   22  TOT  18.666667  6.666667  26.266667  4.333333

NB. content of the …
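A self-contained sketch of the same idea on a small, made-up frame. The sample rows are hypothetical, and `is_numeric_dtype` is used here in place of the string comparison against dtype names.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample: the same player appears on three rows
df = pd.DataFrame({
    "Player": ["Jarrett Allen"] * 3,
    "Pos": ["C", "C", "C"],
    "Tm": ["TOT", "BRK", "CLE"],
    "G": [10, 20, 30],
    "MP": [25.0, 26.0, 27.0],
})

# Choose an aggregation per column: mean for numeric columns, first value otherwise
value_cols = df.columns.drop("Player")
how = {col: "mean" if is_numeric_dtype(df[col]) else "first" for col in value_cols}

collapsed = df.groupby("Player").agg(how)
print(collapsed)
```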

pandas 0.21.0 Timestamp compatibility issue with matplotlib

There is an issue with pandas datetimes and matplotlib caused by the recent release of pandas 0.21, which no longer registers its converters at import. After you have used those converters once (within pandas), they are registered and automatically used by matplotlib as well. A workaround is to register them manually: import pandas.plotting._converter …
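As a sketch of manual registration, later pandas releases expose a public helper, `pandas.plotting.register_matplotlib_converters`, so the private module does not have to be imported. The example below assumes such a pandas version is installed and uses the non-interactive `Agg` backend; the dates and values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Explicitly register the pandas datetime converters with matplotlib
register_matplotlib_converters()

# Hypothetical data: five daily timestamps
dates = pd.date_range("2017-11-01", periods=5, freq="D")
values = [1, 3, 2, 5, 4]

fig, ax = plt.subplots()
ax.plot(dates, values)  # matplotlib can now convert the pandas timestamps
fig.autofmt_xdate()
```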

What techniques can be used to measure performance of pandas/numpy solutions

They might not qualify as “simple frameworks” because they are third-party modules that need to be installed, but there are two frameworks I often use: simple_benchmark (I'm the author of that package) and perfplot. For example, the simple_benchmark library lets you decorate the functions to benchmark:

    from simple_benchmark import BenchmarkBuilder
    b = BenchmarkBuilder()
    import pandas as …
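For comparison, the same kind of measurement can be roughed out with only the standard library's `timeit`. This is a swapped-in sketch, not the simple_benchmark or perfplot API; the helper `benchmark` and the two candidate functions are hypothetical.

```python
import timeit
import numpy as np

def sum_python(arr):
    # Baseline: convert to a list and sum in pure Python
    return sum(arr.tolist())

def sum_numpy(arr):
    # Vectorized alternative
    return arr.sum()

def benchmark(funcs, sizes, repeat=3, number=5):
    """Return {function name: {input size: best seconds per call}}."""
    results = {f.__name__: {} for f in funcs}
    for n in sizes:
        arr = np.arange(n)
        for f in funcs:
            runs = timeit.repeat(lambda: f(arr), repeat=repeat, number=number)
            results[f.__name__][n] = min(runs) / number
    return results

timings = benchmark([sum_python, sum_numpy], sizes=[10, 1_000, 100_000])
for name, per_size in timings.items():
    print(name, per_size)
```

Taking the minimum over repeats reduces noise from other processes. Timing several input sizes matters because the ranking of solutions often changes between small and large inputs.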

Specifying date format when converting with pandas.to_datetime

You can use the parse_dates option of read_csv to do the conversion directly while reading your data. The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html

When your dates have to be the index:

    >>> import pandas as …
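A minimal sketch of `parse_dates` combined with `dayfirst=True`, reading from an in-memory string instead of a file. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content with day-first dates (DD/MM/YYYY)
csv_text = "date,value\n01/02/2013,10\n13/02/2013,20\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],  # convert this column while reading
    dayfirst=True,         # read 01/02/2013 as 1 February, not 2 January
)
print(df)
```

To get the dates as the index instead, `index_col="date"` can be passed together with `parse_dates=True`.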

Pandas – replacing column values

Yes, you are using it incorrectly. Series.replace() is not an in-place operation by default; it returns a new Series with the values replaced, so you need to assign the result back to your DataFrame/Series for it to take effect. Or, if you need to do it in place, specify the inplace keyword argument as True. Example:

    data['sex'].replace(0, 'Female', inplace=True)
    data['sex'].replace(1, …
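The assign-back variant can be sketched as follows. The frame is made up, and the label `'Male'` for the value 1 is an assumption, since only the mapping of 0 to `'Female'` is visible in the excerpt.

```python
import pandas as pd

# Hypothetical data with an integer-coded column
data = pd.DataFrame({"sex": [0, 1, 1, 0]})

# replace() returns a new Series, so assign the result back to the column.
# The 'Male' label for 1 is assumed for illustration.
data["sex"] = data["sex"].replace({0: "Female", 1: "Male"})
print(data)
```

Passing a dict handles both replacements in one call, and assigning back avoids depending on in-place modification of a column selected from the frame.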