Resampling Within a Pandas MultiIndex

pd.Grouper allows you to specify a “groupby instruction for a target object”. In particular, you can use it to group by dates even if df.index is not a DatetimeIndex:

    df.groupby(pd.Grouper(freq='2D', level=-1))

The level=-1 tells pd.Grouper to look for the dates in the last level of the MultiIndex. Moreover, you can use this in conjunction with …
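A minimal sketch of resampling inside a MultiIndex, assuming a hypothetical two-level index (symbol, date) with invented data. Here the 2-day pd.Grouper on the last level is combined with the raw values of the first level (via df.index.get_level_values(0)), so each symbol is binned separately. This combination is an illustration, not the truncated continuation of the answer.

```python
import pandas as pd

# Hypothetical data: two symbols, four daily dates each, dates in the last index level.
idx = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2024-01-01", periods=4, freq="D")],
    names=["symbol", "date"],
)
df = pd.DataFrame({"value": [1, 2, 3, 4, 10, 20, 30, 40]}, index=idx)

# Group by the first-level values plus 2-day bins taken from the last (date) level.
result = df.groupby(
    [df.index.get_level_values(0), pd.Grouper(freq="2D", level=-1)]
).sum()
print(result)
```

Each symbol ends up with one row per 2-day bin (A: 3 and 7, B: 30 and 70), which is the usual meaning of "resample within each group".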

Storing time-series data, relational or non?

Definitely relational: unlimited flexibility and expansion. Two corrections, both in concept and application, followed by an elevation.

Correction: It is not “filtering out the unneeded data”; it is selecting only the needed data. Yes, of course, if you have an index to support the columns identified in the WHERE clause, it is very fast, and …
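To make the point about an index supporting the WHERE clause concrete, here is a small sketch using SQLite through Python's standard sqlite3 module. The readings table, its columns and the index name idx_readings are hypothetical; the example only shows that a composite index on (sensor_id, ts) lets the engine select the needed rows directly, which EXPLAIN QUERY PLAN reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (sensor, timestamp) reading.
conn.execute("""
    CREATE TABLE readings (
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,  -- ISO-8601 strings sort chronologically as text
        value     REAL    NOT NULL
    )
""")
# Composite index matching the columns used in the WHERE clause below.
conn.execute("CREATE INDEX idx_readings ON readings (sensor_id, ts)")

conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, "2024-01-01T00:00", 1.5), (1, "2024-01-01T00:15", 1.7), (2, "2024-01-01T00:00", 9.0)],
)

query = "SELECT ts, value FROM readings WHERE sensor_id = ? AND ts >= ? AND ts < ?"
params = (1, "2024-01-01T00:00", "2024-01-02T00:00")

# Ask SQLite how it will run the query; the last column of each row is the plan detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params))
print(plan)

rows = conn.execute(query + " ORDER BY ts", params).fetchall()
print(rows)  # [('2024-01-01T00:00', 1.5), ('2024-01-01T00:15', 1.7)]
```

The plan detail names idx_readings, i.e. the engine seeks straight to sensor 1's time range instead of scanning and discarding other rows.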

Pandas compare next row

Looks like you want to use the Series.shift method. Using this method, you can generate new columns which are offset from the original columns, like this:

    df['qty_s'] = df['qty'].shift(-1)
    df['t_s'] = df['t'].shift(-1)
    df['z_s'] = df['z'].shift(-1)

Now you can compare these:

    df['is_something'] = (df['qty'] == df['qty_s']) & (df['t'] < df['t_s']) & (df['z'] == df['z_s'])

Here is …
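A self-contained version of the same idea, assuming a small hypothetical DataFrame with the qty, t and z columns from the excerpt. Instead of storing three helper columns, it shifts the relevant columns once and compares against that shifted frame; the last row has no successor, so its comparisons are against NaN and come out False.

```python
import pandas as pd

# Hypothetical data using the column names from the excerpt.
df = pd.DataFrame({
    "qty": [5, 5, 3, 3],
    "t":   [1, 2, 3, 2],
    "z":   ["a", "a", "b", "b"],
})

# shift(-1) aligns each row with the next row; the last row lines up with NaN.
nxt = df[["qty", "t", "z"]].shift(-1)
df["is_something"] = (df["qty"] == nxt["qty"]) & (df["t"] < nxt["t"]) & (df["z"] == nxt["z"])
print(df["is_something"].tolist())  # [True, False, False, False]
```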

Annotate Time Series plot in Matplotlib

Matplotlib uses an internal floating-point format for dates. You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual. As a somewhat excessively fancy example:

    import datetime as dt
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), …
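A minimal, non-fancy sketch of the conversion step, assuming invented data points and a headless Agg backend so it runs without a display. The annotated date is converted with mdates.date2num before being passed as the xy coordinate; the label text and offsets are arbitrary choices.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data points.
x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), dt.datetime(2011, 4, 1)]
y = [4, 9, 2]

fig, ax = plt.subplots()
ax.plot(x, y)

# Convert the datetime to Matplotlib's float date representation before annotating.
target = mdates.date2num(dt.datetime(2010, 6, 1))
ann = ax.annotate(
    "peak",
    xy=(target, 9),               # data coordinates: (float date, y value)
    xytext=(15, 15),              # label offset from the point
    textcoords="offset points",
    arrowprops=dict(arrowstyle="->"),
)
fig.canvas.draw()
```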

Filling gaps in timeseries Spark

If the input DataFrame has the following structure:

    root
     |-- periodstart: timestamp (nullable = true)
     |-- usage: long (nullable = true)

Scala

Determine min / max:

    val (minp, maxp) = df
      .select(min($"periodstart".cast("bigint")), max($"periodstart".cast("bigint")))
      .as[(Long, Long)]
      .first

Set step, for example for 15 minutes:

    val step: Long = 15 * 60

Generate reference range:

    val reference = spark …
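The same three steps (min / max, reference range at a fixed step, left join back onto the data) can be sketched in pandas for a quick local check. This swaps Spark for pandas, so it is an analogue of the technique rather than the Scala code above; the sample rows are hypothetical and the 15-minute step is expressed as a date_range frequency instead of epoch seconds.

```python
import pandas as pd

# Hypothetical input mirroring the schema above: periodstart (timestamp), usage (long).
df = pd.DataFrame({
    "periodstart": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 01:00"]),
    "usage": [10, 12, 7],
})

# 1. Determine the min / max timestamps.
minp, maxp = df["periodstart"].min(), df["periodstart"].max()

# 2. Generate the reference range with a 15-minute step.
reference = pd.DataFrame({"periodstart": pd.date_range(minp, maxp, freq="15min")})

# 3. Left-join the data onto the reference range; missing periods get NaN usage.
filled = reference.merge(df, on="periodstart", how="left")
print(filled)
```

The gaps at 00:30 and 00:45 appear as rows with NaN usage, which can then be filled (for example with 0) as the use case requires.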

Forecasting time series data

Here’s what I did:

    library(xts)
    library(forecast)

    x$Date = as.Date(x$Date, format="%m/%d/%Y")
    x = xts(x=x$Used, order.by=x$Date)
    # To get the start date (305)
    # > as.POSIXlt(x = "2011-11-01", origin="2011-11-01")$yday
    ## [1] 304
    # Add one since that starts at "0"
    x.ts = ts(x, freq=365, start=c(2011, 305))
    plot(forecast(ets(x.ts), 10))

Resulting in: [Figure: forecast plot]

What can we learn from this: Many of your steps …
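The start-period arithmetic can be checked independently. The sketch below uses Python's datetime for the same date: R's POSIXlt$yday counts from 0, while Python's tm_yday counts from 1, which is why the R code adds one to get 305.

```python
import datetime as dt

start = dt.date(2011, 11, 1)

# Python's tm_yday is 1-based (Jan 1 -> 1).
yday_one_based = start.timetuple().tm_yday
# R's POSIXlt$yday is 0-based (Jan 1 -> 0), so it reports one less.
yday_zero_based = yday_one_based - 1
print(yday_zero_based, yday_one_based)  # 304 305

# The (year, period) pair passed as start=c(2011, 305) in the R code.
ts_start = (start.year, yday_one_based)
```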