time-series – Page 2 – Make Me Engineer

Resampling Within a Pandas MultiIndex

May 9, 2023 by Tarik

pd.Grouper allows you to specify a “groupby instruction for a target object”. In particular, you can use it to group by dates even if df.index is not a DatetimeIndex:

df.groupby(pd.Grouper(freq='2D', level=-1))

The level=-1 tells pd.Grouper to look for the dates in the last level of the MultiIndex. Moreover, you can use this in conjunction with …
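Here is a minimal runnable sketch of the combined form. It assumes a hypothetical frame indexed by (station, date). A level-only pd.Grouper for the outer level is paired with a 2-day pd.Grouper on the date level, so each station gets its own 2-day bins. The rows are ordered by date so that the date level is monotonic:

```python
import pandas as pd

# Hypothetical readings for two stations over four days. Rows are ordered by
# date, so the "date" level (the last level) is monotonic increasing.
dates = pd.date_range("2023-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["A", "B"]], names=["date", "station"]
).swaplevel()  # levels are now (station, date)
df = pd.DataFrame({"value": [1, 10, 2, 20, 3, 30, 4, 40]}, index=index)

# Group by the station level, then bin the date level into 2-day buckets.
resampled = df.groupby(
    [pd.Grouper(level="station"), pd.Grouper(freq="2D", level="date")]
).sum()
print(resampled)
```

To get weekly or month-start bins instead, change freq to 'W' or 'MS'.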

How do you plot a vertical line on a time series plot in Pandas?

May 8, 2023 by Tarik

plt.axvline(x_position)

It takes the standard plot formatting options (linestyle, color, etc.; see the axvline documentation). If you have a reference to your axes object:

ax.axvline(x, color="k", linestyle="--")
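The following is a self-contained sketch using a hypothetical daily Series plotted with ax.plot. The x-axis already holds dates, so axvline accepts a datetime for the line position:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily series.
s = pd.Series(range(10), index=pd.date_range("2023-01-01", periods=10, freq="D"))

fig, ax = plt.subplots()
ax.plot(s.index, s.values)

# Dashed black vertical line at a chosen date; the date-aware x-axis
# converts the datetime into its own coordinates.
line = ax.axvline(dt.datetime(2023, 1, 5), color="k", linestyle="--")

# Render once; in practice call plt.show() or fig.savefig(...).
fig.canvas.draw()
```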

Storing time-series data, relational or non?

May 7, 2023 by Tarik

Definitely Relational. It offers unlimited flexibility and room for expansion. Two corrections, in both concept and application, followed by an elevation. Correction: it is not “filtering out the unneeded data”; it is selecting only the needed data. Yes, of course, if you have an Index to support the columns identified in the WHERE clause, it is very fast, and …
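The indexing point can be made concrete with Python's built-in sqlite3 module. SQLite is an assumption here, because the answer does not name a DBMS. The sketch uses a hypothetical reading table. Its composite primary key (sensor_id, ts) gives SQLite an index on exactly the columns in the WHERE clause. The query can therefore select only the needed rows instead of scanning the table, and EXPLAIN QUERY PLAN reports this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical time-series table; the composite primary key doubles as an index.
conn.execute("""
    CREATE TABLE reading (
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,  -- ISO-8601 timestamp
        value     REAL    NOT NULL,
        PRIMARY KEY (sensor_id, ts)
    )
""")
rows = [(1, f"2023-05-0{d} 00:00:00", float(d)) for d in range(1, 8)]
rows += [(2, f"2023-05-0{d} 00:00:00", 10.0 * d) for d in range(1, 8)]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)

# Select only the needed data: one sensor, one time window.
query = """
    SELECT ts, value FROM reading
    WHERE sensor_id = ? AND ts >= ? AND ts < ?
    ORDER BY ts
"""
params = (1, "2023-05-03", "2023-05-05")
selected = conn.execute(query, params).fetchall()

# The last column of each plan row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
print(selected)
print([row[-1] for row in plan])
```

The exact plan wording varies between SQLite versions. It should still report a SEARCH that uses the primary-key index rather than a full SCAN.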

Converting a data frame to xts

April 28, 2023 by Tarik

Pandas compare next row

April 18, 2023 by Tarik

It looks like you want the Series.shift method. With it you can generate new columns that are offset from the original columns, like this:

df['qty_s'] = df['qty'].shift(-1)
df['t_s'] = df['t'].shift(-1)
df['z_s'] = df['z'].shift(-1)

Now you can compare these:

df['is_something'] = (df['qty'] == df['qty_s']) & (df['t'] < df['t_s']) & (df['z'] == df['z_s'])

Here is …
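This is a self-contained version of the same comparison. It assumes a small hypothetical frame with the qty, t and z columns. It shifts the whole frame once instead of adding the *_s helper columns, so the original frame stays uncluttered:

```python
import pandas as pd

# Hypothetical rows using the column names from the excerpt.
df = pd.DataFrame({
    "qty": [5, 5, 7, 7],
    "t":   [1, 2, 3, 3],
    "z":   ["a", "a", "b", "c"],
})

# shift(-1) moves each column's next row up into the current row.
nxt = df.shift(-1)

# Compare each row with the one after it. The last row has no successor,
# so its shifted values are NaN and every comparison evaluates to False.
df["is_something"] = (
    (df["qty"] == nxt["qty"]) & (df["t"] < nxt["t"]) & (df["z"] == nxt["z"])
)
print(df)
```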

Annotate Time Series plot in Matplotlib

April 18, 2023 by Tarik

Matplotlib uses an internal floating-point format for dates. You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual. As a somewhat excessively fancy example:

import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), …
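Here is a small runnable sketch of the idea, using hypothetical data points. It converts the datetimes with mdates.date2num and plots and annotates in that float format. A date locator and formatter then label the ticks as dates:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data points.
x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), dt.datetime(2011, 4, 1)]
y = [6, 33, 12]

# Convert the datetimes to Matplotlib's floating-point date format.
xf = mdates.date2num(x)

fig, ax = plt.subplots()
ax.plot(xf, y)
# Place and label the x ticks as dates even though the data are floats.
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

# Annotate in data coordinates using the same float date format.
note = ax.annotate(
    "peak",
    xy=(xf[1], y[1]),
    xytext=(15, 15),
    textcoords="offset points",
    arrowprops=dict(arrowstyle="->"),
)
fig.canvas.draw()  # render once; use fig.savefig(...) to write a file
```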

auto.arima() equivalent for python

November 28, 2022 by Tarik

You can implement a number of approaches. ARIMAResults include aic and bic. By definition, these criteria penalize the number of parameters in the model, so you can use them to compare models. SciPy also has optimize.brute, which does a grid search over a specified parameter space. So a …
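The following is a dependency-free sketch of the AIC comparison. It swaps ARIMAResults for a hand-rolled AR(p) least-squares fit in NumPy. There are no differencing or MA terms, so this is not a full ARIMA. The sketch fits every candidate order on the same sample and keeps the lowest AIC. The simulated AR(2) data is hypothetical. With statsmodels you would compare the aic attribute of each fitted result instead, and scipy.optimize.brute could drive the same grid:

```python
import numpy as np

def ar_aic(y, p, max_p):
    """Fit AR(p) with an intercept by ordinary least squares and return a
    Gaussian AIC (up to an additive constant). Every order is fitted on the
    same sample y[max_p:], so the scores are comparable."""
    y = np.asarray(y, dtype=float)
    n = len(y) - max_p
    target = y[max_p:]
    # Columns: intercept, then lags 1..p aligned with target.
    cols = [np.ones(n)] + [y[max_p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    k_params = p + 1  # intercept + AR coefficients
    return n * np.log(rss / n) + 2 * k_params

# Hypothetical AR(2) data: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + noise.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

# Grid search over candidate orders; keep the one with the lowest AIC.
max_p = 5
scores = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(scores, key=scores.get)
print(best_p, scores)
```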

How to make gradient color filled timeseries plot in R

November 22, 2022 by Tarik

And here’s an approach in base R, where we fill the entire plot area with rectangles of graduated colour, and subsequently fill the inverse of the area of interest with white.

shade <- function(x, y, col, n=500, xlab='x', ylab='y', ...) {
  # x, y: the x and y coordinates
  # col: a vector of colours …
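The same two-step trick can be sketched in Python with Matplotlib. This is a swapped-in library, not the answer's base-R code. imshow paints the whole plot area with a vertical colour gradient, then fill_between covers the region above the curve in white. The function name shade mirrors the R helper, but its signature here is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def shade(x, y, cmap="viridis", n=500, ax=None):
    """Gradient-fill the area under y(x): paint the plot area with a vertical
    colour gradient, then cover the region above the curve in white."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ax = ax if ax is not None else plt.gca()
    xmin, xmax = x.min(), x.max()
    ymin, ymax = min(0.0, y.min()), y.max()
    # n horizontal bands of graduated colour, from 0 at the bottom to 1 at the top.
    gradient = np.linspace(0, 1, n).reshape(-1, 1)
    ax.imshow(gradient, cmap=cmap, origin="lower", aspect="auto",
              extent=(xmin, xmax, ymin, ymax))
    # Fill the inverse of the area of interest (above the curve) with white.
    ax.fill_between(x, y, ymax, color="white")
    ax.plot(x, y, color="black")
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    return ax

# Usage with a hypothetical curve.
t = np.linspace(0, 10, 200)
ax = shade(t, 1.5 + np.sin(t))
```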

Filling gaps in timeseries Spark

November 22, 2022 by Tarik

If the input DataFrame has the following structure:

root
 |-- periodstart: timestamp (nullable = true)
 |-- usage: long (nullable = true)

Scala

Determine min / max:

val (minp, maxp) = df
  .select(min($"periodstart").cast("bigint"), max($"periodstart".cast("bigint")))
  .as[(Long, Long)]
  .first

Set step, for example for 15 minutes:

val step: Long = 15 * 60

Generate reference range:

val reference = spark …
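For comparison, the same steps can be sketched in pandas, swapped in for Spark and using hypothetical readings. The steps are the min/max, the 15-minute reference range and a left join. Slots missing from the input come back with a NaN usage:

```python
import pandas as pd

# Hypothetical readings with the 00:15 slot missing.
df = pd.DataFrame({
    "periodstart": pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 00:30", "2023-01-01 00:45"]
    ),
    "usage": [10, 30, 45],
})

# Determine min / max, then generate the reference range with a 15-minute step.
minp, maxp = df["periodstart"].min(), df["periodstart"].max()
reference = pd.DataFrame({"periodstart": pd.date_range(minp, maxp, freq="15min")})

# Left-join the data onto the reference range; gaps show up as NaN usage.
filled = reference.merge(df, on="periodstart", how="left")
print(filled)
```

If a gap should count as zero usage rather than unknown, follow this with filled["usage"].fillna(0).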

Forecasting time series data

November 20, 2022 by Tarik

Here’s what I did:

x$Date = as.Date(x$Date, format="%m/%d/%Y")
x = xts(x=x$Used, order.by=x$Date)
# To get the start date (305)
# > as.POSIXlt(x = "2011-11-01", origin="2011-11-01")$yday
## [1] 304
# Add one since that starts at "0"
x.ts = ts(x, freq=365, start=c(2011, 305))
plot(forecast(ets(x.ts), 10))

Resulting in: [forecast plot]

What can we learn from this? Many of your steps …
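Below is a rough Python analogue of the steps above, using hypothetical readings. It parses the %m/%d/%Y dates and computes the start day of the year. Python's tm_yday is already 1-based, so no +1 is needed. For the forecast it substitutes plain simple exponential smoothing with a fixed alpha (ETS(A,N,N)). This is far simpler than R's ets(), which estimates the smoothing parameters and selects among error, trend and seasonal variants automatically:

```python
import numpy as np
import pandas as pd

# Hypothetical input mirroring the R data frame: Date strings plus a Used column.
x = pd.DataFrame({"Date": ["11/01/2011", "11/02/2011", "11/03/2011"],
                  "Used": [10, 12, 11]})
x["Date"] = pd.to_datetime(x["Date"], format="%m/%d/%Y")
series = x.set_index("Date")["Used"]

# Day of year of the first observation; tm_yday is 1-based (305 for Nov 1, 2011).
start_day = series.index[0].timetuple().tm_yday

def ses_forecast(values, alpha=0.3, horizon=10):
    """Simple exponential smoothing, ETS(A,N,N), with a fixed alpha.
    Point forecasts are flat at the final smoothed level."""
    level = float(values[0])
    for v in values[1:]:
        level = alpha * float(v) + (1 - alpha) * level
    return np.full(horizon, level)

fc = ses_forecast(series.to_numpy(), alpha=0.5, horizon=10)
print(start_day, fc)
```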