statsmodels – Make Me Engineer

Rolling OLS Regressions and Predictions by Group

June 5, 2023 by Tarik

You should be able to achieve what you want using the groupby / apply pattern. The code below should help. Create example data:

    from statsmodels.regression.rolling import RollingOLS
    from statsmodels.tools.tools import add_constant
    import pandas as pd
    import numpy as np

    # make some toy data
    race_dates = pd.to_datetime(['2020-06-09']*3 + ['2020-12-01']*4 + ['2021-01-21']*4 + ['2021-05-04']*5)
    distance …

Read more

Time Series Analysis – unevenly spaced measures – pandas + statsmodels

May 16, 2023 by Tarik

seasonal_decompose() requires a freq that is either provided as part of the DatetimeIndex meta information, can be inferred by pandas.Index.inferred_freq, or else is supplied by the user as an integer giving the number of periods per cycle, e.g., 12 for monthly (from docstring for seasonal_mean):

    def seasonal_decompose(x, model="additive", filt=None, freq=None):
        """
        Parameters
        ----------
        x : array-like …

Read more

auto.arima() equivalent for python

November 28, 2022 by Tarik

You can implement a number of approaches: ARIMAResults includes aic and bic. By their definition (see here and here), these criteria penalize the number of parameters in the model, so you can use them to compare models. Also, scipy has optimize.brute, which does a grid search over the specified parameter space. So a … Read more

confidence and prediction intervals with StatsModels

November 9, 2022 by Tarik

For test data you can try to use the following. predictions = result.get_prediction(out_of_sample_df) predictions.summary_frame(alpha=0.05) I found the summary_frame() method buried here and you can find the get_prediction() method here. You can change the significance level of the confidence interval and prediction interval by modifying the “alpha” parameter. I am posting this here because this was … Read more
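To make the two calls concrete, here is a minimal sketch on made-up data. The formula y ~ x, the training frame, and the three out-of-sample points are assumptions; only get_prediction() and summary_frame(alpha=...) come from the post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up training data (assumption): y is roughly 2 + 3*x plus noise.
rng = np.random.default_rng(42)
train = pd.DataFrame({"x": np.linspace(0, 10, 50)})
train["y"] = 2.0 + 3.0 * train["x"] + rng.normal(scale=1.0, size=50)

result = smf.ols("y ~ x", data=train).fit()

# New observations to predict for.
out_of_sample_df = pd.DataFrame({"x": [11.0, 12.0, 13.0]})

predictions = result.get_prediction(out_of_sample_df)
frame = predictions.summary_frame(alpha=0.05)  # 95% intervals
print(frame)
```

In the resulting frame, the mean_ci_* columns give the confidence interval for the fitted mean, and the obs_ci_* columns give the wider prediction interval for a new observation.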

scikit-learn & statsmodels – which R-squared is correct?

August 7, 2022 by Tarik

Arguably, the real challenge in such cases is to be sure that you compare apples to apples. And in your case, it seems that you don’t. Our best friend is always the relevant documentation, combined with simple experiments. So… Although scikit-learn’s LinearRegression() (i.e. your 1st R-squared) is fitted by default with fit_intercept=True (docs), this is … Read more

Weighted standard deviation in NumPy

July 25, 2022 by Tarik

How about the following short “manual calculation”?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- NumPy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))
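One way to sanity-check this calculation is to treat integer weights as repeat counts: the weighted result should then match the plain mean and standard deviation of the expanded array. The sample values and counts below are made up, and the function is restated so the sketch runs on its own.

```python
import math

import numpy as np


def weighted_avg_and_std(values, weights):
    """Weighted mean and (population-style) weighted standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)


# Made-up data (assumption): each value occurs `counts` times.
values = np.array([1.0, 2.0, 4.0])
counts = np.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, counts)

# Expanding the counts into an explicit array gives the same statistics.
expanded = np.repeat(values, counts)
print(avg, expanded.mean())
print(std, expanded.std())  # np.std defaults to ddof=0
```

The match with np.std at its default ddof=0 shows that this formula is the population form, with no small-sample (n - 1) correction.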

ValueError: numpy.dtype has the wrong size, try recompiling

July 7, 2022 by Tarik

(to expand a bit on my comment) NumPy developers generally follow a policy of keeping a backward-compatible binary interface (ABI). However, the ABI is not forward compatible. What that means: a package that uses NumPy in a compiled extension is compiled against a specific version of NumPy. Future versions of NumPy will be … Read more