Show confidence limits and prediction limits in scatter plot

Here’s what I put together. I tried to closely emulate your screenshot. Given

    import numpy as np
    import scipy as sp
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    %matplotlib inline

    # Raw Data
    heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
    weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

Two detailed options to plot confidence intervals:

    def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None):

… Read more
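
The plotting helpers are cut off above, so here is a minimal, self-contained sketch of the general recipe they implement: fit a straight line, then draw a 95% confidence band for the mean response and 95% prediction limits for new observations. The formulas below are the textbook ones for simple linear regression and are my assumption about what the truncated plot_ci_manual computes, not a copy of it:

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt

    # Raw data from the question
    heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
    weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

    x, y = heights, weights
    n = x.size
    m = 2                          # fitted parameters: slope and intercept
    dof = n - m                    # degrees of freedom
    t = stats.t.ppf(0.975, dof)    # two-sided 95% critical value

    # Ordinary least-squares line and residual standard error
    p = np.polyfit(x, y, 1)
    resid = y - np.polyval(p, x)
    s_err = np.sqrt(np.sum(resid**2) / dof)

    # Smooth grid on which to draw the bands
    x2 = np.linspace(x.min(), x.max(), 100)
    y2 = np.polyval(p, x2)

    # Half-widths: confidence band for the mean, prediction band for new points
    spread = (x2 - x.mean())**2 / np.sum((x - x.mean())**2)
    ci = t * s_err * np.sqrt(1/n + spread)
    pi = t * s_err * np.sqrt(1 + 1/n + spread)

    fig, ax = plt.subplots()
    ax.plot(x, y, "o", label="data")
    ax.plot(x2, y2, "-", label="fit")
    ax.fill_between(x2, y2 - ci, y2 + ci, alpha=0.3, label="95% confidence band")
    ax.plot(x2, y2 - pi, "--", label="95% prediction limits")
    ax.plot(x2, y2 + pi, "--")
    ax.set_xlabel("height")
    ax.set_ylabel("weight")
    ax.legend()
    plt.show()

The prediction limits are wider than the confidence band because they also account for the residual scatter of a single new observation, not just the uncertainty of the fitted mean.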

fitting data with numpy

Unfortunately, np.polynomial.polynomial.polyfit returns the coefficients in the opposite order to that of np.polyfit and np.polyval (or, as you used, np.poly1d). To illustrate:

    In [40]: np.polynomial.polynomial.polyfit(x, y, 4)
    Out[40]: array([  84.29340848, -100.53595376,   44.83281408,   -8.85931101,    0.65459882])

    In [41]: np.polyfit(x, y, 4)
    Out[41]: array([   0.65459882,   -8.859311  ,   44.83281407, -100.53595375,   84.29340846])

In general: np.polynomial.polynomial.polyfit returns coefficients [A, B, C] … Read more
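
To make the order difference concrete, here is a small self-contained check; the data below are made up for illustration and are not the x and y from the question:

    import numpy as np

    # Hypothetical sample data, just to expose the coefficient-order difference
    x = np.linspace(0, 2, 20)
    y = 1.0 + 2.0*x - 3.0*x**2 + 0.5*x**3

    c_new = np.polynomial.polynomial.polyfit(x, y, 3)   # lowest degree first: [c0, c1, c2, c3]
    c_old = np.polyfit(x, y, 3)                         # highest degree first: [c3, c2, c1, c0]

    print(np.allclose(c_new, c_old[::-1]))              # True: same coefficients, reversed order

    # Each convention has its own evaluator; mixing them silently gives wrong values
    y_new = np.polynomial.polynomial.polyval(x, c_new)  # expects lowest-degree-first
    y_old = np.polyval(c_old, x)                        # expects highest-degree-first
    print(np.allclose(y_new, y), np.allclose(y_old, y)) # True True

The practical rule is to pair each fitting routine with the evaluator from the same family (or reverse the coefficient array) rather than passing one family's coefficients to the other's polyval.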

scikit-learn cross validation, negative values with mean squared error

Trying to close this out, so I am providing the answer that David and larsmans have eloquently described in the comments section: Yes, this is supposed to happen. The actual MSE is simply the positive version of the number you’re getting. The unified scoring API always maximizes the score, so scores which need to be minimized … Read more
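
In other words, losses are negated so that "greater is better" holds for every metric, and you simply flip the sign to recover the usual MSE. A minimal sketch with current scikit-learn, where the scorer is now spelled neg_mean_squared_error (in the older versions the question refers to, the name mean_squared_error already returned negated values); the toy dataset below is made up for illustration:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # Toy regression problem, purely for illustration
    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

    # The scorer returns *negated* MSE so that higher scores are always better
    scores = cross_val_score(LinearRegression(), X, y,
                             scoring="neg_mean_squared_error", cv=5)

    print(scores)          # every value is <= 0
    print(-scores.mean())  # flip the sign to report the usual, positive MSE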

predict.lm() with an unknown factor level in test data

You have to remove the extra levels before any calculation, like:

    > id <- which(!(foo.new$predictor %in% levels(foo$predictor)))
    > foo.new$predictor[id] <- NA
    > predict(model, newdata = foo.new)
             1          2          3          4
    -0.1676941 -0.6454521  0.4524391         NA

This is a more general way of doing it; it will set all levels that do not occur in the original data to NA. … Read more