statistics – Page 6 – Make Me Engineer

Select k random elements from a list whose elements have weights

May 29, 2022 by Tarik

If the sampling is with replacement, you can use this algorithm (implemented here in Python):

    import random

    items = [(10, "low"), (100, "mid"), (890, "large")]

    def weighted_sample(items, n):
        total = float(sum(w for w, v in items))
        i = 0
        w, v = items[0]
        while n:
            x = total * (1 - random.random() ** (1.0 / …
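The algorithm in the excerpt is cut off, so it is not completed here. As a separate, minimal sketch of weighted sampling with replacement, the standard library's random.choices can be used instead; this is a different technique from the one in the post. The function name weighted_sample_with_replacement and the rng parameter are assumptions made for illustration, and the items list keeps the (weight, value) layout shown in the excerpt.

```python
import random

# Same (weight, value) layout as in the excerpt.
items = [(10, "low"), (100, "mid"), (890, "large")]

def weighted_sample_with_replacement(items, n, rng=random):
    """Draw n values with replacement, each with probability proportional to its weight.

    Hypothetical helper for illustration; it relies on random.choices
    rather than the algorithm from the post.
    """
    weights = [w for w, _ in items]
    values = [v for _, v in items]
    return rng.choices(values, weights=weights, k=n)

# A seeded generator keeps the draw reproducible.
seeded = random.Random(42)
picks = weighted_sample_with_replacement(items, 5, rng=seeded)
print(picks)
```

Because the draws are independent, the same value can appear more than once, and an item with weight 0 is never selected.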

Workflow for statistical analysis and report writing

May 29, 2022 by Tarik

I generally break my projects into 4 pieces: load.R, clean.R, func.R, and do.R. load.R: Takes care of loading in all the data required. Typically this is a short file, reading in data from files, URLs and/or ODBC. Depending on the project, at this point I’ll either write out the workspace using save() or just keep things …

How to find the statistical mode?

May 17, 2022 by Tarik

One more solution, which works for both numeric and character/factor data:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

On my dinky little machine, that can generate and find the mode of a 10M-integer vector in about half a second. If your data set might have multiple modes, the above solution takes the …
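For readers working in Python, the same count-the-unique-values idea can be sketched roughly as follows. This is an illustrative analogue, not a translation endorsed by the post; the function name mode is an assumption. In this sketch, ties go to the value that appears first in the input.

```python
def mode(values):
    """Return the most frequent value; on ties, the one seen first in the input.

    Hypothetical helper for illustration. Works for any hashable values
    (numbers, strings). Raises ValueError on an empty input.
    """
    counts = {}
    for v in values:
        # dicts keep insertion order, so keys are ordered by first appearance
        counts[v] = counts.get(v, 0) + 1
    # max() returns the first key that reaches the highest count
    return max(counts, key=counts.get)

print(mode([1, 2, 2, 3, 3]))
print(mode(["b", "a", "a", "b"]))
```

The standard library's statistics.mode and statistics.multimode cover the same ground, the latter returning every tied value.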

Simple way to calculate median with MySQL

May 4, 2022 by Tarik


Fitting empirical distribution to theoretical ones with Scipy (Python)?

May 3, 2022 by Tarik

Distribution Fitting with Sum of Square Error (SSE): This is an update and modification to Saullo’s answer that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution’s histogram and the data’s histogram. Example Fitting: Using the El Niño dataset from statsmodels, the distributions are …
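The SSE comparison described above might look roughly like the sketch below. It is not the post's code: it tries only three candidate distributions instead of the full scipy.stats list, uses synthetic normal data instead of the El Niño dataset, and the names best_fit_by_sse, candidates, and bins are assumptions made for illustration. It requires NumPy and SciPy.

```python
import numpy as np
from scipy import stats

def best_fit_by_sse(data, candidates=("norm", "expon", "uniform"), bins=50):
    """Rank candidate scipy.stats distributions by SSE against the data histogram.

    Hypothetical helper for illustration; returns (sse, name, params) tuples,
    smallest SSE first.
    """
    # Histogram normalised to a density so it is comparable with a PDF.
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0

    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # shape parameters (if any), then loc, scale
        shape, loc, scale = params[:-2], params[-2], params[-1]
        pdf = dist.pdf(centers, *shape, loc=loc, scale=scale)
        sse = float(np.sum((density - pdf) ** 2))
        results.append((sse, name, params))

    results.sort(key=lambda r: r[0])
    return results

# Synthetic data, seeded so the run is reproducible.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=5000)

ranking = best_fit_by_sse(sample)
best_sse, best_name, best_params = ranking[0]
print(best_name, best_sse)
```

The SSE depends on the number of bins, so rankings between closely matched distributions can shift when bins changes.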

Advice on storing statistics in firebase firestore [closed]

April 11, 2022 by Tarik