Calculating the averages for each KEY in a Pairwise (K,V) RDD in Spark with Python

A much better way to do this is to use the rdd.aggregateByKey() method. Because this method is so poorly documented in the Apache Spark with Python documentation (which is why I wrote this Q&A), until recently I had been using the above code sequence. But again, it's less efficient, so avoid doing …
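As a rough illustration of the aggregateByKey() idea, the sketch below keeps a (sum, count) pair per key and divides at the end. It is not the answer's own code: the helper local_aggregate_by_key, the names seq_op and comb_op, and the sample data are all hypothetical, and the partitions are simulated in plain Python so the example runs without a Spark installation. The commented lines at the bottom show how the same two functions would presumably be passed to a real RDD.

```python
# Zero value for every key: (running sum, running count).
ZERO = (0, 0)

def seq_op(acc, value):
    """Fold one value into a (sum, count) accumulator inside a partition."""
    return (acc[0] + value, acc[1] + 1)

def comb_op(a, b):
    """Merge two (sum, count) accumulators coming from different partitions."""
    return (a[0] + b[0], a[1] + b[1])

def local_aggregate_by_key(partitions, zero, seq, comb):
    """Hypothetical plain-Python stand-in for aggregateByKey (illustration only)."""
    merged = {}
    for part in partitions:
        per_part = {}
        for key, value in part:
            # Within a partition, start from the zero value and apply seq.
            per_part[key] = seq(per_part.get(key, zero), value)
        for key, acc in per_part.items():
            # Across partitions, combine the partial accumulators with comb.
            merged[key] = comb(merged[key], acc) if key in merged else acc
    return merged

# Hypothetical sample data, split into two "partitions".
partitions = [
    [("a", 1), ("b", 4), ("a", 3)],
    [("a", 5), ("b", 6)],
]

sums_counts = local_aggregate_by_key(partitions, ZERO, seq_op, comb_op)
averages = {key: total / count for key, (total, count) in sums_counts.items()}
print(averages)

# With a real SparkContext named `sc` (an assumption), the same idea would be:
# rdd = sc.parallelize([("a", 1), ("b", 4), ("a", 3), ("a", 5), ("b", 6)])
# averages = (rdd.aggregateByKey(ZERO, seq_op, comb_op)
#                .mapValues(lambda p: p[0] / p[1])
#                .collectAsMap())
```

Carrying the count alongside the sum is what lets the per-partition results be merged correctly; averaging partial averages directly would give the wrong answer when partitions hold different numbers of values.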

MySQL Average on time column?

Try this: SELECT SEC_TO_TIME(AVG(TIME_TO_SEC(`login`))) FROM Table1;
Test data: CREATE TABLE `login` (duration TIME NOT NULL); INSERT INTO `login` (duration) VALUES ('00:00:20'), ('00:01:10'), ('00:20:15'), ('00:06:50');
Query against the test data: SELECT SEC_TO_TIME(AVG(TIME_TO_SEC(duration))) FROM `login`;
Result: 00:07:09
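To check the arithmetic behind that result, here is a small Python sketch of the same idea: convert each time to seconds, average, and convert back. The helper names time_to_sec and sec_to_time are hypothetical mirrors of the MySQL functions, and rounding to the nearest whole second is an assumption made here; how MySQL displays a fractional average may vary by version.

```python
def time_to_sec(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def sec_to_time(total_seconds):
    """Convert seconds back to 'HH:MM:SS', rounding to whole seconds (assumption)."""
    total = int(round(total_seconds))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Same four durations as the test data above.
durations = ["00:00:20", "00:01:10", "00:20:15", "00:06:50"]

avg_seconds = sum(time_to_sec(d) for d in durations) / len(durations)
print(avg_seconds)               # 428.75
print(sec_to_time(avg_seconds))  # 00:07:09
```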

SQL query with avg and group by

If I understand what you need, try this: SELECT id, pass, AVG(val) AS val_1 FROM data_r1 GROUP BY id, pass; Or, if you want just one row for every id, this: SELECT d1.id, (SELECT IFNULL(ROUND(AVG(d2.val), 4) ,0) FROM data_r1 d2 WHERE d2.id = d1.id AND pass = 1) as val_1, (SELECT IFNULL(ROUND(AVG(d2.val), 4) ,0) FROM …
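For a quick way to try the first query, the sketch below runs it against an in-memory SQLite database from Python. This is only an approximation of the original setup: the answer targets MySQL, the column types and the sample rows are invented for illustration, and the ORDER BY clause is added here solely to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_r1 (id INTEGER, pass INTEGER, val REAL)")

# Hypothetical sample rows: (id, pass, val).
conn.executemany(
    "INSERT INTO data_r1 (id, pass, val) VALUES (?, ?, ?)",
    [(1, 1, 2.0), (1, 1, 4.0), (1, 2, 10.0), (2, 1, 5.0)],
)

# One averaged row per (id, pass) combination.
rows = conn.execute(
    "SELECT id, pass, AVG(val) AS val_1 "
    "FROM data_r1 GROUP BY id, pass ORDER BY id, pass"
).fetchall()

for row in rows:
    print(row)

conn.close()
```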

Finding moving average from data points in Python

As numpy.convolve is pretty slow, those who need a fast solution might prefer the cumsum approach, which is also easier to understand. Here is the code: cumsum_vec = numpy.cumsum(numpy.insert(data, 0, 0)) ma_vec = (cumsum_vec[window_width:] - cumsum_vec[:-window_width]) / window_width where data contains your data, and ma_vec will contain moving averages of window_width length. On average, cumsum is about …
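The two lines above can be wrapped into a small runnable sketch. The function name moving_average_cumsum and the sample array are assumptions made for illustration; the comparison against numpy.convolve in "valid" mode is included only as a sanity check that both approaches give the same windows.

```python
import numpy as np

def moving_average_cumsum(data, window_width):
    """Moving average via differences of a zero-prefixed cumulative sum."""
    # Prefix a 0 so that cumsum_vec[i] is the sum of the first i elements.
    cumsum_vec = np.cumsum(np.insert(data, 0, 0))
    # Each window sum is the difference of two cumulative sums.
    return (cumsum_vec[window_width:] - cumsum_vec[:-window_width]) / window_width

# Hypothetical sample data.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
window_width = 3

ma_vec = moving_average_cumsum(data, window_width)
print(ma_vec)

# Sanity check against the convolution-based moving average.
reference = np.convolve(data, np.ones(window_width) / window_width, mode="valid")
print(np.allclose(ma_vec, reference))
```

One caveat worth keeping in mind: with very long floating-point inputs, a cumulative sum can accumulate rounding error, so the two methods may differ slightly in the last digits.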

Calculate cumulative average (mean)

By analogy with the cumulative sum of a list, I propose this: the cumulative average avg of a vector x contains the averages from the 1st position up to position i. One method is simply to compute the mean for each position by summing over all previous values and dividing by their number. By rewriting …
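A minimal sketch of the method described so far, in Python: build the running sums and divide each by the number of values seen. The function name cumulative_average and the sample input are assumptions for illustration, and this covers only the straightforward method, not the rewritten form the excerpt goes on to mention.

```python
from itertools import accumulate

def cumulative_average(x):
    """Return the mean of x[0..i] for every position i."""
    # accumulate(x) yields the running sums; n counts the values seen so far.
    return [total / n for n, total in enumerate(accumulate(x), start=1)]

# Hypothetical sample input.
x = [1, 2, 3, 4]
avg = cumulative_average(x)
print(avg)  # [1.0, 1.5, 2.0, 2.5]
```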

ArrayFormula of Average on Infinite Truly Dynamic Range in Google Sheets

QUERY level 1:
If all 5 cells in each row of the range C2:G have values: =QUERY(QUERY(C2:G, "select (C+D+E+F+G)/5"), "offset 1", )
If not, those rows are skipped.
If empty cells are considered as zeros: =INDEX(QUERY(QUERY({C2:G*1}, "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))
To remove zero values we use IFERROR(1/(1/…)) wrapping: =INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1}, "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))))
To make Col references …