aggregate – Make Me Engineer

Calculating the averages for each KEY in a Pairwise (K,V) RDD in Spark with Python

June 13, 2023 by Tarik

Now a much better way to do this is to use the rdd.aggregateByKey() method. Because this method is so poorly documented in the Apache Spark with Python documentation (which is why I wrote this Q&A), until recently I had been using the above code sequence. But again, it’s less efficient, so avoid doing … Read more
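To make the idea behind aggregateByKey() concrete, here is a minimal plain-Python sketch of its two phases: a per-partition step that folds each value into a (sum, count) accumulator, and a merge step that combines accumulators across partitions. The sample data and the helper aggregate_by_key are hypothetical stand-ins for illustration, not Spark code.

```python
# Hypothetical sample data: (key, value) pairs split across two "partitions".
partitions = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("b", 6.0), ("a", 5.0)],
]

ZERO = (0.0, 0)  # accumulator layout: (running sum, running count)


def seq_op(acc, value):
    """Fold one value into a partition-local accumulator."""
    return (acc[0] + value, acc[1] + 1)


def comb_op(left, right):
    """Merge two accumulators that belong to the same key."""
    return (left[0] + right[0], left[1] + right[1])


def aggregate_by_key(parts, zero, seq, comb):
    """Plain-Python stand-in (an assumption) for the two phases of aggregateByKey."""
    merged = {}
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = seq(local.get(key, zero), value)
        for key, acc in local.items():
            merged[key] = comb(merged[key], acc) if key in merged else acc
    return merged


sums_counts = aggregate_by_key(partitions, ZERO, seq_op, comb_op)
averages = {key: total / count for key, (total, count) in sums_counts.items()}
print(averages)
```

In PySpark, the same two functions would typically be passed as rdd.aggregateByKey((0.0, 0), seq_op, comb_op), followed by a mapValues step that divides the sum by the count.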

Aggregating (x,y) coordinate point clouds in PostgreSQL

June 4, 2023 by Tarik

Use the often overlooked built-in function width_bucket() in combination with your aggregation: If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this: SELECT device_id , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS … Read more
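The bucketing arithmetic can be sketched in Python. The width_bucket function below is a stand-in that assumes PostgreSQL's documented behavior for low < high (0 below the range, count + 1 at or above it). The sample points and the choice to count points per grid cell are hypothetical, since the excerpt's full query is cut off.

```python
import math
from collections import Counter


def width_bucket(value, low, high, count):
    """Python stand-in for PostgreSQL's width_bucket(), assuming low < high."""
    if value < low:
        return 0
    if value >= high:
        return count + 1
    return math.floor((value - low) * count / (high - low)) + 1


def grid_cell(x, y, low=0, high=2000, step=10):
    """Snap a point to its grid cell, mirroring width_bucket(...) * 10."""
    buckets = (high - low) // step
    return (
        width_bucket(x, low, high, buckets) * step,
        width_bucket(y, low, high, buckets) * step,
    )


# Hypothetical point cloud of (pos_x, pos_y) coordinates.
points = [(3, 4), (7, 9), (12, 18), (1999, 0)]
cells = Counter(grid_cell(x, y) for x, y in points)
print(cells)
```

Nearby points such as (3, 4) and (7, 9) land in the same cell, which is what lets a GROUP BY on the bucketed columns consolidate them.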

Returning first row of group

June 1, 2023 by Tarik

SQL Server : SUM() of multiple rows including where clauses

May 28, 2023 by Tarik

This will bring back totals per property and type: SELECT PropertyID, TYPE, SUM(Amount) FROM yourTable GROUP BY PropertyID, TYPE; This will bring back only active values: SELECT PropertyID, TYPE, SUM(Amount) FROM yourTable WHERE EndDate IS NULL GROUP BY PropertyID, TYPE; And this will bring back totals for properties: SELECT PropertyID, SUM(Amount) FROM yourTable WHERE EndDate … Read more
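The first two queries can be tried out with a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The column types, the sample rows, and the added ORDER BY (only there to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(PropertyID INTEGER, TYPE TEXT, Amount REAL, EndDate TEXT)"
)

# Hypothetical rows: a NULL EndDate marks a value that is still active.
rows = [
    (1, "Rent", 100.0, None),
    (1, "Rent", 50.0, "2023-01-31"),
    (1, "Fee", 20.0, None),
    (2, "Rent", 200.0, None),
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)

# Totals per property and type.
totals = conn.execute(
    "SELECT PropertyID, TYPE, SUM(Amount) FROM yourTable "
    "GROUP BY PropertyID, TYPE ORDER BY PropertyID, TYPE"
).fetchall()

# Only active values: the WHERE clause filters rows before they are summed.
active = conn.execute(
    "SELECT PropertyID, TYPE, SUM(Amount) FROM yourTable "
    "WHERE EndDate IS NULL "
    "GROUP BY PropertyID, TYPE ORDER BY PropertyID, TYPE"
).fetchall()

print(totals)
print(active)
conn.close()
```

The only difference between the two results is the ended 50.0 rent row, which the WHERE clause drops before SUM() sees it.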

Pass percentiles to pandas agg function

May 21, 2023 by Tarik

Perhaps not super efficient, but one way would be to create a function yourself: def percentile(n): def percentile_(x): return np.percentile(x, n) percentile_.__name__ = 'percentile_%s' % n return percentile_ Then include this in your agg: In [11]: column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max, percentile(50), percentile(95)]) Out[11]: sum mean std median var amin amax percentile_50 percentile_95 … Read more
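The reason for setting __name__ is that agg labels each result by the function's name, so every generated percentile function needs its own. The sketch below shows the same factory pattern without pandas or NumPy. The linear-interpolation percentile and the summary dictionary are stand-ins, assumed here for illustration, for np.percentile and for agg's column labelling.

```python
def percentile(n):
    """Build a function that computes the n-th percentile of a sequence."""

    def percentile_(values):
        # Linear interpolation between the two closest ranks;
        # assumes a non-empty sequence of numbers.
        ordered = sorted(values)
        rank = (len(ordered) - 1) * n / 100
        lower = int(rank)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

    # Give each generated function a distinct name so results can be told apart.
    percentile_.__name__ = "percentile_%s" % n
    return percentile_


# Hypothetical data and function list, keyed by function name.
data = [1, 2, 3, 4, 5]
funcs = [min, max, percentile(50), percentile(95)]
summary = {func.__name__: func(data) for func in funcs}
print(summary)
```

Without the __name__ assignment, both generated functions would be called percentile_ and the two entries would collide.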

aggregate() vs annotate() in Django

May 20, 2023 by Tarik

I would focus on the example queries rather than your quote from the documentation. Aggregate calculates values for the entire queryset. Annotate calculates summary values for each item in the queryset. Aggregation >>> Book.objects.aggregate(average_price=Avg('price')) {'average_price': 34.35} Returns a dictionary containing the average price of all books in the queryset. Annotation >>> q = Book.objects.annotate(num_authors=Count('authors')) >>> … Read more

R sum a variable by two groups [duplicate]

May 18, 2023 by Tarik

Linq to Objects – return pairs of numbers from list of numbers

May 17, 2023 by Tarik

None of the default LINQ methods can do this lazily and with a single scan. Zipping the sequence with itself does 2 scans and grouping is not entirely lazy. Your best bet is to implement it directly: public static IEnumerable<T[]> Partition<T>(this IEnumerable<T> sequence, int partitionSize) { Contract.Requires(sequence != null); Contract.Requires(partitionSize > 0); var buffer = … Read more
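The same idea, a lazy partition that scans the input only once, can be sketched as a Python generator. This is an analogous illustration and not the truncated C# implementation above. The name partition and the decision to yield a shorter final chunk are assumptions.

```python
from itertools import islice


def partition(sequence, partition_size):
    """Lazily yield consecutive chunks of partition_size items in a single pass."""
    # Because this is a generator, the check runs when iteration starts.
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    iterator = iter(sequence)
    while True:
        # Pull at most partition_size items; the source is never rescanned.
        chunk = list(islice(iterator, partition_size))
        if not chunk:
            return
        yield chunk


pairs = list(partition([1, 2, 3, 4, 5, 6], 2))
print(pairs)
```

Only one chunk is buffered at a time, so this also works on iterators that can be consumed just once.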

data.frame Group By column [duplicate]

May 10, 2023 by Tarik