python histogram one-liner [duplicate]

Python 3.x does have reduce; you just have to do a from functools import reduce. It also has “dict comprehensions”, which have exactly the syntax in your example. Python 2.7 and 3.x also have a Counter class which does exactly what you want:

    from collections import Counter
    cnt = Counter("abracadabra")

In Python 2.6 or earlier, …
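The three options named in this answer can be put side by side. This is only a minimal sketch, not the asker's original code: the input string is the one from the answer, while the variable names (text, hist_reduce, hist_comp, hist_counter) are made up for illustration.

```python
from collections import Counter
from functools import reduce

text = "abracadabra"

# 1) reduce: build a new dict at each step with one count bumped.
hist_reduce = reduce(lambda acc, ch: {**acc, ch: acc.get(ch, 0) + 1}, text, {})

# 2) dict comprehension: count every distinct character.
#    This rescans the string once per distinct character, which is fine for short input.
hist_comp = {ch: text.count(ch) for ch in set(text)}

# 3) Counter: the dedicated class (Python 2.7 and 3.x).
hist_counter = Counter(text)

print(hist_reduce)
print(hist_comp)
print(dict(hist_counter))
```

All three produce the same character counts; Counter is usually the clearest choice when it is available.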

Scala : fold vs foldLeft

The method fold (originally added for parallel computation) is less powerful than foldLeft in terms of the types it can be applied to. Its signature is:

    def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1

This means that the type over which the folding is done has to be a supertype of the collection element …

Spark groupByKey alternative

groupByKey is fine for the case when we want a “smallish” collection of values per key, as in the question.

TL;DR: The “do not use” warning on groupByKey applies to two general cases:

1) You want to aggregate over the values:

    DON’T: rdd.groupByKey().mapValues(_.sum)
    DO: rdd.reduceByKey(_ + _)

In this case, groupByKey will waste resources materializing …
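The difference between the two patterns can be illustrated without Spark. The following is a plain-Python analogy only, not Spark code: the sample pairs and all names (pairs, grouped, sums_via_group, sums_via_reduce) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical (key, value) records standing in for an RDD of pairs.
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# groupByKey-style: collect every value per key first, then sum the lists.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
sums_via_group = {key: sum(values) for key, values in grouped.items()}

# reduceByKey-style: keep a single running total per key, no intermediate lists.
sums_via_reduce = {}
for key, value in pairs:
    sums_via_reduce[key] = sums_via_reduce.get(key, 0) + value

print(sums_via_group)
print(sums_via_reduce)
```

Both give the same sums, but the first variant holds all values for a key at once, while the second only ever holds one number per key.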

In Stream reduce method, must the identity always be 0 for sum and 1 for multiplication?

The identity value is a value such that x op identity = x. This concept is not unique to Java Streams; see, for example, Wikipedia. It lists some examples of identity elements, some of which can be directly expressed in Java code, e.g.

    reduce("", String::concat)
    reduce(true, (a,b) -> a&&b)
    reduce(false, (a,b) …
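The same idea can be tried out in Python. Note the assumption: functools.reduce takes an optional initial value rather than an "identity" in the Java Streams sense, so this is a sketch of the concept, not a translation of the Java API. The sample lists and variable names are illustrative.

```python
from functools import reduce
import operator

# For each operation, the starting value is chosen so that op(x, start) == x.
total = reduce(operator.add, [1, 2, 3, 4], 0)        # 0 is neutral for +
product = reduce(operator.mul, [1, 2, 3, 4], 1)      # 1 is neutral for *
joined = reduce(operator.add, ["a", "b", "c"], "")   # "" is neutral for concatenation
all_true = reduce(lambda a, b: a and b, [True, False], True)  # True is neutral for "and"

# A non-neutral starting value changes the result: 10 is not an identity for +.
skewed = reduce(operator.add, [1, 2, 3, 4], 10)

# With a proper identity, reducing an empty input yields the identity itself.
empty_sum = reduce(operator.add, [], 0)

print(total, product, joined, all_true, skewed, empty_sum)
```

So the answer to the question depends on the operation, not on the method: any value that leaves every x unchanged under the operation qualifies.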

What is the ‘pythonic’ equivalent to the ‘fold’ function from functional programming?

The Pythonic way of summing an array is using sum. For other purposes, you can sometimes use some combination of reduce (from the functools module) and the operator module, e.g.:

    from functools import reduce
    import operator

    def product(xs):
        return reduce(operator.mul, xs, 1)

Be aware that reduce is actually a foldl, in Haskell terms. There is no special syntax to perform folds, …
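The remark that reduce is a foldl is easiest to see with a non-commutative operation such as subtraction. The foldr helper below is a hypothetical illustration (Python has no built-in right fold); it is one possible way to write it, built on reduce itself.

```python
from functools import reduce
import operator

# reduce folds from the left: ((10 - 1) - 2) - 3
left = reduce(operator.sub, [1, 2, 3], 10)

def foldr(f, xs, init):
    """Right fold sketch: walk the sequence backwards and flip the argument order."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)

# Right fold of the same input: 1 - (2 - (3 - 10))
right = foldr(operator.sub, [1, 2, 3], 10)

print(left, right)
```

For associative and commutative operations such as addition or multiplication the direction makes no difference, which is why sum and the product function above do not need to care.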

Group by, and sum, and generate an object for each array in JavaScript

    let data = [
      {"id":"2018", "name":"test", "total":1200},
      {"id":"2019", "name":"wath", "total":1500},
      {"id":"2019", "name":"wath", "total":1800},
      {"id":"2020", "name":"zooi", "total":1000},
    ];

    // Sum totals per id; copy each first-seen object so the input array is not mutated.
    let map = data.reduce((prev, next) => {
      if (next.id in prev) {
        prev[next.id].total += next.total;
      } else {
        prev[next.id] = { ...next };
      }
      return prev;
    }, {});

    // Turn the id-keyed map back into an array of objects.
    let result = Object.keys(map).map(id => map[id]);
    console.log(result);

Why is the final reduce step extremely slow in this MapReduce? (HiveQL, HDFS MapReduce)

If the final reducer is a join, then this looks like skew in the join key. First of all, check two things:

Check that the b.f1 join key has no duplicates:

    select b.f1, count(*) cnt from B b group by b.f1 having count(*)>1 order by cnt desc;

Check the distribution of a.f1:

    select a.f1, count(*) cnt from A …

Java Stream: divide into two lists by boolean predicate

Collectors.partitioningBy:

    Map<Boolean, List<Employee>> partitioned =
        listOfEmployees.stream().collect(
            Collectors.partitioningBy(Employee::isActive));

The resulting map contains two lists, corresponding to whether or not the predicate was matched:

    List<Employee> activeEmployees = partitioned.get(true);
    List<Employee> formerEmployees = partitioned.get(false);

There are a couple of reasons to use partitioningBy over groupingBy (as suggested by Juan Carlos Mendoza): Firstly, the parameter of groupingBy is a Function<Employee, …