How to dynamically do filtering in Java 8?

Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go….

One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.

Another issue is that some of the OP’s “filters” are context sensitive, such as “top 50% by a certain order”. This can’t be done with a simple filter(predicate) construct on a stream.

The key is to realize that, while lambdas allow functions to be passed as arguments (to good effect) it also means that they can be stored in data structures and computations can be performed on them. The most common computation is to take multiple functions and compose them.

Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:

class Widget {
    String name() { ... }
    int length() { ... }
    double weight() { ... }

    // constructors, fields, toString(), etc.
}

Let’s start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:

List<Predicate<Widget>> allPredicates = Arrays.asList(
    w -> w.length() >= 10,
    w -> w.weight() > 40.0,
    w -> w.name().compareTo("c") > 0);

Given this list, we can permute them (probably not useful, since they’re order independent) or select any subset we want. Let’s say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that’s a composite and of them all:

Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
    compositePredicate = compositePredicate.and(allPredicates.get(i));
}

This works, but it fails if the list is empty, and since we’re doing functional programming now, mutating a variable in a loop is declassé. But lo! This is a reduction! We can reduce all the predicates over the and operator get a single composite predicate, like this:

Predicate<Widget> compositePredicate =
    allPredicates.stream()
                 .reduce(w -> true, Predicate::and);

(Credit: I learned this technique from @venkat_s. If you ever get a chance, go see him speak at a conference. He’s good.)

Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)

Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:

widgetList.stream()
          .filter(compositePredicate)
          .forEach(System.out::println);

Context Sensitive Filters

Now let’s consider what I referred to as a “context sensitive” filter, which is represented by the example like “top 50% in a certain order”, say the top 50% of widgets by weight. “Context sensitive” isn’t the best term for this but it’s what I’ve got at the moment, and it is somewhat descriptive in that it’s relative to the number of elements in the stream up to this point.

How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It’s kind of like sorted() in a pipeline which can’t tell which is the first element to output until it has read every single input element and has sorted them.

The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:

List<Widget> temp =
    list.stream()
        .sorted(comparing(Widget::weight).reversed())
        .collect(toList());
temp.stream()
    .limit((long)(temp.size() * 0.5))
    .forEach(System.out::println);

This isn’t complicated, but it’s a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list’s size in the 50% computation.

This is limiting, though, in that it’s a “static” representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?

A important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function kind of like this ourselves:

Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
    List<Widget> temp =
        stream.sorted(comparing(Widget::weight).reversed())
              .collect(toList());
    return temp.stream()
               .limit((long)(temp.size() * 0.5));
}

A similar example might be to find the shortest three widgets:

Stream<Widget> shortestThree(Stream<Widget> stream) {
    return stream.sorted(comparing(Widget::length))
                 .limit(3);
}

Now we can write something that combines these stateful filters with ordinary stream operations:

shortestThree(
    top50PercentByWeight(
        widgetList.stream()
                  .filter(w -> w.length() >= 10)))
.forEach(System.out::println);

This works, but is kind of lousy because it reads “inside-out” and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it’s still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.

A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be

UnaryOperator<Stream<Widget>>

If we were to put these into a list, the type of that list would be

List<UnaryOperator<Stream<Widget>>>

Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too much generics is to define our own type. Let’s call one of our new things a Criterion. It turns out that there’s little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:

@FunctionalInterface
public interface Criterion {
    Stream<Widget> apply(Stream<Widget> s);
}

Now we can create a list of criteria like this:

List<Criterion> criteria = Arrays.asList(
    this::shortestThree,
    this::lengthGreaterThan20
);

(We’ll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it’s still somewhat limiting. First, it can’t be combined with ordinary predicates. Second, there’s a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.

This creates a criterion that selects the top N widgets, given a comparator:

Criterion topN(Comparator<Widget> cmp, long n) {
    return stream -> stream.sorted(cmp).limit(n);
}

This creates a criterion that selects the top p percent of widgets, given a comparator:

Criterion topPercent(Comparator<Widget> cmp, double pct) {
    return stream -> {
        List<Widget> temp =
            stream.sorted(cmp).collect(toList());
        return temp.stream()
                   .limit((long)(temp.size() * pct));
    };
}

And this creates a criterion from an ordinary predicate:

Criterion fromPredicate(Predicate<Widget> pred) {
    return stream -> stream.filter(pred);
}

Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:

List<Criterion> criteria = Arrays.asList(
    fromPredicate(w -> w.length() > 10),                    // longer than 10
    topN(comparing(Widget::length), 4L),                    // longest 4
    topPercent(comparing(Widget::weight).reversed(), 0.50)  // heaviest 50%
);

Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:

Criterion allCriteria =
    criteria.stream()
            .reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));

The identity function c -> c is clear, but the second arg is a bit tricky. Given a stream s we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.

Now that we’ve composed all the criteria, we can apply it to a stream of widgets like so:

allCriteria.apply(widgetList.stream())
           .forEach(System.out::println);

This is still a bit inside-out, but it’s fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.

The functional programming gurus are probably saying “He just reinvented … !” which is probably true. I’m sure this has probably been invented somewhere already, but it’s new to Java, because prior to lambda, it just wasn’t feasible to write Java code that uses these techniques.

Update 2014-04-07

I’ve cleaned up and posted the complete sample code in a gist.

Leave a Comment