How to add elements of a Java8 stream into an existing List

NOTE: nosid’s answer shows how to add to an existing collection using forEachOrdered(). This is a useful and effective technique for mutating existing collections. My answer addresses why you shouldn’t use a Collector to mutate an existing collection.

The short answer is no, at least, not in general, you shouldn’t use a Collector to modify an existing collection.

The reason is that collectors are designed to support parallelism, even over collections that aren’t thread-safe. The way they do this is to have each thread operate independently on its own collection of intermediate results. The way each thread gets its own collection is to call the Collector.supplier() which is required to return a new collection each time.

These collections of intermediate results are then merged, again in a thread-confined fashion, until there is a single result collection. This is the final result of the collect() operation.

A couple answers from Balder and assylias have suggested using Collectors.toCollection() and then passing a supplier that returns an existing list instead of a new list. This violates the requirement on the supplier, which is that it return a new, empty collection each time.

This will work for simple cases, as the examples in their answers demonstrate. However, it will fail, particularly if the stream is run in parallel. (A future version of the library might change in some unforeseen way that will cause it to fail, even in the sequential case.)

Let’s take a simple example:

List<String> destList = new ArrayList<>(Arrays.asList("foo"));
List<String> newList = Arrays.asList("0", "1", "2", "3", "4", "5");
newList.parallelStream()
       .collect(Collectors.toCollection(() -> destList));
System.out.println(destList);

When I run this program, I often get an ArrayIndexOutOfBoundsException. This is because multiple threads are operating on ArrayList, a thread-unsafe data structure. OK, let’s make it synchronized:

List<String> destList =
    Collections.synchronizedList(new ArrayList<>(Arrays.asList("foo")));

This will no longer fail with an exception. But instead of the expected result:

[foo, 0, 1, 2, 3]

it gives weird results like this:

[foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0]

This is the result of the thread-confined accumulation/merging operations I described above. With a parallel stream, each thread calls the supplier to get its own collection for intermediate accumulation. If you pass a supplier that returns the same collection, each thread appends its results to that collection. Since there is no ordering among the threads, results will be appended in some arbitrary order.

Then, when these intermediate collections are merged, this basically merges the list with itself. Lists are merged using List.addAll(), which says that the results are undefined if the source collection is modified during the operation. In this case, ArrayList.addAll() does an array-copy operation, so it ends up duplicating itself, which is sort-of what one would expect, I guess. (Note that other List implementations might have completely different behavior.) Anyway, this explains the weird results and duplicated elements in the destination.

You might say, “I’ll just make sure to run my stream sequentially” and go ahead and write code like this

stream.collect(Collectors.toCollection(() -> existingList))

anyway. I’d recommend against doing this. If you control the stream, sure, you can guarantee that it won’t run in parallel. I expect that a style of programming will emerge where streams get handed around instead of collections. If somebody hands you a stream and you use this code, it’ll fail if the stream happens to be parallel. Worse, somebody might hand you a sequential stream and this code will work fine for a while, pass all tests, etc. Then, some arbitrary amount of time later, code elsewhere in the system might change to use parallel streams which will cause your code to break.

OK, then just make sure to remember to call sequential() on any stream before you use this code:

stream.sequential().collect(Collectors.toCollection(() -> existingList))

Of course, you’ll remember to do this every time, right? 🙂 Let’s say you do. Then, the performance team will be wondering why all their carefully crafted parallel implementations aren’t providing any speedup. And once again they’ll trace it down to your code which is forcing the entire stream to run sequentially.

Don’t do it.

Leave a Comment

tech