Is there an IEnumerable implementation that only iterates over it’s source (e.g. LINQ) once?

A fun challenge so I have to provide my own solution. So fun in fact that my solution now is in version 3. Version 2 was a simplification I made based on feedback from Servy. I then realized that my solution had huge drawback. If the first enumeration of the cached enumerable didn’t complete no caching would be done. Many LINQ extensions like First and Take will only enumerate enough of the enumerable to get the job done and I had to update to version 3 to make this work with caching.

The question is about subsequent enumerations of the enumerable which does not involve concurrent access. Nevertheless I have decided to make my solution thread safe. It adds some complexity and a bit of overhead but should allow the solution to be used in all scenarios.

public static class EnumerableExtensions {

  public static IEnumerable<T> Cached<T>(this IEnumerable<T> source) {
    if (source == null)
      throw new ArgumentNullException("source");
    return new CachedEnumerable<T>(source);
  }

}

class CachedEnumerable<T> : IEnumerable<T> {

  readonly Object gate = new Object();

  readonly IEnumerable<T> source;

  readonly List<T> cache = new List<T>();

  IEnumerator<T> enumerator;

  bool isCacheComplete;

  public CachedEnumerable(IEnumerable<T> source) {
    this.source = source;
  }

  public IEnumerator<T> GetEnumerator() {
    lock (this.gate) {
      if (this.isCacheComplete)
        return this.cache.GetEnumerator();
      if (this.enumerator == null)
        this.enumerator = source.GetEnumerator();
    }
    return GetCacheBuildingEnumerator();
  }

  public IEnumerator<T> GetCacheBuildingEnumerator() {
    var index = 0;
    T item;
    while (TryGetItem(index, out item)) {
      yield return item;
      index += 1;
    }
  }

  bool TryGetItem(Int32 index, out T item) {
    lock (this.gate) {
      if (!IsItemInCache(index)) {
        // The iteration may have completed while waiting for the lock.
        if (this.isCacheComplete) {
          item = default(T);
          return false;
        }
        if (!this.enumerator.MoveNext()) {
          item = default(T);
          this.isCacheComplete = true;
          this.enumerator.Dispose();
          return false;
        }
        this.cache.Add(this.enumerator.Current);
      }
      item = this.cache[index];
      return true;
    }
  }

  bool IsItemInCache(Int32 index) {
    return index < this.cache.Count;
  }

  IEnumerator IEnumerable.GetEnumerator() {
    return GetEnumerator();
  }

}

The extension is used like this (sequence is an IEnumerable<T>):

var cachedSequence = sequence.Cached();

// Pulling 2 items from the sequence.
foreach (var item in cachedSequence.Take(2))
  // ...

// Pulling 2 items from the cache and the rest from the source.
foreach (var item in cachedSequence)
  // ...

// Pulling all items from the cache.
foreach (var item in cachedSequence)
  // ...

There is slight leak if only part of the enumerable is enumerated (e.g. cachedSequence.Take(2).ToList(). The enumerator that is used by ToList will be disposed but the underlying source enumerator is not disposed. This is because the first 2 items are cached and the source enumerator is kept alive should requests for subsequent items be made. In that case the source enumerator is only cleaned up when eligigble for garbage Collection (which will be the same time as the possibly large cache).

Leave a Comment