21 minute read

Have you ever considered to go into the rabbit hole and look what’s inside LINQ? No? You are probably sane.
Unfortunately, one day I’ve asked myself some questions about internals of LINQ - How it was built, and if I could i make faster?
In previous years of work and education - I was told about ‘slowness’ of LINQ, and to try to avoid it where performance matters.
So, I decided to take a closer look inside. To make the task harder, I want to use as little as possible allocations to heap.
There is a library that realizes the same approach - ZLinq, so it’s probably a good starting point aside of Linq’s source code.

Enumerators, Iterators and structs

IEnumerator<T> is an interface exposed through IEnumerable.GetEnumerator() and allows us to iterate through the collection.
This interface (in their generic version) inherit concrete IEnumerator, exposing property, and two methods - T Current (object Current for IEnumerator), bool MoveNext(), void Reset().
When we use foreach-loop in our code over a collection, it’s get translated to IEnumerable and IEnumerator.
Let’s look:

    //Code written by us

    var people = new List<Person>()
    {
        new Person() { Name = "Bob", Age = 25 },
        new Person() { Name = "Jane", Age = 23 },
        new Person() { Name = "John", Age = 21 }
    };

    foreach (Person person in people)
    {
        Console.WriteLine(person.Name);
    }
    //Generated

    List<Person> personList = new List<Person>();
    Person person1 = new Person();
    person1.Name = "Bob";
    person1.Age = 25;
    personList.Add(person1);
    Person person2 = new Person();
    person2.Name = "Jane";
    person2.Age = 23;
    personList.Add(person2);
    Person person3 = new Person();
    person3.Name = "John";
    person3.Age = 21;
    personList.Add(person3);
    List<Person>.Enumerator enumerator = personList.GetEnumerator();
    try
    {
      while (enumerator.MoveNext())
      {
        Person person4 = enumerator.Current;
        DefaultInterpolatedStringHandler interpolatedStringHandler = new DefaultInterpolatedStringHandler(3, 2);
        interpolatedStringHandler.AppendFormatted(person4.Name);
        interpolatedStringHandler.AppendLiteral(" - ");
        interpolatedStringHandler.AppendFormatted<int>(person4.Age);
        Console.WriteLine(interpolatedStringHandler.ToStringAndClear());
      }
    }
    finally
    {
      enumerator.Dispose();
    }

Why is that?
List<T> implements IList<T>, which inherits IEnumerable<T>. We know right now that IEnumerable<T> exposes IEnumerator, so after calling foreach (Person person in people), we also called the constructor of the Enumerator struct.
To go through each item in a list, you need to use the bool MoveNext() method, until it returns false.

Why to use a struct here, rather than a class or a record?
The most important is the fact, that structure type was created to encapsulate data and related functionality. Enumerator fulfills this idea right on the spot.
Secondly, structs are stored on stack most of the time (until They are part of other object e.g class, or They are static field)
Lastly - they are smaller and can be passed by value (by default, as they are value-type) and reference.

Okay, so where is Iterator?
This is the point where System.Linq steps in.
Writing code using Linq’s methods is like building a chain. It’s like a domino effect.
When You write:

    new int[] { 1, 2, 3 }.Select(x => (x * 2).ToString()).ToArray();

it gets casted to (IEnumerable<int>) then Select extension method creates a new ArraySelectIterator<int, string>.
The ArraySelectIterator inherits from generic Iterator<TResult> (It’s an abstract class!). Value of a Iterator is immutable,
It means that represented operation cannot be changed. Iterator serves as its own enumerator, so the state of the iterator may change as it is being enumerated.
However, if you make something like this:

    new int[] { 1, 2, 3, 4, 5, 5, 6, 7, 10 }.Select(x => (x * 2).ToString())
    .Where(x => x[0] == '1').ToArray();

You’ll end up casting (again!) to (IEnumerable<int>) then invoke the new ArraySelectIterator<int, string>, and finally (thanks to the .Where(x => x[0] == '1') method) the new IEnumerableWhereIterator<string>.

Can you access it like a normal array?
No, To do this, you have to call the .ToArray() extension method (or .ToList(), .ToDictionary(), etc.). The .ToArray() executes a whole chain, by using the bool MoveNext() method of enumerators (Where calls Select calls source (IEnumerable<T>)). Inside we can found struct named SegmentedArrayBuilder<T>.
The array builder is being allocated on the stack, and operates on segments - when we add an element it checks if _currentSegment is long enough to add (if yes, then just adds under next index).
If _currentSegment is not long enough it calculates new size of a new segment length (currentSegmentLength * 2), then rents an array from ArrayPool<T> and allocates it to _currentSegment and the _segments array. At the end _segmentsCount is incremented to keep track of all allocated segments in _segments array.
So, when you call the .ToArray() it has to iterate over all segments and copy data from them.

After acquiring all information from above, We can image that this code:

    var rand = new Random();
    var listOfInts = new List<NestedInts>()
    {
        new NestedInts(),
        new NestedInts(),
        new NestedInts()
    };

    List<int> result = new List<int>();

    foreach (NestedInts nestedInt in listOfInts)
    {
        var example = nestedInt.Array.Where(x => x > rand.Next(0, 10)).ToArray();
        result.AddRange(example);
    }

    class NestedInts
    {
        public int[] Array { get; set; } = new []{ 1,2,3,4,5,6,7,8,9,10};
    }

will create IEnumerator<NestedInts> then three times ArrayWhereIterator<int> and SegmentedArrayBuilder<int>.
Lots of allocations, iterations that could be simplified ;).

Simple enumerator

Let’s start writing code. Like in IEnumerator<T> We want to move to next element, get source and count, then try to copy source. To fulfill this assumptions we have to create interface like this:

public interface IStructEnumerator<T> : IDisposable
{
    bool Next(ref T current);

    bool GetCountToLeftEnumerate(out int count);

    bool GetUnderlying(ref ReadOnlySpan<T> span);

    bool TryCopy(ref Span<T> destination, int offset);
}

The reason behind ref, is that We don’t know the type of value, so I want to use it by reference (the downside of this approach is accidental change of source’s element).
IDisposable will help us to free unnecessary resources.

Implementation:

public ref struct ArrayStructEnumerator<T> : IStructEnumerator<T>
{
    internal readonly T[] Array;
    internal int Index;
    internal readonly int Count;

    public ArrayStructEnumerator(T[] array)
    {
        Array = array ?? throw new ArgumentNullException(nameof(array));
        Count = array.Length;
        Index = -1;
    }
    
    public void Dispose()
    {
        Index = -1;
    }

    public bool Next(ref T current)
    {
        if (Index < Count - 1)
        {
            current = Array[++Index];
            return true;
        }

        current = default(T);
        return false;
    }

    public bool GetCountToLeftEnumerate(out int count)
    {
        if (Index < 0)
        {
            count = Count;

            return true;
        }
        
        if(Index >= Count)
        {
            count = 0;
            return false;
        }
        
        count = Count - Index - 1;
        return true;
    }

    public bool GetUnderlying(ref ReadOnlySpan<T> span)
    {
        if(Array == null || Count == 0)
        {
            span = default(ReadOnlySpan<T>);
            return false;
        }
        
        span = Array.AsSpan();

        return true;
    }

    public bool TryCopy(ref Span<T> destination, int offset)
    {
        ReadOnlySpan<T> source = default;
        if (GetUnderlying(ref source))
        {
            if (offset < 0 || offset >= source.Length)
            {
                return false;
            }
            ReadOnlySpan<T> slice = source.Slice(offset);

            for (int i = 0; i < slice.Length; i++)
            {
                destination[i] = slice[i];
            }

            return true;
        }

        return false;
    }
}

A lot happening here, so let’s break it into smaller parts:

    internal readonly T[] Array;
    internal int Index;
    internal readonly int Count;

    public ArrayStructEnumerator(T[] array)
    {
        Array = array ?? throw new ArgumentNullException(nameof(array));
        Count = array.Length;
        Index = -1;
    }

Making array readonly helps us with accidental change of underlying array.
_count introduces traceability of max length of collection.

Let’s look at TryCopy and GetUnderlying

    public bool GetUnderlying(ref ReadOnlySpan<T> span)
    {
        if(Array == null || Count == 0)
        {
            span = default(ReadOnlySpan<T>);
            return false;
        }
        
        span = Array.AsSpan();

        return true;
    }

    public bool TryCopy(ref Span<T> destination, int offset)
    {
        ReadOnlySpan<T> source = default;
        if (GetUnderlying(ref source))
        {
            if (offset < 0 || offset >= source.Length)
            {
                return false;
            }
            ReadOnlySpan<T> slice = source.Slice(offset);

            for (int i = 0; i < slice.Length; i++)
            {
                destination[i] = slice[i];
            }

            return true;
        }

        return false;
    }

Both of them are operating on Span<T>, which is faster than operating on an array. Thanks to this approach, possible operations in a chain will cost us less resources.
For TryCopy we want to introduce copying part of source for the future.

Also, we shouldn’t forget about IEnumerable equivalent, that will help us in the future.

public ref struct StructEnumerable<TEnumerator, T>(TEnumerator enumerator)
where TEnumerator : struct, IStructEnumerator<T>, allows ref struct
{
    public TEnumerator Enumerator = enumerator;
}

Due to the fact that we want to benefit from stack speed, our IEnumerable equivalent must be ref struct. Furthermore, we predict that TEnumerator will be struct (which is true), and we allow ref struct, to avoid passing by value and support future enumerators.

Select

After implementing our first link in chain, Let’s move to implementing first projection operation - Select. We should keep availability to transforming one object into a new form as our main assumption. That means every Next element we should retrieve a new form of it, based on external function.
It’s time to look at implementation:

public ref struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
where TEnumerator : struct, IStructEnumerator<TIn>, allows ref struct
{
    internal TEnumerator _source;
    internal readonly Func<TIn, TOut> Selector;

    public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
    {
        _source = source;
        Selector = selector;
    }

    public void Dispose()
    {
        _source.Dispose();
    }

    public bool Next(ref TOut current)
    {
        TIn? fromSource = default(TIn);
        if (_source.Next(ref fromSource))
        {
            current = Selector(fromSource);
            return true;
        }

        current = default(TOut);
        return false;
    }

    public bool GetCountToLeftEnumerate(out int count)
    {
        if (_source.GetCountToLeftEnumerate(out int sourceCount))
        {
            count = sourceCount;

            return true;
        }

        count = 0;
        return false;
    }

    public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
    {
        span = default;

        return false;
    }

    public bool TryCopy(ref Span<TOut> destination, int offset) => false;
}

The main difference between ArrayStructEnumerator is the fact, that our _source is not an array, but other IStructEnumerator. To achieve it We will use three reference types TEnumerator, TIn and TOut to point at incoming IStructEnumerator, type to change, and lastly type to be changed to.
We accept incoming Func<TIn, TOut> in constructor, and store is in a backing field as readonly to avoid any changes.

public ref struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
where TEnumerator : struct, IStructEnumerator<TIn>, allows ref struct
{
    internal TEnumerator _source;
    internal readonly Func<TIn, TOut> Selector;

    public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
    {
        _source = source;
        Selector = selector;
    }

Out Next will be similar to ArrayStructEnumerator, but a current element will be affected by a _selector

    public bool Next(ref TOut current)
    {
        TIn? fromSource = default(TIn);
        if (_source.Next(ref fromSource))
        {
            current = Selector(fromSource);
            return true;
        }

        current = default(TOut);
        return false;
    }

The main difference will be for two methods: GetUnderlying and TryCopy.
As We don’t accept _source as an array, We can’t get source’s array and copy of it. The reason behind this is that We can’t predict previous or next elements in our enumerator chain.

    public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
    {
        span = default;

        return false;
    }

    public bool TryCopy(ref Span<TOut> destination, int offset) => false;

Using our knowledge, we can right now make InternalSelect dedicated for ArrayStructEnumerator, because we want to steer future chain by source.

public ref struct InternalSelectArray<TIn, TOut> : IStructEnumerator<TOut>
{
    private ArrayStructEnumerator<TIn> _source;
    private Span<TIn> _span;
    private readonly Func<TIn, TOut> _selector;

    public InternalSelectArray(ArrayStructEnumerator<TIn> source, Func<TIn, TOut> selector)
    {
        _source = source;
        _span = source.Array.AsSpan();
        _selector = selector;
    }

    public void Dispose()
    {
        _source.Dispose();
    }

    public bool Next(ref TOut current)
    {
        if (_source.Index < _source.Count - 1)
        {
            current = _selector(_span[++_source.Index]);
            return true;
        }

        return false;
    }

    public bool GetCountToLeftEnumerate(out int count)
    {
        if (_source.GetCountToLeftEnumerate(out int sourceCount))
        {
            count = sourceCount;

            return true;
        }

        Unsafe.SkipInit(out count);
        return false;
    }

    public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
    {
        return false;
    }

    public bool TryCopy(ref Span<TOut> destination, int offset) => false;
}

Lastly, It’s time to decorate our InternalSelect with extension methods, to accept any array and IStructEnumerator.

public static class SelectStruct
{ 
    public static StructEnumerable<InternalSelect<TEnumerator, TIn, TOut>, TOut> StructSelect<TEnumerator, TIn, TOut>(
        this StructEnumerable<TEnumerator, TIn> source,
        Func<TIn, TOut> selector)
    where TEnumerator : struct, IStructEnumerator<TIn>, allows ref struct
    {
        return new(new InternalSelect<TEnumerator, TIn, TOut>(source.Enumerator, selector));
    } 
    
    public static StructEnumerable<InternalSelectArray<TIn, TOut>, TOut> StructSelect<TIn, TOut>(
        this TIn[] array,
        Func<TIn, TOut> selector)
    {
        return new(new InternalSelectArray<TIn, TOut>(new ArrayStructEnumerator<TIn>(array), selector));
    }
}

Benchmark

To check performance of our method, I’ve implemented two classes Example and NewExample, and some benchmark methods.

Example class with enum:

    public class Example
    {
        public string ExampleString { get; set; }
        public int ExampleNumber { get; set; }
        public ExampleEnum ExampleEnum { get; set; }
    }

    public enum ExampleEnum {
        First,
        Second,
        Third
    }

NewExample class:

public class NewExample
{
    public string NewStringProp { get; set; }
    public int HalvedInt { get; set; }
}

Benchmark class:

    public class SelectBenchmark
    {
        public IEnumerable<Example[]> Data()
        {
            //yield return Generator.Generate(5_000);
            //yield return Generator.Generate(10_000);
            //yield return Generator.Generate(50_000);
            yield return Generator.Generate(500_000);
        }
        
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void LinqSelect(Example[] input) =>
            input.Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ParallelLinqSelect(Example[] input) =>
            input.AsParallel().Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void SelectUsingEnumerator(Example[] input) =>
            input.SelectUsingEnumerator(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        // Our implementation!
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void NonAllocSelect(Example[] input)
        {
            input.StructSelect(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());

        }
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ZLinqSelect(Example[] input)
        {
            input.AsValueEnumerable().Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());

        }
    }

Results

| Method                | input           | Mean     | Error    | StdDev   | Completed Work Items | Lock Contentions | Gen0      | Gen1      | Gen2     | Allocated |
|---------------------- |---------------- |---------:|---------:|---------:|---------------------:|-----------------:|----------:|----------:|---------:|----------:|
| LinqSelect            | Example[500000] | 32.21 ms | 0.639 ms | 1.320 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |
| ParallelLinqSelect    | Example[500000] | 32.74 ms | 0.651 ms | 1.470 ms |              15.0000 |                - | 1250.0000 | 1187.5000 | 187.5000 |   27.1 MB |
| SelectUsingEnumerator | Example[500000] | 31.47 ms | 0.460 ms | 0.408 ms |                    - |                - | 1125.0000 | 1093.7500 | 187.5000 |  22.89 MB |
| NonAllocSelect        | Example[500000] | 27.76 ms | 0.458 ms | 0.406 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |
| ZLinqSelect           | Example[500000] | 27.52 ms | 0.397 ms | 0.352 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |

Where

It’s worth to filter data from time to time. In comparison to a InternalSelect we will use one reference type less, and use similar approach on the matter of filtering Func.
This time rather than using Func<TIn, TOut> we will focus on Func<TIn, bool>. That’s because We want to filter out elements that don’t match in a Next method.

public ref struct InternalWhere<TEnumerator, TIn> : IStructEnumerator<TIn>
    where TEnumerator : struct, IStructEnumerator<TIn>, allows ref struct
{
    internal TEnumerator _source;
    internal readonly Func<TIn, bool> _filter;

    public InternalWhere(TEnumerator source, Func<TIn, bool> filter)
    {
        _source = source;
        _filter = filter;
    }

    public void Dispose()
    {
        _source.Dispose();
    }

    public bool Next(ref TIn current)
    {
        TIn? fromSource = default(TIn);

        //Because we want to check if source has next element and then filter it
        while (_source.Next(ref fromSource))
        {
            if (_filter(fromSource))
            {
                current = fromSource;

                return true;
            }
        }

        return false;
    }

    public bool GetCountToLeftEnumerate(out int count)
    {
        bool val = _source.GetCountToLeftEnumerate(out int enumeratorCount);

        count = enumeratorCount;
        return val;
    }

    public bool GetUnderlying(ref ReadOnlySpan<TIn> span)
    {
        span = default;

        return false;
    }

    public bool TryCopy(ref Span<TIn> destination, int offset) => false;
}

The Next method will iterate _source until conditions specified in a Func<TIn, bool> will be met.
The reason behind a internal keyword for a _source and a _filter will be explained in the future.

And concrete struct:

//concrete - array
public ref struct InternalWhereArray<TIn> : IStructEnumerator<TIn>
{
    private ArrayStructEnumerator<TIn> _source;
    private Span<TIn> _span;
    internal readonly Func<TIn, bool> _filter;

    public InternalWhereArray(ArrayStructEnumerator<TIn> source, Func<TIn, bool> filter)
    {
        _source = source;
        _span = source.Array.AsSpan();
        _filter = filter;
    }

    public void Dispose()
    {
        _source.Dispose();
    }

    public bool Next(ref TIn current)
    {
        while (_source.Index < _source.Count - 1)
        {
            var item = _span[++_source.Index];
            if (_filter(item))
            {
                current = item;

                return true;
            }
        }

        return false;
    }

    public bool GetCountToLeftEnumerate(out int count)
    {
        count = 0;
        return false;
    }

    public bool GetUnderlying(ref ReadOnlySpan<TIn> span)
    {
        return false;
    }

    public bool TryCopy(ref Span<TIn> destination, int offset) => false;
}

Let’s implement extensions:

    public static class WhereStruct
    {
        public static StructEnumerable<InternalWhereArray<T>, T> StructWhere<T>(
            this T[] array,
            Func<T, bool> filter)
        {
            return new(new InternalWhereArray<T>(new ArrayStructEnumerator<T>(array), filter));
        }

        public static StructEnumerable<InternalWhere<TEnumerator, T>, T> StructWhere<TEnumerator, T>(
            this StructEnumerable<TEnumerator, T> source,
            Func<T, bool> filter) where TEnumerator : struct, IStructEnumerator<T>, allows ref struct
        {
            return new(new InternalWhere<TEnumerator, T>(source.Enumerator, filter));
        }
    }

Benchmark

We will use the same classes as We did in Select.
Benchmark class:

    [MemoryDiagnoser]
    [ThreadingDiagnoser]
    public class WhereBenchmark
    {
        public IEnumerable<Example[]> Data()
        {
            //yield return Generator.Generate(5_000);
            //yield return Generator.Generate(10_000);
            //yield return Generator.Generate(50_000);
            yield return Generator.Generate(500_000);
        }


        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void LinqWhere(Example[] input) =>
            input.Where(x => x.ExampleNumber % 2 == 0)
                .ToArray()
                .Consume(new Consumer());

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ParallelLinqWhere(Example[] input) =>
            input.AsParallel().Where(x => x.ExampleNumber % 2 == 0)
                .ToArray().Consume(new Consumer());

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void WhereUsingEnumerator(Example[] input) =>
            input.WhereUsingEnumerator(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());

        //Our implementation!
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void NonAllocWhere(Example[] input)
        {
            input.StructWhere(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
        }

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ZLinqWhere(Example[] input)
        {
            input.AsValueEnumerable().Where(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
        }
    }

Results

| Method               | input           | Mean      | Error     | StdDev    | Completed Work Items | Lock Contentions | Gen0     | Gen1     | Gen2     | Allocated |
|--------------------- |---------------- |----------:|----------:|----------:|---------------------:|-----------------:|---------:|---------:|---------:|----------:|
| LinqWhere            | Example[500000] |  6.328 ms | 0.1251 ms | 0.1754 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |
| ParallelLinqWhere    | Example[500000] |  3.933 ms | 0.0767 ms | 0.1531 ms |              15.0000 |                - | 195.3125 | 125.0000 |  70.3125 |   5.92 MB |
| WhereUsingEnumerator | Example[500000] | 11.603 ms | 0.3069 ms | 0.9050 ms |                    - |                - | 140.6250 | 140.6250 | 140.6250 |   7.63 MB |
| NonAllocWhere        | Example[500000] |  5.476 ms | 0.2115 ms | 0.6237 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |
| ZLinqWhere           | Example[500000] |  5.726 ms | 0.1136 ms | 0.1166 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |

ToArray

We implemented beginning, middle, but let’s end our chain of enumerators.
Extension methods are perfect use for this, right now We need two - one for generic IStructEnumerator and second for filtering explicitly.

Generic

It’s time to look at generic implementation:

    public static T[] ToArray<TEnumerator, T>(this StructEnumerable<TEnumerator, T> enumerable)
        where TEnumerator : struct, IStructEnumerator<T>, allows ref struct
    {
        var enumerator = enumerable.Enumerator;
        T[] result = null;
        ReadOnlySpan<T> span = default;
        
        if (enumerator.GetUnderlying(ref span))
        {
            return span.ToArray();
        }
        
        if (enumerator.GetCountToLeftEnumerate(out int count))
        {
            result = GC.AllocateUninitializedArray<T>(count);

            T current = default;
            int idx = 0;

            while (enumerator.Next(ref current))
            {
                result[idx++] = current;
            }

            return result;
        }

        return result;
    }

We assume that chain can access to enumeration count, then it calls last enumerator’s method in chain - Next and adds element to array.
Why GC.AllocateUninitializedArray<T>() rather than new T[]?
Again, the first reason is performance, the second one, We don’t have to care if array was zeroed during creation.

Where-explicit (for now!)

Generic implementation can’t fit into filtering operations made by Where.
The main issue here is unpredictability of element’s count in target array.
Before executing selection operations We can safely assume that count of element at the beginning and the end will be the same.
To fix this let’s introduce ArrayBuilder<T>, based on approach introduced in SegmentedArrayBuilder and found in ZLinq’s .ToArray.

ArrayBuilder

Our ArrayBuilder must consist of:

  • Storage of buffers
  • Current buffer
  • Current buffer’s index
  • Current buffer element count
  • All items count

Gathering all assumptions like this we can start implementing.
Right now we should declare a ref struct ArrayBuilder<T> due to the fact that We want it to be as fast as possible.

    public ref struct ArrayBuilder<T> 
    {
        private const int DefaultCapacity = 4;
        private T[] _buffer;
        private SmallBuffer<T[]> _buffers;
        private int _count;
        private int _buffersIndex;
        private int _capacity;
        
        public ArrayBuilder()
        {
            _buffers = new SmallBuffer<T[]>();
            _count = 0;
            _buffersIndex = 0;
        }
    }

DefaultCapacity indicates that, we want to start with something after element is being added to an null array.

Let’s implement add method:

    public void Add(T item)
    {
        if (_buffer == null || _count >= _buffer.Length)
        {
            //todo: resize
        }

        _buffer[_count++] = item;
        _capacity++;
    }

This is straightforward - if a _buffer is null or overflows We want to resize, otherwise We just add the element and increment capacity counter.

    private void Resize(int minimum)
    {
        int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);

        var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
        
        if (_buffer != null)
        {
            _buffers[_buffersIndex++] = _buffer;
        }

        _buffer = newArr;
        _count = 0;
    }

The Resize accepts minimum count to resize, due to possible extension of this struct by a AddRange method, but for now It will be always 1.
While We try to add a element to an null array It’s worth to create something using a DefaultCapacity. Otherwise It’s worth to select maximum from a current buffer’s length and minimum value.

Our array can be very big. The ArrayPool<T> is the best solution right now to use for this as We rent an array with a new calculated capacity.
Using a ArrayPool<T> enforces on us to return the memory after usage, so We need to keep it in mind. After this we store a current buffer (if exist) in a buffer storage, then we are assigning a new current buffer.

SmallBuffer
Existence of this method was sparked during browsing a ZLinq source code. It’s the same approach described here. The goal is to create a struct that behaves like an array.
If buffer length increases by 2, We can easily calculate the max buffer length here.

    public ref struct SmallBuffer<T> 
    //note 1: approach found here: https://www.jacksondunstan.com/articles/5051
    //note 2: increase number of elements to support more data
    {
        private T _item1;
        private T _item2;
        private T _item3;
        private T _item4;
        private T _item5;
        private T _item6;
        private T _item7;
        private T _item8;
        private T _item9;
        private T _item10;
        private T _item11;
        private T _item12;
        private T _item13;
        private T _item14;
        private T _item15;

        public T this[int index]
        {
            get
            {
                return index switch
                {
                    0 => _item1,
                    1 => _item2,
                    2 => _item3,
                    3 => _item4,
                    4 => _item5,
                    5 => _item6,
                    6 => _item7,
                    7 => _item8,
                    8 => _item9,
                    9 => _item10,
                    10 => _item11,
                    11 => _item12,
                    12 => _item13,
                    13 => _item14,
                    14 => _item15,
                    _ => throw new IndexOutOfRangeException()
                };
            }
            set
            {
                switch (index)
                {
                    case 0: _item1 = value; break;
                    case 1: _item2 = value; break;
                    case 2: _item3 = value; break;
                    case 3: _item4 = value; break;
                    case 4: _item5 = value; break;
                    case 5: _item6 = value; break;
                    case 6: _item7 = value; break;
                    case 7: _item8 = value; break;
                    case 8: _item9 = value; break;
                    case 9: _item10 = value; break;
                    case 10: _item11 = value; break;
                    case 11: _item12 = value; break;
                    case 12: _item13 = value; break;
                    case 13: _item14 = value; break;
                    case 14: _item15 = value; break;
                }
            }
        }
    }

So after implementing a SmallBuffer and partially a ArrayBuilder, we can prepare array returning methods.
First one - for a ArrayPool<T> is very straightforward. We just return all rented arrays.

    public void Clear()
    {
        if (_buffer != null)
        {
            ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
        }

        for (int i = 0; i < _buffersIndex; i++)
        {
            ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
        }
        
        _count = 0;
        _buffer = null;
        _buffers = new SmallBuffer<T[]>();
    }

Second one - for user - We have to iterate over every buffer and a current buffer to return all elements. It’s worth to use a Span<T> here

    public T[] ToArray()
    {
        var result = GC.AllocateUninitializedArray<T>(_capacity);
        var resultSpan = result.AsSpan();

        int nextIdx = 0;
        for (int i = 0; i < _buffersIndex; i++)
        {
            Span<T> buffer = _buffers[i].AsSpan();
            buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
            
            nextIdx += buffer.Length;
        }

        _buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));

        return result;
    }

Finally, our ArrayBuilder implementation should look like this:

    public ref struct ArrayBuilder<T> 
    {
        private const int DefaultCapacity = 5;
        private T[] _buffer;
        private SmallBuffer<T[]> _buffers;
        private int _count;
        private int _buffersIndex;
        private int _capacity;
        
        public ArrayBuilder()
        {
            _buffers = new SmallBuffer<T[]>();
            _count = 0;
            _buffersIndex = 0;
        }
        
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public T[] ToArray()
        {
            var result = GC.AllocateUninitializedArray<T>(_capacity);
            var resultSpan = result.AsSpan();

            int nextIdx = 0;
            for (int i = 0; i < _buffersIndex; i++)
            {
                Span<T> buffer = _buffers[i].AsSpan();
                buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
                
                nextIdx += buffer.Length;
            }

            _buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));

            return result;
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public void Add(T item)
        {
            if (_buffer == null || _count >= _buffer.Length)
            {
                Resize(_count + 1);
            }

            _buffer[_count++] = item;
            _capacity++;
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public void Clear()
        {
            if (_buffer != null)
            {
                ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
            }

            for (int i = 0; i < _buffersIndex; i++)
            {
                ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
            }
            
            _count = 0;
            _buffer = null;
            _buffers = new SmallBuffer<T[]>();
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        private void Resize(int minimum)
        {
            int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);

            var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
            
            if (_buffer != null)
            {
                _buffers[_buffersIndex++] = _buffer;
            }

            _buffer = newArr;
            _count = 0;
        }
    }

After completing all steps above We can implement method to return an array from filter operation

    private static T[] FromArrayBuilder<TEnumerator, T>(this TEnumerator enumerator)
        where TEnumerator : IStructEnumerator<T>, allows ref struct
    {
        T[] result = null;

        var arr = new ArrayBuilder<T>();

        T? current = default;
        int idx = 0;
        while (enumerator.Next(ref current))
        {
            arr.Add(current);
        }

        result = arr.ToArray();

        arr.Clear();

        return result;
    }

And our final method will look like this:

    public static T[] ToArray<TEnumerator, T>(this StructEnumerable<TEnumerator, T> enumerable)
        where TEnumerator : struct, IStructEnumerator<T>, allows ref struct
    {
        var enumerator = enumerable.Enumerator;
        T[] result = null;
        ReadOnlySpan<T> span = default;
        
        if (enumerator.GetUnderlying(ref span))
        {
            return span.ToArray();
        }
        
        if (enumerator.GetCountToLeftEnumerate(out int count))
        {
            result = GC.AllocateUninitializedArray<T>(count);

            T current = default;
            int idx = 0;

            while (enumerator.Next(ref current))
            {
                result[idx++] = current;
            }

            return result;
        }

        return enumerable.Enumerator.FromArrayBuilder<TEnumerator, T>();
    }

Note! Above implementation was used in all previous benchmarks.

Source code can be found here.

All benchmarks were created with BenchmarkDotNet.