19 minute read

Have you ever considered to go into the rabbit hole and look what’s inside LINQ? No? You are probably sane.
Unfortunately, one day I’ve asked myself some questions about internals of LINQ - How it was built, and if I could i make faster?
In previous years of work and education - I was told about ‘slowness’ of LINQ, and to try to avoid it where performance matters.
So, I decided to take a closer look inside. To make the task harder, I want to use as little as possible allocations to heap.
There is a library that realizes the same approach - ZLinq, so it’s probably a good starting point aside of Linq’s source code.

Enumerators, Iterators and structs

IEnumerator<T> is an interface exposed through IEnumerable.GetEnumerator() and allows us to iterate through the collection.
This interface (in their generic version) inherit concrete IEnumerator, exposing property, and two methods - T Current (object Current for IEnumerator), bool MoveNext(), void Reset().
When we use foreach-loop in our code over a collection, it’s get translated to IEnumerable and IEnumerator.
Let’s look:

    //Code written by us

    var people = new List<Person>()
    {
        new Person() { Name = "Bob", Age = 25 },
        new Person() { Name = "Jane", Age = 23 },
        new Person() { Name = "John", Age = 21 }
    };

    foreach (Person person in people)
    {
        Console.WriteLine(person.Name);
    }
    //Generated

    List<Person> personList = new List<Person>();
    Person person1 = new Person();
    person1.Name = "Bob";
    person1.Age = 25;
    personList.Add(person1);
    Person person2 = new Person();
    person2.Name = "Jane";
    person2.Age = 23;
    personList.Add(person2);
    Person person3 = new Person();
    person3.Name = "John";
    person3.Age = 21;
    personList.Add(person3);
    List<Person>.Enumerator enumerator = personList.GetEnumerator();
    try
    {
      while (enumerator.MoveNext())
      {
        Person person4 = enumerator.Current;
        DefaultInterpolatedStringHandler interpolatedStringHandler = new DefaultInterpolatedStringHandler(3, 2);
        interpolatedStringHandler.AppendFormatted(person4.Name);
        interpolatedStringHandler.AppendLiteral(" - ");
        interpolatedStringHandler.AppendFormatted<int>(person4.Age);
        Console.WriteLine(interpolatedStringHandler.ToStringAndClear());
      }
    }
    finally
    {
      enumerator.Dispose();
    }

Why is that?
List<T> implements IList<T>, which inherits IEnumerable<T>. We know right now that IEnumerable<T> exposes IEnumerator, so after calling foreach (Person person in people), we also called the constructor of the Enumerator struct.
To go through each item in a list, you need to use the bool MoveNext() method, until it returns false.

Why to use a struct here, rather than a class or a record?
The most important is the fact, that structure type was created to encapsulate data and related functionality. Enumerator fulfills this idea right on the spot.
Secondly, structs are stored on stack most of the time (until They are part of other object e.g class, or They are static field)
Lastly - they are smaller and can be passed by value (by default, as they are value-type) and reference.

Okay, so where is Iterator?
This is the point where System.Linq steps in.
Writing code using Linq’s methods is like building a chain. It’s like a domino effect.
When You write:

    new int[] { 1, 2, 3 }.Select(x => (x * 2).ToString()).ToArray();

it gets casted to (IEnumerable<int>) then Select extension method creates a new ArraySelectIterator<int, string>.
The ArraySelectIterator inherits from generic Iterator<TResult> (It’s an abstract class!). Value of a Iterator is immutable,
It means that represented operation cannot be changed. Iterator serves as its own enumerator, so the state of the iterator may change as it is being enumerated.
However, if you make something like this:

    new int[] { 1, 2, 3, 4, 5, 5, 6, 7, 10 }.Select(x => (x * 2).ToString())
    .Where(x => x[0] == '1').ToArray();

You’ll end up casting (again!) to (IEnumerable<int>) then invoke the new ArraySelectIterator<int, string>, and finally (thanks to the .Where(x => x[0] == '1') method) the new IEnumerableWhereIterator<string>.

Can you access it like a normal array?
No, To do this, you have to call the .ToArray() extension method (or .ToList(), .ToDictionary(), etc.). The .ToArray() executes a whole chain, by using the bool MoveNext() method of enumerators (Where calls Select calls source (IEnumerable<T>)). Inside we can found struct named SegmentedArrayBuilder<T>.
The array builder is being allocated on the stack, and operates on segments - when we add an element it checks if _currentSegment is long enough to add (if yes, then just adds under next index).
If _currentSegment is not long enough it calculates new size of a new segment length (currentSegmentLength * 2), then rents an array from ArrayPool<T> and allocates it to _currentSegment and the _segments array. At the end _segmentsCount is incremented to keep track of all allocated segments in _segments array.
So, when you call the .ToArray() it has to iterate over all segments and copy data from them.

After acquiring all informations from above, We can image that this code:

    var rand = new Random();
    var listOfInts = new List<NestedInts>()
    {
        new NestedInts(),
        new NestedInts(),
        new NestedInts()
    };

    List<int> result = new List<int>();

    foreach (NestedInts nestedInt in listOfInts)
    {
        var example = nestedInt.Array.Where(x => x > rand.Next(0, 10)).ToArray();
        result.AddRange(example);
    }

    class NestedInts
    {
        public int[] Array { get; set; } = new []{ 1,2,3,4,5,6,7,8,9,10};
    }

will create IEnumerator<NestedInts> then three times ArrayWhereIterator<int> and SegmentedArrayBuilder<int>.
Lots of allocations, iterations that could be simplified ;).

Simple enumerator

Let’s start writing code. For my requirements I will skip creating IEnumerable-like structure, and jump straight to IEnumerator-like beings.
Like in IEnumerator<T> We want to move to next element, get source and count, then try to copy source. To fulfill this assumptions we have to create interface like this:

public interface IStructEnumerator<T> : IDisposable
{
    bool Next(ref T current);

    bool GetCountToLeftEnumerate(out int count);

    bool GetUnderlying(ref ReadOnlySpan<T> span);

    bool TryCopy(ref Span<T> destination, int offset);
}

The reason behind ref, is that We don’t know the type of value, so I want to use it by reference (the downside of this approach is accidental change of source’s element).
IDisposable will help us to free unnecessary resources.

Implementation:

    public struct StructEnumerator<T> : IStructEnumerator<T>
    {
        private readonly T[] _array;
        private int _index;
        private readonly int _count;

        public StructEnumerator(T[] array)
        {
            _array = array ?? throw new ArgumentNullException(nameof(array));
            _count = array.Length;
            _index = -1;
        }
        
        public void Dispose()
        {
            _index = -1;
        }

        public bool Next(ref T current)
        {
            if (_index < _count - 1)
            {
                current = _array[++_index];
                return true;
            }

            current = default(T);
            return false;
        }

        public bool GetCountToLeftEnumerate(out int count)
        {
            if (_index < 0)
            {
                count = _count;

                return true;
            }
            
            if(_index >= _count)
            {
                count = 0;
                return false;
            }
            
            count = _count - _index - 1;
            return true;
        }

        public bool GetUnderlying(ref ReadOnlySpan<T> span)
        {
            if(_array == null || _count == 0)
            {
                span = default(ReadOnlySpan<T>);
                return false;
            }
            
            span = _array.AsSpan();

            return true;
        }

        public bool TryCopy(ref Span<T> destination, int offset)
        {
            ReadOnlySpan<T> source = default;
            if (GetUnderlying(ref source))
            {
                if (offset < 0 || offset >= source.Length)
                {
                    return false;
                }
                ReadOnlySpan<T> slice = source.Slice(offset);

                for (int i = 0; i < slice.Length; i++)
                {
                    destination[i] = slice[i];
                }

                return true;
            }

            return false;
        }
    }

A lot happening here, so let’s break it into smaller parts:

        private readonly T[] _array;
        private int _index;
        private readonly int _count;

        public StructEnumerator(T[] array)
        {
            _array = array ?? throw new ArgumentNullException(nameof(array));
            _count = array.Length;
            _index = -1;
        }

Making array readonly helps us with accidental change of underlying array.
_count introduces trackability of max length of collection.

Let’s look at TryCopy and GetUnderlying

    public bool GetUnderlying(ref ReadOnlySpan<T> span)
    {
        if(_array == null || _count == 0)
        {
            span = default(ReadOnlySpan<T>);
            return false;
        }
        
        span = _array.AsSpan();

        return true;
    }

    public bool TryCopy(ref Span<T> destination, int offset)
    {
        ReadOnlySpan<T> source = default;
        if (GetUnderlying(ref source))
        {
            if (offset < 0 || offset >= source.Length)
            {
                return false;
            }
            ReadOnlySpan<T> slice = source.Slice(offset);

            for (int i = 0; i < slice.Length; i++)
            {
                destination[i] = slice[i];
            }

            return true;
        }

        return false;
    }

Both of them are operating on Span<T>, which is faster than operating on an array. Thanks to this approach, possible operations in a chain will cost us less resources.
For TryCopy we want to introduce copying part of source for the future.

Select

After implementing our first link in chain, Let’s move to implementing first projection operation - Select. We should keep availability to transforming one object into a new form as our main assumption. That means every Next element we should retrieve a new form of it, based on external function.
It’s time to look at implementation:

    public struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
        where TEnumerator : IStructEnumerator<TIn>
    {
        private TEnumerator _source;
        private readonly Func<TIn, TOut> _selector;

        public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
        {
            _source = source;
            _selector = selector;
        }

        public void Dispose()
        {
            _source.Dispose(); 
        }

        public bool Next(ref TOut current)
        {
            TIn? fromSource = default(TIn);
            if(_source.Next(ref fromSource))
            {
                current = _selector(fromSource);
                return true;
            }
            
            current = default(TOut);
            return false;
        }

        public bool GetCountToLeftEnumerate(out int count)
        {
            if (_source.GetCountToLeftEnumerate(out int sourceCount))
            {
                count = sourceCount;

                return true;
            }
            
            count = 0;
            return false;
        }

        public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
        {
            span = default;

            return false;
        }

        public bool TryCopy(ref Span<TOut> destination, int offset) => false;
    }

The main difference between StructEnumerator is the fact, that our _source is not an array, but other IStructEnumerator. To achieve it We will use three reference types TEnumerator, TIn and TOut to point at incoming IStructEnumerator, type to change, and lastly type to be changed to.
We accept incoming Func<TIn, TOut> in constructor, and store is in a backing field as readonly to avoid any changes.

public struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
    where TEnumerator : IStructEnumerator<TIn>
{
    private TEnumerator _source;
    private readonly Func<TIn, TOut> _selector;

    public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
    {
        _source = source;
        _selector = selector;
    }

Out Next will be similar to StructEnumerator, but a current element will be affected by a _selector

    public bool Next(ref TOut current)
    {
        TIn? fromSource = default(TIn);
        if(_source.Next(ref fromSource))
        {
            current = _selector(fromSource);
            return true;
        }
        
        current = default(TOut);
        return false;
    }

The main difference will be for two methods: GetUnderlying and TryCopy.
As We don’t accept _source as an array, We can’t get source’s array and copy of it. The reason behind this is that We can’t predict previous or next elements in our enumerator chain.

    public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
    {
        span = default;

        return false;
    }

    public bool TryCopy(ref Span<TOut> destination, int offset) => false;

Lastly, It’s time to decorate our InternalSelect with extension methods, to accept any array and IStructEnumerator.

    public static class SelectStruct
    {
        public static IStructEnumerator<TOut> StructSelect<TIn, TOut>(this TIn[] array, Func<TIn, TOut> selector)
        {
            return new StructEnumerator<TIn>(array).StructSelect(selector);
        }

        public static IStructEnumerator<TOut> StructSelect<TEnumerable, TIn, TOut>(this TEnumerable source, Func<TIn, TOut> selector)
            where TEnumerable : IStructEnumerator<TIn>
        {
            return new InternalSelect<TEnumerable, TIn, TOut>(source, selector);
        }
    }

Benchmark

To check performance of our method, I’ve implemented two classes Example and NewExample, and some benchmark methods.

Example class with enum:

    public class Example
    {
        public string ExampleString { get; set; }
        public int ExampleNumber { get; set; }
        public ExampleEnum ExampleEnum { get; set; }
    }

    public enum ExampleEnum {
        First,
        Second,
        Third
    }

NewExample class:

public class NewExample
{
    public string NewStringProp { get; set; }
    public int HalvedInt { get; set; }
}

Benchmark class:

    public class SelectBenchmark
    {
        public IEnumerable<Example[]> Data()
        {
            //yield return Generator.Generate(5_000);
            //yield return Generator.Generate(10_000);
            //yield return Generator.Generate(50_000);
            yield return Generator.Generate(500_000);
        }
        
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void LinqSelect(Example[] input) =>
            input.Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ParallelLinqSelect(Example[] input) =>
            input.AsParallel().Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void SelectUsingEnumerator(Example[] input) =>
            input.SelectUsingEnumerator(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());
        
        // Our implementation!
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void NonAllocSelect(Example[] input)
        {
            input.StructSelect(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());

        }
        
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ZLinqSelect(Example[] input)
        {
            input.AsValueEnumerable().Select(x => new NewExample()
            {
                HalvedInt = x.ExampleNumber / 2,
                NewStringProp = x.ExampleString
            }).ToArray().Consume(new Consumer());

        }
    }

Results

| Method                | input           | Mean     | Error    | StdDev   | Completed Work Items | Lock Contentions | Gen0      | Gen1      | Gen2     | Allocated |
|---------------------- |---------------- |---------:|---------:|---------:|---------------------:|-----------------:|----------:|----------:|---------:|----------:|
| LinqSelect            | Example[500000] | 32.21 ms | 0.639 ms | 1.320 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |
| ParallelLinqSelect    | Example[500000] | 32.74 ms | 0.651 ms | 1.470 ms |              15.0000 |                - | 1250.0000 | 1187.5000 | 187.5000 |   27.1 MB |
| SelectUsingEnumerator | Example[500000] | 31.47 ms | 0.460 ms | 0.408 ms |                    - |                - | 1125.0000 | 1093.7500 | 187.5000 |  22.89 MB |
| NonAllocSelect        | Example[500000] | 27.76 ms | 0.458 ms | 0.406 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |
| ZLinqSelect           | Example[500000] | 27.52 ms | 0.397 ms | 0.352 ms |                    - |                - | 1156.2500 | 1125.0000 | 218.7500 |  19.07 MB |

Where

It’s worth to filter data from time to time. In comparison to a InternalSelect we will use one reference type less, and use similar approach on the matter of filtering Func.
This time rather than using Func<TIn, TOut> we will focus on Func<TIn, bool>. That’s because We want to filter out elements that don’t match in a Next method.

    public struct InternalWhere<TEnumerator, TIn> : IStructEnumerator<TIn>
        where TEnumerator : IStructEnumerator<TIn>
    {
        internal TEnumerator _source;
        internal readonly Func<TIn, bool> _filter;

        public InternalWhere(TEnumerator source, Func<TIn, bool> filter)
        {
            _source = source;
            _filter = filter;
        }

        public void Dispose()
        {
            _source.Dispose(); 
        }

        public bool Next(ref TIn current)
        {
            TIn? fromSource = default(TIn);
            while(_source.Next(ref fromSource))
            {
                if (_filter(fromSource))
                {
                    current = fromSource;
                
                    return true;
                }
            }
            
            return false;
        }

        public bool GetCountToLeftEnumerate(out int count)
        {
            bool val = _source.GetCountToLeftEnumerate(out int enumeratorCount);
            
            count = enumeratorCount;
            return val;
        }

        public bool GetUnderlying(ref ReadOnlySpan<TIn> span)
        {
            span = default;

            return false;
        }

        public bool TryCopy(ref Span<TIn> destination, int offset) => false;
    }

The Next method will iterate _source until conditions specified in a Func<TIn, bool> will be met.
The reason behind a internal keyword for a _source and a _filter will be explained in the future.
Let’s implement extensions:

    public static class WhereStruct
    {
        public static InternalWhere<StructEnumerator<TIn>, TIn> StructWhere<TIn>(this TIn[] array, Func<TIn, bool> filter)
        {
            return new StructEnumerator<TIn>(array).StructWhere(filter);
        }

        private static InternalWhere<TEnumerable, TIn> StructWhere<TIn, TEnumerable>(this TEnumerable source, Func<TIn, bool> filter)
            where TEnumerable : IStructEnumerator<TIn>
        {
            return new InternalWhere<TEnumerable, TIn>(source, filter);
        }
    }

Benchmark

We will use the same classes as We did in Select.
Benchmark class:

    [MemoryDiagnoser]
    [ThreadingDiagnoser]
    public class WhereBenchmark
    {
        public IEnumerable<Example[]> Data()
        {
            //yield return Generator.Generate(5_000);
            //yield return Generator.Generate(10_000);
            //yield return Generator.Generate(50_000);
            yield return Generator.Generate(500_000);
        }


        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void LinqWhere(Example[] input) =>
            input.Where(x => x.ExampleNumber % 2 == 0)
                .ToArray()
                .Consume(new Consumer());

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ParallelLinqWhere(Example[] input) =>
            input.AsParallel().Where(x => x.ExampleNumber % 2 == 0)
                .ToArray().Consume(new Consumer());

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void WhereUsingEnumerator(Example[] input) =>
            input.WhereUsingEnumerator(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());

        //Our implementation!
        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void NonAllocWhere(Example[] input)
        {
            input.StructWhere(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
        }

        [Benchmark]
        [ArgumentsSource(nameof(Data))]
        public void ZLinqWhere(Example[] input)
        {
            input.AsValueEnumerable().Where(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
        }
    }

Results

| Method               | input           | Mean      | Error     | StdDev    | Completed Work Items | Lock Contentions | Gen0     | Gen1     | Gen2     | Allocated |
|--------------------- |---------------- |----------:|----------:|----------:|---------------------:|-----------------:|---------:|---------:|---------:|----------:|
| LinqWhere            | Example[500000] |  6.328 ms | 0.1251 ms | 0.1754 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |
| ParallelLinqWhere    | Example[500000] |  3.933 ms | 0.0767 ms | 0.1531 ms |              15.0000 |                - | 195.3125 | 125.0000 |  70.3125 |   5.92 MB |
| WhereUsingEnumerator | Example[500000] | 11.603 ms | 0.3069 ms | 0.9050 ms |                    - |                - | 140.6250 | 140.6250 | 140.6250 |   7.63 MB |
| NonAllocWhere        | Example[500000] |  5.476 ms | 0.2115 ms | 0.6237 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |
| ZLinqWhere           | Example[500000] |  5.726 ms | 0.1136 ms | 0.1166 ms |                    - |                - |  31.2500 |  31.2500 |  31.2500 |   1.92 MB |

ToArray

We implemented beginning, middle, but let’s end our chain of enumerators.
Extension methods are perfect use for this, right now We need two - one for generic IStructEnumerator and second for filtering explicitly.

Generic

It’s time to look at generic implementation:

    public static T[] ToArray<T>(this IStructEnumerator<T> enumerator)
    {
        T[] result = null;

        if (enumerator.GetCountToLeftEnumerate(out int count))
        {
            result = GC.AllocateUninitializedArray<T>(count);
            
            T current = default;
            int idx = 0;
            
            while (enumerator.Next(ref current))
            {
                result[idx++] = current;
            }
        }

        return result;
    }

We assume that chain can access to enumeration count, then it calls last enumerator’s method in chain - Next and adds element to array.
Why GC.AllocateUninitializedArray<T>() rather than new T[]?
Again, the first reason is performance, the second one, We don’t have to care if array was zeroed during creation.

Where-explicit (for now!)

Generic implementation can’t fit into filtering operations made by Where.
The main issue here is unpredicability of element’s count in target array.
Before executing selection operations We can safely assume that count of element at the begining and the end will be the same.
To fix this let’s introduce ArrayBuilder<T>, based on approach introduced in SegmentedArrayBuilder and found in ZLinq’s .ToArray.

ArrayBuilder

Our ArrayBuilder must consist of:

  • Storage of buffers
  • Current buffer
  • Current buffer’s index
  • Current buffer element count
  • All items count

Gathering all assumptions like this we can start implementing.
Right now we should declare a ref struct ArrayBuilder<T> due to the fact that We want it to be as fast as possible.

    public ref struct ArrayBuilder<T> 
    {
        private const int DefaultCapacity = 5;
        private T[] _buffer;
        private SmallBuffer<T[]> _buffers;
        private int _count;
        private int _buffersIndex;
        private int _capacity;
        
        public ArrayBuilder()
        {
            _buffers = new SmallBuffer<T[]>();
            _count = 0;
            _buffersIndex = 0;
        }
    }

DefaultCapacity indicates that, we want to start with something after element is being added to an null array.

Let’s implement add method:

    public void Add(T item)
    {
        if (_buffer == null || _count >= _buffer.Length)
        {
            //todo: resize
        }

        _buffer[_count++] = item;
        _capacity++;
    }

This is straightforward - if a _buffer is null or overflows We want to resize, otherwise We just add the element and increment capacity counter.

    private void Resize(int minimum)
    {
        int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);

        var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
        
        if (_buffer != null)
        {
            _buffers[_buffersIndex++] = _buffer;
        }

        _buffer = newArr;
        _count = 0;
    }

The Resize accepts minimum count to resize, due to possible extension of this struct by a AddRange method, but for now It will be always 1.
While We try to add a element to an null array It’s worth to create something using a DefaultCapacity. Otherwise It’s worth to select maximum from a current buffer’s length and minimum value.

Our array can be very big. The ArrayPool<T> is the best solution right now to use for this as We rent an array with a new calculated capacity.
Using a ArrayPool<T> enforces on us to return the memory after usage, so We need to keep it in mind. After this we store a current buffer (if exist) in a buffer storage, then we are assigning a new current buffer.

SmallBuffer
Existence of this method was sparked during browsing a ZLinq source code. It’s the same approach described here. The goal is to create a struct that behaves like an array.
If buffer length increases by 2, We can easily calculate the max buffer length here.

    public ref struct SmallBuffer<T> 
    //note 1: approach found here: https://www.jacksondunstan.com/articles/5051
    //note 2: increase number of elements to support more data
    {
        private T _item1;
        private T _item2;
        private T _item3;
        private T _item4;
        private T _item5;
        private T _item6;
        private T _item7;
        private T _item8;
        private T _item9;
        private T _item10;
        private T _item11;
        private T _item12;
        private T _item13;

        public T this[int index]
        {
            get
            {
                return index switch
                {
                    0 => _item1,
                    1 => _item2,
                    2 => _item3,
                    3 => _item4,
                    4 => _item5,
                    5 => _item6,
                    6 => _item7,
                    7 => _item8,
                    8 => _item9,
                    9 => _item10,
                    10 => _item11,
                    11 => _item12,
                    12 => _item13,
                    _ => throw new IndexOutOfRangeException()
                };
            }
            set
            {
                switch (index)
                {
                    case 0: _item1 = value; break;
                    case 1: _item2 = value; break;
                    case 2: _item3 = value; break;
                    case 3: _item4 = value; break;
                    case 4: _item5 = value; break;
                    case 5: _item6 = value; break;
                    case 6: _item7 = value; break;
                    case 7: _item8 = value; break;
                    case 8: _item9 = value; break;
                    case 9: _item10 = value; break;
                    case 10: _item11 = value; break;
                    case 11: _item12 = value; break;
                    case 12: _item13 = value; break;
                }
            }
        }
    }

So after implementing a SmallBuffer and partially a ArrayBuilder, we can prepare array returning methods.
First one - for a ArrayPool<T> is very straighforward. We just return all rented arrays.

    public void Clear()
    {
        if (_buffer != null)
        {
            ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
        }

        for (int i = 0; i < _buffersIndex; i++)
        {
            ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
        }
        
        _count = 0;
        _buffer = null;
        _buffers = new SmallBuffer<T[]>();
    }

Second one - for user - We have to iterate over every buffer and a current buffer to return all elements. It’s worth to use a Span<T> here

    public T[] ToArray()
    {
        var result = GC.AllocateUninitializedArray<T>(_capacity);
        var resultSpan = result.AsSpan();

        int nextIdx = 0;
        for (int i = 0; i < _buffersIndex; i++)
        {
            Span<T> buffer = _buffers[i].AsSpan();
            buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
            
            nextIdx += buffer.Length;
        }

        _buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));

        return result;
    }

Finally, our ArrayBuilder implementation should look like this:

    public ref struct ArrayBuilder<T> 
    {
        private const int DefaultCapacity = 5;
        private T[] _buffer;
        private SmallBuffer<T[]> _buffers;
        private int _count;
        private int _buffersIndex;
        private int _capacity;
        
        public ArrayBuilder()
        {
            _buffers = new SmallBuffer<T[]>();
            _count = 0;
            _buffersIndex = 0;
        }
        
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public T[] ToArray()
        {
            var result = GC.AllocateUninitializedArray<T>(_capacity);
            var resultSpan = result.AsSpan();

            int nextIdx = 0;
            for (int i = 0; i < _buffersIndex; i++)
            {
                Span<T> buffer = _buffers[i].AsSpan();
                buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
                
                nextIdx += buffer.Length;
            }

            _buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));

            return result;
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public void Add(T item)
        {
            if (_buffer == null || _count >= _buffer.Length)
            {
                Resize(_count + 1);
            }

            _buffer[_count++] = item;
            _capacity++;
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public void Clear()
        {
            if (_buffer != null)
            {
                ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
            }

            for (int i = 0; i < _buffersIndex; i++)
            {
                ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
            }
            
            _count = 0;
            _buffer = null;
            _buffers = new SmallBuffer<T[]>();
        }
        
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        private void Resize(int minimum)
        {
            int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);

            var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
            
            if (_buffer != null)
            {
                _buffers[_buffersIndex++] = _buffer;
            }

            _buffer = newArr;
            _count = 0;
        }
    }

After completing all steps above We can implement method to return an array from filter operation

    public static T[] ToArray<TEnumerator, T>(this InternalWhere<TEnumerator, T> enumerator)
        where TEnumerator : struct, IStructEnumerator<T>
    {
        T[] result = null;

        var arr = new ArrayBuilder<T>();
        
        T? current = default;
        int idx = 0;
        while (enumerator.Next(ref current))
        {
            arr.Add(current);
        }

        result = arr.ToArray();
        
        arr.Clear();

        return result;
    }

Note! Above implementation was used in all previous benchmarks.

Source code can be found here.

All benchmarks were created with BenchmarkDotNet.