Lost in Linq source code - Part 1
Have you ever considered to go into the rabbit hole and look what’s inside LINQ? No? You are probably sane.
Unfortunately, one day I’ve asked myself some questions about internals of LINQ - How it was built, and if I could i make faster?
In previous years of work and education - I was told about ‘slowness’ of LINQ, and to try to avoid it where performance matters.
So, I decided to take a closer look inside. To make the task harder, I want to use as little as possible allocations to heap.
There is a library that realizes the same approach - ZLinq, so it’s probably a good starting point aside of Linq’s source code.
Enumerators, Iterators and structs
IEnumerator<T>
is an interface exposed through IEnumerable.GetEnumerator()
and allows us to iterate through the collection.
This interface (in their generic version) inherit concrete IEnumerator, exposing property, and two methods - T Current
(object Current
for IEnumerator
), bool MoveNext()
, void Reset()
.
When we use foreach
-loop in our code over a collection, it’s get translated to IEnumerable
and IEnumerator
.
Let’s look:
//Code written by us
var people = new List<Person>()
{
new Person() { Name = "Bob", Age = 25 },
new Person() { Name = "Jane", Age = 23 },
new Person() { Name = "John", Age = 21 }
};
foreach (Person person in people)
{
Console.WriteLine(person.Name);
}
//Generated
List<Person> personList = new List<Person>();
Person person1 = new Person();
person1.Name = "Bob";
person1.Age = 25;
personList.Add(person1);
Person person2 = new Person();
person2.Name = "Jane";
person2.Age = 23;
personList.Add(person2);
Person person3 = new Person();
person3.Name = "John";
person3.Age = 21;
personList.Add(person3);
List<Person>.Enumerator enumerator = personList.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
Person person4 = enumerator.Current;
DefaultInterpolatedStringHandler interpolatedStringHandler = new DefaultInterpolatedStringHandler(3, 2);
interpolatedStringHandler.AppendFormatted(person4.Name);
interpolatedStringHandler.AppendLiteral(" - ");
interpolatedStringHandler.AppendFormatted<int>(person4.Age);
Console.WriteLine(interpolatedStringHandler.ToStringAndClear());
}
}
finally
{
enumerator.Dispose();
}
Why is that?
List<T>
implements IList<T>
, which inherits IEnumerable<T>
. We know right now that IEnumerable<T>
exposes IEnumerator
, so after calling foreach (Person person in people)
, we also called the constructor of the Enumerator
struct.
To go through each item in a list, you need to use the bool MoveNext()
method, until it returns false
.
Why to use a struct
here, rather than a class
or a record
?
The most important is the fact, that structure type was created to encapsulate data and related functionality. Enumerator
fulfills this idea right on the spot.
Secondly, structs are stored on stack most of the time (until They are part of other object e.g class, or They are static field)
Lastly - they are smaller and can be passed by value (by default, as they are value-type) and reference.
Okay, so where is Iterator?
This is the point where System.Linq
steps in.
Writing code using Linq’s methods is like building a chain. It’s like a domino effect.
When You write:
new int[] { 1, 2, 3 }.Select(x => (x * 2).ToString()).ToArray();
it gets casted to (IEnumerable<int>)
then Select
extension method creates a new ArraySelectIterator<int, string>
.
The ArraySelectIterator
inherits from generic Iterator<TResult>
(It’s an abstract class!). Value of a Iterator
is immutable,
It means that represented operation cannot be changed. Iterator serves as its own enumerator, so the state of the iterator may change as it is being enumerated.
However, if you make something like this:
new int[] { 1, 2, 3, 4, 5, 5, 6, 7, 10 }.Select(x => (x * 2).ToString())
.Where(x => x[0] == '1').ToArray();
You’ll end up casting (again!) to (IEnumerable<int>)
then invoke the new ArraySelectIterator<int, string>
, and finally (thanks to the .Where(x => x[0] == '1')
method) the new IEnumerableWhereIterator<string>
.
Can you access it like a normal array?
No, To do this, you have to call the .ToArray()
extension method (or .ToList()
, .ToDictionary()
, etc.). The .ToArray()
executes a whole chain, by using the bool MoveNext()
method of enumerators (Where
calls Select
calls source (IEnumerable<T>)
). Inside we can found struct
named SegmentedArrayBuilder<T>
.
The array builder is being allocated on the stack, and operates on segments - when we add an element it checks if _currentSegment
is long enough to add (if yes, then just adds under next index).
If _currentSegment
is not long enough it calculates new size of a new segment length (currentSegmentLength * 2
), then rents an array from ArrayPool<T>
and allocates it to _currentSegment
and the _segments
array. At the end _segmentsCount
is incremented to keep track of all allocated segments in _segments
array.
So, when you call the .ToArray()
it has to iterate over all segments and copy data from them.
After acquiring all informations from above, We can image that this code:
var rand = new Random();
var listOfInts = new List<NestedInts>()
{
new NestedInts(),
new NestedInts(),
new NestedInts()
};
List<int> result = new List<int>();
foreach (NestedInts nestedInt in listOfInts)
{
var example = nestedInt.Array.Where(x => x > rand.Next(0, 10)).ToArray();
result.AddRange(example);
}
class NestedInts
{
public int[] Array { get; set; } = new []{ 1,2,3,4,5,6,7,8,9,10};
}
will create IEnumerator<NestedInts>
then three times ArrayWhereIterator<int>
and SegmentedArrayBuilder<int>
.
Lots of allocations, iterations that could be simplified ;).
Simple enumerator
Let’s start writing code. For my requirements I will skip creating IEnumerable
-like structure, and jump straight to IEnumerator
-like beings.
Like in IEnumerator<T>
We want to move to next element, get source and count, then try to copy source. To fulfill this assumptions we have to create interface like this:
public interface IStructEnumerator<T> : IDisposable
{
bool Next(ref T current);
bool GetCountToLeftEnumerate(out int count);
bool GetUnderlying(ref ReadOnlySpan<T> span);
bool TryCopy(ref Span<T> destination, int offset);
}
The reason behind ref
, is that We don’t know the type of value, so I want to use it by reference (the downside of this approach is accidental change of source’s element).
IDisposable
will help us to free unnecessary resources.
Implementation:
public struct StructEnumerator<T> : IStructEnumerator<T>
{
private readonly T[] _array;
private int _index;
private readonly int _count;
public StructEnumerator(T[] array)
{
_array = array ?? throw new ArgumentNullException(nameof(array));
_count = array.Length;
_index = -1;
}
public void Dispose()
{
_index = -1;
}
public bool Next(ref T current)
{
if (_index < _count - 1)
{
current = _array[++_index];
return true;
}
current = default(T);
return false;
}
public bool GetCountToLeftEnumerate(out int count)
{
if (_index < 0)
{
count = _count;
return true;
}
if(_index >= _count)
{
count = 0;
return false;
}
count = _count - _index - 1;
return true;
}
public bool GetUnderlying(ref ReadOnlySpan<T> span)
{
if(_array == null || _count == 0)
{
span = default(ReadOnlySpan<T>);
return false;
}
span = _array.AsSpan();
return true;
}
public bool TryCopy(ref Span<T> destination, int offset)
{
ReadOnlySpan<T> source = default;
if (GetUnderlying(ref source))
{
if (offset < 0 || offset >= source.Length)
{
return false;
}
ReadOnlySpan<T> slice = source.Slice(offset);
for (int i = 0; i < slice.Length; i++)
{
destination[i] = slice[i];
}
return true;
}
return false;
}
}
A lot happening here, so let’s break it into smaller parts:
private readonly T[] _array;
private int _index;
private readonly int _count;
public StructEnumerator(T[] array)
{
_array = array ?? throw new ArgumentNullException(nameof(array));
_count = array.Length;
_index = -1;
}
Making array readonly helps us with accidental change of underlying array.
_count
introduces trackability of max length of collection.
Let’s look at TryCopy
and GetUnderlying
public bool GetUnderlying(ref ReadOnlySpan<T> span)
{
if(_array == null || _count == 0)
{
span = default(ReadOnlySpan<T>);
return false;
}
span = _array.AsSpan();
return true;
}
public bool TryCopy(ref Span<T> destination, int offset)
{
ReadOnlySpan<T> source = default;
if (GetUnderlying(ref source))
{
if (offset < 0 || offset >= source.Length)
{
return false;
}
ReadOnlySpan<T> slice = source.Slice(offset);
for (int i = 0; i < slice.Length; i++)
{
destination[i] = slice[i];
}
return true;
}
return false;
}
Both of them are operating on Span<T>
, which is faster than operating on an array. Thanks to this approach, possible operations in a chain will cost us less resources.
For TryCopy
we want to introduce copying part of source for the future.
Select
After implementing our first link in chain, Let’s move to implementing first projection operation - Select
. We should keep availability to transforming one object into a new form as our main assumption. That means every Next
element we should retrieve a new form of it, based on external function.
It’s time to look at implementation:
public struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
where TEnumerator : IStructEnumerator<TIn>
{
private TEnumerator _source;
private readonly Func<TIn, TOut> _selector;
public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
{
_source = source;
_selector = selector;
}
public void Dispose()
{
_source.Dispose();
}
public bool Next(ref TOut current)
{
TIn? fromSource = default(TIn);
if(_source.Next(ref fromSource))
{
current = _selector(fromSource);
return true;
}
current = default(TOut);
return false;
}
public bool GetCountToLeftEnumerate(out int count)
{
if (_source.GetCountToLeftEnumerate(out int sourceCount))
{
count = sourceCount;
return true;
}
count = 0;
return false;
}
public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
{
span = default;
return false;
}
public bool TryCopy(ref Span<TOut> destination, int offset) => false;
}
The main difference between StructEnumerator
is the fact, that our _source
is not an array, but other IStructEnumerator
. To achieve it We will use three reference types TEnumerator
, TIn
and TOut
to point at incoming IStructEnumerator
, type to change, and lastly type to be changed to.
We accept incoming Func<TIn, TOut>
in constructor, and store is in a backing field as readonly
to avoid any changes.
public struct InternalSelect<TEnumerator, TIn, TOut> : IStructEnumerator<TOut>
where TEnumerator : IStructEnumerator<TIn>
{
private TEnumerator _source;
private readonly Func<TIn, TOut> _selector;
public InternalSelect(TEnumerator source, Func<TIn, TOut> selector)
{
_source = source;
_selector = selector;
}
Out Next
will be similar to StructEnumerator
, but a current element will be affected by a _selector
public bool Next(ref TOut current)
{
TIn? fromSource = default(TIn);
if(_source.Next(ref fromSource))
{
current = _selector(fromSource);
return true;
}
current = default(TOut);
return false;
}
The main difference will be for two methods: GetUnderlying
and TryCopy
.
As We don’t accept _source
as an array, We can’t get source’s array and copy of it. The reason behind this is that We can’t predict previous or next elements in our enumerator chain.
public bool GetUnderlying(ref ReadOnlySpan<TOut> span)
{
span = default;
return false;
}
public bool TryCopy(ref Span<TOut> destination, int offset) => false;
Lastly, It’s time to decorate our InternalSelect
with extension methods, to accept any array and IStructEnumerator
.
public static class SelectStruct
{
public static IStructEnumerator<TOut> StructSelect<TIn, TOut>(this TIn[] array, Func<TIn, TOut> selector)
{
return new StructEnumerator<TIn>(array).StructSelect(selector);
}
public static IStructEnumerator<TOut> StructSelect<TEnumerable, TIn, TOut>(this TEnumerable source, Func<TIn, TOut> selector)
where TEnumerable : IStructEnumerator<TIn>
{
return new InternalSelect<TEnumerable, TIn, TOut>(source, selector);
}
}
Benchmark
To check performance of our method, I’ve implemented two classes Example
and NewExample
, and some benchmark methods.
Example class with enum:
public class Example
{
public string ExampleString { get; set; }
public int ExampleNumber { get; set; }
public ExampleEnum ExampleEnum { get; set; }
}
public enum ExampleEnum {
First,
Second,
Third
}
NewExample class:
public class NewExample
{
public string NewStringProp { get; set; }
public int HalvedInt { get; set; }
}
Benchmark class:
public class SelectBenchmark
{
public IEnumerable<Example[]> Data()
{
//yield return Generator.Generate(5_000);
//yield return Generator.Generate(10_000);
//yield return Generator.Generate(50_000);
yield return Generator.Generate(500_000);
}
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void LinqSelect(Example[] input) =>
input.Select(x => new NewExample()
{
HalvedInt = x.ExampleNumber / 2,
NewStringProp = x.ExampleString
}).ToArray().Consume(new Consumer());
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void ParallelLinqSelect(Example[] input) =>
input.AsParallel().Select(x => new NewExample()
{
HalvedInt = x.ExampleNumber / 2,
NewStringProp = x.ExampleString
}).ToArray().Consume(new Consumer());
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void SelectUsingEnumerator(Example[] input) =>
input.SelectUsingEnumerator(x => new NewExample()
{
HalvedInt = x.ExampleNumber / 2,
NewStringProp = x.ExampleString
}).ToArray().Consume(new Consumer());
// Our implementation!
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void NonAllocSelect(Example[] input)
{
input.StructSelect(x => new NewExample()
{
HalvedInt = x.ExampleNumber / 2,
NewStringProp = x.ExampleString
}).ToArray().Consume(new Consumer());
}
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void ZLinqSelect(Example[] input)
{
input.AsValueEnumerable().Select(x => new NewExample()
{
HalvedInt = x.ExampleNumber / 2,
NewStringProp = x.ExampleString
}).ToArray().Consume(new Consumer());
}
}
Results
| Method | input | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen0 | Gen1 | Gen2 | Allocated |
|---------------------- |---------------- |---------:|---------:|---------:|---------------------:|-----------------:|----------:|----------:|---------:|----------:|
| LinqSelect | Example[500000] | 32.21 ms | 0.639 ms | 1.320 ms | - | - | 1156.2500 | 1125.0000 | 218.7500 | 19.07 MB |
| ParallelLinqSelect | Example[500000] | 32.74 ms | 0.651 ms | 1.470 ms | 15.0000 | - | 1250.0000 | 1187.5000 | 187.5000 | 27.1 MB |
| SelectUsingEnumerator | Example[500000] | 31.47 ms | 0.460 ms | 0.408 ms | - | - | 1125.0000 | 1093.7500 | 187.5000 | 22.89 MB |
| NonAllocSelect | Example[500000] | 27.76 ms | 0.458 ms | 0.406 ms | - | - | 1156.2500 | 1125.0000 | 218.7500 | 19.07 MB |
| ZLinqSelect | Example[500000] | 27.52 ms | 0.397 ms | 0.352 ms | - | - | 1156.2500 | 1125.0000 | 218.7500 | 19.07 MB |
Where
It’s worth to filter data from time to time. In comparison to a InternalSelect
we will use one reference type less, and use similar approach on the matter of filtering Func
.
This time rather than using Func<TIn, TOut>
we will focus on Func<TIn, bool>
. That’s because We want to filter out elements that don’t match in a Next
method.
public struct InternalWhere<TEnumerator, TIn> : IStructEnumerator<TIn>
where TEnumerator : IStructEnumerator<TIn>
{
internal TEnumerator _source;
internal readonly Func<TIn, bool> _filter;
public InternalWhere(TEnumerator source, Func<TIn, bool> filter)
{
_source = source;
_filter = filter;
}
public void Dispose()
{
_source.Dispose();
}
public bool Next(ref TIn current)
{
TIn? fromSource = default(TIn);
while(_source.Next(ref fromSource))
{
if (_filter(fromSource))
{
current = fromSource;
return true;
}
}
return false;
}
public bool GetCountToLeftEnumerate(out int count)
{
bool val = _source.GetCountToLeftEnumerate(out int enumeratorCount);
count = enumeratorCount;
return val;
}
public bool GetUnderlying(ref ReadOnlySpan<TIn> span)
{
span = default;
return false;
}
public bool TryCopy(ref Span<TIn> destination, int offset) => false;
}
The Next
method will iterate _source
until conditions specified in a Func<TIn, bool>
will be met.
The reason behind a internal
keyword for a _source
and a _filter
will be explained in the future.
Let’s implement extensions:
public static class WhereStruct
{
public static InternalWhere<StructEnumerator<TIn>, TIn> StructWhere<TIn>(this TIn[] array, Func<TIn, bool> filter)
{
return new StructEnumerator<TIn>(array).StructWhere(filter);
}
private static InternalWhere<TEnumerable, TIn> StructWhere<TIn, TEnumerable>(this TEnumerable source, Func<TIn, bool> filter)
where TEnumerable : IStructEnumerator<TIn>
{
return new InternalWhere<TEnumerable, TIn>(source, filter);
}
}
Benchmark
We will use the same classes as We did in Select.
Benchmark class:
[MemoryDiagnoser]
[ThreadingDiagnoser]
public class WhereBenchmark
{
public IEnumerable<Example[]> Data()
{
//yield return Generator.Generate(5_000);
//yield return Generator.Generate(10_000);
//yield return Generator.Generate(50_000);
yield return Generator.Generate(500_000);
}
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void LinqWhere(Example[] input) =>
input.Where(x => x.ExampleNumber % 2 == 0)
.ToArray()
.Consume(new Consumer());
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void ParallelLinqWhere(Example[] input) =>
input.AsParallel().Where(x => x.ExampleNumber % 2 == 0)
.ToArray().Consume(new Consumer());
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void WhereUsingEnumerator(Example[] input) =>
input.WhereUsingEnumerator(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
//Our implementation!
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void NonAllocWhere(Example[] input)
{
input.StructWhere(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
}
[Benchmark]
[ArgumentsSource(nameof(Data))]
public void ZLinqWhere(Example[] input)
{
input.AsValueEnumerable().Where(x => x.ExampleNumber % 2 == 0).ToArray().Consume(new Consumer());
}
}
Results
| Method | input | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen0 | Gen1 | Gen2 | Allocated |
|--------------------- |---------------- |----------:|----------:|----------:|---------------------:|-----------------:|---------:|---------:|---------:|----------:|
| LinqWhere | Example[500000] | 6.328 ms | 0.1251 ms | 0.1754 ms | - | - | 31.2500 | 31.2500 | 31.2500 | 1.92 MB |
| ParallelLinqWhere | Example[500000] | 3.933 ms | 0.0767 ms | 0.1531 ms | 15.0000 | - | 195.3125 | 125.0000 | 70.3125 | 5.92 MB |
| WhereUsingEnumerator | Example[500000] | 11.603 ms | 0.3069 ms | 0.9050 ms | - | - | 140.6250 | 140.6250 | 140.6250 | 7.63 MB |
| NonAllocWhere | Example[500000] | 5.476 ms | 0.2115 ms | 0.6237 ms | - | - | 31.2500 | 31.2500 | 31.2500 | 1.92 MB |
| ZLinqWhere | Example[500000] | 5.726 ms | 0.1136 ms | 0.1166 ms | - | - | 31.2500 | 31.2500 | 31.2500 | 1.92 MB |
ToArray
We implemented beginning, middle, but let’s end our chain of enumerators.
Extension methods are perfect use for this, right now We need two - one for generic IStructEnumerator
and second for filtering explicitly.
Generic
It’s time to look at generic implementation:
public static T[] ToArray<T>(this IStructEnumerator<T> enumerator)
{
T[] result = null;
if (enumerator.GetCountToLeftEnumerate(out int count))
{
result = GC.AllocateUninitializedArray<T>(count);
T current = default;
int idx = 0;
while (enumerator.Next(ref current))
{
result[idx++] = current;
}
}
return result;
}
We assume that chain can access to enumeration count, then it calls last enumerator’s method in chain - Next
and adds element to array.
Why GC.AllocateUninitializedArray<T>()
rather than new T[]
?
Again, the first reason is performance, the second one, We don’t have to care if array was zeroed during creation.
Where-explicit (for now!)
Generic implementation can’t fit into filtering operations made by Where
.
The main issue here is unpredicability of element’s count in target array.
Before executing selection operations We can safely assume that count of element at the begining and the end will be the same.
To fix this let’s introduce ArrayBuilder<T>
, based on approach introduced in SegmentedArrayBuilder
and found in ZLinq’s .ToArray
.
ArrayBuilder
Our ArrayBuilder must consist of:
- Storage of buffers
- Current buffer
- Current buffer’s index
- Current buffer element count
- All items count
Gathering all assumptions like this we can start implementing.
Right now we should declare a ref struct ArrayBuilder<T>
due to the fact that We want it to be as fast as possible.
public ref struct ArrayBuilder<T>
{
private const int DefaultCapacity = 5;
private T[] _buffer;
private SmallBuffer<T[]> _buffers;
private int _count;
private int _buffersIndex;
private int _capacity;
public ArrayBuilder()
{
_buffers = new SmallBuffer<T[]>();
_count = 0;
_buffersIndex = 0;
}
}
DefaultCapacity
indicates that, we want to start with something after element is being added to an null array.
Let’s implement add method:
public void Add(T item)
{
if (_buffer == null || _count >= _buffer.Length)
{
//todo: resize
}
_buffer[_count++] = item;
_capacity++;
}
This is straightforward - if a _buffer
is null or overflows We want to resize,
otherwise We just add the element and increment capacity counter.
private void Resize(int minimum)
{
int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);
var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
if (_buffer != null)
{
_buffers[_buffersIndex++] = _buffer;
}
_buffer = newArr;
_count = 0;
}
The Resize
accepts minimum count to resize, due to possible extension of this struct by a AddRange
method, but for now It will be always 1.
While We try to add a element to an null array It’s worth to create something using a DefaultCapacity
. Otherwise It’s worth to select maximum from a current buffer’s length and minimum value.
Our array can be very big. The ArrayPool<T>
is the best solution right now to use for this as We rent an array with a new calculated capacity.
Using a ArrayPool<T>
enforces on us to return the memory after usage, so We need to keep it in mind.
After this we store a current buffer (if exist) in a buffer storage, then we are assigning a new current buffer.
SmallBuffer
Existence of this method was sparked during browsing a ZLinq source code. It’s the same approach described here.
The goal is to create a struct that behaves like an array.
If buffer length increases by 2, We can easily calculate the max buffer length here.
public ref struct SmallBuffer<T>
//note 1: approach found here: https://www.jacksondunstan.com/articles/5051
//note 2: increase number of elements to support more data
{
private T _item1;
private T _item2;
private T _item3;
private T _item4;
private T _item5;
private T _item6;
private T _item7;
private T _item8;
private T _item9;
private T _item10;
private T _item11;
private T _item12;
private T _item13;
public T this[int index]
{
get
{
return index switch
{
0 => _item1,
1 => _item2,
2 => _item3,
3 => _item4,
4 => _item5,
5 => _item6,
6 => _item7,
7 => _item8,
8 => _item9,
9 => _item10,
10 => _item11,
11 => _item12,
12 => _item13,
_ => throw new IndexOutOfRangeException()
};
}
set
{
switch (index)
{
case 0: _item1 = value; break;
case 1: _item2 = value; break;
case 2: _item3 = value; break;
case 3: _item4 = value; break;
case 4: _item5 = value; break;
case 5: _item6 = value; break;
case 6: _item7 = value; break;
case 7: _item8 = value; break;
case 8: _item9 = value; break;
case 9: _item10 = value; break;
case 10: _item11 = value; break;
case 11: _item12 = value; break;
case 12: _item13 = value; break;
}
}
}
}
So after implementing a SmallBuffer
and partially a ArrayBuilder
, we can prepare array returning methods.
First one - for a ArrayPool<T>
is very straighforward. We just return all rented arrays.
public void Clear()
{
if (_buffer != null)
{
ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
}
for (int i = 0; i < _buffersIndex; i++)
{
ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
}
_count = 0;
_buffer = null;
_buffers = new SmallBuffer<T[]>();
}
Second one - for user - We have to iterate over every buffer and a current buffer to return all elements. It’s worth to use a Span<T>
here
public T[] ToArray()
{
var result = GC.AllocateUninitializedArray<T>(_capacity);
var resultSpan = result.AsSpan();
int nextIdx = 0;
for (int i = 0; i < _buffersIndex; i++)
{
Span<T> buffer = _buffers[i].AsSpan();
buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
nextIdx += buffer.Length;
}
_buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));
return result;
}
Finally, our ArrayBuilder
implementation should look like this:
public ref struct ArrayBuilder<T>
{
private const int DefaultCapacity = 5;
private T[] _buffer;
private SmallBuffer<T[]> _buffers;
private int _count;
private int _buffersIndex;
private int _capacity;
public ArrayBuilder()
{
_buffers = new SmallBuffer<T[]>();
_count = 0;
_buffersIndex = 0;
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public T[] ToArray()
{
var result = GC.AllocateUninitializedArray<T>(_capacity);
var resultSpan = result.AsSpan();
int nextIdx = 0;
for (int i = 0; i < _buffersIndex; i++)
{
Span<T> buffer = _buffers[i].AsSpan();
buffer.CopyTo(resultSpan.Slice(nextIdx, buffer.Length));
nextIdx += buffer.Length;
}
_buffer.AsSpan().Slice(0, _capacity - nextIdx).CopyTo(resultSpan.Slice(nextIdx, _capacity - nextIdx));
return result;
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public void Add(T item)
{
if (_buffer == null || _count >= _buffer.Length)
{
Resize(_count + 1);
}
_buffer[_count++] = item;
_capacity++;
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public void Clear()
{
if (_buffer != null)
{
ArrayPool<T>.Shared.Return(_buffer, RuntimeHelpers.IsReferenceOrContainsReferences<T>());
}
for (int i = 0; i < _buffersIndex; i++)
{
ArrayPool<T>.Shared.Return(_buffers[i], RuntimeHelpers.IsReferenceOrContainsReferences<T>());
}
_count = 0;
_buffer = null;
_buffers = new SmallBuffer<T[]>();
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private void Resize(int minimum)
{
int newCapacity = Math.Max((_buffer?.Length ?? 0) == 0 ? DefaultCapacity : _buffer.Length * 2, minimum);
var newArr = ArrayPool<T>.Shared.Rent(newCapacity);
if (_buffer != null)
{
_buffers[_buffersIndex++] = _buffer;
}
_buffer = newArr;
_count = 0;
}
}
After completing all steps above We can implement method to return an array from filter operation
public static T[] ToArray<TEnumerator, T>(this InternalWhere<TEnumerator, T> enumerator)
where TEnumerator : struct, IStructEnumerator<T>
{
T[] result = null;
var arr = new ArrayBuilder<T>();
T? current = default;
int idx = 0;
while (enumerator.Next(ref current))
{
arr.Add(current);
}
result = arr.ToArray();
arr.Clear();
return result;
}
Note! Above implementation was used in all previous benchmarks.
Links
Source code can be found here.
All benchmarks were created with BenchmarkDotNet.