Collection Expressions in .NET 8
A look at one of the exciting new features in C# 12, shipped with .NET 8
September 9, 2024
Overview
C# 12 introduced a series of quality-of-life and performance enhancements, including primary constructors, inline arrays, aliasing any type, and of course collection expressions. Collection expressions are by far my favourite improvement to come out of the recent language changes. In this post, I will cover what they are and how you can use them, and explore some of the generated IL to see what happens under the hood.
What are collection expressions
Collection expressions are a syntax for inlining the initialisation of collections. Below is a comparison of the syntax for initialising an array of integers.
// c# 3
int[] nums = new int[] { 1, 2, 3 };
// c# 12
int[] nums = [ 1, 2, 3 ];
Collection expressions can be used with other types too, like List<T>. Below are two examples which initialise a List of int with three values.
// c# 3
var nums = new List<int>() { 1, 2, 3 };
// c# 12
List<int> nums = [ 1, 2, 3 ];
HashSet<T> is another collection type that supports collection expressions.
// c# 3
var nums = new HashSet<int>() { 1, 2, 3 };
// c# 12
HashSet<int> nums = [ 1, 2, 3 ];
Dictionaries, however, cannot be initialized (with values) in this way. There is a C# language proposal for Dictionary expressions which would give developers a similar syntax for initialising this collection.
The new collection expression syntax is not limited to local variables. It can also be used on class fields and properties, and as a method's return value. An empty collection expression, [], can even be used with types that cannot otherwise be initialized with values this way, such as Dictionary.
public class Response
{
private List<int> _internalErrorCodes = [5001, 5002, 5003];
public int[] StatusCodes { get; } = [200, 400, 500];
public Dictionary<string, object?> Cache { get; } = [];
public int[] GetErrorCodes() => [5001, 5002, 5003];
public void Upload(List<string> items) { }
}
Collection expressions can also be passed as method arguments.
var resp = new Response();
resp.Upload([]);
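The argument does not have to be empty. As a minimal sketch, assuming a hypothetical stand-in for Upload that returns how many items it received (the original method returns nothing), the parameter type supplies the collection type for the expression:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Response.Upload, used only for this illustration.
static int Upload(List<string> items) => items.Count;

// The parameter type (List<string>) decides which collection is built.
Console.WriteLine(Upload([]));                  // 0
Console.WriteLine(Upload(["a.txt", "b.txt"]));  // 2
```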
A consequence of this, though, is that the underlying collection type is no longer visible at the point where the collection is created. C# is no stranger to hiding types. Take the var keyword, which lets developers declare a variable without explicitly specifying its type. The compiler infers the type, so you can continue to use the variable as if it had been declared explicitly.
I use var extensively in my projects, snippets, and at work. I have never really questioned why I do, or what the drawbacks could be. It certainly makes writing and maintaining code easier.
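One difference from var is worth sketching. In C# 12 a collection expression has no type of its own, so it needs a target type from the declaration, parameter, or return type, and it cannot be combined with var. The names below, including the Sum helper, are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

// The same expression builds a different collection depending on the target type.
int[] asArray = [1, 2, 3];
List<int> asList = [1, 2, 3];

// var gives the compiler no target type, so this line does not compile in C# 12:
// var nums = [1, 2, 3];

// A parameter type also acts as the target type.
int total = Sum([1, 2, 3]);

Console.WriteLine(asArray.GetType().Name); // Int32[]
Console.WriteLine(asList.GetType().Name);  // List`1
Console.WriteLine(total);                  // 6

// Hypothetical helper, used only for this illustration.
static int Sum(int[] values)
{
    int sum = 0;
    foreach (int value in values) sum += value;
    return sum;
}
```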
The Spread element
The spread element is a further iteration of the .. syntax that arrived with list patterns in C# 11. In the context of collection expressions, .. allows you to concatenate collections together.
int[] nums = [1, 2, 3];
int[] nums2 = [4, 5, 6];
int[] all = [..nums, ..nums2]; // 1, 2, 3, 4, 5, 6
The spread element works with all the other collection types we've seen, and the source and target types can be mixed freely.
int[] nums = [1, 2, 3];
List<int> nums2 = [4, 5, 6];
HashSet<int> all = [..nums, ..nums2]; // 1, 2, 3, 4, 5, 6
This makes it very easy to manipulate collections, especially when we combine the slice operator too.
int[] nums = [1, 2, 3];
int[] nums2 = [4, 5, 6];
int[] all = [..nums[^1..], ..nums2[0..1]]; // 3, 4
The spread element can be used anywhere a collection expression can: when passing arguments to a method, or when assigning a collection to a field or property.
I have found the spread element particularly helpful in unit tests, for example when asserting that a method mutates a collection in the right way. Combining the slice operator, spread element, and collection expressions gives us a useful set of tools for working with collections.
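As a rough sketch of that test pattern, assume a hypothetical Rotate method that drops the oldest item from a list and appends a new one. The expected result can be built from the original data with a range and a spread element, without any test framework:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] original = [5001, 5002, 5003];

// Copy the array into a list so the original stays untouched for comparison.
List<int> codes = [..original];

// Hypothetical method under test: drops the oldest code and appends a new one.
static void Rotate(List<int> items, int newest)
{
    items.RemoveAt(0);
    items.Add(newest);
}

Rotate(codes, 5004);

// Expected: everything after the first element, followed by the new code.
int[] expected = [..original[1..], 5004];

Console.WriteLine(codes.SequenceEqual(expected)); // True
```

In a real test project the final line would be replaced by whichever assertion your framework provides.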
Behind the scenes
Arrays
When a collection expression initializes an array, the compiler produces the same IL (intermediate language) as the C# 3 initialization syntax. Both create a temporary array and call RuntimeHelpers.InitializeArray, with the integers stored statically as bytes. Note that this only applies when the element type is a primitive such as int or double.
int[] array = new int[3];
RuntimeHelpers.InitializeArray(array, (RuntimeFieldHandle)/* handle to compiler-generated static data */);
The RuntimeFieldHandle refers to a field on a compiler-generated class that holds the array's values as raw bytes. You can check out the generated IL for array initializers with primitives in SharpLab.
When initializing an array of a more complex type, like a class called Car, you end up with something closer to the code generated for List.
Car[] cars = new Car[] { new Car(), new Car(), new Car() };
public class Car {}
The compiler creates an array with a fixed length and assigns each element one of the instances we passed in the initializer.
Car[] array = new Car[3];
array[0] = new Car();
array[1] = new Car();
array[2] = new Car();
List
With List<T> the compiler produces slightly different IL, and ultimately different JIT-compiled (just-in-time) machine code. Initialization in this way is limited and specific; there are few (if any) reasons to write out, say, 100 or 1000 items like this. With the C# 3 List initialization syntax (C3 from here on), the compiler turns our code into something like:
List<int> nums = new List<int>();
nums.Add(1);
nums.Add(2);
nums.Add(3);
Compare that to the C# 12 List initialization syntax (C12 from here on), which uses CollectionsMarshal to write directly into the list's underlying storage [1].
List<int> list = new List<int>();
CollectionsMarshal.SetCount(list, 3);
Span<int> span = CollectionsMarshal.AsSpan(list);
int num = 0;
span[num] = 1;
num++;
span[num] = 2;
num++;
span[num] = 3;
The difference in the compiled code for the C# 3 and C# 12 syntaxes results in some interesting benchmarks [2].
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| C3_10 | 34.026 ns | 0.1521 ns | 0.1423 ns | 0.0043 | 216 B |
| C3_100 | 137.700 ns | 1.3390 ns | 1.2525 ns | 0.0234 | 1184 B |
| C3_1000 | 1,613.119 ns | 7.2462 ns | 6.4236 ns | 0.1678 | 8424 B |
| C12_10 | 8.241 ns | 0.0523 ns | 0.0489 ns | 0.0019 | 96 B |
| C12_100 | 27.166 ns | 0.1852 ns | 0.1732 ns | 0.0091 | 456 B |
| C12_1000 | 344.499 ns | 2.1626 ns | 1.8059 ns | 0.0806 | 4056 B |
Why does C3 perform so much worse than C12, you might ask?
A List created with the parameterless constructor starts with no backing array. The first call to List.Add, which is what the compiler generates, allocates one with a capacity of 4. Each time that capacity is reached, List allocates a new array of double the size and copies the contents over, which accounts for the higher allocations. The capacity checks and the resize path on each Add also add branches, which increases execution time.
If we set the List capacity up front, by passing a capacity integer to the constructor, we can avoid the repeated array allocations and produce results similar to C12. This also slightly improves execution time, as we avoid the grow and resize branches.
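The growth behaviour can be observed through the Capacity property. This is a small sketch of what current .NET implementations report; the exact growth steps are an implementation detail of List rather than a guarantee:

```csharp
using System;
using System.Collections.Generic;

// Parameterless constructor: no backing array is allocated up front.
var grown = new List<int>();
Console.WriteLine(grown.Capacity); // 0

// The first Add allocates a backing array of 4.
grown.Add(1);
Console.WriteLine(grown.Capacity); // 4

// The fifth Add exceeds 4, so the array is replaced by one of 8.
for (int i = 2; i <= 5; i++) grown.Add(i);
Console.WriteLine(grown.Capacity); // 8

// Passing the capacity up front lets the C# 3 initializer fill a single array.
var presized = new List<int>(3) { 1, 2, 3 };
Console.WriteLine(presized.Capacity); // 3
```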
I started to wonder why Roslyn, the compiler, does not use the List constructor that takes a capacity as an optimisation during compilation.
After digging on GitHub I discovered this issue, which discusses the potential optimization. For the compiler to apply it, every collection that can use the C3 initialization syntax would also need a constructor that accepts a capacity. This may not be possible, or may not be worth the potential backwards compatibility issues.
Spread Element
Let's take an earlier example of the spread element, where we initialize three integer arrays using the collection expression syntax and the third is a concatenation of the first two. I was curious about what the compiler does behind the scenes, so I have translated a snippet of the important part below.
int num = 0;
int[] nums = ...;
int[] nums2 = ...;
int[] all = new int[nums.Length + nums2.Length];
int num2 = 0;
while (num2 < nums.Length)
{
int num3 = (all[num] = nums[num2]);
num++;
num2++;
}
int num4 = 0;
while (num4 < nums2.Length)
{
int num3 = (all[num] = nums2[num4]);
num++;
num4++;
}
The compiler creates a new array whose size is the sum of the lengths of nums and nums2. It then loops over each input collection in turn and copies every value into the next free position (tracked by num) in the new array.
It is cool to see how the compiler takes my collection expression with spread elements and turns it into something that now seems obvious. The time complexity is O(n), where n is the total number of elements across all the collections being combined. The only additional allocation is the all array, which is to be expected.
What about my example of mixing three types of collection and using the spread element to concatenate them? Below is the compiler output, translated back to C#, for an integer array and an integer List being concatenated into an integer HashSet.
int[] nums = ...;
List<int> nums2 = ...;
int num = 0; // Compiler-generated temporary that holds the current element
HashSet<int> hashSet = new HashSet<int>();
int num2 = 0;
while (num2 < nums.Length)
{
num = nums[num2];
hashSet.Add(num);
num2++;
}
List<int>.Enumerator enumerator = nums2.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
num = enumerator.Current;
hashSet.Add(num);
}
}
finally
{
((IDisposable)enumerator).Dispose();
}
The array is handled the same way as before, except the compiler uses the Add method on HashSet<T> to add each item. The List is handled much like a foreach loop: the compiler retrieves an enumerator for the collection and advances it until the end of the collection is reached.
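One visible consequence of going through Add is that the usual set semantics still apply when spreading into a HashSet. A minimal sketch, with values chosen here so that the two inputs overlap:

```csharp
using System;
using System.Collections.Generic;

int[] nums = [1, 2, 3];
List<int> nums2 = [3, 4, 5];

// Each spread item goes through HashSet<int>.Add, so the repeated 3 is stored once.
HashSet<int> all = [..nums, ..nums2];

Console.WriteLine(all.Count);       // 5
Console.WriteLine(all.Contains(3)); // True
```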