What is Yield? A Brief Overview
Yield is a feature of C# with no direct equivalent in its largest competitor, Java. Many of you have heard of it but haven’t really used it, or may have dismissed it as “syntactic sugar.” Others might never have heard of it before. In short, it’s a way of replacing this:
public static List<int> ExampleList()
{
    List<int> ret = new List<int>();
    ret.Add(1);
    ret.Add(2);
    ret.Add(3);
    return ret;
}
With this:
public IEnumerable<int> ExampleYield()
{
    yield return 1;
    yield return 2;
    yield return 3;
}
So what happened here? First, we’ve moved away from returning a List<int> to returning the more interface-driven IEnumerable<int>. We’ve also reduced the number of lines, by two to be exact. And somehow, we’ve managed to return a sequence of items without allocating a list. That’s an important point: the only object created is a small iterator object that the compiler generates for you, and its size doesn’t grow with the number of items returned.
What we’ve actually done is create an anonymous, compiler-generated object that implements IEnumerable<int>, in only a few lines. That object says “Hey, let users call this method to request a bunch of stuff, but I don’t care if the stuff ends up in a list or a dictionary or what. Also, don’t waste time allocating a new collection if it isn’t necessary.”
That’s the key here: we haven’t filled up a new List object with these three items, so memory use doesn’t grow with the number of items. We’re just returning items. But how is it doing this?
Imagine this method, ExampleYield, were called like this:
foreach (int i in ExampleYield())
{
    DoSomething(i);
}
If we were calling ExampleList() instead of ExampleYield(), we’d build the whole List object first, return it, and only then operate on it. With ExampleYield, the order in which the code runs is shown in the comments below:
foreach (int i in ExampleYield()) //1
{
    DoSomething(i); //3, 5, 7
}

public IEnumerable<int> ExampleYield()
{
    //Note that it is not necessary to allocate a List or any other kind of new data structure
    yield return 1; //2
    yield return 2; //4
    yield return 3; //6
}
There are a couple of things to notice:
ExampleYield is not run to completion in one go. Instead, it runs only until the first yield return is reached. Hopefully now you see why this is called “yield.” The ExampleYield method does not resume until the calling foreach loop requests another value. If we weren’t using a foreach loop, this would happen when we called MoveNext() on the IEnumerator obtained from GetEnumerator(), and then read its Current property.
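To see that interleaving for yourself, here is a minimal sketch that drives the iterator by hand with GetEnumerator(), MoveNext() and Current, logging each step. The YieldTrace class and its Log list are hypothetical names added for illustration.

```csharp
using System.Collections.Generic;

public static class YieldTrace
{
    // Records the order in which the iterator body and the caller run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> ExampleYield()
    {
        Log.Add("yield 1");
        yield return 1;
        Log.Add("yield 2");
        yield return 2;
        Log.Add("yield 3");
        yield return 3;
        Log.Add("body finished"); // reached by the final MoveNext(), which then returns false
    }

    // Drives the iterator by hand instead of with foreach.
    public static void Consume()
    {
        using (IEnumerator<int> e = ExampleYield().GetEnumerator())
        {
            while (e.MoveNext())             // runs the body up to the next yield return
            {
                Log.Add("got " + e.Current); // Current holds the value just yielded
            }
        }
    }
}
```

After Consume(), Log reads "yield 1", "got 1", "yield 2", "got 2", "yield 3", "got 3", "body finished". Simply calling ExampleYield() logs nothing at all: the body doesn’t start until the first MoveNext().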
As a result, no collection needs to be created to hold all of these items. They’re just returned as needed. This may not be a big deal with three items, but what if there were 300,000 items? A List<int> that size needs a backing array of at least 300,000 × 4 bytes, about 1.2 MB, before you’ve done anything with the data.
Also, if you were to call a LINQ method like Any() (which checks whether the sequence has at least one item), ExampleYield would run only as far as its first line, “yield return 1;”, and then stop. Any() would already know the answer is true, so the rest of the method would never run. This saves even more time.
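A minimal sketch of that short-circuiting, using a hypothetical Produced counter to record how many values the iterator body actually produced. Any() stops after the first value, while Count() has to walk the whole sequence.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyShortCircuit
{
    // Counts how many values the iterator body has actually produced.
    public static int Produced;

    public static IEnumerable<int> ExampleYield()
    {
        Produced++;
        yield return 1;
        Produced++;
        yield return 2;
        Produced++;
        yield return 3;
    }

    // Any() stops after the first MoveNext() that returns true.
    public static bool HasAny() => ExampleYield().Any();

    // Count() has to enumerate every value.
    public static int CountAll() => ExampleYield().Count();
}
```

With Produced reset to 0, HasAny() leaves it at 1, while CountAll() leaves it at 3.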
An IEnumerable can easily be converted to a List using LINQ’s ToList(), which takes about as much time as building the list in the first place would have. There is also a ToDictionary() method, which needs a simple lambda expression to pick each item’s key. Either way, that becomes a business-layer decision made by the calling code, rather than a data-access-layer decision baked into the ExampleYield method.
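A short sketch of the caller making that decision. The "item" + i key format is a hypothetical choice for illustration; any lambda that produces a unique key per item works.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConvertExamples
{
    public static IEnumerable<int> ExampleYield()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // The caller decides it needs a List<int>.
    public static List<int> AsList() => ExampleYield().ToList();

    // The caller decides it needs a lookup; the lambda picks each key
    // (a hypothetical label such as "item1").
    public static Dictionary<string, int> AsDictionary() =>
        ExampleYield().ToDictionary(i => "item" + i);
}
```

The data-access method stays the same in both cases; only the calling code changes.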
In principle, a sufficiently intelligent compiler could use this structure to figure out exactly what you’re trying to do, since the dependencies are obvious when you write a yield method; it might even dedicate one core to the yield method and another to the main code. In practice, the C# compiler doesn’t do this. It rewrites the iterator into a sequential state machine, and each MoveNext() call runs the next step on whatever thread called it. Spreading the work across cores is something you opt into explicitly, for example with PLINQ’s AsParallel().
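For a sense of what the compiler produces instead, here is a hand-written, simplified approximation of the kind of class it generates for ExampleYield. The class name and fields are made up for illustration; the real generated code uses compiler-reserved names and also handles threading, disposal and exceptions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified stand-in for the compiler-generated iterator class; not the real output.
public sealed class ExampleYieldStateMachine : IEnumerable<int>, IEnumerator<int>
{
    private int _state;   // which yield return to resume after (0 = not started)
    private int _current; // the value exposed through Current

    public int Current => _current;
    object IEnumerator.Current => _current;

    // Each call runs the method body up to the next "yield return".
    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _current = 1; _state = 1; return true; // yield return 1;
            case 1: _current = 2; _state = 2; return true; // yield return 2;
            case 2: _current = 3; _state = 3; return true; // yield return 3;
            default: return false;                         // end of method
        }
    }

    // Simplification: always hand out a fresh enumerator.
    public IEnumerator<int> GetEnumerator() => new ExampleYieldStateMachine();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Compiler-generated iterators also throw here.
    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```

Everything here is sequential: nothing runs until MoveNext() is called, and nothing runs on another core.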
Lastly, the syntactic sugar value should not be overlooked. The method just LOOKS a lot better. It’s shorter and more to the point.
A Less Obvious Use for Yield: Decorating a Collection by Wrapping Each Object
I am a big advocate of wrapper objects. As a SharePoint developer, I strongly believe that no SPListItem object should ever be accessed outside of the wrapper object that is designed to contain it. Your view layer should not care that it’s dealing with a SharePoint list item.
So, let’s imagine that you agree with the last paragraph, and then imagine that you have a list called Students, and you have a business object wrapper called Student. How can you return an IEnumerable of Student objects, while still leveraging the data access layer that SharePoint has provided for you? In other words, how do you perform a query on those items, without having to create a new List object to hold them? The answer is below:
private IEnumerable<Student> PerformStudentsQuery(SPWeb Web, String Query)
{
    SPList list = Web.Lists[StudentsList.ListName];
    SPQuery query = new SPQuery();
    query.Query = Query; // CAML query text supplied by the caller
    SPListItemCollection col = list.GetItems(query);
    foreach (SPListItem listItem in col)
        yield return new Student(listItem); // wrap each item as it is requested
}
See what happened there? You don’t have to fill an entire List with Student objects and then return it. You run the query, then yield return each list item, wrapping it in a Student as it is returned. No second collection is built, no cycles are spent wrapping items nobody asks for, and no SPListItem objects are exposed to be misused. Each item is wrapped only when it is requested.
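The same pattern works without SharePoint. In this minimal sketch, RawRecord is a hypothetical stand-in for SPListItem and Student is a hypothetical wrapper; the point is only that each raw object is wrapped at the moment it is yielded.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for SPListItem.
public sealed class RawRecord
{
    public Dictionary<string, object> Fields { get; } = new Dictionary<string, object>();
}

// Hypothetical wrapper: callers never see the raw record.
public sealed class Student
{
    private readonly RawRecord _record;

    public Student(RawRecord record) { _record = record; }

    public string Name => (string)_record.Fields["Name"];
}

public static class StudentRepository
{
    // Wraps each raw record as it is requested; no List<Student> is built.
    public static IEnumerable<Student> WrapAll(IEnumerable<RawRecord> records)
    {
        foreach (RawRecord record in records)
            yield return new Student(record);
    }
}
```

LINQ’s records.Select(r => new Student(r)) does the same lazy wrapping; the explicit loop just makes the yield visible.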
One More Less Obvious Use: Augmenting a Collection
Say you need to return the contents of some data structure, an array, let’s say, but you need to conditionally add one or two static items to it.
private static bool IncludeListContents { get; set; }
private static string[] listContents = { "circle", "square", "triangle" };

public IEnumerable<string> ListContents
{
    get
    {
        yield return "none";
        if (!IncludeListContents)
            yield break; //This ends the method, and no more items are yielded
        foreach (string value in listContents)
            yield return value;
        yield return "all of the above";
    }
}
Here we’ve added a custom static string to the front and the end of a string array if IncludeListContents is true. Otherwise, we just return a single item with a value of “none.” Notice the yield break; statement. It ends the method, much like return does in a void method. There is also an implied yield break at the end of every iterator method, so you don’t normally need to write one, but it can be useful in more complex methods like the one above.
We’ve done this without modifying the array, and the calling method has no idea. It just knows it got back one or more items, which is all it needs to know.
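yield break is not limited to an early if at the top of a method; it also works inside a loop. In this minimal sketch, a hypothetical "END" marker stops the sequence, and anything after the marker is never even read from the source.

```csharp
using System.Collections.Generic;

public static class SentinelReader
{
    // Yields lines until the hypothetical "END" marker is seen, then stops.
    public static IEnumerable<string> UntilEnd(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            if (line == "END")
                yield break; // ends the iterator; later lines are never pulled from the source
            yield return line;
        }
    }
}
```

For the input "a", "b", "END", "c", the caller sees only "a" and "b".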
Infinite Data
Yield is at its most powerful when producing data that might never end. Because the sequence is produced lazily, the CPU computes only as many values as the caller actually asks for. Obviously, this breaks down once the numbers become too large for a ulong: F(93), counting F(0) = 0, is the largest Fibonacci number that fits, and past that the addition overflows and, in C#’s default unchecked context, silently wraps around to wrong values.
IEnumerable<ulong> LazyFibonacci()
{
    yield return 0; //Return base case. Avoids having to set prev to -1, which we can't do with a ulong
    ulong prev = 1;
    ulong curr = 0;
    ulong next;
    while (true)
    {
        next = prev + curr;
        yield return next;
        prev = curr;
        curr = next;
    }
}
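Two things make an infinite iterator like this practical: the caller limits how much it pulls (for example with LINQ’s Take()), and overflow can be made loud instead of silent. This sketch is a variant of LazyFibonacci that wraps the addition in checked(...), so asking for F(94) throws an OverflowException rather than returning a wrapped-around value. The FibonacciDemo class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FibonacciDemo
{
    // Same sequence as LazyFibonacci, but overflow throws OverflowException.
    public static IEnumerable<ulong> CheckedFibonacci()
    {
        yield return 0;
        ulong prev = 1;
        ulong curr = 0;
        while (true)
        {
            ulong next = checked(prev + curr);
            yield return next;
            prev = curr;
            curr = next;
        }
    }

    // Take() pulls only the first n values, so the infinite loop above is harmless.
    public static List<ulong> First(int n) => CheckedFibonacci().Take(n).ToList();
}
```

First(10) returns 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and only those ten values are ever computed.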
Advanced Topics
You can’t put a yield return inside the try block of a try/catch (or inside a catch or finally block), but you can put one inside the try block of a try/finally. Read here for a more detailed explanation as to why.
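A minimal sketch of the allowed try/finally form, with a hypothetical FinallyDemo class that logs what runs. It also shows that breaking out of a foreach early still runs the finally block, because foreach disposes the enumerator on the way out.

```csharp
using System.Collections.Generic;

public static class FinallyDemo
{
    public static readonly List<string> Log = new List<string>();

    // yield return is allowed here because the try has a finally but no catch.
    public static IEnumerable<int> Numbers()
    {
        try
        {
            Log.Add("start");
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            Log.Add("cleanup"); // runs at the end, or when the enumerator is disposed early
        }
    }

    // Breaking out of foreach disposes the enumerator, which runs the finally block.
    public static void TakeFirstOnly()
    {
        foreach (int n in Numbers())
        {
            Log.Add("got " + n);
            break;
        }
    }
}
```

After TakeFirstOnly(), Log reads "start", "got 1", "cleanup".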
If you decide to drive the enumerator manually with MoveNext() and Current rather than in a foreach loop, be careful about cleaning up disposables in your iterator method. If you stop before the end, the method never runs to completion, so its finally blocks (including the ones generated by using statements) only run when the enumerator is disposed. So call Dispose() on your enumerator when you’re done with it, or open it in a using block, to avoid leaving any disposables open. More information can be found at this link.
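The difference is easy to demonstrate. In this minimal sketch, DisposeDemo and CleanupCount are hypothetical names, and the counter stands in for closing a connection or file. Reading one value and walking away leaves the finally block unrun; wrapping the enumerator in a using block runs it.

```csharp
using System.Collections.Generic;

public static class DisposeDemo
{
    public static int CleanupCount;

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            CleanupCount++; // stands in for closing a connection, file, etc.
        }
    }

    // Reads only the first value and never disposes: the finally block does not run.
    public static int FirstWithoutDispose()
    {
        IEnumerator<int> e = Numbers().GetEnumerator();
        e.MoveNext();
        return e.Current;
    }

    // Same, but the using block calls Dispose(), which runs the finally block.
    public static int FirstWithDispose()
    {
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            e.MoveNext();
            return e.Current;
        }
    }
}
```

Starting from CleanupCount = 0, FirstWithoutDispose() leaves it at 0, and FirstWithDispose() raises it to 1.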
To Summarize
Yield is useful in any kind of data-access layer method in which you haven’t decided what the calling method is going to want to do with the data. In other words, the majority of data-access layers. It is useful for very large amounts of data, or an infinite amount. It is useful for generalizing your data, and letting the business layer decide what specific structures to use.
Yield is elegant, efficient, clean, generalized code. It lowers the potential memory footprint of your code. It reduces the number of lines, and therefore, complexity. It hands the compiler the job of writing the enumerator state machine for you. It allows you to easily convert a list from one type to another as items are extracted from that list, rather than having to create a new list first. It makes your code more generalized, so it doesn’t matter if you need your data as a List, a Dictionary, in a foreach loop, as a tree, or whatever else you may use. It’s lazy, and only does as much work as is necessary to get the job done. And you’ll find that converting your existing code to yield, or writing new code with it, is often much easier than you assume.