Friday Links 0.0.22 - StringBuilder Updates
This is based on an email I send my .NET team at work
Happy Friday,
Let’s take a deep dive into one of the most beloved portions of the .NET base class library: the venerable StringBuilder.
StringBuilder: the Past and the Future
http://codingsight.com/stringbuilder-the-past-and-the-future/
Timur Guev has a neat article looking into the internals of System.Text.StringBuilder.
When .NET was first released, the implementation worked much like List<T>: whenever it ran out of space, it would double its capacity, copy the existing text into the new buffer, and continue working in the new, bigger buffer.
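Here is a minimal Python sketch (standing in for C#) of that original growth strategy: a builder that keeps one contiguous buffer and doubles it when it fills up. The names (DoublingBuilder, append, to_string, capacity) are hypothetical, and a list of one-character strings stands in for a char[]. This is an illustration of the idea, not the actual .NET Framework source.

```python
class DoublingBuilder:
    """Hypothetical sketch of the pre-.NET 4 strategy: one contiguous
    buffer that doubles whenever it runs out of room."""

    def __init__(self, capacity: int = 16) -> None:
        self._buf = [""] * max(capacity, 1)  # stands in for a char[] buffer
        self._length = 0                      # characters currently in use

    @property
    def capacity(self) -> int:
        return len(self._buf)

    def append(self, text: str) -> "DoublingBuilder":
        needed = self._length + len(text)
        if needed > len(self._buf):
            # Out of space: double the capacity until the new text fits...
            new_capacity = len(self._buf)
            while new_capacity < needed:
                new_capacity *= 2
            # ...then copy the existing text into the bigger buffer.
            new_buf = [""] * new_capacity
            new_buf[: self._length] = self._buf[: self._length]
            self._buf = new_buf  # the old buffer is now garbage
        self._buf[self._length : needed] = list(text)
        self._length = needed
        return self

    def to_string(self) -> str:
        return "".join(self._buf[: self._length])


sb = DoublingBuilder(capacity=4)
for word in ["ab", "cd", "ef", "gh"]:
    sb.append(word)
print(sb.to_string(), sb.capacity)  # abcdefgh 8
```

Every time the buffer grows, all of the text written so far is copied again and the old buffer is left for the garbage collector.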
In .NET 4, they changed the internals to use a linked list of buffers (chunks) instead.
This is a simple form of what is sometimes called a rope. As you Append() text to the builder, its current buffer eventually runs out of space, so a new one is allocated and linked into the internal bookkeeping. With this design, no existing text has to be copied during an Append operation. However, when it is time to create a real string object, StringBuilder has to allocate a block of memory big enough for the whole result, then walk the linked list, copying each buffer into the output.
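The chunked design can be sketched the same way. This is a hypothetical Python model, not the .NET 4 source: each chunk holds a fixed number of characters and links back to the chunk before it, and to_string() plays the role of ToString() by allocating one output buffer and walking the list. The fixed chunk size is a simplifying assumption; the real class has its own sizing policy.

```python
from typing import Optional


class _Chunk:
    """One fixed-size block of characters, linked to the chunk before it."""

    def __init__(self, capacity: int, previous: Optional["_Chunk"]) -> None:
        self.chars = [""] * capacity
        self.length = 0
        self.previous = previous


class ChunkedBuilder:
    """Hypothetical sketch of a .NET 4-style builder: a linked list of chunks."""

    def __init__(self, chunk_size: int = 16) -> None:
        if chunk_size < 1:
            raise ValueError("chunk_size must be positive")
        self._chunk_size = chunk_size
        self._last = _Chunk(chunk_size, None)  # newest chunk; older ones hang off .previous
        self._total = 0

    def append(self, text: str) -> "ChunkedBuilder":
        for ch in text:
            if self._last.length == len(self._last.chars):
                # Current chunk is full: link in a new one.
                # Nothing that was already written gets copied.
                self._last = _Chunk(self._chunk_size, self._last)
            self._last.chars[self._last.length] = ch
            self._last.length += 1
        self._total += len(text)
        return self

    def chunk_count(self) -> int:
        count, chunk = 0, self._last
        while chunk is not None:
            count += 1
            chunk = chunk.previous
        return count

    def to_string(self) -> str:
        # Allocate one buffer of the final size, then walk the list from the
        # newest chunk back to the oldest, copying each chunk into place.
        out = [""] * self._total
        end = self._total
        chunk: Optional[_Chunk] = self._last
        while chunk is not None:
            start = end - chunk.length
            out[start:end] = chunk.chars[: chunk.length]
            end = start
            chunk = chunk.previous
        return "".join(out)


sb = ChunkedBuilder(chunk_size=4)
for word in ["ab", "cd", "ef", "gh", "ij"]:
    sb.append(word)
print(sb.to_string())  # abcdefghij
```

In this sketch append() never touches characters that are already stored; all of the copying is deferred to to_string().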
The designers made this change because the most common use case for StringBuilder is a tight loop that calls Append() many times before finally grabbing the resulting string with ToString(). The new algorithm is much better suited to this scenario: there is less copying of characters, and no outgrown buffers are thrown away as the builder grows. This reduces CPU time and garbage collection pressure.
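One rough way to see the tradeoff for the tight-loop case is to count work instead of timing it. The two functions below are a hypothetical back-of-the-envelope model, not measurements: they count characters copied between buffers and buffer space discarded while appending. They assume an initial capacity of 16 for the doubling buffer, treat the old ToString() as free (as described above), and ignore allocation costs and the real chunk-sizing policy.

```python
def doubling_stats(appends: int, piece_len: int, capacity: int = 16) -> tuple[int, int]:
    """(chars copied, buffer chars discarded) for one doubling buffer.
    The final ToString is treated as free (returns the internal buffer)."""
    capacity = max(capacity, 1)
    copied = discarded = length = 0
    for _ in range(appends):
        needed = length + piece_len
        if needed > capacity:
            new_capacity = capacity
            while new_capacity < needed:
                new_capacity *= 2
            copied += length       # existing text moved to the new buffer
            discarded += capacity  # old buffer is left for the GC
            capacity = new_capacity
        length = needed
    return copied, discarded


def chunked_stats(appends: int, piece_len: int) -> tuple[int, int]:
    """Chunked list: nothing copied or discarded while appending; the final
    ToString copies every character exactly once into the result."""
    return appends * piece_len, 0


print(doubling_stats(1000, 10))  # (16320, 16368)
print(chunked_stats(1000, 10))   # (10000, 0)
```

In this model, 1,000 appends of 10 characters each make the doubling buffer copy about 1.6 times as many characters as the chunked version, and leave more than 16,000 characters' worth of outgrown buffers for the garbage collector, versus none for the chunked version while appending.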
However, other StringBuilder methods suffer: Insert(), Remove(), etc. incur extra bookkeeping and copying compared to the previous implementation, which simply kept all of the data in a single array. The final ToString() call is also slower, because it has to allocate a new string for the result and copy the data into it. Prior to .NET 4, StringBuilder could just return a pointer to its internal buffer.
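The extra bookkeeping for Insert() is easier to see in a simplified model. The hypothetical function below represents the builder as a Python list of string chunks, which is not .NET's actual chunk structure or Insert algorithm. It must first walk the chunks to find the one containing the target index, then split that chunk and splice in the new text. With a single contiguous array, the equivalent is shifting the characters after the index, with no search and no new chunks.

```python
def insert_into_chunks(chunks: list[str], index: int, text: str) -> list[str]:
    """Hypothetical: insert `text` at character `index` in a list of chunks."""
    if index < 0:
        raise IndexError("index out of range")
    if not chunks and index == 0:
        return [text] if text else []
    offset = 0
    for i, chunk in enumerate(chunks):
        # Step 1: walk the chunks to find the one that holds `index`.
        if index <= offset + len(chunk):
            split = index - offset
            # Step 2: split that chunk and splice the new text between the halves,
            # dropping any empty pieces produced by splitting at a chunk edge.
            pieces = [p for p in (chunk[:split], text, chunk[split:]) if p]
            return chunks[:i] + pieces + chunks[i + 1 :]
        offset += len(chunk)
    raise IndexError("index out of range")


result = insert_into_chunks(["abcd", "efgh", "ij"], 6, "XY")
print(result)           # ['abcd', 'ef', 'XY', 'gh', 'ij']
print("".join(result))  # abcdefXYghij
```

Each insert can also leave the list with more, smaller chunks, which makes later lookups in this model walk further.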
This is a really good example of encapsulation. The external interface of StringBuilder did not change at all in .NET 4, though its internals were completely reworked to target a different performance profile.
It’s also a good example of the tradeoffs involved in performance work. The framework designers decided it was better overall to improve the most common use case, even if some other scenarios would suffer.
Check out the link for a deeper look, including performance timings the author ran to demonstrate the tradeoffs.