Nearly everyone reading this is using a machine with multiple cores. With a basic laptop containing a dual-core CPU and your average desktop a four-core CPU, we have processing power our computing ancestors could only dream about. Most of us developers are writing multithreaded applications to take advantage of that power, but there’s a sad secret that we keep amongst ourselves. Unless you are some sort of savant who can visualize all your threads running in perfect parallel, you really don’t have a good clue whether you’re successfully taking advantage of those multiple cores.
The whole trick to multithreading is to never give up the time slice (called the quantum in kernel-speak). Just how do you give up the time slice? By synchronizing. Whenever you’re waiting to acquire a synchronization object, you’re not multithreading. Unfortunately, when developers are designing their code, they go through a few thought games and think they have an idea how things will work. What happens in production is that they don’t seem to get the scalability they expect, but don’t know why.
It turns out that many of these multithreading problems come down to one or more synchronization objects that your code holds far longer than you expect. The trouble is that until now your average developer had no way to figure out which synchronization objects were causing all the contention without an extremely deep analysis: reading the source code and attempting to model the threads on paper.
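To make "holding onto a synchronization object longer than you expect" concrete, here is a minimal sketch. The OrderLog class, its members, and the one-millisecond sleep standing in for slow work are all hypothetical and exist only for illustration. Both methods produce the same result, but the first does its slow work inside the lock, so every other thread waits for it.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical example class; none of these names come from the profiler.
public class OrderLog
{
    private readonly object gate = new object();
    private readonly List<string> entries = new List<string>();

    // Holds the lock for the whole call, including the slow formatting step.
    public void AddHoldingLock(int id)
    {
        lock (gate)
        {
            string line = Format(id);
            entries.Add(line);
        }
    }

    // Does the slow work first and takes the lock only to touch shared state.
    public void AddNarrowLock(int id)
    {
        string line = Format(id);
        lock (gate)
        {
            entries.Add(line);
        }
    }

    public int Count
    {
        get { lock (gate) { return entries.Count; } }
    }

    // Stand-in for slow work that does not need the shared list.
    private static string Format(int id)
    {
        Thread.Sleep(1);
        return "order " + id;
    }
}
```

With several threads calling AddHoldingLock, the sleep is serialized behind the lock; with AddNarrowLock, only the list insertion is.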
Wouldn’t it be nice if we could have a tool that would point out all the synchronization objects that our application is fighting over? That way we could focus our analysis on just the spots in the code that matter to see where the skirmishing between threads is causing us to give up the time slice. Today there is just such a tool in Visual Studio 2010 Beta 2 called the Concurrency profiler.
I’ve already written about the improvements in the Sampling and Instrumentation profilers, but Concurrency profiling is a brand new feature for Visual Studio 2010. As with the CPU and memory profilers, Concurrency profiling works just great on .NET 2.0 binaries, so you can start using it today to find those multithreading contentions and finally ensure you are multithreading correctly.
To enable the Concurrency profiler, open the General property page from the Performance Explorer, choose Concurrency, and check the “Collect resource contention data” field. If you want to be hard core, you can also start all your profiling from the command line with VSPERFCMD.EXE, using the /START:CONCURRENCY,RESOURCEONLY option. I’ll talk more about the Visualization option in another article.
The Concurrency profiler hooks both the managed and native synchronization methods/functions and, when your code calls one, records the time spent blocking and the call stack. As we are still working with a Beta product, I haven’t done any formal performance tests, but in my runs the overhead was so negligible that you won’t even know you’re gathering the resource contention data. In fact, the first time I ran the Concurrency profiler, I thought it was not running because I couldn’t tell a performance difference.
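To get a feel for what "time spent blocking" means, here is a rough hand-rolled approximation for a single managed lock. It is not how the profiler is implemented, and the TimedLock class and its members are hypothetical names of my own. It simply times how long Monitor.Enter takes to return and adds that to a running total; it does not capture call stacks.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; not part of Visual Studio or the profiler.
public static class TimedLock
{
    private static long blockedTicks;   // Stopwatch ticks spent waiting, all threads
    private static long acquisitions;   // number of times a lock was taken

    // Runs body while holding gate, recording how long the caller waited for it.
    public static void Run(object gate, Action body)
    {
        Stopwatch wait = Stopwatch.StartNew();
        Monitor.Enter(gate);            // blocks here if another thread owns gate
        wait.Stop();
        try
        {
            Interlocked.Add(ref blockedTicks, wait.ElapsedTicks);
            Interlocked.Increment(ref acquisitions);
            body();
        }
        finally
        {
            Monitor.Exit(gate);
        }
    }

    public static long Acquisitions
    {
        get { return Interlocked.Read(ref acquisitions); }
    }

    public static double BlockedMilliseconds
    {
        get { return Interlocked.Read(ref blockedTicks) * 1000.0 / Stopwatch.Frequency; }
    }
}
```

Wrapping every lock in your code this way is exactly the tedium the profiler saves you, and unlike this sketch the profiler also tells you where the wait happened.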
To demonstrate the Concurrency profiler, I wrote a simple program that fired up two threads to execute the following method.
    private void SyncBlockThread()
    {
        for (int i = 0; i < 100; i++)
        {
            lock (syncBlock)
            {
                Trace.WriteLine(String.Format("{0} has the sync block",
                                              Thread.CurrentThread.Name));
                Thread.Sleep(randWait.Next(500));
            }
        }
    }
After the two threads ended, I fired up two additional threads that both executed the following method.
    private void MutexThread()
    {
        for (int i = 0; i < 100; i++)
        {
            // Acquire outside the try block so that ReleaseMutex only runs
            // when this thread actually owns the mutex.
            theMutex.WaitOne(-1);
            try
            {
                Trace.WriteLine(String.Format("{0} has the mutex",
                                              Thread.CurrentThread.Name));
                Thread.Sleep(randWait.Next(500));
            }
            finally
            {
                theMutex.ReleaseMutex();
            }
        }
    }
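For anyone who wants to reproduce the run, here is a minimal sketch of a harness around the two methods above. Only the two worker methods come from my test program. The ContentionDemo class name, the field initializers, the thread names, the Acquisitions counter, and the iterations and maxSleepMs parameters (standing in for the hard-coded 100 and 500) are assumptions added for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical harness; names other than the two worker methods are assumed.
public class ContentionDemo
{
    private readonly object syncBlock = new object();
    private readonly Mutex theMutex = new Mutex();
    private readonly Random randWait = new Random();
    private readonly int iterations;
    private readonly int maxSleepMs;
    private int acquisitions;   // only touched while holding syncBlock or theMutex

    public ContentionDemo(int iterations, int maxSleepMs)
    {
        this.iterations = iterations;
        this.maxSleepMs = maxSleepMs;
    }

    public int Acquisitions { get { return acquisitions; } }

    private void SyncBlockThread()
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (syncBlock)
            {
                acquisitions++;
                Trace.WriteLine(String.Format("{0} has the sync block",
                                              Thread.CurrentThread.Name));
                Thread.Sleep(randWait.Next(maxSleepMs));
            }
        }
    }

    private void MutexThread()
    {
        for (int i = 0; i < iterations; i++)
        {
            theMutex.WaitOne(-1);
            try
            {
                acquisitions++;
                Trace.WriteLine(String.Format("{0} has the mutex",
                                              Thread.CurrentThread.Name));
                Thread.Sleep(randWait.Next(maxSleepMs));
            }
            finally
            {
                theMutex.ReleaseMutex();
            }
        }
    }

    // Starts two named threads on the same method and waits for both to end.
    private static void RunPair(ThreadStart work, string prefix)
    {
        Thread first = new Thread(work) { Name = prefix + " 1" };
        Thread second = new Thread(work) { Name = prefix + " 2" };
        first.Start();
        second.Start();
        first.Join();
        second.Join();
    }

    // Runs the sync block pair to completion, then the mutex pair.
    public void Run()
    {
        RunPair(SyncBlockThread, "SyncBlock");
        RunPair(MutexThread, "Mutex");
    }
}
```

Calling new ContentionDemo(100, 500).Run() from your Main should approximate the run described here, though the exact contention pattern will vary from run to run because of the random sleeps.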
After your program ends or you detach the profiler, you get a view of your application you could only dream about before. The graph in the upper left corner shows you how many contentions were occurring as your application ran. In the example below, contentions peak at seven and occur through most of the run. In an ideal world, you would never see this kind of graph. I’ll admit that this is a contrived program, but it is an excellent example to show exactly what the Concurrency profiler can do.
Like the sampling profiler I wrote about previously, the Concurrency profiler defaults to “Just My Code,” which means it shows you only the handles and threads where you have source code. If you need to see everything in the application because you are using libraries that don’t have PDB files, click the Show All Code link in the Notifications section in the upper right corner to see all handle contentions across your application.
The two tables at the bottom show the story of the contentions causing the most trouble. In most cases, you care about the Most Contended Resources table, as that shows the synchronization objects the threads are fighting over. To analyze the fighting, click on the handle and you’ll jump to the Resource Details view so you can see how much blocking is occurring on the resource.
The Total graph shows the combined blocking time of all threads blocking on that particular handle. Below that you see each thread showing exactly when, where (through the call stack), and how long a particular thread wasn’t getting any work done. This is a brilliant view and will help point you right at those problems for a particular resource. Keep in mind that normal applications won’t show patterns like my example here, but it’s informative to see what the degenerate case looks like.
If you’re curious about seeing a thread and all the different resources the thread is blocking on, click the thread id in the Summary view and you’ll see the above view inverted, with the thread’s blocking pattern as the top graph and all the different resource graphs below it. That’s a handy view for threads that are grabbing synchronization resources from all over the place.
The Concurrency profiler supports other views, such as the Call Tree view, which is a nice way to see the call path with the most contentions. However, after running many different types of applications under the Concurrency profiler, I’ve found that the Summary and Contentions by Handle/Thread views are the ones you’re going to spend all your time analyzing. I love tools that are simple, yet solve a very hard problem.
After playing with the Concurrency profiler for a while, there are a few things I’ve noticed that will help you out. The first is that if you’re writing .NET applications and you totally control the threads (in other words, you’re not using a thread queue), set the thread Name property. That lets you identify threads by name instead of by method name. For native C++ applications, the thread naming trick that works in the debugger does not work in the Concurrency profiler.
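As a small sketch of the naming tip, assuming you create the threads yourself: the NamedThreads class and StartNamed method below are hypothetical helpers, while Thread.Name is the real property being set.

```csharp
using System.Threading;

// Hypothetical helper for creating threads that always carry a name.
public static class NamedThreads
{
    // Creates a thread, sets its Name before starting it, and returns it.
    public static Thread StartNamed(string name, ThreadStart work)
    {
        Thread thread = new Thread(work);
        thread.Name = name;   // set the name before the thread starts running
        thread.Start();
        return thread;
    }
}
```

A call such as NamedThreads.StartNamed("Order Writer", SomeMethod), where SomeMethod is whatever your thread runs, gives the thread a name you can pick out instead of a method name.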
The second trick is simple. If you want to run Concurrency profiling on a WPF, Windows Forms, or console application, I find it best to disable the Visual Studio hosting process in the project property pages. Running under the hosting process adds extra threads and synchronization to your application, and getting those out of the way makes it easier to see just your stuff.
The last trick is that to get the best information about the contention, you need to stress the application. Just doing a quick run in most cases won’t give you a good feel for the application’s threading. The Concurrency profiler needs to be a “must do” task for your stress tests. The more you bang on the application, the more likely you’ll start getting the real picture of your threading.
As we are at Beta 2, I’m hoping that the team can fix one thing in the Concurrency profiler. All synchronization objects are reported as “handles.” I would love to see the type of handle and, more importantly, the handle name if one was specified at creation. That would make it much easier to identify your objects at a glance. It seems easy to me to accomplish this by hooking all the handle creation methods and recording those with names specified. Of course, if it were easy, the team would have done it already. However, I’ll be filing this as a feature request and keeping my fingers crossed that my dream comes true at RTM.
The Concurrency profiler offers fascinating insight into your threads. Even if you don’t believe you have a problem, it’s one of those tools you need to run regularly to get a feel for your application’s behavior. Half of performance tuning is understanding your normal behavior so you know when you’re dealing with an outlier condition. Remember, the Concurrency profiler works fine on today’s binaries so you can, and should, start using it immediately.