An introduction to threads

If you've never worked with a multi-threaded application before, you might not see a problem with the following code.

// C# code

using System;
using System.Collections.Generic;
using System.Threading;

class crashtest
{
    private const int thread_count = 30;
    private const int thread_items = 1000;

    private List<string> shared_data = new List<string>();
    private List<Thread> threads = new List<Thread>();
    bool begin = false;

    public crashtest()
    {
        for (int i = 0; i < thread_count; i++)
            threads.Add(new Thread(Work));
        for (int i = 0; i < thread_count; i++)
            threads[i].Start(i.ToString());

        begin = true;

        for (int i = 0; i < thread_count; i++)
            threads[i].Join();

        for (int i = 0; i < shared_data.Count; i++)
            Console.WriteLine(shared_data[i]);
        Console.WriteLine("Finished.");
        Console.ReadKey();
    }

    private void Work(object param)
    {
        Thread.CurrentThread.Priority = ThreadPriority.Lowest;
        while (!begin) Thread.Sleep(1);

        string id = (string)param;
        string data = "abc - " + id;
        for (int i = 0; i < thread_items; i++)
            shared_data.Add(data);
    }
}

The code compiles just fine, and depending on how lucky you are it might even execute properly once or twice. But keep pressing your luck and sooner or later you'll receive an error. Eventually a context switch will happen at just the wrong moment, resulting in undesired behavior in your code, or, as in this particular example, in Microsoft's. A context switch is the moment when the CPU switches from one task to another. Consider an example where two threads are created to execute the following function.

// C# code

void WriteSomeStuff() 
{
   Console.WriteLine("This is the first line of a thread.");
   Console.WriteLine("This is the second line of a thread.");
   Console.WriteLine("This is the third line of a thread.");
}

Say the first thread executes the first line of the function and outputs:

This is the first line of a thread.

Just afterwards a context switch occurs and the CPU begins working on the second thread and completes all three lines of code in the function. Now the output is:

This is the first line of a thread.
This is the first line of a thread.
This is the second line of a thread.
This is the third line of a thread.

With the second thread complete, the CPU returns to the first thread, and the final output becomes:

This is the first line of a thread.
This is the first line of a thread.
This is the second line of a thread.
This is the third line of a thread.
This is the second line of a thread.
This is the third line of a thread.

It's important to note that context switches are expensive, so it would be very unusual for two threads to be split up this way; for the sake of this example, though, we're going to pretend they are. The main thing to remember is that a context switch can occur anywhere, so you should always account for the possibility when writing a multi-threaded application. That brings up something interesting about the Console.WriteLine() method: it is not an atomic operation, meaning it can be broken up into pieces. It's a method like any other, with several lines of code to execute, and some of those lines call further methods that contain several more lines of code. The point is that a context switch could potentially occur inside the Console.WriteLine() method itself. Were it not for the fact that the method is thread safe, the result could theoretically have been something like this:

This is the fThis is the first line of a thread.
irst line oThis is thf a the second liread.
ne of This is the secoa thread.
Tnd line of ahis is th thread.e third line of a thread.

This is the third line of a thread.

However, the method uses mutual exclusion to ensure that no two calls overlap one another in its critical section. A critical section is any portion of code that cannot be entered by multiple threads at once if it is to function properly. So let's say you wanted your two threads to produce their outputs sequentially, like so:

This is the first line of a thread.
This is the second line of a thread.
This is the third line of a thread.
This is the first line of a thread.
This is the second line of a thread.
This is the third line of a thread.

If you've already been working on a solution as you read this article, you may have considered the following:

// C# code

bool busyFlag = false;

// ...

void SomeThreadSafeFunction()
{
   while (busyFlag) ; // Wait for other threads to finish
   busyFlag = true; // Keep other threads out.

   // Critical section
   Console.WriteLine("This is the first line of a thread.");
   Console.WriteLine("This is the second line of a thread.");
   Console.WriteLine("This is the third line of a thread.");
   // End critical section

   busyFlag = false; // Let other threads in.
}

While this method of mutual exclusion would work 99% of the time, it isn't a real solution. Consider the following possibility:

Thread 1                           Thread 2
Enters SomeThreadSafeFunction()
Checks busyFlag (false)
Exits while loop
                                   Enters SomeThreadSafeFunction()
                                   Checks busyFlag (false)
                                   Exits while loop
                                   Sets busyFlag to true
                                   Writes first line
Sets busyFlag to true
Writes first line
Writes second line
Writes third line
Sets busyFlag to false
                                   Writes second line
                                   Writes third line
                                   Sets busyFlag to false
                                   Exits function
Exits function

Potentially resulting in the same output as before:

This is the first line of a thread.
This is the first line of a thread.
This is the second line of a thread.
This is the third line of a thread.
This is the second line of a thread.
This is the third line of a thread.
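The gap in the busyFlag approach is that the test (while (busyFlag)) and the set (busyFlag = true) are two separate operations, and a context switch can land between them. As a preview of what closing that gap looks like, here is a sketch using .NET's Interlocked.CompareExchange, which performs the compare and the swap as one uninterruptible step (the flag becomes an int, since Interlocked works on ints rather than bools; the class name is mine):

```csharp
using System;
using System.Threading;

class AtomicFlagExample
{
    private int busyFlag = 0; // 0 = free, 1 = held

    public void SomeThreadSafeFunction()
    {
        // Atomically: if busyFlag is 0, set it to 1. CompareExchange returns the
        // value busyFlag held *before* the call, so a return of 0 means we got in.
        while (Interlocked.CompareExchange(ref busyFlag, 1, 0) != 0)
            Thread.Sleep(0); // Another thread holds the flag; yield and retry.

        // Critical section
        Console.WriteLine("This is the first line of a thread.");
        Console.WriteLine("This is the second line of a thread.");
        Console.WriteLine("This is the third line of a thread.");
        // End critical section

        Interlocked.Exchange(ref busyFlag, 0); // Release the flag atomically.
    }
}
```

Because the test and the set now happen as a single atomic operation, no second thread can slip in between them the way it did in the trace above.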

At this point it should be clear that what you need is a way to test a flag and then set it without being interrupted, more commonly known as an atomic test-and-set operation. Several synchronization primitives are built on operations like this (locks, monitors, semaphores, and message queues), and they all perform thread synchronization in this manner. In this article, however, I will be using mutexes, which are commonly found in languages that support multi-threaded programming. So, let's change our prior example to use a mutex:

// C# code

System.Threading.Mutex mut = new System.Threading.Mutex();

// ...

void SomeThreadSafeFunction()
{
   mut.WaitOne(); // Wait for the mutex to become available and lock it.

   // Critical section
   Console.WriteLine("This is the first line of a thread.");
   Console.WriteLine("This is the second line of a thread.");
   Console.WriteLine("This is the third line of a thread.");
   // End critical section

   mut.ReleaseMutex(); // Unlock the mutex.
}
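As an aside, a Mutex is a relatively heavyweight, kernel-level object that can even synchronize threads across separate processes. When all of your threads live in one process, C# also offers the built-in lock statement (a wrapper around System.Threading.Monitor) as a cheaper alternative. A sketch, with a class name of my own choosing:

```csharp
using System;

class LockStatementExample
{
    // Any private reference-type object can serve as the lock token.
    private readonly object sync = new object();

    public void SomeThreadSafeFunction()
    {
        lock (sync) // Compiles down to Monitor.Enter/Monitor.Exit in a try/finally.
        {
            // Critical section
            Console.WriteLine("This is the first line of a thread.");
            Console.WriteLine("This is the second line of a thread.");
            Console.WriteLine("This is the third line of a thread.");
            // End critical section
        }
    }
}
```

A nice property of lock is that the token is released automatically when the block exits, even if an exception is thrown, so there is no way to forget the equivalent of ReleaseMutex().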

The critical section is now mutually exclusive and thread safe, meaning it is safe to call from multiple threads because the shared resource (the console window in this case) is only accessed by one thread at a time. If you were to look at the internals of the Console.WriteLine() method, you would see a similar approach used to ensure that an entire line is written uninterrupted by another thread. The problem with the code at the beginning of this article is that the List<>.Add() method is not thread safe. Adding a mutex to the class and using it to protect the critical section (in this case, a single line of code) solves the problem:

// C# code

class crashtest
{
   private const int thread_count = 30;
   private const int thread_items = 1000;

   private System.Threading.Mutex mut = new System.Threading.Mutex();

   private List<string> shared_data = new List<string>();
   private List<Thread> threads = new List<Thread>();
   bool begin = false;

   public crashtest()
   {
      // ...
   }

   private void Work(object param)
   {
      // ...

      for (int i = 0; i < thread_items; i++)
      {
         mut.WaitOne();
         shared_data.Add(data);
         mut.ReleaseMutex();
      }
   }
}

That's it. This code will now function as expected each time it's run. If you've followed along so far, congratulations! You now understand the basics of multi-threaded programming. There are just two more things I want to point out.

// C# code

Thread.CurrentThread.Priority = ThreadPriority.Lowest;

Setting the thread's priority to lowest increases the number of context switches that occur. Since the example was meant to show that context switches can cause unexpected results in your code if not handled properly, this setting helps ensure something bad happens. Generally speaking, however, you don't need to (and usually shouldn't) change a thread's priority level.

// C# code

bool begin = false;

You may have noticed that this shared boolean is accessed by multiple threads but is never protected with a mutex. I don't want to get into the full reasons in this article, but the point is that some shared access (a single read or write of a bool, for instance, is atomic in .NET) is safe without a mutex, so one isn't always necessary.
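For the curious: while a bool can never be caught half-written, a field shared between threads this way is usually also marked volatile, which tells the compiler and JIT not to cache the value in a register, so the spinning thread is guaranteed to observe a write made by another thread. A minimal sketch (class and method names are mine):

```csharp
using System;
using System.Threading;

class BeginFlagExample
{
    // volatile: every read goes back to memory, so the worker thread
    // always sees the latest value written by the main thread.
    private volatile bool begin = false;

    public void Release() { begin = true; }

    public void Work()
    {
        while (!begin) Thread.Sleep(1); // Spin until the main thread says go.
        Console.WriteLine("Thread released.");
    }
}
```

Without volatile the code will usually still work, but the runtime is technically free to optimize the loop in ways that keep the thread spinning forever.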

Obviously, there is a lot more to learn about multi-threaded programming. Stay tuned, as I'll continue to post on the subject. Take a look at my thread safe byte buffer queue for a further example.
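One last pointer: if you're on .NET 4.0 or later, the framework ships thread-safe collections in the System.Collections.Concurrent namespace that do the locking for you. A sketch of the opening example rewritten around ConcurrentBag (field names match the original; the Run method is my own addition for demonstration):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class ConcurrentCrashTest
{
    private const int thread_count = 30;
    private const int thread_items = 1000;

    // ConcurrentBag synchronizes internally; no explicit mutex required.
    private ConcurrentBag<string> shared_data = new ConcurrentBag<string>();

    private void Work(object param)
    {
        string data = "abc - " + (string)param;
        for (int i = 0; i < thread_items; i++)
            shared_data.Add(data); // Safe to call from any number of threads.
    }

    public int Run()
    {
        var threads = new List<Thread>();
        for (int i = 0; i < thread_count; i++)
            threads.Add(new Thread(Work));
        for (int i = 0; i < thread_count; i++)
            threads[i].Start(i.ToString());
        foreach (Thread t in threads)
            t.Join();
        return shared_data.Count; // Should equal thread_count * thread_items.
    }
}
```

Note that ConcurrentBag is unordered, so it fits this example (where order doesn't matter); for ordered access you'd reach for ConcurrentQueue instead.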