Dijkstra Casually Invented Synchronization Primitives as an Afterthought
Edsger Dijkstra is famous for his many, many contributions to the field of computer science, but I think it’s quite likely, however implausible, that his greatest contribution to tech culture is the phrase “__ considered harmful.” I wonder how many people have actually read his original essay, Goto Statement Considered Harmful. Dijkstra’s writing style is controversial: terse, often patronizing, professorial. Nevertheless, that particular title has somehow become so much of a meme in tech writing that there is now an entire trope of considered harmful essays considered harmful. It’s an interesting legacy, to say the least, for a man who published over a thousand essays.
However, Dijkstra’s greatest contribution to the mental distress of undergraduate CS students has to be the invention of semaphores and mutexes. Even more impressively, he introduced them both in the appendix of his paper on THE Multiprogramming System. (Yes, THE is the name of the operating system. He was Dutch. That’s his excuse. He was not particularly modest, so I can’t imagine he was displeased with how it sounded in English to introduce THE Multiprogramming System.) I was a TA for the undergraduate operating systems course during my time in grad school. About 60% of my office hours were occupied by questions and explanations of synchronization primitives, that is, semaphores and mutexes. Concurrency is tricky to wrap your head around, and takes a lot of practice to reason about. And Dijkstra, that rascal, first introduced the methods for synchronizing concurrent processes as a mere afterthought in another paper.
I have no words (okay, that’s an exaggeration; I always have plenty of words). For one, that would be unlikely to happen in today’s academic climate—papers have increasingly become smaller and smaller in scope, often doing the bare minimum to count as new research and get published. That wasn’t the case in 1968, when Dijkstra published this work in the ACM, but it feels refreshing and even audacious to the modern reader. Second, the amount of information and new research presented in just 6 pages? Legendary. By comparison, my Master’s thesis conveyed about 1% of the quality of information in about 10x as much space. Finally, regardless of how Dijkstra comes off (borderline insufferable, IMHO), it’s just a gutsy move to introduce a brand new (and fairly advanced) concept as part of a freaking appendix.
I’ll try to summarize the contributions of his paper without getting too technical, and I’ll leave some further reading for anyone who wants to learn more. In the mid to late 60’s, there was a race to produce a functional multiprogramming system. At that point in time, not only were mainframe computers extremely expensive, but they could only run one program at a time. Jobs that did a lot of I/O, such as reading or writing to memory, or otherwise interacting with peripherals, often wasted a lot of valuable computing time just waiting. Many different researchers came to the realization that what would be ideal is if the CPU could work on a different task while waiting for the time-consuming I/O to complete. Keep in mind that computers only had a single CPU at the time, so there wasn’t true concurrency happening.
For a completely non-technical analogy, imagine you’re making some toast, and you also need to do the dishes. You can’t actually be two places at once, but once you put the toast in the toaster and you’re waiting on it to cook, it’s not a very efficient use of your time to just stand there staring at the toaster. You could be doing the dishes in the meantime, and only return to your toast once you hear the familiar sound of it popping up (in computer parlance, that would be an interrupt). However, at the time, there wasn’t yet an established way to have the computer do anything while it was waiting. It did the equivalent of just staring at the toaster. When you pay several million dollars for something, you don’t really want to pay it to stare at a toaster, so to speak.
Dijkstra’s system was not the first multiprogramming system, nor was it ever commercially viable—it was developed at about the same time as two of the other most influential operating systems of that decade: IBM’s OS/360 and MIT’s Multics. As far as I’m aware, the only two truly new ideas Dijkstra introduced were paged virtual memory and semaphores. Beyond that, his paper succinctly laid out the layers of abstraction that still guide modern operating systems, and doubtless inspired system designers in the decades to come. Essentially, he laid out the concept that each layer of a computer system should be fully abstracted to all layers above it, which paved the way for all kinds of things. We take it for granted now that you can run Windows whether you have a Dell desktop or an Acer laptop. That wasn’t always the case. In the 60’s, pretty much each new hardware system had its own custom operating system.
Let’s return to how semaphores fit into all of this. Although abstraction allowed for some of the greatest improvements in computing (at least in terms of software), it also created some unique problems. Namely, at some point you must interact with the actual, physical system. It’s all well and good to have multiple programs running on the same system, politely taking turns and sharing the CPU, but what happens when multiple programs want to use the same hardware at the same time? Whatever virtual resources exist, all the programs are still just sharing one terminal, one printer, the same physical storage, etc. This is what’s known as a synchronization problem (it doesn’t just happen with hardware, either, but I’m trying to keep from going too long).
Let’s say I’m running two programs. One of them wants to write “Geckos are cool” to the terminal, and the other one wants to write “I love cats”. I’m running both at the same time. If there’s no way for one program to claim the terminal while they’re both running, what I might end up getting is “IGe lockove cs are ccatoosl”. Both programs did what they were supposed to do, and yet I’m not very happy with the results. At its simplest, a semaphore is basically a flag that says whether or not a resource is free. If the first program was already using the terminal and writing about geckos, the second program would be put on a waiting list until the terminal was free again. “But wait,” you say, “that doesn’t actually sound like that impressive of an invention.” Compared to virtualizing the CPU, memory, and peripherals, I will admit that semaphores are much simpler. However, there’s more to them than it initially seems.
- The way that Dijkstra defines semaphores in his paper allows for some formal reasoning methods to be used in analyzing the system. Although formal reasoning has gone in and out of fashion in computing many times (and mostly out), it’s extremely useful for concurrency, particularly within an operating system, when one wants to be very, very sure that such situations as deadlock are not possible.
- Perhaps due to the deceptive simplicity of the problem, mistakes are common. Semaphore and concurrency problems, such as the Producer-Consumer problem and the Dining Philosophers, are tricky. Over half the CS juniors and seniors in our operating systems classes struggled to get them right.
- The other important observation is that these must be atomic operations. Simply, if two programs check the semaphore’s value at the same time, it must not be possible for both to see that it is free, and both gain access to the resource
It’s difficult to say much more without getting more technical than I intended. I’ll leave the curious reader with further reading. I have no ultimate point to make, beyond trying to bring awareness to what I believe is a very interesting part of the history of operating systems and Dijkstra’s legacy that is not particularly well-known, as far as I’m aware.
- The wikipedia entry on semaphores is actually quite good, and uses some non-technical analogies.
- OSTEP - Free PDF of the textbook I read in my first operating systems course. Depending on experience, you could possibly read the Semaphores chapter without context.
Many thanks to Jon Walpole for first exposing me to this article in his CS533 course.