The source code is also available separately.
Thanks to Matt Neuburg, former Managing Editor at MacTech, for suggesting that I write an article about the Thread Manager, and to Steve Sisak, whose suggestions helped clarify some issues and, especially, the charts.
Here is some feedback on the article, containing additional insight.
Update: At long last the Multiprocessing (MP) API Library can run on systems with Virtual Memory enabled! This is fixed in Mac OS 8.6, as documented in TechNote 1163: Mac OS 8.6 - Multiprocessing API Library.
Update: If you need threads that are compatible all the way back to System 7, you should check out Red Shed Threads. The Red Shed website also has the best papers about concurrent processing on the Mac I've ever seen, including details on how atomic instructions work on the 68K and the PowerPC.
The purpose of this article is to invite developers not to overlook the current Thread Manager's preemptive capabilities, and to use them to improve application responsiveness on 680x0 Macs; this may even prove useful in preparation for Mac OS 8. In the process I point out some shortcomings and suggest workarounds.
In Threads 2.0.1, preemptive threads stopped preempting after the first threaded application launched. This was fixed. Apple, in its infinite wisdom, does not tell us how to detect the fundamental bug fix (did I miss a TechNote?); the Universal Headers 2.1.2 (the ones included with CodeWarrior 10) lack the relevant information as well. We lucky Gestalt Selectors List dwellers (thanks to Rene G.A. Ros for maintaining the mailing list, by the way) have figured out a tentative answer:
Listing 1: Gestalt.c
GestaltCheck
Checks whether a reliable Thread Manager is installed. This routine should deliver TRUE if you can safely call the Thread Manager API, FALSE otherwise.
Boolean GestaltCheck(void)
{
	enum {
		gestaltSchedulerFix = 3	// preemptive scheduler fix present?
	};
	long	Gresp;
	Boolean	pThreads = false;

	if (TrapAvailable(_Gestalt)) {
		if (Gestalt(gestaltThreadMgrAttr, &Gresp) == noErr) {
			pThreads = (Gresp & (1L << gestaltThreadMgrPresent)) &&
				(Gresp & (1L << gestaltSchedulerFix));
		}
	}

	// If we are compiling for the Code Fragment Manager,
	// check whether we can successfully call the library.
	// Remember that ThreadsLib should be imported "weak".
	// The gestaltThreadsLibraryPresent bit is not correctly set
	// sometimes, so we don't rely on it.
#if GENERATINGCFM
	if (pThreads)
		if (NewThread == (void *)kUnresolvedCFragSymbolAddress)
			pThreads = false;
#endif

	return pThreads;
}
Since preemptive threads do not work as expected under outdated Thread Managers, and a fixed version is freely available, your best bet is to require the bug fix at all times. This is not official Apple gospel, so you will have to take my word for it.
In the past years Copland's name has been bandied about often enough; I shall now turn to what is available today.
Preemptive threads are not required to make yield calls to cause a context switch (although they certainly may) and share 50% of their CPU time with the currently executing cooperative thread. However, calling yield from a preemptive thread is desirable if that thread is not currently busy.

This paragraph has given birth to an ill-conceived superstition: that any calculation would run at half speed if assigned to a preemptive thread, all else being inactive. Thorough tests clearly demonstrate that the facts are different and somewhat surprising. I grabbed the Apple sample code implementing the Dhrystone benchmark, and I modified it to use either preemptive or cooperative threads (CW10 Gold Reference:MacOS System Extensions:Thread Manager:Sample Applications:68k Examples:Traffic Threads).
Our main event loop is structured like this:
Listing 2: EventLoop.c
EventLoop
This is our simple event loop. If you are calculating and need to use the CPU as much as possible, pass a zero sleep time. If the Mac is executing only your calculations and you pass X ticks, those X ticks are actually lost. Remember to reset the sleep parameter to a reasonable value (usually CaretTime if a blinking cursor is visible) when your application is idle again.
void EventLoop(void)
{
	EventRecord event;

	do {
		if (WaitNextEvent(everyEvent, &event, 0UL, nil)) {
			DoEvent(&event);
			// this is a "busy" cooperative loop,
			// waiting for the preemptive thread to finish;
			// useful to evaluate whether the CPU is
			// evenly scheduled between the main cooperative
			// thread and the preemptive thread
#if _TESTBALANCE
			while (gDhrystoneDone == false)
				;
#endif
		}
		else {
			// the else clause executes when no events are pending;
			// since we want to give preference to the calculating thread,
			// we explicitly tell the scheduler;
			// if you are using many calculating threads
			// keep them in a list, in order to yield to each
			(void) YieldToThread(gDhrystoneThreadID);
			if (gDhrystoneDone) {
				// stuff used to update the window and the log file removed
				gDhrystoneDone = false;
				(void) SetThreadState(gDhrystoneThreadID,
						kReadyThreadState, kNoThreadID);
			}
		}
	} while (gQuit == false);
}
We want to evaluate how much time the Mac actively spends calculating, so we need to establish a proper frame of reference. Since we are probing the Operating System's behavior, but we want our findings independent of the relative speed of each Mac model, we set each Mac's maximum performance level equal to 100%. By "maximum performance level" we mean the performance obtained by executing the test calculation within a cooperative thread that never yields the CPU. Note that we nitpickers are also very interested in evaluating possible behavioral changes when the application is kept in the background (as opposed to the foreground) and nothing else is running. Cooperative threads, when yielding, yield the CPU every 20 ticks (1/3 of a second); preemptive threads do not need to yield and in fact never yield in our test.
Listing 3: Yield.c
Dhrystone
This shows my calculation routine that may be called by a cooperative or preemptive thread. The symbol _COOPYIELD must be set to 0 in the latter case.
void Dhrystone(void)
{
	// other variables removed for clarity
	register UInt32 Run_Index;

	// the following gets compiled only when
	// we don't use preemptive threads;
	// this is very important since you shall NEVER
	// call TickCount() within a preemptive thread!
#if _COOPYIELD
	UInt32 base_Time = TickCount();
	UInt32 curr_Time;
#endif

	/* Initializations */
	// initialization stuff removed

	for (Run_Index = 1; Run_Index <= kNumber_Of_Runs; ++Run_Index) {
		// if ((Run_Index & 0xFFF) == 0) YieldToAnyThread();
		// the above method was originally used in the Apple sample;
		// decidedly unwise, since slower Macs will yield too little
		// (impairing responsiveness) and faster Macs will yield too
		// much (wasting precious CPU time)

		// calculating stuff removed

		// actual yielding code
#if _COOPYIELD
		curr_Time = TickCount();
		if (curr_Time > base_Time + 20UL) {
			YieldToAnyThread();
			base_Time = curr_Time;
		}
#endif
	} // loop "for Run_Index"
}
All Macs were tested under System 7.5.1 with extensions turned off (this System incorporates Thread Manager 2.1.1), except that the SE/30 in 32-bit mode had MODE32 7.5 installed. Where not specified, 32-bit mode is implied. No other applications were running, and the mouse was left quiet.
Results | Cooperative fg | Cooperative bg | Preemptive fg | Preemptive bg |
Classic | 93.804 | 82.401 | 43.399 | 59.169 |
LC II | 95.413 | 78.853 | 71.072 | 48.539 |
SE/30 24 bit | 99.994 | 90.792 | 83.112 | 66.813 |
SE/30 32 bit | 96.599 | 88.260 | 80.976 | 65.764 |
IIvx 24 bit | 98.913 | 95.509 | 86.642 | 88.582 |
IIvx 32 bit | 93.640 | 91.711 | 84.880 | 86.737 |
Quadra 700 24 bit | 98.841 | 92.274 | 92.600 | 74.585 |
Quadra 700 32 bit | 98.593 | 92.189 | 92.994 | 74.897 |
PB 540 | 99.354 | 96.901 | 92.743 | 87.395 |
Note that the displayed results are averaged over a reasonable number of runs (ten at most, sometimes fewer, since the timings settle quickly). I always treated the first run as an outlier and discarded it (because of window updates, major context switches, etc.).
Except for a couple of Mac models, the situation is much better than one would expect in the light of the 50% passage I quoted earlier from the Thread Manager documentation. This probably happens because of the explicit YieldToThread() call at idle time. However, we would like to observe a constant pattern across Mac models, since we normalized our data against the fastest result on each Mac. On the contrary, we cannot but spot a significant and annoying variability in the measured behavior. A chart sharply supports our contention:
Figure 1. Thread methods compared
I am completely at a loss here. The very same program under the very same Operating System version behaves differently depending on the Mac model. Just when you thought that computers were deterministic devices... Let's now look at the same data from another perspective:
Figure 2. Macs compared
A quick glance at the chart may fool the reader into thinking that Macs perform better in 24-bit mode. This is not true; remember the normalization trick. The absolute timings show that, in the fastest calculation, 32-bit mode always outperforms 24-bit mode. To understand this, we quote develop #9 (Winter 1992), p. 87: "Turning on 32-bit addressing helps because it reduces interrupt handler overhead." Also, some parts of the Toolbox may run faster when 32-bit mode is on, notably QuickDraw. Rather unusually, when using threads the opposite happens: calculations proceed at a better pace in 24-bit mode. Apple is not known for being quick at repartee; nonetheless we are all ears, waiting for a detailed explanation on this subject.
Even the worst case situation portrays a variable outcome, though less pronounced. We obtain this chart by setting the _TESTBALANCE symbol to 1. (See Listing 2.) We conclude that the CPU is not evenly divided between the main cooperative thread and the preemptive thread: the former has a little, but significant, Mac-dependent scheduling advantage.
Figure 3. Worst case situation
Although, with my data collection at hand, I cannot but reproach the slouching gait of preemptive threads, I still think that reengineering an existing application (or writing one from scratch) to let the user choose between preemptive and cooperative threads has no contraindications of any sort. The potential speed loss with preemptive threads is indeed bearable, especially if you consider that one popular decompression utility (hint, hint!) runs in the background at an appalling speed: its CPU utilization barely reaches 2% (yes, two percent!) yet nobody looks concerned. Preemptive threads, besides increasing responsiveness, add one benefit: when your application is in front, they run even while the user is dragging things around or attempting menu selections.
One last remark for those who have been screaming since the start of this section: "If you don't like the default scheduler, write your own! The Thread Manager allows this!" My answer is simple: custom schedulers are intended (at least they should be) for unusual situations, not for fairly standard programming constructs. Remember that programmers, though superhuman to some extent, are mere mortals themselves. Therefore, Donald Norman's motto still holds: "Activities that are easy to do tend to get done; those that are difficult tend not to get done."
Listing 4: YieldRealWorld.c
DhrystoneR
This shows a real-world calculation routine used either cooperatively or preemptively, depending on a global setting.
void DhrystoneR(void)
{
	// other variables removed for clarity
	register UInt32 Run_Index;
	UInt32 base_Time;
	UInt32 curr_Time;

	/* Initializations */
	if (gUseCooperative)
		base_Time = TickCount();
	// other initialization stuff removed

	for (Run_Index = 1; Run_Index <= kNumber_Of_Runs; ++Run_Index) {
		// calculating stuff removed

		// actual yielding code
		if (gUseCooperative) {
			curr_Time = TickCount();
			if (curr_Time > base_Time + 20UL) {
				YieldToAnyThread();
				base_Time = curr_Time;
			}
		}
	} // loop "for Run_Index"
}
Of course the application would check at initialization time whether preemptive threads are available (by calling NewThread() and checking for paramErr), and gray out the relevant choice if they are not. Speaking of user-friendliness, I think most users are neither aware of nor interested in the cooperative vs. preemptive issue, so we should label the two choices avoiding technical jargon.
As a last remark: while it is true that you cannot spawn preemptive threads in native PowerPC code, you can in 68K emulation. My Disk Charmer application takes advantage of this.
This is taken from Silberschatz-Galvin [1994], p. 186:
Although semaphores provide a convenient and effective mechanism for process synchronization, their incorrect use can still result in timing errors that are difficult to detect, since these errors happen only if some particular execution sequences take place, and these sequences do not always occur.

See also this brief excerpt from the Ada95 Rationale:
[...] avoided the methodological difficulties encountered by the use of low level primitives such as semaphores and signals. As is well known, such low-level primitives suffer from similar problems as gotos; it is obvious what they do and they are trivial to implement but in practice easy to misuse and can lead to programs which are difficult to maintain. [...]

For these reasons, the adoption of higher-level constructs is advocated (interested readers may refer to the texts cited above). The semaphore/goto simile reminds me that, oddly enough, the man who first attacked gotos [Dijkstra, 1968] is the one who had earlier introduced semaphores [Dijkstra, 1965].
Semaphores may be efficient, though that depends on the implementation. Months ago I had to write the customary "dining philosophers" program using the UN*X semaphore primitives. Running under HP-UX 9 on a 68040 HP workstation produced disconcerting results. You run the program (for the record, I used an asymmetric solution) with some forty semaphores/processes, and as the CPU load skyrockets, the whole computer unbelievably slows down, crawls, and withers. The hapless user at the console can barely move the mouse (if the dreaded X Window System is running), worse than some Mac models while initializing a floppy. Responsiveness improves slightly, but not to a reasonable degree, if you set the friendliest priority with the 'nice' command. By the way, did you know that UN*X has two different 'nice' commands, one built into the C shell and one available as an external command, each with its own syntax?
So, my advice when it comes to synchronizing primitives is: if the programming language you are using supports tasking constructs (for example, Ada95) go for it. As an added advantage, you may easily port your tasking code on different platforms. If you are stuck with a mainstream language without tasking support (Pascal, C, C++) stay with the primitives you are offered.
On a related note, on the Internet I've seen some semaphore implementations that use the Enqueue() and Dequeue() system calls. Provided that you are only using threads and not other interrupt-level code, this method is overkill because the abovementioned system calls disable interrupts. The Thread Manager API is more desirable because the relevant critical region calls disable thread preemption only, leaving interrupts enabled [Anderson-Post, 1994].
At the last minute, I discovered that somebody indeed rolled up his sleeves and wrote a better semaphore library. His name is Stuart Cheshire and his library can be found on the Apprentice CD-ROM 4 (Apprentice:Information:Programming:Stuart's Tech Notes:Stu'sThreadUtils) or on the newer Apprentice CD-ROM 5 (Apprentice 5:Source Code:C:Snippets:Stuart's Tech Notes:Stu'sThreadUtils) and maybe on the Internet, too. I have not had the time to experiment with it, but after reading his enlightening and witty technotes I am convinced that he is thoroughly competent, so his stuff may well work in the end.
What is desperately needed at this point is a single, multiprocessing-scalable, levelheaded scheduling API, that runs natively on both 680x0 and PowerPC with any version of whatever system component, allowing for maximum CPU utilization in a simple way. Am I asking for too much?
That's all, folks. You have enough material to bash Apple for the next few weeks, and enough enthusiasm to dive head over heels into preemptive threads!
Anderson, Eric & friends. "Thread Manager for Macintosh Applications". Final Draft, Revision 2.0 (January 24, 1994). [CW7 Gold Reference:MacOS System Extensions:Thread Manager 2.1:Thread Manager Documentation].
Bechtel, Brian. "System 7.5 Update 1.0". TechNote OS 07 (February 1995).
Dijkstra, E. W. "Cooperating sequential processes". Technical Report EWD-123, Technological University, Eindhoven, the Netherlands, (1965). Reprinted in [Genuys, 1968], p. 43-112.
Dijkstra, E. W. "GOTO statement considered harmful". Communications of the ACM, 11.3.147 (1968). ACM Press.
Genuys, F. (editor). Programming Languages (1968). Academic Press, London, England.
Silberschatz, Abraham and Galvin, Peter B. Operating System Concepts, Fourth Edition (1994). Addison-Wesley.
"Multiprocessor API Specification" prepared by Apple Computer and DayStar Digital, Inc. for the WorldWide Developers Conference (May 1995).
Wenzel, Karen. "Running CFM-68K Code at Interrupt Time: Is Your Code at Risk?". TechNote 1084 (November 1996, later revised).
You can download for free the Ada95 Rationale at the Ada home