Issue 4-31, August 4, 1999

Be Engineering Insights: The Kernel Programming Model Revisited

By Cyril Meurillon

Lately, it's been fashionable to write about the driver model and API in our newsletter. Being very hip, I couldn't possibly avoid the subject.

We've had several articles on the basic rules a driver writer must follow -- rules that apply to all kernel code and that define the kernel programming model. Still, too often I run into code that breaks them -- and it's not only third-party code; I've found questionable code in-house as well. Perhaps the blame lies with my broken English -- but I'll try again to explain the basic rules. This time I'll tell you why you have to respect them and what happens if you don't. I hope that will help you memorize them. If not, at least the details of how the kernel works should make the curious developer happy.

In the rest of the article, I'll use the terms "interrupt handler" and "spinlock section." By interrupt handler, I mean a routine that serves either an IO interrupt (installed using install_io_interrupt_handler()) or a timer interrupt (scheduled using add_timer()). By spinlock section, I mean a critical section protected by a spinlock.

If you have questions or comments about this article, don't hesitate to e-mail me: cyril@be.com.

Interrupt handlers and spinlock sections cannot be preempted.

It is illegal to cause a preemption while in an interrupt handler or spinlock section. A preemption can occur, for example, when release_sem() is called without the B_DO_NOT_RESCHEDULE flag. Instead, the handler should return B_INVOKE_SCHEDULER if preemption is desired after the interrupt has been processed -- say, because a semaphore was released. Preemption can also occur when interrupts are enabled. This explains why it's necessary to disable interrupts before acquiring a spinlock -- and to restore them after releasing it.
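
To make this concrete, here is a minimal sketch of an IO interrupt handler that respects the rule. The device-specific parts are hypothetical; install_io_interrupt_handler(), release_sem_etc(), B_DO_NOT_RESCHEDULE, and B_INVOKE_SCHEDULER are the real kernel API.

#include <KernelExport.h>

static sem_id io_done_sem;      /* released when the device has data */

static int32
my_io_handler(void *data)
{
    /* acknowledge and service the device here (hypothetical)... */

    /* Release the semaphore without causing a preemption, and ask
       the kernel to invoke the scheduler after the handler returns. */
    release_sem_etc(io_done_sem, 1, B_DO_NOT_RESCHEDULE);
    return B_INVOKE_SCHEDULER;
}

/* installed at init time with something like:
   install_io_interrupt_handler(interrupt_number, my_io_handler, NULL, 0); */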

Depending on the exact context, various *bad things* can happen if you break this rule.

  • With interrupt handlers:

    -- IO interrupt handlers: with level-triggered interrupts, the CPU can take new interrupts of the same level only after the interrupt handler returns. In other words, interrupts of the same level are masked for as long as the handler runs. If the handler is preempted, there is no guarantee when it will resume execution. The interrupted thread may have a very low priority, for example, in which case it can wait tenths of a second before it runs again. Blocking interrupts for that long is *very bad*.

    -- Timer handlers: if a timer handler is preempted, the other timer handlers that were scheduled to run at the same time will run only after the preempted thread resumes execution, which can be tenths of a second later, as we've seen. That would also have terrible effects on interrupt latency, which is currently below a couple of hundred microseconds.

  • With spinlock sections:

    -- Preemption in the middle of a critical section can lead to deadlocks. The classic priority inversion problem, for example, leads to a system blockage when using spinlocks instead of semaphores. Imagine thread A, priority 10, grabs a spinlock. It is preempted in the critical section. Thread B, real time priority 100, attempts to grab the spinlock and loops while waiting for it. Because B is real time and does not block, the scheduler will never run A again. Hence the deadlock.

Interrupt handlers and spinlock sections cannot block.

Just as interrupt handlers and spinlock sections can't be preempted, neither can they block. Blocking can occur directly by calling acquire_sem(). It can also occur indirectly as the result of invoking a blocking kernel call -- like malloc() or read_port() -- or even by touching unlocked memory, which causes the thread to enter VM. The list of calls that can potentially block is long, and includes some you wouldn't suspect. For example, delete_sem() blocks because it calls free() to free the semaphore structure. remove_io_interrupt_handler() blocks because it waits for any executing instance of the removed handler to complete execution. And so on. Therefore, as I've written in previous articles, don't invoke any function that is not explicitly allowed. You can find a list of permissible functions in my article "Be Engineering Insights: Attention Driver Writers!"
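
The usual way to live within this rule is to defer anything that might block to an ordinary kernel thread. Here is a rough sketch of that pattern; create_sem(), spawn_kernel_thread(), resume_thread(), and acquire_sem() are the real calls, while the names and the work being done are my own invention.

#include <KernelExport.h>
#include <stdlib.h>

static sem_id work_sem;

/* An ordinary kernel thread: here it is legal to block. */
static int32
service_thread(void *data)
{
    while (acquire_sem(work_sem) == B_NO_ERROR) {
        void *buffer = malloc(4096);    /* may block: fine here */
        /* ... process the pending work ... */
        free(buffer);
    }
    return 0;
}

static status_t
init_deferred_work(void)
{
    work_sem = create_sem(0, "deferred work");
    if (work_sem < B_NO_ERROR)
        return work_sem;
    resume_thread(spawn_kernel_thread(service_thread,
        "my service thread", B_NORMAL_PRIORITY, NULL));
    return B_NO_ERROR;
}

/* The interrupt handler only signals the thread; it never blocks. */
static int32
my_handler(void *data)
{
    release_sem_etc(work_sem, 1, B_DO_NOT_RESCHEDULE);
    return B_INVOKE_SCHEDULER;
}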

If you feel you really need to make a call that's not on the list, the best thing to do is e-mail me about it.

Here are some insights about what can happen if you don't respect this rule:

  • With interrupt handlers: The scheduler is designed so that there is always one null thread ready to run per CPU, in case all other threads are blocked. If an interrupt handler blocks while it happens to have interrupted a null thread, that null thread won't be ready to run any more -- and the kernel will most likely die a horrible death.

    Also, blocking in interrupt handlers would break the semantics of thread priority. For example, the highest-priority real time thread is guaranteed to run until it blocks. This would no longer be true if it were interrupted and blocked by some interrupt handler.

  • With spinlock sections: Similar to preemption, blocking also causes a deadlock in the classic priority inversion problem. This is the same scenario that was discussed above.

    Another scenario that leads to a deadlock, but involves VM, is this: thread A touches unlocked memory in a spinlock section. It enters VM, which needs to page in the frame. VM calls read() on the disk driver, which programs the IO and waits for the disk interrupt. Then thread B runs and tries to grab the spinlock, with interrupts masked, of course. The processor will never be able to service the disk interrupt. We have a clear deadlock.

    A variation of this scenario possibly applies to a misbehaved IO interrupt handler as well. For example, if the interrupt handler serves a device on PCI (level-sensitive interrupts) and has the same interrupt priority as the disk device (PCI also), then the disk interrupt cannot be propagated to the CPU until the interrupt handler returns, which won't happen until the page fault is processed...

About pairing acquire_spinlock() and release_spinlock().

Spinlocks must be initialized to 0 before they are used. Every call to acquire_spinlock() must be paired with a call to release_spinlock() on the same spinlock. This defines a critical section. If you nest spinlocks, i.e., define critical sections within critical sections, it is essential that the spinlocks are released in the opposite order from which they are acquired.
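
Concretely, a well-formed nested spinlock section looks something like this minimal sketch. The locks and the function are hypothetical; the kernel calls are real.

#include <KernelExport.h>

static spinlock lock_a = 0;     /* spinlocks must start out at 0 */
static spinlock lock_b = 0;

void
touch_shared_state(void)
{
    cpu_status former = disable_interrupts();

    acquire_spinlock(&lock_a);
    acquire_spinlock(&lock_b);      /* nested critical section */

    /* ... touch the data the locks protect; don't block here ... */

    release_spinlock(&lock_b);      /* release in reverse order */
    release_spinlock(&lock_a);

    restore_interrupts(former);
}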

The reason for these rules is that the kernel keeps track of which spinlocks are held and which are being waited on. The logic that does the tracking expects spinlocks to be used in a mutex manner -- or, to put it differently, it expects spinlocks to be used to protect critical sections. This implies that a spinlock must be initialized to 0, that acquiring and releasing it are exactly paired, and that nesting is done in an ordered manner.

You may wonder why the kernel keeps track of spinlocks. On a multiple CPU system, deadlocks can occur under some circumstances involving intercpu interrupts. The kernel has to do the tracking in order to detect and break those deadlocks.

Imagine a dual-CPU system. CPU 0 acquires spinlock S. CPU 1 tries to acquire S, but keeps looping because S is held already. Then CPU 0 sends an intercpu interrupt to CPU 1. This means that CPU 0 signals CPU 1 and waits for the interrupt to be taken. But interrupts are masked on CPU 1 because we are in a critical section. The system is deadlocked. Other, more complex scenarios involving intercpu interrupts, more spinlocks, and/or more CPUs also lead to deadlocks.

Those deadlocks are dealt with in acquire_spinlock(). When a cycle is detected in the dependency graph, acquire_spinlock() "flushes" the intercpu interrupt by executing the service routine itself. This releases the CPU that is waiting for the interrupt to be processed, and in doing so breaks the deadlock.


Be Engineering Insights: Application Server Q&A

By George Hoffman

I have embarked on a bold experiment. I have forsworn all soda (I already abjure coffee), I eat no snacks while programming, and I consume small, balanced meals. The idea is that soon enough I will be fit and trim: a svelte coding Adonis. Of course, my productivity dropped to almost nil for the first few days of my new diet, but I am slowly learning to coax my brain into something resembling functionality without the use of artificial stimulants or buckets of microwave popcorn. It is a liberating experience—I highly recommend it. Unless you have any deadlines coming up soon...

The deadline for this newsletter article crept up on me and pounced like a Stanford senior at Full Moon On the Quad. None of the scraps of sample code I've been working on were ready, and 4.5 was not a heavy feature release, so I had no new features to blab about. This being the case, I was unprepared to discharge my obligation to you, my loyal legions of fanatically devoted readers, with an article of the quality you have come to expect (i.e., good).

My friends, I felt guilt. I felt shame. I very nearly felt a deep sense of despair. I was confronted with the realization that I would have to Let You Down with some goofy article about C++ coding style, of the type Pavel usually subjects you to.

Then I realized, with dawning consciousness, that you ungrateful scrubs never read my newsletter articles anyway! Not a week goes by that I don't read about or hear some developer complaining about a missing feature which I implemented (and documented in a newsletter article!) many months ago. Book 'em! Ignorance of the API is no excuse!

Yeah, yeah, I know it's not in the Be Book yet, punk. Are you here to complain or are you here to code? Read the newsletter. Read the headers. Have some friggin' balls! Slam it on the table, open yer wussie electrocuted-C-mode text editor, and get to work!

So anyway, in my righteous wrath, I decided to use this article to answer some frequently asked questions about the app server and related low-flying phenomena. Feature requests. Bug reports. I'm keeping the communication lines open. I want us to have a trusting, caring relationship.

Q:

How come you don't allow BViews to be "transparent" so they can draw over a background drawn by their parent?

A:

We do. Set the flag B_DRAW_ON_CHILDREN in the parent view, and the parent's drawing will not be clipped to exclude its children. If you also set the child view's color to be B_TRANSPARENT_COLOR, you'll get the effect you're looking for. For more about this, see my previous newsletter articles.
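
To make that concrete, here is a minimal sketch; the class and the rectangles are hypothetical, but B_DRAW_ON_CHILDREN and B_TRANSPARENT_COLOR are the real flags.

#include <View.h>

class ParentView : public BView {
public:
    ParentView(BRect frame);
};

ParentView::ParentView(BRect frame)
    : BView(frame, "parent", B_FOLLOW_ALL,
        B_WILL_DRAW | B_DRAW_ON_CHILDREN)
{
    // B_TRANSPARENT_COLOR keeps the app server from erasing the
    // child's background, so the parent's drawing shows through.
    BView *child = new BView(BRect(10, 10, 100, 100), "child",
        B_FOLLOW_NONE, B_WILL_DRAW);
    child->SetViewColor(B_TRANSPARENT_COLOR);
    AddChild(child);
}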

Q:

Why does the app server redraw all my views when I resize my window, even when they're just moving around in the window and a simple screen-to-screen blit would suffice?

A:

Sorry about that. I'm gonna fix it as soon as I find time (probably not for R5; possibly in the release after that). It will use the view transaction mechanism, so be sure to use that if you want to take advantage of screen-to-screen blit optimization (when it gets implemented) when moving your own views around.

Q:

I have a bug: when I draw into a BBitmap with a BView and then blit the bitmap to the screen, my drawing sometimes doesn't show up.

A:

This "bug" is submitted every few weeks. It is not a bug; the behavior is fully and correctly documented in the Be Book. You are not guaranteed that your drawing will be completed until you call BView::Sync() on the view you were drawing in. So, call Sync() on the view attached to the bitmap before you blit it to the screen (or another bitmap).
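
For example (a minimal sketch; the bounds and the drawing are arbitrary, but the Lock()/Sync()/DrawBitmap() sequence is the documented pattern):

#include <Bitmap.h>
#include <View.h>

// onscreenView is assumed to be a valid, attached view.
void
BlitOffscreen(BView *onscreenView)
{
    BRect bounds(0, 0, 63, 63);
    BBitmap *offscreen = new BBitmap(bounds, B_RGB32, true);
    BView *view = new BView(bounds, "offscreen", B_FOLLOW_NONE, 0);
    offscreen->AddChild(view);

    offscreen->Lock();
    view->FillEllipse(bounds);
    view->Sync();               // make sure the drawing has completed
    offscreen->Unlock();

    onscreenView->DrawBitmap(offscreen, BPoint(0, 0));
    delete offscreen;           // clean up (the bitmap owns the view)
}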

Q:

Can I have Chelsea's phone number?

A:

No.

Q:

When are we gonna get anti-aliased primitives?

A:

Another popular one. There are several challenges to doing anti-aliased primitives well (i.e., fast), and it would require some re-architecting of the rendering pipeline. This is not scheduled right now, so I just don't know when we'll have it.

Q:

Why isn't my CyberVision Wizzbang2300 AGP card supported?

A:

We are producing drivers as fast as our brains and (the more limiting factor) the documentation we are given by hardware companies allow. If you need a driver for a particular card, tell us, but also please contact the hardware company and pressure them to work with us to produce one, or at least to give us the required documentation.

Q:

D00d! It would be so |{-r/\|) 31I+e if u would do themes, and translucent windows, like Enlightenment under X. I use E all the time, and I have this /\\/\/E50/\/\3 theme with like, aliens, and Richard Stallman, and he's like, riding this giant butterfly...

A:

Translucent windows, as they appear on Enlightenment, are more or less a sham, meaning that the effect breaks down as soon as whatever the window is over changes. The same trick can be done under BeOS (with no help from me, I might add), but it's not really all that exciting. (I'm speaking from my experience with Enlightenment, which is several releases out of date, but I doubt the fundamentals have changed. If I'm wrong, and the E guys have managed to make X jump through a hoop that high, kudos to them.)

Similarly, the tricks being pulled in Windows NT 5.0 (with menus that fade in and out, etc.) are not applicable to a real, general purpose solution.

Doing "real" translucent windows on BeOS, that look good, feel good, behave as expected, and do not interfere unduly with performance, is a decidedly nontrivial task. Such a project is not scheduled or close to being scheduled, although some behind-the-scenes work planned for the next few releases will make it easier to do, once it becomes a goal.

Q:

How about themes?

A:

Also not scheduled, although you'll see a bit of work on prettying up the interface for the next release.

Q:

How about nonrectangular windows?

A:

I've had several requests for this and I've done some work on it. It possibly will be in R5.

Q:

The app server is way slow on my machine! When I [do this] and then at the same time [do that], everything slows to a crawl! You/BeOS/the app server/non-Open-Source-OSes suck!

A:

I am very interested in hearing about your performance problems, but in order for me to do anything about them I need to know the details. Submit a specific, detailed bug so that it can be assigned to the right person and reproduced. (Over half the time I see this complaint, it isn't the app server bogging the system down, but some other team.) If the bug just says, "The app server is slow... make it faster," it is useless to us and will be thrown away.

I hope this answers some of your common questions. If you have further questions or concerns, please feel free to contact me (geh@be.com).


Developers' Workshop: One Target Per Invoker Is Not Enough

By Owen Smith

Recently, the DTS oracle has been peppered with questions about messaging. One question that popped up was: "OK, all this sending of messages to one target is well and good, but what if I need to send a message to multiple targets?" This resulted in the rallying cry that I adopted as the title of this article—inspired by the BeBox slogan from a few years ago.

There are several ways to solve this problem, depending on what your requirements are. One way is to keep a list of BMessengers around, one for each target, and when you want to send a message, simply tell each messenger to send the message for you. Of course, it would be nice to encapsulate this functionality in a class, so I have:

<ftp://ftp.be.com/pub/samples/application_kit/MultiInvoker.zip>

This little piece of code implements a class that looks much like a BInvoker, and like BInvoker, it works well as a mix-in class. Its interface differs slightly from BInvoker's, however, because instead of a single target, it maintains a list of targets. Appropriately, when you tell it to Invoke(), it simply runs through the list of targets and sends the message to each one.

Note that you can either add a target using the standard BHandler/BLooper pair of parameters, or you can create a BMessenger dynamically and hand it off to the MultiInvoker (the MultiInvoker deletes the messenger when it's done). The great advantage to the latter approach is that, if you have a specially derived BMessenger (e.g., one that sends messages to a target across a network), you can simply toss one of those puppies to a MultiInvoker and it will Just Work (tm)—an extra piece of flexibility that BInvoker doesn't provide.
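
To give you the flavor, here is a rough sketch of what such a class might look like. The archive above contains the real implementation; the signatures below are only my approximation of it.

#include <List.h>
#include <Message.h>
#include <Messenger.h>

class MultiInvoker {
public:
    MultiInvoker(BMessage *message = NULL) : fMessage(message) {}
    virtual ~MultiInvoker()
    {
        delete fMessage;
        for (int32 i = 0; i < fTargets.CountItems(); i++)
            delete (BMessenger *)fTargets.ItemAt(i);
    }

    void SetMessage(BMessage *message)
        { delete fMessage; fMessage = message; }

    // Add a target the standard way...
    status_t AddTarget(BHandler *handler, BLooper *looper = NULL)
        { return fTargets.AddItem(new BMessenger(handler, looper))
            ? B_OK : B_ERROR; }

    // ...or hand off a (possibly derived) messenger; we own it now.
    status_t AddTarget(BMessenger *messenger)
        { return fTargets.AddItem(messenger) ? B_OK : B_ERROR; }

    virtual status_t Invoke(BMessage *message = NULL)
    {
        BMessage *msg = message ? message : fMessage;
        if (msg == NULL)
            return B_BAD_VALUE;
        // Run through the targets; each messenger delivers the message.
        for (int32 i = 0; i < fTargets.CountItems(); i++)
            ((BMessenger *)fTargets.ItemAt(i))->SendMessage(msg);
        return B_OK;
    }

private:
    BMessage *fMessage;
    BList     fTargets;     // holds BMessenger pointers
};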

This approach works well for the Observer pattern in your code (this terminology comes from "Design Patterns," by Gamma, Helm, Johnson, and Vlissides, one of my all-time favorite books about programming). For example, the sample code archive for this week contains a simple test app with several observers. The application creates an Observable object that derives from MultiInvoker. When the time comes for the observable to broadcast an update, it simply calls Invoke() with a message that describes its state. The messaging mechanism inherent in MultiInvoker makes this approach particularly well suited for multithreaded applications.
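
From the observable's side, the broadcast might look something like this hypothetical sketch (the message constant and field names are mine, not necessarily the sample's):

const uint32 kStateChanged = 'stch';    // hypothetical message constant

class Observable : public MultiInvoker {
public:
    void NotifyObservers()
    {
        // Pack the current state into a message and broadcast it.
        BMessage update(kStateChanged);
        update.AddInt32("value", fValue);
        Invoke(&update);                // inherited from MultiInvoker
    }
private:
    int32 fValue;
};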

One limitation to this approach, however, is that the sender has to know who the targets are. Depending on your situation, you may want flexibility on the receiver's end about who receives the message—for example, if you're sending a message to a Mediator object, and you want the mediator to dispatch the message to the appropriate targets. That limitation can easily be overcome as well, by creatively using BLoopers or BMessageFilters—but that's a topic for another time...


BIT BY BIT: New Column

By Stephen Beaulieu

One of the main tasks of Be's Developer Technical Support Team is to provide sample code to help you (our developers) ship your applications. Our sample code library has grown substantially, but most of the code is in the form of simple applications or utilities that demonstrate interesting features of our APIs. These apps are generally limited in scope; they detail how to perform queries, print, use menus, and so on.

Unfortunately, such code doesn't adequately demonstrate how to write real-world applications that combine many of these areas. Also, parts of our APIs aren't covered by sample code at all, because they mostly become useful as the complexity of an application increases.

The purpose of this new column, "Bit by Bit," is to construct real apps over the course of many articles. Each article will introduce a BeOS programming concept and gradually advance the complexity of the application. This iterative process should let us cover each subject simply but thoroughly. It will help you understand not only the concepts presented, but also how they fit together in the overall structure of the application. The idea is to build an application that does everything the right way.

Next time I'll start our first application -- Rephrase, a text processing application based on BTextView. Rephrase won't be a threat to other word processing apps, which generally implement their own text engines -- but look out, StyledEdit!

The first installment will focus on the fundamentals of the BeOS programming model: the application, the window, and the view.

See you next week!

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.