Be Newsletters - Volume 4: 1999

Issue 4-45, November 10, 1999

Be Engineering Insights: The Scheduler Is Your Friend

By Jean-Baptiste Queru

Buried deep at the very heart of BeOS is a silent agent that rules the traffic inside the computer: the scheduler. The scheduler's job is basically to choose which thread will now run, and this happens whenever some special events trigger. The scheduler is one of the components that enable preemptive multitasking.

Warning

The technique described in this article is an unconventional way of programming (in other words, a hack). While it is believed to be reasonably safe, it has a number of side effects. Using it might make the system behave oddly, and/or exhibit problems in programs that otherwise seemed to work.

Warning

This article only deals with the case of a single CPU system. The behaviour of a multi-CPU system is somewhat more complex and is beyond the scope of this article

When Is the Scheduler Invoked?

There are different events that can cause the scheduler to trigger:

Blocking calls: any function that a thread can call to make its state switch from B_THREAD_RUNNING to one of B_THREAD_SUSPENDED, B_THREAD_WAITING, B_THREAD_RECEIVING and B_THREAD_ASLEEP. Those functions obviously include acquire_sem(), receive_data(), and snooze(), as well as suspend_thread() when a thread calls it with its own thread id. There are also many indirect cases that can cause a thread to block; those cases include kernel calls to wait_for_thread(), read_port() and write_port(). Be aware too that the BeOS virtual memory subsystem is protected by semaphores, which means that almost any instruction can cause a thread to block (e.g., when handling a page fault, when checking parameters for a kernel call, when dealing with areas...)
Synchronous preemption: any function that causes a thread that was blocked to become ready again will also invoke the scheduler. Those functions include resume_thread(), kill_thread(), release_sem(), delete_sem(), send_data(), the death of a thread, and lots of others. There are also many indirect cases that can cause a synchronous preemption, including calls to read_port() and write_port(), and any case where a thread exits the virtual memory subsystem.
Asynchronous preemption: whenever a piece of hardware requires some attention from a CPU, it triggers an interrupt which causes the CPU to suddenly jump to another piece of code (called an interrupt handler). I'm not going to get into the details of programming an interrupt handler, but it's important to know that every interrupt handler has a chance to invoke the scheduler. One special kind of interrupt is very important: the timer interrupt. The CPU can use it to be warned after a given amount of time has elapsed. When that happens, the interrupt handler for the timer invokes the scheduler (that's actually all it does). That's the way you create preemptive multitasking. The scheduler uses the timer to be warned when a snooze() (or any other blocking call that has a timeout) expires, and to make sure that it gets invoked on a regular basis. On BeOS R4.5.x, the scheduler is guaranteed to run at least once every 3 ms. This amount of time is called the scheduler quantum.

How Does the Scheduler Choose Which Thread Is Going to Run?

Each time the scheduler is invoked, it chooses which is the next thread that's going to run. The scheduler can (obviously) only pick a thread that's ready to run. The scheduler uses an algorithm to pick that thread, depending on the priorities of all the threads that are ready to run. That algorithm is quite simple (and mostly explained in the Be Book):

If a real-time thread (i.e., a thread with a priority greater than 100) is ready to run, this thread is automatically scheduled. If there are several such threads, the one with the highest priority is scheduled.
If there is no real-time thread ready to run, the scheduler chooses a time-sharing thread (i.e., a thread with a priority between 1 and 99). The likeliness that such a thread be scheduled increases with a factor of two for such unit of priority.
The kernel keeps some special threads around, called idle threads, that are scheduled whenever no other thread is ready. Since those idle thread never block, the scheduler is guaranteed to always be able to schedule a thread.

What About Full-Screen Games Running at 60 fps?

First, why would someone ask this question? Let's look at the way a game typically works:

get input from the user
render a new image
wait until the vertical retrace and display the newly computed image
loop

and it's usually quite tricky (if not impossible) to know anything that's going to happen in the future.

Let's imagine that the game is single-threaded, running on a single-CPU system, at priority 20, and never blocks. The duration of a frame is 16.6 ms, which is approximately 5.5 scheduler quanta. Assuming nothing else happens, the system will (on average) reschedule five times in the middle of each frame. Let's now assume that another thread is running at priority 10, and that this thread never blocks either.

The difference between the priorities in 10, thus the "other" thread will be scheduled (on average) once every 1025 times, and will consume (on average) 0.1% of the CPU time. I see you think "Why do I care about 0.1%?". You care about 0.1%, because the scheduler will not spread the 0.1%s evenly across all the frames. This will occur (on average) once every 205 frames, which means that, once every 3 or 4 seconds, the game will only have 80% of the CPU power available. This sudden loss of 20% of the CPU power can be enough to make the game glitch.

So, What's the Fix?

Let's now imagine that we have the same game running at priority 20, and the annoying thread running at priority 10, but let's also imagine that the scheduler triggers every 100 us (microseconds). The scheduler will trigger on average 166 times per frame. The law of probabilities tell us that 85% of the time, the annoying thread never gets schedules during a frame. It gets scheduled once for 13.8% of the frames, twice for 1.1% of the frames, three times for 0.06% of the frames, and so on.

Overall, it means that the annoying thread will use 1/166 of the time in 13.8% of the frames, 2/166 of the time in 1.1% of the frames, and so on, which creates a total of (on average) the same 0.1% as we had before. So what changed? Well, probabilities tell us that the annoying thread will take less than 2% of the CPU time in more than 99.998% of the frames.

Wow, that's great, so why doesn't BeOS automatically reschedule every 100 us? Well, BeOS is optimized for multithreading, and rescheduling does not cost that much, but it doesn't come for free either. Rescheduling every 100 us costs several percent of the CPU power on a PII-350.

So, what's the point of talking about something that doesn't exist? If we look at the way the scheduler works, there's a way to trigger it a lot more often. If a real-time thread does something like while (1) { snooze(100); }, this thread will call snooze, which will invoke the scheduler, and the scheduler will schedule some other thread. After 100 us the real-time thread will be ready again. This will invoke the scheduler, which will schedule our real-time thread, which will loop back into snooze, which will invoke the scheduler again, and so on...

What About Non-Interactive Animations Running at 60 fps?

This case is very different. Since the animations are not interactive, the different frames can sometimes be buffered ahead so that the impact of losing 20% of the CPU time in a frame is not that important. Probabilities tell us that a program that uses 90% of the total CPU power and can buffer ahead 1/3 of its computation can resist two reschedules of another thread in two frames, which (on average) happens less than once every 10 minutes.

What Priorities Should I Use for a Full-Screen App?

Some people have been talking about using real-time priorities for tasks that require lots of CPU power. This is a very bad idea. One uses real-time priorities for tasks that do not use much (if any) CPU power, but that need to be rescheduled extremely quickly. A task that uses all available CPU power should not run at high priorities. You want the user to be able to switch to other applications and have those applications stay reasonably responsive.

Here's a rule of thumb: bumping the priority of a thread that already uses most of the CPU will not make it run much faster, but will make other tasks run much slower. The priority of a thread running a CPU-heavy task should be related to how long that thread is going to run:

If the thread is doing a task that the user is not going to wait for (e.g., something that might take several hours or several days), use the lowest possible priority (1).
If the thread is doing something that's still long, but short long enough so that the user might want to use the computer while waiting (e.g., something that lasts for a few minutes), use B_LOW_PRIORITY (or B_NORMAL_PRIORITY if the task doesn't take more than a minute).
If the thread is doing something related to direct user interaction through a regular user interface (e.g., if the thread redraws the preview of a filter in an image-procesing program while the user moves a slider), use B_NORMAL_PRIORITY if the task lasts more than a second, or B_DISPLAY_PRIORITY if the computation only takes a fraction of a second.
If the thread is doing some full-screen display, you can use B_DISPLAY_PRIORITY. B_URGENT_DISPLAY_PRIORITY should only be used for critical parts of a full-screen application (e.g., the game engine of an application might be running at B_URGENT_DISPLAY_PRIORITY, while the display threads should only run at B_DISPLAY_PRIORITY).

How Does All This Work in the Real World?

Well, here are some results. I wrote three tiny programs. The first just looks at itself while it runs: it keeps track of how long it took it between two calls to system time(). The second is one of those annoying programs: it just spins in a loop without ever blocking. The third spawns a real-time thread that causes the scheduler to be invoked more often.

The test program (the first one) was run at priority 20, the annoying program ran at priority 10, ans the rescheduler (the third one) runs a snooze(125); in a loop.

Here's a summary of the results: The left column counts how many times the test program was interrupted for more than 256 us. The right column shows the longest time the test program was interrupted. In all the cases, the test program took 256 million samples. The "bare-bones" system is a system with only the kernel and the app server. The "normal" system is a system where all the OS services are running, plus a dozen applications.

bare-bones                          35            3060
bare-bones, with annoying program  136            3219
bare-bones, with annoying
program and rescheduler            365             459
normal                             702            4776
normal, with annoying program     1139            6127
normal, with annoying program
and rescheduler                   2970             838

Two conclusions can be drawn from those results:

Even on a bare-bones system, a thread with a high priority can be interrupted for several milliseconds. It is useless to try to turn down OS services hoping that it fixes the scheduling gaps, because it doesn't.
Even on a moderately loaded system, using a rescheduler can help to spread the CPU load more evenly between tasks, at the cost of a bigger overhead (which you cannot see on my reasult, but can definitely be measured).

Developers' Workshop: The Ying to the Server's Yang

By Eric Shepherd

In my last article for the newsletter, I presented a simple network server program that would respond to connections by spewing back the entire contents of a file, with a very simple header.

Developers' Workshop: Serving It Up: Creating a Simple Server Application

I received a few emails thanking me for this example, but a couple of them suggested that I should also show the other side of the coin: how about a client designed to access this server?

So this week we'll have a look at Requester, the counterpart to the Responder program created in my last article.

Before we begin, I should point out a trio of matching silly bugs in Responder:

connect->Send(&buffer, count);

ssize_t size = file.Read(&buffer, 65536);
connect->Send(&buffer, size);

On these three lines in the send_file() function, the "&" is unnecessary (and is in fact a major bug). The code worked, but was stomping loudly through memory. Just take those "&" characters out and everything will be much better.

OK, on to the Requester program:

#include <NetAddress.h>
#include <NetEndpoint.h>
#include <File.h>
#include <stdio.h>
#include <socket.h>
#include <OS.h>
#include <stdlib.h>


static status_t request(char *host, char *filename);

int main(int argc, char *argv[]) {
    char *filename;
    char *host;
    status_t err;

    if (argc != 3) {
        printf("usage: requester <host> <filename>\n");
        return 1;
    }

    host = argv[1];
    filename = argv[2];

    err = request(host, filename);

    if (err == B_OK) {
        printf("File received.\n");
    } else {
        printf("An error occurred receiving the file.\n");
    }

    return 0;
}

The primary purpose to main(), as in Responder, is to parse the command line and dispatch control. As a client, we won't bother to spawn any additional threads. We just grab the first argument as the hostname to connect to and the second as the filename with which we'll save the retrieved file. We then call request() to actually handle the networking.

status_t request(char *host, char *filename) {
    BNetEndpoint server;
    status_t err;
    bool first_buffer = true;
    uint8 buffer[65536];
    uint8 *bufstart;
    int32 count;
    char header[32];
    off_t filesize;
    BFile file;

    if (server.InitCheck() < B_OK) {
        return B_ERROR;
    }

As in Responder, we start by instantiating a BNetEndpoint and then we check to ensure that it's valid by calling InitCheck(). If it's invalid, we return an error.

    err = server.Connect(host, 4242); // Connect to server
    if (err < B_OK) {
        goto bail;
    }

Then we connect to the host, whose name was specifed by the user on the command line. If this fails, we use the oft-debased but occasionally handy goto statement to bail out of the request() function.

After this, the loop to receive the file begins, since immediately after we connect to the server, we'll begin receiving data. We use a while loop to continue to receive until there's no more data. Since there's a header that needs to be parsed, we special-case the first buffer using the first buffer flag. Once we've received the first buffer, we clear the flag so we won't look for the buffer anymore.

    while ((count = server.Receive(buffer, 65536)) > 0) {
        if (first_buffer) {
            int i;

            first_buffer = false;    // Don't do this again
            for (i=0; i<32; i++) {
                if (buffer[i] != '\n') {
                    header[i] = buffer[i];
                }
                else {
                    header[i] = 0;
                    break;
                }
            }
            // Got the header.  Parse it.
            if (!strcmp(header, "ERROR")) {
                err = B_ERROR;
                goto bail;
            }
            filesize = atoll(header);
            if (!filesize) {
                err = B_ERROR;
                goto bail;
            }
            i++;
            bufstart = &(buffer[i]);
            count -= i;
            file.SetTo(filename,
            B_CREATE_FILE|B_WRITE_ONLY);
            if ((err = file.InitCheck()) < B_OK) {
                goto bail;
            }
        }
        else {
            bufstart = buffer;
        }

If the first_buffer flag is true, we begin by clearing it so we don't try to parse headers out of future blocks. Then we copy the header into the header string. Since the header is always a one-line string (either the file size in bytes or "ERROR" followed by a newline), this is easy to do.

Once the header has been grabbed, we check to see if it's "ERROR". If so, we return B_ERROR to the caller. Otherwise, we use atoll() to get the file's size from the header. Then we set bufstart, our buffer pointer, to point past the header, and subtract the appropriate number of bytes from the buffer size in count.

Then the file is opened; if this fails, we return the appropriate error.

If the buffer isn't the first, we simply set the bufstart pointer to the beginning of the buffer, since we want to process all of it.

        err = file.Write(bufstart, count);
        if (err < B_OK) {
            goto bail;
        }
    }

    err = B_OK;

bail:
    server.Close();
    return err;
}

Then we write the buffer to the file, starting at offset bufcount, and writing count bytes. If an error occurs, we bail out. Otherwise we continue looping until Receive() returns 0, meaning the connection has closed. We then close the BNetEndpoint and return.

When you build this code, don't forget to link against libnetapi.so.

This working client shares limitations with the Responder server—it sets the filename of the retrieved file to the name the user specifies on the command line (the server doesn't tell the client what to call the file), and file attributes aren't maintained. Only the file's data is transmitted. If you transmit an application, you'll have to chmod +x the file before you can run it.

All Pros and No Cons? It's a Con...

By Jean-Louis Gassée

When an idea, a proposition, a cause is presented to me in terms that leave me no alternative but to be for it, because it's all pros and no cons, then I know I'm being conned. How can I be against freedom? How can I be against innovation? How can I be against freedom to innovate? So, when I hear Microsoft's carefully orchestrated refrain—all we ask is the freedom to innovate, we'll never renounce our freedom to innovate -- being sung by top executives and paid consultants, I wonder. Do they really believe this? Or is it a calculated bet on what our emotional response to an appeal to higher principles will be?

Last Friday, November 5th, Judge Jackson gave his decoding of Microsoft's refrain. His finding of facts concludes that Microsoft is a monopoly. To quote his concluding paragraph:

... Microsoft has demonstrated that it will use its prodigious market power and immense profits to harm any firm that insists on pursuing initiatives that could intensify competition against one of Microsoft's core products. Microsoft's past success in hurting such companies and stifling innovation deters investment in technologies and businesses that exhibit the potential to threaten Microsoft. The ultimate result is that some innovations that would truly benefit consumers never occur for the sole reason that they do not coincide with Microsoft's self-interest.

The entire document is at <http://usvms.gpo.gov/findfact.htm>. It's long (412 paragraphs), but worth reading. It's written in plain English and takes pains to establish definitions for products and markets in order to create a structure for the findings.

The net effect of the finding is troubling. Even for someone as ambivalent (as in having truly mixed feelings) as yours truly towards Microsoft, my emotions range from surprise to sadness, laced with the expected irritation. Surprise, because I didn't expect the breadth and depth of abuse of power discussed by Judge Jackson's. From large companies such as IBM or Intel, to ISVs, service providers, and tiny start-ups such as ours, no one seems to escape Microsoft's vigilance. "Most harmful of all is the message that Microsoft's actions have conveyed to every enterprise with the potential to innovate in the computer industry", concludes the Judge.

My sadness at the language and conclusions of the finding comes from looking at the impenetrable wall of paranoia and misunderstanding. I'm reminded of a six-hour conversation I had with Bill Gates at a PC Forum conference early in 1988. He complained repeatedly that Apple didn't trust him, and nothing I said could help him see why his actions and words created fear and distrust.

A decade later, I've personally seen grown men fear for their company and themselves at the thought of incurring Bill's wrath and Microsoft's retaliation if they didn't "behave." DOJ officials spoke bitterly about PC makers' unwillingness to come forward to testify about Microsoft's business practices. Perhaps, but they would have done so at great potential cost. I'm in no position to blame them; I declined to testify because it might have interfered with other projects that were then pending at our company and also because testifying would have cost us hundreds of thousand of dollars in legal fees. We couldn't take those risks.

Still on the topic of the wall of misunderstanding, there is Microsoft's "what's mine is mine, what's yours is negotiable" attitude. Bill Neukom, the company's chief legal eagle, takes the position that "It's our song" when asked about the restrictions placed on Windows licensees. It's our song and we're free to dictate the way it'll be played on PCs. This is a strange metaphor. If I pay the royalties, I'm free to sing "Let it Be" in any key I want, preceded and followed by whatever act I fancy.

Returning to Microsoft's licensing, they used it as a club to prevent potentially errant PC makers from making BeOS visible to their customers. It "works" like this: You can use Microsoft's boot manager to load any number of OSes, as long as they're made by Microsoft. If you (the PC OEM, Windows licensee) use a non-Microsoft boot loader, you cannot use it to load Windows. If you do, you're in violation of the license and you could lose it—and your business.

That's how Microsoft prevented Hitachi from visibly offering Windows and BeOS at boot time. BeOS was loaded on the hard disk, but the customer didn't see the choice at boot time. You'd have to read complicated instructions to set up a dual-boot situation yourself. A neat legal trick that effectively prevents PC OEMs from offering a genuine dual-boot situation featuring both Windows and BeOS.

In that context, when I'm asked what Judge Jackson's ruling changes for us, I have to say that in the short-term, materially, not much. Psychologically, however, it's different. The ruling could open minds and, perhaps, doors. By shedding light on Microsoft's practices, the finding of fact might limit the company's ability to leverage its dominance of the PC market into the emerging Web appliance sector. As for specific remedies, they constitute a complicated, controversial topic. Fortunately, I found one well-written survey of this issue in this week's Time magazine (dated November 15th), shorter than Judge Jackson's document but balanced and complete nonetheless.