We all hate it when a piece of software crashes or misbehaves. Yet, it's all too easy to write software that's fragile or "delicate." If you'll allow me to step up to the white board for a few minutes, I'd like to share a few thoughts about how to avoid some common coding problems.
First and foremost: Be prepared to deal with a nonstatic world. The possibility that an event might interrupt the normal flow of your program is something you have to accommodate. That file you're reading may have just been deleted by the user, or the data could have been overwritten. Servers have to pay particular attention: The client you (the server) were talking to just a moment ago may have died in the last second. This can result in partially written data, responses that never get to the intended recipient, half-completed transactions, and so on. When a program starts up, it should verify that the files and operations it wants to perform make sense, and that they will operate on valid data. A server mustn't hang when a client disappears; the integrity of the user's data must be preserved at all costs.
Your program wants to read from a file. Can you assume that just because the file exists that its data is valid? Certainly not. So what do you check while reading? If your program writes files that it will read later, then you can control the format in a known way: For example, you can put magic values in the file to help detect file corruption, or structure the file so each section is self-describing. But if you're looking at a "generic" file, be particularly careful while reading it in. Would your program crash if it read a file that went through an ftp-induced CR/CRLF conversion? Hopefully not. Keep in mind that many types of file corruption errors are accidental. Many users can trash files without ever realizing it. To them, it's your fault.
Now your program wants to write some data to the disk. What sorts of problems can you run into? You could run out of disk space, the disk could catch on fire, the directory you were planning to save in may no longer exist... If the size of the data you're writing isn't prohibitive, it's wise to save to a temporary file and then, once you're sure the data has been completely and safely written, rename the file to the name requested by the user. This technique prevents the user from losing data should the disk become full or catch on fire while you're saving.
Because DR9 is much better about dealing with out-of-disk- space errors, the POSIX error value ENOSPC is very real and must be dealt with. Running out of disk space should not be a catastrophic event to an application. The free disk situation may change very quickly, so it's best if an application can fail gracefully and allow the user to try again when disk space is available.
Applications must also be able to deal with changes to the underlying file system layout while they're running. On a system like the BeOS, it's natural to flip back to the Tracker (nee Browser) to do some file operations and then flip back to using an application. This can present some difficulties for an application, but again, dealing with the disappearance of files and directories is simply a fact of life. An error returned from trying to save something in a particular directory should bring up another Save panel or an alert panel indicating the error—it shouldn't crash the application.
To begin making a server robust, consider the ownership of the ports that it uses. If the client owns the port that it reads from and the server owns the port that the client writes to, then if either the client or the server dies, the other can be made aware of the situation and clean up accordingly. This works (in the BeOS) because when a team dies, any ports it owns are deleted, and an error will be returned by the next read or write on the surviving port. This is very important, but it's easy to overlook (until recently even some of the homegrown Be servers didn't handle this properly).
Another aspect of a server's job that can cause real headaches is that it might not have much control over the quality of its clients: For every well-behaved client program, there are any number that are poorly written or possibly even intentionally malicious. A server should always validate what has been sent to it by a client. A bug in the client may cause improper data to be sent; a malicious client may try to trick a server by giving it incorrect information. All fields and arguments passed to a server from a client should be validated. Again, magic values in structures can sometimes help as a first line of defense, but range checking all values is also important (does a bitmap width of 2147483648 make sense?).
Shared memory is a great feature of the BeOS. It can make the sharing of large amounts of data between two applications very efficient. However, shared memory should only be used to share *data*, not to *control information*. If you place control information in a shared area, one of the sharing applications could accidentally write over it. It's much safer to pass control information through a sideband channel, such as a port or a pipe.
Another common error that programmers (especially UNIX programmers) make
is that they take the presence of virtual memory as a guarantee that all
memory allocation requests will always succeed. Virtual memory *is not*
infinite. It seems almost schoolmarmish of me to have to remind people to
check the return of malloc()
and new
...
but you absolutely must.
Part of our stress testing here at Be involves pushing a machine to its
limits by running out of memory and disk space at the same time.
Applications must be able to deal with these situations and fail as
gracefully as possible. And by the way, calling exit()
when malloc()
fails is not considered "graceful."
I could go on, but it would be impossible for me to enumerate all of the possible problems that an application could run into. My aim with this article is to raise the consciousness of developers so that you'll think about these issues when developing applications. If applications are written to deal with inevitable errors and anomalies, then users will develop more confidence in the application and in the system as a whole.
So when babies are first born, they have a basic set of immunities that they inherit from their mothers. They don't really get sick in the first couple of months, all screaming and crying aside. Then slowly over time they build up their own immune system and get colds with the rest of us.
When Yasmin approached her first winter, our doctor said we could expect her to get sick every six weeks or so!! Oh boy, what will this do to my productivity. Well, as I have said in the past, I practiced for fatherhood by learning to program quickly, without error, and optimally... At least to my best effort.
Last week Yasmin was right on time for an illness. Funny thing for kids and viruses. They fight them by raising their body temperatures to ungodly levels to burn them out. Then the fever breaks and they're back to running around and bouncing on you. So I was left at home one day with her sleeping on my chest while I sat on the couch doing nothing... except thinking about the programming that needed to get done, sick child or not.
In such situations, I thank my lucky stars that I get to program on the BeOS. It's light, easy, and less filling. It allows for amazing feats, and you can't even see my lips moving! I decided that for my one-day task I would take a whack at the VideoPlayer code that we've been giving out with our Hauppauge WinCast driver.
The Hauppauge WinCast/TV card made for successful demos at Macworld. We subsequently gave the driver to some folks and the header and sample code to others so they could write their own video applications. You must understand, Steve Sakoman, the writer of the driver, is a hardware guy. He's also responsible for our FireWire driver and a number of other things, like early BeBox™ designs. His sample software, although very decent, doesn't quite have the OO and wasteful polish that developers such as myself like to make money off of.
So, I took his VideoPlayer code, tweaked it, simplified, limited, twisted, renamed, and called it my own.
ftp://ftp.be.com/pub/Samples/vidview.tgz
This sample code and included driver will allow you to use the WinCast board to display some live video in a window on the screen. The code that deals with the video stream is fairly isolated and can be replaced with other video sources fairly simply, but that's for another time. The package includes the latest driver and borrows some code from the QCamView3 program.
The structure of the application is very simple. A thread basically goes to the video source and asks for a frame, which is a pointer to a BBitmap. It then displays it in a view, and asks for the next frame. It shouldn't be too hard to imagine that this GetNextFrame() method could easily be replaced by anything pretending to be a video source. QuickCam, WinCast, MPEG. It shouldn't really matter. Perhaps these are just add-ons that the user can use to select the source at run-time. I don't know.
A few months ago we were a multimedia platform with absolutely no support for any video hardware. Then along came the QuickCam driver, and then the more impressive WinCast driver. FireWire is knocking on the door, and things are looking up.
And so. I've mentioned a couple of times the various forums available to the Be developer community for communications. There are the news groups, comp.sys.be.*, there's our own BeDevTalk, and now there are bugs on-line! In case you haven't noticed, we're publishing our bugs on-line so that developers don't have to go wondering whether something they're running into is a known problem.
This should be a welcome sight to many developers. In conjunction with the BeDevTalk list, it gives developers plenty of resources from which to draw knowledge for development. It appears that some of our developers have taken to making the BeDevTalk list a little more useful than it currently is. You can find a nice web archive of the mailing list at:
http://www.bespecific.com
Once again, I'll be the first to admit that I have an association with this company, but when they do good work, they deserve to be pointed out as much as anyone else. I think this archive is a very useful tool that makes the BeDevTalk list that much more valuable and usable as a knowledge base.
In the past week we put out quite a few messages about the May Developers' Conference, master's program, DR8.3 release, and other things. If you're a registered developer, you'll want to make sure you to update your record so that we know you're still alive and interested in being in our developers' program.
So, here we are. DR8.3 has rescued us from the brink of the April 1 bomb. DR9 will be later than expected. There are more new applications every day, the sun is shining, Yasmin is now well again, and I'm proud to be a part of one of the most exciting things since sliced silicon.
At first, it looks like a strange question. By now we ought to know, right? Yet the question was given a new life by recent events as well as by animated discussions on the net.
The events I refer to are our negotiations with Apple and our exit from the hardware business. As for the discussions on the net, they're voluminous, and the tone ranges from "You're great!" to "Who do you think you are?" We appreciate the compliments and our gratitude goes to the critics: They keep our enthusiasm under control and remind us of our mortality. In any case, it's not a good idea to assume everyone knows or agrees with the answer, nor that it is given once and for all. Put another way, if the answer changes all the time, it's too tactical (I'm being very polite). If it never changes, it's too abstract. You're dead, or soon will be.
So what do we live for, why do we do what we do? Is the answer money, ego, because we can, the technical challenge, fun, or dissing our elders? Certainly a combination of all of these.
Let's start with the money. When I joined HP in June 1968, I was horrified to see profit posted as HP's number one objective in new employee induction material. The 1968 student riots had just ended, and I came from a rather leftist Paris University background. The HR Director, when I confronted him with this gross expression of naked capitalism, was kind enough not to fire me on the spot. I've since mended my ways.
We do what we do to make money. If we do a good job, software developers will make money, our OEMs and other business partners will make money, our investors will make a killing, and we ourselves, lifted by the tide, will pay off our mortgages, send our kids to college, or be able to afford similar end-of-the-century extravaganzas. If we don't make money, nothing else will matter. I write this not because I've become a capitalist trapped inside the body of a Frenchman, but because I've learned from painful experience what happens when the money motive takes the back seat to loftier goals.
What about the ego? Of course it matters; everyone has one, even Mother Theresa. And hers must be pretty healthy to have sustained her and driven her to do all the good she has done. It's how we feed our egos that matters. My hope is that we'll feed our egos by doing good work, building something people enjoy, and serving our partners. They'll be the judges.
Because we can? Usually this is meant as doing something for its own sake, with the expressed criticism that the goals are inward-looking as opposed, for instance, to market driven. It's sometimes expressed as, "Build it and they will come." We used to get this kind of criticism during the formative years of the company. It was an honest disagreement, but it hurt our fund-raising activities. Some people felt there was no room for a new OS. We believed we could build one and find room for it. We knew the legacy platforms were building more and more baggage, more layers of software silt over time. And we saw we could avoid competing with the entrenched leaders in the mature office productivity markets. Instead, we'd focus on the emerging digital media applications, where no giant predators would automatically eat a fledgling start-up alive. We saw how to tilt the legacy versus baggage balance in our favor.
And the technical challenge? Yes. It's an important factor. You can't run the kind of marathon race we're running if you're not excited by the sport. It's interesting, it's fun, it's motivating. What better reward than to see others excited by your work? If you meet a knee surgeon and all he talks about over dinner is knee surgery, if he thinks this is the most entrancing activity in the world, you'll probably think he's "different." And I'll call him when my knee acts up.
As for dissing our elders, I'd rather acknowledge our debt to them for the good and bad examples they set for us, for the learning experience we all gained. But it would be dangerously dishonest if we kidded ourselves and didn't acknowledge a sense of rivalry as well. Age-old mixed feelings.
Does this tell the whole story? Certainly not. Group dynamics, human frailties and virtues, ironic luck are just as important as my limited view of what moves us. And, in the end, we'd like our product to do the talking for us. For that, we still have a lot to do.
BeDevTalk is an unmonitored discussion group in which technical information is shared by Be developers and interested parties. In this column, we summarize some of the active threads, listed by their subject lines as they appear, verbatim, in the mail.
To subscribe to BeDevTalk, visit the mailing list page on our web site: http://www.be.com/aboutbe/mailinglists.html.
What's a good language/environment for a scriptable installer? Some suggestions: FORTH, Java, Perl, Lisp (the world's only civilized computer language). Java was generally accepted as the most generally acceptable, although not without some specific objections.
Language aside, should the script that drives the installer be binary or human readable? Some folks would like to read the script before running it; there's some feeling that a binary could be a Trojan horse. This was countered by the reminder that the application that's being installed is itself binary—why trust the app but distrust the installer?
The desirability of embedded installers was also debated. Fred Fish:
“I would prefer there be one installer application [and interpreter] that lives somewhere on the system...[A] package being distributed [should] include only the installer script...”
But according to Michael Bayne:
“If I am going to distribute an application ...I have to *guarantee* that the user can install it. Short of an installer provided by Be, there is no other way to guarantee the existence of such a program without bundling it. Nothing less than a 100% guarantee will suffice.”
Is the file system block size (in DR9) tied to the size of the hard disk? Is there a performance hit for small block sizes? How much disk space is eaten for overhead?
THE BE LINE: File system block size is tunable (per system): 512K to 8K. As for the performance of various sizes of blocks, it depends on what you're doing. If you're streaming huge amounts of data from disk, then big blocks are better. But remember, speed isn't always everything --big blocks waste space if you have a lot of little files.
Dominic Giampaolo offered this testimonial regarding overhead:
“I just created a file system on a 4 gig disk and there is about 1 megabyte used for the initial data structures laid down on the disk (i.e. you have 4 gig minus 1 meg available for user data).”
UI discussion: Should focus follow the mouse? Is auto-raise (where the window under the cursor comes to the front) evil? Is the inability to type in a nonraised window lamentable? How about some window-management tools?
In this thread, which sprouted from the side of "Mouse focus...," Talin offered a number of interesting UI "studies show" thoughts:
“Menus are more efficient when the mouse pointer is constrained. The simplest example is having the menu bar at the top of the screen—you can 'slam' the mouse pointer against the top of the screen without having to gauge the distance that you are traveling.”
“Studies have shown that the time it takes to hit a target with a mouse increases with the distance of mouse travel and decreases with the size of the target. Constraining the mouse increases the effective target area.”
“Another innovative menu structure is that of 'radial menus.' These are pop-up menus that display menus arranged radially around the current mouse position. It turns out that people are _much_ better at remembering angles than they are at remembering distances.”
Most of these points were granted by our listeners—but some folks questioned the basic premise: Is the mouse really such a great input device? Can a user move faster and with greater accuracy by using keyboard equivalents?
We heard (and debated) the results of some homegrown experiments.
An initial request for distinct icons for individual files led to a discussion of the DR9 MIME type implementation and use. In addition to file extension-to-MIME mapping, some folks would like the system to map from UNIX-style magic numbers, regexp hits, and so on. This could, it was thought, be provided by third parties if MIME recognition machinery could be complemented by add-ons.
Adam Hinkley wrote in to testify his dissatisfaction with flow-controlled data-sending functions that packetize the input buffer; he finds that this leads to partial data transmission. His solution:
“[H]ave a function that returns how much data in the internal buffers has not yet been sent. [T]he app can watch how much data is waiting to be sent, and when it drops below, say 500K, call the Send function to queue up some more data.”
It was suggested that this sort of mechanism is better left to individual programs (and a code example that loops over send() was given)—it shouldn't be a default mode of the system. Mr. Hinkley countered that the synchronous send() is wasteful—you could be doing something better with your time than waiting for send() to return. But some feel blocking send() is a good thing because it consumes less CPU than does a busy-wait (or poll). If you want to do something else while waiting, you can...
“...spawn a child thread. That's what they exist for.” [Osma Ahvenlampi]
Mr. Hinkley offered a (real world) example in which a server is talking to a large number of clients (he proposed 100). If you spawn a thread for each connection, you'll kill the system. Not so, responded many listeners: 100+ threads in the BeOS is hardly unusual.
THE BE LINE: We won't force anyone to use threads.