Perhaps you've heard: There's a complete rewrite of the Be file system going on. In last week's Engineering Insights column, Dominic Giampaolo described how Be files are organized and represented in the new system and how database features are merged into this representation. In other words, last week we saw the "file" part of the file system; this week, I'm going to show you the "system."
If we zoom out a bit, we'll see that the Be file system proper is actually just one implementation of the more general Be file system protocol (FSP).
The file system protocol is an API for a set of common file operations (open, read, write, and so on). To support a particular "brand" of file system, one simply implements the FSP API to do whatever the file system expects to be done.
In addition to our native file system, we've implemented (or are implementing) this protocol for DOS, HFS, NFS, and ISO9660. When you (the developer or user) want to talk to a file that resides on a DOS volume, the kernel simply loads the DOS FSP add-on, just as it would load an add-on to talk to a specific graphics card or other device. In fact, implementing an FSP is mostly an exercise in writing a kernel add-on: The type of work that's required is comparable to writing a new device driver. It's not for the hobbyist, but it's not terribly difficult (unless, of course, you care about efficiency, reliability, and integrity).
The communication between the kernel and a file system (that is, an FSP add-on) is "message-based." When a program makes a file system call, such as open(), read(), or mkdir(), the kernel maps the call to a specific FSP message and then "sends" the message to the file system. The part of the kernel that provides this mapping is called the file system independent layer (FSIL).
Currently, FSP messages are simply function calls; thus, for example, when your program calls open(), the FSIL maps the call to (and invokes) an FSP function called op_open(). This function-call-as-message interface works well with add-ons, but we also want to extend the interface to make it more flexible. For example, we plan to allow user-level programs to act as file systems. Communication, in this case, could be through a pipe or port.
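To make the function-call-as-message idea concrete, here is a minimal sketch in C. The names fs_ops, null_fs, and fsil_open are hypothetical, invented for illustration; only the shape of the open and close signatures follows the FSP prototypes given in this article. It shows one plausible way an independent layer could "send" a message by calling through a table of function pointers; the real FSIL may well be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation table: one function pointer per FSP "message". */
typedef struct fs_ops {
    int (*open)(void *ns, void *node, int omode, void **cookie);
    int (*close)(void *ns, void *node, void *cookie);
} fs_ops;

/* A do-nothing file system that only counts how often it was opened. */
int null_fs_open_count = 0;

int null_fs_open(void *ns, void *node, int omode, void **cookie)
{
    (void)ns; (void)node; (void)omode;
    null_fs_open_count++;
    *cookie = NULL;               /* this toy keeps no per-open state */
    return 0;
}

int null_fs_close(void *ns, void *node, void *cookie)
{
    (void)ns; (void)node; (void)cookie;
    return 0;
}

const fs_ops null_fs = { null_fs_open, null_fs_close };

/* Hypothetical FSIL entry point: "sending" the open message is just an
   indirect call through the table of the file system that owns the node. */
int fsil_open(const fs_ops *fs, void *ns, void *node, int omode, void **cookie)
{
    assert(cookie != NULL);
    if (fs == NULL || fs->open == NULL)
        return -1;                /* nobody can handle this message */
    return fs->open(ns, node, omode, cookie);
}
```

Because the caller only ever sees the table, the same fsil_open() could later hand the message to a stub that forwards it over a pipe or port to a user-level program.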
While designing the file system protocol and the file system independent layer, we had three goals:
Parallelism. We wanted a given file system to be able to handle more than one FSP message at a time—it should be able to delete file A while writing to file B. To aid in this, we needed to design the FSIL such that it did as little serialization (of FSP messages) as possible.
Of course, some operations need to be serialized in order to maintain structural consistency and re-entrancy; but only the file system implementation (at the FSP level) should know which calls need to be serialized. The FSIL shouldn't make any assumptions. For example, imagine what would happen if the FSIL assumed that reading and closing a file were mutually exclusive operations that needed to be serialized: In other words, if you tried to close a file that you were reading, the FSIL would hold the close() message until the read() completed. This is fine for normal files—but it doesn't work for a UNIX-style pipe in which read() is often blocked waiting for more data to show up. If the FSIL withheld the close() call while the read() call was blocked, your application (and, possibly, the entire file system) would deadlock.
Re-entrancy. The FSIL must be thread-safe. Thread safety applies most notably to the identification of "nodes" (files or directories). How does the FSIL tell a file system which node an operation is meant to apply to? Names can't be used since there's no guarantee that the name will stay the same: For example, some other thread might rename the file we're currently reading—and we don't want our read to fail simply because the open file has been renamed! We could serialize all the calls to a particular node, but that violates our goal of maximum parallelism.
Instead, the FSIL maintains a "node structure" that's independent of the node name. This structure is passed with every file system operation. When can the FSIL free a node structure? It isn't safe to do it after deleting a node because some other thread may still hold the node open.
The solution we came up with is to provide a reference count on the nodes. Every time a node is opened (or passed as a parameter to an FSP function), its reference count is incremented. Then, when the node is closed (or after the FSP function returns), the reference count is decremented. The node is freed only when the reference count reaches 0.
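The reference-counting rule can be sketched in a few lines of C. Everything here (fs_node, node_create, node_get, node_put) is a hypothetical name chosen for illustration, and C11 atomics stand in for whatever synchronization the kernel actually uses; the point is only that the node is freed by whoever drops the last reference, not by whoever deletes the file.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical node structure kept by the independent layer. */
typedef struct fs_node {
    atomic_int ref_count;  /* open files plus in-flight FSP calls */
    int deleted;           /* set when the node has been removed by name */
} fs_node;

/* Allocate a node with no references yet; returns NULL on failure. */
fs_node *node_create(void)
{
    fs_node *n = malloc(sizeof *n);
    if (n != NULL) {
        atomic_init(&n->ref_count, 0);
        n->deleted = 0;
    }
    return n;
}

/* Take a reference: on open, or before passing the node to an FSP call. */
void node_get(fs_node *n)
{
    atomic_fetch_add(&n->ref_count, 1);
}

/* Drop a reference: on close, or after the FSP call returns.
   Returns 1 if this was the last reference and the node was freed. */
int node_put(fs_node *n)
{
    int previous = atomic_fetch_sub(&n->ref_count, 1);
    assert(previous > 0);  /* a put without a matching get is a bug */
    if (previous == 1) {
        free(n);
        return 1;
    }
    return 0;
}
```

With this scheme a thread can mark a node deleted while another still holds it open; the structure stays valid until the second thread's node_put() brings the count to 0.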
Thorough Generality. We wanted the FSP to be general enough to handle a broad variety of file systems: DOS, HFS, NFS, ISO9660, and others. And in addition to these "real" file systems, we also wanted to be able to implement "virtual" file systems. Instead of representing real files stored somewhere on a disk, virtual file systems provide "services." For example, the /dev file system provides access to hardware devices; the /pipe file system gives access to UNIX-like pipes. Virtual file systems are potentially the most interesting part of the new system.
I'm getting itchy... Let's look at some of the protocol:
typedef int op_open(void *ns, void *node, int omode, void **cookie);
typedef int op_close(void *ns, void *node, void *cookie);
typedef int op_free_cookie(void *ns, void *node, void *cookie);
typedef int op_read(void *ns, void *node, void *cookie, off_t pos, void *buf, size_t *len);
typedef int op_write(void *ns, void *node, void *cookie, off_t pos, const void *buf, size_t *len);
typedef int op_ioctl(void *ns, void *node, void *cookie, int cmd, void *buf, size_t len);
In these prototypes, ns designates the "name space," or volume, that the node is on; this is necessary since a single FSP implementation (that is, for a particular file system) can serve more than one volume. The cookie argument is what allows the file system to distinguish between different clients of the same file. Every time open() is called, the file system has the opportunity to allocate a cookie. The FSIL passes the cookie as an argument to the FSP calls with every subsequent operation on this open file.
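As an illustration of what a cookie might hold, here is a small sketch of an in-memory "file system" in C. The mem_node and mem_cookie structures and the mem_* functions are hypothetical; they follow the signatures of op_open, op_write, and op_free_cookie above, but what a real file system stores in its cookie is entirely up to that file system. Here the cookie simply remembers the mode each client opened the file with.

```c
#include <assert.h>
#include <fcntl.h>      /* O_RDONLY, O_WRONLY, O_ACCMODE */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* off_t */

/* Hypothetical node: a tiny file held entirely in memory. */
typedef struct mem_node {
    char data[64];
    size_t size;
} mem_node;

/* Hypothetical cookie: per-open state, one per call to open(). */
typedef struct mem_cookie {
    int omode;
} mem_cookie;

int mem_open(void *ns, void *node, int omode, void **cookie)
{
    mem_cookie *c = malloc(sizeof *c);
    (void)ns; (void)node;
    if (c == NULL)
        return -1;
    c->omode = omode;
    *cookie = c;            /* handed back on every later operation */
    return 0;
}

int mem_write(void *ns, void *node, void *cookie, off_t pos,
              const void *buf, size_t *len)
{
    mem_node *n = node;
    mem_cookie *c = cookie;
    (void)ns;
    assert(n != NULL && c != NULL);
    if ((c->omode & O_ACCMODE) == O_RDONLY)
        return -1;          /* this client opened the file read-only */
    if (pos < 0 || (size_t)pos > sizeof n->data)
        return -1;
    if (*len > sizeof n->data - (size_t)pos)
        *len = sizeof n->data - (size_t)pos;   /* short write */
    memcpy(n->data + pos, buf, *len);
    if ((size_t)pos + *len > n->size)
        n->size = (size_t)pos + *len;
    return 0;
}

int mem_free_cookie(void *ns, void *node, void *cookie)
{
    (void)ns; (void)node;
    free(cookie);
    return 0;
}
```

Two clients that open the same node get two distinct cookies, so the file system can refuse a write from one while accepting it from the other.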
By looking at the operation names, the mapping between system calls and file system operations is quite obvious. For example, the system call open() is turned into the op_open() FSP function. op_free_cookie() is the only operation that's not self-explanatory: It frees the cookie allocated by op_open(). Why do we need a separate function for this? You might think that we could free the cookie in op_close(), but that would actually be wrong. If op_close() freed the cookie, there would be a race condition with ongoing op_read() and op_write() calls operating on the same cookie. The solution, once again, is to reference count the cookie. A separate operation, op_free_cookie(), is invoked after op_close() when the last operation on the cookie returns.
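The ordering between op_close() and op_free_cookie() can be sketched as follows. The fsil_file structure and the fsil_* functions are hypothetical names for illustration, and the counting callback stands in for a real op_free_cookie(); the sketch assumes the open file itself holds one reference and each in-flight operation holds one more.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-open bookkeeping kept by the independent layer. */
typedef struct fsil_file {
    atomic_int refs;                    /* 1 for the open file + 1 per running op */
    void *cookie;
    void (*free_cookie)(void *cookie);  /* stand-in for op_free_cookie() */
} fsil_file;

/* Demo stand-in that only counts how many cookies were released. */
int cookies_freed = 0;

void count_free(void *cookie)
{
    (void)cookie;
    cookies_freed++;
}

void fsil_file_init(fsil_file *f, void *cookie, void (*free_cookie)(void *))
{
    atomic_init(&f->refs, 1);           /* the reference owned by the open file */
    f->cookie = cookie;
    f->free_cookie = free_cookie;
}

/* Whoever drops the last reference frees the cookie. */
static void fsil_release(fsil_file *f)
{
    int previous = atomic_fetch_sub(&f->refs, 1);
    assert(previous > 0);
    if (previous == 1)
        f->free_cookie(f->cookie);
}

/* Bracket every op_read()/op_write() call with these two. */
void fsil_begin_op(fsil_file *f) { atomic_fetch_add(&f->refs, 1); }
void fsil_end_op(fsil_file *f)   { fsil_release(f); }

/* op_close() would be invoked here; the cookie outlives it if ops are running.
   A real implementation must also stop new operations from starting after close. */
void fsil_close(fsil_file *f)    { fsil_release(f); }
```

If close arrives while a read is still blocked, the cookie survives until that read returns, which is exactly the case the pipe example earlier in this article worries about.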
I hope you've gotten an idea of the kinds of issues we dealt with while designing the FSIL and the FSP. The FSP API probably won't be released until after DR9, but no doubt the subject will show up in the news groups before then. If you have any questions, don't hesitate to contact me at cyril@be.com.
Last week was Thanksgiving. Beyond the fact that it ushers in the heaviest shopping day of the year, it reminds me that I'm grateful for many things.
My 19-month-old likes keyboards and mice.
My wife is a programmer.
I have plenty of computers.
I'm allowed to program as much as I want.
And my favorite computer now has a MIDI synthesizer in software!
In case you missed that, Marc Ferguson, who spelled out the whats and whyfors of MIDI in a previous newsletter, has released the software synthesizer for DR8.2. There are two packages you can install. One is relatively small and will get you hearing as soon as possible. The other is quite large and you can download it at a more leisurely pace.
ftp://ftp.be.com/pub/dr8_update/simple_midi_small.tgz
In this package you will find two things:
simple-midi | The actual player |
general.midi.small | A smaller version of the instruments (315 K) |
Place the general.midi.small file into /boot/system/bin.data and rename it general.midi. Put the simple-midi file anywhere you like, possibly /boot/apps. You should do a setfile simple-midi once it's installed.
Alternatively, you can get the bigger package, which contains the higher-quality instruments.
ftp://ftp.be.com/pub/dr8_update/simple_midi_large.tgz
simple-midi | The actual player |
general.midi | The full bodied instruments (5 MB) |
Again, place the general.midi file into /boot/system/bin.data and put the simple-midi file anywhere you like.
This is a drag-and-drop interface for playing MIDI files. Simply drag a MIDI file onto the icon and it starts playing. You can also play files by starting the application and using the Open button. You should only execute a single MIDI player at a time. If you try tricks like copying the file and playing multiple files that way, you might not get what you expect.
Personally, I'm not a MIDI person. I don't do instruments, but I can certainly appreciate high-quality music (for a computer) when I hear it. I played with quite a few MIDI sequences and they sound great. And since the instruments will be the same on all machines that install this package, the music will sound relatively similar on yours, too.
This is not the final effort for software MIDI on the BeOS. This is merely a placeholder for what will come in DR9. The synthesizer doesn't provide you with an API so that you can use it from within your own application; this will show up in DR9. But for those who've been dying to hear some thumpin' sounds out of their machines without having to buy a hardware synthesizer, your day has come.
I thought I was a pretty good programmer and that I could help other programmers to get serious things done using the BeOS. One thing I've learned in recent years is that no matter how good you are, the Internet will bring someone to your door who is much better. I believe the BeOS has attracted extremely skilled, highly motivated programmers who crank out the most awesome stuff, and honestly, I find myself learning from them as often as I humbly try to teach them.
If you don't check the BeWare™ and ftp.be.com sites often, you should. There are new offerings coming out all the time. I've mentioned a number of developer offerings in the past, and I'll continue to try to highlight things as they come by.
Of particular note this week are two applications that have been progressing over the past few months. You can check the What's New page at ftp://ftp.be.com/pub/contrib/whatsnew.txt to see where they are.
BetMap. This is a drag-and-drop paint program, the likes of which you've probably never seen. I said the same thing when it first came out, and it's still true. Only now it has an API so you can participate in this add-ons-gone-wild world.
Zonic. This is a very useful and very clean layout engine and widget set. I think it demonstrates the ease with which multiple interfaces can be created for the BeOS without too much work. You should get the demo and play with it. "Really neat" is probably the best description.
Many people have asked me via e-mail, "Why aren't the samples available when the newsletter comes out?" No one to blame but me. Excuses... Excuses... Excuses... The standard refrains apply, and I'll try to do better in the future. There's only me doing these things, and things do get busy, but there is hope. One of the busy things I've been doing is interviewing many candidates to jump into the pit with me. It's going well, and you should have some new targets... I mean new voices of support soon.
Remember Macworld? It's now one week closer than the last time I mentioned it. Thump, thump, thump. Hear that? That's the beat of the Be Drum. It's pushing you to develop that code as fast as you can so that we can show your wares at what promises to be an illuminating, highly visible event for the BeOS and its developers.
I'm not a great fan of Comdex, so much so that I've avoided attending or exhibiting there for six years. But when Motorola graciously offered us two demo stations in their PowerPC Pavilion at Comdex this year, the tightwad in me couldn't resist. Accordingly, Maureen Miller, our trade show manager, Ron Theis, our Webmaster, and I spent five days showing the BeOS on a BeBox™ and on a Power Computing machine to thousands and thousands of people.
It was actually fun.
This is quite amazing.
Most people work for two kinds of income: Actual money and psychic income. Psychic income is the satisfaction you get from doing your job. Comdex was fun because no matter how many times we gave the BeOS demo, we got tremendous psychic income from the reactions of the audience. It was also fun because Las Vegas appears to have outgrown the show. It was possible to get a cab without waiting for two hours. It was possible to get a table at a restaurant. And it was possible to get a hotel room, albeit for a price that made one's eyeballs bleed.
We survived the show by working hard and by avoiding the traditional trade show excesses of late-night parties and lots of drinking. Our feet hurt, but not our heads. I will confess that all three of us succumbed to the allure of the craps table but, incredibly, all won Big Money (well, OK, Some Money). This bodes well and probably means that there's good karma surrounding Be employees these days. The casino system seems to function on the basis that, while we cheered and rejoiced at winning $15 (our threshold of excitement was fairly low), portly gentlemen in plaid pants smoking noxious cigars were losing $100 at a time.
The most pleasant and gratifying part of the show was that over half the people coming to our booth knew our company's name, knew roughly what we're trying to achieve, and wanted to see a demo. This is a huge change from, say, May of this year, when most visitors to a trade show booth had never heard of us. The rampant speculation in the press about our future seems to have had one benefit, at least. It was also nice to meet with many Be developers and put faces to e-mail addresses. Nicer still, many of the developers are working on real projects, having moved on from simply experimenting with the BeOS.
Were there any other interesting things at Comdex? I don't know! Our booth was never quiet enough to allow us to slip away to check out the rest of the show.
I just flew back from Maui by way of Pittsburgh, Pennsylvania. Using this itinerary to raise questions about my sanity is fruitless—I'm engaged in work on an alternative personal computer platform to begin with. But, seriously, I had a good reason. After taking my family of French pilgrims to celebrate the ultimate American holiday in a locale more fitting our (vastly exaggerated) hedonistic roots, I flew to Carnegie Mellon for the reading of Paul Clip's thesis, CoffeeBean, a Java virtual machine implemented on the BeOS under the tutelage of Professor Adam Beguelin.
It was good and useful fun. Developing applications for a platform still in its infancy isn't for everyone. Writing system-level code is living even more dangerously, especially at a time when the system is still changing at a fast pace from one revision to another. So, Paul Clip's project wasn't only a test of his intellect and manhood, but also a way to push our product in areas that aren't necessarily probed in other developments. Overall, Paul was generous in his comments, so much so that I won't repeat the most enthusiastic ones. His work pointed to several weaknesses (in California, "areas for improvement"). Some have to do with differences with Sun's views of the world. For instance, Java expects 64-bit signed integers, which are unavailable in the current version of the C++ compiler on our system. There's also a difference between the systems used by the two worlds for priorities and synchronization: an opportunity for us to road-test our kernel in subtle scheduling situations.
Paul's work also fingered a problem other Be developers have discovered, this one related to hardware. Not all multithreading situations are dealt with equally well by our system. When we moved from the AT&T Hobbit processor to the PowerPC, the 604 was unavailable and projected to be prohibitively expensive (for us at least). The 601 was clearly an ephemeral product. Only the 603 had the future and the price we liked. There was one glitch, however: Motorola stated that multiprocessor systems couldn't be implemented with the 603. On closer examination, this techno-marketing statement was based on the absence of cache-coherency hardware on the 603. Loosely speaking, cache coherency is a function by which one processor can advise others of the "pollution" of data contained in its cache, thus preventing its colleagues from relying on now-invalid copies of the same data in their own caches.
We decided we could work around that problem, mostly in software, and we produced working 603-based dual-processor hardware. In most cases, the workaround we designed imposes an invisible performance penalty. Now, imagine a situation where two threads, one to each processor, work on the same data. This can give rise to sizable overhead when the caches have to be constantly updated. One can construct cases where two processors perform more slowly than a BeBox with one CPU turned off. Fortunately, in real life, the system performs loosely coupled tasks most of the time, and the 603 penalty isn't a factor. The 604 is now available, without much of the earlier price penalty exacted on our small company with its limited purchasing power; it features cache-coherency hardware and thus does away with the limitations of the 603 in MP applications. Most, if not all, PowerMac licensees have 604 dual-processor hardware but not much system software to take advantage of it, a situation we intend to deal with promptly.
Back to Carnegie Mellon: We're grateful for Prof. Beguelin's hospitality, we thank Paul Clip for his dissecting dissertation of our system, and we've already incorporated several of his suggestions into the next release of our system.
BeDevTalk is an unmonitored discussion group in which technical information is shared by Be developers and interested parties. In this column, we summarize some of the active threads, listed by their subject lines as they appear, verbatim, in the mail.
To subscribe to BeDevTalk, visit the mailing list page on our web site: http://www.be.com/aboutbe/mailinglists.html.
More discussion on simulating Amiga "signals" using semaphores (or benaphores) and thread calls. Various scenarios were hypothesized; some were answered with source code examples.
This discussion, which was initially about different ways to let the user place/tile/layer windows on the desktop, veered into a debate on the meaning and enforcement of "multi-user," as well as a search for the perfect real-world analogy for user-settable GUI preferences.
Combining the two thoughts, we have your boss's wife peeking over your shoulder as she tries to hijack your car. No one pointed out the blatant sexism in "the boss's wife," but the thread is young.
A heated debate on whether VM is really a good thing. No clear consensus, but a number of folks pointed out that you have to keep the concept of VM separate from that of swapping—it's possible to have "good" VM but "bad" swapping. In particular, an unbounded swap file is bad.
A simple question ("How do I shutdown my BeBox from within a program") was answered, and then the thread went on to discuss whether programmatically shutting down or powering off is a desirable feature, and, if allowed, how it should be coordinated with a UPS. In general, soft-off is seen as a nice feature that can easily be abused; it was proposed that the mechanism be protected by root authentication (once multi-user is implemented).
Some initial remarks about Marc Verstaen's AppSketcher led to a wider discussion of UI modeling engines (M. Verstaen pointed out that AppSketcher isn't intended to be such an engine). Should the UI for an app be hand-tweaked, and the coordinates hard-coded, or is this in itself an indication of poor design? It was offered that a good layout engine can figure out where things should go based on relative sizes and positions without the programmer having to supply any coordinate values. Others balk at the thought of automated UI design. It ended in tears.
This thread collected comments on Dominic Giampaolo's article in last week's newsletter. Some folks are disappointed in the decision to retain a hierarchical file system, but there was approbation (and expectation) for the IFS aspects of the system. (See Cyril Meurillon's article in this issue for more information.) The final thread (of the above) offered suggestions for fine-tuning the journaling feature.
Different schemes for storing and encoding of version denoters were proposed. The thread became quite precise in its detail: The exact format of version numbers and the structure that would store version information were discussed. The one debatable subject was whether version strings should be human-readable.
How does ImageViewer do its drag-to-save thing? A number of listeners (well, a couple, anyway) were interested in emulating this behavior. There were some guesses about what was happening, and then Peter Potrebic parted the curtains:
User starts dragging a selection in ImageViewer. It doesn't know the destination so it uses its own data format for the image.
The image is dropped on the Browser. The Browser doesn't understand the message so it replies with a B_MESSAGE_NOT_UNDERSTOOD.
ImageViewer receives the B_MESSAGE_NOT_UNDERSTOOD... [creates] a temp file and replies back to the sender.
The Browser gets this reply. It knows what to do with files (refs) so it copies the data file to the appropriate place.
You can find example code in the RRRRRRRRaster app: ftp://ftp.be.com/pub/Samples/Rras_sdk.tgz
A generalization of the drag-to-save mechanism was proposed and debated.