THESE EXPERIMENTS MAY RENDER YOUR DISK UNBOOTABLE.
While all computer users know how to boot their machine, very few know what is happening under the hood. Depending on your operating system, you may see your machine reboot a dozen times a day, so often that some companies regard this period of time as valuable commercial property. For such systems, it may help to observe the congressional hearings before altering the appearance or function of your boot sequence.
BeOS users, however, should find this article sufficient. It explains booting in general, how the BeOS boot sequence works on the PC, and how it can be customized for your machine, and it demonstrates some common configurations.
As a starting point, I have chosen March 3, 1982. This is the date on the following file:
HIGHMEM = 010000
        clr     r0
1:      mov     (r0)+, HIGHMEM-2(r0)
        cmp     r0, $end
        blo     1b
        add     $HIGHMEM, pc
        mov     $0172526, r0
        clr     (r0)
        mov     $-01000, -(r0)
        mov     $060003, -(r0)
1:      tstb    (r0)
        bpl     1b
        clr     pc
end:
I wrote this bootstrap for the PDP-11. After entering the 18 machine words (in octal) using the front panel, 512 bytes would be loaded from 9-track tape, followed by Coherent, a UNIX-like operating system.
16 years later, nothing has really changed. Bootstrap code copies itself to a safe location, jumps there, and loads a larger bootstrap from some device. This is repeated until the operating system itself is loaded. In the modern world of code bloat, bootstraps are still written with an old skill: wringing the most work from the smallest program.
On the PC, the ROM-based BIOS acts as the first bootstrap, saving us the trouble of entering it by hand. After that, various boot stages are available. This article is most interested in LILO, written by Werner Almesberger. This package, and the BIOS itself, are essential to booting BeOS.
The following cases have been selected both to illustrate the workings of the boot process, and to provide practical setups for immediate use. I have tested each case personally.
THESE EXPERIMENTS MAY RENDER YOUR DISK UNBOOTABLE.
Before a change to your boot procedure, be sure it matches your disk partitioning and OS placement. Having a boot floppy for each OS is good protection, backups are better, and a scratch machine is best.
This is a schematic representation of the simplest boot sequence, and the one used by Release 3 BeOS. In detail, here are the stages:
(1) The BIOS. The code is stored in nonvolatile memory, and is able to operate the basic peripherals of a PC. It loads the master boot record (MBR): this is sector 0 of the first hard drive (ignoring floppies for now). If supported by your BIOS, this can be a SCSI drive.
(2) The first stage LILO bootstrap. This is the content of the MBR. It employs the disk-reading services of the BIOS.
(3) The second stage LILO bootstrap. This is located within a Be file system. This stage is large enough to offer many features, like a boot menu, and can load large files. It, too, uses the BIOS to read disks.
(4) /system/zbeos. This stage prints the "Starting the BeOS boot sequence" message and does general initialization for the BeOS kernel. It can operate many more devices than the BIOS, display a fancy boot menu, and knows about the various Be-supported file systems. Life is pretty cozy at this point. It finishes by loading the BeOS kernel.
Stages (1) and (4) know where to find the next stage on the disk. How do the LILO stages, (2) and (3), know where to go? The answer is a utility in the LILO package called, confusingly, "lilo". To paraphrase Monty Python, it's spelled /bin/lilo, but it's pronounced "map installer." The map installer analyses each file needed during the boot process, which can be located on any of your Be file systems. The analysis yields the BIOS drive number and absolute sector for every sector of the file, and stores them as a big list in a "map file." The map file itself, as BIOS drive number and starting sector, is stashed in the first stage LILO bootstrap.
The major side-effect of this approach is the need to run the map installer whenever a file used in the boot process is moved. This includes moving a partition and changes to drive parameters in the BIOS configuration. The good news is the LILO bootstrap code needs no knowledge of the file system. Any type of file system supported by BeOS can host the program and map files.
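The map idea can be illustrated with a toy model. This is only a sketch: the names `build_map` and `load_via_map`, the in-memory "disk", and the list-of-pairs layout are all assumptions made for illustration, and they do not reflect LILO's real on-disk map format.

```python
SECTOR_SIZE = 512

def build_map(file_sectors, bios_drive=0x80):
    """Return a toy "map": one (bios_drive, absolute_sector) pair per file sector."""
    return [(bios_drive, sector) for sector in file_sectors]

def load_via_map(disks, sector_map):
    """Load a file using only the map, with no knowledge of any file system."""
    chunks = []
    for drive, sector in sector_map:
        start = sector * SECTOR_SIZE
        chunks.append(disks[drive][start:start + SECTOR_SIZE])
    return b"".join(chunks)

# A pretend 8-sector disk; the "file" occupies sectors 5, 2 and 6, in that order.
disk = bytearray(8 * SECTOR_SIZE)
payload = [bytes([n]) * SECTOR_SIZE for n in (1, 2, 3)]
for sector, data in zip((5, 2, 6), payload):
    disk[sector * SECTOR_SIZE:(sector + 1) * SECTOR_SIZE] = data

sector_map = build_map([5, 2, 6])          # what the map installer records
image = load_via_map({0x80: bytes(disk)}, sector_map)  # what the loader does
```

If the file were later moved to other sectors, the recorded list would still point at the old ones, which is why the map installer has to be run again.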
The map installer is called from the shell command line, and operates according to instructions from a configuration file. Here is /etc/lilo/lilo.conf for Case 1:
boot = /dev/disk/ide/0/master/raw
install = /etc/lilo/boot.b
linear
ignore-table
image = /system/zbeos
Here is the command to install the map:
lilo -C /etc/lilo/lilo.conf
The name /dev/disk/ide/0/master/raw is BeOS parlance for the master IDE drive on the first interface. The "raw" device will ignore any partitions and target the MBR. The "linear" and "ignore-table" options are required at this time: the occasional "Invalid partition table" message can be ignored.
When run for the first time, "lilo" squirrels away a copy of the original MBR in /var/lilo/boot.0300. This can be restored using the following shell command:
cp /var/lilo/boot.0300 /dev/disk/ide/0/master/raw
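The save-once behavior can be sketched as follows. This is an illustration rather than what "lilo" actually executes: it works on a scratch image file instead of a real disk, the file names `disk.img` and `boot.backup` are made up, and the only outside fact relied on is the PC convention that a boot sector is 512 bytes ending in the bytes 0x55 0xAA.

```python
import os
import tempfile

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid PC boot sector

def read_boot_sector(device_path):
    """Read sector 0 (the MBR when device_path names a whole disk)."""
    with open(device_path, "rb") as dev:
        sector = dev.read(SECTOR_SIZE)
    if len(sector) != SECTOR_SIZE:
        raise ValueError("short read: not a full sector")
    return sector

def has_boot_signature(sector):
    """True if the sector ends with the 0x55 0xAA boot signature."""
    return sector[-2:] == BOOT_SIGNATURE

def backup_boot_sector(device_path, backup_path):
    """Save sector 0 once; an existing backup is never overwritten."""
    if os.path.exists(backup_path):
        return False
    with open(backup_path, "wb") as out:
        out.write(read_boot_sector(device_path))
    return True

# Demonstration on a two-sector scratch image (hypothetical file names).
workdir = tempfile.mkdtemp()
image_path = os.path.join(workdir, "disk.img")
backup_path = os.path.join(workdir, "boot.backup")
with open(image_path, "wb") as img:
    img.write(b"\x00" * 510 + BOOT_SIGNATURE + b"\x00" * SECTOR_SIZE)

first = backup_boot_sector(image_path, backup_path)   # creates the backup
second = backup_boot_sector(image_path, backup_path)  # leaves it alone
```

Refusing to overwrite matters: a second run would otherwise save the already-modified MBR over the original.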
Here is the list of files used by LILO:
/etc/lilo/lilo.conf
/etc/lilo/boot.b *
/etc/lilo/chain.b *
/var/lilo/map *
/var/lilo/boot.0300
/system/zbeos (and equivalent) *
If files marked by (*) are moved, remember to run the map installer again. If you don't, you'll hear about it.
So much for Case 1.
In Case 1, LILO was used as a program loader, proceeding without delay from BIOS to BeOS. However, LILO can also serve as a boot manager, permitting you to select which OS to boot. As usual, you must prepare the configuration file, and run the map installer.
Here is lilo.conf for Case 2:
boot = /dev/disk/ide/0/master/raw
install = /etc/lilo/boot.b
delay = 50
linear
ignore-table
image = /system/zbeos
  label = beos
other = /dev/disk/ide/0/master/0_0
  loader = /etc/lilo/chain.b
  label = w95
I assume Windows 95 is on the first partition (0_0). The "label" option allows you to assign a word to each OS (try to control yourselves). The "delay" option gives you time to hit the Shift key, which starts the menu. Otherwise, LILO will make a wise decision on your behalf.
If 0_0 is not found during map install, invoke DriveSetup. Then try to mount the DOS partition. This should cause 0_0 to appear.
The "image" designation causes the entire program file to be loaded, and is compatible with zbeos, Linux kernels like vmlinuz, and certain stand alone programs (see Case 5). The "other" designation causes just one sector to be loaded, namely the partition boot sector. This is sector 0 of the specified partition, and it customarily contains an OS-specific bootstrap. Think of it as a per-partition MBR. If you're still awake, I'm sure you see distinct possibilities here.
LILO comes from the Linux world, so we would expect the following to work:
boot = /dev/disk/ide/0/master/raw
install = /etc/lilo/boot.b
delay = 50
linear
ignore-table
image = /system/zbeos
  label = beos
other = /dev/disk/ide/0/master/0_0
  loader = /etc/lilo/chain.b
  label = linux
Or maybe not. Did your Linux administrator arrange to have a partition boot block for your Linux system? If not, this lilo.conf has made Linux unbootable. Tsk.
If Case 3 leaves you dazed and confused, it probably means LILO on the Linux side is running the MBR. However, we are running the MBR. There can be only one.
Rather than lop off each other's heads, let BeOS run the MBR, copy the Linux kernel to our file system, and add it to our list of bootable images:
boot = /dev/disk/ide/0/master/raw
install = /etc/lilo/boot.b
delay = 50
linear
ignore-table
image = /system/zbeos
  label = beos
image = /system/vmlinuz
  label = linux
  root = /dev/disk/ide/0/master/0_0
  read-only
I will not explain Linux options, except "root". /dev/disk/ide/0/master/0_0 is the equivalent of /dev/hda1 on Linux: the mapping should be obvious. Otherwise, use the Linux major/minor notation, e.g. 0x0301. You can even specify SCSI drives, so /dev/sda1 is expressible as 0x0801.
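For readers who have not met the major/minor notation, here is a small sketch of the arithmetic. The function name `linux_devnum` is made up for this example, and it covers only the classic Linux assignments I am sure of: major 3 for hda and hdb (64 minors per drive) and major 8 for sda through sdp (16 minors per drive).

```python
# Classic Linux block-device numbering: (major, minors reserved per drive).
LAYOUT = {"hd": (3, 64, 2), "sd": (8, 16, 16)}  # last field: drives per major

def linux_devnum(name):
    """Convert a name such as 'hda1' or 'sda1' to its (major << 8) | minor value."""
    prefix, unit, part = name[:2], name[2:3], name[3:]
    if prefix not in LAYOUT or not unit.isalpha():
        raise ValueError("unsupported device name: " + name)
    major, minors_per_drive, drives = LAYOUT[prefix]
    index = ord(unit) - ord("a")
    if not 0 <= index < drives:
        raise ValueError("drive lives under a different major: " + name)
    partition = int(part) if part else 0  # 0 means the whole drive
    if partition >= minors_per_drive:
        raise ValueError("partition number out of range: " + name)
    return (major << 8) | (index * minors_per_drive + partition)

root_ide = "0x%04x" % linux_devnum("hda1")
root_scsi = "0x%04x" % linux_devnum("sda1")
```

The high byte is the major number (the driver) and the low byte is the minor number (the drive and partition), which is why hda1 and sda1 differ only in the first two hex digits.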
Memtest86 is a cute stand alone program by Chris Brady which performs extended memory testing. It is available from the usual places.
boot = /dev/disk/ide/0/master/raw
install = /etc/lilo/boot.b
delay = 50
linear
ignore-table
image = /system/zbeos
  label = beos
image = /system/memtest86
When "label" is omitted, LILO creates something sensible.
We must now face the problem illuminated by Case 4: sometimes we cannot control the MBR. For those of you with commercial boot managers, like Partition Magic or System Commander, the MBR is in their control.
LILO's role, once again, is lowly program loader. The first stage LILO bootstrap, evicted from the MBR, must reside in a partition boot sector. The partition is determined automatically from the /boot file system.
install = /etc/lilo/boot.b
linear
ignore-table
image = /system/zbeos
You may override the assumed partition using the "boot" option, but make sure it belongs to BeOS. Otherwise, you create a spectacle.
If you are too stingy for a commercial boot manager, and want to take the ideologically pure path, you are ready for a graduate course in LILO.
The plan is to use LILO as a boot manager, meaning it runs the MBR. A configuration file must be prepared and the map installer run.
We also use LILO as a program loader at the partition level. This calls for a different configuration file and map installation.
By separating these functions, we do not lose face like Linux in Case 3. The revolving door of upper management in the MBR suite has no effect on BeOS booting.
Here is "lilo.conf.partition":
boot = /dev/disk/ide/1/master/0_0
map = /var/lilo/map.partition
install = /etc/lilo/boot.b
linear
ignore-table
disk = /dev/disk/ide/1/master/raw
  bios = 0x81
image = /system/zbeos
  label = beos
Here is "lilo.conf.mbr":
boot = /dev/disk/ide/0/master/raw
map = /var/lilo/map.mbr
install = /etc/lilo/boot.b
delay = 50
linear
ignore-table
other = /dev/disk/ide/1/master/0_0
  loader = /etc/lilo/chain.b
  label = lilo2
Here are the commands to install the maps. Note the order:
lilo -C /etc/lilo/lilo.conf.partition
lilo -C /etc/lilo/lilo.conf.mbr
The scenario has been enlivened by placing the Be file system on the second IDE interface. A bug in BeOS LILO will surface if your machine has this configuration, and a CD-ROM drive as slave on the first interface (/dev/disk/ide/0/slave/...). The bug relates to the BIOS drive numbering scheme. Fortunately, the "disk" and "bios" options allow you to specify the BIOS drive number explicitly: 0x80 is the first hard drive, 0x81 the second, and so on.
On a modern BIOS, SCSI hard drives are part of the numbering system. If your SCSI drive is the designated boot device, then it will be 0x80, and the other drives follow. CD-ROM drives are not numbered.
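The numbering rules just described can be written down as a short sketch. The function `assign_bios_numbers` and the probe-order list are assumptions for illustration; a real BIOS may order drives differently, and "scsi-boot-disk" is a placeholder rather than a real BeOS device name.

```python
def assign_bios_numbers(drives, boot_device=None):
    """Map hard drives to BIOS numbers 0x80, 0x81, ... and skip CD-ROM drives.

    drives is a list of (name, kind) pairs in probe order, where kind is
    "disk" or "cdrom". A designated boot device, if given, is numbered first.
    """
    disks = [name for name, kind in drives if kind == "disk"]
    if boot_device in disks:
        disks.remove(boot_device)
        disks.insert(0, boot_device)
    return {name: 0x80 + i for i, name in enumerate(disks)}

# The layout discussed above: CD-ROM as slave on the first interface,
# Be file system on the master of the second interface.
ide_only = assign_bios_numbers([
    ("/dev/disk/ide/0/master/raw", "disk"),
    ("/dev/disk/ide/0/slave/raw", "cdrom"),
    ("/dev/disk/ide/1/master/raw", "disk"),
])

# The same machine with a SCSI drive designated as the boot device.
with_scsi = assign_bios_numbers(
    [
        ("/dev/disk/ide/0/master/raw", "disk"),
        ("/dev/disk/ide/1/master/raw", "disk"),
        ("scsi-boot-disk", "disk"),  # placeholder name
    ],
    boot_device="scsi-boot-disk",
)
```

In the first layout the CD-ROM is skipped, so the drive on the second interface comes out as 0x81, the value given to the "bios" option in lilo.conf.partition.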
Needless to say, problems with BIOS drive numbering will be fixed.
This last example makes no use of LILO, but comes in handy if earlier LILO experiments go horribly wrong.
cp /system/zbeos /dev/disk/floppy/raw
This shell command creates a bootable floppy...you can climb down from the window now.
For some, a penny saved is a penny earned: partitioning is unnecessary and wasteful. You can create a Be file system across the bare disk with a shell command like this:
mkbfs /dev/disk/ide/0/master/raw
The MBR exists, the partition boot sector doesn't. Implications are left as an exercise for the reader. Reader? Hello?
URL References:
ftp://sunsite.unc.edu/pub/Linux/system/boot/lilo/
ftp://sunsite.unc.edu/pub/Linux/system/hardware/memtest86-1.4.tar.gz
Greetings, I'm a new face in Newsletter articles. I'm Steven with a "v." (Not to be confused with Steve Sakoman or Stephen Beaulieu.)
I'm part of the long dark arm of Quality Assurance here at Be. We find bugs. We reproduce bugs. We investigate some of the bugs you submit. We don't fix them or assign priorities.
Given my rich experience with all sorts of bugs, that's what I'll be talking about in today's article.
While we do our best to hunt down bugs and clear them up, there are bound to be a few that escaped in our latest release. You can help out in the bug wars in a variety of ways. The first and foremost is by writing a good bug report.
Whether you're a registered developer who submits bugs via the special bug submission page on our web site, or you're a regular user who wants to see an annoying bug you found get squashed, your bug is important. To give your bug the best possible chance of being fixed, here are some guidelines to keep in mind when you submit a bug report.
Make sure your bug is reproducible. Try it again on your hardware. Try it on someone else's hardware if you can. If you think it's related to a specific piece of hardware, try swapping that piece out to verify that the hardware really is at fault.
Make sure the bug is reproducible when you've rebooted cleanly. (This would clarify that the death of the app_server when you used Telnet was actually caused by a third-party Foobar program that failed inelegantly in another Workspace, and it started gobbling up resources, which started a slow spiral of death, which happened to culminate when you ran Telnet that day, rather than just being a Telnet bug.)
If the bug is reproducible, remember to include your hardware and software configuration information, including what version of BeOS you are running, and what other applications need to be running to reproduce it. Keep it simple: try to make the bug reproduce in as few steps as possible.
Write the bug report in painfully clear and concise steps. Elaborate on everything and anything you think might help us reproduce your problem. Doug Wright spoke about this in Newsletter Issue 101, Developer Workshop: Bring Your Own Bugs:
www.be.com/aboutbe/benewsletter/Issue101.html#Workshop
If you're getting a debug terminal, or you have serial debugging output enabled, you'll want to install your xMAP files. They're on the CDs in optional/xmaps. They'll also be on your hard disk if you installed the optional items, though they will still need to be installed.
The kernel_*.xMAP file goes in /system, and the other xMAP files go in /system/lib. You'll need to reboot to make use of them. They're also useful in your own debugging, so they're good to install anyway.
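The placement rule is simple enough to express as a small sketch. The function `xmap_destination` and the two sample file names are made up for illustration; only the pattern kernel_*.xMAP and the two target directories come from the text.

```python
import fnmatch
import posixpath

def xmap_destination(filename):
    """Return where an xMAP file should be installed, per the rule above."""
    if not filename.endswith(".xMAP"):
        raise ValueError("not an xMAP file: " + filename)
    if fnmatch.fnmatchcase(filename, "kernel_*.xMAP"):
        return posixpath.join("/system", filename)
    return posixpath.join("/system/lib", filename)

# Hypothetical file names, used only to exercise both branches.
kernel_target = xmap_destination("kernel_example.xMAP")
library_target = xmap_destination("libexample.so.xMAP")
```

A script built on this would copy each file from optional/xmaps to the returned path and leave the reboot to you.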
They're included on Release 3 for both PPC and Intel, and were on the PR2 CD. They're different for every release, so keep that in mind. They need to match the libraries, applications, and kernels you have available. If the xMAP file isn't current, delete it; it's less confusing than having an old one.
How do you tell if an xMAP file is not current? Because of the way things are built and the fact that we do not date-stamp the world, xMAP files have slightly different times than the output file (shared libraries, applications, the kernel, etc.). Typically, the date-stamp of the xMAP files will be after the output file, though this might not always be the case. If you take them from the BeOS CD you installed from, they will be valid.
If you're getting a debug terminal we could also happily use the output of the "sc" command. This is useful to us, regardless of whether xMAPs were installed. (It's just easier for us if the xMAPs were installed, 'tis all.)
If you're getting dropped in to the kernel debugger (which I'll talk about shortly) we'll want the output of more commands. First, we'll want "sc", then "regs", and finish off with a nice "ps". This could get long, and most of the time we need only one if you've proven to yourself that it's a reproducible bug.
To be sure you haven't overlooked anything, after you've typed in the bug, but before you've submitted it, reread it and add more information so that it's Reproducible, Clear, and Concise, and Has All the Proper Information.
On rare occasions something leaks out about serial debugging. Serial debugging output is a low-level diagnostic/debugging tool used by the kernel engineers. It has all sorts of commands, all listed with the total available documentation on them, "help".
A BeOS machine to debug.
A machine capable of receiving serial communication at the appropriate settings.
Software that can communicate to serial ports at the appropriate settings, typically anything that could be used to connect to dial up BBSes and shell accounts (not necessary for dumb terminals). Connect works great if your serial output is going to another BeOS machine.
One of the following: a Mac serial cable, a null-modem cable, or serial cable with a null-modem adapter.
A strange bug or stranger interest which requires you to see the serial debug output.
On BeBoxes serial debugging goes out the serial4 port. On PPC machines serial debugging goes out the modem port. On Intel serial debugging goes out serial1.
19200 baud
8 bits
No parity
1 Stop bit
No flow control
Make sure the machine that's going to receive serial debug output is turned on and has the appropriate software running.
Make sure your cables are connected.
Start up the BeOS machine you need to test.
Depending on the receiving hardware, do one of the following:
On BeBoxes press the F1 key and hold it, just after the BeOS logo starts to swirl.
On PPCs press the "delete" key and hold it, just after the BeOS logo starts to swirl. (Please note: This is the "delete" key that's labeled "backspace" on PC-style keyboards.)
On Intel press the F1 key and hold it after you first see the boot-loader message.
If you did it right, you should start seeing debug output, before you get the BeOS boot menu.
If you got the boot menu and no output, you either pressed the key too late, or (more likely) you don't have your cable or comm settings quite right.
If you didn't get the boot menu at all, you probably pressed the key too early. (You might also have a rare keyboard which doesn't send the key back as expected under this circumstance.) You could try breaking into the kernel debugger, but I recommend rebooting and trying again, as it's much easier to check your settings when you have serial debugging on and are in the boot menu.
To check that you are connected properly, select "Rescan" from the menu. If you are connected properly, you should start receiving a stream of messages.
Select your boot volume and continue as normal.
Provided that you have the appropriate cables already installed (and preferably verified to be working), you can drop into the kernel debugger at any time.
Why drop in to the kernel debugger? A number of reasons, actually, including the following:
You want serial debug output, but you forgot to start it at boot time.
Your machine is hung, has died, or otherwise has ceased functioning, and you want to see what programs were running so you know who to blame.
You suspect your app_server has died, and have no way to Telnet in and verify. (This might be the case if your display wasn't updating, or had gone whacko, though it was still doing hard drive accesses.)
The serial debug output says something bad happened, and you want to gather some more information to send in your bug report.
On BeBoxes press the right-most button on the front of the case. On Power Macs press Command+Power On. On Intel machines press Alt+SysReq (also known as Alt+PrintScreen; this may change in the future).
ps - shows a process listing
run - just shows the currently running processes
regs - shows what the registers contain
sc - does a stack crawl on the current process.
c - continue executing/leave kernel debugging land
help - the documentation for kernel debugging land
Please note: Most, if not all, of the debugger commands are designed for use by kernel engineers. The commands above are all you'll ever need to use, unless explicitly told otherwise by someone at Be.
Some of the available commands can make your machine lock up, lose data, and otherwise cause problems, so experiment at your own risk.
Typically, we only want the interesting bits of output. If something's repeated 10,000 times we might want to see one line of what it says, but we don't want to see all 10,000 repeated lines of it.
We are interested in all conditions that cause the machine to lock up, entirely or partially.
For degrading conditions or high debug-output conditions, we are interested in the point at which the output turns from good-normal debug output to the error output.
If the output starts streaming quickly and you know the machine is dying, break into the debugger as soon as possible. This allows you to show us the boundary between the good/bad debug output. (Please note, however, that it's normal to get a lot of output while booting, or while rescanning the device buses.)
Most of the time, the last five lines of the debug output, the output of "sc", and a reproducible test case are enough for us to know what the problem is, and often where it is and possibly even how to fix it. (If you want to be extra safe, or if the bug is hard to reproduce, include the output of "regs" and "ps" as well.)
The better the test case, (remember: reproducible, simple, clear, and concise), the less you need to worry about the serial debugging output, and the easier it is for us to track down the bugs and fix them.
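If you capture serial output to a file, trimming it before submission can be automated. The sketch below is one possible approach, not a Be tool: the helper names `collapse_repeats` and `tail` and the sample log lines are invented for illustration.

```python
from itertools import groupby

def collapse_repeats(lines):
    """Replace each run of identical consecutive lines with one annotated line."""
    collapsed = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        if count == 1:
            collapsed.append(line)
        else:
            collapsed.append("%s  [repeated %d times]" % (line, count))
    return collapsed

def tail(lines, n=5):
    """Return the last n lines, the part a bug report usually needs."""
    return lines[-n:] if n > 0 else []

# Made-up capture: one normal line, a flood of identical errors, then the end.
captured = ["boot ok"] + ["disk timeout"] * 10000 + ["halted"]
report = tail(collapse_repeats(captured))
```

Collapsing before taking the tail keeps the boundary between normal and error output visible, even when the error line floods the log.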
This week we have the pleasure of fielding a slew of questions regarding BBitmap. These questions come to us courtesy of a little known development house, B&B Technologies, down in Highland, Texas. Lead developer Ms. Lolita grouses:
To summarize:
In most respects, a BBitmap is nothing more than a buffer of pixel data. Setting individual pixels is best performed with the pointer to the buffer returned by Bits(). All you DOS people, think segment 0xA000.
Drawing into a BBitmap may be facilitated by attaching a BView, which in no way hampers your ability to directly manipulate the buffer by hand.
Importing data from an existing source can be handled by SetBits(), but this function is, unfortunately, not as handy as it might seem at first, due to constraints on the allowable color spaces.
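The arithmetic behind direct pixel access is the same for any linear pixel buffer, so it can be sketched outside the Be API. What follows is a Python model, not BBitmap code: `pixel_offset` and `set_pixel` are made-up helpers, and the 320x200, one-byte-per-pixel buffer mirrors the DOS screen mode alluded to above. With a real BBitmap the row stride would come from BytesPerRow(), which may be larger than width times bytes per pixel if rows are padded.

```python
def pixel_offset(x, y, bytes_per_row, bytes_per_pixel=1):
    """Byte offset of pixel (x, y) from the start of the buffer."""
    return y * bytes_per_row + x * bytes_per_pixel

def set_pixel(buffer, x, y, value, bytes_per_row):
    """Write one 8-bit pixel directly into the buffer."""
    buffer[pixel_offset(x, y, bytes_per_row)] = value

# Stand-in for the memory a Bits()-style pointer would address.
WIDTH, HEIGHT = 320, 200
framebuffer = bytearray(WIDTH * HEIGHT)
set_pixel(framebuffer, 10, 2, 0xFF, bytes_per_row=WIDTH)
offset = pixel_offset(10, 2, WIDTH)
```

Always multiply by the row stride rather than the width; the two agree here only because this toy buffer has no padding.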
Last week, Netscape made more noise with its endorsement of Linux than it did by posting Navigator source code on the specially created www.mozilla.org site. The official media (The New York Times and others) took notice. They praised the creator, Linus Torvalds, and Linux users who have made Linux into a successful grassroots Unix, with an installed base running into the millions worldwide. Five million is the most commonly used number, but this is difficult to judge, given the product's freeware roots.
Listening to the radio as I drove around the UCSD campus in the rain this past week-end, I noticed that NPR gave prominent play to the event and to the open source software movement. It is nice to see Linux taken seriously by the mainstream media. Perhaps this signals the arrival of Linux on the establishment radar. Seen in this light, if Netscape were to use Linux as their OS platform for enterprise applications, its newfound acceptance would be validated.
Besides its considerable technical virtues, Linux makes much cultural sense for Netscape. The Internet culture is rooted in Unix and in libertarian sharing of software. What better fit for the "liberated" Navigator than an equally unshackled version of Unix? One observer contends the Net runs on free software: sendmail, bind, apache and perl; now we have Linux and Navigator.
In one respect, this results in a re-federation of a very diverse set of people and code around the two related goals of proliferating free software and fighting Microsoft's dominance. Salon has an interesting piece on the topic at www.salonmagazine.com/21st/?st.ne.fd.mnaw. In it, Eric S. Raymond, coauthor of The New Hacker's Dictionary, makes an important distinction between "free" and "open"; while one may not always agree with his perspective, it makes for useful and entertaining reading.
[On a closely related issue, eagle-eyed developers have questioned our adherence to the Open Software pact in relevant parts of our recent release such as the bootloader. Specifically, if we use or modify public domain code, are we putting that source code back into the public domain as we are obligated to do? We're investigating this as I write this and will take whatever corrective action is necessary.]
Back to Netscape: endorsing Linux would complete the manoeuvre started with opening Navigator sources in response to Microsoft's bundling a free browser with Windows. The move accentuates the polarization of the software world. On the right, Microsoft, its proprietary products and its well-publicized business practices; on the left, open software constantly improved by the myriad supporting and sharing programmers.
But, for Netscape, Linux could offer advantages beyond the technical, cultural or political. Consider Netscape's predicament in the enterprise space where Windows NT has been making rather successful inroads. Whenever Netscape succeeds with a product running on Windows NT, there is the risk Microsoft will "climb up their tail pipe," so to speak, by bundling a "free" competitor with the OS as they did with the browser. With Linux, there's nothing of that nature to fear—no one will try and use their control of the OS platform to diminish the prospects of a successful Netscape application.
Is this idle speculation? Perhaps, but note the accent in recent public pronouncements on "open" standards. A little less obvious is the de-emphasis on the "official" Java, because of platform control issues, again. On the Mozilla site, at www.mozilla.org/blue-sky/jvm.html, there is a clear appeal: "It's critical that Mozilla embed a robust, fast, free JVM as quickly as possible. There is no robust, fast, free JVM."
We'd like that too.