Like many engineers, I get a lot of my motivation from having something new on my desk, a gadget, widget (or toy, as my wife prefers to call it). I recently bought a new Sony laptop, which like most on the market uses a Neomagic graphics controller. BeOS runs just fine on this notebook. Mine even came with two 4GB partitions on the 8GB drive--made just for multiple OS's!
BeOS supports the Neomagic family of graphics controllers from the old Neomagic 128 (aka 2070) through to the newer Neomagic 256AV (aka 2200) and the variants in between. Unfortunately, that support is achieved through the "old" style app server add-ons, rather than the newer (and blessed) app server accelerants. Whilst it works it's not "current" and doesn't do some things I'd like - such as centering the display when the resolution is less than the size of the LCD panel, or supporting DPMS to turn the display off when not in use. So, given the hardware and motivation, here's an example of an app server accelerant you can write for the Neomagic chips.
Oh, yes...one last "wrinkle" - there's no public documentation available on these chips, so our information is derived from the sources contained within the XFree86 X-Windows server for Un*x, and from previous experience with programming VGA controllers (the sort of stuff I'm afraid you can't easily find in a single book; you have to get "apprenticed" to a master and learn it that way 8-)
This is a fairly large project, so it'll be split up into a few newsletter articles. Initially we'll look at the significant parts of the kernel level hardware driver, then the parts of the app server accelerant necessary for basic framebuffer access. We'll also cover a couple of bells and whistles: subsequent articles will deal with adding support for a hardware cursor and handling hardware acceleration of 2D operations.
For video cards the kernel driver is a fairly simple beast (in most cases), providing a mechanism to establish that a card is installed and to map sections of the card's memory into the address space for others to access. The driver may also need to provide a couple of convenience routines for accessing VGA registers and possibly an interrupt handler, if you need to do something special, such as catch the vertical blanking moment.
Taking a look through the source for the driver (driver.c), init_hardware() simply walks the list of PCI devices to establish whether something significant to this driver is installed. init_driver() is a little more interesting. It allocates some per-driver storage and creates a lock that we can use to serialize access to the driver. Once again, it looks for installed hardware by calling probe_devices(). Finally it optionally adds some extra commands to the kernel debugger. This is a useful trick that lets us print out significant information from within the kernel debugger; it's perfectly feasible to use this technique from any driver.
If the hardware is not installed, init_driver() will never be called and the commands never added to the kernel debugger. probe_devices() again walks the list of installed hardware, building a /dev/graphics entry for each piece of hardware that is significant to this driver. The name of the entry follows a set of rules defined by Trey Boudreau, which allows ls /dev/graphics to be decoded by humans to determine installed hardware and its location on the PCI bus. publish_devices() simply uses the array created during probe_devices() to tell the OS what should be presented in /dev/graphics.
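The probe-and-publish pattern above can be sketched in portable C. This is only an illustration, not the code from driver.c: the pci_entry struct stands in for the kernel's PCI record, the vendor ID is my assumption, the *_sketch function names are hypothetical, and the name format (vendor, device, then bus/device/function in hex) is my guess at the naming convention rather than a quote from the source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the few PCI fields the probe cares about (hypothetical type). */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  bus, device, function;
} pci_entry;

#define NEOMAGIC_VENDOR_ID 0x10C8  /* assumed value; verify against a PCI ID list */
#define MAX_DEVICES 4
#define NAME_LEN 64

static char device_names[MAX_DEVICES][NAME_LEN];
static const char *published[MAX_DEVICES + 1];  /* NULL-terminated name list */

/* Walk the device table and build one name per matching card.
   Returns the number of cards found. */
static int probe_devices_sketch(const pci_entry *table, size_t count)
{
    int found = 0;
    for (size_t i = 0; i < count && found < MAX_DEVICES; i++) {
        const pci_entry *p = &table[i];
        if (p->vendor_id != NEOMAGIC_VENDOR_ID)
            continue;  /* not ours, keep walking */
        snprintf(device_names[found], NAME_LEN,
                 "graphics/%04X_%04X_%02X%02X%02X",
                 (unsigned)p->vendor_id, (unsigned)p->device_id,
                 (unsigned)p->bus, (unsigned)p->device,
                 (unsigned)p->function);
        published[found] = device_names[found];
        found++;
    }
    published[found] = NULL;
    return found;
}

/* Hand back the array built during the probe. */
static const char **publish_devices_sketch(void)
{
    return published;
}
```

With a name built this way, a listing of /dev/graphics tells you at a glance which vendor and device were found and in which PCI slot.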
Only two of the remaining functions deserve any special attention: the open and control functions. The first time this device is opened (generally by the app server) nm_opened() will call first_open(). first_open() will allocate an area that will be shared between the driver and the accelerant. It will contain pointers to certain parts of the card and other useful pieces of configuration information, such as the cardinfo structure globally referenced as "ci" in both the driver and the accelerant. first_open() will also call map_physical_memory() to map the device's framebuffer RAM and memory-mapped registers into accessible memory so that the app server and its accelerant can write to them. You would set up and install an interrupt handler inside first_open() if you needed one.
nm_control() is where most of the work during operation is performed. This function provides one standard ioctl() selector and three private ones. The standard selector required for all graphics device drivers is B_GET_ACCELERANT_SIGNATURE. During startup the app server scans and opens each entry in /dev/graphics and then calls ioctl() with B_GET_ACCELERANT_SIGNATURE for the opened device. Graphics devices should then return a string with the name of the accelerant for this device. The app server will load this accelerant and continue the initialization process through the accelerant.
The three private ioctl() selectors in our driver are actually quite standard and will probably be needed by any driver. Since the card_info structure is stored in an area, if the ID of that area is known it can be cloned and then shared by multiple applications, so the first private selector simply returns that ID. It's this mechanism that allows the accelerant to share data with the driver. Lastly, we need to be able to read and write VGA registers. These registers live in I/O port space (at ports such as 0x3d4) rather than in ordinary memory. They can only be written to or read from by kernel software, so we provide two ioctl() selectors that allow arbitrary byte reads and writes to this area.
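To make the shape of the control function concrete, here is a minimal sketch of a selector dispatch, assuming nothing beyond what is described above. The SK_ selector values, the sk_ struct layouts, the accelerant name, and the simulated I/O array are all hypothetical stand-ins of mine; the real B_GET_ACCELERANT_SIGNATURE constant comes from the BeOS headers, and a real driver would touch the hardware ports rather than an array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selector values for this sketch only. */
enum {
    SK_GET_ACCELERANT_SIGNATURE = 1,  /* stand-in for the standard selector */
    SK_GETGLOBALS,                    /* return the shared area's ID */
    SK_GET_VGA_REG,                   /* read one byte from a VGA port */
    SK_SET_VGA_REG                    /* write one byte to a VGA port */
};

typedef struct { int32_t area; } sk_globals;
typedef struct { uint16_t port; uint8_t value; } sk_vga_reg;

static int32_t shared_area_id = 42;  /* would be filled in by first_open() */
static uint8_t fake_io[0x400];       /* simulated port space, not real hardware */

/* Returns 0 on success, -1 for a bad buffer or an unknown selector. */
static int control_sketch(uint32_t op, void *buf, size_t len)
{
    switch (op) {
    case SK_GET_ACCELERANT_SIGNATURE: {
        static const char sig[] = "neomagic.accelerant";  /* assumed name */
        if (buf == NULL || len < sizeof sig)
            return -1;
        memcpy(buf, sig, sizeof sig);
        return 0;
    }
    case SK_GETGLOBALS:
        if (buf == NULL || len < sizeof(sk_globals))
            return -1;
        ((sk_globals *)buf)->area = shared_area_id;
        return 0;
    case SK_GET_VGA_REG:
    case SK_SET_VGA_REG: {
        sk_vga_reg *r = (sk_vga_reg *)buf;
        if (r == NULL || len < sizeof *r || r->port >= sizeof fake_io)
            return -1;
        if (op == SK_SET_VGA_REG)
            fake_io[r->port] = r->value;  /* kernel-side write on behalf of caller */
        else
            r->value = fake_io[r->port];  /* kernel-side read, result handed back */
        return 0;
    }
    default:
        return -1;
    }
}
```

The point of the two register selectors is simply that user-level code hands the driver a port and a byte, and the driver does the privileged access for it.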
The remainder of the kernel driver is largely standard or even unimplemented. Reading and writing to the graphics device doesn't make as much sense as you might at first think. Other functions simply reverse the work of their partners (nm_open() and nm_close()). For more information on general driver "things," Todd Thomas's recent articles on USB drivers are a good source: "Developers' Workshop: Writing a USB Video Camera Driver, Part 1" and "Developers' Workshop: Writing a USB Video Camera Driver, Part 1.01".
And now on to the app server accelerant. An accelerant has one primary entry point - get_accelerant_hook() - which returns the addresses of other functions as requested by the app server.
The functions are in five groups:
Accelerant initialization and "cloning" - used mostly by the Game Kit for BWindowScreen.
Mode configuration - determining supported screen resolutions and depths, setting a given mode, handling the palette for 8-bit (256 color) modes, and handling Display Power Management System (DPMS).
Cursor management - setting the cursor shape and mask, and moving the location of the cursor onscreen.
Synchronization - reporting which of the app server's BLIT requests have completed.
2D Acceleration - carrying out BLIT requests to use hardware features (if available) to perform fast fills or copy areas of the screen.
At the early stage of accelerant development only the Init function and mode configuration are mandatory in order to actually see the desktop on screen. If the clone functions are not implemented BWindowScreen will not work, and without an implementation of hardware cursor functions BDirectWindow will not work fully. Also, there is a significant performance penalty for not implementing the 2D acceleration features, since the system CPU will have to do work that could be off-loaded onto the graphics chip. But hey! You'll see "something"!
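A hook function that starts with only the mandatory groups might look something like the sketch below. Treat it as an outline under assumptions: the SK_ feature codes are stand-ins for the constants in the BeOS accelerant header, the sk_ functions are empty placeholders, and I return a generic function pointer type here to keep the example portable C. The idea it shows is just "return an address for what you implement, and NULL for what you don't."

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature codes; the real ones come from the BeOS headers. */
enum {
    SK_INIT_ACCELERANT = 0,
    SK_ACCELERANT_MODE_COUNT,
    SK_GET_MODE_LIST,
    SK_PROPOSE_DISPLAY_MODE,
    SK_SET_DISPLAY_MODE,
    SK_MOVE_CURSOR,            /* cursor group: not implemented yet */
    SK_SCREEN_TO_SCREEN_BLIT   /* 2D acceleration group: not implemented yet */
};

typedef void (*sk_hook)(void);  /* generic function pointer for the sketch */

/* Placeholder implementations of the mandatory hooks. */
static int      sk_init(int fd)            { (void)fd; return 0; }
static uint32_t sk_mode_count(void)        { return 1; }
static int      sk_get_mode_list(void *m)  { (void)m; return 0; }
static int      sk_propose_mode(void *m)   { (void)m; return 0; }
static int      sk_set_mode(void *m)       { (void)m; return 0; }

/* Return the function for a requested feature, or NULL if we don't
   provide it; the caller casts the result back to the right type. */
sk_hook get_accelerant_hook_sketch(uint32_t feature, void *data)
{
    (void)data;
    switch (feature) {
    case SK_INIT_ACCELERANT:       return (sk_hook)sk_init;
    case SK_ACCELERANT_MODE_COUNT: return (sk_hook)sk_mode_count;
    case SK_GET_MODE_LIST:         return (sk_hook)sk_get_mode_list;
    case SK_PROPOSE_DISPLAY_MODE:  return (sk_hook)sk_propose_mode;
    case SK_SET_DISPLAY_MODE:      return (sk_hook)sk_set_mode;
    default:                       return NULL;  /* cursor, sync, 2D: later */
    }
}
```

Adding the hardware cursor or 2D acceleration later then becomes a matter of adding cases to the switch, without disturbing what already works.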
So let's plough on with the work. Our init() function uses the GETGLOBALS ioctl selector to retrieve the area_id of the Card Info structure the hardware driver set up. We call clone_area() with that area_id so that we have a shared area of memory that the driver and accelerant can communicate through. We then set some basic information in that structure, such as memory size, etc. We also build a list of available display modes that we can use later during the mode setting process.
Mode configuration can often seem the most complicated part of the process, and can also be the most frustrating, since problems at this stage will nearly always result in either no display at all or an unreadable display. The app server calls four functions to handle mode configuration: _get_accelerant_mode_count() to find how many different modes are available; _get_mode_list() to return a complete list of all available modes; and _propose_display_mode() when small adjustments have been made to a previously chosen mode. These adjustments are the sort of thing that would result from using the slider in the Screen Preferences panel to adjust the refresh rate, and the call can actually be ignored - as is the case in this driver. Finally _set_display_mode() does all the real work. Ultimately _set_display_mode() calls SetupCRTC() in this sample driver to get all the work done. Let's walk through this function and look at what it does.
Much current graphics chip programming still carries lots of legacy from VGA (and even earlier) display standards. All of our clock values (particularly the value for horizontal timing) need to be converted from "pixel" values to "character" values; this means dividing by 8, since characters are 8 pixels wide. For simplicity, we'll also extract the values for vertical timing into local variables too, to make the code a little easier to read.
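As a small illustration of that conversion, here is a sketch that turns pixel timings into character clocks. The struct and function names are mine, not the driver's, and real CRTC registers typically want small per-register offsets applied on top of the divided value, which this sketch deliberately leaves out.

```c
#include <stdint.h>

/* Mode timings as they arrive, in pixels and lines (hypothetical layout). */
typedef struct {
    uint16_t h_display, h_sync_start, h_sync_end, h_total;
    uint16_t v_display, v_sync_start, v_sync_end, v_total;
} sk_timing;

/* The same timings in the units the CRT controller thinks in. */
typedef struct {
    uint16_t hdisp, hsyncstart, hsyncend, htotal;  /* character clocks */
    uint16_t vdisp, vsyncstart, vsyncend, vtotal;  /* scan lines */
} sk_crtc_values;

static sk_crtc_values to_character_clocks(const sk_timing *t)
{
    sk_crtc_values c;
    /* Horizontal values are divided by 8: one character is 8 pixels wide. */
    c.hdisp      = t->h_display    / 8;
    c.hsyncstart = t->h_sync_start / 8;
    c.hsyncend   = t->h_sync_end   / 8;
    c.htotal     = t->h_total      / 8;
    /* Vertical values are already in lines; copy them into locals as-is. */
    c.vdisp      = t->v_display;
    c.vsyncstart = t->v_sync_start;
    c.vsyncend   = t->v_sync_end;
    c.vtotal     = t->v_total;
    return c;
}
```

For a mode with 800 visible pixels and 1056 pixels per line in total, this gives 100 and 132 character clocks respectively.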
In order to have the electron beam of a CRT paint an image, the beam (a pretty analog device) needs a certain amount of setup time before and after the actual drawing area. This could be considered to be when the beam is scanning in space outside the edge of the picture tube. There is also a certain amount of time required for the beam to retrace from the right edge of the screen to the left edge, and from the bottom to the top. So hsyncstart and hsyncend correspond to "edges" of the area. hdisp is the actual time the beam is "painting" visible data, and htotal is the total time required to draw one line and retrace back to the beginning of the next.
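The relationship between these totals and what you see on screen is plain arithmetic, sketched below. The function names are hypothetical and the figures in the note that follows are just a commonly quoted 800x600 timing, not values taken from this driver.

```c
/* Pixels per line that are spent outside the visible area. */
static unsigned hblank_pixels(unsigned htotal, unsigned hdisp)
{
    return htotal - hdisp;
}

/* Lines drawn per second: one line takes htotal pixel clocks. */
static double line_rate_hz(double pixel_clock_hz, unsigned htotal)
{
    return pixel_clock_hz / (double)htotal;
}

/* Frames drawn per second: one frame takes vtotal complete lines. */
static double refresh_rate_hz(double pixel_clock_hz,
                              unsigned htotal, unsigned vtotal)
{
    return pixel_clock_hz / ((double)htotal * (double)vtotal);
}
```

With a 40 MHz pixel clock, 1056 pixels per line, and 628 lines per frame, the line rate comes out a little under 38 kHz and the refresh rate a little over 60 Hz; 256 of every 1056 pixel times are spent off the visible 800.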
Driving all of this is a clock that needs to be correctly programmed to give the appropriate pulse rate to drive the whole analog system. Fortunately for us, in this example we're driving an LCD screen directly (as opposed to driving an external LCD panel through a standard VGA connector), so programming the clock is unnecessary (we'll cover it in another article dealing with simultaneous display on the LCD and an external monitor).
The code should be self-explanatory but there are some points worth noting: The Neomagic chip uses the basic standard VGA registers and adds some extensions. VGA registers are programmed by writing an index value to a particular location and then writing the 1-byte data value to the location after the index. After the data has been written the index is assumed to have incremented by one, so writing to the data location a second time will write to the next register in the table. There are four main groups of registers. The attribute and sequence registers generally contain the standard values seen in this sample. The CRT controller and Extension registers need to be programmed on a per-mode basis in many cases, and it is these registers that are often extended for additional features (as can be seen in the Neomagic chip).
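The index-then-data access pattern is easy to show with a simulated register file. In this sketch the crtc_sim_ functions are stand-ins for real port I/O (in the real driver these accesses go through the kernel, as described earlier), and the helper names are my own. Note that the sketch writes the index explicitly before every access rather than relying on any auto-increment.

```c
#include <stdint.h>

#define CRTC_INDEX_PORT 0x3d4  /* index register */
#define CRTC_DATA_PORT  0x3d5  /* data register, one location after the index */

/* Simulated CRT controller: a latched index plus 256 byte-wide registers. */
static uint8_t crtc_sim_index;
static uint8_t crtc_sim_regs[256];

static void crtc_sim_outb(uint16_t port, uint8_t value)
{
    if (port == CRTC_INDEX_PORT)
        crtc_sim_index = value;                 /* select a register */
    else if (port == CRTC_DATA_PORT)
        crtc_sim_regs[crtc_sim_index] = value;  /* write the selected one */
}

static uint8_t crtc_sim_inb(uint16_t port)
{
    if (port == CRTC_DATA_PORT)
        return crtc_sim_regs[crtc_sim_index];
    return crtc_sim_index;
}

/* Write one indexed register: index first, then the data byte. */
static void write_crtc(uint8_t index, uint8_t value)
{
    crtc_sim_outb(CRTC_INDEX_PORT, index);
    crtc_sim_outb(CRTC_DATA_PORT, value);
}

/* Read one indexed register the same way. */
static uint8_t read_crtc(uint8_t index)
{
    crtc_sim_outb(CRTC_INDEX_PORT, index);
    return crtc_sim_inb(CRTC_DATA_PORT);
}
```

Wrapping the two-step access in small helpers like these keeps the mode-setting code readable, since each register write becomes a single call.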
Finally, there are the DAC and palette registers. One section of code in the driver appears to read and discard a value from the DAC register four times before writing 0 to the register. This apparent waste of reads is necessary to "uncover" a particular DAC register before writing a value to it. The palette of the Neomagic chips is 6 bits each for Red, Green, and Blue (except in 24bpp mode when it's 8 bits). The size of the per "gun" values of a palette varies from chip to chip; however if your image seems too dark, washed out, or has a particular colour cast to it, you're probably not shifting the palette values appropriately before writing to them.
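The palette shift is worth a quick sketch, since it's such an easy thing to get wrong. This assumes the standard VGA palette ports (write index at 0x3c8, data at 0x3c9, three data writes per entry) and uses a simulated DAC in place of real hardware; the function names and the dac_bits parameter are mine, and dac_bits is expected to be 6 or 8.

```c
#include <stdint.h>

#define DAC_WRITE_INDEX 0x3c8  /* selects the palette entry to write */
#define DAC_DATA        0x3c9  /* takes red, green, blue in turn */

/* Simulated DAC: 256 entries of three components each. */
static uint8_t dac_sim[256][3];
static uint8_t dac_sim_index;
static int     dac_sim_component;

static void dac_sim_outb(uint16_t port, uint8_t value)
{
    if (port == DAC_WRITE_INDEX) {
        dac_sim_index = value;
        dac_sim_component = 0;
    } else if (port == DAC_DATA) {
        dac_sim[dac_sim_index][dac_sim_component] = value;
        if (++dac_sim_component == 3) {  /* after blue, step to next entry */
            dac_sim_component = 0;
            dac_sim_index++;
        }
    }
}

/* Scale an 8-bit colour component down to the width of the DAC's "guns". */
static uint8_t scale_gun(uint8_t value8, unsigned dac_bits)
{
    return (uint8_t)(value8 >> (8u - dac_bits));
}

/* Write one palette entry, shifting each component to fit the DAC. */
static void set_palette_entry(uint8_t index, uint8_t r, uint8_t g, uint8_t b,
                              unsigned dac_bits)
{
    dac_sim_outb(DAC_WRITE_INDEX, index);
    dac_sim_outb(DAC_DATA, scale_gun(r, dac_bits));
    dac_sim_outb(DAC_DATA, scale_gun(g, dac_bits));
    dac_sim_outb(DAC_DATA, scale_gun(b, dac_bits));
}
```

If you forget the shift on a 6-bit DAC, the top two bits of every component are lost instead of the bottom two, which is exactly the sort of thing that produces a strange colour cast.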
After talking extensively about the indexed style of VGA registers, it's worth noting that some vendors have switched to more "regular" memory-mapped registers for their PCI/AGP video controllers (thankfully). 3Dfx is one good example of this, and is a company which has published its register specifications.
Our example driver is at this point functional: it displays a picture, allows you to choose different display modes, and even supports DPMS. However, it's not fast, and doesn't have a hardware cursor. Those are both topics for a future article in the next couple of weeks.