Issue 5-6, February 9, 2000

Be Engineering Insights: A Device Driver Is Worth a Thousand Words

By Brian Swetland

Once upon a time I wrote an article about using USB from user space. Those tools are handy for prototyping and many simple projects, but often you need a real USB driver (perhaps to fit into an existing driver model like mice, keyboards, or audio devices). We'll take a look at the gritty details and some important rules for USB drivers. A sample driver that can dump raw audio data to USB speakers is provided to illustrate the points below.

This article refers to the V2 version of the bus manager that will ship in BeOS 5. The sample code includes header files for this new interface. Please note that this code will not operate correctly under R4.x (and will not even build if you use the system headers).

You can get the source code and follow along if you like: <ftp://ftp.be.com/pub/samples/drivers/usb_speaker.zip>.

The USB 1.1 Specification and specific device class specifications are available online in PDF format: http://www.usb.org/developers/docs.html.

The USB Bus Manager

The Universal Serial Bus is far more dynamic than some of the other busses we support under BeOS. Users can and will hot plug and unplug devices. The dull old "load driver, look for devices, publish devices" technique will not work very well here. We're using USB as a test bed for the next revision of the BeOS driver model, which will treat all busses as dynamic entities and be better prepared for other interesting device events, like power management.

When a USB driver loads, it's expected to register itself with the bus manager using usb->register_driver(). The name of the driver must be provided (so the bus manager can reload it on demand), as well as a list of supported devices. This takes the form of an array of usb_support_descriptor structures. Each entry represents a pattern that may have a value (or the wild card 0) for the class, subclass, protocol, vendor, and device fields. The sample driver provided supports USB audio devices (all devices that have a class value of 1).
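The wild card rule above can be sketched in a few lines of C. This is only an illustration: support_pattern is a hypothetical stand-in for usb_support_descriptor (its field names are assumed from the list in the text, with "product" standing for the device field), and the matching helpers are invented here to show the rule, not taken from the bus manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for usb_support_descriptor. The real layout lives
 * in the BeOS USB headers; the field names here are assumptions. */
typedef struct {
    uint8_t  dev_class;
    uint8_t  dev_subclass;
    uint8_t  dev_protocol;
    uint16_t vendor;
    uint16_t product;   /* the "device" field in the article */
} support_pattern;

/* A field matches if the pattern holds the wild card 0 or the same value. */
bool field_matches(unsigned pattern, unsigned actual)
{
    return pattern == 0 || pattern == actual;
}

/* A pattern matches a device only if every field matches. */
bool pattern_matches(const support_pattern *pattern, const support_pattern *ids)
{
    assert(pattern != NULL && ids != NULL);
    return field_matches(pattern->dev_class, ids->dev_class)
        && field_matches(pattern->dev_subclass, ids->dev_subclass)
        && field_matches(pattern->dev_protocol, ids->dev_protocol)
        && field_matches(pattern->vendor, ids->vendor)
        && field_matches(pattern->product, ids->product);
}

/* A driver is a candidate if any entry in its pattern array matches. */
bool any_pattern_matches(const support_pattern *list, size_t count,
                         const support_pattern *ids)
{
    for (size_t i = 0; i < count; i++) {
        if (pattern_matches(&list[i], ids))
            return true;
    }
    return false;
}
```

With this model, the sample driver's single pattern would be { 1, 0, 0, 0, 0 }: class 1 (audio) and wild cards everywhere else.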

Once registered, the driver must request notification of devices that it may be interested in. A set of notify_hooks (currently device_added and device_removed) is installed using usb->install_notify(). Before the driver unloads, these hooks must be uninstalled using usb->uninstall_notify().

Once the hooks are installed, device_added() will be called every time a device that matches (in its configuration_descriptor or an interface descriptor) the pattern in one of the supplied support descriptors appears. If there are already such devices plugged in, these calls will occur before install_notify returns.

Adding and Removing Devices

The device_added() hook receives a usb_device pointer (the opaque handle that represents a particular USB device on the bus). The driver may further investigate the device (for example, our sample only supports audio devices that support 16-bit stereo streams) and indicate whether it wants to keep it.

If B_OK is returned, the driver may use the usb_device handle and any associated resources until device_removed() is called. Once device_removed() returns, the driver MAY NOT use these resources again. The cookie parameter of device_added provides a value that will be passed to device_removed. If device_added() returns B_ERROR, the driver receives no further notification and may not use the usb_device.

When usb->uninstall_notify() is called, any devices that still exist are "removed" (device_removed is called).
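The hook contract described above might look roughly like this from the driver's side. This is a sketch under stated assumptions: fake_device stands in for the opaque usb_device, speaker_state is a hypothetical per-device structure, and the hook signatures and the B_OK and B_ERROR values are modeled locally rather than taken from the real headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-ins for the BeOS status codes (values assumed). */
enum { B_OK = 0, B_ERROR = -1 };

/* Hypothetical stand-in for the opaque usb_device handle. */
typedef struct {
    int device_class;
} fake_device;

/* Hypothetical per-device state; its address is used as the cookie. */
typedef struct {
    const fake_device *dev;
} speaker_state;

/* Accept audio devices (class 1) and hand back a cookie; decline the rest. */
int device_added(const fake_device *dev, void **cookie)
{
    assert(dev != NULL && cookie != NULL);
    if (dev->device_class != 1)
        return B_ERROR;          /* no further notification for this device */

    speaker_state *state = malloc(sizeof(*state));
    if (state == NULL)
        return B_ERROR;
    state->dev = dev;
    *cookie = state;             /* passed back to device_removed() later */
    return B_OK;
}

/* After this returns, the device and its resources must not be touched. */
int device_removed(void *cookie)
{
    free(cookie);
    return B_OK;
}
```

The cookie is what ties the two calls together: whatever device_added() stores in it is what device_removed() gets back, so the driver never has to search for its state by device handle.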

Putting Theory into Practice

The sample driver maintains a list of devices it knows about (by way of device_added() calls). The list is protected by a lock and is also used to generate the list of device names when publish_devices() is called. Each device entry contains information needed for transfers, a count of how many instances of it are open, and a unique number allocated on creation. When a device is removed, its entry is unlinked from the list right away, but its memory is freed only once the open count drops to zero.
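The bookkeeping described above can be sketched as follows. This is not the sample driver's code: the type and function names are hypothetical, and the lock is left out to keep the example short (a real driver would hold it around every list operation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device entry; the real one also carries transfer state. */
typedef struct dev_entry {
    struct dev_entry *next;
    int  number;        /* unique number allocated on creation */
    int  open_count;    /* how many instances are currently open */
    bool removed;       /* set once the device has been unplugged */
} dev_entry;

typedef struct {
    dev_entry *head;
    int        next_number;
} dev_list;

/* Called from device_added(): create an entry and link it in. */
dev_entry *entry_add(dev_list *list)
{
    dev_entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->number = list->next_number++;
    e->next = list->head;
    list->head = e;
    return e;
}

/* Called from device_removed(): unlink now, free only if nobody has the
 * device open. Returns true if the memory was freed. */
bool entry_remove(dev_list *list, dev_entry *e)
{
    dev_entry **link = &list->head;
    while (*link != NULL && *link != e)
        link = &(*link)->next;
    assert(*link == e);          /* the entry must be on the list */
    *link = e->next;
    e->next = NULL;
    e->removed = true;
    if (e->open_count == 0) {
        free(e);
        return true;
    }
    return false;
}

/* Called when an open instance goes away: the last close of a removed
 * device frees the entry. Returns true if the memory was freed. */
bool entry_close(dev_entry *e)
{
    assert(e->open_count > 0);
    e->open_count--;
    if (e->removed && e->open_count == 0) {
        free(e);
        return true;
    }
    return false;
}
```

Unlinking immediately keeps a vanished device out of publish_devices(), while deferring the free keeps any still-open instance from touching released memory.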

Devices, Interfaces, Endpoints, and Pipes

As mentioned above, each device is represented by a usb_device pointer (an opaque object). The properties of a device, its configurations, interfaces, and endpoints are described by various descriptors defined by the USB specification (Chapter 9, in particular, is very useful). The header USB_spec.h contains constants and structures for the various standard descriptors.

Every device has at least one—and possibly several—configurations. usb->get_nth_configuration() may be used to obtain a configuration_info object that describes a particular configuration. usb->set_configuration() is used to select a configuration and inform the device of that choice. A configuration value of NULL means "unconfigure." usb->get_configuration() returns the current configuration (or NULL if the device is not configured).

The configuration_info structure contains the configuration descriptor, as well as an array of interface_list structures. Each interface_list represents one interface, which may have several alternates. An alternate interface often provides a slightly different version of the same resource (for example, an audio device may provide 16-bit stereo, 8-bit mono, and several other formats).

The interface_list also indicates the currently active interface (the default is always alt[0]). This value may only be set by usb->set_alt_interface() and must never be changed by hand. IMPORTANT: set_alt_interface() MUST happen before set_configuration()—this is a limitation in the current USB stack.

Each interface has a descriptor, a number of endpoints (which may be zero), and a list of "generic" descriptors—descriptors specific to the interface (audio interfaces provide a descriptor that describes the audio data format, for example).

Each endpoint structure includes the endpoint descriptor (which indicates the type, direction, max bandwidth, as per the USB spec) as well as a pointer to a usb_pipe. Initially, every endpoint's usb_pipe pointer is NULL, but when a device is configured (with usb->set_configuration()), the pipes of the active interfaces are valid handles used to send or receive data.

Here's an example of a typical audio device (usually the audio streaming interface has several alternates with various other supported formats):

usb_device
  configuration[0]
    interface[0]
      (class 1, subclass 1: Audio Control)
      alternate[0]
    interface[1]
      (class 1, subclass 2: Audio Streaming)
      alternate[0]
        (default, uses no bandwidth)
      alternate[1]
        endpoint[0]
        (isochronous out, 56 bytes per packet max)
        generic[0..2]
        (audio descriptors, format 8-bit mono)
      alternate[2]
        endpoint[0]
        (isochronous out, 224 bytes per packet max)
        generic[0..2]
        (audio descriptors, format 16-bit stereo)
    interface[2]
      (class 3: Human Interface (volume buttons))
      alternate[0]
        endpoint[0]
        (interrupt in, 1 byte per packet max)
        generic[0]
        (HID descriptor)
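
Walking a tree like the one above to pick an alternate might be sketched as follows. The structures are simplified, hypothetical stand-ins for interface_list and its members, not the real definitions. The endpoint tests use the standard USB encoding (transfer type in the low two bits of the attributes, direction in bit 7 of the address). Note the simplification: this sketch chooses by packet size alone, whereas a real audio driver would also inspect the generic audio format descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirrors of the structures in the tree. */
typedef struct {
    uint8_t  address;          /* bit 7 set means IN (device to host) */
    uint8_t  attributes;       /* low two bits: 1 means isochronous */
    uint16_t max_packet_size;
} ep_sketch;

typedef struct {
    uint8_t          if_class;
    uint8_t          if_subclass;
    size_t           endpoint_count;
    const ep_sketch *endpoint;
} alt_sketch;

typedef struct {
    size_t            alt_count;
    const alt_sketch *alt;
} iface_list_sketch;

bool is_isochronous_out(const ep_sketch *ep)
{
    return (ep->attributes & 0x03) == 0x01 && (ep->address & 0x80) == 0;
}

/* Return the index of the first Audio Streaming alternate (class 1,
 * subclass 2) with an isochronous OUT endpoint able to carry
 * bytes_per_frame, or -1 if there is none. */
int find_streaming_alt(const iface_list_sketch *list, uint16_t bytes_per_frame)
{
    assert(list != NULL);
    for (size_t a = 0; a < list->alt_count; a++) {
        const alt_sketch *alt = &list->alt[a];
        if (alt->if_class != 1 || alt->if_subclass != 2)
            continue;
        for (size_t e = 0; e < alt->endpoint_count; e++) {
            const ep_sketch *ep = &alt->endpoint[e];
            if (is_isochronous_out(ep) && ep->max_packet_size >= bytes_per_frame)
                return (int)a;
        }
    }
    return -1;
}
```

Applied to interface[1] in the tree, a request for 180 bytes per frame would skip alternate[0] (no endpoints) and alternate[1] (56 bytes) and land on alternate[2].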

Sending and Receiving Data

One very important thing to remember with USB drivers is that you can't use stack-allocated structures with the queue_*() IO functions. They are completed in a service thread that doesn't have access to the stack that the operations were initiated from. If you use stack-allocated structures, your driver will crash. Also, the current stack requires that data buffers for bulk, isochronous, and interrupt transactions be contiguous in physical memory. This limitation may be lifted in a future version, but for now, be careful.
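One way to respect the no-stack rule is to heap-allocate a small context per request and release it in the completion callback. The sketch below illustrates the ownership pattern only: fake_queue() and fake_complete() are invented stand-ins for a queue_*() call and the service thread, and the callback signature is an assumption. Plain malloc/calloc is used for brevity and does not give the physically contiguous memory that real data buffers need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed shape of a completion callback (not the real typedef). */
typedef void (*done_hook)(void *cookie, int status, void *data,
                          size_t actual_len);

/* Hypothetical stand-in for one slot in the stack's request queue. */
typedef struct {
    done_hook hook;
    void     *cookie;
    void     *data;
    size_t    length;
} pending_request;

/* Stand-in for a queue_*() call: it only records the request. */
int fake_queue(pending_request *slot, void *data, size_t length,
               done_hook hook, void *cookie)
{
    assert(slot != NULL && hook != NULL);
    slot->hook = hook;
    slot->cookie = cookie;
    slot->data = data;
    slot->length = length;
    return 0;
}

/* Stand-in for the service thread finishing the request later. */
void fake_complete(pending_request *slot, int status)
{
    done_hook hook = slot->hook;
    assert(hook != NULL);
    slot->hook = NULL;
    hook(slot->cookie, status, slot->data, slot->length);
}

/* Per-request context: lives on the heap until the callback runs. */
typedef struct {
    unsigned char *buffer;
    size_t         length;
    int           *done_count;
} transfer_ctx;

/* Completion callback: the only place the context is released. */
void transfer_done(void *cookie, int status, void *data, size_t actual_len)
{
    transfer_ctx *ctx = cookie;
    (void)status; (void)data; (void)actual_len;
    (*ctx->done_count)++;
    free(ctx->buffer);
    free(ctx);
}

/* Nothing handed to the queue points into this function's stack frame. */
int start_transfer(pending_request *slot, size_t length, int *done_count)
{
    transfer_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return -1;
    ctx->buffer = calloc(1, length);
    if (ctx->buffer == NULL) {
        free(ctx);
        return -1;
    }
    ctx->length = length;
    ctx->done_count = done_count;
    return fake_queue(slot, ctx->buffer, length, transfer_done, ctx);
}
```

start_transfer() can return long before the request finishes, which is exactly why nothing it passes along may live in its own locals.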

Every device has a pipe to endpoint 0, the default control pipe. The usb->send_request() call is used to send control requests (which involve an 8-byte command sent from the host, optionally followed by some number of bytes sent to or received from the device). Control requests are used to configure devices, clear error conditions, and do other housekeeping. Many of the convenience functions provided by the bus manager (set/clear feature, get_descriptor, and so on) result in control requests.
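For reference, the 8-byte command has a fixed layout in the USB specification: bmRequestType, bRequest, then the 16-bit wValue, wIndex, and wLength fields in little-endian order. The sketch below packs a standard GET_DESCRIPTOR request for the 18-byte device descriptor. The struct and helper names are made up for this example; in a driver, usb->send_request() takes care of the wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five fields of a USB setup packet (names invented for this sketch). */
typedef struct {
    uint8_t  request_type;   /* bmRequestType */
    uint8_t  request;        /* bRequest */
    uint16_t value;          /* wValue */
    uint16_t index;          /* wIndex */
    uint16_t length;         /* wLength: size of the optional data phase */
} setup_fields;

/* USB multi-byte fields travel least significant byte first. */
void put_le16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)(v >> 8);
}

/* Serialize the fields into the 8 bytes sent in the setup phase. */
void encode_setup(const setup_fields *s, uint8_t out[8])
{
    assert(s != NULL && out != NULL);
    out[0] = s->request_type;
    out[1] = s->request;
    put_le16(out + 2, s->value);
    put_le16(out + 4, s->index);
    put_le16(out + 6, s->length);
}

/* Standard GET_DESCRIPTOR for the device descriptor:
 * 0x80 = device-to-host, standard request, device recipient;
 * bRequest 6 = GET_DESCRIPTOR; wValue high byte 1 = DEVICE, low byte = index;
 * wLength 18 = size of a device descriptor. */
setup_fields get_device_descriptor_request(void)
{
    setup_fields s = { 0x80, 0x06, (uint16_t)(0x01 << 8), 0, 18 };
    return s;
}
```

Encoding that request yields the bytes 80 06 00 01 00 00 12 00.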

For any other type of transaction you'll need a usb_pipe object, which will be allocated to each endpoint in each active interface when set_configuration() succeeds as described above.

The usb->queue_interrupt() and usb->queue_bulk() calls may be used to enqueue requests to interrupt or bulk endpoints, respectively. A callback must be provided to allow the stack to inform your driver when the requests complete. The direction of data transfer is dictated by the direction of the endpoint (as detailed in its descriptor). The callback is called on completion of the transaction (with success or failure) or if the transfer is cancelled with usb->cancel_queued_transfers().

Isochronous transfers are more complicated, and require that the driver inform the stack ahead of time so that adequate resources may be dedicated. Since isochronous transfers provide guaranteed bandwidth, the stack needs to pre-allocate various transfer descriptors specific to the host controller to ensure that everything is handled in a timely fashion.

usb->set_pipe_policy() is employed to configure an isochronous pipe in this fashion. It uses the provided buffer count to determine how many requests may be outstanding at once, the duration in milliseconds to determine the number of frames of data that will be provided, and the sample size to ensure that frames are always a correct multiple of the sample size.
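To get a feel for the numbers involved, here is the arithmetic for the sample driver's format. It assumes that "sample size" means the bytes for one sample across all channels (4 for 16-bit stereo) and that a frame lasts 1 ms, as on a full-speed bus. The helper names are invented for this sketch and are not part of the bus manager.

```c
#include <assert.h>
#include <stdint.h>

/* Whole samples produced in the first `ms` milliseconds at `rate_hz`. */
uint32_t samples_in(uint32_t rate_hz, uint32_t ms)
{
    return (uint32_t)(((uint64_t)rate_hz * ms) / 1000);
}

/* Bytes needed for a buffer covering `ms` milliseconds; always a whole
 * multiple of the sample size. */
uint32_t bytes_in(uint32_t rate_hz, uint32_t sample_size, uint32_t ms)
{
    assert(sample_size > 0);
    return samples_in(rate_hz, ms) * sample_size;
}

/* Largest payload any single 1 ms frame has to carry: the sample count
 * per frame rounded up, times the sample size. */
uint32_t max_bytes_per_frame(uint32_t rate_hz, uint32_t sample_size)
{
    assert(sample_size > 0);
    return ((rate_hz + 999) / 1000) * sample_size;
}
```

At 44.1 kHz, a frame carries 44 or 45 samples (176 or 180 bytes of 16-bit stereo), and a 10 ms buffer holds 441 samples, or 1764 bytes. Both per-frame sizes fit in the 224-byte packets of the 16-bit stereo alternate shown earlier.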

usb->queue_isochronous() starts an isochronous transfer on the next frame. While bulk and interrupt transfers provide a simple actual_length parameter to their callback, which indicates how much data has been sent or received (in the event of a short transfer), isochronous pipes use run-length encoding records to describe which data actually made it intact (since it's possible that only bits and pieces of an inbound isochronous stream arrived intact). The USB_rle.h header has extensive comments explaining these structures.

In the current stack, it takes 10ms for the first isochronous transfer on a pipe to start. Provided you keep queuing up additional transfers, there will be one packet per frame.

About USB Transactions

Isochronous transactions get dedicated bandwidth and happen every frame. Delivery is not guaranteed, but timing is.

Interrupt transactions are scheduled to happen on a polling interval of N milliseconds (no less often than once every N frames). They give the device (for example, a mouse) an opportunity to send some data (up to the specified max packet size) or a NAK, indicating no data is ready.

Control requests actually consist of two or three transactions—a setup phase to send the 8-byte control message, an optional data phase for the device to send or receive data, and an acknowledge phase where the data (or lack thereof) is acknowledged or an error is signaled via a STALL condition.

Bulk transactions guarantee delivery but not timing—they use as much bandwidth as is available (and not allocated to interrupt or isochronous transfers) to send or receive data.

Every endpoint specifies a max packet size. The USB stack will never violate this size and will break up longer queued requests into as many individual packets as required to complete the request. The exception is isochronous transactions, which must declare buffering information ahead of time with set_pipe_policy().
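The splitting of a bulk or interrupt request is simple division. As a rough sketch (helper names invented here, and zero-length requests not modeled):

```c
#include <assert.h>
#include <stddef.h>

/* Number of packets needed to move `length` bytes through an endpoint
 * whose max packet size is `max_packet` (rounds up). */
size_t packets_needed(size_t length, size_t max_packet)
{
    assert(max_packet > 0);
    return (length + max_packet - 1) / max_packet;
}

/* Size of the final packet: a full packet if the length divides evenly,
 * otherwise the remainder. */
size_t last_packet_size(size_t length, size_t max_packet)
{
    assert(max_packet > 0 && length > 0);
    size_t rest = length % max_packet;
    return rest == 0 ? max_packet : rest;
}
```

A 1000-byte request on an endpoint with a 64-byte max packet size therefore goes out as 15 full packets followed by one 40-byte packet.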

The Sample Driver (usb_speaker)

A full USB Audio driver is a fairly complex beast. The sample provided works with USB Audio Spec-compliant devices, but only if they provide a 16-bit stereo interface. The sample only handles the 44.1 kHz sample rate; you can use it by copying a raw 44.1 kHz, 16-bit stereo audio file (perhaps a CD track saved to disk) to /dev/misc/usb_speaker/0, provided you have a compatible USB audio device plugged in.


Developers' Workshop: The Quick Version of Drag'n'Drop

By Ken McDonald

This week's article is all about making use of Drag'n'Drop with BeOS, an apparently simple task that has some unseen (but useful) complications. This is going to be a "high-level" article. By some strange coincidence, I've just finished writing the detailed documentation for drag'n'drop, which should be on the Be Web Site soon, so you can look there for code examples, and all the picky details.

First, a bit of history, which will be particularly important to those of you who have been programming BeOS for a long time, and think that the way you did drag'n'drop way back when is the way you should do it now. Here's the history:

  1. In earlier versions of the BeOS, dragging an object from one place to another corresponded to sending a message from one place to the other.

  2. Now, it's different.

I did say a _bit_ of history.

Why Did We Complicate Drag'n'Drop?

The old method of drag'n'drop was simple. I might, for example, have some text selected in a word-processing application. Assuming the application supported drags, I could, onscreen, drag the text selection "onto" a Tracker folder. Internally, this would result in a BMessage object being sent from the application to Tracker; among other things, the message would contain a string corresponding to the dragged text. Tracker, upon receiving that message, could inspect it, say, "Aha! Here's some text data", and make a clipping file out of it.

Now, what happens if the user drags that text onto the trashcan, rather than onto the desktop or a Tracker folder? Intuitively, it would make sense that this should do the same thing as hitting the "delete" key--it should delete the selected text from within the application. However, this requires an action on the part of the word-processing application; Tracker can't "reach in" and yank out that text. Unfortunately, since (under the old-style drag'n'drop) only one message was sent, from the application to Tracker, there's no way Tracker can request this action of the word processor. Too bad.

A different facet of this communications problem can be seen by thinking about a slightly different scenario; dragging selected text from a Web browser to a word processor. There are two obvious formats the browser could provide this text in; it could provide the text as "plain" text, losing all of the formatting imparted by the HTML markup tags, or it could provide it as "HTML text", keeping the HTML markup tags. Unfortunately, the browser has no way to decide between the two without knowing more about the receiving application; if the receiver "understands" HTML, the browser should probably provide all of the text, including the markup, but if the receiver doesn't understand HTML, then it would make more sense for the browser to strip out the HTML tags before bundling up the text and sending it off to the word processor. Figuring this out requires some sort of negotiation between the browser and word processor, or more generally, between the sending application and receiving application.

The two above shortcomings of "old-style" drag'n'drop--the inability of the receiver to request actions of the sender, and the inability to negotiate an "optimal" data format between the sender and receiver--led to the more recent drag'n'drop protocol, negotiated drag'n'drop.

A Negotiated Settlement; There will be Peace in Our Time

Negotiated drag'n'drop causes (up to) three BMessages to be sent, for every drag'n'drop the user performs. Here's what the dataflow looks like:

<http://www-classic.be.com/aboutbe/benewsletter/resources/dragdrop.jpg>

The whole purpose of negotiated drag'n'drop is (surprise!) negotiation, and that negotiation is carried out via a predetermined set of message fields in the messages that are flung back and forth--these fields, and what goes in them, are the negotiation protocol. I won't go into all of the gory details (that's what the reference material is for), but let's go back to our example of dragging text from a Web browser to a word processor to see some of the things that might happen.

When the user drags and drops the selected text from the web browser to the word processor, the first message (which is called the _drag message_ in the diagram) will be sent. This message will contain a bunch of information in various message fields; of interest to us right now are the "be:actions" and "be:types" message fields. "be:actions" contains a list of actions that the sending application (the web browser, in this case) is willing to perform on the dragged data. In this example, "be:actions" might contain the B_COPY_TARGET and B_TRASH_TARGET values, indicating that the web browser is willing to either provide a copy of the dragged data, or to delete it. "be:types" is a list of MIME types indicating the data formats that the browser is willing to provide the text in; obvious values for this list are "text/plain" for plaintext, and "text/html" for text including HTML markup, but what goes in here depends solely on what the sender application is willing and able to provide. Note that we don't send any "real" data in the drag message, i.e. the text itself isn't sent yet.

Now, the word processor receives the drag message, and according to the negotiation protocol, it knows it can make two choices from the options provided in the drag message; it can choose the action it will request the browser to perform, and it can request the data format it wants data in. If our word processor were really a trashcan, it might request the B_TRASH_TARGET action, but it isn't, so it will request the B_COPY_TARGET action, stating that it wants a copy of the data for its own use. As for the data format, well, that's up to it and what it can handle; if it can parse HTML, then choosing a format of "text/html" will let it preserve some or all of the formatting of the original text, otherwise a choice of "text/plain" will still get the text itself across, and is simple to use. Let's say it chooses "text/plain"; it then bundles up the chosen B_COPY_TARGET action and the chosen "text/plain" data format into a new BMessage, and shoots that message back to the browser as the negotiation message.

When the browser gets the negotiation message, it figures out what the other application wants, and (in this case) puts a plaintext copy of the selected text into a new BMessage, and sends that message--the data message--back to the word processor. End of process.
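
The receiver's half of this exchange boils down to picking one action and one type from what the sender offered. Here is a toy model of that decision in plain C. It deliberately avoids the real BMessage API: drag_offer stands in for the "be:actions" and "be:types" fields of the drag message, drag_reply for the negotiation message, and ACTION_COPY and ACTION_TRASH are placeholders for B_COPY_TARGET and B_TRASH_TARGET. All of these names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholders for B_COPY_TARGET and B_TRASH_TARGET. */
typedef enum { ACTION_COPY, ACTION_TRASH } drag_action;

/* What the sender advertises in the drag message. */
typedef struct {
    const drag_action *actions;      /* models "be:actions" */
    size_t             action_count;
    const char *const *types;        /* models "be:types" (MIME strings) */
    size_t             type_count;
} drag_offer;

/* What the receiver sends back in the negotiation message. */
typedef struct {
    drag_action action;
    const char *type;
} drag_reply;

bool offers_action(const drag_offer *offer, drag_action wanted)
{
    for (size_t i = 0; i < offer->action_count; i++) {
        if (offer->actions[i] == wanted)
            return true;
    }
    return false;
}

/* Walk the receiver's types in order of preference and return the first
 * one the sender also offers, or NULL if there is no common type. */
const char *choose_type(const drag_offer *offer,
                        const char *const *preferred, size_t preferred_count)
{
    for (size_t p = 0; p < preferred_count; p++) {
        for (size_t t = 0; t < offer->type_count; t++) {
            if (strcmp(preferred[p], offer->types[t]) == 0)
                return offer->types[t];
        }
    }
    return NULL;
}

/* Fill in the reply if both an action and a type can be agreed on. */
bool negotiate(const drag_offer *offer, drag_action wanted,
               const char *const *preferred, size_t preferred_count,
               drag_reply *reply)
{
    assert(offer != NULL && reply != NULL);
    if (!offers_action(offer, wanted))
        return false;
    const char *type = choose_type(offer, preferred, preferred_count);
    if (type == NULL)
        return false;
    reply->action = wanted;
    reply->type = type;
    return true;
}
```

A word processor that parses HTML would list "text/html" ahead of "text/plain" and get the markup; one that doesn't would list only "text/plain" and still get the text.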

Even More...

Actually, you can do even more than this with negotiated drag'n'drop. You can pass data through a file, rather than through a BMessage, convenient for passing around those 100 megabyte video clips that don't fit too well in memory. You can exercise finer control over which data formats are used for the interchange of data. And, by the way, the Translation Kit works just great when used in conjunction with drag'n'drop.

But all that deserves its own explanation, in the reference material. (Which is to say, I want to wrap this article up). Once you know the basics of the drag'n'drop negotiation, and the reasons for that negotiation (as you do now, right?), all of the details fall into place easily.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.