Once upon a time I wrote an article about using USB from user space. Those tools are handy for prototyping and many simple projects, but often you need a real USB driver (perhaps to fit into an existing driver model like mice, keyboards, or audio devices). We'll take a look at the gritty details and some important rules for USB drivers. A sample driver that can dump raw audio data to USB speakers is provided to illustrate the points below.
This article refers to the V2 version of the bus manager that will ship in BeOS 5. The sample code includes header files for this new interface. Please note that this code will not operate correctly under R4.x (and will not even build if you use the system headers).
You can get the source code and follow along if you like: <ftp://ftp.be.com/pub/samples/drivers/usb_speaker.zip>.
The USB 1.1 Specification and specific device class specifications are available online in PDF format: http://www.usb.org/developers/docs.html.
The Universal Serial Bus is far more dynamic than some of the other busses we support under BeOS. Users can and will hot plug and unplug devices. The dull old "load driver," "look for devices," "publish devices," techniques will not work very well here. We're using USB as a test bed for the next revision of the BeOS driver model, which will treat all busses as dynamic entities and be better prepared for other interesting device events, like power management.
When a USB driver loads, it's expected to register itself with the bus manager using usb->register_driver(). The name of the driver must be provided (so the bus manager can reload it on demand), as well as a list of supported devices. This takes the form of an array of usb_support_descriptor structures. Each entry represents a pattern that may have a value (or the wild card 0) for the class, subclass, protocol, vendor, and device fields. The sample driver provided supports USB audio devices (all devices that have a class value of 1).
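To make the wild card rule concrete, here is a minimal sketch of how a pattern with zeroed fields could be compared against a device. The support_pattern structure and both function names are hypothetical stand-ins invented for this illustration; the real usb_support_descriptor and the bus manager's matching code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a support descriptor. The real structure
   lives in the bus manager headers and may use different field names. */
typedef struct {
    uint8_t  dev_class;
    uint8_t  dev_subclass;
    uint8_t  dev_protocol;
    uint16_t vendor;
    uint16_t product;
} support_pattern;

/* A pattern field of 0 is a wild card; any other value must match exactly. */
bool field_matches(uint16_t pattern, uint16_t actual)
{
    return pattern == 0 || pattern == actual;
}

/* Returns true if the candidate (a device or one of its interfaces)
   fits every field of the pattern. */
bool pattern_matches(const support_pattern *p, const support_pattern *c)
{
    assert(p != NULL && c != NULL);
    return field_matches(p->dev_class, c->dev_class)
        && field_matches(p->dev_subclass, c->dev_subclass)
        && field_matches(p->dev_protocol, c->dev_protocol)
        && field_matches(p->vendor, c->vendor)
        && field_matches(p->product, c->product);
}
```

With this sketch, a pattern of {1, 0, 0, 0, 0} accepts any class 1 (audio) device regardless of vendor, which is the kind of pattern the sample driver is described as using.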
Once registered, the driver must request notification of devices that it may be interested in. A set of notify_hooks (currently device_added and device_removed) are installed using usb->install_notify(). These hooks must be uninstalled before the driver unloads using usb->uninstall_notify().
Once the hooks are installed, device_added() will be called every time a device that matches (in its configuration_descriptor or an interface descriptor) the pattern in one of the supplied support descriptors appears. If there are already such devices plugged in, these calls will occur before install_notify() returns.
The device_added() hook receives a usb_device pointer (the opaque handle that represents a particular USB device on the bus). The driver may further investigate the device (for example, our sample only supports audio devices that support 16-bit stereo streams) and indicate whether it wants to keep it.
If B_OK is returned, the driver may use the usb_device handle and any associated resources until device_removed() is called. Once device_removed() returns, the driver MAY NOT use these resources again.
The cookie parameter of device_added() provides a value that will be passed to device_removed(). If device_added() returns B_ERROR, the driver receives no further notification and may not use the usb_device.
When usb->uninstall_notify() is called, any devices that still exist are "removed" (device_removed() is called).
The sample driver maintains a list of devices it knows about (by way of device_added() calls). The list is protected by a lock and is also used to generate the list of device names when publish_devices() is called. Each device entry contains information needed for transfers, a count of how many instances of it are open, and a unique number allocated on creation. When a device is removed, its entry is removed from the list, but it is only freed from memory once the open count is zero.
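The deferred-free rule is easy to get wrong, so here is a rough sketch of one way to do that bookkeeping. Every name in it (dev_entry, dev_list, list_add, and so on) is made up for illustration and is not taken from the sample driver; locking is left out for brevity, and a real driver would hold the list lock around each of these calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device entry; a real one would also hold transfer state. */
typedef struct dev_entry {
    struct dev_entry *next;
    int  number;      /* unique number allocated on creation */
    int  open_count;  /* how many open instances exist */
    bool removed;     /* set once the device has been unplugged */
} dev_entry;

typedef struct {
    dev_entry *head;
    int next_number;
    int live_entries; /* entries still allocated, listed or not */
} dev_list;

/* Called from the device_added path: create and link a new entry. */
dev_entry *list_add(dev_list *l)
{
    dev_entry *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->number = l->next_number++;
    e->next = l->head;
    l->head = e;
    l->live_entries++;
    return e;
}

/* Called from the device_removed path: unlink the entry right away,
   but free it only if nobody has it open. */
void list_remove(dev_list *l, dev_entry *e)
{
    dev_entry **pp = &l->head;
    while (*pp != NULL && *pp != e)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return;                 /* not in the list */
    *pp = e->next;
    e->next = NULL;
    e->removed = true;
    if (e->open_count == 0) {
        free(e);
        l->live_entries--;
    }
}

/* Opening a removed device fails; otherwise bump the open count. */
bool entry_open(dev_entry *e)
{
    if (e->removed)
        return false;
    e->open_count++;
    return true;
}

/* The last close of an already removed device frees the entry. */
void entry_close(dev_list *l, dev_entry *e)
{
    assert(e->open_count > 0);
    if (--e->open_count == 0 && e->removed) {
        free(e);
        l->live_entries--;
    }
}
```

The design choice here is that unplugging makes the entry invisible immediately, while code that still holds an open handle keeps a valid pointer until its final close.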
As mentioned above, each device is represented by a usb_device pointer (an opaque object). The properties of a device, its configurations, interfaces, and endpoints are described by various descriptors defined by the USB specification (Chapter 9, in particular, is very useful). The header USB_spec.h contains constants and structures for the various standard descriptors.
Every device has at least one, and possibly several, configurations. usb->get_nth_configuration() may be used to obtain a configuration_info object that describes a particular configuration. usb->set_configuration() is used to select a configuration and inform the device of that choice. A configuration value of NULL means "unconfigure." usb->get_configuration() returns the current configuration (or NULL if the device is not configured).
The configuration_info structure contains the configuration descriptor, as well as an array of interface_list structures. Each interface_list represents one interface, which may have several alternates. An alternate interface often provides a slightly different version of the same resource (for example, an audio device may provide 16-bit stereo, 8-bit mono, and several other formats).
The interface_list also indicates the currently active interface (the default is always alt[0]). This value may only be set by usb->set_alt_interface() and must never be changed by hand. IMPORTANT: set_alt_interface() MUST happen before set_configuration(); this is a limitation in the current USB stack.
Each interface has a descriptor, a number of endpoints (which may be zero), and a list of "generic" descriptors—descriptors specific to the interface (audio interfaces provide a descriptor that describes the audio data format, for example).
Each endpoint structure includes the endpoint descriptor (which indicates the type, direction, max bandwidth, as per the USB spec) as well as a pointer to a usb_pipe. Initially, every endpoint's usb_pipe pointer is NULL, but when a device is configured (with usb->set_configuration()), the pipes of the active interfaces are valid handles used to send or receive data.
Here's an example of a typical audio device (usually the audio streaming interface has several alternates with various other supported formats):
usb_device
  configuration[0]
    interface[0] (class 1, subclass 1: Audio Control)
      alternate[0]
    interface[1] (class 1, subclass 2: Audio Streaming)
      alternate[0] (default, uses no bandwidth)
      alternate[1]
        endpoint[0] (isochronous out, 56 bytes per packet max)
        generic[0..2] (audio descriptors, format 8-bit mono)
      alternate[2]
        endpoint[0] (isochronous out, 224 bytes per packet max)
        generic[0..2] (audio descriptors, format 16-bit stereo)
    interface[2] (class 3: Human Interface (volume buttons))
      alternate[0]
        endpoint[0] (interrupt in, 1 byte per packet max)
        generic[0] (HID descriptor)
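A driver that only wants 16-bit stereo has to walk a layout like this one and pick the right alternate. The sketch below does that over a simplified mock of the layout. The alt_summary and iface_summary structures and find_stream_alt() are invented for this example; they stand in for whatever a driver extracts from the real configuration_info, interface_list, and audio format descriptors, and do not mirror those structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one alternate setting. A real driver would fill
   this in by parsing the alternate's endpoint and audio descriptors. */
typedef struct {
    int    channels;    /* 0 when the alternate carries no audio */
    int    bits;        /* bits per sample */
    size_t max_packet;  /* endpoint max packet size in bytes */
} alt_summary;

/* Hypothetical summary of one interface and its alternates. */
typedef struct {
    uint8_t            if_class;
    uint8_t            if_subclass;
    const alt_summary *alt;
    size_t             alt_count;
} iface_summary;

/* Class and subclass values as given in the example layout. */
enum { CLASS_AUDIO = 1, SUBCLASS_AUDIO_STREAMING = 2 };

/* Scans the audio streaming interfaces for an alternate with the wanted
   format. On success stores the interface and alternate indices. */
bool find_stream_alt(const iface_summary *ifaces, size_t iface_count,
                     int channels, int bits,
                     size_t *out_iface, size_t *out_alt)
{
    assert(ifaces != NULL && out_iface != NULL && out_alt != NULL);
    for (size_t i = 0; i < iface_count; i++) {
        if (ifaces[i].if_class != CLASS_AUDIO
            || ifaces[i].if_subclass != SUBCLASS_AUDIO_STREAMING)
            continue;   /* skip control and HID interfaces */
        for (size_t a = 0; a < ifaces[i].alt_count; a++) {
            const alt_summary *s = &ifaces[i].alt[a];
            if (s->channels == channels && s->bits == bits) {
                *out_iface = i;
                *out_alt = a;
                return true;
            }
        }
    }
    return false;
}
```

For the layout above, asking for 2 channels at 16 bits yields interface 1, alternate 2. That alternate would then be selected with set_alt_interface() before set_configuration() is called, in keeping with the ordering rule mentioned earlier.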
One very important thing to remember with USB drivers is that you can't use stack-allocated structures with the queue_*() IO functions. They are completed in a service thread that doesn't have access to the stack that the operations were initiated from. If you use stack-allocated structures, your driver will crash. Also, the current stack requires that data buffers for bulk, isochronous, and interrupt transactions be contiguous in physical memory. This limitation may be lifted in a future version, but for now, be careful.
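The usual way around the stack restriction is to put per-transfer state on the heap and let the completion callback release it. Here is a small sketch of that pattern against a mock one-slot queue. mock_queue, mock_queue_transfer(), and mock_complete() are invented stand-ins and not the bus manager's queue_*() functions; the callback signature is likewise an assumption, and the physical-contiguity requirement for data buffers is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed callback shape for this mock only. */
typedef void (*done_hook)(void *cookie, int status, size_t actual_len);

/* Mock one-slot transfer queue standing in for the bus manager. */
typedef struct {
    done_hook hook;
    void     *cookie;
    size_t    len;
    int       pending;
} mock_queue;

/* Records the request and returns at once; nothing has completed yet. */
int mock_queue_transfer(mock_queue *q, size_t len, done_hook hook, void *cookie)
{
    assert(q != NULL && hook != NULL);
    if (q->pending)
        return -1;      /* the single slot is busy */
    q->hook = hook;
    q->cookie = cookie;
    q->len = len;
    q->pending = 1;
    return 0;
}

/* Plays the part of the service thread: it runs long after the function
   that queued the transfer has returned and its stack frame is gone. */
void mock_complete(mock_queue *q)
{
    if (!q->pending)
        return;
    q->pending = 0;
    q->hook(q->cookie, 0, q->len);
}

/* Per-transfer context. It is heap allocated so it outlives the caller. */
typedef struct {
    size_t *total;  /* must point at long-lived storage, e.g. the device entry */
} xfer_ctx;

/* Completion callback: account for the data, then free the context. */
void on_done(void *cookie, int status, size_t actual_len)
{
    xfer_ctx *ctx = cookie;
    if (status == 0)
        *ctx->total += actual_len;
    free(ctx);
}

/* Queues a transfer; the context is released by on_done(), not here. */
int start_transfer(mock_queue *q, size_t len, size_t *total)
{
    xfer_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return -1;
    ctx->total = total;
    if (mock_queue_transfer(q, len, on_done, ctx) != 0) {
        free(ctx);      /* never queued, so the callback will not run */
        return -1;
    }
    return 0;
}
```

Had xfer_ctx been a local variable in start_transfer(), the cookie handed to the callback would point into a stack frame that no longer exists by the time mock_complete() runs.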
Every device has a pipe to endpoint 0, the default control pipe. The usb->send_request() call is used to send control requests (which involve an 8-byte command sent from the host, optionally followed by some number of bytes sent to or received from the device). Control requests are used to configure devices, clear error conditions, and do other housekeeping. Many of the convenience functions provided by the bus manager (set/clear feature, get_descriptor, and so on) result in control requests.
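For reference, the 8-byte command has a fixed layout defined in Chapter 9 of the USB specification: bmRequestType, bRequest, then wValue, wIndex, and wLength as 16-bit little-endian values. The helper below packs those fields into a byte array. pack_setup() is a hypothetical name used only for this illustration; how send_request() actually takes these fields as arguments is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs the five setup fields into the 8-byte wire layout.
   Multi-byte fields go out low byte first. */
void pack_setup(uint8_t out[8], uint8_t request_type, uint8_t request,
                uint16_t value, uint16_t index, uint16_t length)
{
    assert(out != NULL);
    out[0] = request_type;                  /* bmRequestType */
    out[1] = request;                       /* bRequest */
    out[2] = (uint8_t)(value & 0xff);       /* wValue, low byte */
    out[3] = (uint8_t)(value >> 8);         /* wValue, high byte */
    out[4] = (uint8_t)(index & 0xff);       /* wIndex, low byte */
    out[5] = (uint8_t)(index >> 8);         /* wIndex, high byte */
    out[6] = (uint8_t)(length & 0xff);      /* wLength, low byte */
    out[7] = (uint8_t)(length >> 8);        /* wLength, high byte */
}
```

As a usage example, the standard request that reads the 18-byte device descriptor is pack_setup(buf, 0x80, 6, 0x0100, 0, 18): a device-to-host standard request, GET_DESCRIPTOR, descriptor type 1 (device) with index 0.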
For any other type of transaction you'll need a usb_pipe object, which will be allocated to each endpoint in each active interface when set_configuration() succeeds as described above.
The usb->queue_interrupt() and usb->queue_bulk() calls may be used to enqueue requests to interrupt or bulk endpoints, respectively. A callback must be provided to allow the stack to inform your driver when the requests complete. The direction of data transfer is dictated by the direction of the endpoint (as detailed in its descriptor). The callback is called on completion of the transaction (with success or failure) or if the transfer is cancelled with usb->cancel_queued_transfers().
Isochronous transfers are more complicated, and require that the driver inform the stack ahead of time so that adequate resources may be dedicated. Since isochronous transfers provide guaranteed bandwidth, the stack needs to pre-allocate various transfer descriptors specific to the host controller to ensure that everything is handled in a timely fashion.
usb->set_pipe_policy() is employed to configure an isochronous pipe in this fashion. It uses the provided buffer count to determine how many requests may be outstanding at once, the duration in milliseconds to determine the number of frames of data that will be provided, and the sample size to ensure that frames are always a correct multiple of the sample size.
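The sample-size requirement matters because audio rates rarely divide evenly into 1 ms USB frames. At 44.1 kHz, 16-bit stereo, each frame should carry 44.1 sample frames of 4 bytes, which is not a whole number. One common way to handle this, sketched below, is to carry the remainder forward so that most frames get 44 samples and every tenth gets 45. This is only an illustration of the arithmetic; plan_frames() is a hypothetical helper, "sample size" is assumed here to mean bytes per sample frame across all channels, and the source does not say the stack or the sample driver does it this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills sizes[] with the byte count for each 1 ms frame so that every
   frame holds a whole number of samples. Returns the total byte count. */
size_t plan_frames(uint32_t rate_hz, size_t sample_size,
                   size_t *sizes, size_t frame_count)
{
    uint32_t remainder = 0;  /* leftover samples, in thousandths */
    size_t total = 0;

    assert(sizes != NULL && sample_size > 0);
    for (size_t i = 0; i < frame_count; i++) {
        remainder += rate_hz;                 /* samples * 1000 owed so far */
        uint32_t samples = remainder / 1000;  /* whole samples this frame */
        remainder %= 1000;                    /* fraction carried forward */
        sizes[i] = samples * sample_size;
        total += sizes[i];
    }
    return total;
}
```

For 44100 Hz with a 4-byte sample frame, ten frames come out as nine packets of 176 bytes and one of 180, for 1764 bytes in total. Both sizes fit within the 224-byte maximum packet size of the 16-bit stereo alternate in the example layout.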
usb->queue_isochronous() starts an isochronous transfer on the next frame. While bulk and interrupt transfers provide a simple actual_length parameter to their callback, which indicates how much data has been sent or received (in the event of a short transfer), isochronous pipes use run-length encoding records to describe which data actually made it intact (since it's possible that only bits and pieces of an inbound isochronous stream arrived intact). The USB_rle.h header has extensive comments explaining these structures.
In the current stack, it takes 10ms for the first isochronous transfer on a pipe to start. Provided you keep queuing up additional transfers, there will be one packet per frame.
Isochronous transactions get dedicated bandwidth and happen every frame. Delivery is not guaranteed, but timing is.
Interrupt transactions are scheduled to happen on a polling interval of N milliseconds (no less often than once every N frames). They give the device (for example, a mouse) an opportunity to send some data (up to the specified max packet size) or a NAK, indicating no data is ready.
Control requests actually consist of two or three transactions—a setup phase to send the 8-byte control message, an optional data phase for the device to send or receive data, and an acknowledge phase where the data (or lack thereof) is acknowledged or an error is signaled via a STALL condition.
Bulk transactions guarantee delivery but not timing—they use as much bandwidth as is available (and not allocated to interrupt or isochronous transfers) to send or receive data.
Every endpoint specifies a max packet size. The USB stack will never violate this size and will break up longer queued requests into as many individual packets as required to complete the request. The exception is isochronous transactions, which must declare buffering information ahead of time with set_pipe_policy().
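The splitting itself is plain ceiling division. A tiny sketch of the arithmetic for bulk or interrupt requests follows, assuming every packet except possibly the last is sent at the full max packet size. Both function names are hypothetical, and zero-length requests are simply treated as needing no packets in this model.

```c
#include <assert.h>
#include <stddef.h>

/* Number of packets needed to move `length` bytes. */
size_t packets_needed(size_t length, size_t max_packet)
{
    assert(max_packet > 0);
    return (length + max_packet - 1) / max_packet;
}

/* Size in bytes of packet number n (counting from 0). */
size_t packet_size_at(size_t length, size_t max_packet, size_t n)
{
    assert(n < packets_needed(length, max_packet));
    size_t offset = n * max_packet;     /* bytes carried by earlier packets */
    size_t left = length - offset;      /* bytes still to send */
    return left < max_packet ? left : max_packet;
}
```

For example, a 150-byte request on an endpoint with a 64-byte max packet size becomes three packets of 64, 64, and 22 bytes.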
A full USB Audio driver is a fairly complex beast. The sample provided works with USB Audio Spec-compliant devices, but only if they provide a 16-bit stereo interface. The sample only handles the 44.1KHz sample rate; you can use it by copying a raw 44.1KHz, 16-bit stereo audio file (perhaps a CD track saved to disk) to /dev/misc/usb_speaker/0, provided you have a compatible USB audio device plugged in.
This week's article is all about making use of Drag'n'Drop with BeOS, an apparently simple task that has some unseen (but useful) complications. This is going to be a "high-level" article. By some strange coincidence, I've just finished writing the detailed documentation for drag'n'drop, which should be on the Be Web Site soon, so you can look there for code examples, and all the picky details.
First, a bit of history, which will be particularly important to those of you who have been programming BeOS for a long time, and think that the way you did drag'n'drop way back when is the way you should do it now. Here's the history:
In earlier versions of the BeOS, dragging an object from one place to another corresponded to sending a message from one place to the other.
Now, it's different.
I did say a _bit_ of history.
The old method of drag'n'drop was simple. I might, for example, have some text selected in a word-processing application. Assuming the application supported drags, I could, onscreen, drag the text selection "onto" a Tracker folder. Internally, this would result in a BMessage object being sent from the application to Tracker; among other things, the message would contain a string corresponding to the dragged text. Tracker, upon receiving that message, could inspect it, say, "Aha! Here's some text data", and make a clipping file out of it.
Now, what happens if the user drags that text onto the trashcan, rather than onto the desktop or a Tracker folder? Intuitively, it would make sense that this should do the same thing as hitting the "delete" key--it should delete the selected text from within the application. However, this requires an action on the part of the word-processing application; Tracker can't "reach in" and yank out that text. Unfortunately, since (under the old-style drag'n'drop) only one message was sent, from the application to Tracker, there's no way Tracker can request this action of the word processor. Too bad.
A different facet of this communications problem can be seen by thinking about a slightly different scenario; dragging selected text from a Web browser to a word processor. There are two obvious formats the browser could provide this text in; it could provide the text as "plain" text, losing all of the formatting imparted by the HTML markup tags, or it could provide it as "HTML text", keeping the HTML markup tags. Unfortunately, the browser has no way to decide between the two without knowing more about the receiving application; if the receiver "understands" HTML, the browser should probably provide all of the text, including the markup, but if the receiver doesn't understand HTML, then it would make more sense for the browser to strip out the HTML tags before bundling up the text and sending it off to the word processor. Figuring this out requires some sort of negotiation between the browser and word processor, or more generally, between the sending application and receiving application.
The two above shortcomings of "old-style" drag'n'drop--the inability of the receiver to request actions of the sender, and the inability to negotiate an "optimal" data format between the sender and receiver--led to the more recent drag'n'drop protocol, negotiated drag'n'drop.
Negotiated drag'n'drop causes (up to) three BMessages to be sent for every drag'n'drop the user performs. Here's what the dataflow looks like:
<http://www-classic.be.com/aboutbe/benewsletter/resources/dragdrop.jpg>
The whole purpose of negotiated drag'n'drop is (surprise!) negotiation, and the communication to accomplish that negotiation is accomplished via a predetermined set of message fields in the messages that are flung back and forth--these fields, and what goes in them, are the negotiation protocol. I won't go into all of the gory details (that's what the reference material is for), but let's go back to our example of dragging text from a Web browser to a word processor to see some of the things that might happen.
When the user drags and drops the selected text from the web browser to the word processor, the first message (which is called the _drag message_ in the diagram) will be sent. This message will contain a bunch of information in various message fields; of interest to us right now are the "be:actions" and "be:types" message fields. "be:actions" contains a list of actions that the sending application (the web browser, in this case) is willing to perform on the dragged data. In this example, "be:actions" might contain the B_COPY_TARGET and B_TRASH_TARGET values, indicating that the web browser is willing to either provide a copy of the dragged data, or to delete it. "be:types" is a list of MIME types indicating the data formats that the browser is willing to provide the text in; obvious values for this list are "text/plain" for plaintext, and "text/html" for text including HTML markup, but what goes in here depends solely on what the sender application is willing and able to provide. Note that we don't send any "real" data in the drag message, i.e. the text itself isn't sent yet.
Now, the word processor receives the drag message, and according to the negotiation protocol, it knows it can make two choices from the options provided in the drag message; it can choose the action it will request the browser to perform, and it can request the data format it wants data in. If our word processor were really a trashcan, it might request the B_TRASH_TARGET action, but it isn't, so it will request the B_COPY_TARGET action, stating that it wants a copy of the data for its own use. As for the data format, well, that's up to it and what it can handle; if it can parse HTML, then choosing a format of "text/html" will let it preserve some or all of the formatting of the original text, otherwise a choice of "text/plain" will still get the text itself across, and is simple to use. Let's say it chooses "text/plain"; it then bundles up the chosen B_COPY_TARGET action and the chosen "text/plain" data format into a new BMessage, and shoots that message back to the browser as the negotiation message.
When the browser gets the negotiation message, it figures out what the other application wants, and (in this case) puts a plaintext copy of the selected text into a new BMessage, and sends that message--the data message--back to the word processor. End of process.
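The receiver's half of the negotiation boils down to intersecting what the sender offers with what the receiver can use. Here is a language-neutral sketch of that decision in plain C; it uses no BMessage code at all. choose_type(), choose_action(), and the MOCK_ACTION_* flags are invented for this illustration and are not the real constants or API; a real receiver would read the offers out of the "be:types" and "be:actions" fields of the drag message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock action flags for this sketch only; not the real B_*_TARGET values. */
enum { MOCK_ACTION_COPY = 1 << 0, MOCK_ACTION_TRASH = 1 << 1 };

/* Returns the first entry of wanted[] (the receiver's formats, in order of
   preference) that the sender also offers, or NULL if they share none. */
const char *choose_type(const char *const *offered, size_t n_offered,
                        const char *const *wanted, size_t n_wanted)
{
    assert(offered != NULL && wanted != NULL);
    for (size_t w = 0; w < n_wanted; w++)
        for (size_t o = 0; o < n_offered; o++)
            if (strcmp(wanted[w], offered[o]) == 0)
                return wanted[w];
    return NULL;
}

/* Returns the wanted action if the sender offers it, otherwise 0. */
unsigned choose_action(unsigned offered, unsigned wanted)
{
    return (offered & wanted) == wanted ? wanted : 0;
}
```

In the browser example, a word processor that parses HTML would list "text/html" ahead of "text/plain" and get the marked-up text, while a plain editor listing only "text/plain" would get the stripped version. The chosen type and action are what go back to the sender in the negotiation message.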
Actually, you can do even more than this with negotiated drag'n'drop. You can pass data through a file, rather than through a BMessage, convenient for passing around those 100 megabyte video clips that don't fit too well in memory. You can exercise finer control over which data formats are used for the interchange of data. And, by the way, the Translation Kit works just great when used in conjunction with drag'n'drop.
But all that deserves its own explanation, in the reference material. (Which is to say, I want to wrap this article up). Once you know the basics of the drag'n'drop negotiation, and the reasons for that negotiation (as you do now, right?), all of the details fall into place easily.