Welcome to Kernel Debugging Land...
In and Out
Unless the system entered KDL by itself, you can normally do that by invoking the keyboard shortcut Alt-SysReq-D. Note that in KDL your keyboard may not work. PS/2 keyboards always do, USB keyboards connected via UHCI controllers do only, if one has entered KDL via the keyboard shortcut at least once. USB OHCI is not supported at the moment.
Once the kernel debugger is running, the normal activities of the system are
completely halted. Until it is exited, no thread will make any progress on any
CPU, and hardware interrupts are disabled. So while in KDL (Kernel Debugging
Land) one can leisurely examine an unmoving snapshot of the whole system.
After using one of the commands to leave the kernel debugger
(continue
, exit
, es
), the system
continues its normal operation. If it had encountered
a serious problem before (like a page fault in the kernel), the odds are that
it will immediately drop back into KDL, though. In this case the commands
reboot
or shutdown
are the only way out. They
cannot shutdown the system orderly though, meaning that data not written back
to disk yet will be lost.
The Basic Infrastructure
Command Line Editing and Output
The kernel debugger has a simple command line interface. One can enter a command, and, after it has been executed, one is prompted for the next input. Basic command line editing is supported, that is HOME, END, the Cursor Left/Right keys, as well as DELETE and BACKSPACE work as expected. Ctrl-K clears the line beginning at the cursor position. Even a short command history is available. Cursor Up/Down navigate through the previously entered commands. Page Up/Down move to the previous/next line in the history that starts with the characters before the cursor.
Pressing TAB after typing a part of a command name will try to complete the
name of the command, if possible. Ambiguities will be pointed out. Pressing TAB
when the cursor is positioned after the command name and the space following it
will invoke the command with the argument --help
, which will, in
most cases, print usage information for the command. Editing of the command
line can then be continued.
The way the text that has been printed "scrolls", when the bottom of the
screen is reached, needs some getting used to, since it doesn't scroll at all,
but the cursor simply wraps around to the top of the screen, overwriting the
oldest output. By default the output of a debugger command stops just before
it overwrites what it has printed before, waiting for a key to be pressed
before continuing. Pressing Q will cause the remaining output of that command
to be discarded silently. The paging
command toggles whether the
output stops in such cases or continues without asking the user.
Help
When looking for a command with a certain functionality without knowing its
name, the help
command can actually help. It lists all available
commands, each with a short description. Giving help
the first
characters of a command name, the list will be restricted to the matching
commands (e.g. "help a
" lists all commands starting with "a").
A growing number of commands support being invoked with the argument
--help
. Those that do print a more detailed description how
they can be used. As mentioned before, pressing TAB while editing the command
line for such a command will print that info, too.
A Simple Calculator
Since kernel debugger sessions often involve lots of numbers flying around,
basic arithmetic expressions are supported. The expr
command
will evaluate them for you and print the result.
kdebug> expr (0x898afe36 + 17 * 4 - 0x8014d184) / 32 4960615 (0x4bb167)
Internally the numbers are represented as unsigned 64 bit integer values (and
expr
prints them that way), which means that negative numbers
are shown as huge positive numbers. Most operations involving negatives work
fine nonetheless, exceptions are division (/
) and modulo
(%
).
kdebug> expr 7 + 3 * -4 18446744073709551611 (0xfffffffffffffffb) kdebug> expr -(7 + 3 * -4) 5 (0x5) kdebug> expr -6 / -3 0 (0x0)
People familiar with the C language will know the unary *
operator.
It dereferences a given address pointer, i.e. looks up the value that is stored
at that address. The kernel debugger supports that operator, too. By default
it reads four bytes at the given address and interprets them as a number, but
the number of bytes to be read (1, 2, 4, or 8) can be specified explicitely
in curly braces. An error message will be printed, when trying to dereference
an address that does not exist.
kdebug> expr *0x80000000 1179403647 (0x464c457f) kdebug> expr *{2}0x80000000 17791 (0x457f) kdebug> expr *0x17 failed to dereference address 0x00000017, at position: 0, in expression: *0x17
While it is nice to be able to print the value of an arithmetic expression,
it is often not necessary to do that, since one usually just wants to pass
the value to a command, and the value of an expression can be provided
as an argument to a command by simply putting that expression in parentheses.
For example the db
command, which takes an address and a number
of bytes and prints that many bytes read from the memory starting at the given
address, can be invoked like this:
kdebug> db (0x80000000 + 100) (6 * 4) [0x80000064] ................ 09 00 00 00 09 00 00 00 04 00 00 00 01 00 00 00 [0x80000074] ........ 01 00 00 00 00 00 00 00
Variables
Since it's not possible to scroll the output on the screen, values that
might be of interest later will probably already be overwritten when they are
needed. Fortunately variables can be defined to hold values for later use.
They can be set by the assignment operator (=
), modified by the
operators +=
, -=
, *=
, /=
,
%=
, and unset via unset
. The vars
command prints all variables that are defined.
kdebug> a = 7 kdebug> b = 3 + a kdebug> a -= 2 kdebug> expr c = b - a 5 (0x5) kdebug> unset a kdebug> vars b: 10 (0xa) c: 5 (0x5) _thread: 2148328320 (0x800ce380) _threadID: 1 (0x1) _team: 2424574464 (0x90841200) _teamID: 1 (0x1) _cpu: 0 (0x0) _: 5 (0x5)
In the example several additional variables with a name starting with an
underscore are listed. Those are called temporary variables and in this case
(except _
) they have been set automatically when the kernel debugger
was entered. _thread
is the pointer to the current thread's
kernel internal management structure, _threadID
its ID,
_team
the current team's kernel structure, _teamID
its ID, and _cpu
the index of the CPU the kernel debugger is
running on.
Several commands also set temporary variables as a side effect, e.g.
sem
, which prints information about a semaphore, also sets
the variable _owner
(the ID of the team owning the semaphore)
and others. The temporary variable _
is by convention used as
the return value of a command (if it has one). In the example above
expr
set it to 5, which was the value of the expression it
evaluated.
The number of temporary variables is limited. When the limit has been
reached and another variable is defined, the least recently used variable
is evicted automatically. This holds only for temporary variables, though;
non temporary variables have a limit too, but they are fully controlled by the
user. When that limit is hit, no new variable can be defined until another one
has been unset
explicitly.
The Most Popular Commmands
sc
sc
("stack crawl"), alias bt
("back trace"), alias
where
, prints a stack trace of the thread with the given ID or the
address of the thread's kernel structure. With no argument given, the current
thread on the current CPU is selected.
kdebug> sc stack trace for thread 0x53 "event loop" kernel stack: 0x90746000 to 0x9074a000 user stack: 0x70145000 to 0x70185000 frame caller <image>:function + offset 90749c54 (+ 52) 8008187f <kernel>:invoke_debugger_command + 0x00cf 90749c88 (+ 64) 80082620 <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8 90749cc8 (+ 48) 80082012 <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de 90749cf8 (+ 228) 80083734 <kernel>:evaluate_debug_command + 0x0088 90749ddc (+ 64) 80080680 <kernel>:kernel_debugger_loop__Fv + 0x0184 90749e1c (+ 32) 800811b5 <kernel>:kernel_debugger + 0x00c9 90749e3c (+ 192) 800810e1 <kernel>:panic + 0x0029 90749efc (+ 48) 90771e19 </boot/beos/system/add-ons/kernel/bus_managers/ps2>:ps2_interrupt + 0x00cd 90749f2c (+ 64) 80027798 <kernel>:int_io_interrupt_handler + 0x00ac 90749f6c (+ 48) 8008f75a <kernel>:hardware_interrupt + 0x007a 90749f9c (+ 12) 800928a6 <kernel>:int_bottom_user + 0x005a (nearest) iframe at 0x90749fa8 (end = 0x9074a000) eax 0x13 ebx 0x2cdc78 ecx 0x180f7300 edx 0x29cd3 esi 0x6a edi 0x92e3f054 ebp 0x70184acc esp 0x90749fdc eip 0x2900f5 eflags 0x202 vector: 0x21, error code: 0x0 90749fa8 (+ 0) 002900f5 <_APP_>:_CopyToFront__C11HWInterfacePUcUlllll + 0x0279 70184acc (+ 176) 0028f653 <_APP_>:CopyBackToFront__11HWInterfaceRC5BRect + 0x028b 70184b7c (+ 48) 0028f3bd <_APP_>:Invalidate__11HWInterfaceRC5BRect + 0x0049 70184bac (+ 416) 00288c66 <_APP_>:CopyRegion__13DrawingEngineP7BRegionll + 0x0686 70184d4c (+ 208) 00247839 <_APP_>:MoveWindowBy__7DesktopP11WindowLayerffl + 0x02e5 70184e1c (+ 112) 002808ac <_APP_>:MouseMoved__11WindowLayerP8BMessageG6BPointPlb + 0x0244 70184e8c (+ 96) 002434ea <_APP_>:Filter__11MouseFilterP8BMessagePP11EventTargetPlT1 + 0x016a 70184eec (+ 144) 0024ef46 <_APP_>:_EventLoop__15EventDispatcher + 0x02ae 70184f7c (+ 48) 0024f657 <_APP_>:_event_looper__15EventDispatcherPv + 0x001f 70184fac (+ 48) 00783264 <libroot.so>:_get_next_team_info + 0x005c (nearest) 70184fdc (+ 0) 70184fec 762:event loop_53_stack@0x70145000 + 0x3ffec
The stack trace is so interesting, because it shows exactly in which function the thread is currently, from what function that function has been called, and so on. Also shown are the addresses of the so called stack frames, which are the base addresses of the respective functions' parameters and local variables.
Roughly in the middle the example stack trace above is interrupted by an iframe (interrupt frame). An iframe ends up on the stack when a userland thread performs a syscall (system call), or an exception (page fault, division by zero) or a hardware interrupt occurs. The bottommost iframe shown for a userland thread is always the one for the transition to the kernel. In the example we can see that it happened due to a hardware interrupt, which is unsurprising as it was caused by pressing the Alt-SysReq-D combo. An iframe captures the processor registers at the time, so together with a disassembly of the respective function it is possible to reconstruct, what exactly is happening there.
teams
teams
lists all existing teams, with their name, ID, and parent
team.
kdebug> teams team id parent name 0x90841200 1 0x00000000 kernel_team 0x90b0c200 128 0x909df600 sh 0x909c0200 68 0x90906800 input_server 0x908f4e00 41 0x90841200 registrar 0x909c0e00 78 0x90841200 Tracker 0x908f9600 47 0x90841200 debug_server 0x909d3600 79 0x90841200 Deskbar 0x90906200 48 0x90841200 net_server 0x90906800 49 0x90841200 app_server 0x909df600 81 0x90841200 Terminal 0x909e4000 82 0x90841200 media_server 0x90aaf200 114 0x909e4000 media_addon_server 0x909e4a00 84 0x90841200 midi_server 0x909f0200 85 0x90841200 print_server 0x90906e00 61 0x90841200 syslog_daemon
Usually this is the first command used, when looking for a problem in a certain program, since the listing yields its team ID.
threads
threads
lists all threads of a specified team, or all existing
threads, if no argument is given.
kdebug> threads 78 thread id state sem/cv cpu pri stack team name 0x90b2d000 138 waiting 3921 - 5 0x99be1000 78 TrashWatcher 0x90b14000 139 waiting 3950 - 5 0x99be5000 78 ClipboardRefsWatcher 0x90b28000 140 waiting 3961 - 5 0x99bd9000 78 AutoMounter 0x90915000 78 waiting 2768 - 10 0x906d1000 78 Tracker 0x90b1e000 144 waiting 3995 - 15 0x99bf5000 78 w>Desktop 0x90abc000 115 waiting 3220 - 15 0x99b26000 78 w>Tracker Status 0x90b43000 147 zzz 120 - 5 0x99d00000 78 TrackerTaskLoop
For each thread a compact one line info is printed. It consists of the thread's kernel structure pointer, its ID, current state (running, waiting, etc.), the ID of the semaphore or condition variable it is currently waiting on (if any), and other info.
sem
sem
prints information about a semaphore, which can be specified
by ID, address of its kernel structure, or by name (though semaphore names need
not be unique).
debug> sem 2768 SEM: 0x9930e700 id: 2768 (0xad0) name: 'AppLooperPort' owner: -1 count: -1 queue: 78 last acquired by: 78, count: 1 last released by: 78, count: 1
Of special interest, particularly when analyzing deadlock situations, are the "last acquired" and "last released" rows. "last acquired" shows the ID of the thread that last acquired the semaphore successfully. This number is negated, if the semaphore was released afterwards. In a case of deadlock looking what the last acquiring thread is doing, will usually give some clue about what's the problem. In a simple situation, it would wait for a semaphore that the first thread acquired last.
Userland deadlocks involving benaphore-style BLocker
s (the
default) can't be analyzed that easily, since the respective sempahores will
not have been acquired by the threads holding the locks. In most cases stack
traces of the team's threads will help understanding those deadlocks as well,
though.
Kernel Tracing
A quite powerful debugging feature of the Haiku kernel is kernel tracing. The kernel can reserve a specified amount of memory as tracing buffer and use it to record information when certain events occur. In the kernel debugger the recorded entries can be filtered and printed, thus providing an analysis tool for tracking down bugs and other problems. One of the great advantages of kernel tracing over usual debug output into a file or via a serial connection is that it comes with a significantly lower performance penalty, which in most cases is in fact even less than the error of measurement. Also very nice is, that the data are available right in the kernel debugger, so that one can analyze them in combination with the halted system.
Enabling It
Kernel tracing is a compile time feature, that is it has to be enabled when
building Haiku. This can be done in the header file
build/user_config_headers/tracing_config.h
(create the folder and copy build/config_headers/tracing_config.h
which contains the default settings) by changing the value of
the ENABLE_TRACING
macro to the value 1
. Furthermore
one needs to specify, what kinds of information shall be traced, which is done
by setting the respective macros to a value greater than 0:
BLOCK_CACHE_TRANSACTION_TRACING
- Enables tracing of block cache transactions. This is most likely only of interest for developers trying to track down block cache or file system bugs.
BMESSAGE_TRACING
- Enables tracing of BMessage sending. Whenever a BMessage is sent, the message destination (team, port, token), the delivery method (direct, remote), and the message's "what" field will be recorded.
KERNEL_HEAP_TRACING
- Enables tracing of memory allocations in the kernel. For every call to
malloc()
,free()
, orrealloc()
in the kernel the given parameters and the result will be recorded. PAGE_ALLOCATION_TRACING
- Enables tracing of page allocations in the kernel. It will be recorded how pages are allocated, freed, reserved, stolen, or cleaned by the page scrubber.
RUNTIME_LOADER_TRACING
- Enables tracing of runtime loader activities. It will be recorded what executables, libraries, and add-ons are loaded.
SIGNAL_TRACING
- Enables tracing of signals. It will be recorded when a signal is sent, handled, masked, or a signal handler is installed.
SYSCALL_TRACING
- Enables tracing of syscalls. It will be recorded when a userland thread
entered the kernel to execute a syscall (including the syscall's parameters)
and when it is about to exit the kernel again. Usually one also wants to define the macro
SYSCALL_TRACING_IGNORE_KTRACE_OUTPUT
. It skips tracing of the _kern_ktrace_output() syscall, which itself adds a tracing entry. TEAM_TRACING
- Enables tracing of team operations. It will be recorded when a team is
fork()
ed orexec()
ed and when a team waits for its children. USER_MALLOC_TRACING
- Enables tracing of memory allocations in userland applications. For every call to
malloc()
,free()
, orrealloc()
, etc. the given parameters and the result will be recorded. Note, that this has a considerable performance impact.
The MAX_TRACE_SIZE
macro defines the size of the tracing
buffer (in bytes). The default of 1 MB is a little tight, if a lot of
information is recorded. The tracing buffer is a ring buffer, i.e. when full
it will start overwriting the oldest entries.
The traced
Command
The traced
kernel debugger command prints the recorded tracing
information. Since usually there will be many recorded entries, there are
several parameters to specify which subset shall be printed. The entries in
the tracing buffer are indexed; the oldest one in the buffer has index 1.
The first parameter of traced
is the index of an entry and the
second parameter the number of entries to be printed.
kdebug> traced 120 10 120. [ 75] 62203012: syscall pre: _kern_read_port_etc(0x1025, 0x7003ff58, 0x7003ff24, 0x0, 0x0, 0x0) 121. [ 75] 62203037: syscall post: _kern_read_port_etc() -> 0x0 122. [ 75] 62203330: syscall pre: _kern_release_sem(0x857) 123. [ 75] 62203352: syscall post: _kern_release_sem() -> 0x0 124. [ 75] 62203545: syscall pre: _kern_ktrace_output(0x7003f9ac) 125. [ 75] 62203558: user: BMessage send remote: team: -1, port: 4136, token: 0, message: '_MMV' 126. [ 75] 62203567: syscall post: _kern_ktrace_output() -> 0x20 127. [ 75] 62203683: syscall pre: _kern_write_port_etc(0x1028, 0x706a7070, 0x1801c100, 0x121, 0x8, 0x1) 128. [ 75] 62203724: syscall post: _kern_write_port_etc() -> 0x0 129. [ 75] 62203871: syscall pre: _kern_port_buffer_size_etc(0x1025, 0x0, 0x0) printed 10 entries within range 120 to 129 (10 of 20655 total, 189725 ever)
The common information listed for each entry are the index, (in square brackets) the ID of the thread logging the entry, and the system time (in microseconds) when the entry was logged. The following data are specific to the type of the entry. A final line at the end of the listing tells us how many entries have been printed and from what index range (later examples will show that these information are not always that obvious), as well as how many entries are currently in the buffer and how many have been logged so far.
The second parameter to the command can be omitted; it defaults to 30. If given and negative, it changes the meaning of the first parameter. Instead of specifying the first entry to print, it will refer to the last one, and the negated value of the second parameter defines the number of entries to list.
kdebug> traced 120 -5 116. [ 117] 62202452: syscall pre: _kern_write_port_etc(0x1025, 0x1, 0x0, 0x0, 0x8, 0x0) 117. [ 117] 62202490: syscall post: _kern_write_port_etc() -> 0x0 118. [ 117] 62202637: syscall pre: _kern_ioctl(0x13, 0x2773, 0x70143f54, 0x58cca4) 119. [ 75] 62202872: syscall post: _kern_port_buffer_size_etc() -> 0x0 120. [ 75] 62203012: syscall pre: _kern_read_port_etc(0x1025, 0x7003ff58, 0x7003ff24, 0x0, 0x0, 0x0) printed 5 entries within range 116 to 120 (5 of 20655 total, 189725 ever)
A special value for the first parameter is 0 (or any negative value). It refers to the last entry in the buffer, and furthermore causes the second parameter to be interpreted as if negative. This makes it easy to print the last recorded entries.
Looking through tens or hundreds of thousands entries using
traced
with only those two parameters is quite impractical.
Hence there's a simple but powerful filter mechanism. A filter is passed to
traced
as a list of additional parameters preceded by the keyword
filter
. It can consist of several basic filters which can be
combined via boolean operators to a complex expression. Those operators are
and
, or
, and not
, and they are used
in prefix notation, meaning that the operator comes first and is followed by
its one respectively two arguments. This needs a little getting used to, but
has the advantages that it doesn't require parentheses (which are already
reserved for numeric kernel debugger expressions) and that it keeps parsing
very simple. The basic filters are:
thread <thread id>
- Matches those entries recorded by the thread with the specified ID.
team <team id>
- Matches those entries recorded by a thread of the team with the specified ID.
#<pattern>
- Matches those entries whose printed representation contains the
specified case-sensitive pattern (actually not a real pattern, just a string).
If the string contains spaces it must be put in quotes (including the
#
).
Very useful in combination with filters is the optional third numeric
parameter of traced
. It specifies how many entries shall at most
be searched for matches (as opposed to the second parameter which limits the
number of entries to be printed). If omitted the same value as given for the
second parameter is used. The special value -1 means that all entries in the
buffer shall be searched, which is usually what one wants, but can potentially
take a long time, when the tracing buffer is large and contains many entries.
In the following example we search for entries from the team 41 containing the
text "BMessage", printing five at a time at maximum.
kdebug> traced 0 5 -1 filter and team 41 #BMessage 16337. [ 44] 70393163: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 18402. [ 44] 71396774: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 19248. [ 44] 72402239: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 19875. [ 44] 73404928: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 20489. [ 44] 74408450: user: BMessage send direct: port: 4, token: 1, message: 'rgir' printed 5 entries within range 16337 to 20655 (4319 of 20655 total, 189725 ever) kdebug> traced 6027. [ 44] 65359596: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 6635. [ 44] 66363060: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 8968. [ 44] 67366540: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 11422. [ 44] 68378551: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 13845. [ 44] 69386518: user: BMessage send direct: port: 4, token: 1, message: 'rgir' printed 5 entries within range 6027 to 16336 (10310 of 20655 total, 189725 ever) kdebug> 611. [ 44] 62346221: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 2770. [ 44] 63351796: user: BMessage send direct: port: 4, token: 1, message: 'rgir' 5155. [ 44] 64355661: user: BMessage send direct: port: 4, token: 1, message: 'rgir' printed 3 entries within range 1 to 6026 (6026 of 20655 total, 189725 ever)
The example also shows another feature of the traced
command.
When invoked without any parameters it continues the search of the previous
invocation. In this case the search continues backward (start index 0 implies
backward search), still listing at most five entries at a time. In this form
(and only in this form) traced
is continuable, meaning that just
hitting enter will invoke it again. This makes browsing through entries quite
convenient.
traced
supports some switch options which can be given before
the first numerical argument. One is --printteam
which causes the
ID of the team that logged the entry to be printed as well (after colon in the
square brackets). When --difftime
is specified only for the first
entry the absolute system time is printed. For subsequent entries the time
difference to the previously printed entry will be shown. While this is not
exactly a profiling option, it can at least be used to get a first
impression.
kdebug> traced --printteam 0 6 -1 filter #BMessage 87852. [ 69: 69] 25882947: user: BMessage send remote: team: 82, port: 8252, token: -2, message: ' 87960. [ 75: 69] 25952695: user: BMessage send remote: team: -1, port: 4136, token: 0, message: '_MUP' 87973. [ 88: 49] 25954926: user: BMessage send remote: team: 82, port: 28753, token: -2, message: '_MUP' 87980. [ 88: 49] 25955064: user: BMessage send remote: team: 80, port: 32872, token: -2, message: '_MUP' 88148. [ 141: 78] 26090266: user: BMessage send remote: team: -1, port: 28, token: -2, message: 'Iuim' 88161. [ 69: 69] 26090779: user: BMessage send remote: team: 78, port: 8245, token: -2, message: ' printed 6 entries within range 87852 to 88451 (600 of 88451 total, 88451 ever) kdebug> traced --difftime 0 7 -1 filter thread 43 88197. [ 43] 26159620: syscall pre: _kern_snooze_etc(0x186a0, 0x0, 0x8) 88254. [ 43] 100557: syscall post: _kern_snooze_etc() -> 0x0 88255. [ 43] 70: syscall pre: _kern_snooze_etc(0x186a0, 0x0, 0x8) 88406. [ 43] 100593: syscall post: _kern_snooze_etc() -> 0x0 88407. [ 43] 65: syscall pre: _kern_snooze_etc(0x186a0, 0x0, 0x8) 88440. [ 43] 100656: syscall post: _kern_snooze_etc() -> 0x0 88441. [ 43] 71: syscall pre: _kern_snooze_etc(0x186a0, 0x0, 0x8) printed 7 entries within range 88197 to 88451 (255 of 88451 total, 88451 ever)
An Example Debug Session
The previous sections introduced all that is needed to hunt down quite a
variety of bugs and problems. Not in all cases the kernel debugger is the
best choice, though. For easily reproducible bugs in userland applications simple
printf()
debugging or using gdb as a source level debugger might be
a better choice. Some problems, however -- particularly ones that are hard to
reproduce, lead to situations that make userland debugging impossible (e.g. when
not being able to start the Terminal anymore), or are race conditions that
disappear when adding too much debug output -- can often better be analyzed
with the kernel debugger.
This section documents a debug session that uncovers
bug #1745 which occurs
in revision 23864. It's the first
time I encountered the bug and, thanks to always having kernel tracing enabled
for syscalls, signals, teams, and BMessage
s, all the information
needed for the analysis were already available. The symptoms of the bug are that
right after booting the Deskbar hangs and Tracker doesn't even show up.
Here we go...
PANIC: keyboard requested halt. Welcome to Kernel Debugging Land... Running on CPU 0 kdebug> teams team id parent name 0x9083e200 1 0x00000000 kernel_team 0x908f0c00 36 0x9083e200 registrar 0x90992e00 69 0x9083e200 Tracker 0x908f5a00 40 0x9083e200 debug_server 0x909b2e00 72 0x9083e200 Deskbar 0x909cb000 73 0x9083e200 sh 0x90900000 43 0x9083e200 net_server 0x90900600 44 0x9083e200 app_server 0x909dd000 77 0x9083e200 Terminal 0x909dd600 78 0x9083e200 media_server 0x909f4400 80 0x9083e200 midi_server 0x90a40a00 81 0x9083e200 print_server 0x90afd200 115 0x909dd000 sh 0x909cba00 87 0x909cb000 ProcessController 0x90900c00 56 0x9083e200 syslog_daemon 0x90b08c00 121 0x909dd600 media_addon_server 0x909b1c00 61 0x90900600 input_server kdebug> tracker=69 kdebug> deskbar=72 kdebug> threads (tracker) thread id state sem/cv cpu pri stack team name 0x90b25800 128 waiting 3834 - 5 0x90784000 69 TrashWatcher 0x90ac7800 97 waiting 2974 - 15 0x906ed000 69 w>Tracker Status 0x90b40800 129 waiting 3866 - 5 0x90788000 69 ClipboardRefsWatcher 0x90b3f800 130 waiting 3877 - 5 0x9079b000 69 AutoMounter 0x90919000 69 waiting 2595 - 10 0x9063d000 69 Tracker 0x90b48000 133 waiting 3917 - 15 0x99a0f000 69 w>Desktop
teams
tells us that, although it didn't show up, Tracker is
actually running. According to threads
all of its usual threads
are there, too. Let's see, what is keeping the desktop window ("w>Desktop")
from showing any icons:
kdebug> sem 3917 SEM: 0x99318e70 id: 3917 (0xf4d) name: 'TrackerWindow' owner: 69 count: -1 queue: 133 last acquired by: 69, count: 1 last released by: 133, count: 1 kdebug> sc 133 stack trace for thread 133 "w>Desktop" kernel stack: 0x99a0f000 to 0x99a13000 user stack: 0x70145000 to 0x70185000 frame caller <image>:function + offset 99a12e48 (+ 32) 8002d352 <kernel>:context_switch__FP6threadT0 + 0x0026 99a12e68 (+ 64) 8002d5e6 <kernel>:scheduler_reschedule + 0x0272 99a12ea8 (+ 96) 8002ea5d <kernel>:switch_sem_etc + 0x039d 99a12f08 (+ 64) 8002f303 <kernel>:_user_acquire_sem_etc + 0x0027 99a12f48 (+ 96) 80091a13 <kernel>:x86_sysenter + 0x00ef (nearest) iframe at 0x99a12fa8 (end = 0x99a13000) eax 0xc ebx 0x766048 ecx 0x70184df0 edx 0xffff0104 esi 0xffffffff edi 0x7fffffff ebp 0x70184e2c esp 0x99a12fdc eip 0xffff0104 eflags 0x203 vector: 0x63, error code: 0x0 99a12fa8 (+ 0) ffff0104 70184e2c (+ 64) 002e7f7c <libbe.so>:_LockComplete__7BLooperP7BLooperlllx + 0x0034 70184e6c (+ 80) 002e7e89 <libbe.so>:_Lock__7BLooperP7BLooperlx + 0x01c1 70184ebc (+ 48) 002e6e81 <libbe.so>:Lock__7BLooper + 0x002d 70184eec (+ 128) 003be877 <libbe.so>:task_looper__7BWindow + 0x0137 70184f6c (+ 64) 002e82cb <libbe.so>:_task0___7BLooperPv + 0x0053 70184fac (+ 48) 006e9264 <libroot.so>:_get_next_team_info + 0x005c (nearest) 70184fdc (+ 0) 70184fec 1261:w>TrackerWindow_133_stack@0x70145000 + 0x3ffec kdebug> desktop=133
The semaphore the window thread is waiting for belongs to the BWindow
lock. The thread is in the window's message dispatcher loop, trying to lock
the window, so that it can dispatch the next message. Someone else already has
the lock and apparently doesn't release it. Unfortunately the lock in question
is a benaphore-style BLocker, so that sem
s "last acquired" output
usually isn't of much help. In this case it shows thread 69, Tracker's main
thread, as last acquirer. It may or may not be the actual owner of the lock; at
any rate it doesn't harm to have a closer look what the thread is up to:
kdebug> sc (tracker) stack trace for thread 69 "Tracker" kernel stack: 0x9063d000 to 0x90641000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 90640dd8 (+ 32) 8002d352 <kernel>:context_switch__FP6threadT0 + 0x0026 90640df8 (+ 64) 8002d5e6 <kernel>:scheduler_reschedule + 0x0272 90640e38 (+ 96) 8002ea5d <kernel>:switch_sem_etc + 0x039d 90640e98 (+ 64) 8002e692 <kernel>:acquire_sem_etc + 0x0026 90640ed8 (+ 80) 8002bd6c <kernel>:port_buffer_size_etc + 0x00ec 90640f28 (+ 32) 8002c859 <kernel>:_user_port_buffer_size_etc + 0x001d 90640f48 (+ 96) 80091a13 <kernel>:x86_sysenter + 0x00ef (nearest) iframe at 0x90640fa8 (end = 0x90641000) eax 0x8f ebx 0x766048 ecx 0x7ffe5f80 edx 0xffff0104 esi 0xffffffff edi 0x7fffffff ebp 0x7ffe5fac esp 0x90640fdc eip 0xffff0104 eflags 0x216 vector: 0x63, error code: 0x0 90640fa8 (+ 0) ffff0104 7ffe5fac (+ 48) 002eeea3 <libbe.so>:_SendFlattenedMessage__8BMessagePvlllx + 0x011b (nearest) 7ffe5fdc (+ 144) 002eec88 <libbe.so>:_SendMessage__C8BMessagelllP8BMessagexx + 0x01f8 7ffe606c (+ 80) 0030cbc7 <libbe.so>:SendMessage__CQ28BMessage7PrivatelllP8BMessagexx + 0x0043 7ffe60bc (+ 80) 002f5ea4 <libbe.so>:SendMessage__C10BMessengerP8BMessageT1xx + 0x0070 7ffe610c (+ 240) 0032c5d8 <libbe.so>:get_deskbar_frame__FP5BRect + 0x00c0 7ffe61fc (+ 144) 0060d2b4 <libtracker.so>:PlacePose__Q28BPrivate9BPoseViewPQ28BPrivate5BPoseR5BRect + 0x013c 7ffe628c (+ 192) 00608065 <libtracker.so>:CreatePoses__Q28BPrivate9BPoseViewPPQ28BPrivate5ModelPQ28BPrivate8PoseInfolPPQ28BPrivate5BPosebPlP5BRectT5 + 0x06a5 7ffe634c (+ 80) 0060783e <libtracker.so>:CreatePose__Q28BPrivate9BPoseViewPQ28BPrivate5ModelPQ28BPrivate8PoseInfobPlP5BRectT3 + 0x0056 7ffe639c (+ 96) 00613e5f <libtracker.so>:EntryCreated__Q28BPrivate9BPoseViewPC8node_refT1PCcPl + 0x018b 7ffe63fc (+ 256) 006076e1 <libtracker.so>:CreateVolumePose__Q28BPrivate9BPoseViewP7BVolumeb + 0x0111 7ffe64fc (+ 352) 00606fe6 <libtracker.so>:AddRootPoses__Q28BPrivate9BPoseViewbT1 + 0x0252 7ffe665c (+ 112) 00605f5f <libtracker.so>:AddPoses__Q28BPrivate9BPoseViewPQ28BPrivate5Model + 0x010f 7ffe66cc (+ 96) 006036a7 <libtracker.so>:InitCommon__Q28BPrivate9BPoseView + 0x0383 7ffe672c (+ 64) 006032d0 <libtracker.so>:Init__Q28BPrivate9BPoseViewPQ28BPrivate19AttributeStreamNode + 0x003c 7ffe676c (+ 80) 0057f832 <libtracker.so>:RestoreState__Q28BPrivate16BContainerWindow + 0x00ce 7ffe67bc (+ 224) 0057f390 <libtracker.so>:Init__Q28BPrivate16BContainerWindowPC8BMessage + 0x0920 7ffe689c (+ 128) 00592d6f <libtracker.so>:Init__Q28BPrivate11BDeskWindowPC8BMessage + 0x0237 7ffe691c (+ 528) 00649263 <libtracker.so>:ReadyToRun__Q28BPrivate8TTracker + 0x044b 7ffe6b2c (+ 528) 002dd27d <libbe.so>:DispatchMessage__12BApplicationP8BMessageP8BHandler + 0x0369 7ffe6d3c (+ 96) 002e8831 <libbe.so>:task_looper__7BLooper + 0x0299 7ffe6d9c (+ 64) 002db742 <libbe.so>:Run__12BApplication + 0x008a 7ffe6ddc (+ 416) 002009eb <_APP_>:main + 0x002f 7ffe6f7c (+ 48) 00200897 <_APP_>:_start + 0x005b 7ffe6fac (+ 48) 001007be 689:runtime_loader_seg0ro@0x00100000 + 0x7be 7ffe6fdc (+ 0) 7ffe6fec 674:Tracker_main_stack@0x7efe7000 + 0xffffec
Currently it is obviously waiting for a synchronous reply to a message
it just sent (_user_port_buffer_size_etc()
in the kernel part of
the stack trace). The stack trace also shows that the thread is still in the
BApplication
's ReadyToRun()
method and invoked
BDeskWindow::Init()
from there.
This is at least a good reason to lock the window. So who doesn't reply to the
message?
kdebug> traced 0 -10 -1 thread (tracker) 80228. [ 69] 71666364: syscall pre: _kern_set_port_owner(0x1029, 0x48) 80229. [ 69] 71666387: syscall post: _kern_set_port_owner() -> 0x0 80230. [ 69] 71666438: syscall pre: _kern_get_port_info(0x1029, 0x7ffe5f18) 80231. [ 69] 71666628: syscall post: _kern_get_port_info() -> 0x0 80232. [ 69] 71666716: syscall pre: _kern_ktrace_output(0x7ffe5b0c) 80233. [ 69] 71666727: user: BMessage send remote: team: -1, port: 4144, token: -2, message: 'PGET' 80234. [ 69] 71666754: syscall post: _kern_ktrace_output() -> 0x20 80235. [ 69] 71666769: syscall pre: _kern_write_port_etc(0x1030, 0x706a7070, 0x180ab8a0, 0x125, 0x8, 0x7fffffffffffffff) 80236. [ 69] 71666863: syscall post: _kern_write_port_etc() -> 0x0 80237. [ 69] 71666904: syscall pre: _kern_port_buffer_size_etc(0x1029, 0x8, 0x7fffffffffffffff) printed 10 entries within range 80228 to 90162 (9935 of 90162 total, 90162 ever) kdebug> port 4144 PORT: 0x80135900 id: 4144 name: "AppLooperPort" owner: 72 capacity: 100 read_sem: 2662 write_sem: 2663 read_sem count: 2 write_sem count: 98 total count: 7 kdebug> threads 72 thread id state sem/cv cpu pri stack team name 0x90ac1800 95 waiting 2637 - 5 0x906e5000 72 TrackerTaskLoop 0x90ad1800 101 waiting 2637 - 15 0x906fd000 72 w>Twitcher 0x909c9000 72 waiting 3275 - 10 0x90669000 72 Deskbar 0x90b06800 119 waiting 2637 - 5 0x90735000 72 Expando Window Watcher 0x90b07000 120 waiting 2637 - 15 0x90739000 72 w>Deskbar
As recorded by trace entry 80233, the message was sent to port 4144. The
owner of the port is team 72, the Deskbar, and the port itself is, according
to its name, the BApplication
looper port. The threads
listing for the Deskbar team looks suspicious. All threads save the main thread
wait for the same semaphore, which belongs to...
kdebug> sem 2637 SEM: 0x99309e70 id: 2637 (0xa4d) name: 'BLooperList lock' owner: 72 count: -4 queue: 120 119 101 95 last acquired by: 0, count: 0 last released by: 0, count: 0
... the BLooperList
lock. The BLooperList
is a list
of all BLooper
s of a team. It's main purpose is to allow the
invocation of certain methods (Lock()
,
PostMessage()
) on BLooper
pointers that are not
necessarily valid anymore.
Those methods simply look up the pointer in the BLooperList
before
doing anything with it that might cause an invalid memory access. The
list is protected by a lock, which must be held when working with the list and
also can be held for a short time to ensure that no looper will be deleted during
that time. The lock is an innermost lock, however, i.e. while holding it no
other lock may be acquired in order to prevent potential deadlocks. Being a
BLocker
and thus supporting nested locking, it is also clear that
none of the threads waiting for the lock is its current owner, which leaves only
the Deskbar main thread. Let's see, what it is doing and why it is waiting:
kdebug> sc 72 stack trace for thread 72 "Deskbar" kernel stack: 0x90669000 to 0x9066d000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 9066ce48 (+ 32) 8002d352 <kernel>:context_switch__FP6threadT0 + 0x0026 9066ce68 (+ 64) 8002d5e6 <kernel>:scheduler_reschedule + 0x0272 9066cea8 (+ 96) 8002ea5d <kernel>:switch_sem_etc + 0x039d 9066cf08 (+ 64) 8002f303 <kernel>:_user_acquire_sem_etc + 0x0027 9066cf48 (+ 96) 80091a13 <kernel>:x86_sysenter + 0x00ef (nearest) iframe at 0x9066cfa8 (end = 0x9066d000) eax 0xc ebx 0x7af048 ecx 0x7ffe66e0 edx 0xffff0104 esi 0xffffffff edi 0x7fffffff ebp 0x7ffe671c esp 0x9066cfdc eip 0xffff0104 eflags 0x203 vector: 0x63, error code: 0x0 9066cfa8 (+ 0) ffff0104 7ffe671c (+ 64) 00477bdc <libbe.so>:AcquireLock__7BLockerxPl + 0x0064 7ffe675c (+ 48) 004779b3 <libbe.so>:Lock__7BLocker + 0x002f 7ffe678c (+ 48) 003546bd <libbe.so>:__9BAutolockR7BLocker + 0x002d 7ffe67bc (+ 64) 0033ce30 <libbe.so>:AddMessage__13BMessageQueueP8BMessage + 0x0034 7ffe67fc (+ 48) 0032af20 <libbe.so>:AddMessage__Q28BPrivate20BDirectMessageTargetP8BMessage + 0x0060 7ffe682c (+ 144) 00337911 <libbe.so>:_SendMessage__C8BMessagelllxbR10BMessenger + 0x0319 7ffe68bc (+ 64) 00355b70 <libbe.so>:SendMessage__CQ28BMessage7PrivatelllxbR10BMessenger + 0x0040 7ffe68fc (+ 64) 0033ed01 <libbe.so>:SendMessage__C10BMessengerP8BMessageG10BMessengerx + 0x0069 7ffe693c (+ 112) 0033ec19 <libbe.so>:SendMessage__C10BMessengerP8BMessageP8BHandlerx + 0x0071 7ffe69ac (+ 96) 00330ba3 <libbe.so>:_PostMessage__7BLooperP8BMessageP8BHandlerT2 + 0x010f 7ffe6a0c (+ 48) 0032f67c <libbe.so>:PostMessage__7BLooperP8BMessage + 0x0024 7ffe6a3c (+ 176) 0021c417 <_APP_>:MessageReceived__7TBarAppP8BMessage + 0x021b 7ffe6aec (+ 48) 0032f808 <libbe.so>:DispatchMessage__7BLooperP8BMessageP8BHandler + 0x0074 7ffe6b1c (+ 528) 003263f8 <libbe.so>:DispatchMessage__12BApplicationP8BMessageP8BHandler + 0x04e4 7ffe6d2c (+ 96) 00331831 <libbe.so>:task_looper__7BLooper + 0x0299 7ffe6d8c (+ 64) 00324742 <libbe.so>:Run__12BApplication + 0x008a 7ffe6dcc (+ 432) 0021aeff <_APP_>:main + 0x002f 7ffe6f7c (+ 48) 0021adab <_APP_>:_start + 0x005b 7ffe6fac (+ 48) 001007be 727:runtime_loader_seg0ro@0x00100000 + 0x7be 7ffe6fdc (+ 0) 7ffe6fec 723:Deskbar_main_stack@0x7efe7000 + 0xffffec kdebug> traced 0 -10 -1 thread 72 64131. [ 72] 63166139: syscall pre: _kern_port_count(0x1030) 64132. [ 72] 63166161: syscall post: _kern_port_count() -> 0x0 64133. [ 72] 63166217: syscall pre: _kern_acquire_sem_etc(0xa65, 0x1, 0x8, 0x7fffffffffffffff) 64134. [ 72] 63166234: syscall post: _kern_acquire_sem_etc() -> 0x0 64135. [ 72] 63166506: syscall pre: _kern_get_port_info(0x2051, 0x7ffe67f8) 64136. [ 72] 63166539: syscall post: _kern_get_port_info() -> 0x0 64137. [ 72] 63166651: syscall pre: _kern_ktrace_output(0x7ffe63ec) 64138. [ 72] 63166666: user: BMessage send direct: port: 16474, token: 12, message: 'exst' 64139. [ 72] 63166690: syscall post: _kern_ktrace_output() -> 0x20 64140. [ 72] 63166723: syscall pre: _kern_acquire_sem_etc(0xccb, 0x1, 0x8, 0x7fffffffffffffff) printed 10 entries within range 64131 to 90162 (26032 of 90162 total, 90162 ever) kdebug> port 16474 PORT: 0x801360e0 id: 16474 name: "Deskbar" owner: 72 capacity: 100 read_sem: 3277 write_sem: 3278 read_sem count: 26 write_sem count: 74 total count: 23 kdebug> sem 3275 SEM: 0x99311610 id: 3275 (0xccb) name: 'BMessageQueue Lock' owner: 72 count: -1 queue: 72 last acquired by: 0, count: 0 last released by: 0, count: 0
According to the stack trace the thread is trying to send a message.
traced
tells us that it is a "direct" (i.e. intra-application)
message to port 16474, which belongs to the "Deskbar" window. Intra-app
messages are not really sent to a port, though. Instead they are directly
added to the target's message queue. The semaphore the thread is waiting for
belongs to that BMessageQueue
. First of all, looking through the
source code of the functions listed in the stack trace, reveals where the
BLooperList
lock is being held although it shouldn't:
BLooper::_PostMessage(BMessage *msg, BHandler *handler, BHandler *replyTo) { AutoLocker<BLooperList> listLocker(gLooperList); if (!listLocker.IsLocked()) return B_ERROR; if (!gLooperList.IsLooperValid(this)) return B_BAD_VALUE; // Does handler belong to this looper? if (handler && handler->Looper() != this) return B_MISMATCHED_VALUES; status_t status; BMessenger messenger(handler, this, &status); if (status == B_OK) status = messenger.SendMessage(msg, replyTo, 0); return status; }
The lock is held until the end of the method, including the line where
messenger.SendMessage()
is called. This is a bug -- the lock must
be released after the construction of the BMessenger
at the
latest.
This is only one problem, though. Just as the BLooperList
lock a
BMessageQueue
lock should not be held while trying to acquire
another lock. That is the main thread should not block on such a lock for a
long time, since the owner should release it eventually. Aside from direct
message deliveries, the message queue is normally only used within the looper it
belongs to. So it's likely that in this case the window thread is holding it
although it ought not to:
kdebug> sc 120 stack trace for thread 120 "w>Deskbar" kernel stack: 0x90739000 to 0x9073d000 user stack: 0x700c3000 to 0x70103000 frame caller <image>:function + offset 9073ce48 (+ 32) 8002d352 <kernel>:context_switch__FP6threadT0 + 0x0026 9073ce68 (+ 64) 8002d5e6 <kernel>:scheduler_reschedule + 0x0272 9073cea8 (+ 96) 8002ea5d <kernel>:switch_sem_etc + 0x039d 9073cf08 (+ 64) 8002f303 <kernel>:_user_acquire_sem_etc + 0x0027 9073cf48 (+ 96) 80091a13 <kernel>:x86_sysenter + 0x00ef (nearest) iframe at 0x9073cfa8 (end = 0x9073d000) eax 0xc ebx 0x7af048 ecx 0x701024d0 edx 0xffff0104 esi 0xffffffff edi 0x7fffffff ebp 0x7010250c esp 0x9073cfdc eip 0xffff0104 eflags 0x207 vector: 0x63, error code: 0x0 9073cfa8 (+ 0) ffff0104 7010250c (+ 64) 00477bdc <libbe.so>:AcquireLock__7BLockerxPl + 0x0064 7010254c (+ 48) 004779b3 <libbe.so>:Lock__7BLocker + 0x002f 7010257c (+ 48) 003322d0 <libbe.so>:Lock__Q28BPrivate11BLooperList + 0x0020 701025ac (+ 48) 0034c21c <libbe.so>:Lock__Q28BPrivatet25AutoLockerStandardLocking1ZQ28BPrivate11BLooperListPQ28BPrivate11BLooperList + 0x0024 701025dc (+ 48) 0034c0c1 <libbe.so>:Lock__Q28BPrivatet10AutoLocker2ZQ28BPrivate11BLooperListZQ28BPrivatet25AutoLockerStandardLocking1ZQ28BPrivate11BLooperList + 0x0031 7010260c (+ 48) 0034bef4 <libbe.so>:__Q28BPrivatet10AutoLocker2ZQ28BPrivate11BLooperListZQ28BPrivatet25AutoLockerStandardLocking1ZQ28BPrivate11BLooperListRQ28BPrivate11BLooperListbT2 + 0x0044 7010263c (+ 64) 00330d24 <libbe.so>:_Lock__7BLooperP7BLooperlx + 0x005c 7010267c (+ 48) 0032fe81 <libbe.so>:Lock__7BLooper + 0x002d 701026ac (+ 48) 004009bc <libbe.so>:Flush__C7BWindow + 0x0020 701026dc (+ 48) 003f3ee1 <libbe.so>:Flush__C5BView + 0x0029 7010270c (+ 112) 003fdda0 <libbe.so>:_Draw__5BViewG5BRect + 0x011c 7010277c (+ 736) 004033e9 <libbe.so>:DispatchMessage__7BWindowP8BMessageP8BHandler + 0x1cb9 70102a5c (+ 64) 00404a29 <libbe.so>:UpdateIfNeeded__7BWindow + 0x00f9 70102a9c (+ 112) 00229cee <_APP_>:AddTeam__15TExpandoMenuBarP5BListP7BBitmapPcT3 + 0x032a 70102b0c (+ 160) 00228f3d <_APP_>:MessageReceived__15TExpandoMenuBarP8BMessage + 0x0195 70102bac (+ 48) 0032f808 <libbe.so>:DispatchMessage__7BLooperP8BMessageP8BHandler + 0x0074 70102bdc (+ 736) 004037c0 <libbe.so>:DispatchMessage__7BWindowP8BMessageP8BHandler + 0x2090 70102ebc (+ 48) 002220ec <_APP_>:DispatchMessage__10TBarWindowP8BMessageP8BHandler + 0x0054 70102eec (+ 128) 00407a66 <libbe.so>:task_looper__7BWindow + 0x0326 70102f6c (+ 64) 003312cb <libbe.so>:_task0___7BLooperPv + 0x0053 70102fac (+ 48) 00732264 <libroot.so>:_get_next_team_info + 0x005c (nearest) 70102fdc (+ 0) 70102fec 1156:w>Deskbar_120_stack@0x700c3000 + 0x3ffec
After browsing through the source code of the functions listed in the stack
trace, the erroneous code appears to be in
BWindow::UpdateIfNeeded()
:
BMessageQueue *queue = MessageQueue(); queue->Lock(); // First process and remove any _UPDATE_ message in the queue // With the current design, there can only be one at a time BMessage *msg; for (int32 i = 0; (msg = queue->FindMessage(i)) != NULL; i++) { if (msg->what == _UPDATE_) { BWindow::DispatchMessage(msg, this); // we need to make sure that no overridden method is called // here; for BWindow::DispatchMessage() we now exactly what // will happen queue->RemoveMessage(msg); delete msg; break; // NOTE: "i" would have to be decreased if there were // multiple _UPDATE_ messages and we would not break! } } queue->Unlock();
BWindow::DispatchMessage()
is invoked while holding the message
queue lock. DispatchMessage()
calls various functions, including
BView::Draw()
which can be overridden by application developers. So,
although the comment in UpateIfNeeded()
suggests that calling
DispatchMessage()
is OK, it definitely is not.
To sum up our findings: Tracker hangs because it's still in its
initialization phase and waits forever for a reply message from the Deskbar.
The Deskbar in turn has run into a standard deadlock with the
BLooperList
lock and the deskbar window's message queue lock
involved. Main and window thread mutually wait for the lock the respective
other thread holds.
Conclusion
Haiku's kernel debugger is a powerful tool whose usefulness does not end in the kernel domain. It can also help with certain userland problems, that would otherwise be tricky to tackle. This document gave an introduction to the basic features of the kernel debugger, its most commonly used commands, and the kernel tracing mechanism. It concluded with a documented debug session showing how to apply the previously presented information to track down a bug in practice.
I hope after reading this document, Haiku developers feel encouraged to also consider whether using the kernel debugger could lead them quicker to a solution when they face a problem in Haiku. Particularly keeping the inexpensive kernel tracing enabled at all times might provide helpful information right when it's needed.