Like many BeOS developers, I experience daily frustration at the lack of
debugging tools on the BeOS. Unlike most developers, though, I'm in a
position to do more than just rant (which still occasionally happens), so
I've done some work on MALLOC_DEBUG to try to help developers fix some of
the types of memory problems I've seen during NetPositive development.
This code will be built into BeOS R4, but you can use it now if you download the R3.1 debugging libroot from ftp://ftp.be.com/pub/experimental/tools/libroot_debug.zip.
That archive also contains a more detailed description of MALLOC_DEBUG
and how to use it; you should look it up if you'd like to take full
advantage of MALLOC_DEBUG.
If you want a primer on the original MALLOC_DEBUG mechanism, see
Dominic's article Be Engineering Insights: The Woes of Memory Allocation.
To summarize, MALLOC_DEBUG works by hooking into the C library's
malloc() and free() calls to catch some common memory violations:
reading from uninitialized blocks, reading blocks after they've been
freed, freeing blocks twice, or writing off the boundary of a block.
Provided that you don't override new and delete and do your own
suballocation, MALLOC_DEBUG works for both malloc-allocated blocks and
C++ class instances.
MALLOC_DEBUG trashes the block with garbage when it's allocated, to
make sure your program doesn't depend on the block being in a certain
initial state. It trashes it again when it's freed so you don't depend
on the data still being there afterward. It trashes the block with
odd-numbered values so your application faults immediately if you try
to dereference a pointer within the block. It adds padding before and
after the block and checks the padding when the block is freed to make
sure you haven't written off the end of the block.
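
To make that concrete, here's a minimal sketch of the sorts of bugs
these checks are meant to flush out (the buffer here is purely
illustrative):

#include <stdlib.h>
#include <string.h>

int main()
{
    /* Uninitialized read: the block is trashed with garbage at
       allocation, so code that assumes it starts out zeroed
       misbehaves immediately instead of working by accident. */
    char *buf = (char *)malloc(16);
    if (buf[0] == 0) {      /* reads trash, not a dependable 0 */
        /* ... */
    }

    /* Boundary violation: the extra byte lands in the padding,
       which is checked when the block is freed. */
    memset(buf, 'x', 17);

    free(buf);

    /* Use after free: the block was trashed on free, so this
       read sees garbage rather than stale-but-plausible data. */
    char c = buf[0];
    (void)c;

    /* Double free: caught and reported directly. */
    free(buf);

    return 0;
}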
You turn on MALLOC_DEBUG by setting some environment variables before
starting your program. If your application doesn't do anything illegal
with memory, MALLOC_DEBUG won't adversely affect your program's
operation, other than to slow it down slightly and use a bit more
memory to store its extra information.
The best new feature is that MALLOC_DEBUG now records in every block
the last seven levels of the call stack where the block allocation took
place. When the old MALLOC_DEBUG detected an error, it just told you
the error type and gave you the block's address. If the block's
identity wasn't obvious from its content, it was difficult to figure
out what the problem was. Now, you can find out how the block was
allocated and immediately see what it is.
If your program trips up MALLOC_DEBUG, it prints out the call stack in
the debugger message shown when the debugger is invoked. (This
information is no longer printed to stdout but appears in the debugger
instead.) The call stack consists of seven return addresses; convert
them into symbolic names using the wh command in the debugger.
The new MALLOC_DEBUG mechanism adds levels of debugging instead of the
previous simple on/off switch. The debugging level sets how strict you
want it to be and how much runtime overhead you're willing to incur.
You can assign a debugging level from 1 to 10 through the MALLOC_DEBUG
environment variable. This is the same environment variable you used
before to turn on the old MALLOC_DEBUG; if you set MALLOC_DEBUG=true as
you did before, it sets the debugging level to its lowest value, 1,
which gives you equivalent functionality.
Right now, only three levels of debugging are defined: 1, 5, and 10,
leaving room to add future features. Level 1 is equivalent to the old
MALLOC_DEBUG mechanism: it fills the block with garbage upon allocation
and after it is freed, and checks to see if the block is freed twice,
or its boundaries are violated.
Level 5 does all the Level 1 checks, and adds an extra step to do a
better job of catching blocks after they're freed: when you call
free() on a block, the block is trashed and placed on a "purgatory"
list instead of being returned immediately to the heap. The block stays
on this list until enough other blocks are freed; then it's pushed off
the list and recycled. It does this to catch cases where your program
writes to or reads from a block after it has been freed.
As an example, let's say your program has a bug where it keeps a stale pointer to a class instance and writes through it after the instance has been deleted (which is easy to do in heavily threaded applications with poorly managed object lifetimes). Sometimes the memory the instance used to occupy is free, and the error will likely go undetected. Sometimes, though, the memory has been recycled and is now occupied by another instance of the same class, or of a different class. An illegal memory write now corrupts data in a different data structure; an illegal memory read pulls data from a different class instance. With no means of detecting errors like these, you'll probably spend a lot of time looking in the wrong place for the problem.
However, if the freed block is trashed, placed on a purgatory list, and stays there awhile, it gives your program ample opportunity to try to read from the block (and see trashed data) or write to it. After some time, hopefully after all the dangling pointers have gone away, the block falls off the list; it's then checked to make sure you haven't written to it, and recycled.
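
Here's a minimal sketch of the kind of dangling-pointer bug purgatory
is designed to expose (the class is purely illustrative):

/* Hypothetical cache entry; any heap-allocated class will do. */
class CacheEntry {
public:
    int  refCount;
    char data[64];
};

int main()
{
    CacheEntry *entry = new CacheEntry;
    CacheEntry *stale = entry;   /* a second pointer, kept too long */

    delete entry;

    /* With MALLOC_DEBUG=5 the freed block sits on the purgatory
       list, trashed; this write is detected when the block is
       finally checked and recycled, instead of silently corrupting
       whatever object later reuses the memory. */
    stale->refCount++;

    return 0;
}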
You can set the size of the purgatory list through the
MALLOC_DEBUG_FREE_LIST_SIZE environment variable (the default is 1000
blocks). The value is block based, not byte based; if you have a high
turnover rate of large blocks, this chews up memory pretty fast. Adjust
the value up or down to determine the amount of time blocks spend in
purgatory and to tune memory usage during debugging.
Normally, MALLOC_DEBUG only performs its checks when blocks are
allocated, realloced, or freed, or when they drop off the purgatory
list and are recycled. This means that MALLOC_DEBUG usually catches a
memory violation only long after it has occurred; in fact, if your
program never frees a block, the block will never be checked at all.
You can prevent this by turning the debugging level all the way up to
10. At the highest level, MALLOC_DEBUG performs all the checks of the
lower levels. It also periodically checks every currently allocated
block and every purgatory block to make sure that nothing illegal has
happened. The MALLOC_DEBUG_CHECK_FREQUENCY environment variable
determines how often this full check occurs; by default, it takes
place every 1000 calls to malloc()/realloc()/free(). (Individual
blocks are still always checked when freed or recycled, as they were
before, regardless of this setting.) As you can imagine, this can be a
pretty time-consuming operation; with the period at 1000, the impact
on performance is small, but the latency between an illegal operation
and its detection is fairly large. If you're having trouble tracking
down where a problem happens, you can crank this value down to
something smaller, even all the way down to 1, which performs a heap
consistency check *every* time, but is excruciatingly slow. When
MALLOC_DEBUG does detect an error, keep in mind that the memory call
where the error is detected may still be some distance away (and
possibly in a different thread) from the bug that actually caused the
problem; all you can hope to do is minimize that distance.
One problem to be aware of is that the new MALLOC_DEBUG exposes a bug
in the R3.x Interface Kit. You'll get a "Block written to after being
freed" exception on a BView that has BScrollBars targeting it when you
delete the window that contains the view. This will be fixed in R4, but
until then, you can work around it by removing the scroll bars from the
window and deleting them before deleting the targeted BView.
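
Here's a sketch of that workaround, assuming a window that keeps
pointers to its scroll bars and their target view (the class and
member names are hypothetical):

#include <Window.h>
#include <View.h>
#include <ScrollBar.h>

/* Hypothetical window that owns two scroll bars targeting a view. */
class MyWindow : public BWindow {
public:
    /* ... */
    void Teardown();
private:
    BScrollBar *fHScrollBar;
    BScrollBar *fVScrollBar;
    BView      *fTargetView;
};

/* Workaround for the R3.x Interface Kit bug: detach and delete the
   scroll bars before the BView they target, so nothing touches the
   view's freed memory during window teardown. */
void
MyWindow::Teardown()
{
    Lock();

    fHScrollBar->RemoveSelf();
    delete fHScrollBar;
    fVScrollBar->RemoveSelf();
    delete fVScrollBar;

    /* Now the targeted view can go away safely. */
    fTargetView->RemoveSelf();
    delete fTargetView;

    Unlock();
}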
That's all I have time for, because The Man is beckoning me to crawl
back in my cage and fix some bugs, so I can't tell you about the values
MALLOC_DEBUG trashes blocks with or other interesting technical
details. Look in the debugging libroot archive for more information. I
hope that MALLOC_DEBUG helps you find some bugs. If you have ideas
about how we could make it better, let us know.
Ready for another article about 3D on the BeOS? The BeOS Release 4 OpenGL implementation has been heavily modified from the previous R3.1 version. We've added support for single-buffer rendering, reduced memory usage, and fixed some bugs.
Single buffering is perhaps the greatest improvement for R4. OpenGL now
uses the BDirectWindow protocol to provide single buffering. It still
works with regular BWindows, but at a substantial performance penalty.
To provide this functionality, two new member functions have been added
to BGLView:
BGLView::DirectConnected(direct_buffer_info *info);
BGLView::EnableDirectMode(bool enabled);
DirectConnected() must be called from the BDirectWindow hook function
with the same name. This keeps the BGLView in sync with the current
direct window information. It's as simple as adding the following
function to your code:
void
myDirectWindow::DirectConnected(direct_buffer_info *info)
{
    if (m_glview)
        m_glview->DirectConnected(info);
}
EnableDirectMode() is present to allow your application to enable and
disable direct window drawing without having to modify the
direct_buffer_info information. By default, direct mode is disabled.
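
Putting the two calls together, here's a sketch of a BDirectWindow
subclass hosting a BGLView (the class and member names are
hypothetical, and enabling direct mode up front is one reasonable
policy, not a requirement of the API):

#include <DirectWindow.h>
#include <GLView.h>

/* Hypothetical BDirectWindow subclass hosting a BGLView. */
class GLDirectWindow : public BDirectWindow {
public:
    GLDirectWindow(BRect frame, const char *title)
        : BDirectWindow(frame, title, B_TITLED_WINDOW, 0)
    {
        m_glview = new BGLView(Bounds(), "gl", B_FOLLOW_ALL, 0,
            BGL_RGB | BGL_DOUBLE | BGL_DEPTH);
        AddChild(m_glview);

        /* Direct mode is off by default; turn it on if the
           screen can actually support direct window drawing. */
        m_glview->EnableDirectMode(SupportsWindowMode());
    }

    virtual void DirectConnected(direct_buffer_info *info)
    {
        /* Forward every direct-window notification so the
           BGLView stays in sync with the frame buffer. */
        if (m_glview)
            m_glview->DirectConnected(info);
    }

private:
    BGLView *m_glview;
};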
Much effort has gone into making our OpenGL implementation perform well. Two factors limit OpenGL performance. The first is the geometry processing (triangle) rate: the rate at which incoming vertex data can be processed and sent to the triangle, line, or point drawing hardware or software. The performance of this portion is generally independent of the size of the primitives sent to OpenGL; it depends primarily on the number of primitives and on per-vertex work such as lighting and texture coordinate generation.
The second factor is the fill rate—the number of pixels that can be drawn in a given period of time, usually a second. This depends almost entirely on the state of the GL pixel pipeline. For software rendering, disabling most of the pipeline and rendering only flat, shaded triangles generally gives the best performance. Smooth shading, texturing, fogging, depth testing, stenciling, blending, and alpha testing each reduce performance somewhat.
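
As an illustration, a state setup along these lines keeps the software
fill path as cheap as possible (a sketch; which toggles actually
matter depends entirely on your scene):

#include <GL/gl.h>

/* Configure GL for the cheapest software fill path:
   flat-shaded, untextured triangles with the rest of the
   pixel pipeline switched off. */
void
SetFastestFillState()
{
    glShadeModel(GL_FLAT);        /* no smooth shading */
    glDisable(GL_TEXTURE_2D);     /* no texturing */
    glDisable(GL_FOG);            /* no fogging */
    glDisable(GL_DEPTH_TEST);     /* no depth testing */
    glDisable(GL_STENCIL_TEST);   /* no stenciling */
    glDisable(GL_BLEND);          /* no blending */
    glDisable(GL_ALPHA_TEST);     /* no alpha testing */
}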
Most of the R4 effort has gone into geometry processing optimizations. The processing speed should be greatly increased from R3. The greatest improvement is in the specular lighting code; specular lights should now have much less impact on performance. Another big gain is in quick clipping of primitives that are completely off screen.
We utilized some advantages of Intel processors and didn't ignore the drawing code either. We now have a shiny new MMX filler and some PII-specific depth testing code. For those with other processors, don't worry—OpenGL detects your CPU and uses the right code. Those who've used our prior OpenGL implementations may be wondering why most of the effort went into the geometry portion and not the primitive rendering code that takes most of the processor time. That can be summed up in one word: hardware.
While hardware support is not in R4, it's still on schedule for R5. Our implementation of OpenGL now has the hooks to support hardware acceleration. Continued incremental improvements to the software engine will never approach the performance provided by even a $50 3D-video card. All the geometry improvements will become much more visible once hardware acceleration is available. Can you say 200+ fps for GLTeapot?
What can you expect once hardware acceleration is available? Some
existing OpenGL functions that were good for performance will suddenly
become very bad:

BGLView::CopyPixelsOut()
BGLView::CopyPixelsIn()
These functions will not be the ideal way to move data into or out of
a BGLView. A better solution is to use glReadPixels() and
glDrawPixels(), which can be pipelined by the accelerator. The
CopyPixelsOut() function forces a pipeline flush. CopyPixelsIn() may
not force the flush, but must push and pop the entire pixel state to
get the correct behavior. Because your application knows the current
GL state, you can save and restore only the needed portion of the
state and call glDrawPixels().
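
For example, here's a sketch of that approach, assuming blending and
alpha testing are the only state your transfer disturbs (the function
and parameter names are mine, not part of the API):

#include <GL/gl.h>

/* Sketch: blit an RGBA image with glDrawPixels(), saving and
   restoring only the state we know we change (here, blending and
   alpha test, both covered by GL_COLOR_BUFFER_BIT). */
void
BlitImage(GLint x, GLint y, GLsizei width, GLsizei height,
    const GLubyte *pixels)
{
    glPushAttrib(GL_COLOR_BUFFER_BIT);
    glDisable(GL_BLEND);
    glDisable(GL_ALPHA_TEST);

    /* Move the image into the frame buffer. */
    glRasterPos2i(x, y);
    glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    /* Restore only what was saved. */
    glPopAttrib();
}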
BGLView::EmbeddedView()

This function will always return NULL starting with R4. All drawing in
a BGLView should be done with GL commands. Mixing app_server and GL
drawing is extremely bad for the performance of both. This function is
mostly used for displaying text in a BGLView. Below is an example of
displaying text using only GL commands.
One way to draw text in OpenGL is to create the font as a texture and then draw it using standard GL quads. The example below draws the letter B: it uses the app_server to render the character into a bitmap, and GL to texture and draw it.
int
ObjectView::round(int in)
{
    int tempCount = 0;

    while (in > 7) {
        in >>= 1;
        tempCount++;
    }

    return in << tempCount;
}

void
ObjectView::makeFontMipmap(int maxSize, char c)
{
    /* Get a fixed font */
    BFont font(be_fixed_font);
    int size = maxSize;
    int level = 0;
    float fontSize = maxSize;
    font_height fh;

    /* Calculate the largest font which will fit */
    /* into the specified size */
    do {
        fontSize /= 1.05;
        font.SetSize(fontSize);
        font.GetHeight(&fh);
    } while (fh.leading >= size);

    float x = size / 4;

    /* Round Y to ensure all but the last 3 mipmaps land on */
    /* integer values */
    float y = round(size - (fh.descent + fh.leading * 0.05));

    /* Reduce the size of the font until it fits the new */
    /* location */
    do {
        fontSize /= 1.05;
        font.SetSize(fontSize);
        font.GetHeight(&fh);
    } while (y < fh.ascent);

    /* Create each mipmap for the font */
    while (size >= 1) {
        font.SetSize(fontSize);
        makeFontLevel(size, level, &font, x, y, c);
        size /= 2;
        level++;
        x /= 2.0;
        y /= 2.0;
        fontSize /= 2;
    }
}

void
ObjectView::makeFontLevel(int size, int level, BFont *font,
    float x, float y, char c)
{
    /* Create a bounding rect for the bitmap */
    BRect boundingRect(0, 0, size - 1, size - 1);
    GLubyte *bits;

    /* Create a gray scale bitmap to hold the font */
    BBitmap bitmap(boundingRect, B_CMAP8, true, false);

    /* Create an embedded view */
    BView view(boundingRect, "Font view", B_FOLLOW_NONE, 0);
    bitmap.Lock();
    bitmap.AddChild(&view);

    /* Set the background to bright white */
    /* Could be done with an app_server call */
    bits = (GLubyte *)bitmap.Bits();
    for (int ct = 0; ct < size * size; ct++)
        bits[ct] = 255;

    /* Draw the character into the bitmap at the specified */
    /* location */
    view.SetFont(font);
    view.DrawChar(c, BPoint(x, y));
    view.Sync();

    /* Invert the bitmap to make an intensity map where the */
    /* text is intense and the background is not. */
    for (int ct = 0; ct < size * size; ct++)
        bits[ct] = 255 - bits[ct];

    /* Load the intensity map into GL */
    glTexImage2D(GL_TEXTURE_2D, level, GL_INTENSITY4, size, size, 0,
        GL_LUMINANCE, GL_UNSIGNED_BYTE, bitmap.Bits());

    /* Clean up */
    bitmap.RemoveChild(&view);
    bitmap.Unlock();
}

void
ObjectView::DrawFrame(bool noPause)
{
    if (initCount < 1)
        return;

    LockGL();

    /* Enable texturing */
    glEnable(GL_TEXTURE_2D);

    /* Set texturing to clamp to prevent repeating the */
    /* texture if invalid texture coordinates were given */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);

    /* Set filters. This configures for trilinear filtering */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
        GL_LINEAR_MIPMAP_LINEAR);

    /* Colored text is created with GL_MODULATE. */
    /* The intensity map determines the brightness and the */
    /* vertexes specify the color */
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    /* Make the character texture */
    makeFontMipmap(128, 'B');

    /* Draw the texture */
    glBegin(GL_QUADS);
    glColor3f(1.0, 0.0, 0.0);
    glTexCoord2f(0.0, 0.0);
    glVertex2f(-1.0, 1.0);
    glColor3f(1.0, 0.5, 0.0);
    glTexCoord2f(1.0, 0.0);
    glVertex2f(1.0, 1.0);
    glColor3f(0.0, 0.0, 1.0);
    glTexCoord2f(1.0, 1.0);
    glVertex2f(1.0, -1.0);
    glColor3f(0.0, 0.5, 1.0);
    glTexCoord2f(0.0, 1.0);
    glVertex2f(-1.0, -1.0);
    glEnd();

    UnlockGL();
}
This could be improved by creating the fonts in advance and binding
them to texture objects using glBindTexture().
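
Here's a sketch of that improvement, reusing makeFontMipmap() from
above; the texture-object bookkeeping (glyphTextures,
buildGlyphTextures(), drawGlyph()) is hypothetical scaffolding:

GLuint glyphTextures[128];

void
ObjectView::buildGlyphTextures()
{
    glGenTextures(128, glyphTextures);
    for (int c = 'A'; c <= 'Z'; c++) {
        /* Each mipmap chain is uploaded once into its own
           texture object instead of being rebuilt every frame. */
        glBindTexture(GL_TEXTURE_2D, glyphTextures[c]);
        makeFontMipmap(128, (char)c);
    }
}

void
ObjectView::drawGlyph(char c)
{
    /* Drawing a character is now just a bind plus a quad. */
    glBindTexture(GL_TEXTURE_2D, glyphTextures[(int)c]);
    /* ... glBegin(GL_QUADS) ... as in DrawFrame() above ... */
}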
Q: "Now that DR12 (excuse me, R4) is more than a twinkle in Eddie's eye, can I ask about new features without cramming my interlocutant down the Bocca de la Verita?" -- Amfortas, Monsalvat, Spain
Good of you to write, Mr. Amfortas (where'd you find Shroud of Turin stationery?). To answer your only question first (without actually answering it), here you go:
Between the dum and dee of finding a handler's looper and locking the
fellow, there lies a race. Consider the mayhem were the handler removed
from the looper between the two calls. Rare? You bet, but the best bugs
are just so. Solve the problem with BHandler's new LockLooper()
function. In a single call the looper is cornered and quartered. So,
where you now have this (to examplicate the commonest):
window = view->Window();
if (window->Lock()) {
    ...
    window->Unlock();
}
...you will, in R4, type thus:
if (view->LockLooper()) {
    ...
    view->UnlockLooper();
}
Are you jealously interested in the other apps that the user is
sneaking about with when you're not looking? To get this information in
R3, you had to pester the roster like a five-year-old in the back seat
on his way to grandma's. Now, the roster will pester you: BRoster's
StartWatching() and StopWatching() functions will let you register for
notifications of application launchings, activations, and deaths. All
just gossip, in my book.
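
A sketch of how one might enlist for the gossip, assuming a
BApplication that wants the full trifecta of launchings, activations,
and deaths (the class is hypothetical; the constants are the roster
monitoring protocol as I understand it):

#include <Application.h>
#include <Message.h>
#include <Messenger.h>
#include <Roster.h>

class SnoopApp : public BApplication {
public:
    SnoopApp() : BApplication("application/x-vnd.hypothetical-snoop")
    {
        /* Ask the roster to message us whenever an application
           launches, quits, or is activated. */
        be_roster->StartWatching(BMessenger(this),
            B_REQUEST_LAUNCHED | B_REQUEST_QUIT | B_REQUEST_ACTIVATED);
    }

    ~SnoopApp()
    {
        be_roster->StopWatching(BMessenger(this));
    }

    virtual void MessageReceived(BMessage *msg)
    {
        switch (msg->what) {
            case B_SOME_APP_LAUNCHED:
            case B_SOME_APP_QUIT:
            case B_SOME_APP_ACTIVATED:
                /* The message carries the details, such as the
                   app's signature and team id. */
                msg->PrintToStream();
                break;
            default:
                BApplication::MessageReceived(msg);
        }
    }
};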
As Pulse() is the apian genua, the new BMessageRunner is a feline
huzzah. Mr. Message Runner sends a message, and then sends it again
some moments later, and again and again and again, automatically,
continuously, obsessively.
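
A sketch of the obsession, assuming a handler that craves a
hypothetical 'tick' message every second (the names are mine):

#include <Message.h>
#include <MessageRunner.h>
#include <Messenger.h>

/* Hypothetical message code for our periodic tick. */
const uint32 kMsgTick = 'tick';

BMessageRunner *
StartTicking(BHandler *target)
{
    /* Deliver kMsgTick to the target once a second, forever
       (a count of -1 means "never stop"); the interval is in
       microseconds. The runner copies the message. */
    BMessage tick(kMsgTick);
    return new BMessageRunner(BMessenger(target), &tick,
        1000000, -1);
}

/* Deleting the returned runner ends the obsession. */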
You like BFile, but you miss being able to close() when you're done.
The lack of a proper goodbye feels as caddish as leaving a twenty on
the table and lying about calling, you blackguard. Now we give you your
cupcake, and yet another, so you can eat one and have the remainder.
Look for BNode::Dup(), the call with the po-po-posixy name. It
duplicates the node's file descriptor so you can have your way with it
and properly close() it when you're done. It doesn't actually affect
the BNode's descriptor, so you may still feel a bit roguish, but at
least you can go through the motions.
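
The motions, sketched, with a hypothetical file path:

#include <unistd.h>
#include <File.h>

void
ProperGoodbye()
{
    BFile file("/boot/home/hypothetical.txt", B_READ_ONLY);

    /* Dup() hands back a duplicate of the node's descriptor;
       the BFile's own descriptor is untouched. */
    int fd = file.Dup();
    if (fd >= 0) {
        /* ... read() from fd, lseek() it, whatever ... */
        close(fd);   /* the proper goodbye */
    }
}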
BResources, the Flying Dutchman of the Storage Kit, has shifted its
sails once again. What started out as a happy-go-lucky structure that
could tack into nearly any file was trimmed to make way for attributes
a few releases back. In R4 we'll trim again: You can use a BResources
object to *read* an application's resources, but you mustn't *write*
the data. Writing resources (signatures, icons, etc.) is the job of
professionals, such as FileTypes, IconThingummy (what *do* we call it
these days?), and the new xres tool (which I'm not going to talk
about).
How many times has mounting a volume evoked that feeling of presque vu? Turn that "almost" into a certainty by examining the volume's new "be:volume_id" attribute. 64 bits is better than fingerprints and costs less than DNA.
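
A sketch of taking those fingerprints, assuming the attribute hangs
off the volume's root directory (an assumption worth verifying):

#include <Directory.h>
#include <Node.h>
#include <Volume.h>

/* Sketch: fetch the 64-bit id of a volume by reading the
   "be:volume_id" attribute from its root directory. */
int64
GetVolumeID(BVolume &volume)
{
    BDirectory root;
    volume.GetRootDirectory(&root);

    int64 id = 0;
    root.ReadAttr("be:volume_id", B_INT64_TYPE, 0, &id, sizeof(id));
    return id;
}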
The BGLView class, your interface to OpenGL, has learned the
BDirectWindow secret handshake. Also, OpenGL no longer speaks pig latin
when asked to back-cull. It's a z-axis world; live in it.
That should help. By the way, wasn't Kundry an intern?
It's almost here. We'll soon begin rounds of beta testing for the upcoming Release 4 of the BeOS. And that seems like a good opportunity to state our position or intentions on the topic of release classifications.
First, an explanation of the terms. It used to be that "alpha" meant something that occasionally worked and represented what you wanted the product to do. "Beta" meant "feature complete," including undocumented features—a.k.a. bugs.
Cynics say that rounds of beta testing are used to progressively approximate a commercial product, one that the customer will pay for and not return in bankrupting numbers. As with any language artifacts, "alpha" and "beta" have lost some of their categorical meanings as they've evolved. Beta testing is now an opportunity to add and delete features as the product moves toward commercial completion. Some features prove too problematic to fix in reasonable time. Others that seemed like a good idea might be rejected by real users. Functions missing from an earlier beta become feasible, or are clamored for.
With the Web, and the Software Valet client in the BeOS, we have ideal tools for a more fluid beta testing process. I mentioned "real users" and the clamor for certain features. In an ideal world, we have a perfect QA organization with testing programs that probe every tendril in our software and take it where no human would dare tread. The more mundane reality is that QA engineers are too sophisticated and know too much, including unconscious knowledge. As a result, they, or their programs, don't tread where normal human beings naturally go. How did you do that, and why? I don't know, replies the customer, already annoyed.
I know about this. Because of an apparently innate ability to misuse software and washer-dryers, I'm used to being on the receiving end of such questions. For example, on a certain legacy operating system, the number of bytes remaining on disk is displayed in a window title. I once "managed" to replace the comma separator in the number with a J. My hard disk was promptly confiscated. I promise we won't do this at Be. We might, though, just beg to borrow your system to make sure we can reproduce a problem we were unable to create unaided.
Regarding the clamor for features, we're a little nervous. Hopefully, the BeOS Release 4 will show that we've been listening to assertive software developers and users. On the other hand, with a much larger feature set, we're likely to get even more vigorous feedback—some of which will feed the next round of fixes or improvements.
We like this, especially when we don't like it. The pain means the critics have touched something important that we'd better attend to.