Like many BeOS developers, I experience daily frustration at the lack of
debugging tools on the BeOS. Unlike most developers, though, I'm in a
position to do more than just rant (which still occasionally happens), so
I've done some work on MALLOC_DEBUG to try to help developers fix some of
the types of memory problems I've seen during NetPositive development.
This code will be built into BeOS R4, but you can use it now if you download the R3.1 debugging libroot from ftp://ftp.be.com/pub/experimental/tools/libroot_debug.zip.
That archive also contains a more detailed description of MALLOC_DEBUG
and how to use it; you should look it up if you'd like to take full
advantage of MALLOC_DEBUG.
If you want a primer on the original MALLOC_DEBUG mechanism, see
Dominic's article Be Engineering Insights: The Woes of Memory Allocation.
To summarize, MALLOC_DEBUG works by hooking into the C library's
malloc() and free() calls to catch some common memory violations:
reading from uninitialized blocks, reading blocks after they've been
freed, freeing blocks twice, or writing off the boundary of a block.
Provided that you don't override new and delete and do your own
suballocation, MALLOC_DEBUG works for both malloc-allocated blocks and
C++ class instances.
MALLOC_DEBUG trashes the block with garbage when it's allocated, to
make sure your program doesn't depend on the block being in a certain
initial state. It trashes it again when it's freed so you don't depend
on the data still being there afterward. It trashes the block with
odd-numbered values so your application faults immediately if you try
to dereference a pointer within the block. It adds padding before and
after the block and checks the padding when the block is freed to make
sure you haven't written off the end of the block.
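
To make that concrete, here's a minimal sketch of the sorts of bugs
these checks are meant to flush out (the buffer here is purely
illustrative):

#include <stdlib.h>
#include <string.h>

int main()
{
    /* Uninitialized read: the block is trashed with garbage at
       allocation, so code that assumes it starts out zeroed
       misbehaves immediately instead of working by accident. */
    char *buf = (char *)malloc(16);
    if (buf[0] == 0) {      /* reads trash, not a dependable 0 */
        /* ... */
    }

    /* Boundary violation: the extra byte lands in the padding,
       which is checked when the block is freed. */
    memset(buf, 'x', 17);

    free(buf);

    /* Use after free: the block was trashed on free, so this
       read sees garbage rather than stale-but-plausible data. */
    char c = buf[0];
    (void)c;

    /* Double free: caught and reported directly. */
    free(buf);

    return 0;
}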
You turn on MALLOC_DEBUG by setting some environment variables before
starting your program. If your application doesn't do anything illegal
with memory, MALLOC_DEBUG won't adversely affect your program's
operation, other than to slow it down slightly and use a bit more
memory to store its extra information.
The best new feature is that MALLOC_DEBUG now records in every block
the last seven levels of the call stack where the block allocation took
place. When the old MALLOC_DEBUG detected an error, it just told you
the error type and gave you the block's address. If the block's
identity wasn't obvious from its content, it was difficult to figure
out what the problem was. Now, you can find out how the block was
allocated and immediately see what it is.
If your program trips up MALLOC_DEBUG, it prints out the call stack in
the debugger message shown when the debugger is invoked. (This
information is no longer printed to stdout but appears in the debugger
instead.) The call stack consists of seven return addresses; convert
them into symbolic names using the wh command in the debugger.
The new MALLOC_DEBUG mechanism adds levels of debugging instead of the
previous simple on/off switch. The debugging level sets how strict you
want it to be and how much runtime overhead you're willing to incur.
You can assign a debugging level from 1 to 10 through the MALLOC_DEBUG
environment variable. This is the same environment variable you used
before to turn on the old MALLOC_DEBUG; if you set MALLOC_DEBUG=true as
you did before, it sets the debugging level to its lowest value, 1,
which gives you equivalent functionality.
Right now, only three levels of debugging are defined: 1, 5, and 10,
leaving room to add future features. Level 1 is equivalent to the old
MALLOC_DEBUG mechanism: it fills the block with garbage upon allocation
and after it is freed, and checks to see if the block is freed twice,
or its boundaries are violated.
Level 5 does all the Level 1 checks, and adds an extra step to do a
better job of catching blocks after they're freed: when you call
free() on a block, the block is trashed and placed on a "purgatory"
list instead of being returned immediately to the heap. The block stays
on this list until enough other blocks are freed; then it's pushed off
the list and recycled. It does this to catch cases where your program
writes to or reads from a block after it has been freed.
As an example, let's say your program has a bug where it keeps a stale pointer to a class instance and writes through it after the instance has been deleted (which is easy to do in heavily threaded applications with poorly managed object lifetimes). Sometimes the memory the instance used to occupy is free, and the error will likely go undetected. Sometimes, though, the memory has been recycled and is now occupied by another instance of the same class, or of a different class. An illegal memory write now corrupts data in a different data structure; an illegal memory read pulls data from a different class instance. With no means of detecting errors like these, you'll probably spend a lot of time looking in the wrong place for the problem.
However, if the freed block is trashed, placed on a purgatory list, and stays there awhile, it gives your program ample opportunity to try to read from the block (and see trashed data) or write to it. After some time, hopefully after all the dangling pointers have gone away, the block falls off the list; it's then checked to make sure you haven't written to it, and recycled.
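
Here's a minimal sketch of the kind of dangling-pointer bug purgatory
is designed to expose (the class is purely illustrative):

/* Hypothetical cache entry; any heap-allocated class will do. */
class CacheEntry {
public:
    int  refCount;
    char data[64];
};

int main()
{
    CacheEntry *entry = new CacheEntry;
    CacheEntry *stale = entry;   /* a second pointer, kept too long */

    delete entry;

    /* With MALLOC_DEBUG=5 the freed block sits on the purgatory
       list, trashed; this write is detected when the block is
       finally checked and recycled, instead of silently corrupting
       whatever object later reuses the memory. */
    stale->refCount++;

    return 0;
}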
You can set the size of the purgatory list through the
MALLOC_DEBUG_FREE_LIST_SIZE environment variable (the default is 1000
blocks). The value is block based, not byte based; if you have a high
turnover rate of large blocks, this chews up memory pretty fast. Adjust
the value up or down to determine the amount of time blocks spend in
purgatory and to tune memory usage during debugging.
Normally, MALLOC_DEBUG only performs its checks when blocks are
allocated, realloced, or freed, or when they drop off the purgatory
list and are recycled. This means that MALLOC_DEBUG usually catches a
memory violation only long after it has occurred; in fact, if your
program never frees a block, the block will never be checked at all.
You can prevent this by turning the debugging level all the way up to
10. At the highest level, MALLOC_DEBUG performs all the checks of the
lower levels. It also periodically checks every currently allocated
block and every purgatory block to make sure that nothing illegal has
happened. The MALLOC_DEBUG_CHECK_FREQUENCY environment variable
determines how often this full check occurs; by default, it takes
place every 1000 calls to malloc()/realloc()/free(). (Individual
blocks are still always checked when freed or recycled, as they were
before, regardless of this setting.) As you can imagine, this can be a
pretty time-consuming operation; with the period at 1000, the impact
on performance is small, but the latency between an illegal operation
and its detection is fairly large. If you're having trouble tracking
down where a problem happens, you can crank this value down to
something smaller, even all the way down to 1, which performs a heap
consistency check *every* time, but is excruciatingly slow. When
MALLOC_DEBUG does detect an error, keep in mind that the memory call
where the error is detected may still be some distance away (and
possibly in a different thread) from the bug that actually caused the
problem; all you can hope to do is minimize that distance.
One problem to be aware of is that the new MALLOC_DEBUG exposes a bug
in the R3.x Interface Kit. You'll get a "Block written to after being
freed" exception on a BView that has BScrollBars targeting it when you
delete the window that contains the view. This will be fixed in R4, but
until then, you can work around it by removing the scroll bars from the
window and deleting them before deleting the targeted BView.
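
Here's a sketch of that workaround, assuming a window that keeps
pointers to its scroll bars and their target view (the class and
member names are hypothetical):

#include <Window.h>
#include <View.h>
#include <ScrollBar.h>

/* Hypothetical window that owns two scroll bars targeting a view. */
class MyWindow : public BWindow {
public:
    /* ... */
    void Teardown();
private:
    BScrollBar *fHScrollBar;
    BScrollBar *fVScrollBar;
    BView      *fTargetView;
};

/* Workaround for the R3.x Interface Kit bug: detach and delete the
   scroll bars before the BView they target, so nothing touches the
   view's freed memory during window teardown. */
void
MyWindow::Teardown()
{
    Lock();

    fHScrollBar->RemoveSelf();
    delete fHScrollBar;
    fVScrollBar->RemoveSelf();
    delete fVScrollBar;

    /* Now the targeted view can go away safely. */
    fTargetView->RemoveSelf();
    delete fTargetView;

    Unlock();
}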
That's all I have time for, because The Man is beckoning me to crawl
back in my cage and fix some bugs, so I can't tell you about the values
MALLOC_DEBUG trashes blocks with or other interesting technical
details. Look in the debugging libroot archive for more information. I
hope that MALLOC_DEBUG helps you find some bugs. If you have ideas
about how we could make it better, let us know.
Ready for another article about 3D on the BeOS? The BeOS Release 4 OpenGL implementation has been heavily modified from the previous R3.1 version. We've added support for single-buffer rendering, reduced memory usage, and fixed some bugs.
Single buffering is perhaps the greatest improvement for R4. OpenGL now
uses the BDirectWindow protocol to provide single buffering. It still
works with regular BWindows, but at a substantial performance penalty.
To provide this functionality, two new member functions have been added
to BGLView:
BGLView::DirectConnected(direct_buffer_info *info);
BGLView::EnableDirectMode(bool enabled);
DirectConnected() must be called from the BDirectWindow hook function
with the same name. This keeps the BGLView in sync with the current
direct window information. It's as simple as adding the following
function to your code:
void
myDirectWindow::DirectConnected(direct_buffer_info *info)
{
    if (m_glview)
        m_glview->DirectConnected(info);
}
EnableDirectMode() is present to allow your application to enable and
disable direct window drawing without having to modify the
direct_buffer_info information. By default, direct mode is disabled.
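
Putting the two calls together, here's a sketch of a BDirectWindow
subclass hosting a BGLView (the class and member names are
hypothetical, and enabling direct mode up front is one reasonable
policy, not a requirement of the API):

#include <DirectWindow.h>
#include <GLView.h>

/* Hypothetical BDirectWindow subclass hosting a BGLView. */
class GLDirectWindow : public BDirectWindow {
public:
    GLDirectWindow(BRect frame, const char *title)
        : BDirectWindow(frame, title, B_TITLED_WINDOW, 0)
    {
        m_glview = new BGLView(Bounds(), "gl", B_FOLLOW_ALL, 0,
            BGL_RGB | BGL_DOUBLE | BGL_DEPTH);
        AddChild(m_glview);

        /* Direct mode is off by default; turn it on if the
           screen can actually support direct window drawing. */
        m_glview->EnableDirectMode(SupportsWindowMode());
    }

    virtual void DirectConnected(direct_buffer_info *info)
    {
        /* Forward every direct-window notification so the
           BGLView stays in sync with the frame buffer. */
        if (m_glview)
            m_glview->DirectConnected(info);
    }

private:
    BGLView *m_glview;
};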
Much effort has gone into making our OpenGL implementation perform well. Two factors limit OpenGL performance. The first is the geometry processing (triangle) rate: the rate at which incoming vertex data can be processed and sent to the triangle, line, or point drawing hardware or software. The performance of this portion is generally independent of the size of the primitives sent to OpenGL; it depends primarily on the number of primitives and on per-vertex work such as lighting and texture coordinate generation.
The second factor is the fill rate—the number of pixels that can be drawn in a given period of time, usually a second. This depends almost entirely on the state of the GL pixel pipeline. For software rendering, disabling most of the pipeline and rendering only flat, shaded triangles generally gives the best performance. Smooth shading, texturing, fogging, depth testing, stenciling, blending, and alpha testing each reduce performance somewhat.
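
As an illustration, a state setup along these lines keeps the software
fill path as cheap as possible (a sketch; which toggles actually
matter depends entirely on your scene):

#include <GL/gl.h>

/* Configure GL for the cheapest software fill path:
   flat-shaded, untextured triangles with the rest of the
   pixel pipeline switched off. */
void
SetFastestFillState()
{
    glShadeModel(GL_FLAT);        /* no smooth shading */
    glDisable(GL_TEXTURE_2D);     /* no texturing */
    glDisable(GL_FOG);            /* no fogging */
    glDisable(GL_DEPTH_TEST);     /* no depth testing */
    glDisable(GL_STENCIL_TEST);   /* no stenciling */
    glDisable(GL_BLEND);          /* no blending */
    glDisable(GL_ALPHA_TEST);     /* no alpha testing */
}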
Most of the R4 effort has gone into geometry processing optimizations. The processing speed should be greatly increased from R3. The greatest improvement is in the specular lighting code; specular lights should now have much less impact on performance. Another big gain is in quick clipping of primitives that are completely off screen.
We utilized some advantages of Intel processors and didn't ignore the drawing code either. We now have a shiny new MMX filler and some PII-specific depth testing code. For those with other processors, don't worry—OpenGL detects your CPU and uses the right code. Those who've used our prior OpenGL implementations may be wondering why most of the effort went into the geometry portion and not the primitive rendering code that takes most of the processor time. That can be summed up in one word: hardware.
While hardware support is not in R4, it's still on schedule for R5. Our implementation of OpenGL now has the hooks to support hardware acceleration. Continued incremental improvements to the software engine will never approach the performance provided by even a $50 3D-video card. All the geometry improvements will become much more visible once hardware acceleration is available. Can you say 200+ fps for GLTeapot?
What can you expect once hardware acceleration is available? Some
existing OpenGL functions that were good for performance will suddenly
become very bad:

BGLView::CopyPixelsOut()
BGLView::CopyPixelsIn()
These functions will not be the ideal way to move data into or out of
a BGLView. A better solution is to use glReadPixels() and
glDrawPixels(), which can be pipelined by the accelerator. The
CopyPixelsOut() function forces a pipeline flush. CopyPixelsIn() may
not force the flush, but must push and pop the entire pixel state to
get the correct behavior. Because your application knows the current
GL state, you can save and restore only the needed portion of the
state and call glDrawPixels().
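
For example, here's a sketch of that approach, assuming blending and
alpha testing are the only state your transfer disturbs (the function
and parameter names are mine, not part of the API):

#include <GL/gl.h>

/* Sketch: blit an RGBA image with glDrawPixels(), saving and
   restoring only the state we know we change (here, blending and
   alpha test, both covered by GL_COLOR_BUFFER_BIT). */
void
BlitImage(GLint x, GLint y, GLsizei width, GLsizei height,
    const GLubyte *pixels)
{
    glPushAttrib(GL_COLOR_BUFFER_BIT);
    glDisable(GL_BLEND);
    glDisable(GL_ALPHA_TEST);

    /* Move the image into the frame buffer. */
    glRasterPos2i(x, y);
    glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    /* Restore only what was saved. */
    glPopAttrib();
}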
BGLView::EmbeddedView()

This function will always return NULL starting with R4. All drawing in
a BGLView should be done with GL commands. Mixing app_server and GL
drawing is extremely bad for the performance of both. This function is
mostly used for displaying text in a BGLView. Below is an example of
displaying text using only GL commands.
One way to draw text in OpenGL is to create the font as a texture and then draw it using standard GL quads. The example below draws the letter B: it uses the app_server to render the character into a bitmap, and GL to texture and draw it.
int
ObjectView::round(int in)
{
    int tempCount = 0;

    while (in > 7) {
        in >>= 1;
        tempCount++;
    }

    return in << tempCount;
}

void
ObjectView::makeFontMipmap(int maxSize, char c)
{
    /* Get a fixed font */
    BFont font(be_fixed_font);
    int size = maxSize;
    int level = 0;
    float fontSize = maxSize;
    font_height fh;

    /* Calculate the largest font which will fit */
    /* into the specified size */
    do {
        fontSize /= 1.05;
        font.SetSize(fontSize);
        font.GetHeight(&fh);
    } while (fh.leading >= size);

    float x = size / 4;

    /* Round Y to ensure all but the last 3 mipmaps land on */
    /* integer values */
    float y = round(size - (fh.descent + fh.leading * 0.05));

    /* Reduce the size of the font until it fits the new */
    /* location */
    do {
        fontSize /= 1.05;
        font.SetSize(fontSize);
        font.GetHeight(&fh);
    } while (y < fh.ascent);

    /* Create each mipmap for the font */
    while (size >= 1) {
        font.SetSize(fontSize);
        makeFontLevel(size, level, &font, x, y, c);
        size /= 2;
        level++;
        x /= 2.0;
        y /= 2.0;
        fontSize /= 2;
    }
}

void
ObjectView::makeFontLevel(int size, int level, BFont *font,
    float x, float y, char c)
{
    /* Create a bounding rect for the bitmap */
    BRect boundingRect(0, 0, size - 1, size - 1);
    GLubyte *bits;

    /* Create a gray scale bitmap to hold the font */
    BBitmap bitmap(boundingRect, B_CMAP8, true, false);

    /* Create an embedded view */
    BView view(boundingRect, "Font view", B_FOLLOW_NONE, 0);
    bitmap.Lock();
    bitmap.AddChild(&view);

    /* Set the background to bright white */
    /* Could be done with an app_server call */
    bits = (GLubyte *)bitmap.Bits();
    for (int ct = 0; ct < size * size; ct++)
        bits[ct] = 255;

    /* Draw the character into the bitmap at the specified */
    /* location */
    view.SetFont(font);
    view.DrawChar(c, BPoint(x, y));
    view.Sync();

    /* Invert the bitmap to make an intensity map where the */
    /* text is intense and the background is not. */
    for (int ct = 0; ct < size * size; ct++)
        bits[ct] = 255 - bits[ct];

    /* Load the intensity map into GL */
    glTexImage2D(GL_TEXTURE_2D, level, GL_INTENSITY4, size, size, 0,
        GL_LUMINANCE, GL_UNSIGNED_BYTE, bitmap.Bits());

    /* Clean up */
    bitmap.RemoveChild(&view);
    bitmap.Unlock();
}

void
ObjectView::DrawFrame(bool noPause)
{
    if (initCount < 1)
        return;

    LockGL();

    /* Enable texturing */
    glEnable(GL_TEXTURE_2D);

    /* Set texturing to clamp to prevent repeating the */
    /* texture if invalid texture coordinates were given */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);

    /* Set filters. This configures for trilinear filtering */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
        GL_LINEAR_MIPMAP_LINEAR);

    /* Colored text is created with GL_MODULATE. */
    /* The intensity map determines the brightness and the */
    /* vertexes specify the color */
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    /* Make the character texture */
    makeFontMipmap(128, 'B');

    /* Draw the texture */
    glBegin(GL_QUADS);
    glColor3f(1.0, 0.0, 0.0);
    glTexCoord2f(0.0, 0.0);
    glVertex2f(-1.0, 1.0);
    glColor3f(1.0, 0.5, 0.0);
    glTexCoord2f(1.0, 0.0);
    glVertex2f(1.0, 1.0);
    glColor3f(0.0, 0.0, 1.0);
    glTexCoord2f(1.0, 1.0);
    glVertex2f(1.0, -1.0);
    glColor3f(0.0, 0.5, 1.0);
    glTexCoord2f(0.0, 1.0);
    glVertex2f(-1.0, -1.0);
    glEnd();

    UnlockGL();
}
This could be improved by creating the fonts in advance and binding
them to texture objects using glBindTexture().
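
Here's a sketch of that improvement, reusing makeFontMipmap() from
above; the texture-object bookkeeping (glyphTextures,
buildGlyphTextures(), drawGlyph()) is hypothetical scaffolding:

GLuint glyphTextures[128];

void
ObjectView::buildGlyphTextures()
{
    glGenTextures(128, glyphTextures);
    for (int c = 'A'; c <= 'Z'; c++) {
        /* Each mipmap chain is uploaded once into its own
           texture object instead of being rebuilt every frame. */
        glBindTexture(GL_TEXTURE_2D, glyphTextures[c]);
        makeFontMipmap(128, (char)c);
    }
}

void
ObjectView::drawGlyph(char c)
{
    /* Drawing a character is now just a bind plus a quad. */
    glBindTexture(GL_TEXTURE_2D, glyphTextures[(int)c]);
    /* ... glBegin(GL_QUADS) ... as in DrawFrame() above ... */
}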
Q: "Now that DR12 (excuse me, R4) is more than a twinkle in Eddie's eye, can I ask about new features without cramming my interlocutant down the Bocca de la Verita?" -- Amfortas, Monsalvat, Spain
Good of you to write, Mr. Amfortas (where'd you find Shroud of Turin stationery?). To answer your only question first (without actually answering it), here you go:
Between the dum and dee of finding a handler's looper and locking the
fellow, there lies a race. Consider the mayhem were the handler removed
from the looper between the two calls. Rare? You bet, but the best bugs
are just so. Solve the problem with BHandler's new LockLooper()
function. In a single call the looper is cornered and quartered. So,
where you now have this (to examplicate the commonest):
window = view->Window();
if (window->Lock()) {
    ...
    window->Unlock();
}
...you will, in R4, type thus:
if (view->LockLooper()) {
    ...
    view->UnlockLooper();
}
Are you jealously interested in the other apps that the user is
sneaking about with when you're not looking? To get this information in
R3, you had to pester the roster like a five-year-old in the back seat
on his way to grandma's. Now, the roster will pester you: BRoster's
StartWatching() and StopWatching() functions will let you register for
notifications of application launchings, activations, and deaths. All
just gossip, in my book.
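
A sketch of how one might enlist for the gossip, assuming a
BApplication that wants the full trifecta of launchings, activations,
and deaths (the class is hypothetical; the constants are the roster
monitoring protocol as I understand it):

#include <Application.h>
#include <Message.h>
#include <Messenger.h>
#include <Roster.h>

class SnoopApp : public BApplication {
public:
    SnoopApp() : BApplication("application/x-vnd.hypothetical-snoop")
    {
        /* Ask the roster to message us whenever an application
           launches, quits, or is activated. */
        be_roster->StartWatching(BMessenger(this),
            B_REQUEST_LAUNCHED | B_REQUEST_QUIT | B_REQUEST_ACTIVATED);
    }

    ~SnoopApp()
    {
        be_roster->StopWatching(BMessenger(this));
    }

    virtual void MessageReceived(BMessage *msg)
    {
        switch (msg->what) {
            case B_SOME_APP_LAUNCHED:
            case B_SOME_APP_QUIT:
            case B_SOME_APP_ACTIVATED:
                /* The message carries the details, such as the
                   app's signature and team id. */
                msg->PrintToStream();
                break;
            default:
                BApplication::MessageReceived(msg);
        }
    }
};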
As Pulse() is the apian genua, the new BMessageRunner is a feline
huzzah. Mr. Message Runner sends a message, and then sends it again
some moments later, and again and again and again, automatically,
continuously, obsessively.
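
A sketch of the obsession, assuming a handler that craves a
hypothetical 'tick' message every second (the names are mine):

#include <Message.h>
#include <MessageRunner.h>
#include <Messenger.h>

/* Hypothetical message code for our periodic tick. */
const uint32 kMsgTick = 'tick';

BMessageRunner *
StartTicking(BHandler *target)
{
    /* Deliver kMsgTick to the target once a second, forever
       (a count of -1 means "never stop"); the interval is in
       microseconds. The runner copies the message. */
    BMessage tick(kMsgTick);
    return new BMessageRunner(BMessenger(target), &tick,
        1000000, -1);
}

/* Deleting the returned runner ends the obsession. */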
You like BFile, but you miss being able to close() when you're done.
The lack of a proper goodbye feels as caddish as leaving a twenty on
the table and lying about calling, you blackguard. Now we give you your
cupcake, and yet another, so you can eat one and have the remainder.
Look for BNode::Dup(), the call with the po-po-posixy name. It
duplicates the node's file descriptor so you can have your way with it
and properly close() it when you're done. It doesn't actually affect
the BNode's descriptor, so you may still feel a bit roguish, but at
least you can go through the motions.
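
The motions, sketched, with a hypothetical file path:

#include <unistd.h>
#include <File.h>

void
ProperGoodbye()
{
    BFile file("/boot/home/hypothetical.txt", B_READ_ONLY);

    /* Dup() hands back a duplicate of the node's descriptor;
       the BFile's own descriptor is untouched. */
    int fd = file.Dup();
    if (fd >= 0) {
        /* ... read() from fd, lseek() it, whatever ... */
        close(fd);   /* the proper goodbye */
    }
}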
BResources, the Flying Dutchman of the Storage Kit, has shifted its
sails once again. What started out as a happy-go-lucky structure that
could tack into nearly any file was trimmed to make way for attributes
a few releases back. In R4 we'll trim again: You can use a BResources
object to *read* an application's resources, but you mustn't *write*
the data. Writing resources (signatures, icons, etc.) is the job of
professionals, such as FileTypes, IconThingummy (what *do* we call it
these days?), and the new xres tool (which I'm not going to talk
about).
How many times has mounting a volume evoked that feeling of presque vu? Turn that "almost" into a certainty by examining the volume's new "be:volume_id" attribute. 64 bits is better than fingerprints and costs less than DNA.
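
A sketch of taking those fingerprints, assuming the attribute hangs
off the volume's root directory (an assumption worth verifying):

#include <Directory.h>
#include <Node.h>
#include <Volume.h>

/* Sketch: fetch the 64-bit id of a volume by reading the
   "be:volume_id" attribute from its root directory. */
int64
GetVolumeID(BVolume &volume)
{
    BDirectory root;
    volume.GetRootDirectory(&root);

    int64 id = 0;
    root.ReadAttr("be:volume_id", B_INT64_TYPE, 0, &id, sizeof(id));
    return id;
}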
The BGLView class, your interface to OpenGL, has learned the
BDirectWindow secret handshake. Also, OpenGL no longer speaks pig latin
when asked to back-cull. It's a z-axis world; live in it.
That should help. By the way, wasn't Kundry an intern?
It's almost here. We'll soon begin rounds of beta testing for the upcoming Release 4 of the BeOS. And that seems like a good opportunity to state our position or intentions on the topic of release classifications.
First, an explanation of the terms. It used to be that "alpha" meant something that occasionally worked and represented what you wanted the product to do. "Beta" meant "feature complete," including undocumented features—a.k.a. bugs.
Cynics say that rounds of beta testing are used to progressively approximate a commercial product, one that the customer will pay for and not return in bankrupting numbers. As with any language artifacts, "alpha" and "beta" have lost some of their categorical meanings as they've evolved. Beta testing is now an opportunity to add and delete features as the product moves toward commercial completion. Some features prove too problematic to fix in reasonable time. Others that seemed like a good idea might be rejected by real users. Functions missing from an earlier beta become feasible, or are clamored for.
With the Web, and the Software Valet client in the BeOS, we have ideal tools for a more fluid beta testing process. I mentioned "real users" and the clamor for certain features. In an ideal world, we have a perfect QA organization with testing programs that probe every tendril in our software and take it where no human would dare tread. The more mundane reality is that QA engineers are too sophisticated and know too much, including unconscious knowledge. As a result, they, or their programs, don't tread where normal human beings naturally go. How did you do that, and why? I don't know, replies the customer, already annoyed.
I know about this. Because of an apparently innate ability to misuse software and washer-dryers, I'm used to being on the receiving end of such questions. For example, on a certain legacy operating system, the number of bytes remaining on disk is displayed in a window title. I once "managed" to replace the comma separator in the number with a J. My hard disk was promptly confiscated. I promise we won't do this at Be. We might, though, just beg to borrow your system to make sure we can reproduce a problem we were unable to create unaided.
Regarding the clamor for features, we're a little nervous. Hopefully, the BeOS Release 4 will show that we've been listening to assertive software developers and users. On the other hand, with a much larger feature set, we're likely to get even more vigorous feedback—some of which will feed the next round of fixes or improvements.
We like this, especially when we don't like it. The pain means the critics have touched something important that we'd better attend to.