Google Summer of Code Conclusion
Google Summer of Code 2010 is now over. It was a wonderful experience, and I learned a lot about Haiku’s internals, about file system development, and about myself. I completed my proposal to the point where an initial version of the Ext3 file system is available in the Haiku kernel for testing. Some things remain to be done, like sparse files, proper revoke support, multi-transaction truncation and more thorough testing, but overall, it was successful.
The most noticeable limitation of the Ext3 support is the lack of proper journal mode support. Because the file cache doesn’t support transactions (yet), journalling file data would require a hack, and therefore the only mode supported is writeback. Unfortunately, that is also the least safe of the three modes, because corrupt data can appear on the file system after a crash. I still want to finish this, but because it depends on an external part of the system, it will be a while until it is completely finished.
File write operations all work correctly in all tests done so far (if something appears wrong, please contact me and/or file a bug report on Trac), but truncating large files can be a little problematic. This should only happen with absurdly large files, but it is nevertheless something to consider. Basically, since truncation changes a lot of blocks (the inode table, single/double/triple indirect blocks), the transaction can become large enough to exceed the journal log size. The way to handle this is to allow the truncation to be split into smaller transactions. The code to do this isn’t very complex, but it takes a little work to determine when to split the transaction. A related issue is that Ext3 shrinks a file in reverse order to make sure it stays consistent across file system failures, and currently my code does it in the wrong order. That is another small fix to do.
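To make the splitting idea a bit more concrete, here is a minimal Python sketch. It is not the actual Haiku code (which is C++), and every name in it is hypothetical. It assumes that "reverse order" means freeing blocks from the end of the file backwards, and it uses a simple count of freed blocks as a stand-in for the number of journal blocks a real transaction would consume.

```python
def split_truncation(file_blocks, new_block_count, max_blocks_per_transaction):
    """Group the blocks freed by a truncation into journal-sized batches.

    file_blocks: block numbers of the file, in file order (hypothetical model).
    new_block_count: how many leading blocks the file keeps.
    max_blocks_per_transaction: assumed budget per transaction; a real
        journal would budget the metadata blocks touched, not freed blocks.
    Returns a list of transactions, each a list of block numbers to free.
    """
    if max_blocks_per_transaction < 1:
        raise ValueError("a transaction must be able to hold at least one block")

    transactions = []
    current = []
    # Walk from the last block of the file back to the new end of file, so
    # that after every committed batch the file is simply a shorter valid file.
    for index in range(len(file_blocks) - 1, new_block_count - 1, -1):
        current.append(file_blocks[index])
        if len(current) == max_blocks_per_transaction:
            transactions.append(current)  # budget reached: commit, start anew
            current = []
    if current:
        transactions.append(current)
    return transactions


if __name__ == "__main__":
    batches = split_truncation([100, 101, 102, 103, 104, 105, 106], 2, 2)
    for number, batch in enumerate(batches, start=1):
        print("transaction", number, "frees", batch)
```

In this toy model the decision of when to split is trivial. In the real file system it is the hard part, because the cost of a transaction depends on which bitmap and indirect blocks each freed block drags in.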
Creating files and directories is almost fully finished. Before the GSoC deadline, I had adding files to a directory working up to the point where a directory resize was required. That bug has since been fixed; it was the final blocker for actually committing all of the code. There is still one last thing to do, which is expanding the directory index tree when the branches need to be split. This requires a little more work because if the tree becomes saturated, a new level of indirection is required. I’ve already started to implement this, but it’s still not finished. This is the top-priority item on my to-do list.
One other thing that can cause problems is revoking blocks. Under certain circumstances, data can get corrupted when the journal is replayed. For this to happen, file meta-data must be written to a block (causing it to be written to the journal), then that block must be freed and then reallocated for file data. If a crash happens and these operations haven’t been checkpointed (i.e. written to disk and removed from the log), they will get replayed, but since the file data written to the block isn’t journalled, the meta-data will erroneously overwrite the file data, causing corruption. To fix this, proper revoke support must be implemented. Basically, the block allocator (I know it’s late, but I’ll talk more about it in another blog post) must be aware of blocks that have been freed but haven’t been checkpointed yet. Then, if the block allocator reallocates such a block, it will add a mark to the journal saying that the previous data must be ignored when the log is replayed. It sounds simple, but it is actually a complex interaction between the journal, the block allocator and the revoke manager, and therefore it hasn’t been implemented yet.
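The scenario above can be illustrated with a small toy model in Python. This is only a sketch under assumptions: the record layout, the names and the rule that a revoke cancels every journalled copy from the same or an earlier transaction are simplifications chosen for the example, not the on-disk Ext3 journal format or Haiku's implementation.

```python
def replay(journal, disk):
    """Replay a toy journal onto a toy disk (a dict: block number -> contents).

    Records are tuples (hypothetical layout):
      ("write", transaction_id, block, data)   a journalled meta-data block
      ("revoke", transaction_id, block)        "ignore older copies of block"
    """
    # First pass: remember, per block, the newest transaction that revoked it.
    revoked = {}
    for record in journal:
        if record[0] == "revoke":
            _, transaction_id, block = record
            revoked[block] = max(transaction_id, revoked.get(block, transaction_id))

    # Second pass: apply journalled blocks, skipping copies that were revoked
    # by the same or a later transaction.
    for record in journal:
        if record[0] == "write":
            _, transaction_id, block, data = record
            if block in revoked and transaction_id <= revoked[block]:
                continue
            disk[block] = data
    return disk


if __name__ == "__main__":
    # Block 7 first held meta-data (journalled in transaction 1), was freed,
    # and was then reallocated for unjournalled file data.
    log_without_revoke = [("write", 1, 7, "old meta-data")]
    log_with_revoke = [("write", 1, 7, "old meta-data"), ("revoke", 2, 7)]

    print(replay(log_without_revoke, {7: "file data"}))  # file data is lost
    print(replay(log_with_revoke, {7: "file data"}))     # file data survives
```

Without the revoke record, the replay puts the stale meta-data back over the file data. With it, the journalled copy is skipped and the block keeps its current contents.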
As you can see, some things are still missing before the project can be marked complete. I will work hard to finish them as soon as possible, but my time has been rather limited lately because vacation is over. I’ll keep posting here on the blog as things advance. I intended these posts to serve as initial documentation for the code, so I hope the posts so far have been clear. I’ve already finished writing a blog post about the block allocator, but because of the revoke problem I’ve postponed it for a while. As soon as I fix it, I’ll post a more complete entry, and then continue writing about the new parts.
Working on Haiku for the Google Summer of Code was definitely one of the best things I’ve ever done. It was a wonderful experience, and I realized I would have done better if I had planned better. Another thing I feel I could improve is trusting my coding instincts less. Sometimes my instincts told me how to do something, but if I had stopped to analyse things better and to create detailed tests, I would have seen that there were unexpected factors that should have been considered before coding. Well, lessons learned, and I have certainly become a better programmer.
I have fulfilled a dream and a life objective. I always wanted to contribute to open source software, and Haiku and Google pushed me forward to start contributing. For that, I am very, very thankful. I would also like to thank Jérôme for helping me out along the way, and all the other mentors for helping us students in the mentor pool. I would also like to thank Matt Madia for his hard work organizing everything, because without him, it probably wouldn’t have been possible to accomplish everything. Thank you Carol Smith, for organizing the Google Summer of Code, and for being so supportive. And finally, thank you to all Haiku users and contributors! You’ve all helped in making this possible!