The FAT File System

Code Organization

bsd/

C code ported from FreeBSD.

fs/msdosfs/

The FreeBSD FAT driver. Only minor changes have been made from the FreeBSD code. The exceptions are msdosfs_vfsops.c and msdosfs_vnops.c. Most of the original content of these files (BSD hook functions) was removed for the port (although some of the hook functions in kernel_interface.cpp are adapted from these).

kern/ and libkern/

Heavily modified and simplified versions of FreeBSD kernel source files.

sys/ and vm/

Heavily modified and simplified versions of FreeBSD kernel headers, provided for compatibility. These are adapted to support this driver specifically and are not meant for general BSD compatibility.

kernel_interface.cpp

Hook functions. Code in this file and the other top-level files is adapted from several sources: the FreeBSD driver, the BFS driver, and the original Haiku FAT driver.

dosfs.h

Header for shared definitions across all (C and C++) driver code.

support.*

Supporting functions for C++ driver code.

debug.*

Adapted from the BFS equivalent.

mkdos.*

Volume initialization, mostly unchanged from the original Haiku driver.

vcache.*

The vcache keeps track of inode assignments and ties them to direntry locations. Mostly unchanged from the original Haiku driver.

fssh_defines.h

Macros that are not provided by the fs_shell interface.

FAT Data Types

The central FAT-specific structs used in the driver include:

struct msdosfsmount

The FAT private volume. Note: the address provided to the VFS is actually that of the corresponding BSD struct mount.

struct denode

The FAT private node. Note: the address provided to the VFS is actually that of the corresponding BSD struct vnode.

struct direntry

The directory entry that corresponds to a node, as stored on disk.

struct winentry

A modified directory entry that stores some or all of a long file name on disk.

BSD Compatibility

The BSD structs and functions that are used in the driver for compatibility include the following. In each struct, some member variables were removed for simplicity and others were added.

struct vnode

The BSD VFS node. The address provided to the Haiku VFS by publish_vnode and dosfs_read_vnode points to this object. In FreeBSD, the driver also accesses the vnode (in a separate volume) of the device being mounted. In the port, the mount hook creates a special struct vnode to fill in for this. That struct vnode is unique in that its private data is NULL, it has v_type VBLK, and its v_bufobj member is set up.

struct mount

The VFS volume that corresponds to msdosfsmount. The address provided to the Haiku VFS by dosfs_mount points to this object.

struct cdev

Details of the device being mounted.

struct buf

Analogous to a Haiku block cache block, but with public metadata. bread() or getblkx() will get a buf, and bwrite() or brelse() will put it. In the ported driver, this system is a wrapper for the block cache. It is also set up to use the file cache if a BSD function accesses regular file data, although that doesn’t happen under the current implementation.

struct bufobj

The ‘parent’ of all struct bufs associated with a device.

Inode Numbers

Since the FAT filesystem doesn’t store inode numbers on disk, they must be generated by the driver. The original Haiku FAT driver generated inode numbers based on the location of the file’s directory entry when possible, and assigned “artificial” (arbitrary) numbers when the location-based number was not viable. In this driver, that framework is carried over, but the math used to generate location-based numbers is that used in FreeBSD.

The driver will attempt to assign a location-based inode number as follows:

regular files:

    parent is the FAT12/16 root directory:

        index of direntry

    otherwise:

        (cluster containing the direntry - 2) * (bytes per cluster) + (index of direntry) + (max root directory entries)

directory files:

    (cluster number of directory - 2) * (bytes per cluster) + (max root directory entries)

The index is directory-relative when dealing with the FAT12/16 root directory. Otherwise it is cluster-relative. The (max root directory entries) term prevents collisions between FAT12/16 root directory entries and other directory entries. Note that directory files’ inode numbers are based on the location of the “.” direntry in the directory’s own data, not the location of the directory’s direntry in its parent’s data.
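
The rules above can be sketched in C roughly as follows. This is only an illustration: the struct, function, and parameter names are hypothetical rather than the driver's actual identifiers, and the arithmetic simply transcribes the formulas listed above. The FAT12/16 root directory itself is not covered by the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where a direntry lives (not a driver struct). */
struct direntry_location {
	bool in_fixed_root;	/* parent is the FAT12/16 root directory */
	uint32_t cluster;	/* cluster containing the direntry (unused for the fixed root) */
	uint32_t index;		/* directory-relative in the fixed root, else cluster-relative */
};

/* Location-based inode number of a regular file. */
uint64_t
file_inode_number(const struct direntry_location* location,
	uint32_t bytes_per_cluster, uint32_t max_root_entries)
{
	if (location->in_fixed_root)
		return location->index;

	/* FAT data clusters are numbered starting at 2. */
	assert(location->cluster >= 2);
	return (uint64_t)(location->cluster - 2) * bytes_per_cluster
		+ location->index + max_root_entries;
}

/*
 * Location-based inode number of a directory. The number is derived from
 * the "." direntry at the start of the directory's own first cluster, so
 * no index term appears.
 */
uint64_t
directory_inode_number(uint32_t directory_cluster, uint32_t bytes_per_cluster,
	uint32_t max_root_entries)
{
	assert(directory_cluster >= 2);
	return (uint64_t)(directory_cluster - 2) * bytes_per_cluster
		+ max_root_entries;
}
```

With 4096-byte clusters and 512 root directory entries, for example, a file whose direntry is entry 7 of cluster 3 would get (3 - 2) * 4096 + 7 + 512, while a file at index 5 of the FAT12/16 root directory would simply get 5.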

If the location-based number is already taken by a moved or deleted node, an artificial number is assigned. These numbers are assigned sequentially starting with ARTIFICIAL_VNID_BITS, which is set greater than the maximum possible location-based number.
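
A minimal sketch of that fallback, assuming a small fixed-size table in place of the real vcache. All names here are hypothetical, and SKETCH_ARTIFICIAL_BASE is only a placeholder standing in for ARTIFICIAL_VNID_BITS, whose actual value is defined by the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for ARTIFICIAL_VNID_BITS; the real value is set by the driver. */
#define SKETCH_ARTIFICIAL_BASE	(UINT64_C(1) << 60)
#define SKETCH_MAX_NODES	64

/* Hypothetical stand-in for the vcache: a flat list of numbers in use. */
struct inode_table {
	uint64_t used[SKETCH_MAX_NODES];
	size_t count;
	uint64_t next_artificial;
};

void
inode_table_init(struct inode_table* table)
{
	table->count = 0;
	table->next_artificial = SKETCH_ARTIFICIAL_BASE;
}

static bool
inode_in_use(const struct inode_table* table, uint64_t inode)
{
	for (size_t i = 0; i < table->count; i++) {
		if (table->used[i] == inode)
			return true;
	}
	return false;
}

/*
 * Prefer the location-based number; if another node already holds it,
 * hand out the next sequential artificial number instead.
 * Returns false only when the sketch's table is full.
 */
bool
assign_inode(struct inode_table* table, uint64_t location_based,
	uint64_t* _inode)
{
	if (table->count == SKETCH_MAX_NODES)
		return false;

	uint64_t inode = location_based;
	if (inode_in_use(table, inode))
		inode = table->next_artificial++;

	table->used[table->count++] = inode;
	*_inode = inode;
	return true;
}
```

Because the artificial range starts above every possible location-based number, the two kinds of numbers cannot collide with each other.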

The vcache maps inode to location and vice versa. Unlike the original Haiku driver, all nodes are listed in the vcache, not just those with artificial numbers. This provides a useful way to check which nodes are currently constructed.

Locking

vnode::v_vnlock is read- or write-locked when vnode or denode member variables are read/written. In addition, when entries are being added to or removed from a directory, that directory’s v_vnlock is write-locked. Its v_vnlock is also write-locked when msdosfs_lookup_ino is called to find an entry within that directory, because the directory’s denode::de_fndoffset and de_fndcnt will be set as part of the output of that function. If a direntry is being modified in place, the v_vnlock of the entry’s node, not the parent’s, is locked.

msdosfsmount::pm_fatlock is write-locked during changes to the FAT itself and when data clusters are being allocated.

mount::mnt_mtx is locked in functions that operate at the volume level, and in some functions that operate at the node level where locking a single node might not be sufficient.

Caches

The file cache is used for regular files. The block cache is used for directory files, the FAT, and the FAT32 fsinfo sector.

The driver’s present use of the block cache to work with directory files is inefficient. The ported BSD code is designed to read and write directory files in cluster-size blocks. Because FAT data clusters are offset from the start of the volume by an arbitrary number of sectors (occupied by the FAT etc.), data clusters are liable to be offset from the cluster-size blocks that Haiku’s block cache can provide. When a BSD function needs a cluster-size block, the driver gets multiple 512-byte cached blocks and copies them into another buffer to create a contiguous cluster, and vice versa when writing.
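
The copying described above might look roughly like the sketch below. It does not use the real block cache API: get_block_fn is a hypothetical callback standing in for a cache lookup, and memory_disk_get_block is a flat in-memory "disk" included only so the sketch can be exercised.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE	512

/* Hypothetical stand-in for a block cache lookup: returns one 512-byte block. */
typedef uint8_t* (*get_block_fn)(void* cookie, uint64_t block_number);

/* Gather consecutive cached blocks into one contiguous cluster buffer. */
void
read_cluster(get_block_fn get_block, void* cookie, uint64_t first_block,
	size_t blocks_per_cluster, uint8_t* cluster_buffer)
{
	for (size_t i = 0; i < blocks_per_cluster; i++) {
		const uint8_t* block = get_block(cookie, first_block + i);
		memcpy(cluster_buffer + i * SKETCH_BLOCK_SIZE, block,
			SKETCH_BLOCK_SIZE);
	}
}

/* Scatter a contiguous cluster buffer back into the cached blocks. */
void
write_cluster(get_block_fn get_block, void* cookie, uint64_t first_block,
	size_t blocks_per_cluster, const uint8_t* cluster_buffer)
{
	for (size_t i = 0; i < blocks_per_cluster; i++) {
		uint8_t* block = get_block(cookie, first_block + i);
		memcpy(block, cluster_buffer + i * SKETCH_BLOCK_SIZE,
			SKETCH_BLOCK_SIZE);
	}
}

/* Test helper: treats the cookie as a flat byte array of 512-byte blocks. */
uint8_t*
memory_disk_get_block(void* cookie, uint64_t block_number)
{
	return (uint8_t*)cookie + block_number * SKETCH_BLOCK_SIZE;
}
```

Note that first_block need not be a multiple of blocks_per_cluster, which is the misalignment that makes the extra copy necessary in the first place.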

Limitations

In FreeBSD, the FAT driver relies on libiconv for character conversion, and has only limited internal support for non-ASCII characters in the short filename stored in a direntry. In the port, libiconv is not available (except in the userlandfs module) and the driver can have trouble reading filenames containing characters that are not in OEM code page 850. This can result in dosfs_walk failing to find an entry with the name reported by dosfs_readdir; in ls this would generate a “No such file or directory” error, while in Tracker the file would simply not appear. It could also prevent a user from copying a file from another filesystem.

The initialize hook only supports media with a 512-byte sector size.

The driver will refuse to mount a volume larger than 32 gigabytes or a volume with a sector size other than 512 bytes because it hasn’t been tested under those conditions.
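
As a rough illustration of those two mount-time checks, a sketch might look like this. The function and macro names are hypothetical, and the limit is assumed to mean binary gigabytes (32 * 2^30 bytes); the driver's actual check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "32 gigabytes" is interpreted as 32 GiB. */
#define SKETCH_MAX_VOLUME_BYTES	(UINT64_C(32) << 30)
#define SKETCH_SECTOR_SIZE	512

/* Returns true if a volume with this geometry passes both checks. */
bool
volume_is_mountable(uint64_t sector_count, uint32_t bytes_per_sector)
{
	if (bytes_per_sector != SKETCH_SECTOR_SIZE)
		return false;

	/* Compare by division so a huge sector count cannot overflow. */
	return sector_count <= SKETCH_MAX_VOLUME_BYTES / bytes_per_sector;
}
```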

Tracker’s restore command, for items in the trash, does not work on FAT files because it relies on attributes. The user must manually move the file to the desired directory instead.

The volume name is normally stored in the boot sector and in a false directory entry in the root directory. If the false directory entry was not created when a volume was initialized, the driver will not add one later. If given a new label to write, the driver will update the boot sector label only.

If a file is truncated while asynchronous IO is in progress, so that the read/write goes beyond the EOF, an error message “PageWriteWrapper: Failed to write page” may be printed to the syslog by the virtual memory system. The failure occurs in the area that has already been deleted from the file, so in effect there is no data loss. This error message could probably be avoided if file_cache_set_size were only used when the node is locked, but it seems necessary to unlock the node first in order to eliminate a possible deadlock (producible in the fsx test) in which file_cache_set_size is waiting for VM page events, while dosfs_io is waiting for the node lock.

The fsx test sometimes complains of non-zero data past EOF when it does a mapped read or write. This appears to happen when fsx has just changed the size of the file, and then checks for non-zero data before the file cache has had time to zero its new last page beyond EOF.

FAT Reference Material

FAT32 File System Specification (December 6, 2000)

https://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/fatgen103.doc

FAT File System (September 11, 2008)

https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/cc938438(v=technet.10)

How FAT Works (October 8, 2009)

https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc776720(v=ws.10)

FAT Type and Cluster Size Depends on Logical Drive Size (November 16, 2006)

http://web.archive.org/web/20130315020207/http://support.microsoft.com/kb/67321/en-us