QNX RTOS v4 Knowledge Base


Title Minimizing filesystem corruption from unexpected shutdowns in QNX4
Ref. No. QNX.000009561
Category(ies) Filesystem, Configuration
Issue Our QNX4 system must tolerate being shut down (complete power off) at any time during operation.  It must also be able to restart without any operator interaction.

QNX's Fsys filesystem is designed to be 'fault resistant'. It is not a 'fault tolerant' filesystem. It is not based on transactions, but is based on a series of integral pieces:

-bitmap file to handle allocation/deallocation of blocks
-root blocks, directory blocks, inode tables

All these structures are deemed 'critical' structures in that if corruption occurs in them, there is possibility of data loss and loss of filesystem integrity.

Pages 91-92 of the QNX System Architecture book describe this in more detail.  Fsys always attempts to keep the filesystem in a 'sane' state.




Solution QNX does not believe that there is any 100% safe solution that does not involve a UPS and some form of shutdown procedure.

Any filesystem not based on redundant transactions will have problems. It has also not yet been determined whether a transaction-based filesystem could reach the 100% mark.

To become more fault resistant all of the following approaches could be used:

1. only run chkfsys when you know the system will stay powered up long enough for it to run to completion

2. pregrow all files to their final size so that later writes never have to grow extents or update directory entries.

3. use Fsys options and driver options to minimize cache and write immediately to disk.

      e.g. Fsys -A -c0K &
          Fsys.ata -w 1000000 &    # for 1 second busy wait

4. write your programs so that data is synced to disk as soon as possible.

  Details follow in appendix A.
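
Approach 2 (pregrowing) can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name pregrow_file() is hypothetical, and filling the file with zero bytes is one portable way to force block allocation, not necessarily the method QNX recommends. After pregrowing, the application would overwrite the file in place and never extend it.

```c
/* Request POSIX declarations on systems that gate them behind a macro. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and fill it with `size` zero bytes so
 * that all of its blocks are allocated before normal operation begins.
 * Returns an open descriptor positioned at offset 0, or -1 on error. */
int pregrow_file(const char *path, off_t size)
{
    char zeros[4096];
    off_t done = 0;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    memset(zeros, 0, sizeof zeros);

    while (done < size) {
        size_t chunk = sizeof zeros;
        ssize_t n;

        if ((off_t)chunk > size - done)
            chunk = (size_t)(size - done);
        n = write(fd, zeros, chunk);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += n;
    }

    /* Push the allocation to disk while power is known to be good,
     * then rewind so later writes overwrite the file in place. */
    if (fsync(fd) == -1 || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A program would call this once at installation or startup, for example pregrow_file("/data/log.dat", 1024L * 1024L), where the path and size are placeholders.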

Appendix A:
-----------
Use fd-based I/O rather than FILE * based I/O when possible.

For synchronization with the OS, fd-based I/O is preferred: the extra level of buffering provided by FILEs can get in the way, and routines like fflush() are misleading because, on their own, they do very little for robustness (they only empty the stdio buffer; they do not force data to disk).

If you need, for example, the text support routines of a FILE, first set up an fd yourself and fdopen() it to get a FILE.  You can also use fileno() to get the fd of a FILE for passing to lower-level I/O calls.  You may also want to look into setvbuf().
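
A minimal sketch of that arrangement, assuming a text log that is appended to. The helper name open_text_log() is hypothetical, and switching the stream to unbuffered mode with setvbuf() is one possible choice, not a requirement.

```c
/* Request POSIX declarations on systems that gate them behind a macro. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for appending through a descriptor we
 * control, then wrap it in a FILE so fprintf() and friends are available.
 * Returns NULL on error. */
FILE *open_text_log(const char *path)
{
    FILE *fp;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd == -1)
        return NULL;

    fp = fdopen(fd, "a");
    if (fp == NULL) {
        close(fd);
        return NULL;
    }

    /* Assumption: unbuffered stdio is acceptable here, so each stdio call
     * hands its data to the OS at once instead of holding it locally. */
    setvbuf(fp, NULL, _IONBF, 0);
    return fp;
}
```

The caller can then use fprintf(fp, ...) for formatting and fileno(fp) whenever a lower-level call such as fsync() needs the descriptor.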

Ok, so once you are fd-based, you can control the performance/reliability trade-off of file output in a number of ways:
(i)  Give open() the O_SYNC flag; this will make all writes synchronous, and block until completed;
(ii) Use the fsync() call periodically; this will flush any dirty disk blocks to the physical device, and block until completed.
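
Option (i) might look like the sketch below. The helper name append_record_sync() is hypothetical, and opening and closing the file for every record is done only to keep the example self-contained; a real program would normally keep the descriptor open.

```c
/* Request POSIX declarations on systems that gate them behind a macro. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record using synchronous writes.
 * Returns 0 on success, -1 on error or short write. */
int append_record_sync(const char *path, const void *rec, size_t len)
{
    ssize_t n;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);

    if (fd == -1)
        return -1;

    /* Because of O_SYNC, write() does not return until the data
     * has been passed down to the device. */
    n = write(fd, rec, len);

    if (close(fd) == -1)
        return -1;
    return (n == (ssize_t)len) ? 0 : -1;
}
```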

It's worth saying again that if your file access is FILE-based, you must first flush your local dirty stdio buffers with fflush(f) before calling fsync(fileno(f)).  Think of file access as being layered:
    physical disk <-> Fsys cache <-> stdio lib <-> your app
You have to move the data to be written securely all the way along (its normal/default path is more leisurely).
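
The two-step flush through those layers can be wrapped in a small function. This is a sketch; the name flush_to_disk() is hypothetical.

```c
/* Request POSIX declarations on systems that gate them behind a macro. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: move everything written to `fp` so far through
 * both layers. Returns 0 on success, -1 on error. */
int flush_to_disk(FILE *fp)
{
    /* Step 1: stdio lib -> Fsys cache. */
    if (fflush(fp) == EOF)
        return -1;

    /* Step 2: Fsys cache -> physical disk. */
    if (fsync(fileno(fp)) == -1)
        return -1;

    return 0;
}
```

Calling fsync() without the preceding fflush() would leave any data still held in the stdio buffer untouched, which is why the order matters.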

There are also global Fsys options, in particular the '-d' delay, that can be used to reduce the window of dirty cache data, as well as options on a per-mount basis.

Chkfsys...

Chkfsys can only recover data back to a known state, i.e. the last time the on-disk inode (the structure which maintains all the information about a file) was updated.  Under normal circumstances, the on-disk inode is only updated periodically, such as when a file is first created (it will show a size of 0) and when the file is closed.

There are other events (some dictated by POSIX and others by the design of Fsys) which will cause the inode to be updated.  Examples are: when an extent grows or a new extent is created, when stat/fstat is called for that file, when fsync/fdatasync is called for that file, sometime after sync is called (there is no guaranteed time for this) or before a write operation completes if O_SYNC or O_DSYNC is in effect.

If you want guaranteed recoverable write operations, you will have to open the file with O_SYNC or O_DSYNC.  With one of those flags set, a call to write won't return until the data has been written to disk (actually, passed to the controller by the driver -- if the controller buffers the writes then all bets are off, but this is unusual) and the inode has been similarly updated if required.  When this is done, then chkfsys will be able to recover up to the last successful write operation.  Of course, this slows down file writing considerably.

A faster alternative is to write the data normally, and periodically, at suitable synchronization points, call fsync or fdatasync on that file.  These calls tell Fsys to flush any buffered writes for the specified file and, if required, update the on-disk inode.  This could be faster because you could issue a number of (buffered) writes followed by a single synchronization call.  Similar to the previous suggestion, chkfsys would then be able to recover to the last time you called one of these functions.
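
The batching idea can be sketched as below: several ordinary writes, then one synchronization call. The helper name write_batch() and the fixed-size record layout are assumptions made for illustration; fsync() could be substituted for fdatasync() if file metadata must also be current.

```c
/* Request POSIX declarations on systems that gate them behind a macro. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append `count` records of `rec_size` bytes each
 * using ordinary (cached) writes, then synchronize once for the batch.
 * Returns 0 on success, -1 on error. */
int write_batch(const char *path, const void *recs,
                size_t rec_size, size_t count)
{
    const char *p = recs;
    size_t i;
    int rc;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd == -1)
        return -1;

    for (i = 0; i < count; i++) {
        if (write(fd, p + i * rec_size, rec_size) != (ssize_t)rec_size) {
            close(fd);
            return -1;
        }
    }

    /* Synchronization point: one call covers all writes above. */
    rc = fdatasync(fd);

    if (close(fd) == -1)
        return -1;
    return rc;
}
```

If power fails before fdatasync() returns, some or all of the batch may be lost, so the batch size is the amount of data the application is prepared to lose.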