DragonFly kernel List (threaded) for 2004-12
Re: Description of the Journaling topology
:I think that there is a basic synchronisation issue in such a topology.
:Due to buffering, delays, etc. it is possible that in some cases the
:filesystem will commit changes to permanent storage before the
:corresponding journal entry is created, i.e.:
:1. App executes unlink("foo").
:2. Kernel sends the appropriate VOP to the filesystem and to the journal.
:3. Filesystem commits the metadata update; the journal entry still sits
:somewhere in a buffer.
:4. App executes open("foo", O_CREAT).
:5. Kernel sends the appropriate VOP to the filesystem and to the journal.
:6. Journaling system commits the unlink() entry to storage.
:7. Filesystem commits the metadata update; the machine crashes before the
:journal entry for open() is committed.
:On reboot, the kernel tries to replay the journal, and as a result the
:already created file foo is lost. The same situation may happen for
:subsequent writes and other operations: because the journal lags behind
:storage, it is possible that after a failure some data already written
:to storage is lost.
:How are you going to address this issue?
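To make the quoted sequence concrete, here is a minimal sketch of it in Python. It is a toy model only: the filesystem is reduced to a set of names, the journal to a list of committed records, and the helper names (apply_op, replay) are made up for illustration and are not kernel interfaces.

```python
# Toy model (hypothetical): the on-disk filesystem is a set of file names,
# and the journal is the list of records that reached stable storage.

def apply_op(fs, op, name):
    """Apply one namespace operation to a filesystem image (a set)."""
    if op == "unlink":
        fs.discard(name)
    elif op == "create":
        fs.add(name)

def replay(disk_image, journal):
    """Incremental playback: apply committed records on top of the disk image."""
    recovered = set(disk_image)
    for op, name in journal:
        apply_op(recovered, op, name)
    return recovered

disk = {"foo"}   # on-disk state before step 1
journal = []     # journal records committed so far

apply_op(disk, "unlink", "foo")      # step 3: fs commits, journal entry still buffered
journal.append(("unlink", "foo"))    # step 6: journal commits the unlink record
apply_op(disk, "create", "foo")      # step 7: fs commits the create, then the crash
# The ("create", "foo") record never reaches the journal.

after_replay = replay(disk, journal)
print(sorted(after_replay))          # prints []
```

The disk held foo at the moment of the crash, but replaying the stale unlink record on top of it removes the file again.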
Solving this issue requires the filesystem to be aware of the journal's
existence, which I've mentioned in past posts. The filesystem would
have to buffer related disk operations until it gets positive
confirmation that the related journal entries have been committed.
This is similar to what softupdates does, but the implementation
would not have to be anywhere near as sophisticated.
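As a rough illustration of that buffering, the sketch below holds each disk operation back until the journal reports that the matching record is committed. It is a toy model under stated assumptions, not the DragonFly design: the class JournalAwareFS, the per-operation sequence numbers, and the journal_committed() callback are all invented for the example.

```python
from collections import deque

class JournalAwareFS:
    """Toy filesystem that defers disk updates until the journal confirms them."""

    def __init__(self, files=()):
        self.disk = set(files)     # state that has reached permanent storage
        self.pending = deque()     # (seq, op, name) waiting for journal confirmation
        self.next_seq = 0

    def submit(self, op, name):
        """Queue an operation; return the sequence number shared with the journal."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, op, name))
        return seq

    def journal_committed(self, acked_seq):
        """Journal confirms all records up to acked_seq; flush those operations."""
        while self.pending and self.pending[0][0] <= acked_seq:
            _, op, name = self.pending.popleft()
            if op == "unlink":
                self.disk.discard(name)
            elif op == "create":
                self.disk.add(name)

fs = JournalAwareFS({"foo"})
seq_unlink = fs.submit("unlink", "foo")
seq_create = fs.submit("create", "foo")

# Only the unlink record has been committed by the journal so far.
fs.journal_committed(seq_unlink)
```

At this point the disk reflects the unlink but not the create, so a crash here leaves the disk no further ahead than the journal, and an incremental playback has nothing to undo.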
Barring that, you might not be able to guarantee that an incremental
playback from the journal would be sufficient to fully recover the
filesystem. But even in that case, a full restore from backups followed
by a full playback from the journal would be able to recover the
filesystem up to N seconds prior to the crash. It would just take longer.
So the basic property of being able to restore to within N seconds can
still be guaranteed even without a journal-aware filesystem.
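A minimal sketch of the fallback path, again as a toy model: start from the backup image rather than the crashed disk, then play back every committed journal record in order. The function full_recover and the sample file names are hypothetical, chosen only to illustrate the idea.

```python
def full_recover(backup_image, journal):
    """Restore from backup, then replay every committed journal record in order."""
    fs = set(backup_image)
    for op, name in journal:
        if op == "unlink":
            fs.discard(name)
        elif op == "create":
            fs.add(name)
    return fs

backup = {"foo", "bar"}      # filesystem contents when the backup was taken

# Records committed to the journal between the backup and the crash.
journal = [
    ("create", "baz"),
    ("unlink", "foo"),
    # ("create", "foo") was still buffered at crash time and is lost.
]

recovered = full_recover(backup, journal)
print(sorted(recovered))     # prints ['bar', 'baz']
```

The result depends only on the backup and the committed journal records, so whatever order the crashed filesystem wrote its own metadata in does not matter. What is lost is only the tail of the journal that had not been committed yet, which is the N second window.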