DragonFly commits List (threaded) for 2008-07

Re: cvs commit: src/sys/vfs/hammer hammer.h hammer_btree.c hammer_cursor.h hammer_disk.h hammer_inode.c hammer_ioctl.h hammer_mirror.c hammer_object.c hammer_subs.c hammer_vnops.c


To: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxxxxx>
From: Michael Neumann <mneumann@xxxxxxxx>
Date: Thu, 10 Jul 2008 12:22:56 +0200

Matthew Dillon wrote:
dillon 2008/07/09 21:44:33 PDT

DragonFly src repository

Modified files:
sys/vfs/hammer hammer.h hammer_btree.c hammer_cursor.h hammer_disk.h hammer_inode.c hammer_ioctl.h hammer_mirror.c hammer_object.c hammer_subs.c hammer_vnops.c
Log:
HAMMER 60J/Many: Mirroring
Finish implementing the core mirroring algorithm. The last bit was to add
support for no-history deletions on the master. The same support also covers
masters which have pruned records away prior to the mirroring operation.
As with the work done previously, the algorithm is 100% queue-less and
has no age limitations. You could wait a month, and then do a mirroring
update from master to slave, and the algorithm will efficiently handle it.
The basic issue that this commit tackles is what to do when records are
physically deleted from the master. When this occurs the mirror master
cannot provide a list of records to delete to its slaves.
The solution is to use the mirror TID propagation to physically identify
swaths of the B-Tree in which a deletion MAY have taken place. The
mirroring code uses this information to generate PASS and SKIP mrecords.
A PASS identifies a record (sans its data payload) that remains within
the identified swath and should already exist on the target. The
mirroring target does a simultaneous iteration of the same swath on the
target B-Tree and deletes records not identified by the master.
A SKIP is the heart of the algorithm's efficiency. The same mirror TID
stored in the B-Tree can also identify large swaths of the B-Tree for which
*NO* deletions have taken place (which will be most of the B-Tree). One
SKIP Record can identify an arbitrarily large swath. The target uses
the SKIP record to skip that swath on the target. No scan takes place.
SKIP records can be generated from any internal node of the B-Tree and cover
that node's entire sub-tree.
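
To make the PASS/SKIP handling on the target concrete, here is a minimal Python sketch. It is not the HAMMER code: the integer keys, the tuple layout of the mrecords, the half-open key ranges, and the function name apply_mrecords are all assumptions made for illustration. It only shows the idea that keys in a scanned swath which the master did not PASS get deleted, while SKIP ranges are never scanned.

```python
from bisect import bisect_left

def apply_mrecords(target_keys, swath_begin, swath_end, mrecords):
    """Return the target keys to delete within [swath_begin, swath_end).

    mrecords is a key-ordered list of hypothetical tuples:
      ("PASS", key)        -- key still exists on the master
      ("SKIP", begin, end) -- no deletions happened in [begin, end)
    """
    keys = sorted(target_keys)
    to_delete = []
    pos = swath_begin  # first key position not yet accounted for

    def scan_gap(upto):
        # Target keys in [pos, upto) were not named by the master,
        # so they must have been deleted there.
        lo = bisect_left(keys, pos)
        hi = bisect_left(keys, upto)
        to_delete.extend(keys[lo:hi])

    for rec in mrecords:
        if rec[0] == "PASS":
            _, key = rec
            scan_gap(key)
            pos = key + 1      # the passed key itself is kept
        elif rec[0] == "SKIP":
            _, begin, end = rec
            scan_gap(begin)
            pos = end          # jump over the range without scanning it
        else:
            raise ValueError("unknown mrecord type: %r" % (rec[0],))

    scan_gap(swath_end)        # tail of the swath after the last mrecord
    return to_delete

target = [1, 2, 3, 10, 11, 12, 20, 21]
stream = [("PASS", 1), ("PASS", 3), ("SKIP", 10, 20), ("PASS", 20)]
print(apply_mrecords(target, 0, 30, stream))
```

In this toy run keys 2 and 21 are reported for deletion, and keys 10 through 12 are never examined because a single SKIP covers them.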
This also provides us with the feature where the retention policy can be
completely different between a master and a mirror, or between mirrors.
When the slave identifies a record that must be deleted through the above
algorithm it only needs to mark it as historically deleted; it does not
have to physically delete the record.

I do not understand your last sentence. Does the deletion happen inside the mirroring? When reading your sentence for the first time I thought that I could mount the slave read/write and delete a record. But, I think what you mean is the following situation:

A slave has a record which is then deleted on the master. When mirroring
occurs the slave historically-deletes that record to stay in sync with
the master. It does not physically delete it. This means that the time
when the record is physically deleted depends on the retention policy,
which might differ from master to slave.
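
If I have this right, the difference between the two kinds of deletion can be sketched as follows. This is only a toy model in Python, not the HAMMER code: the Record class, the use of 0 for "not deleted", and the helpers visible_at, historically_delete and prune are names I made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: int
    create_tid: int
    delete_tid: int = 0  # assumption: 0 means the record is still live

def visible_at(rec, tid):
    # A record is visible at a TID inside its lifetime [create_tid, delete_tid).
    return rec.create_tid <= tid and (rec.delete_tid == 0 or tid < rec.delete_tid)

def historically_delete(rec, tid):
    # Mirroring only stamps the record; it stays on disk.
    rec.delete_tid = tid

def prune(records, snapshot_tids):
    # Physical deletion: keep live records and any record still visible
    # in at least one retained snapshot.
    return [r for r in records
            if r.delete_tid == 0 or any(visible_at(r, t) for t in snapshot_tids)]

rec = Record(key=1, create_tid=100)
historically_delete(rec, 200)

# A slave that retains a snapshot at TID 150 keeps the record ...
print(prune([rec], [150]))
# ... while a node whose only retained snapshot is at TID 250 drops it.
print(prune([rec], [250]))
```

The same historically deleted record survives pruning on one node and not on the other, purely because the sets of retained snapshots differ.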

So this means that we could maintain weekly snapshots on the slave,
while the master would retain daily snapshots for the last two weeks.
But care must be taken not to prune before mirroring. When keeping two
weeks on the master this should be no problem. Wow! Nice feature!

Regards,

Michael


