DragonFly users List (threaded) for 2011-05
Easy way to identify files which share some content/blocks
Now that DragonFly's HAMMER has got deduplication, I ask myself if there
is a simple way to identify "pairs" or groups of files which share a lot
of data, i.e. are mostly identical.
I have a rather large repository of downloaded pictures, which contains
a lot of dupes in multiple locations. I have no problems finding those
exact dupes given some time and a shell prompt.
What I'm interested in is identifying broken files. Broken in the sense
that A is an incomplete version of B (some bytes missing), or B is a
damaged version of A (some additional bytes at the end).
Is there a way to get to something like this:
"File A shares 1234 (98.3%) data blocks with file B"
"File A shares xxxx (xx.x%) data blocks with file C"
Getting a step closer helps too.
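To make the wanted report concrete, here is a rough userland sketch in Python. It does not read HAMMER's dedup metadata at all. It simply hashes each file in fixed-size blocks and counts how many blocks of A also occur in B. The 64 KiB block size, the use of SHA-1, and the function names (block_hashes, shared_blocks, report) are assumptions made for illustration, not anything HAMMER provides.

```python
import hashlib
import sys

# Assumption: an arbitrary fixed block size, not necessarily what HAMMER uses.
BLOCK_SIZE = 65536


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return a list of SHA-1 digests, one per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha1(block).digest())
    return digests


def shared_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Count blocks of A whose content also appears somewhere in B.

    Returns (shared, total_blocks_in_a, percent_of_a).
    """
    a = block_hashes(path_a, block_size)
    b = set(block_hashes(path_b, block_size))
    shared = sum(1 for digest in a if digest in b)
    percent = 100.0 * shared / len(a) if a else 0.0
    return shared, len(a), percent


def report(path_a, path_b, block_size=BLOCK_SIZE):
    """Format the result in the style asked for above."""
    shared, _total, percent = shared_blocks(path_a, path_b, block_size)
    return "File %s shares %d (%.1f%%) data blocks with file %s" % (
        path_a, shared, percent, path_b)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(report(sys.argv[1], sys.argv[2]))
```

Because the blocks are cut at fixed offsets, this only catches files that agree from the start, which fits the truncated or appended-to case described above. A partial last block of the shorter file will not match, and bytes inserted or removed in the middle shift every later block out of alignment. Comparing every pair in a large repository this way is also quadratic, so a real run would want to index block digests once and look files up by shared digest.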
Thanks for any insights.