
Detecting bit rot experiments and jottings


LRList001 · Active Member · Joined: Dec 23, 2012 · Messages: 456 · Lightroom Experience: Intermediate · Lightroom Version: 6.x
I've been doing some experiments on detecting bit rot.

One problem with a backup is knowing when it is actually needed, i.e. when the original has silently gone bad. For archival photographic images and data, this is effectively an open-ended time period. So, some kind of consistency checker might be useful.

This requirement is different from cryptographic integrity (where the check has to be non-reversible). Detecting bit rot, which is assumed to be random and hardware-related rather than deliberate, merely requires a check that has a high probability of detecting a change, any change. It is in the nature of bit rot that the operating system will not know about the change, so it is no good looking at metadata such as the date of last modification or the size of the file.



I have run three tests.

1/ Work out a simple check-sum for the file (the longer the word size, the higher the probability of detecting a change; 32 bits is probably long enough, since a random change would then slip past with odds of only about 1 in 2^32). Advantage: I have total control over the results, where and when it runs, and how discrepancies are reported. Disadvantage: it is slow. Multi-threading isn't going to help much; it is disk speed that matters. Optimising the disk reads would likely make it somewhat faster. (A rough sketch of this kind of checker follows after the list.)

2/ Use the built-in Windows 10 certutil command (certutil -hashfile <filename> <hashtype>, where hashtype can be e.g. MD5 or SHA256). Advantage: much faster than 1/ (call it 10x faster), though the output isn't intended for batch use. Even so, performance is around 2-4 seconds per file. Again, multi-threading isn't going to help; it is disk speed that matters. Recently accessed files are near-instant because the OS has cached them, but that isn't going to work for thousands of files.

3/ Use LR's DNG validation. I don't have many DNGs, but I managed to get the few I have processed at roughly 300/min. So performance is OK; it might take a day to analyse the number of files I have. However, I'm not currently using DNG.
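
For illustration of approach 1/, here is a minimal Python sketch of a manifest builder: it walks a folder tree and records a SHA-256 hash for every file. The folder path and manifest filename are placeholders, not part of any existing tool.

import hashlib
import json
from pathlib import Path

def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, manifest_path):
    """Record a hash for every file under root in a JSON manifest."""
    root = Path(root)
    manifest = {
        str(p.relative_to(root)): hash_file(p)
        for p in root.rglob("*") if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

build_manifest(r"D:\Photos", "photo_manifest.json")  # placeholder paths

Reading the files in chunks keeps memory use flat regardless of image size; the hashing itself is cheap compared with the disk reads, which is why the whole job runs at roughly disk speed.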

There are products out there that aim to detect changes to file systems. However, as far as I know, they work mostly by trapping write events or detecting changes to the metadata; trapping write events is how the real-time ones work. That is no use for detecting bit rot. What I'm after is a tool that certifies each file and then, at some slow background rate, runs through all the files it is monitoring looking for changes, and has a good notification method, something along the lines of the verification pass sketched below.
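
As a rough illustration of that kind of background verification pass (not an existing product), here is a Python sketch that rereads the manifest from the earlier example and reports any file that has gone missing or no longer hashes to the recorded value.

import hashlib
import json
from pathlib import Path

def hash_file(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks (same helper as the earlier sketch)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(root, manifest_path):
    """Recompute hashes and report files that no longer match the stored manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    for rel_path, recorded in manifest.items():
        full = Path(root) / rel_path
        if not full.is_file():
            problems.append((rel_path, "missing"))
        elif hash_file(full) != recorded:
            problems.append((rel_path, "hash mismatch - possible bit rot"))
    return problems

for path, reason in verify_manifest(r"D:\Photos", "photo_manifest.json"):  # placeholder paths
    print(f"{reason}: {path}")

The notification step here is just a print; in practice it could email a report or write a log that gets checked periodically.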

So, perhaps the answer is to convert my 5* source images to DNG and use LR's validation from time to time. That does at least give some long-term reassurance for my more highly rated images.
 
While very useful, Validator doesn't solve a problem I didn't mention earlier: I have a lot of files outside LR. I ran the DNG validation to get some performance figures and then was musing on how I might use it. Thanks for the suggestion.
 
I've been doing some experiments on detecting bit rot
You piqued my interest, since I'm a techy and had never heard of 'bit rot'. There has been a lot written about it, and my take from some quick research is that it happens on the media holding the file system. This could be traditional magnetic disk, flash, DVD, SSDs, etc. Actually, I don't believe punch cards could suffer from bit rot, so maybe it's time these were resurrected ;-)

Someone correct me if I'm wrong, but 'bit rot' MAY appear in an individual file because of an underlying media failure where the file resides. Since media failure will always be occurring (that's why we have bad sectors), today's modern file systems will ensure that when a file is written or updated, it is successfully updated. This obviously does not help when media deteriorates just by sitting there.

The solution for anticipating media failure at rest is multiple backups. We now get into the physicality of these backups: there is no sense in having them in the same building/cloud structure. One reason for this is that you don't access a file on a disk directly from a program; you have to go through the file system, which has to talk to the appropriate drivers for the media. If there is a problem with the file system, it doesn't matter whether a single file has bit rot.

If correct, it seems to me that checking for bit rot is a file system or media manufacturer test. This gets into the MTBF and lifespan of media types, e.g. DVDs at around 20 years. It also introduces needs such as RAID technologies, and the multiple nodes that a file gets written to in a cloud storage solution.

Yes, you could check an individual file for bit rot. We used to do this in the past, when both disks and networking were not as robust as they are now. I would suggest that if you have media bit rot, more than one file on that media would be impacted.

There are products out there that aim to detect changes to file systems
Not sure if these are the traditional disk utilities, but it would be the file system that would receive notice of a bad sector caused by bit rot.

I'm also thinking that SMART disk technology would be helpful here in predicting failures. Bit rot is only one problem.

My 2 cents.
 
Lloyd Chambers has a command-line Java utility called Integrity Checker, which can be used to monitor for bit rot and also to compare one folder tree with another (for example, to make sure your backups actually copied correctly and aren't experiencing bit rot themselves).

https://diglloydtools.com/integritychecker.html
 
Someone correct me if I'm wrong, but 'bit rot' MAY appear in an individual file because of an underlying media failure where the file resides. Since media failure will always be occurring (that's why we have bad sectors), today's modern file systems will ensure that when a file is written or updated, it is successfully updated. This obviously does not help when media deteriorates just by sitting there.

I would not tie bit rot specifically to media failure; it can also be caused by driver errors in disk or RAID controllers, for example. Really, anything that can cause an uncommanded change.

Also, it is not true that "today's modern file systems will ensure that when a file is written or updated it is successfully updated", at least not if you include Windows, Mac, and typical Linux. The file system may TRY to ensure that, but if you include the entire stack up through the application, there are many design failures that let errors occur without being handled properly. Most Windows (and I suspect Mac) programs, for example, generically trap write errors and at best give a non-detailed failure message, leaving no indication of which file was involved or whether it was damaged.

There are file systems explicitly designed to detect classic bit rot, i.e. changes that do not go through the normal file I/O routines. ReFS on Windows, ZFS, and Btrfs are three that come to mind. None are mainstream; in fact, Windows seems to have forgotten about ReFS, which never appears to have matured to where it could replace NTFS. These all have maintenance functions that do the sort of checksumming discussed above on a routine basis.

I disagree that checksum validation is too slow, and that multiple threads won't help. I wrote the LR Validator mentioned above, and it will run (if I recall) 7 threads trying to keep the disk drives busy (admittedly I have SSDs). Checksum calculation is fairly compute-intensive, so threading helps; it also overlaps the disk reads with the calculations, keeping both busy rather than alternating, as in the sketch below. It's plenty fast for occasional runs. It is not, however, suitable for people who use TIFF and JPG, since it flags changes if you do something like Edit Original in Photoshop; but it works for raw. I wish LR would provide such a tool, but they seem satisfied with DNG. The main drawback of my approach is that it is not in line with "real" updates from LR and Photoshop, so it cannot distinguish an uncommanded change from one made (for example) in Photoshop.
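
For what it's worth, here is a rough Python sketch of that threaded idea (an illustration of the technique, not the Validator code itself): a small pool of worker threads hashes files concurrently, so one file's checksum calculation overlaps with another file's disk reads.

import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash one file in chunks and return (path, digest)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path, h.hexdigest()

def hash_tree(root, workers=7):
    """Hash every file under root using a pool of worker threads."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(sha256_of, files))

results = hash_tree(r"D:\Photos")  # placeholder path and thread count

On spinning disks the thread count matters less, since the reads largely serialise anyway; on SSDs the extra threads keep both the drive and the CPU busy.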

Note you will have that problem with any checksum program that is not tightly integrated with ALL programs that can change data in non-raw files (and in raw files for writing metadata back into the file vs xmp).

The only backup tool I've found that does checksum comparison routinely is GoodSync. It's one reason I use it (the UI is awful: complete and powerful but not intuitive; otherwise it's a great tool).

I also use Teracopy any time I am moving large amounts of data between drives, because it will do a verify after the copy. Copying files is, I think, by far the most likely time you can introduce errors.
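
The verify-after-copy idea is easy to sketch in Python (this shows the general technique, not how Teracopy itself works): copy the file, then hash both source and destination and compare.

import hashlib
import shutil

def file_hash(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src, dst):
    """Copy src to dst, then confirm the destination hashes identically to the source."""
    shutil.copy2(src, dst)
    if file_hash(src) != file_hash(dst):
        raise IOError(f"Verification failed: {dst} does not match {src}")

copy_and_verify(r"D:\Photos\IMG_0001.CR2", r"E:\Backup\IMG_0001.CR2")  # placeholder paths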

End of rambling...
 