LRList001
Active Member
- Joined: Dec 23, 2012
- Messages: 456
- Lightroom Experience: Intermediate
- Lightroom Version: 6.x
I've been doing some experiments on detecting bit rot.
One problem with a backup is knowing when it is needed. For archival photographic images and data, this is effectively an open-ended time period. So, some kind of consistency checker might be useful.
This requirement is different from cryptographic integrity (which has to be non-reversible). Detecting bit rot, which is assumed to be random and hardware-related rather than deliberate, merely requires a check that has a high probability of detecting a change, any change. It is the nature of bit rot that the operating system will not know about the change, so it is no good looking at metadata such as the last-modified date or the file size.
I have run three tests.
1/ Work out a simple checksum for the file (the longer the word size, the higher the probability of detecting a change; 32 bits is probably long enough). Advantage: I have total control over the results, over where and when it runs, and over how discrepancies are reported. Disadvantage: it is slow. Multi-threading isn't going to help much, since it is disk speed that matters; optimising the disk reads would likely make it somewhat faster.
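A minimal sketch of a 32-bit checksum of this kind, using Python's standard `zlib.crc32` and reading in chunks so large image files don't have to fit in memory (the chunk size is just an example):

```python
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """Compute a 32-bit CRC of a file, reading it in 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
    return crc & 0xFFFFFFFF
```

A CRC isn't cryptographically secure, but for random hardware corruption (rather than a deliberate attacker) a 32-bit check catches a change with probability about 1 - 2^-32.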
2/ Use the (Windows 10) built-in certutil command (certutil -hashfile <filename> <hashtype>; the hash type can be e.g. MD5 or SHA256). Advantage: much faster than 1/ (call it 10x faster), but the output is not intended for batch use. Still, performance works out at 2-4 seconds per file. Again, multi-threading isn't going to help; it is disk speed that matters. Recently accessed files are fast (near instant) because the OS has cached them, but that isn't going to work for 1,000s of files.
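The certutil output is meant for humans, but it can be scripted around. A sketch of a parser, assuming the usual three-line output (a header line, the hash itself, and a "command completed successfully" status line; some older Windows versions put spaces between the hex byte pairs):

```python
def parse_certutil_hash(output):
    """Pull the hex digest out of `certutil -hashfile` output."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("unexpected certutil output")
    # The digest is the second non-blank line; remove any spaces
    # between hex pairs and normalise the case.
    return lines[1].replace(" ", "").lower()
```

In a batch script this would be fed from something like `subprocess.run(["certutil", "-hashfile", path, "SHA256"], capture_output=True, text=True).stdout`, though the exact output layout should be checked against the Windows version in use.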
3/ Use LR's DNG validation. I don't have many DNGs, but I managed to get the few I have processed at roughly 300/min. So performance is OK; it might take a day to analyse the number of files I have. However, I'm not currently using DNG.
There are products out there that aim to detect changes to file systems. However, so far as I know, they work mostly by trapping write events or detecting changes to the metadata; trapping write events will be how the real-time ones work. I.e., no use for detecting bit rot. What I'm after is a tool that records a fingerprint for each file and then, at some slow background rate, runs through all the files it is monitoring looking for a change, with a good notification method.
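The fingerprint-once, re-scan-later idea can be sketched in a few lines; the manifest filename, the choice of SHA-256, and the folder layout here are all just illustrative assumptions, and a real tool would add scan-rate throttling and proper notification:

```python
import hashlib
import json
import os

MANIFEST = "checksums.json"  # hypothetical manifest file in the working dir

def sha256_of(path, chunk_size=1 << 20):
    """Digest a file in chunks so large images don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Record a digest for every file under root (the 'fingerprint' pass)."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[path] = sha256_of(path)
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

def verify_manifest():
    """Re-hash each recorded file; return the ones that no longer match."""
    with open(MANIFEST) as f:
        manifest = json.load(f)
    return [
        path
        for path, expected in manifest.items()
        if not os.path.exists(path) or sha256_of(path) != expected
    ]
```

Because the OS metadata (size, modification date) is untouched by bit rot, the verify pass has to re-read every byte, which is why disk speed dominates here too.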
So, perhaps the answer is to convert my 5* source images to DNG and use LR's validation from time to time. That does at least give some long-term reassurance for my more highly rated images.