Sort order on renamed files not as expected

Replytoken · Jul 6, 2020

I recently renamed some files that I scanned and I am not sure about LR's sort logic. As I do not have the exact dates for most of these images, they are being named with as much of the date as I have. For example, an image from the 1940s would be named 194x-xx-xxxx-(unique serial number). An image with a specific date would be named 1946-04-29-(unique serial number). What I am not understanding is how LR decided to sort the images. I expected this:

1946-xx-...
194x-xx-...
1954-xx-...
195x-xx-...

But what LR gave me was:

194x-xx-...
195x-xx-...
1946-xx-...
1954-xx-...

Any idea what logic they are using for sorting? I put the names into an Excel spreadsheet and they sorted like the first list above, so I am a bit perplexed as to how LR came up with the second list.

--Ken

Califdan · Jul 6, 2020

I suspect that you are a victim of new and improved technology. I'm not talking about LR specifically, but computer science in general. In the olden days a sort was a sort, period. Every "character" (meaning a letter, a number, or a special symbol, including a space or blank) had a corresponding numeric code. One such code was called ASCII, and a sort just put characters in order by their ASCII code. However, some computers used a different coding system, EBCDIC (if I recall the spelling). The main difference, if my 70-year-old memory doesn't fail me, is that in one system the alphabetic letters came before the numbers and in the other the numbers came before the letters.

Well, as time went on the sort algorithms got "smarter". For example, they would treat "A" and "a" as equivalent even though they had different codes. Let's say that 010=A, 011=a, 012=B, 013=b, etc. Before this change, "Acme" would sort before "aardvark", as all the words starting with big "A" come before all the words starting with little "a" (and we did almost all our data entry in upper case only for this reason).
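
To make the upper/lower case point concrete, here is a minimal Python sketch. The word list is made up for illustration. A plain sort compares character codes, so every capital letter lands ahead of every lowercase one; the "smarter" sort compares a case-folded copy of each word instead.

```python
# Made-up word list, for illustration only.
words = ["aardvark", "Acme", "Zebra", "bob"]

# Plain sort: compares character codes, and in ASCII/Unicode
# all uppercase letters have lower codes than all lowercase letters.
by_code = sorted(words)

# Case-insensitive sort: compare a case-folded copy of each word,
# so "A" and "a" are treated as equivalent.
ignoring_case = sorted(words, key=str.casefold)

print(by_code)        # ['Acme', 'Zebra', 'aardvark', 'bob']
print(ignoring_case)  # ['aardvark', 'Acme', 'bob', 'Zebra']
```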

Time marched on and more changes came along. For example, they started ignoring break characters like a space or hyphen, and even started ignoring some leading words like "The" and "A" when sorting, so that "The Wind" would sort with the "W"s rather than the "T"s. And on it went.
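
The leading-word trick can be sketched the same way. This is only an illustration of the idea, not any particular program's rules: the `title_key` helper, the list of articles, and the titles are all assumptions.

```python
import re

# Hypothetical rule: drop one leading article ("the", "an", "a") before comparing.
LEADING_ARTICLE = re.compile(r"^(the|an|a)\s+", re.IGNORECASE)

def title_key(title: str) -> str:
    """Return the text actually compared: article removed, case folded."""
    return LEADING_ARTICLE.sub("", title).casefold()

# Made-up titles, for illustration only.
titles = ["The Wind", "Zorro", "A Bridge", "Tempest"]
sorted_titles = sorted(titles, key=title_key)

print(sorted_titles)  # ['A Bridge', 'Tempest', 'The Wind', 'Zorro']
```

"The Wind" ends up among the "W"s because only "wind" is compared, while "Tempest" is untouched since the pattern requires whitespace after the article.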

In your case the change that got you is a more recent one where anything that looks like a number is treated as a number. In other words, 1958 is a numeric value but 194x is a word consisting of the characters "1", "9", "4" and "x", and it's a whole word because the sort treats the hyphen as a delimiter between words or numbers, the same as a space. Most sort algorithms these days sort numbers numerically and place them either before or after the words. So in your case 194x and 195x are words, since they contain a non-numeric character, whereas 1946 and 1954 are numbers. So they grouped the words first, followed by the numbers.
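
Here is a rough Python sketch of how a number-aware ("natural") sort can produce both lists from the first post. It is not Lightroom's actual code, which I have not seen. The `natural_key` helper and the serial numbers in the file names are made up for illustration. This variant compares each run of digits by its numeric value, which is a slightly different mechanism from the word/number grouping described above but gives the same result: the leading chunks become 194, 195, 1946 and 1954.

```python
import re

# Simplified, hypothetical file names (serial numbers invented).
names = ["1946-xx-0001", "194x-xx-0002", "1954-xx-0003", "195x-xx-0004"]

def natural_key(name: str):
    """Split into digit runs and other text; digit runs compare as integers."""
    key = []
    for chunk in re.findall(r"\d+|\D+", name):
        if chunk.isdecimal():
            key.append((0, int(chunk), ""))        # numeric chunk
        else:
            key.append((1, 0, chunk.casefold()))   # text chunk
    return key

# Character-by-character sort: "6" has a lower code than "x".
plain = sorted(names)

# Number-aware sort: 194 < 195 < 1946 < 1954.
natural = sorted(names, key=natural_key)

print(plain)    # ['1946-xx-0001', '194x-xx-0002', '1954-xx-0003', '195x-xx-0004']
print(natural)  # ['194x-xx-0002', '195x-xx-0004', '1946-xx-0001', '1954-xx-0003']
```

The plain sort matches the order Excel gave, and the number-aware sort matches the order LR gave.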

Sometimes new isn't as good as old.

Replytoken · Jul 6, 2020

Great, so I am not losing my mind, but I'm now finding out that it is becoming functionally obsolete if I don't update it to take into account new programming "features".

Actually, I am glad it isn't me, but I am surprised that we cannot have more sort options to allow the use of old standards like ASCII. It is going to be interesting to see how all of today's programmers in their prime feel about rapidly changing standards 20-30 years from now. It's a miracle that Cut, Copy, Paste and Undo have remained through all of the menu/ribbon/command key changes over the years.

Now, do I want to change my naming system or adjust my expectations?

Thanks,

--Ken

clee01l · Jul 6, 2020

The API used by Lightroom has, AFAIK, always sorted numbers before letters. Spaces and other non-Latin characters sort before or after Latin characters. EBCDIC is an 8-bit encoding developed by IBM, descended from its punched-card codes. ASCII is a 7-bit scheme covering the standard typewriter keys. The two assign different code values to the same characters, so text sorted by raw code comes out in a different order in each: EBCDIC puts letters before digits, while ASCII puts digits before letters.

This has nothing to do with the sort order used today. In Unicode, numeric characters are sorted before alphabetic characters. In EBCDIC, alphabetic characters are sorted before numeric characters, but that is as far as it goes. Unicode is universal and has been since well before Lightroom.
Sorting is based on the Unicode Collation Algorithm, defined by the Unicode Consortium. This standard provides a complete and unambiguous sort ordering for all Unicode characters. Understand that and you will be able to create a file naming convention that will sort as expected.
This may help: UTS #10: Unicode Collation Algorithm
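
For anyone curious about the digits-versus-letters difference, here is a small Python sketch that sorts by raw code value under each encoding. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC (there are several EBCDIC code pages), and the file names are simplified, made-up versions of the ones in the first post. It does not implement the Unicode Collation Algorithm; it only compares byte values.

```python
# Sample characters: space, digit, uppercase, lowercase.
chars = ["a", "A", "1", " "]

# Sort by raw code value in ASCII and in EBCDIC (code page 037, an assumption).
ascii_order = sorted(chars, key=lambda c: c.encode("ascii"))
ebcdic_order = sorted(chars, key=lambda c: c.encode("cp037"))

print(ascii_order)   # [' ', '1', 'A', 'a']
print(ebcdic_order)  # [' ', 'a', 'A', '1']

# Simplified, hypothetical file names (serial numbers invented).
names = ["1946-xx-0001", "194x-xx-0002", "1954-xx-0003", "195x-xx-0004"]

ascii_names = sorted(names, key=lambda n: n.encode("ascii"))
ebcdic_names = sorted(names, key=lambda n: n.encode("cp037"))

print(ascii_names)   # 1946, 194x, 1954, 195x
print(ebcdic_names)  # 194x, 1946, 195x, 1954
```

Sorting by ASCII code reproduces the order Excel gave, while sorting by EBCDIC code gives a third order that matches neither list in the first post.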

FWIW, Only old farts will even know about EBCDIC

Replytoken · Jul 6, 2020

Thanks for making me feel young, Cletus! I know about ASCII, but had not heard about EBCDIC until this thread, so I guess I am not an old fart. But what does that say about the Microsoft programmers who developed Excel in the late 1980s, who appear to have possibly used EBCDIC?

--Ken
