Sort order on renamed files not as expected

Status
Not open for further replies.

Replytoken
Senior Member, Lightroom Guru, Premium Classic Member, Premium Cloud Member
Joined: Dec 7, 2007
Messages: 3,070
Location: Puget Sound
Lightroom Experience: Intermediate
Lightroom Version: Classic
Lightroom Version Number: Classic 9.3
Operating System: Windows 10

I recently renamed some files that I scanned, and I am not sure about LR's sort logic. As I do not have the exact dates for most of these images, they are being named with as much of the date as I have. For example, an image from the 1940s would be named 194x-xx-xxxx-(unique serial number). An image with a specific date would be named 1946-04-29-(unique serial number). What I do not understand is how LR decided to sort the images. I expected this:
  • 1946-xx-...
  • 194x-xx-...
  • 1954-xx-...
  • 195x-xx-...
But what LR gave me was:
  • 194x-xx-...
  • 195x-xx-...
  • 1946-xx-...
  • 1954-xx-...
Any idea what logic they are using for sorting? I put the names into an Excel spreadsheet and they sorted like the first list above, so I am a bit perplexed as to how LR came up with the second list.

--Ken
 
I suspect that you are a victim of new and improved technology. I'm not talking about LR specifically, but computer science in general. In the olden days a sort was a sort, period. Every "character" (meaning a letter, a number, or a special symbol, including a space or blank) had a corresponding numeric code. One such code was called ASCII, and a sort just put the characters in order by their ASCII codes. However, some computers used a different coding system, EBCDIC (if I recall the spelling). The main difference, if my 70-year-old memory doesn't fail me, is that in one system the alphabetic letters came before the numbers and in the other the numbers came before the letters.
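
To make the "sort by character code" idea concrete, here is a minimal Python sketch. The file names are hypothetical, modelled on the naming scheme in the question, and Python's default string comparison (by Unicode code point, which matches ASCII for these characters) stands in for the old-style sort. It is not a claim about what Lightroom or Excel actually do.

```python
# Hypothetical file names following the naming scheme from the question.
names = [
    "195x-xx-xxxx-0004",
    "1946-04-29-0001",
    "194x-xx-xxxx-0002",
    "1954-xx-xxxx-0003",
]

# Python compares strings character by character using code points.
# For these characters the code points equal the ASCII codes:
# "-" is 45, digits are 48-57, lowercase letters are 97-122.
codes = {ch: ord(ch) for ch in "-46x"}

# A plain code sort: "6" (54) is lower than "x" (120), so 1946 precedes 194x.
by_code = sorted(names)

for name in by_code:
    print(name)
```

Under this rule a digit in the fourth position always lands before an "x" in the same position, which gives the order the original poster expected.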

Well, as time went on the sort algorithms got "smarter". For example, they would treat "A" and "a" as equivalent even though they had different codes. Let's say that 010=A, 011=a, 012=B, 013=b, etc. Before this change "Acme" would sort before "aardvark", as all the words starting with big "A" came before all the words starting with little "a" (and we did almost all our data entry in upper case only for this reason).
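
A small Python sketch of the difference, assuming an illustrative word list. The first sort uses raw character codes (in real ASCII, all uppercase letters, 65-90, come before all lowercase letters, 97-122); the second uses `str.casefold` as the sort key so that case is ignored.

```python
# Illustrative word list (not from the thread apart from the first two words).
words = ["aardvark", "Acme", "Zebra", "apple"]

# Raw code order: every uppercase initial sorts before every lowercase one.
by_code = sorted(words)

# Case-insensitive order: compare the casefolded form of each word instead.
by_folded = sorted(words, key=str.casefold)

print(by_code)
print(by_folded)
```

With the raw codes "Zebra" lands ahead of "aardvark"; with the casefolded key the list reads the way a person would alphabetise it.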

Time marched on, and more changes came along. For example, they started ignoring break characters like a space or hyphen, and even started ignoring some leading words like "the" and "A" when sorting, so that "The Wind" would sort with the "W"s rather than the "T"s. And on it went.
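
As a rough illustration of that kind of rule, the Python sketch below builds a sort key that drops a leading article and ignores spaces and hyphens. The article list, the `title_key` helper, and the sample titles are all assumptions made up for this example; real library and catalogue software uses its own rules.

```python
import re

# Assumed list of leading words to ignore (illustrative only).
ARTICLES = ("the ", "a ", "an ")


def title_key(title: str) -> str:
    """Hypothetical sort key: ignore case, a leading article, spaces and hyphens."""
    key = title.casefold()
    for article in ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    # Remove break characters (whitespace and hyphens) before comparing.
    return re.sub(r"[\s\-]+", "", key)


titles = ["The Wind", "Water-Lilies", "A Tale", "Zorro"]
by_title = sorted(titles, key=title_key)
print(by_title)
```

Here "The Wind" is compared as "wind" and so files under W, after "Water-Lilies" and before "Zorro".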

In your case the change that got you is a more recent one, where anything that looks like a number is treated as a number. In other words, 1958 is a numeric value but 194x is a word consisting of the characters "1", "9", "4" and "x", and it is a whole word because the hyphen is treated as a delimiter between words or numbers, the same as a space. Most sort algorithms these days sort numbers numerically and place them either before or after the words. So in your case 194x and 195x are words, since they contain a non-numeric character, whereas 1946 and 1954 are numbers. The words were grouped first, followed by the numbers.
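
The hypothesis in the paragraph above can be modelled in a few lines of Python. This is only a sketch of that guess, not Lightroom's documented algorithm: the `token_key` and `name_key` helpers, the rule that words go before numbers, and the file names are all assumptions chosen to match the description.

```python
def token_key(token: str):
    """Hypothetical rule: all-digit tokens are numbers, everything else is a word.

    Words get group 0 and numbers get group 1, so words sort first.
    """
    if token.isdigit():
        return (1, int(token), "")
    return (0, 0, token.casefold())


def name_key(name: str):
    # Treat the hyphen as a delimiter and compare the names token by token.
    return [token_key(token) for token in name.split("-")]


# Hypothetical file names following the naming scheme from the question.
names = [
    "1946-04-29-0001",
    "194x-xx-xxxx-0002",
    "1954-xx-xxxx-0003",
    "195x-xx-xxxx-0004",
]

by_tokens = sorted(names, key=name_key)
for name in by_tokens:
    print(name)
```

With these assumptions the first token decides everything: "194x" and "195x" are words and go first, then 1946 and 1954 follow in numeric order, which is the order reported for LR in the question.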

Sometimes new isn't as good as old.
 
Great, so I am not losing my mind, but I'm now finding out that it is becoming functionally obsolete if I don't update it to take into account new programming "features". :eek:

Actually, I am glad it isn't me, but I am surprised that we cannot have more sort options that would allow the use of old standards like ASCII. It is going to be interesting to see how all of today's programmers in their prime feel about rapidly changing standards 20-30 years from now. It's a miracle that Cut, Copy, Paste and Undo have survived all of the menu/ribbon/command key changes over the years.

Now, do I want to change my naming system or adjust my expectations?

Thanks,

--Ken
 
The API used by Lightroom has, AFAIK, always sorted numbers before letters. Spaces and other non-Latin characters sort before or after the Latin characters. EBCDIC is an 8-bit encoding developed by IBM. ASCII is a 7-bit scheme for the standard typewriter keys. The two assign different code values to the same characters.

This has nothing to do with the sort order used today. In Unicode, numeric characters sort before alphabetic characters. In EBCDIC, alphabetic characters sort before numeric characters, but that is as far as it goes. Unicode is universal and has been since well before Lightroom.
Sorting is based on the Unicode Collation Algorithm, defined by the Unicode Consortium. This standard provides a complete and unambiguous sort ordering for all Unicode characters. Understand that and you will be able to create a file naming convention that sorts as expected.
This may help: UTS #10: Unicode Collation Algorithm

FWIW, only old farts will even know about EBCDIC.
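
The digits-versus-letters difference between the two encodings can be checked with Python's standard library, which ships the `cp037` codec (one EBCDIC code page). This sketch only compares raw code values; it does not implement the Unicode Collation Algorithm, and the file names are hypothetical ones modelled on the question.

```python
# Hypothetical file names following the naming scheme from the question.
names = [
    "1946-04-29-0001",
    "194x-xx-xxxx-0002",
    "1954-xx-xxxx-0003",
    "195x-xx-xxxx-0004",
]

# Unicode code points (same as ASCII here): digits come before letters.
digit_first_in_unicode = ord("9") < ord("A") < ord("a")

# EBCDIC (code page 037): letters have lower byte values than digits.
letter_first_in_ebcdic = "a".encode("cp037") < "A".encode("cp037") < "9".encode("cp037")

# Sort the same names by each set of code values.
unicode_order = sorted(names)
ebcdic_order = sorted(names, key=lambda s: s.encode("cp037"))

print(unicode_order)
print(ebcdic_order)
```

Sorting by EBCDIC byte values puts 194x ahead of 1946 but still keeps the 1940s ahead of the 1950s, so in this sketch neither raw encoding order reproduces the list LR produced.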
 
Thanks for making me feel young, Cletus! I know about ASCII, but had not heard about EBCDIC until this thread, so I guess I am not an old fart. But what does that say about the Microsoft programmers who developed Excel in the late 1980s and who appear to have possibly used EBCDIC? :rolleyes:

--Ken
 