Base64 and ASCII both made perfect sense in terms of their requirements, and the future, while not fully anticipated at the time, is doing just fine, with ASCII now incorporated into the largely future-proof UTF-8.
Considerably stranger in regard to contiguity was EBCDIC, but it too made sense in terms of its technological requirements, which centered around Hollerith punch cards. https://en.wikipedia.org/wiki/EBCDIC
There are numerous other examples where a lack of knowledge of the technological landscape of the past leads some people to project unwarranted assumptions of incompetence onto the engineers who lived under those constraints.
(Hmmm ... perhaps I should have read this person's profile before commenting.)
P.S. He absolutely did attack the competence of past engineers. And "questioning" backwards compatibility with ASCII is even worse ... there was no point in time at which a conversion would not have faced an insurmountable barrier.
And the performance claims are absurd, e.g.,
"A simple and extremely common int->hex string conversion takes twice as many instructions as it would if ASCII was optimized for computability."
WHICH conversion, uppercase hex or lowercase hex? You can't have both. And it's ridiculous to think that the character set encoding should have been optimized for either one or that it would have made a measurable net difference if it had been. And instruction counts don't determine speed on modern hardware. And if this were such a big deal, the conversion could be microcoded. But it's not--there's no critical path with significant amounts of binary to ASCII hex conversion.
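For concreteness, here is a minimal sketch of the conversion in question (my own toy code, with uppercase chosen arbitrarily -- which is exactly the arbitrary choice at issue). The '9'-to-'A' gap costs one extra compare/adjust in the arithmetic form, and real code sidesteps it entirely with a table:

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch only: one nibble to one ASCII hex digit.
       Because '9' (0x39) and 'A' (0x41) are not adjacent code points,
       the arithmetic form needs one extra compare/adjust -- which is
       exactly why real code tends to use the table form instead. */
    static char hex_arith(unsigned n) {
        n &= 0xF;
        return (char)(n < 10 ? n + '0' : n - 10 + 'A');  /* extra adjust */
    }

    static char hex_table(unsigned n) {
        return "0123456789ABCDEF"[n & 0xF];  /* one indexed load, no branch */
    }

    int main(void) {
        uint32_t x = 0xDEADBEEF;
        for (int shift = 28; shift >= 0; shift -= 4)
            putchar(hex_table(x >> shift));  /* prints DEADBEEF */
        putchar('\n');
        putchar(hex_arith(0xB));             /* prints B */
        putchar('\n');
        return 0;
    }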
"There are also inconsistencies like front and back braces/(angle)brackets/parens not being convertible like the alphabet is."
That is not a usable conversion. Anyone who has actually written parsers knows that the encodings of these characters are not relevant ... nothing would have been saved in parsing "loops". Notably, programming language parsers consume tokens produced by the lexer, and the lexer processes each punctuation character separately. Anything that could be gained by grouping punctuation encodings can be done via the lexer's mapping from ASCII to token values. (I have actually done this to reduce the size of bit masks that determine whether any member of a set of tokens has been encountered. I've even, in my weaker moments, hacked the encodings so that <>, {}, [], and () are paired--but this is pointless premature optimization.)
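Something like this sketch, to be concrete (illustrative only -- the token names and input string are made up):

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: the lexer maps raw ASCII to small dense token
       codes, so "is any member of this token set present?" becomes a
       single mask test -- independent of where the characters happen
       to sit in the ASCII table. */
    enum tok { TOK_LPAREN, TOK_RPAREN, TOK_LBRACK, TOK_RBRACK,
               TOK_LBRACE, TOK_RBRACE, TOK_LT, TOK_GT, TOK_OTHER };

    static enum tok classify(int c) {
        switch (c) {
            case '(': return TOK_LPAREN;  case ')': return TOK_RPAREN;
            case '[': return TOK_LBRACK;  case ']': return TOK_RBRACK;
            case '{': return TOK_LBRACE;  case '}': return TOK_RBRACE;
            case '<': return TOK_LT;      case '>': return TOK_GT;
            default:  return TOK_OTHER;
        }
    }

    int main(void) {
        const uint32_t OPENERS = (1u << TOK_LPAREN) | (1u << TOK_LBRACK)
                               | (1u << TOK_LBRACE) | (1u << TOK_LT);
        const char *src = "a[i] + f(x)";
        uint32_t seen = 0;
        for (const char *p = src; *p; ++p)
            seen |= 1u << classify((unsigned char)*p);
        printf("opener seen: %s\n", (seen & OPENERS) ? "yes" : "no");
        return 0;
    }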
Show me a quote. Where did I attack the competence of past engineers? Quote it for me or please just stop lying. I never attacked anyone. I even (somewhat obliquely) referred to several reasons they may have had to make decisions that confound me. Are you mad that I think backwards compatibility is a poor decision? That's not an attack against any engineers, it's just a matter of opinion. Your weird passive-aggressive behavior is just baffling here.
Here is a quote: "that seemed like it made sense at some point (or maybe changing case was super important to a lot of workloads or something, making a compelling reason to fuck over the future in favor of optimisation now))?"
You used "that seemed like it made sense" when you could have written "that made sense." The additional "seemed like" implies the past engineers were unable to see something they should have.
You used "fuck over the future in favor of optimisation now" implying the engineers were overly short-sighted or used poor judgement when balancing the diverse needs of an interchange code.
Hindsight is 20/20. Something that seemed like a good decision at the time may have been a good decision for the time, but not necessarily a great decision half a century later. That has nothing to do with engineering competency, only fortune telling competency.
I get that people here don't like profanity, but I don't see any slight in describing engineering decisions like optimizing for common workloads today over hypothetical loads tomorrow as 'fucking over the future'. Slightly hyperbolic, sure, but it's one of the most common decisions made in designing systems, and it commonly causes lots of issues down the line. I don't see where saying something is a mistake that looks obvious in retrospect is a slight. Most things look obvious in retrospect.
Again, "seemed like it made sense" expresses doubt, in the way that "it seems safe" expressed doubt that it actually is safe.
If you really meant what you're saying now, there was no reason to add "seemed like it" in your earlier text.
> I don't see any slight
You can see things however you want. The trick is to make others understand the difference between what you say and the utterances of an ignorant blowhard, "full of sound and fury, signifying nothing."
You don't seem to understand the historical context, your issues don't make sense, your improvements seem pointless at best, and you have very firm and hyperbolic viewpoints. That does not come across as 20/20 hindsight.
P.S. I'm not the one lying here. Not only are there lies, strawmen, and all sorts of projection, but my substantive points are ignored.
"some backwards compatibility idiocy that seemed like it made sense at some point"
is obviously an attack on their judgment.
"a compelling reason to fuck over the future in favor of optimisation now"
Talk about passive-aggressive! Of course the person who wrote this does not think that there was any such "compelling reason", which leaves us with the extremely hostile accusation.
And as I've noted, the arguments that these decisions were idiotic or effed over the future are simply incorrect.
What is your preferred system? How does it affect other needs, like collation, or testing if something is upper-case vs. lower-case, or ease of supporting case-insensitivity?
In the following, the test goes from two assembly instructions to three:
    int is_letter(char c) {
        c |= 0x20;  // normalize to lowercase
        return ('a' <= c) && (c <= 'z');
    }
Yes, that's 50% more assembly, to add a single bit-wise or, when testing a single character.
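To make the count concrete, here's a sketch of the comparison (my own illustration, with a single-case range check standing in for the two-instruction form):

    #include <stdio.h>

    /* Sketch of the instruction counts being discussed: a single unsigned
       range check is typically a subtract plus one compare; folding case
       first costs one extra OR on top of that. */
    static int is_lower(char c) {
        return (unsigned char)(c - 'a') <= 'z' - 'a';           /* ~2 instructions */
    }

    static int is_letter_folded(char c) {
        return (unsigned char)((c | 0x20) - 'a') <= 'z' - 'a';  /* + one OR */
    }

    int main(void) {
        printf("%d %d %d\n", is_lower('Q'), is_letter_folded('Q'), is_letter_folded('!'));
        return 0;  /* prints 0 1 0 */
    }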
But, seriously, when is this useful? English words include an apostrophe, names like the English author Brontë use diacritics, and æ is still (rarely) used, like in the "Endowed Chair for Orthopædic Investigation" at https://orthop.washington.edu/research/ourlabs/collagen/peop... .
And when testing multiple characters at a time, there are clever optimizations like those used in UlongToHexString. SIMD within a register (SWAR) is quite powerful, e.g., 8 characters can be or'ed at once in 64 bits, and of course the CPU can do a lot of work to pipeline things, so 50% more single-clock-tick instructions does not mean 50% more work.
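A toy example of the SWAR idea (mine, not the actual UlongToHexString code; it assumes the eight bytes are already known to be letters):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Toy SWAR sketch: OR the case bit into eight ASCII bytes at once.
       Real code first masks out bytes that aren't letters; here the
       input is assumed to be A-Z/a-z only. */
    int main(void) {
        char buf[9] = "HELLOabc";
        uint64_t w;
        memcpy(&w, buf, 8);              /* load 8 characters into one register */
        w |= 0x2020202020202020ULL;      /* one OR lowercases all 8 bytes */
        memcpy(buf, &w, 8);
        printf("%s\n", buf);             /* prints helloabc */
        return 0;
    }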
> like front and back braces/(angle)brackets/parens not being convertible
I have never needed that operation. Why do you need it?
Usually when I find a "(" I know I need a ")", and if I also allow a "[" then I need an if-statement anyway since A(8) and A[8] are different things, and both paths implicitly know what to expect.
> and saved a few instructions in common parsing loops.
Parsing needs to know what specific character comes next, and parsers are very rarely limited to only those characters. The ones I've looked at use a DFA, e.g., via a switch statement or a lookup table.
I can't figure out what advantage there is to that ordering, that is, I can't see why there would be any overall savings.
Especially in a language like C++ with > and >> and >>= and A<B<int>> and -> where only some of them are balanced.
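For what it's worth, the usual shape of that code looks something like this sketch (not taken from any particular parser); note that '>' needs one-character lookahead regardless of how the brackets are encoded:

    #include <stdio.h>

    /* Sketch: a switch-based lexer dispatches on the exact character, so
       whether '(' and ')' are adjacent code points never matters, and
       '>' vs ">>" needs lookahead either way. */
    enum tok { T_LPAREN, T_RPAREN, T_LBRACE, T_RBRACE,
               T_LT, T_GT, T_SHR, T_OTHER, T_EOF };

    static enum tok next_token(const char **p) {
        int c = *(*p)++;
        switch (c) {
            case '\0': --*p; return T_EOF;
            case '(':        return T_LPAREN;
            case ')':        return T_RPAREN;
            case '{':        return T_LBRACE;
            case '}':        return T_RBRACE;
            case '<':        return T_LT;
            case '>':
                if (**p == '>') { ++*p; return T_SHR; }  /* lookahead for ">>" */
                return T_GT;
            default:         return T_OTHER;
        }
    }

    int main(void) {
        const char *src = "f(a) > b >> {c}";
        enum tok t;
        while ((t = next_token(&src)) != T_EOF)
            printf("%d ", (int)t);
        putchar('\n');  /* prints the token codes for the sample input */
        return 0;
    }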