Yes, let's move away from the standard syntax used to specify generics, which is used in virtually every widely-used programming language of the past 15-20 years I can think of, and instead redefine the most common syntax used for array indexing (oh, and make array indexing look like a function call).
This fixating on relatively inconsequential issues, while totally missing the forest for the trees, is a dangerous attribute I've found in a small subset of programmers.
I would say it's in a small subset of people who think like computer scientists rather than like software engineers. Computer scientists think about better syntax for computer languages. (I don't think that this is actually a better syntax, especially given the already-established conventions. But thinking about array indexing as a function call is, in my view, an interesting take.)
Software engineers, on the other hand, worry about how to efficiently write non-trivial programs. For that problem, this change doesn't move the needle whatsoever.
Most non-junior programmers think more like software engineers than like computer scientists. And even many who think like computer scientists can see the forest, not just the trees.
I've never seen a computer scientist put a lot of effort into syntax. The POPL people are more interested in type systems and semantics than they are in syntax.
This article completely loses me at the end where it suggests that using square brackets for assignment and collection access is an unnecessary/dangerous convenience.
It might not be the author's preference, but the argument made about it getting abandoned as the language ages anyway is demonstrably false with the most popular languages in the world (Python, JavaScript, Java, C, C++), and in languages with operator overloading it could never be true no matter how far the language evolved.
> How do you disambiguate that from a function call? It's now just abusing ()
You don't. If you apply an integer to an array, it will return an entry. To me, it makes perfect sense to use a(i) instead of a[i]. In fact, in case you need some more complicated data structure to organise your data one day, you could switch to a function call for lookup without rewriting the code that uses a(i).
The question is not 'how do you disambiguate?', but 'why do you want to distinguish?' Because an array is just a special case of a function that maps an integer to something else. It is logically not different from a function with a large contiguous switch statement.
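A minimal Python sketch of that idea, under stated assumptions: `CallableArray`, `squares_table`, `squares_fn` and `sum_first` are hypothetical names made up for illustration, not taken from the article or from any language discussed here. It only shows that once lookup goes through call syntax, the calling code cannot tell a stored table from a computed function.

```python
class CallableArray:
    """Hypothetical wrapper that makes a list indexable with call syntax, a(i)."""

    def __init__(self, items):
        self._items = list(items)

    def __call__(self, i):
        return self._items[i]


# An "array" holding the first five squares, read as squares_table(i).
squares_table = CallableArray([0, 1, 4, 9, 16])


def squares_fn(i):
    """The same mapping, computed on demand instead of stored."""
    return i * i


def sum_first(lookup, n):
    """Caller code only assumes lookup(i) works; it cannot tell which one it got."""
    return sum(lookup(i) for i in range(n))
```

Both `sum_first(squares_table, 5)` and `sum_first(squares_fn, 5)` go through the same call site, which is the "switch later without rewriting" argument in miniature.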
> To me, it makes perfect sense to use a(i) instead of a[i].
I disagree. Square brackets link to memory; parens just return data. Of course you could implement the paren function to return a reference, but it makes the code a lot harder to read since that is not the normal case. See comparison:
If it is a list, access like that is a function. Even in the context where it's an array, with the dependence we have these days on function inlining, it could still be a function.
Another way to make that argument is that arrays are an exponential type and therefore are logically equivalent to functions.
That is, the cardinality of the type Array<T> is exactly the same as the cardinality of the type Function<Integer, T>. Any pure function that takes an integer and returns T can be replaced with an array, and vice versa.
Same deal with Map<K, V>, that's logically equivalent to Function<K, V>.
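The counting argument can be checked by brute force on a tiny case. This is only a sketch under assumptions: it fixes the array length at 3 and the element type at booleans, and the helper names `all_arrays` and `all_functions` are hypothetical. Functions are represented by their lookup tables, which is exactly the point being made.

```python
from itertools import product


def all_arrays(values, length):
    """Every fixed-length array (as a tuple) whose elements come from `values`."""
    return list(product(values, repeat=length))


def all_functions(domain, values):
    """Every pure function domain -> values, each represented by its lookup table."""
    return [
        dict(zip(domain, outputs))
        for outputs in product(values, repeat=len(domain))
    ]


# Assumed toy case: arrays of 3 booleans vs. functions from {0, 1, 2} to booleans.
arrays = all_arrays([False, True], 3)
functions = all_functions([0, 1, 2], [False, True])
```

Both lists have 2 to the power 3 entries, and reading each function at 0, 1, 2 gives back exactly one of the arrays. The same construction with arbitrary hashable keys as the domain is the `Map<K, V>` case.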
Two problems:
1. People don't think in category theory; they have containers to contain things and functions to calculate things.
2. People are right, the function call really is doing something different than an array index. If a thing is different, it should look different.
You are wrong: collections are mutable and hence different from functions. Index operators are expected to return l-values while functions are not. That difference is extremely important, so it is worth the compiler overhead in order to make code easier for users to read.
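A rough Python analogue of the l-value point, with the caveat that Python has no l-values in the C++ sense: assignment through brackets is routed to a separate `__setitem__` hook, while assignment to a call is rejected by the compiler. `Grid` and `call_assignment_compiles` are hypothetical names used only for this sketch.

```python
class Grid:
    """Hypothetical container exposing both index syntax and call syntax."""

    def __init__(self, size):
        self._cells = [0] * size

    def __getitem__(self, i):       # read:  g[i]
        return self._cells[i]

    def __setitem__(self, i, value):  # write: g[i] = value
        self._cells[i] = value

    def __call__(self, i):          # read only: g(i)
        return self._cells[i]


g = Grid(3)
g[1] = 42  # index syntax is allowed on the left of '='


def call_assignment_compiles():
    """Return True if 'g(1) = 42' is accepted by the Python compiler."""
    try:
        compile("g(1) = 42", "<sketch>", "exec")
        return True
    except SyntaxError:
        return False
```

Here `call_assignment_compiles()` returns False, so in this language at least the two syntaxes really do differ in what they permit, independent of what the container does internally.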
You can't analyze this stuff with the specifics of a language in mind, or even assume there's a heap, let alone talk about what compiler optimizations apply. Because if we try to do that, what's our common basis of understanding? You could be talking about gcc and I'd be talking about clang and we'd go in circles.
If we limit ourselves to the math, we can agree that a type is a domain that contains certain values and excludes others. But we have to throw out mutability because values, by mathematical definition, are immutable.
And once we've got a clear notion of what values are, we can then talk about how many values there are in a type. And that's where I'm coming from in claiming functions are equivalent to containers.
I mean, really? Isn't the whole point of the STL to paper over those differences and supply functions with similar semantics for those kinds of accesses regardless of the data structure?
About the only reason I can see for array access syntax existing is that in the days before optimizing compilers, people would have lost their minds if array access called into a function each time (with good reason). It took manual programmer intervention to hint what compilers couldn't cleanly infer themselves. We don't need that anymore.
One is looking up a value, while the other is (in most of these languages) executing a subroutine.
While they are mathematically equivalent, in these languages they are doing different things. Function application and array access are two different concepts the symbols are trying to convey to a programmer.
This is why, e.g., in Haskell where a string is literally a list of characters, they nevertheless have a different syntax for the two.
I think .get would be more plausible. It's just strange that we can't use [] for indexing, which is an operation that can apply to any container, but << and >> must be reserved for bit-shifting.
If I'm adding shifting to a new language, I'm inclined to use functions just so it's clear what's going on, and to put operations like rotation on an even footing. Also, I don't think people have an intuition for the precedence of shift operators, which makes them less useful.
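As a sketch of what "shifts as functions" could look like, here is a small Python version. The names `shift_left` and `rotate_left` and the default 32-bit width are assumptions for illustration, not a proposal from the article.

```python
def shift_left(x, n, width=32):
    """Logical left shift within a fixed bit width (bits shifted out are dropped)."""
    mask = (1 << width) - 1
    return (x << n) & mask


def rotate_left(x, n, width=32):
    """Left rotation within a fixed bit width (bits shifted out wrap around)."""
    mask = (1 << width) - 1
    x &= mask
    n %= width
    return ((x << n) | (x >> (width - n))) & mask


# The precedence point: in Python, '+' binds tighter than '<<',
# so this is 1 << (2 + 3), not (1 << 2) + 3.
surprising = 1 << 2 + 3
```

With named functions, shift and rotate read the same way at the call site, and the precedence question never comes up because the arguments are delimited by the parentheses.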
The article proposes using basic function call syntax. That makes no sense to me. And with .(), you now have 3 characters to type instead of 2 with []. Obviously all of this is pedantic, but then again so is the article.
I think the tone of this post is insufferable, and the arguments have little merit presented this way.
The alternative suggested is also non-ideal: array declaration/indexing is semantically different from a function call, not all languages have method calls, not all languages have postfix method calls, I can think of a few ways to break that syntax (what if the type has both a call operator and an indexing operator?), and it is hypocritical.
The problem with <> is that they are used elsewhere. If you use [] by sacrificing arrays, () will cause problems because they're used elsewhere.
The lowest friction solution would be to introduce a new two character bracket. How about <: or :>? (: :)? I don't know but writing about it in that tone won't get anyone to do something different.
From a computer science standpoint it is obvious that generic declarations are just functions that accept type parameters and return a declaration according to the type parameters. So the only reasonable suggestion would be to use the same syntax for both type parameters and normal parameters and optionally add something that distinguishes between the two (like your !).
Meanwhile picking either <> or [] feels like that decision is based on personal preference. Obviously, you can justify choosing the most popular syntax because of familiarity but then [] was never an option to begin with.
I tried (lightly) to convince people to switch from <> to [] well before Rust 1.0, but even when many (not sure if it was most) agreed that square brackets were superior, there was no will to make such a radical syntactic change, for two main reasons:
1. Square brackets are also not uniformly superior. Technically, I think it’s fair to consider them uniformly superior, but social aspects are important too; there was reasonable concern that Rust had exhausted its weirdness budget. (Of the most common languages people may be familiar with when coming to Rust, I think Scala is the only one that uses square brackets for generics; the likes of C++, C# and Java are much more common and use angle brackets.)
2. For Rust specifically, it wouldn’t actually have been just a syntactic change—if you want to change array indexing to use function call syntax, you’ve got to sort out more technical challenges there, things that would be approaching trivial in full-GC languages but which are actually rather difficult in Rust. Specifically, function calls are rvalues only (`… = f()`), but array indexing can be an lvalue also (`a[i] = …`). So you need to more or less unify the Index and Fn trait families, auto-ref/-deref might cause trouble, and placement might make an appearance in it too for best results (and that’s something that still isn’t resolved). Now I’m inclined to believe that these changes would have been a really good thing and resulted in a more coherent and incidentally slightly more flexible language (and not in a dangerous way), but it would now be even more difficult to achieve (though not impossible; the edition system could be used for the syntax side of things, and Fn/Index implementations are still mutually exclusive, because you can’t implement Fn manually on stable, so you could blanket-impl Fn traits for types implementing Index traits without breaking backwards-compatibility).
I’m simplifying the story a bit. Others may fill out more history and technical detail if they desire. I’m going to bed.
Can't say I agree with most of the last paragraph. I much prefer to have my function calls easily differentiated from my array indexing / hashmap keying.
Though I do agree that array/object literals using [] and {} are probably not necessary.
My personal anecdatal argument against "Hard to parse for humans" is... I actually really like generics residing within <>. It's actually harder for me to read generics denoted otherwise (D uses parentheses).
As for the last paragraph, I'd implore language implementors to continue using < > lest we get that awfully ambiguous nonsense.
Scala uses brackets for generics and parens for array accesses. It works nicely, I think.
The primary argument against < and > as generic brackets is that the ambiguity can make for some confusing error messages. It also prevents you from pre-matching your braces before the parsing phase (a technique that enhances error recovery).
Maybe it's my bias of having been introduced to parameterized types via C++'s template mechanism, but I find that <> looks natural to me and Scala's use of square brackets feels off. In general, I find myself preferring C-derived syntax.
I managed to get them back in there with a space in between.
Input sanitizing and escaping is one aspect of the HN code I've never dug too much into. There are a lot of corner cases that don't work, but it's never been a high enough priority to fix them.
If this had been published in, like, 1982, maybe the author would've had a point. But by now we're all so used to <> for generics, especially when scanning code rapidly, that I can't imagine the meager benefits of switching would outweigh the cognitive cost.
I don’t think in 1982 they had issues with it - most problems that this article purports angle brackets to have are problems we have recently invented.
As for HN not being able to handle having them as characters in titles, one would hope that would be a “simple” matter of escaping the characters at time of submission and parsing them at time of rendering.
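For what it's worth, the round trip is short in most standard libraries. A sketch with Python's `html` module, where the title string is a made-up example rather than the actual submission:

```python
import html

# Hypothetical title containing the characters that tend to get mangled.
title = "Vec<T> & friends"

# Escape once when the text is going to be embedded in HTML...
stored = html.escape(title)

# ...and unescape only if the raw text is needed again.
restored = html.unescape(stored)
```

`stored` is `Vec&lt;T&gt; &amp; friends` and `restored` equals the original title. The hard part in practice is usually deciding at which layer this happens exactly once, not the escaping itself.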
Try using Verilog, where <= is both the non-blocking assignment operator and the less-than-or-equal operator. At first it might sound/look weird, but you can get used to it easily.
It’s all about the context; I think it would actually be more confusing to go with the author’s suggestion.
Furthermore, I can’t think of a place in code where the generic and comparison use cases of < or > would be ambiguous or even adjacent.
> Furthermore, I can’t think of a place in code where the generic and comparison use cases of < or > would be ambiguous or even adjacent.
Most languages that use angle brackets for generics (e.g. C++, Rust, Swift) also allow you to explicitly instantiate generics, which leads to a syntactic ambiguity, e.g. is
a < b > (c)
a function call or two comparisons? The usual way to disambiguate in these cases is to check whether b is plausibly a type and treat it as a generic, but this is ugly on a few different levels.
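For contrast, a language with no angle-bracket generics has only one reading available. This sketch uses Python's own parser to show that the same token sequence is accepted there as a chained comparison; the variable values are arbitrary.

```python
import ast

# Parse the ambiguous-looking expression with Python's grammar.
expr = ast.parse("a < b > (c)", mode="eval").body

# A chained comparison is a single Compare node with one operator per link.
op_names = [type(op).__name__ for op in expr.ops]

# With concrete values it evaluates as (a < b) and (b > c).
a, b, c = 1, 3, 2
result = a < b > (c)
```

`expr` is an `ast.Compare` node and `op_names` is `['Lt', 'Gt']`. A language that also allows explicit generic instantiation with the same tokens has to pick between this reading and the call reading somehow.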
> The usual way to disambiguate in these cases is to check whether b is plausibly a type and treat it as a generic
Most compilers don't even go that far: using type information during parsing is a huge no-no for almost all languages that aren't C++. So for example, TypeScript will flag:
a<b>(c)
as an error if a, b, and c are untyped, meaning it automatically resolved the generic application in the parser before type information was processed, while this is fine:
a<b>=(c)
Since the equal sign means > is part of a >= token. The parser is automatically determining if generic application was meant or not, before any type info is considered.
That's why I said "plausibly". From the TypeScript spec, it looks like it takes the Boolean grammar approach (treat it like a type expression if it's a valid type expression, otherwise don't):
The grammar ambiguity is resolved as follows: In a context where one possible interpretation of a sequence of tokens is an Arguments production, if the initial sequence of tokens forms a syntactically correct TypeArguments production and is followed by a '(' token, then the sequence of tokens is processed as an Arguments production, and any other possible interpretation is discarded. Otherwise, the sequence of tokens is not considered an Arguments production.
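A toy Python sketch of that kind of purely syntactic lookahead. This is not how the TypeScript compiler is implemented; the tokenizer, the restriction of type arguments to plain identifiers, and the names `tokenize` and `looks_like_generic_call` are all simplifying assumptions.

```python
import re

# Longest operators first, so ">=" is one token rather than ">" followed by "=".
TOKEN = re.compile(r"\s*(>=|<=|[A-Za-z_]\w*|\d+|[<>(),=])")
IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(src):
    """Split a tiny expression language into tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_ident(token):
    return IDENT.fullmatch(token) is not None


def looks_like_generic_call(src):
    """True if src starts with: ident '<' ident (',' ident)* '>' '('."""
    tokens = tokenize(src)
    if len(tokens) < 2 or not is_ident(tokens[0]) or tokens[1] != "<":
        return False
    i = 2
    while True:
        if i >= len(tokens) or not is_ident(tokens[i]):
            return False
        i += 1
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            continue
        break
    # The type-argument list must close with '>' and be followed by '('.
    return i + 1 < len(tokens) and tokens[i] == ">" and tokens[i + 1] == "("
```

Under these assumptions `looks_like_generic_call("a<b>(c)")` is True, while `"a<b>=(c)"` and `"a < b > c"` are both rejected, all without knowing whether `b` names a type.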
Right... the big weakness of this approach is the error messages, which are fairly obtuse if you ever happen to run into this corner case. Luckily, most programmers won't.
If you need compatibility with C, JS or whatever, then you can't use square brackets for generics anyway. If you don't need such compatibility, why can't you just make that a function call and tell users to avoid funny combinations of comparisons and parens?
But when I write articles, how it will be handled by various platforms is on my list of concerns. So many platforms damage things with & < > ' " in them, stripping angle brackets and maybe their contents, and possibly entity-encoding all of those characters. (I find that curly quotes are harmed less often than straight quotes these days!)
When I wrote https://chrismorgan.info/blog/rust-artwork-owl/, which has the title “<_>::v::<_>”, I thought carefully about how it would be handled by feed readers, &c. Fortunately I had already been very careful in the site implementation (e.g. I support HTML in my titles and can strip it or have a different plaintext title, so angle brackets were definitely handled correctly) and had made the deliberate decision that my feed was Atom and uses <title type="html">, so feed readers should all get it right (though doubtless some will ignore the declared semantics and butcher it); if it had been an RSS feed (which doesn’t support declaring the type of a title), some feed readers would have stripped it to “::v::” (and been justified in so doing). Reddit handled it just fine, but maybe it’s just as well I didn’t submit it to HN!
That’s the most ridiculous article I’ve read this week. At this point, the fact that I can’t input < and > with my virtual AZERTY layout on a physical Japanese keyboard could be added to it to make it more substantial.
IMO Haskell's way of defining type parameters with forall is the best option. It doesn't fit with C-style function definitions though.
It's kind of a stupid thing to rant over. Whether a language uses <> or [] is the least interesting thing about it, but that doesn't stop everyone from bikeshedding.
Personally, as a C++ adventurist I find the instantiating syntax a bit confusing, but quite tolerable.
I wonder whether anyone has attempted to use "|T|" as their template syntax... Ruby uses it in iterators (I think) and if I'm not mistaken Rust uses it with closures.
Wouldn't it be a bit cleaner to have something like the following?
|T| something (T foo) -> T {
    // do something in here
    return foo;
}
> Ruby uses it in iterators (I think) and if I'm not mistaken Rust uses it with closures.
Ruby and Rust both use them for closures (I assume Rust took them from Ruby). Nearly all of Ruby's iteration constructs are based on closures, so it makes sense that you understood them as being for iterators.
I agree with some parts of the problem (those < > chars already have other uses), but not others (guillemets are even shorter and they read just fine). I don't care for the proposed solution: it's just robbing Peter to pay Paul.
Now people are proposing digraphs and trigraphs as alternatives. Have we learned nothing from C?
I'm annoyed that (even in 2020) every programming language restricts its syntax to ASCII characters (1967) which can be typed on an IBM Model F keyboard (1981). Unicode has dozens of unique styles of brackets. Everybody's using an OS that supports Unicode, and an editor/IDE that autocompletes most of their source code anyway. We're even using programming languages which allow Unicode identifiers, so ASCII-only viewers have been in trouble for 25 years already. People are even using emoji in commit messages. That ship has sailed. Unicode is safe to use.
It could be List⟦Int⟧ or List⟬Int⟭ or List「Int」 or dozens of others. They're big, they're clear, they're easy to parse (one char, no other uses). All you need to do is pick one and make it part of the language, and update a few editor modes to support it in some templates.
It's not going to happen, but my favorite solution to the parsing ambiguities would be to parse < as a comparison if there is whitespace around it, and as a template bracket otherwise. Kind of how "puts -1" in Ruby prints -1, and "puts - 1" tries to subtract 1 from the result of puts.
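A toy sketch of that whitespace rule in Python, just to make it concrete. `classify_angle` is a hypothetical helper, and treating the start or end of the input as whitespace is an assumption of the sketch.

```python
def classify_angle(src, i):
    """Classify the '<' at index i: whitespace on both sides means comparison."""
    if src[i] != "<":
        raise ValueError("index does not point at '<'")
    before = src[i - 1] if i > 0 else " "
    after = src[i + 1] if i + 1 < len(src) else " "
    if before.isspace() and after.isspace():
        return "comparison"
    return "bracket"


comparison_src = "a < b"
generic_src = "List<Int>"
```

`classify_angle(comparison_src, 2)` gives "comparison" and `classify_angle(generic_src, 4)` gives "bracket". The cost is that `a<b` silently stops being a comparison, which is probably a large part of why it's not going to happen.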
Unfortunately, Scala caused parsing problems by allowing alphabetic function names as operators, grabbing implicit arguments from all over the place, and leaving out () in function calls (I think they reversed the latter decision).