Doing my own syntax highlighting (finally) (alexwlchan.net)
31 points by speckx 77 days ago | 7 comments


I like the concept. I don't buy all the author's personal choices, though.

I think the highlighting should serve 2 purposes: 1. help you parse the code, and 2. put more or less emphasis on some elements.

Comments have a primary role when reading someone's code, so they deserve a distinguished color by virtue of point 2 above.

Strings are sometimes difficult to parse correctly because the symbol that starts them is the same one that ends them, so item 1 above applies.

And variable definitions have a tendency to hide in plain sight, despite being crucial to understanding a piece of code, so they match both criteria 1 and 2.

But numbers, booleans and constants can't possibly be mistaken for anything else, nor do they need to stand out more than the rest, so why highlight them?

Deemphasizing punctuation might be a good idea: I'd probably give the same treatment to some common boilerplate, too, like #include in C/C++, #[derive] in Rust, etc.

Finally, many languages make it hard to tell types and variables apart. Therefore I'd argue that types deserve their own coloring, following reason 1 above.

To add a final nitpick, the two "use" statements in the example define two symbols, "FilterType" and "Error". I think only these two words should be highlighted in blue, not the rest of the hierarchy.
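
To make that nitpick concrete, here is a minimal Python sketch that picks out the one word in a simple Rust `use` statement that actually names the new symbol, so that only that span would be colored. The function name and the import paths used to exercise it are hypothetical (the comment above only names the final symbols), and grouped imports such as `use a::{b, c};` and globs are deliberately not handled.

```python
import re

# Matches simple statements such as `use a::b::C;` or `use a::b::C as D;`.
# Grouped imports (`use a::{b, c};`) and globs (`use a::*;`) are out of scope.
USE_RE = re.compile(
    r"^\s*(?:pub\s+)?use\s+([A-Za-z_][\w:]*?)(?:\s+as\s+(\w+))?\s*;"
)


def defined_symbol(line):
    """Return (name, start, end) for the name a simple `use` defines, else None."""
    m = USE_RE.match(line)
    if m is None:
        return None
    if m.group(2):
        # With an `as` alias, the alias is the name that gets defined.
        return m.group(2), m.start(2), m.end(2)
    path = m.group(1)
    name = path.rsplit("::", 1)[-1]  # last path segment
    start = m.start(1) + len(path) - len(name)
    return name, start, start + len(name)
```

A highlighter could color only `line[start:end]` in the definition color and leave the rest of the path in the default color.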


I've long felt this whole concept is a waste of time. I've been using a very simple rule since shortly after I started programming: comments are green. That's it. That's the rule. Distinguish code from stuff that isn't code, and otherwise just read the damn thing. No highlighting at all would be fine too. Playing games with rainbows makes everything take more effort to read, not less. You ever notice how you do just fine parsing syntax intuitively and subconsciously as, for example, you're reading this paragraph? We already spend our entire lives training the skill of instantly understanding syntax when reading un-highlighted text.
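
For what it's worth, the "comments are green" rule is small enough to sketch in a few lines. This is only an illustration for Python source, using the standard `tokenize` module and ANSI terminal colors; the function name `green_comments` is made up here, and the sketch assumes the input tokenizes cleanly (`tokenize` raises on things like unclosed brackets).

```python
import io
import tokenize

GREEN = "\033[32m"  # ANSI escape: green foreground
RESET = "\033[0m"   # ANSI escape: reset attributes


def green_comments(source):
    """Return `source` with only its comments wrapped in ANSI green."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # A Python comment always runs to the end of its line, so the
    # starting column per row is all we need to remember.
    starts = {
        tok.start[0]: tok.start[1]
        for tok in tokens
        if tok.type == tokenize.COMMENT
    }
    out = []
    for row, line in enumerate(source.splitlines(keepends=True), start=1):
        if row in starts:
            col = starts[row]
            body = line.rstrip("\r\n")
            newline = line[len(body):]
            line = body[:col] + GREEN + body[col:] + RESET + newline
        out.append(line)
    return "".join(out)
```

Because it relies on the tokenizer rather than searching for `#`, a `#` inside a string literal is left alone.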


> Syntax highlighting is mostly a matter of taste

To quibble a bit: Color choice is mostly a matter of taste, but highlighting itself is a matter of workflows. Highlighting syntax in particular just happens to be a default most people find acceptable.

In other circumstances, the user may benefit from coloring by variable data-type, or coloring by distinct variable name, or coloring by scope, etc. Often IDEs will keep the font color, and use other channels like a highlight-box around some text, or gutter-icons.


A more sensible text on code presentation is Rougier's On the design of text editors [0]. But I've found I don't really need a lot of syntax highlighting. My preference is for these distinctions:

- code vs comments

- builtins vs other symbols

- maybe strings

But I like to rely on whitespace (blank lines and indentations) more than colors these days.

[0]: https://arxiv.org/abs/2008.06030


> he suggests colouring just a few key elements, like strings, comments, and variable definitions. I don’t know if that would work for everybody, but I like the idea

I don't like this trend-copying at all. The post he's referring to was probably written by someone with light sensitivity.


I kind of like it, as it shifts the focus from syntax to logic. In furtherance of that framing, some changes I'd include:

- make all literals the same color

- make uses of defined items colored as a lighter color of the definition

- still color conditionals and loops

- use JS so that hovering over an item highlights all its other uses
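
The "lighter color of the definition" idea can be sketched as a simple blend toward white. This is one possible approach, not anything from the article: the helper name `lighten` and the example color are hypothetical, and on a dark background you would probably want to blend toward the background color instead.

```python
def lighten(hex_color, amount=0.5):
    """Blend an `#rrggbb` color toward white.

    amount=0 returns the color unchanged; amount=1 returns white.
    """
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))

    def mix(channel):
        # Move the channel `amount` of the way from its value to 255.
        return round(channel + (255 - channel) * amount)

    return "#{:02x}{:02x}{:02x}".format(mix(r), mix(g), mix(b))
```

A theme could then store one color per definition kind and derive the "use" color with, say, `lighten(definition_color, 0.4)`.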


I don't think syntax highlighting is supposed to make any particular things stand out from "the rest" of the code. Every token of code is important at some point in the process; otherwise it wouldn't be there. So there's no "rest" of the code to stand out from.

Rather, the point of syntax highlighting (IMHO) is to accomplish three closely-related goals:

1. to insert obvious boundaries wherever the syntactic category of the lexeme stream changes, by changing color. This is why political maps color each country differently — it outlines what region of the map is in what country. (Note that, on its own, you don't need any given region to have any stable assigned color to achieve this effect. Political maps are often colored using the four-color theorem. Code could be too, if this is all you wanted to achieve.)

2. to create a scannable visual index, with the colors serving as syntactic categories, allowing your eyes to jump around the screen, or scan the file while scrolling, "by syntactic category." (That is: to re-anchor your eyes on a line that contains the identifier `foo`, without syntax highlighting, you'd have to either read the file line-by-line; or remember "where you left" the line by the relative shapes of the lines on-screen; or literally search for `foo` in your editor. But if `foo` is an identifier, and identifiers have their own distinct color, then you can glance around the screen for all the tokens that have been syntax-highlighted as identifiers — and then, as your eye lands on each identifier, you just check whether it says `foo`.) This is a reflex you pick up after reading a lot of code in a stable syntax-highlighting scheme; you might not even be aware you do this!

3. to induce in the user a sort of syntax-category<=>color synesthesia, where you can learn to spot problems in the code simply by noticing that something is the wrong color; or that you expected a token of a certain color to be present, but it's not (this is why parens+brackets+braces are often each given their own distinct highlight color). Basically the inversion of #2.
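
Goal 1 on its own (boundaries without stable colors) can be illustrated with a tiny sketch. It is not a four-color-theorem implementation: a token stream read left to right is one-dimensional, so flipping between two colors at every category change is already enough to mark each boundary. The function name and the category labels below are made up for the example.

```python
def boundary_colors(categories, palette_size=2):
    """Give each token a palette index that changes exactly when the
    syntactic category changes from one token to the next."""
    colors = []
    current = 0
    previous = None
    for category in categories:
        if previous is not None and category != previous:
            # Category boundary: step to the next palette entry.
            current = (current + 1) % palette_size
        colors.append(current)
        previous = category
    return colors
```

Note that the same category can land on different colors at different places in the stream, which is exactly why this gives you goal 1 but none of the scanning or "wrong color" benefits of goals 2 and 3.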

You really only get any of these benefits to the degree that your syntax highlighting is [as the author puts it] "christmas lights diarrhea." You immediately lose benefit #1 as soon as any two syntactic categories are the same color. And you lose benefits #2 and #3 more and more as fewer things have their own distinct highlight colors.

"Fully" colorized code might be ugly as hell to just read; but when you're actually writing it, it's ergonomic.



