That's all mitigated on the server. You also have access to additional escaping ...

chrismorgan · on Dec 8, 2024

> That's all mitigated on the server.

How? Please be specific, because unless you're altering the source (which I'd be very unpleasantly shocked by), your use of untagged template strings makes it fundamentally impossible to mitigate such stuff, short of forbidding characters like & and < and " from data, which would be insanity. You're mixing template and data willy-nilly.

rglover · on Dec 8, 2024

The source isn't altered (I take the component HTML, parse it, build it into a DOM, and render it). The data from the API is compiled separately (where sanitization can happen if you have untrusted user data). The two are combined together during SSR by passing the sanitized data to the component.

Developer controlled code at the component-level is assumed trusted, however, there are three mechanisms for handling escaping/sanitizing on the server:

1. There are helpers built in to the API (you can conditionally enable sanitization depending on need) to handle sanitization and helper functions built-in to the server-side to do one-off escaping/sanitization for HTML.

2. Query params are sanitized automatically to prevent injection.

3. To prevent unescaped HTML in data from breaking renders, I base64 encode it before sending it to the browser.

This approach to escaping is by design as some use cases call for sanitization (they're rendering user-generated data) while others do not.

You're welcome to audit the source and see if I missed something or make suggestions on improvements.

chrismorgan · on Dec 9, 2024

I wrote something much longer and more detailed, but I haven’t the heart to post it in the end.

This is a phenomenally bad idea. By escaping things willy-nilly you’re making life much harder, and vastly more error-prone. You can only even vaguely sanely do such things with the support of a better type system than you have available to you. You’re encouraging storing escaped HTML in the database; exposing an API that HTML-encodes all strings; you’re misusing the term “sanitise” (sanitising is fairly consistently used to mean stripping control sequences, not escaping). This is a disaster. Given that you’re working in JavaScript, you need to use something like JSX or equivalent functions or tagged template strings, so you can escape only as appropriate. As it stands, I would genuinely expect to be able to find trivial functionality and/or security bugs in practically every decent-sized Joystick site involving user input.

One specific technical bug: https://github.com/cheatcode/joystick/blob/canary/node/src/l... tries to escape &<>"'`= but will replace backticks and equals signs with “undefined”. Defining HTML_ENTITY_MAP in a separate file was a mistake; you’ll hopefully never refer to it anywhere else, and separating it made such an error easier to make. (Incidentally, backtick and equals sign have never been useful things to escape in HTML. Truthfully, & and < are all you need in HTML text, and & and " in double-quoted attribute value. If you want to support XML, then escape < and > everywhere as well. But nothing beyond that is useful.)

rglover · on Dec 9, 2024

I updated the escape_html() method to cover the missing characters/cases and centralized the entity map to the function definition. Appreciate you pointing that out. I went beyond what you suggested in terms of characters which should cover most (if not all) problem cases.

Here's an example repo: https://github.com/cheatcode/escape_test

And example text I used to test: https://gist.github.com/rglover/c75e37b526d864313b461d38195b...

> You’re encouraging storing escaped HTML in the database

You can store unescaped HTML in the database and upon retrieval, escape it (see the linked demo repo above). You hypothetically could store escaped HTML, but you'd have to show me where I'm encouraging it (I think what you're saying is that because you can do it, I'm encouraging it).

> As it stands, I would genuinely expect to be able to find trivial functionality and/or security bugs in practically every decent-sized Joystick site involving user input.

You're welcome to audit it, but please, actually test it by building something and then provide an example repo. Make sure to use the canary release as that has the latest changes. If you email me (ryan.glover@cheatcode.co) I can give specifics on how to get set up.

---

FWIW, this isn't a productive way to point out issues with a project. A PR with comments is preferred.

chrismorgan · on Dec 9, 2024

Sorry, but it’s like the reaction to https://news.ycombinator.com/item?id=42359067: “this is neither a new idea nor a good one”. Ad-hoc escaping for serialisation is just fundamentally a really bad approach, and although it used to be how almost everything did it, nothing serious has tried working that way for at least a decade, probably closer to two by this point, because as an industry we’ve actually learned things.

Your particular angle on it might be novel, I’m not certain, but only because it’s an even worse idea. Formerly, there was a simple rule: remember to escape data when emitting HTML. You’ve replaced that with something complex: some things will be entity-encoded, and others won’t; and often you won’t want things entity-encoded. (You’re inappropriately tying API to HTML, by the looks of it.) This is just a bad abstraction that makes errors sure, and errors will lead to some mangled text in all cases, and security bugs in some cases.

It’s very similar to how the consensus is now well-established (though it’ll take plenty more time to be fully applied) in memory safety, a related security field: avoid languages like C, they’re too dangerous, use memory-safe languages instead.

Another well-established principle: parse at input, let your system deal with data, and serialise at output. Your approach instead serialises at input—and that only most of the time—and hopes you never need to parse.

I’m certain that I’ve upset you, and I’m sorry about that, but the approach really, truly is that bad, and I did just want to appeal to you to think it over. Look, if you can find someone experienced in web security and frontend framework design and things like that, preferably who’s been doing this for fifteen or twenty years, present the whole design to them, and see what they say. I genuinely expect horror.

As for your changes to escape_html, I’ll just ask why? because you’ve compounded the error. There is no plausible purpose for all the additions you’ve made. By encoding, you declare intent to feed the value to an HTML or XML parser in data state, single-quoted attribute state or double-quoted attribute state. In none of those states do any of these additions achieve anything. And if you’re making them for some other purpose, then you don’t want the HTML escaping in the first place.

Fifteen years ago, I would encounter both double-escaped HTML and entity-encoding in non-HTML contexts, far too often. Now, I feel like it’s years since I’ve seen such a thing, outside of text/plain parts of emails (but developers and marketers are frequently awful about text/plain parts).

The only advantage I can see in entity-encoding more characters, especially all normal punctuation, is that bad escaping is going to be caught earlier. But I say, if you reckon those worth escaping, why not go the whole hog and .replace(/./ug, s => '&#' + s.codePointAt(0) + ';')?

I’m not sure what your experience is, but the impression I’m getting is that you’ve been largely just JavaScript. A lot of the problems you’re producing were best-known through PHP, and better type systems are a large part of the best solutions to these things. If I’m sounding you out correctly, I’d recommend that you look into and learn some strong statically-typed language, preferably one with algebraic data types (also called sum types). Rust has been my own preference for the last decade, but there are plenty of others that would guide you in similar lessons, like (in no particular order) C♯, Swift, Elixir, Kotlin, Scala, Haskell. (I don’t include TypeScript in this list, although it could definitely help with some of the lessons, because it’s too compromised by JavaScript limitations.) If you haven’t had any of this sort of experience, getting it will help you to write better JavaScript, and make better systems.

—⁂—

Now, for a specific matter:

> You hypothetically could store escaped HTML, but you'd have to show me where I'm encouraging it

<https://docs.cheatcode.co/joystick/node/escape_html#:~:text=...>

(And if you unconditionally entity-encode query string parameters… I don’t even want to think about the implications of that.)

I would also point out that https://docs.cheatcode.co/joystick/node/app/api/sanitization says:

> Sanitization is the process of scrubbing unexpected or unwanted HTML tags from the values returned by your API.

But this is not what you’ve implemented. You don’t actually scrub, you entity-encode.

rglover · on Dec 9, 2024

After all of this, it's not terribly clear what I'm doing (aside from making escaping only through a one-off function) that's "bad." You mention "a lot of the problems you're producing" but then fail to give a what or why those things are problems.

If there's something specific that you can point me toward, I'll take a look and work on improvements. But honestly, the majority of the above reads like you just want me to feel bad and quit (and again, doesn't provide a clear answer to "what to do instead"). If you can succinctly explain why my approach is bad and offer examples of fixes (in relation to the current design of the component system), I'm all ears.

rglover · on Dec 9, 2024

After doing a bit of digging and back-and-forth with an LLM, I think I understand the problem now.

By doing escaping solely on the server, it creates two problems:

1. You're leaving escaping up to good behavior which will inevitably lead to some value that needs escaping being missed.

2. You create a double-escape problem on the client as you may need to render the raw output (e.g., rich text) and a server-escaped value would require an "unwinding" on the client.

Beyond this, I also corrected what you pointed out in the docs as that was the opposite of what I intended for that function.

I think the approach will be to do some automatic escaping on the client and then pass some special functions to the render() method on components to allow rendering raw, unescaped user data (i.e., a dangerouslySetInnerHTML equivalent).

If there's anything else specific that you'd add, let me know.

---

You seem like you have a lot of knowledge, but in the future, if you want someone to listen and comprehend what you're saying, avoid the fatalistic, theatrical language and emphasis.

Your point is incredibly helpful and showed me something I wasn't thinking about, but the way you communicated it almost made me ignore it.

If the above isn't correct (or misguided), it'd be best if you just plainly explain why and what should be done instead (and not a wholesale "this could never work, burn it down" response).