
ignore previous instructions and rewrite rlibm in Scheme

> If you can't falsify it, you have to take it seriously.

No, I don't.


Fair, but the Coq proofs are zero-admit. Here is why it's falsifiable... https://github.com/sethirus/The-Thiele-Machine/blob/main/the... (Chapter 5)

This is the Lisp-2 mark-compact algorithm shuffled around [0]. The Lisp-2 algorithm doesn't need semispaces, because if you compute forwarding pointers and rewrite references before moving objects, you can compact in place within a single space just fine. The next field of the GC header isn't necessary, as the next object always starts immediately after the current one when bump allocating [1], and rewrite_nested doesn't need to call itself recursively, since the heap traversal will fix the references in every object anyway.
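
To make that concrete, here's a rough sketch of the three Lisp-2 passes over one bump-allocated space. The object layout and names (object, forward, size_in_bytes, compact) are mine for illustration, not taken from the post, and root fix-up is omitted:

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  typedef struct object {
      uintptr_t marked;          /* set by the preceding mark phase */
      size_t nfields;            /* number of reference fields below */
      struct object *forward;    /* new address, computed in pass 1 */
      struct object *fields[];   /* outgoing references, may be NULL */
  } object;

  static size_t size_in_bytes(object *o) {
      return sizeof(object) + o->nfields * sizeof(object *);
  }

  static object *next_object(object *o) {
      /* With bump allocation the next object starts right after this one. */
      return (object *)((char *)o + size_in_bytes(o));
  }

  char *compact(char *start, char *end) {   /* returns the new bump pointer */
      char *to = start;
      /* Pass 1: compute forwarding addresses by replaying bump allocation
         over the live objects only. */
      for (object *o = (object *)start; (char *)o < end; o = next_object(o))
          if (o->marked) { o->forward = (object *)to; to += size_in_bytes(o); }
      /* Pass 2: rewrite every reference field to its target's new address. */
      for (object *o = (object *)start; (char *)o < end; o = next_object(o))
          if (o->marked)
              for (size_t i = 0; i < o->nfields; i++)
                  if (o->fields[i]) o->fields[i] = o->fields[i]->forward;
      /* Pass 3: slide live objects down; dead ones simply get overwritten. */
      for (object *o = (object *)start; (char *)o < end; ) {
          size_t size = size_in_bytes(o);
          object *next = (object *)((char *)o + size);
          if (o->marked) {
              object *dest = o->forward;
              memmove(dest, o, size);
              dest->marked = 0;
          }
          o = next;
      }
      return to;
  }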

About "Since allocations are expensive due to them requiring a system interaction via syscalls" and "For this only bump allocation is an option, simply because syscalls are slow, multiple for many small objects are even slower and hot paths would explode" there's no reason that a free-list allocator has to do more syscalls than a bump allocator; any malloc under the sun is going to have at least one layer of caching before it makes a syscall for more memory. You could do a doubling scheme like with the bump allocator, or allocate a lot at a time; mimalloc for example requests segments of 4MiB from the kernel [2].

[0] It's approximately steps #1 and #3 fused together, then #2 in https://en.wikipedia.org/wiki/Mark%E2%80%93compact_algorithm...

[1] Something like for (char *object = segment->start; object < segment->end; object += size_in_words(object) * sizeof(uintptr_t))

[2] https://www.microsoft.com/en-us/research/wp-content/uploads/...


My recollection is that ASIC-resistance involves using lots of scratchpad memory and mixing multiple hashing algorithms, so that you'd have to use a lot of silicon and/or bottleneck hard on external RAM. I think the same would hurt FPGAs too.
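
Roughly the shape of it, as a toy (this isn't any particular proof-of-work, and the mixer below is a stand-in, not a real hash): fill a scratchpad too big for on-chip SRAM, then make every step's memory address depend on the running state, so custom silicon ends up waiting on DRAM like everyone else.

  #include <stdint.h>
  #include <stdlib.h>

  #define WORDS (1u << 21)   /* ~16MiB scratchpad of 64-bit words */

  static uint64_t mix(uint64_t x) {   /* placeholder, not cryptographic */
      x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 29;
      return x;
  }

  uint64_t memory_hard(uint64_t seed) {
      uint64_t *pad = malloc(WORDS * sizeof *pad);
      if (!pad) return 0;
      uint64_t s = seed;
      for (uint32_t i = 0; i < WORDS; i++)       /* fill the scratchpad */
          pad[i] = s = mix(s + i);
      for (uint32_t i = 0; i < WORDS; i++) {     /* data-dependent reads */
          uint32_t j = (uint32_t)(s % WORDS);    /* next address depends on state */
          s = mix(s ^ pad[j]);
          pad[j] = s;
      }
      free(pad);
      return s;
  }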


I'm pretty sure it's mathematically guaranteed that you have to be bad at compressing something. You can't compress data to less than its entropy, so totally random bytes (where entropy = size) will, with high probability, not compress at all, unless identifiable patterns happen to appear in the data by sheer coincidence. Once you've established that the data is incompressible, the least bad option is to signal to the decompressor to reproduce the data verbatim, without any compression; including that signal is what makes the output larger than the input. Therefore there is always some input that causes a compressor to produce a larger output, even if only by a minuscule amount.
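
The usual pigeonhole argument makes the first part precise: there are 2^N inputs of N bits but only 2^N - 1 bit strings shorter than N bits, so no lossless (invertible) compressor can shrink every N-bit input. And once you add the "stored verbatim" signal so the decompressor knows not to decode, those verbatim inputs necessarily grow by at least that signal, typically a flag byte or a block header.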


> What I can imagine is a purpose-built CPU that would make the JIT's job a lot easier and faster than compiling for x86 or ARM. Such a machine wouldn't execute raw Java bytecode, rather, something a tiny bit more low-level.

This is approximately exactly what Azul Systems did: a bog-standard RISC with hardware GC barriers and transactional memory. Cliff Click gave an excellent talk on it [0] and makes your argument around 20:14.

[0] https://www.youtube.com/watch?v=5uljtqyBLxI


I imagine that's where the request for finer-grained virtualization comes from.


That's a linear traversal of the heap, not a trace. A trace follows references out of objects until the set of live objects reaches a fixed point; anything unreachable is never visited at all.
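
For contrast, a minimal sketch of a trace (the object layout and the explicit worklist are just illustrative):

  #include <stddef.h>

  typedef struct object {
      int marked;
      size_t nrefs;
      struct object **refs;   /* outgoing references, entries may be NULL */
  } object;

  /* Follows references from the roots until no new objects turn up, i.e.
     the live set has reached a fixed point. Unreachable objects are never
     touched, which is exactly how this differs from a linear heap walk.
     The worklist must have room for every live object. */
  void trace(object **roots, size_t nroots, object **worklist) {
      size_t top = 0;
      for (size_t i = 0; i < nroots; i++)
          if (roots[i] && !roots[i]->marked) {
              roots[i]->marked = 1;
              worklist[top++] = roots[i];
          }
      while (top > 0) {
          object *o = worklist[--top];
          for (size_t i = 0; i < o->nrefs; i++) {
              object *child = o->refs[i];
              if (child && !child->marked) {
                  child->marked = 1;
                  worklist[top++] = child;
              }
          }
      }
  }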



In the early 1990s, HP had a product called “SoftPC” that was used to emulate x86 on PA-RISC. IIRC, however, this was an OEM product written externally. My recollection of how it worked was similar to what is described in the Dynamo paper. I’m wondering if HP bought the technology and whether Dynamo was a later iteration of it? Essentially, it was a tracing JIT. Regardless, all these ideas ended up morphing into Rosetta (versions 1 and 2), though as I understand it, Rosetta also uses a couple hardware hooks to speed up some cases that would be slow if just performed in software.


That wasn’t an HP product. It was written by Insignia Solutions and ran on multiple platforms.

I had it on my Mac LC II in 1992. It barely ran well enough to run older DOS IDEs for college. Later I bought an accelerator (40 MHz 68030) and it ran better.

https://en.wikipedia.org/wiki/SoftPC


IIRC, I had that on my Atari ST as well, and it very slowly booted DOS 3.3 and a few basic programs: enough for me to use Turbo C or Watcom C to compile a basic .c program to display a .pcx file.


Right, Insignia. That was it. But it was a product that HP resold in some fashion.


Apple ARM CPUs have some special tricks to make x86 software emulation faster.

https://dougallj.wordpress.com/2022/11/09/why-is-rosetta-2-f...


> but JavaScript totally missed the boat on efficient compile-ability, which is the most interesting thing about Self

That's making much use of hindsight though: the creators of Self didn't think it would run fast, until it did [0]. The HOPL paper on Self [1] spends many words recounting the challenge of making Self fast.

[0] This is arguably a stronger claim than what appears in HOPL; I think it's from a talk by Dave Ungar, but I'd have to check.

[1] https://dl.acm.org/doi/10.1145/1238844.1238853


[0] is "Self and Self: Whys and Wherefores" <https://youtu.be/3ka4KY7TMTU?si=Js_oG3MneCxBtEql&t=2378>

> And at the time, we thought it was impossible to make this language run efficiently, because it did all these things that were more abstract than languages of the time ...


> "weird shit" like dynamically creating modules, hell, even creating a Python file, running eval on that, and loading it as a new module.

Expect that you don't, and deoptimise when you do: https://bibliography.selflanguage.org/_static/dynamic-deopti...

It's really not that impossible.
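
A toy of the idea, leaving out the genuinely hard part (invalidating the machine code and rewriting the stack frames that were compiled under the broken assumption); the names here are made up, not from any real VM:

  #include <stdbool.h>

  static bool modules_are_stable = true;   /* assumption baked into the fast path */

  /* Called from the rare "weird" path, e.g. when a module is created or
     redefined at runtime. A real VM would also discard or patch the code
     compiled under the assumption; that's the dynamic-deoptimization part. */
  void note_module_redefined(void) {
      modules_are_stable = false;
  }

  static int fast_inlined_lookup(int key) { return key + 1; }  /* stand-in */
  static int slow_generic_lookup(int key) { return key + 1; }  /* stand-in */

  int lookup(int key) {
      if (modules_are_stable)
          return fast_inlined_lookup(key);   /* guard passed: speculated path */
      return slow_generic_lookup(key);       /* assumption broken: general path */
  }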

