
I apologize for the late response, but here is a challenge for you.

Try to parallelize a simplified form of Applied Energistics.

Applied Energistics is a mod that lets you create an item transportation network. There are storage containers and machines with an inventory (for the sake of simplicity, assume each holds exactly one item, and the machines just turn A into B, B into C, and C into A). The network interacts with inventories through interfaces. A storage interface makes items in that inventory accessible to every machine. Machines receive inputs through exporter interfaces and send outputs through importer interfaces.

It is effectively a database for items, and that is exactly what makes it difficult to parallelize. The vast majority of games have 1:1 interactions between entities; in this system, interactions can be as bad as n:m. That is also why it lags so badly with large networks: a hundred machines periodically scan hundreds of inventories.



So, firstly, I'm not entirely clear on what you're asking for here. If you just need to transfer ownership in parallel between different machines, the answer is to use a channel from one machine to the other. There are very efficient channels provided in crossbeam, and we commonly use them for tasks like this. If a channel between every machine would be too costly, a hub-spoke model can be used pretty easily, with routing performed between regions.
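As a rough sketch of the channel approach (using std::sync::mpsc so the snippet is self-contained; crossbeam's channels have the same shape but perform better), with a hypothetical Item type for the A -> B -> C -> A cycle:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical item kinds for the A -> B -> C -> A cycle.
#[derive(Debug, PartialEq)]
enum Item { A, B, C }

fn transform(item: Item) -> Item {
    match item {
        Item::A => Item::B,
        Item::B => Item::C,
        Item::C => Item::A,
    }
}

fn main() {
    // One channel into the machine, one out of it. Sending an Item
    // moves ownership, so no lock on the item itself is ever needed.
    let (input_tx, input_rx) = mpsc::channel();
    let (output_tx, output_rx) = mpsc::channel();

    let machine = thread::spawn(move || {
        // The machine owns whatever arrives, until it sends it onward.
        for item in input_rx {
            output_tx.send(transform(item)).unwrap();
        }
        // output_tx is dropped here, closing the downstream channel.
    });

    input_tx.send(Item::A).unwrap();
    input_tx.send(Item::B).unwrap();
    drop(input_tx); // close the channel so the machine thread exits

    let results: Vec<Item> = output_rx.iter().collect();
    machine.join().unwrap();
    assert_eq!(results, vec![Item::B, Item::C]);
}
```

In a hub-spoke variant, the per-machine channels would instead all feed a router that forwards items between regions.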

Similarly, fine-grained parallelism can be employed by storing each storage container behind a mutex or reader-writer lock, or even avoiding locking entirely and just using copy-on-write to update the item state when it is changed (we can either do this by executing all our state changes for each tick at once, in parallel, using Arc::make_mut, which is usually fastest, or if we need to do it asynchronously by using a crate like arcswap, which is slower). This is less efficient than a channel, but it has the advantage that the current inventory of a machine can be read without extracting the item (something you didn't specify as a requirement, but which I'm including for completeness).
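A minimal sketch of the copy-on-write variant with Arc::make_mut, using a hypothetical one-slot Inventory type (the readable-without-extracting property falls out of the reader holding its own Arc):

```rust
use std::sync::Arc;

// Hypothetical per-machine inventory: at most one item in its slot.
#[derive(Clone, Debug, PartialEq)]
struct Inventory {
    slot: Option<String>,
}

// Apply a state change without blocking readers: make_mut clones the
// inventory only if another Arc (a concurrent reader) still holds it.
fn set_slot(inv: &mut Arc<Inventory>, item: &str) {
    Arc::make_mut(inv).slot = Some(item.to_string());
}

fn main() {
    let mut inv = Arc::new(Inventory { slot: None });

    // A reader grabs a snapshot and can inspect it freely, lock-free.
    let snapshot = Arc::clone(&inv);

    // Because `snapshot` is still alive, make_mut clones under the hood,
    // so the reader's view is untouched (copy-on-write).
    set_slot(&mut inv, "iron_ingot");

    assert_eq!(snapshot.slot, None);
    assert_eq!(inv.slot, Some("iron_ingot".to_string()));
}
```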

Note from what I said previously that we don't actually need to continuously scan inventories for updates at all. The obvious optimization to perform is instead to have channel writes push changes directly to a change queue (this can be parallelized or sharded with some difficulty, but from experience a single channel usually suffices). The change queue can then be read or routed (in parallel or otherwise) to the appropriate storage devices to deliver its payload. If need be (since you haven't given a lot of details), we can also track which storage interfaces are being read by players, and each tick (in parallel) iterate through any players attached to the interface to notify them of new updates to that interface. There are other crates that automatically implement the incremental updates I mentioned, such as Frank McSherry's https://github.com/TimelyDataflow/timely-dataflow, for when you have something more complex to do; however, I have never had to reach for this because (which is why I wrote this post) it's actually uncommon to have something super complicated to parallelize!
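Sketching the change-queue idea with a hypothetical ItemDelta event (std::sync::mpsc again standing in for a faster channel crate): machines push deltas as they happen, and once per tick the queue is drained into storage, so nothing ever scans inventories.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Hypothetical change event: which storage cell to update, and with what.
struct ItemDelta {
    storage_id: u32,
    item: &'static str,
    count: i64,
}

// Drain the queue once per tick and apply every delta, instead of
// scanning every inventory looking for changes.
fn apply_deltas(
    rx: &mpsc::Receiver<ItemDelta>,
    storage: &mut HashMap<(u32, &'static str), i64>,
) {
    for delta in rx.try_iter() {
        *storage.entry((delta.storage_id, delta.item)).or_insert(0) += delta.count;
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut storage = HashMap::new();

    // Machines push changes at the moment they happen.
    tx.send(ItemDelta { storage_id: 0, item: "cobble", count: 64 }).unwrap();
    tx.send(ItemDelta { storage_id: 0, item: "cobble", count: -32 }).unwrap();
    tx.send(ItemDelta { storage_id: 1, item: "sand", count: 16 }).unwrap();

    apply_deltas(&rx, &mut storage);
    assert_eq!(storage[&(0, "cobble")], 32);
    assert_eq!(storage[&(1, "sand")], 16);
}
```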

From what I understand, this does not sound like it has nearly the complexity of a database :) The major thing that makes databases harder to parallelize (though to be clear--they parallelize extremely well!) is not knowing in advance which transactions will arrive. In this case, though, we have perfect forward knowledge of what kinds of transactions there could be; the only things we would likely want to serialize are attaching and detaching storage interfaces, and we can batch those up very easily on each tick due to the relatively "low" concurrent transaction count (keep in mind that some databases can process millions of transactions per second on a 16-core machine). And even if we did need to parallelize attaching and removing storage interfaces, it's not a strict requirement that we do it serially--crates like dashmap provide parallel reads, insertions, and deletions, and are basically an off-the-shelf, in-memory key-value database.
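To illustrate the kind of structure such a concurrent map provides, here is a toy sharded map (a deliberate simplification of what dashmap does internally, not its API): operations on different shards never contend, so parallel attach/detach of interfaces is cheap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

// A toy sharded map: each shard has its own reader-writer lock, so
// threads touching different shards proceed fully in parallel.
struct ShardedMap {
    shards: Vec<RwLock<HashMap<String, u64>>>,
}

impl ShardedMap {
    fn new(n: usize) -> Self {
        Self { shards: (0..n).map(|_| RwLock::new(HashMap::new())).collect() }
    }

    // Pick a shard deterministically from the key's hash.
    fn shard_for(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.shards.len()
    }

    fn insert(&self, key: &str, value: u64) {
        self.shards[self.shard_for(key)]
            .write()
            .unwrap()
            .insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.shards[self.shard_for(key)].read().unwrap().get(key).copied()
    }
}

fn main() {
    let map = ShardedMap::new(8);
    map.insert("storage_interface_1", 64);
    assert_eq!(map.get("storage_interface_1"), Some(64));
    assert_eq!(map.get("missing"), None);
}
```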

Finally, the kind of load you're talking about (hundreds of machines and hundreds of inventories) does not sound remotely sufficient to lag the game if it's optimized well, particularly since if we did do the naive scan strategy, it parallelizes easily (to see why: each scan tick, we first parallelize all imports into storage, then parallelize all scans from storage).
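The two-phase naive scan might look like the sketch below, with a deliberately toy storage model (a single shared item count) so the phase structure stays visible; std::thread::scope (Rust 1.63+) stands in for a real work-stealing pool, and the scope join acts as the barrier between phases.

```rust
use std::sync::Mutex;
use std::thread;

// Phase 1: every machine's output is imported into storage, in parallel
// over chunks of machines. One coarse lock per chunk for brevity;
// per-slot locks or per-thread accumulators would remove the contention.
fn import_phase(machine_outputs: &[u64], storage: &Mutex<u64>) {
    thread::scope(|s| {
        for chunk in machine_outputs.chunks(25) {
            s.spawn(move || {
                let total: u64 = chunk.iter().sum();
                *storage.lock().unwrap() += total;
            });
        }
    }); // scope join = barrier: all imports land before any scan starts
}

// Phase 2: every machine scans storage, in parallel. Reading an
// immutable snapshot needs no locking at all.
fn scan_phase(snapshot: u64, n_machines: usize) -> Vec<u64> {
    let mut seen = vec![0u64; n_machines];
    thread::scope(|s| {
        for slot in seen.iter_mut() {
            s.spawn(move || *slot = snapshot);
        }
    });
    seen
}

fn main() {
    let machine_outputs = vec![1u64; 100]; // 100 machines, one item each
    let storage = Mutex::new(0u64);

    import_phase(&machine_outputs, &storage);
    let snapshot = *storage.lock().unwrap();
    let seen = scan_phase(snapshot, 8);

    assert_eq!(snapshot, 100);
    assert!(seen.iter().all(|&x| x == 100));
}
```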

I suspect the problem here is not that the challenge you've provided is difficult to parallelize, or that it implements the functionality of a database or is M:N (by the way--something that is M:N in a genuinely hard-to-address way is entity-entity collisions!), but that the solution is designed in a very indirect way on top of existing Minecraft mechanics. As far as I can tell from what I've read about Redstone, it's completely possible to parallelize for most purposes to which it's put, since blocks can only update other blocks in very limited, local ways on each tick--it might even be amenable to GPU optimizations (in our own game, we would make sure that updates commuted on each tick to avoid needing to serialize merging operations on adjacent Redstone tiles). However, I could easily be misunderstanding both what you're asking for and how Redstone works. If this is the case, please let me know!

Even more speculatively: I think a lot of game designers, when they think about parallelizing something, think about doing it in the background, or running things concurrently at different rates. While this can be done, it is primarily useful for performing a long-running background operation without blocking the game, not for improving the game's overall performance! In fact, running in the background in this way is often slower than just running single threaded, especially if it interacts with other world state. Many game developers therefore conclude that the task can't be profitably parallelized and move on.

But the best (and simplest) solutions often involve keeping a sequential algorithm, but rewriting it so that each step of the algorithm can be performed massively in parallel, as in several of the possible solutions I outlined above. This is the bulk synchronous parallel model, which is the most commonly used parallelization strategy in HPC environments, and is also the primary parallelization strategy for GPU programming. It allows mixing fine-grained state updates with partitioning to maximize your utilization of all your cores, and because you're parallelizing a single workload and partitioning by write resources, it usually has far less contention than if you were trying to parallelize many workloads at once, each hitting the same stuff. This is the model we almost always turn to for parallelizing things unless it's extremely obvious that we don't want them blocking a tick (like chunk generation, for example), and it reliably gives us great speedups without making the algorithm incomprehensible.
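A minimal bulk-synchronous tick under toy assumptions (machine states encoded as 0/1/2 for the A -> B -> C -> A cycle, double-buffered): each thread owns a disjoint slice of the *write* buffer, so the parallel step needs no locks, and the scope join is the barrier between ticks.

```rust
use std::thread;

// One bulk-synchronous tick: read `current`, write `next`. Partitioning
// by write resource (disjoint chunks of `next`) means no locking at all.
fn tick(current: &[u8], next: &mut [u8]) {
    let n_threads = 4;
    let chunk_len = (current.len() + n_threads - 1) / n_threads;
    thread::scope(|s| {
        for (in_chunk, out_chunk) in
            current.chunks(chunk_len).zip(next.chunks_mut(chunk_len))
        {
            s.spawn(move || {
                for (src, dst) in in_chunk.iter().zip(out_chunk.iter_mut()) {
                    *dst = (src + 1) % 3; // A -> B -> C -> A
                }
            });
        }
    }); // scope join: the barrier before the next tick
}

fn main() {
    let mut current = vec![0u8; 1000]; // 1000 machines, all holding "A"
    let mut next = vec![0u8; 1000];
    for _ in 0..4 {
        tick(&current, &mut next);
        std::mem::swap(&mut current, &mut next); // double buffering
    }
    // After four ticks, every machine has cycled A->B->C->A->B.
    assert!(current.iter().all(|&x| x == 1));
}
```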



