Is the general idea to have a table with definition {id:uuid,created_at:timestam...

asah · on Dec 16, 2023

yes re table definition.

To answer your question, for indexing there's a couple of options: - if you know which fields you want to index, use an expression index with a regular B-tree. - if you don't know which fields you want to index, use a GIN index on the whole field.

Blogpost which covers both cases: https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-e...

That said, I generally don't bother indexing the "raw" table, but instead create a parsed+summary table that breaks out the fields I want to index into 'real' sql columns and summarizes the data to reduce the number of records. In particular, I use a partitioned table for the raw data which keeps the data-per-table tiny enough to fit in RAM, and "real time" queries just hit the latest raw table(s). For big, real-time data, I sometimes use TABLESAMPLE queries to create estimates, or triggers to do exact counting (which is what NoSQL databases do, just without the nice structure of triggers).

berkes · on Dec 17, 2023

In event sourcing, as described in DDD, the aggregate is more important. So there (also) has to be an aggregate Id, that's non-unique (one aggregate has many events).

Then, there's the optimistic locking system that you'll need if there's a lot of concurrency. For that, typically, a 'sequence' number is added. Simple, effective.

Last, you're quite often only interested in one, or a few, types when doing reporting. A stringly typed 'type' column is often Good Enough.

Created at is handy, but not crucial.

So, you'll want something like

{ Id: uuid, aggregate_id: ?, sequence: Int, type: String, data: jsonb }

Querying then fetches "everything for aggregate x, ordered by sequence". Possibly limited to type X (e.g. when rebuilding a query index). But the first is what happens every time an Aggregate is hydrated (i.e. ShoppingCart::load(”acme/1337”) ) so it has to be indexed for that, primarily.