Hi HN, I'm one of the founders of s2.dev. RePlaya (https://github.com/s2-streamstore/replaya) is a self-hosted browser session replay tool using rrweb (https://github.com/rrweb-io/rrweb).
It occurred to me that a durable stream per session would be a much neater architectural foundation for much of what you'd want from such a tool. It also makes live tailing straightforward, which I think is a distinguishing feature: the player can read from the same stream the recorder is appending to.
The alternative architecture is likely an ingest firehose which is then indexed, with associated complexity and latency. You'd have to string together multiple data systems like a message queue, a metadata database, and blob storage and/or an OLAP database.
Here the only dependency is S2, which has an open source version you can self-host called s2-lite (https://news.ycombinator.com/item?id=46708055).
How it works:
- one S2 stream per browser session
- large rrweb events (like a full snapshot) get framed across multiple binary S2 records and reassembled on read
- active sessions are tailed with an S2 read session, and bridged to the browser over SSE
- session listing relies on stream names encoding reverse timestamps, since S2 lists streams in lexicographic order, so the newest sessions come first
- fencing tokens ensure a stopped session can't be written to again by a late recorder
- retention and GC are handled via S2 stream config, so no background job needed
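
To make two of the bullets above concrete, here is a minimal sketch of reverse-timestamp stream names and of framing a large event across several records. It is not RePlaya's actual code or wire format: the `sessions/` prefix, the 13-digit key, the 8-byte frame header, and the 1 MiB record limit are all assumptions made for illustration, and no S2 API is called.

```python
import json
import struct

# Assumed ceiling for millisecond timestamps; keeps the reversed key at 13 digits.
MAX_TS_MS = 10**13 - 1
# Hypothetical per-record size limit, not S2's documented value.
MAX_RECORD_BYTES = 1024 * 1024
# Hypothetical frame header: (frame index, total frames), big-endian u32 each.
HEADER = struct.Struct(">II")


def stream_name(session_id: str, started_ms: int) -> str:
    """Newer sessions get smaller keys, so they sort first lexicographically."""
    if not 0 <= started_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reverse = MAX_TS_MS - started_ms
    return f"sessions/{reverse:013d}-{session_id}"


def frame_event(event: dict, max_record_bytes: int = MAX_RECORD_BYTES) -> list[bytes]:
    """Split one serialized rrweb event into records that each fit the limit."""
    chunk_size = max_record_bytes - HEADER.size
    if chunk_size <= 0:
        raise ValueError("record limit is smaller than the frame header")
    payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    return [HEADER.pack(i, len(chunks)) + chunk for i, chunk in enumerate(chunks)]


def reassemble(records: list[bytes]) -> dict:
    """Rebuild one event from its frames, checking order and completeness."""
    parts = []
    for expected, record in enumerate(records):
        index, total = HEADER.unpack_from(record)
        if index != expected or total != len(records):
            raise ValueError("missing or out-of-order frame")
        parts.append(record[HEADER.size:])
    return json.loads(b"".join(parts))
```

Because the key is zero-padded to a fixed width, plain string comparison agrees with numeric comparison, which is what lets a lexicographic listing double as a newest-first index.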
Curious to hear from folks on the tool or the stream-per-session model!
How does this compare to e.g. OpenReplay, which looks quite similar?
OpenReplay is much more mature and full-featured; RePlaya is just the core session capture, listing, and replay functionality. OpenReplay also has more dependencies, so self-hosting it means running a full stack: Postgres, ClickHouse, Redis, and its backend services. RePlaya is one stateless Node process plus S2 (or self-hosted s2-lite).
very cool. session replays are so crucial for understanding new features qualitatively, especially before you have enough users for useful quantitative metrics.
ballpark, how much does this cost to run?
Thanks! And agreed, session replays can be really useful for understanding user behaviour, including product edge cases.
On cost, there are two parts: running the collector Node app (I'd expect a few $ per month at low volume) and the S2 stream backend.
If you use the S2 cloud service, cost is basically just the rrweb bytes. The rates are $0.075/GiB to write, $0.05/GiB-month to store, $0.10/GiB to read back over the internet. See s2.dev/pricing.md for an agent-friendly summary.
Assuming a typical few-minute session is ~1 MiB of events, ingesting it, storing it a month, and replaying it a couple of times (unlikely!):
1k sessions/mo ≈ $0.35
10k sessions/mo ≈ $3.50
100k sessions/mo ≈ $35
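
As a rough check on that arithmetic, here is a small sketch that applies only the three per-GiB rates quoted above. The 1 MiB session size comes from the estimate above; reading "a couple of times" as exactly 2 replays and storing for exactly one month are assumptions, and `monthly_cost` is a made-up helper name.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Rates as quoted in this comment (USD).
WRITE_PER_GIB = 0.075
STORE_PER_GIB_MONTH = 0.05
READ_PER_GIB = 0.10


def monthly_cost(sessions: int, session_bytes: int = MIB,
                 replays: int = 2, months_stored: int = 1) -> float:
    """Bytes-only estimate: write once, store, and read back `replays` times."""
    gib = sessions * session_bytes / GIB
    per_gib = (WRITE_PER_GIB
               + STORE_PER_GIB_MONTH * months_stored
               + READ_PER_GIB * replays)
    return gib * per_gib


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} sessions/mo: ${monthly_cost(n):.2f}")
```

Under those assumptions this prints about $0.32, $3.17, and $31.74, a little under the rounded figures above, so treat those as a slightly conservative ballpark.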