Morphium 6.2.5: The One-Document Batch That Cost 20 Seconds
Morphium 6.2.5 is out on Maven Central. On paper it's a quiet patch release: a handful of fixes, a couple of small features. But one of those fixes came from a bug whose story is worth telling, because it's the kind that hides in plain sight until exactly the wrong conditions line up.
The 20-second mystery
The symptom was maddeningly intermittent: every so often, a sendAndAwaitAnswers() call in a messaging-heavy service would take twenty seconds to come back, instead of the usual handful of milliseconds. No error, no timeout, no stack trace — just a request that occasionally fell off a cliff and then recovered on its own.
It never reproduced locally. On localhost everything was instant. The only place it showed up was over a high-latency link — in this case a service reaching its MongoDB through an SSH/SOCKS tunnel with a few tens of milliseconds of round-trip time.
We ended up measuring it live against production, and the root cause turned out to be a single hardcoded number.
batchSize = 1
Morphium's messaging and watch() are built on MongoDB change streams. The change stream cursor was created with a getMore batch size of 1 — exactly one event per round-trip to the server.
At low latency that's invisible. At low traffic it's invisible too. But combine a busy stream with a slow link and the maths turns ugly: with one event per round-trip, the stream can deliver at most one event per RTT. At 30 ms RTT that's about 33 events per second — and if messages arrive faster than that, the cursor builds a backlog it can only drain at 1/RTT. The events still arrive, just later and later, until traffic dies down and the cursor finally catches up. The answers awaited by sendAndAwaitAnswers() were sitting in exactly that backlog. Hence: twenty seconds, then fine again.
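To make the arithmetic concrete, here is a toy model of the effect. It is not Morphium code, and the arrival rate (50 events/s) and burst length (10 s) are assumptions picked purely for illustration; only the 30 ms RTT and the batch sizes come from the text above.

```java
/** Toy model of a change stream cursor draining events over a slow link. */
class BacklogModel {

    /** Upper bound on delivery rate: at most batchSize events per round-trip. */
    static double maxEventsPerSecond(int batchSize, double rttMillis) {
        return batchSize * 1000.0 / rttMillis;
    }

    /**
     * Steps through consecutive round-trips. During each one, new events arrive
     * at arrivalsPerSecond, then a single getMore delivers up to batchSize of them.
     * Returns the number of events still waiting after durationMillis.
     */
    static long backlogAfter(int batchSize, double rttMillis,
                             double arrivalsPerSecond, double durationMillis) {
        double backlog = 0;
        for (double t = 0; t < durationMillis; t += rttMillis) {
            backlog += arrivalsPerSecond * rttMillis / 1000.0; // produced during this round-trip
            backlog -= Math.min(backlog, batchSize);           // delivered by this round-trip
        }
        return Math.round(backlog);
    }

    public static void main(String[] args) {
        double rtt = 30;       // ms, the RTT used in the text
        double arrivals = 50;  // events/s, a hypothetical burst rate
        double burst = 10_000; // ms, a hypothetical burst length

        System.out.printf("max rate at batchSize=1: %.1f events/s%n", maxEventsPerSecond(1, rtt));
        long backlog = backlogAfter(1, rtt, arrivals, burst);
        System.out.println("backlog after burst, batchSize=1:   " + backlog);
        System.out.println("backlog after burst, batchSize=100: " + backlogAfter(100, rtt, arrivals, burst));
        // Once arrivals stop, the leftover backlog drains at one event per round-trip.
        System.out.printf("time to drain at batchSize=1: %.1f s%n", backlog * rtt / 1000.0);
    }
}
```

Under these assumed numbers, a batch size of 1 leaves 167 events queued at the end of the burst, which take roughly five more seconds to drain, while a batch size of 100 leaves none. The real incident involved different traffic, so treat the figures as a shape, not a measurement.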
The reason batchSize=1 was there in the first place is the best part. It was a deliberate workaround for a multi-document-batch hang in the old watch() implementation — load-bearing at the time. After the change stream rewrite in 6.x, that hang no longer reproduces. The workaround had quietly outlived the problem it was solving.
The fix
The batch size is now configurable, and defaults to 100 instead of 1:
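// Set it in the driver settings (default is 100)
cfg.driverSettings().setChangeStreamBatchSize(500);

// Or set it on an individual ChangeStreamMonitor before starting it
ChangeStreamMonitor mon = new ChangeStreamMonitor(morphium);
mon.setBatchSize(500);
mon.start();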
Because awaitData returns as soon as the first event is available, a larger batch adds no latency at all when traffic is low — a single round-trip simply gets to drain many backlogged events when traffic is high. The effective batch is still bounded by MongoDB's ~16 MB per-reply limit regardless of the configured count.
The lesson, if there is one: watch out for load-bearing defaults. A value chosen to dodge a long-gone bug can sit untouched for years and then turn into a production incident the moment the environment changes around it.
Everything else in 6.2.5
The rest of the release is a solid round of fixes and a couple of features:
Atlas / DNS
- mongodb+srv:// URLs now resolve the companion TXT seedlist record, so authSource and replicaSet from Atlas are picked up automatically instead of having to be set by hand (#169). Explicit configuration always wins.
- The SRV resolver now only falls back to public DNS (8.8.8.8 / 1.1.1.1) as a genuine last resort: when system name servers exist, they're treated as authoritative. This fixes wrong results and per-server timeouts in split-DNS / private-Atlas / firewalled setups (#170).
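As a rough illustration of the "explicit configuration always wins" rule, the sketch below parses a TXT record value (which, for mongodb+srv, carries URI options such as authSource and replicaSet joined by `&`) and overlays explicitly configured options on top. The class and method names are hypothetical and this is not Morphium's resolver; the actual DNS lookup is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: merge options from an SRV companion TXT record with explicit settings. */
class SrvTxtOptions {

    /** Parses a TXT value such as "authSource=admin&replicaSet=rs0" into key/value pairs. */
    static Map<String, String> parseTxt(String txt) {
        Map<String, String> options = new LinkedHashMap<>();
        if (txt == null || txt.isEmpty()) {
            return options;
        }
        for (String pair : txt.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip malformed pairs without a key
                options.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return options;
    }

    /** TXT values act as defaults; anything set explicitly overrides them. */
    static Map<String, String> merge(Map<String, String> fromTxt, Map<String, String> explicit) {
        Map<String, String> merged = new LinkedHashMap<>(fromTxt);
        merged.putAll(explicit);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fromTxt = parseTxt("authSource=admin&replicaSet=rs0");
        Map<String, String> explicit = Map.of("authSource", "mydb"); // hypothetical user setting
        System.out.println(merge(fromTxt, explicit));
    }
}
```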
Quarkus & native images
- New ClassGraphCache.preRegisterClassesWithAnnotation() lets frameworks that already know their annotated classes at build time (e.g. the quarkus-morphium extension via Jandex) inject them and skip the runtime ClassGraph scan entirely. This is essential for native images, where a live scan finds nothing (#200).
InMemoryDriver
- Upsert now seeds the new document from equality predicates nested inside $and, matching real MongoDB. Previously a filter like {$and:[{_id:"lock"},{expires_at:{$lte:now}}]} got a generated ObjectId, so a later delete({_id:"lock"}) never matched: a real lock leak we hit in a migration runner (#201).
- $expr queries that use aggregation expression operators (e.g. $dateFromString) are no longer wrongly rejected as "unknown operator".
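The upsert seeding rule can be sketched roughly as follows. This is a simplified illustration over plain maps, not the InMemoryDriver's actual code: the class name is hypothetical, and it only handles plain {field: value} equality (it ignores the explicit $eq form, for instance).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive the seed document for an upsert from a filter's equality predicates. */
class UpsertSeed {

    static Map<String, Object> seedFrom(Map<String, Object> filter) {
        Map<String, Object> seed = new LinkedHashMap<>();
        collect(filter, seed);
        return seed;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> filter, Map<String, Object> seed) {
        for (Map.Entry<String, Object> entry : filter.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if ("$and".equals(key) && value instanceof List) {
                // Descend into every clause of the $and array.
                for (Object clause : (List<Object>) value) {
                    if (clause instanceof Map) {
                        collect((Map<String, Object>) clause, seed);
                    }
                }
            } else if (!key.startsWith("$") && !isOperatorDocument(value)) {
                // Plain {field: value} is an equality predicate: copy it into the seed.
                seed.put(key, value);
            }
            // Everything else ($or, {field: {$lte: ...}}, ...) contributes nothing.
        }
    }

    /** True for values like {$lte: now}, i.e. a map with at least one $-prefixed key. */
    private static boolean isOperatorDocument(Object value) {
        if (!(value instanceof Map)) {
            return false;
        }
        for (Object k : ((Map<?, ?>) value).keySet()) {
            if (k instanceof String && ((String) k).startsWith("$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> filter = Map.of("$and", List.of(
                Map.of("_id", "lock"),
                Map.of("expires_at", Map.of("$lte", 1_700_000_000_000L))));
        // The seed keeps _id from the equality clause and drops the $lte range clause.
        System.out.println(seedFrom(filter));
    }
}
```

With the seed carrying _id, the inserted document keeps the "lock" key instead of getting a generated ObjectId, which is what lets a later delete by that _id match.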
Aggregation & mapping
- Field-name translation now also covers unset(Enum...) and the lookup foreignField, closing the last camelCase→snake_case gaps in the aggregation builders (#198).
- BigDecimalMapper now tolerates values that MongoDB returns as Integer/Long instead of throwing a ClassCastException.
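The BigDecimal fix boils down to widening instead of casting. A minimal sketch of that idea, assuming a raw value read from the database that may arrive as BigDecimal, Integer or Long (the class and method are hypothetical, not the real BigDecimalMapper):

```java
import java.math.BigDecimal;

/** Sketch: convert a raw database value to BigDecimal without a blind cast. */
class LenientBigDecimal {

    static BigDecimal toBigDecimal(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof BigDecimal) {
            return (BigDecimal) raw;
        }
        if (raw instanceof Integer || raw instanceof Long) {
            // Whole numbers may come back as Integer/Long; widen them explicitly.
            return BigDecimal.valueOf(((Number) raw).longValue());
        }
        throw new IllegalArgumentException("Unsupported type: " + raw.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(42));
        System.out.println(toBigDecimal(7L));
        System.out.println(toBigDecimal(new BigDecimal("1.50")));
    }
}
```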
Upgrade
<dependency>
    <groupId>de.caluga</groupId>
    <artifactId>morphium</artifactId>
    <version>6.2.5</version>
</dependency>
6.2.5 is a drop-in upgrade from 6.2.x — no API changes. Full notes are on the GitHub release.
As always: feedback, issues and PRs are welcome on GitHub. Happy coding!