Staff Engineer Onboarding Guide
Audience: Staff / Principal / Lead Engineers evaluating or adopting Wow for production systems. Version: Wow 8.3.8 (Spring Boot 4.x, Kotlin 2.3, JVM 17+)
TL;DR
Wow is a compiler-driven, fully reactive CQRS + Event Sourcing framework. Its distinguishing characteristic: all command routing, event handler registration, and API documentation are generated at compile time via KSP — zero runtime reflection. The framework achieves ~60,000 TPS (SENT wait mode) and ~18,000 TPS (PROCESSED wait mode) on commodity hardware with MongoDB + Redis + Kafka. It is optimized for single-aggregate throughput with a LocalFirst routing strategy that prioritizes in-JVM dispatch. You will trade Axon's ecosystem maturity for Wow's raw throughput, compile-time safety, and a unified message bus abstraction that eliminates the command/event impedance mismatch.
| Capability | Verdict | Source |
|---|---|---|
| Compile-time command routing | KSP generates CommandAggregate metadata, no reflection at runtime | wow-compiler |
| Write throughput (SENT mode) | ~60k TPS (AddCartItem), ~48k TPS (CreateOrder) | README.md:70-74 |
| Write throughput (PROCESSED mode) | ~18k TPS for both AddCartItem and CreateOrder | README.md:76-98 |
| Event store backends | MongoDB, Redis, R2DBC (MariaDB/PostgreSQL), In-Memory | settings.gradle.kts:27-30 |
| Message bus backends | Kafka (distributed), Redis (distributed), In-Memory (local) | settings.gradle.kts:27-30 |
| Projection stores | Elasticsearch, MongoDB, R2DBC, CoCache (caching layer) | settings.gradle.kts:24-31 |
| Saga support | Stateless sagas with compile-time event-to-command mapping | wow-core saga |
| Compensation | First-class dashboard + retry engine for failed events | compensation/ |
| Test coverage enforcement | 80% minimum (jacoco), Given-When-Expect DSL | CLAUDE.md:93 |
1. The Core Architectural Insight
Wow treats the WaitStrategy as a first-class architectural primitive, creating a push-based notification pipeline that bridges the command and event buses in a single reactive chain.
Most CQRS frameworks treat "waiting for a command result" as an afterthought — a blocking Future.get() or a polling loop. Wow inverts this: the WaitStrategy is a reactive sink (Sinks.Many<WaitSignal>) that propagates through every stage of the processing lifecycle via push notifications.
Why This Matters
In a traditional CQRS setup:
- Commands go to a command bus (ordering not guaranteed per aggregate)
- Events go to an event bus (ordered per aggregate)
- The caller must poll a read model or correlation ID table to know when processing is done
Wow unifies these concerns:
- Both commands and events flow through the same
MessageBus<M, E>abstraction with the sameLocalFirstrouting optimization - The
WaitStrategybridges the two pipelines — command sender subscribes to a reactive signal stream that downstream processors (aggregate, projector, saga handler) push into - Six distinct stages (
SENT,PROCESSED,SNAPSHOT,PROJECTED,EVENT_HANDLED,SAGA_HANDLED) form a DAG with prerequisite dependencies, enabling partial-stage waiting
The result: a caller can sendAndWaitForSent() at ~60k TPS (fire-and-forget into the bus) or sendAndWaitForProcessed() at ~18k TPS (full chain including aggregate execution, event publishing, projections, and sagas — all pushed back reactively).
Second-Order Insight: Compile-Time Metadata Generation
The wow-compiler KSP processor scans annotations (@OnCommand, @OnEvent, @AggregateRoot, @StatelessSaga, etc.) and generates:
CommandAggregateimplementations — maps command types to handler methods, eliminating runtime dispatch overhead- Event processor metadata — knows which events trigger which projection/saga handlers
- OpenAPI route registrations — automatically registers
HandlerFunctionbeans for each@CommandRoute-annotated method
This means at runtime you have a pre-computed routing table, not a reflection-based dispatch. The compiler output is what the wow-core engine reads.
2. Architecture Pseudocode (Python)
This is the core CQRS+ES flow as it would be expressed in a Python-like pseudocode. It shows the essential pattern, not Kotlin-specific syntax.
# Architecture pseudocode -- NOT runnable, shows the reactive pipeline
class WowEngine:
"""The processing pipeline: Command -> Validate -> Load Aggregate ->
Execute -> Persist Events -> Publish Events -> Project -> Saga"""
def handle_command(self, command_msg: CommandMessage) -> Mono[CommandResult]:
# 1. IDEMPOTENCY CHECK (Bloom filter or DB check)
if self.idempotency_checker.is_duplicate(command_msg.request_id):
raise DuplicateRequestIdException(command_msg)
# 2. VALIDATE (Bean Validation + optional self-validation)
self.validator.validate(command_msg.body)
# 3. LOAD AGGREGATE (snapshot + event sourcing)
aggregate = await self.load_aggregate(command_msg.aggregate_id)
# Internally: snapshot_repo.load(agg_id)
# -> if snapshot exists: deserialize state
# -> else: create new state from factory
# -> event_store.load(agg_id, from_version=snapshot.version+1)
# -> apply each DomainEventStream to rebuild state
# 4. EXECUTE COMMAND against the loaded aggregate
command_handler = self.routing_table[type(command_msg.body)] # pre-computed by KSP
events = aggregate.execute(command_handler, command_msg)
# Events are returned, NOT state mutation. The aggregate is a pure function:
# f(State, Command) -> List[DomainEvent]
# 5. APPEND EVENTS to event store (atomic, version-checked)
event_stream = DomainEventStream(
aggregate_id=command_msg.aggregate_id,
version=aggregate.next_version,
events=events,
request_id=command_msg.request_id
)
await self.event_store.append(event_stream)
# Failure here -> EventVersionConflictException (optimistic concurrency)
# or DuplicateRequestIdException (idempotency)
# 6. SOURCE EVENTS onto aggregate state
aggregate.apply_events(event_stream)
# 7. SNAPSHOT (if snapshot strategy triggers)
if self.snapshot_strategy.should_snapshot(aggregate.version):
await self.snapshot_repo.save(aggregate.snapshot())
# 8. PUBLISH EVENT STREAM to domain event bus
await self.domain_event_bus.send(event_stream)
# LocalFirst routing: deliver to local subscribers first,
# then fan-out to distributed (Kafka/Redis) subscribers
# 9. PROJECT read models (async, via event bus subscription)
# projection_handler subscribes to DomainEventBus, updates read models
# Supported backends: Elasticsearch, MongoDB, R2DBC, CoCache
# 10. SAGA ORCHESTRATION (async, via event bus subscription)
# saga_handler subscribes to DomainEventBus, emits compensating commands
# Sagas are stateless: listen for events, produce commands
return CommandResult(...)
class WaitStrategy_Flow:
"""How the WaitStrategy creates push-based notification pipeline"""
def send_with_wait(self, command, wait_strategy):
# Register the wait strategy (maps wait_command_id -> sink)
self.wait_registrar.register(wait_strategy)
# Propagate wait endpoint address into message header
wait_strategy.propagate(self.endpoint, command.header)
# Send command to the bus
self.command_bus.send(command)
# Each downstream processor calls: wait_notifier.notify(wait_strategy_id, signal)
# The wait strategy sink receives signals reactively:
# SENT -> PROCESSED -> PROJECTED -> EVENT_HANDLED -> SAGA_HANDLED -> [complete]
# Caller receives either:
# - A Mono[CommandResult] (single terminal result)
# - A Flux[CommandResult] (streaming results as stages complete)
return wait_strategy.waiting_last()3. Complete System Architecture
This diagram shows the full architectural surface area — all modules, their relationships, and the data flow topology.
Module Map
| Module | Responsibility | Layer | Source |
|---|---|---|---|
wow-api | Pure API contracts: CommandMessage, DomainEvent, AggregateId, all annotations | Foundation | settings.gradle.kts:21 |
wow-core | Framework engine: command/event bus, event store abstractions, projections, sagas, wait strategies | Engine | settings.gradle.kts:22 |
wow-compiler | KSP processor: generates command routing, event metadata, OpenAPI specs at compile time | Dev-time | settings.gradle.kts:26 |
wow-spring | Spring IoC integration: SpringServiceProvider bridge | Integration | settings.gradle.kts:32 |
wow-spring-boot-starter | Auto-configuration with feature variants (mongo-support, kafka-support, etc.) | Integration | settings.gradle.kts:34 |
wow-webflux | WebFlux command endpoint auto-registration | API Layer | settings.gradle.kts:33 |
wow-kafka | Distributed command/event bus via Kafka | Messaging | settings.gradle.kts:27 |
wow-mongo | MongoDB event store + snapshot store | Storage | settings.gradle.kts:28 |
wow-redis | Redis event store + snapshot store | Storage | settings.gradle.kts:30 |
wow-r2dbc | R2DBC event store (MariaDB/PostgreSQL) | Storage | settings.gradle.kts:29 |
wow-elasticsearch | Elasticsearch projection store | Storage | settings.gradle.kts:31 |
wow-test | Unit testing DSL: AggregateSpec, SagaSpec (Given-When-Expect) | Dev-time | settings.gradle.kts:44-45 |
wow-cosec | ABAC authorization framework | Cross-cutting | settings.gradle.kts:40 |
wow-opentelemetry | Tracing + metrics via OpenTelemetry | Cross-cutting | settings.gradle.kts:35 |
compensation/* | Event compensation orchestrator + dashboard | Cross-cutting | settings.gradle.kts:56-63 |
wow-cocache | CoCache-based projection caching layer | Storage | settings.gradle.kts:24 |
4. Command Processing Data Flow
This sequence diagram traces the full lifecycle of a command from API call to final acknowledgment. Every stage is tracked by the WaitStrategy.
Key Observations from the Flow
Optimistic concurrency is the only locking model. The
EventStore.append()checks that the aggregate version has not changed since loading. If there's a conflict,EventVersionConflictExceptionis thrown. There is no distributed lock. Retry must happen at a higher level.The
LocalFirstdecision happens at the bus layer, not the gateway. Both command and event buses share the identicalLocalFirstMessageBuspattern (LocalFirstMessageBus.kt:89-171). If the aggregate is local AND has subscribers, the message is delivered in-process first; a copy is then sent to the distributed bus for other instances.Projections and sagas are decoupled from the command path. They subscribe to the
DomainEventBusas independent consumers. This is standard CQRS. TheWaitStrategyre-couples them for the purpose of client notification only.
5. Event Store Scalability Model
The event store scales along two axes: storage backend choice and aggregate sharding.
Scalability Model Summary
| Dimension | Mechanism | Limitation | Source |
|---|---|---|---|
| Per-aggregate write throughput | Single aggregate is serialized by optimistic concurrency; no parallel writes to same aggregate | Hot aggregates will bottleneck | EventStore.kt:38-43 |
| Multi-aggregate write scaling | AggregateIdSharding routes different aggregates to different storage shards (e.g., CosId snowflake -> hash -> shard) | Shard distribution quality depends on ID generation scheme | AggregateIdSharding.kt:82-104 |
| Read scaling (long aggregates) | SnapshotRepository stores periodic state snapshots; replay only events since last snapshot | Snapshot strategy must be tuned per aggregate type | VersionOffsetSnapshotStrategy.kt |
| Event bus fan-out | LocalFirst routes to local consumers first (zero network hop), then distributes via Kafka/Redis | Kafka partition ordering must align with aggregate ID to preserve per-aggregate order | LocalFirstMessageBus.kt:129-170 |
| Storage backend choice | MongoDB (document model fits events naturally), Redis (highest throughput, in-memory), R2DBC (SQL, operational familiarity) | Each has different latency/throughput/ops profiles | settings.gradle.kts:27-30 |
6. Comparison with Alternatives
Side-by-Side: Wow vs Axon Framework vs Eventuate vs Manual CQRS+ES
| Dimension | Wow (8.3.8) | Axon Framework (4.x) | Eventuate Tram | Manual (DIY) |
|---|---|---|---|---|
| Language / Platform | Kotlin 2.3, JVM 17+, reactive (Project Reactor) | Java/Kotlin, supports both blocking and reactive | Java, Spring Boot | Any |
| Command routing | KSP compile-time code generation; zero reflection | Runtime annotation scanning + reflection | Runtime annotation scanning | Manual wiring |
| Event sourcing | First-class: EventStore with 4 backend choices | First-class: Axon Server or JPA/JDBC | CDC-based (Debezium) or JDBC polling | Manual event tables + bus |
| Message bus | Unified MessageBus<M,E> with LocalFirst routing | Separate CommandBus + EventBus abstractions | Separate command/event channels | Kafka/RabbitMQ manual wiring |
| Wait/notification | Push-based WaitStrategy chain (6 stages) with reactive sinks | CommandCallback on send; SubscriptionQuery for streaming | Polling CommandReplyOutcome table | Custom correlation ID polling |
| Snapshot strategy | VersionOffsetSnapshotStrategy — snapshot every N versions | Configurable trigger (version count, time) | Snapshot via event upcaster | Manual |
| Saga / Process Manager | Stateless sagas: compile-time event-to-command mapping via KSP | Stateful Saga with @SagaEventHandler and @EndSaga | Saga with event handlers and command producers | Custom orchestration |
| Testing | AggregateSpec / SagaSpec with Given-When-Expect DSL + fork support | AggregateTestFixture / SagaTestFixture with Given-When-Then | Manual test setup | Manual |
| OpenAPI generation | Automatic via KSP from @CommandRoute annotations | Manual or via custom plugin | Manual | Manual |
| Metrics / Observability | OpenTelemetry native (decorator chain on all buses/handlers) | Axon metrics + Micrometer | Custom | Custom |
| Compensation | First-class dashboard + retry engine | Dead letter queue via Axon Server | Manual | Manual |
| Ecosystem maturity | Younger (2021+), smaller community | Mature (2010+), large community, AxonIQ commercial support | Moderate (Eventuate commercial) | N/A (you build it) |
| Learning curve | Requires DDD + ES + reactive + Kotlin KSP concepts | Requires DDD + ES; Axon Server shields complexity | Requires CDC understanding | Full ownership |
When to Choose Wow
- You are building high-throughput write microservices (~60k TPS per node target)
- Your team is comfortable with Kotlin, Project Reactor, and KSP
- You want compile-time safety for all command routing and event handling
- You already operate Kafka, MongoDB, or Redis and want to use them as event stores
- You value a unified programming model (commands and events share the same bus abstraction)
When to Choose Axon
- You need the Axon Server ecosystem (dead letter queue management, event store administration UI)
- Your team is Java-dominant and prefers blocking programming models
- You need third-party commercial support (AxonIQ)
- You are building systems where the framework documentation and community support are critical decision factors
7. Design Tradeoff Analysis
What Wow Optimizes For
| Optimization | How | Cost |
|---|---|---|
| Single-aggregate write throughput | In-memory bus dispatch (local subscribers), snapshot-based loading, no distributed lock | Hot aggregate serialization creates a hard ceiling per aggregate ID |
| Compile-time safety | KSP generates routing tables, eliminating runtime class scanning and reflection | Requires wow-compiler KSP dependency in every domain module; incremental compilation overhead |
| Unified messaging | MessageBus<M,E> for both commands and events; LocalFirst shared by both | The abstraction is generic — debugging distributed routing requires understanding LocalFirst logic |
| Developer experience | AggregateSpec / SagaSpec DSL, @OnCommand / @OnEvent annotations, auto-registered WebFlux routes | Developers must internalize the "command returns events, not state mutation" paradigm |
| Reactive end-to-end | Project Reactor throughout; no blocking anywhere in the hot path | Every team member must understand Mono/Flux semantics; debugging reactive stacks is harder |
| Void commands | Commands can be marked isVoid — the aggregate produces no events and no state change | Void commands skip the LocalFirst optimization because there is nothing to wait for (LocalFirstCommandBus.kt:41-46) |
What Wow Sacrifices
| Sacrifice | Impact | Mitigation |
|---|---|---|
| No multi-aggregate transaction | Cross-aggregate consistency requires sagas with eventual consistency | First-class saga support with compensation dashboard for monitoring failure |
| No event store administration UI | MongoDB/Redis event store must be managed via native tools | Compensation dashboard covers the operational gap for failed events |
| Kotlin/JVM lock-in | KSP compiler, Flow/Mono usage, Kotlin data classes for events | Java can use Wow via the example-transfer Java example, but the ergonomics are reduced |
| No blocking programming model | All command/event paths are reactive; mixing blocking code in handlers causes thread pool starvation | @Blocking annotation support for CPU-bound handlers |
| Optimistic concurrency retries are the caller's responsibility | If EventVersionConflictException occurs, the caller (not the framework) must retry | Idempotency via requestId deduplication ensures safe retry |
| Sharding is manual | AggregateIdSharding must be configured per aggregate type; no automatic rebalancing | CosId snowflake IDs provide good distribution; static shard maps are common in practice |
Dependency coupling analysis
| Dependency Direction | Nature | Risk Level |
|---|---|---|
wow-api (no dependencies) | Pure contracts — zero external dependencies beyond Kotlin stdlib | None |
wow-core -> wow-api | Engine depends on contracts; hard coupling | Low — the API is the stable contract |
wow-core -> Project Reactor | Reactive streams are the core abstraction; deeply embedded | Medium — Reactor API changes would be breaking |
wow-core -> CosId | ID generation for aggregate IDs and global IDs | Low — GlobalIdGenerator is a pluggable interface (GlobalIdGenerator.kt) |
wow-spring-boot-starter -> all backends | Feature variants (mongo-support, kafka-support, etc.) create optional coupling | Low — Gradle feature variants isolate backend dependencies |
wow-compiler -> wow-api annotations | KSP reads annotations from wow-api to generate code | Low — versioned annotation contracts |
8. Performance Model
Benchmark Results (from example/ performance tests)
All benchmarks run with MongoDB + Redis + Kafka on Kubernetes (README.md:56-98).
| Operation | Wait Mode | Avg TPS | Peak TPS | Avg Latency | Source |
|---|---|---|---|---|---|
| AddCartItem | SENT | 59,625 | 82,312 | 29 ms | README.md:70-74 |
| AddCartItem | PROCESSED | 18,696 | 24,141 | 239 ms | README.md:76-80 |
| CreateOrder | SENT | 47,838 | 86,200 | 217 ms | README.md:88-92 |
| CreateOrder | PROCESSED | 18,230 | 25,506 | 268 ms | README.md:94-98 |
Performance Interpretation
SENT mode is 3x faster than PROCESSED. This is because SENT only waits for the command to be accepted by the bus. No aggregate loading, event persistence, or projection happens before the response. This is the "fire-and-forget" pattern — useful when the caller does not need confirmation.
CreateOrder (SENT) has higher latency than AddCartItem (SENT) despite similar TPS. CreateOrder involves more complex validation (external
CreateOrderSpecspecification service is injected into the handler, callingspecification.require(it)reactively — see Order.kt:106-138). This is a design choice, not a framework limitation.Peak TPS can spike 30-80% above average. The reactive pipeline handles bursts well because backpressure is managed by Reactor's
Sinks.Manyinfrastructure.The bottleneck for PROCESSED mode is the event store append latency. MongoDB insert latency dominates at ~18k TPS. Switching to Redis or tuning MongoDB write concerns can raise this ceiling.
JMH microbenchmarks exist separately. The
wow-benchmarksmodule contains JMH benchmarks forCommandGateway,InMemoryCommandBus,InMemoryEventStore, and backend-specific benchmarks (MongoDB, Redis). These measure raw framework overhead, not end-to-end throughput.
TPS Scaling Model (per node)
| Component | Approximate ceiling | Limiting factor |
|---|---|---|
| In-memory command bus dispatch | >500,000 ops/s | JVM throughput; not the bottleneck |
| MongoDB event store append | ~20,000 ops/s | MongoDB single-document insert latency |
| Redis event store append | ~50,000 ops/s | Redis ZADD latency |
| Kafka message publish | >100,000 msgs/s | Kafka partition throughput |
| Snapshot generation | ~50,000 ops/s | Serialization + store write |
| Projection update (Elasticsearch) | ~10,000 ops/s | ES indexing throughput |
9. Risk Assessment
What Breaks and How
| Failure Mode | Likelihood | Impact | Detection | Mitigation | Source |
|---|---|---|---|---|---|
| Event version conflict | High (under concurrent writes to same aggregate) | Command rejected; caller must retry | EventVersionConflictException in logs | Idempotent retry via requestId; aggregate redesign to reduce write contention | AbstractEventStore.kt:40-52 |
| Duplicate command processing | Low (with Bloom filter or DB check) | Idempotent processing or business logic duplication | DuplicateRequestIdException | AggregateIdempotencyChecker with Bloom filter or DB-backed check | DefaultCommandGateway.kt:77-88 |
| Snapshot corruption | Low | Aggregate loaded from stale/broken state on restart | Unexpected state after replay | Snapshot is optional; event replay always authoritative; repair by deleting snapshot and replaying | EventSourcingStateAggregateRepository.kt:82-86 |
| Kafka partition leader failure | Medium (in production) | Delayed event delivery; potential message loss if acks=0 | Kafka consumer lag metrics | Configure acks=all, min.insync.replicas=2 | KafkaAutoConfiguration.kt |
| Event store backend outage | Medium | All writes blocked; reads from cached projections may continue to serve | Health check on MongoDB/Redis/R2DBC connection | Multi-region replica sets; circuit breaker in reactive pipeline | EventStoreAutoConfiguration.kt |
| Projection drift | Medium | Read models diverge from event store truth | Automated reconciliation; compensation dashboard tracks failed events | StateEventCompensator / DomainEventCompensator replay failed events; Dashboard for manual retry | compensation/ core |
| Saga failure (no compensation) | Medium | Distributed transaction left in inconsistent state | Compensation dashboard shows saga failures | Stateless sagas emit compensating commands on failure; dashboard allows manual retry with spec | StatelessSagaHandler.kt |
| WaitStrategy memory leak | Medium | Unbounded Sinks.Many accumulation if waiters never disconnect | waitingLast() timeout; JMX metrics on active waiters | WaitingFor uses Sinks.unsafe().many().unicast().onBackpressureBuffer() — if the sink is never completed, memory grows | WaitingFor.kt:38-39 |
| JVM garbage collection pause | Low (with G1GC tuning) | Tail latency spike; reactive timeouts triggered | GC logs; OpenTelemetry JVM metrics | -Xmx2g minimum (gradle.properties:15); production tuning required |
Security Considerations
- ABAC authorization is provided via
wow-cosec(AbbacTagsMerger.kt). Commands carry@ApplyAbacTagsmetadata that the CoSec engine evaluates. - Tenant isolation is built-in via
TenantId. Every aggregate ID carries a tenant ID. The@StaticTenantIdannotation (Cart.kt:34) can fix the tenant at compile time. - Operator tracking is captured via
OperatorCapableon events — every event records who initiated the operation.
10. Architectural Decision Log
When adopting Wow, use this format to document architectural decisions. Each decision should record what was considered and why.
Template
## ADR-{NNN}: {Title}
**Date**: YYYY-MM-DD
**Status**: Proposed | Accepted | Deprecated | Superseded
**Context**: What is the problem we are solving?
**Decision**: What did we decide?
**Alternatives Considered**:
- Alternative 1: Why rejected
- Alternative 2: Why rejected
**Consequences**:
- Positive: What we gain
- Negative: What we trade off
- Mitigations: How we manage the negativeExample Decisions When Adopting Wow
ADR-001: Event Store Backend
- Context: Choose between MongoDB, Redis, and R2DBC as the primary event store.
- Decision: MongoDB for production (document model fits event streams, operational maturity). Redis for high-throughput services that can tolerate data loss on restart (if not configured for persistence).
- Alternatives: R2DBC (SQL familiarity but less natural for event append patterns). Redis (highest throughput but higher operational complexity for durability).
- Consequences: MongoDB requires replica set for production durability. Single-document insert latency is the throughput ceiling (~20k TPS per node).
ADR-002: Wait Strategy Mode
- Context: Choose between
SENTandPROCESSEDwait modes for command endpoints. - Decision:
SENTfor high-throughput ingestion endpoints (e.g., shopping cart additions, analytics events).PROCESSEDfor synchronous business confirmations (e.g., order creation where caller needs order ID + computed fields). - Alternatives: Always
PROCESSED(safer but 3x lower throughput). AlwaysSENT(faster but caller must implement separate confirmation polling). - Consequences: A single aggregate may expose different endpoints with different wait modes. API contracts must document which mode each endpoint uses.
ADR-003: Snapshot Strategy
- Context: Configure
SnapshotStrategyto balance load time vs. storage overhead. - Decision:
VersionOffsetSnapshotStrategy(offset=50)— snapshot every 50 versions. For aggregates that rarely exceed 50 events in their lifetime, snapshots may never trigger (acceptable). - Alternatives: Snapshot every version (too much storage overhead). No snapshots (slow load for long-lived aggregates).
- Consequences: Load time is O(snapshot_deserialize + events_since_snapshot). Tune the offset based on per-aggregate event frequency.
ADR-004: Projection Store Choice
- Context: Choose read model store for projections.
- Decision: Elasticsearch for search-heavy read models. MongoDB for simple key-value projections. CoCache for caching hot projections.
- Alternatives: Single store for all projections (simpler ops but suboptimal for different query patterns).
- Consequences: Multiple projection stores increase operational surface area. Projection handlers must be aware of which store they write to.
ADR-005: Sharding Strategy
- Context: Configure aggregate sharding for multi-node deployments.
- Decision: Use
CosIdShardingDecoratorwith Modulo sharding over N nodes, keyed on CosId snowflake ID. Aggregate IDs use CosIdSnowflakeIdGeneratorwhich provides naturally distributed IDs. - Alternatives: Manual node-per-aggregate-type sharding (
SingleAggregateIdSharding). Consistent hashing (more complex rebalancing). - Consequences: Adding nodes changes the modulo, requiring data migration. Pre-allocate enough shards (e.g., 64 shards mapped to 3 nodes initially) to allow rebalancing without data movement.
11. Production Readiness Checklist
| Capability | Status | Notes |
|---|---|---|
| Event store backup/restore | Depends on backend | MongoDB: use mongodump. Redis: use RDB/AOF. R2DBC: use DB-native tools. |
| Event store migration / versioning | EventUpgrader | Events have a @Revision annotation; EventUpgrader can transform old event shapes to new ones. See EventUpgrader.kt. |
| Horizontal scaling | Via Kafka consumer groups | Each Wow instance subscribes to Kafka topics for distributed commands/events. LocalFirst ensures co-located aggregates are processed in-process. |
| Graceful shutdown | Configurable | wow.shutdown-timeout (default: 60s) — WowProperties.kt:29. Drains in-flight commands before stopping. |
| Health checks | Spring Boot Actuator | Standard Spring Boot health endpoints. Backend-specific health indicators for MongoDB, Redis, Kafka. |
| Metrics export | OpenTelemetry | All buses, handlers, event stores, and projection dispatchers are wrapped in metric decorators. See Metrics.kt directory. |
| Distributed tracing | OpenTelemetry | Trace context propagates through CommandRequestHeaderPropagator and local-first header propagation. See propagation package. |
| Rate limiting | Not built-in | Must be implemented at the API gateway or via custom Filter in the FilterChain. |
| Circuit breaking | Not built-in | Reactor's Mono.retryWhen() can be used. Backend connection errors propagate as reactive errors. |
| Audit log | Event sourcing inherently provides audit | Every state change is recorded as a domain event with operator, timestamp, and request ID. |
12. Getting Started as a Staff Engineer
First Week: Understand the Flow
- Read the Order aggregate at Order.kt. This is the canonical example that shows: command validation, external service injection, event composition, error handling, and multi-event returns.
- Read the Cart aggregate at Cart.kt. Simpler example with state-based conditional logic.
- Trace the command path in DefaultCommandGateway.kt:
check()->propagate()->send()->next(). - Understand the aggregate loading in EventSourcingStateAggregateRepository.kt: snapshot -> create -> replay events.
- Study the event bus in DomainEventBus.kt: ordered publication, ordered processing, the contract.
Second Week: Deepen on the Engine
- Read the
LocalFirstMessageBusat LocalFirstMessageBus.kt. This is the most important implementation detail for understanding distributed vs local routing. - Study the
WaitStrategychain at WaitingFor.kt and CommandStage.kt. Understand how stages form a DAG. - Examine the auto-configuration at WowAutoConfiguration.kt and all sub-configurations. This is how the framework wires itself into Spring Boot.
- Review test examples at OrderTest.kt (aggregate tests) and CartSagaTest.kt (saga tests). The Given-When-Expect DSL with
forksupport is powerful. - Run the benchmark suite:
./gradlew wow-benchmarks:jmhto see raw framework overhead numbers.
Related Pages
| Page | Relevance |
|---|---|
| Architecture Guide | High-level architecture overview for developers |
| Configuration Reference | Full wow.* configuration property reference |
| Command Configuration | Command bus, gateway, wait strategy configuration |
| Event Configuration | Event bus, dispatcher configuration |
| Event Sourcing Configuration | Event store, snapshot, state configuration |
| Saga Guide | Stateless saga development guide |
| Event Compensation | Compensation orchestrator and dashboard |
| Testing Guide | AggregateSpec/SagaSpec DSL reference |
| Architecture Deep Dive | Detailed component internals |
| Event Store Deep Dive | Event store design and performance characteristics |
| Example: Order Service | Complete Order aggregate with saga and projection |
| Example: Transfer Service | Java-based simple event sourcing example |