Staff Engineer Onboarding Guide

Audience: Staff / Principal / Lead Engineers evaluating or adopting Wow for production systems. Version: Wow 8.3.8 (Spring Boot 4.x, Kotlin 2.3, JVM 17+)

TL;DR

Wow is a compiler-driven, fully reactive CQRS + Event Sourcing framework. Its distinguishing characteristic: all command routing, event handler registration, and API documentation are generated at compile time via KSP — zero runtime reflection. The framework achieves ~60,000 TPS (SENT wait mode) and ~18,000 TPS (PROCESSED wait mode) on commodity hardware with MongoDB + Redis + Kafka. It is optimized for single-aggregate throughput with a LocalFirst routing strategy that prioritizes in-JVM dispatch. You will trade Axon's ecosystem maturity for Wow's raw throughput, compile-time safety, and a unified message bus abstraction that eliminates the command/event impedance mismatch.

Capability	Verdict	Source
Compile-time command routing	KSP generates `CommandAggregate` metadata, no reflection at runtime	wow-compiler
Write throughput (SENT mode)	~60k TPS (AddCartItem), ~48k TPS (CreateOrder)	README.md:70-74
Write throughput (PROCESSED mode)	~18k TPS for both AddCartItem and CreateOrder	README.md:76-98
Event store backends	MongoDB, Redis, R2DBC (MariaDB/PostgreSQL), In-Memory	settings.gradle.kts:27-30
Message bus backends	Kafka (distributed), Redis (distributed), In-Memory (local)	settings.gradle.kts:27-30
Projection stores	Elasticsearch, MongoDB, R2DBC, CoCache (caching layer)	settings.gradle.kts:24-31
Saga support	Stateless sagas with compile-time event-to-command mapping	wow-core saga
Compensation	First-class dashboard + retry engine for failed events	compensation/
Test coverage enforcement	80% minimum (jacoco), Given-When-Expect DSL	CLAUDE.md:93

1. The Core Architectural Insight

Wow treats the WaitStrategy as a first-class architectural primitive, creating a push-based notification pipeline that bridges the command and event buses in a single reactive chain.

Most CQRS frameworks treat "waiting for a command result" as an afterthought — a blocking Future.get() or a polling loop. Wow inverts this: the WaitStrategy is a reactive sink (Sinks.Many<WaitSignal>) that propagates through every stage of the processing lifecycle via push notifications.

Why This Matters

In a traditional CQRS setup:

Commands go to a command bus (ordering not guaranteed per aggregate)
Events go to an event bus (ordered per aggregate)
The caller must poll a read model or correlation ID table to know when processing is done

Wow unifies these concerns:

Both commands and events flow through the same MessageBus<M, E> abstraction with the same LocalFirst routing optimization
The WaitStrategy bridges the two pipelines — command sender subscribes to a reactive signal stream that downstream processors (aggregate, projector, saga handler) push into
Six distinct stages (SENT, PROCESSED, SNAPSHOT, PROJECTED, EVENT_HANDLED, SAGA_HANDLED) form a DAG with prerequisite dependencies, enabling partial-stage waiting

The result: a caller can sendAndWaitForSent() at ~60k TPS (fire-and-forget into the bus) or sendAndWaitForProcessed() at ~18k TPS (full chain including aggregate execution, event publishing, projections, and sagas — all pushed back reactively).

Second-Order Insight: Compile-Time Metadata Generation

The wow-compiler KSP processor scans annotations (@OnCommand, @OnEvent, @AggregateRoot, @StatelessSaga, etc.) and generates:

CommandAggregate implementations — maps command types to handler methods, eliminating runtime dispatch overhead
Event processor metadata — knows which events trigger which projection/saga handlers
OpenAPI route registrations — automatically registers HandlerFunction beans for each @CommandRoute-annotated method

This means at runtime you have a pre-computed routing table, not a reflection-based dispatch. The compiler output is what the wow-core engine reads.

2. Architecture Pseudocode (Python)

This is the core CQRS+ES flow as it would be expressed in a Python-like pseudocode. It shows the essential pattern, not Kotlin-specific syntax.

python

# Architecture pseudocode -- NOT runnable, shows the reactive pipeline

class WowEngine:
    """The processing pipeline: Command -> Validate -> Load Aggregate ->
       Execute -> Persist Events -> Publish Events -> Project -> Saga"""

    def handle_command(self, command_msg: CommandMessage) -> Mono[CommandResult]:

        # 1. IDEMPOTENCY CHECK (Bloom filter or DB check)
        if self.idempotency_checker.is_duplicate(command_msg.request_id):
            raise DuplicateRequestIdException(command_msg)

        # 2. VALIDATE (Bean Validation + optional self-validation)
        self.validator.validate(command_msg.body)

        # 3. LOAD AGGREGATE (snapshot + event sourcing)
        aggregate = await self.load_aggregate(command_msg.aggregate_id)
        # Internally: snapshot_repo.load(agg_id)
        #   -> if snapshot exists: deserialize state
        #   -> else: create new state from factory
        #   -> event_store.load(agg_id, from_version=snapshot.version+1)
        #   -> apply each DomainEventStream to rebuild state

        # 4. EXECUTE COMMAND against the loaded aggregate
        command_handler = self.routing_table[type(command_msg.body)]  # pre-computed by KSP
        events = aggregate.execute(command_handler, command_msg)
        # Events are returned, NOT state mutation. The aggregate is a pure function:
        #   f(State, Command) -> List[DomainEvent]

        # 5. APPEND EVENTS to event store (atomic, version-checked)
        event_stream = DomainEventStream(
            aggregate_id=command_msg.aggregate_id,
            version=aggregate.next_version,
            events=events,
            request_id=command_msg.request_id
        )
        await self.event_store.append(event_stream)
        # Failure here -> EventVersionConflictException (optimistic concurrency)
        # or DuplicateRequestIdException (idempotency)

        # 6. SOURCE EVENTS onto aggregate state
        aggregate.apply_events(event_stream)

        # 7. SNAPSHOT (if snapshot strategy triggers)
        if self.snapshot_strategy.should_snapshot(aggregate.version):
            await self.snapshot_repo.save(aggregate.snapshot())

        # 8. PUBLISH EVENT STREAM to domain event bus
        await self.domain_event_bus.send(event_stream)
        # LocalFirst routing: deliver to local subscribers first,
        # then fan-out to distributed (Kafka/Redis) subscribers

        # 9. PROJECT read models (async, via event bus subscription)
        # projection_handler subscribes to DomainEventBus, updates read models
        # Supported backends: Elasticsearch, MongoDB, R2DBC, CoCache

        # 10. SAGA ORCHESTRATION (async, via event bus subscription)
        # saga_handler subscribes to DomainEventBus, emits compensating commands
        # Sagas are stateless: listen for events, produce commands

        return CommandResult(...)


class WaitStrategy_Flow:
    """How the WaitStrategy creates push-based notification pipeline"""

    def send_with_wait(self, command, wait_strategy):
        # Register the wait strategy (maps wait_command_id -> sink)
        self.wait_registrar.register(wait_strategy)

        # Propagate wait endpoint address into message header
        wait_strategy.propagate(self.endpoint, command.header)

        # Send command to the bus
        self.command_bus.send(command)

        # Each downstream processor calls: wait_notifier.notify(wait_strategy_id, signal)
        # The wait strategy sink receives signals reactively:
        #   SENT -> PROCESSED -> PROJECTED -> EVENT_HANDLED -> SAGA_HANDLED -> [complete]

        # Caller receives either:
        #   - A Mono[CommandResult] (single terminal result)
        #   - A Flux[CommandResult] (streaming results as stages complete)
        return wait_strategy.waiting_last()

3. Complete System Architecture

This diagram shows the full architectural surface area — all modules, their relationships, and the data flow topology.

Module Map

Module	Responsibility	Layer	Source
`wow-api`	Pure API contracts: `CommandMessage`, `DomainEvent`, `AggregateId`, all annotations	Foundation	settings.gradle.kts:21
`wow-core`	Framework engine: command/event bus, event store abstractions, projections, sagas, wait strategies	Engine	settings.gradle.kts:22
`wow-compiler`	KSP processor: generates command routing, event metadata, OpenAPI specs at compile time	Dev-time	settings.gradle.kts:26
`wow-spring`	Spring IoC integration: `SpringServiceProvider` bridge	Integration	settings.gradle.kts:32
`wow-spring-boot-starter`	Auto-configuration with feature variants (`mongo-support`, `kafka-support`, etc.)	Integration	settings.gradle.kts:34
`wow-webflux`	WebFlux command endpoint auto-registration	API Layer	settings.gradle.kts:33
`wow-kafka`	Distributed command/event bus via Kafka	Messaging	settings.gradle.kts:27
`wow-mongo`	MongoDB event store + snapshot store	Storage	settings.gradle.kts:28
`wow-redis`	Redis event store + snapshot store	Storage	settings.gradle.kts:30
`wow-r2dbc`	R2DBC event store (MariaDB/PostgreSQL)	Storage	settings.gradle.kts:29
`wow-elasticsearch`	Elasticsearch projection store	Storage	settings.gradle.kts:31
`wow-test`	Unit testing DSL: `AggregateSpec`, `SagaSpec` (Given-When-Expect)	Dev-time	settings.gradle.kts:44-45
`wow-cosec`	ABAC authorization framework	Cross-cutting	settings.gradle.kts:40
`wow-opentelemetry`	Tracing + metrics via OpenTelemetry	Cross-cutting	settings.gradle.kts:35
`compensation/*`	Event compensation orchestrator + dashboard	Cross-cutting	settings.gradle.kts:56-63
`wow-cocache`	CoCache-based projection caching layer	Storage	settings.gradle.kts:24

4. Command Processing Data Flow

This sequence diagram traces the full lifecycle of a command from API call to final acknowledgment. Every stage is tracked by the WaitStrategy.

Key Observations from the Flow

Optimistic concurrency is the only locking model. The EventStore.append() checks that the aggregate version has not changed since loading. If there's a conflict, EventVersionConflictException is thrown. There is no distributed lock. Retry must happen at a higher level.
The LocalFirst decision happens at the bus layer, not the gateway. Both command and event buses share the identical LocalFirstMessageBus pattern (LocalFirstMessageBus.kt:89-171). If the aggregate is local AND has subscribers, the message is delivered in-process first; a copy is then sent to the distributed bus for other instances.
Projections and sagas are decoupled from the command path. They subscribe to the DomainEventBus as independent consumers. This is standard CQRS. The WaitStrategy re-couples them for the purpose of client notification only.

5. Event Store Scalability Model

The event store scales along two axes: storage backend choice and aggregate sharding.

Scalability Model Summary

Dimension	Mechanism	Limitation	Source
Per-aggregate write throughput	Single aggregate is serialized by optimistic concurrency; no parallel writes to same aggregate	Hot aggregates will bottleneck	EventStore.kt:38-43
Multi-aggregate write scaling	`AggregateIdSharding` routes different aggregates to different storage shards (e.g., CosId snowflake -> hash -> shard)	Shard distribution quality depends on ID generation scheme	AggregateIdSharding.kt:82-104
Read scaling (long aggregates)	`SnapshotRepository` stores periodic state snapshots; replay only events since last snapshot	Snapshot strategy must be tuned per aggregate type	VersionOffsetSnapshotStrategy.kt
Event bus fan-out	`LocalFirst` routes to local consumers first (zero network hop), then distributes via Kafka/Redis	Kafka partition ordering must align with aggregate ID to preserve per-aggregate order	LocalFirstMessageBus.kt:129-170
Storage backend choice	MongoDB (document model fits events naturally), Redis (highest throughput, in-memory), R2DBC (SQL, operational familiarity)	Each has different latency/throughput/ops profiles	settings.gradle.kts:27-30

6. Comparison with Alternatives

Side-by-Side: Wow vs Axon Framework vs Eventuate vs Manual CQRS+ES

Dimension	Wow (`8.3.8`)	Axon Framework (`4.x`)	Eventuate Tram	Manual (DIY)
Language / Platform	Kotlin 2.3, JVM 17+, reactive (Project Reactor)	Java/Kotlin, supports both blocking and reactive	Java, Spring Boot	Any
Command routing	KSP compile-time code generation; zero reflection	Runtime annotation scanning + reflection	Runtime annotation scanning	Manual wiring
Event sourcing	First-class: `EventStore` with 4 backend choices	First-class: Axon Server or JPA/JDBC	CDC-based (Debezium) or JDBC polling	Manual event tables + bus
Message bus	Unified `MessageBus<M,E>` with `LocalFirst` routing	Separate `CommandBus` + `EventBus` abstractions	Separate command/event channels	Kafka/RabbitMQ manual wiring
Wait/notification	Push-based `WaitStrategy` chain (6 stages) with reactive sinks	`CommandCallback` on send; `SubscriptionQuery` for streaming	Polling `CommandReplyOutcome` table	Custom correlation ID polling
Snapshot strategy	`VersionOffsetSnapshotStrategy` — snapshot every N versions	Configurable trigger (version count, time)	Snapshot via event upcaster	Manual
Saga / Process Manager	Stateless sagas: compile-time event-to-command mapping via KSP	Stateful `Saga` with `@SagaEventHandler` and `@EndSaga`	`Saga` with event handlers and command producers	Custom orchestration
Testing	`AggregateSpec` / `SagaSpec` with Given-When-Expect DSL + fork support	`AggregateTestFixture` / `SagaTestFixture` with Given-When-Then	Manual test setup	Manual
OpenAPI generation	Automatic via KSP from `@CommandRoute` annotations	Manual or via custom plugin	Manual	Manual
Metrics / Observability	OpenTelemetry native (decorator chain on all buses/handlers)	Axon metrics + Micrometer	Custom	Custom
Compensation	First-class dashboard + retry engine	Dead letter queue via Axon Server	Manual	Manual
Ecosystem maturity	Younger (2021+), smaller community	Mature (2010+), large community, AxonIQ commercial support	Moderate (Eventuate commercial)	N/A (you build it)
Learning curve	Requires DDD + ES + reactive + Kotlin KSP concepts	Requires DDD + ES; Axon Server shields complexity	Requires CDC understanding	Full ownership

When to Choose Wow

You are building high-throughput write microservices (~60k TPS per node target)
Your team is comfortable with Kotlin, Project Reactor, and KSP
You want compile-time safety for all command routing and event handling
You already operate Kafka, MongoDB, or Redis and want to use them as event stores
You value a unified programming model (commands and events share the same bus abstraction)

When to Choose Axon

You need the Axon Server ecosystem (dead letter queue management, event store administration UI)
Your team is Java-dominant and prefers blocking programming models
You need third-party commercial support (AxonIQ)
You are building systems where the framework documentation and community support are critical decision factors

7. Design Tradeoff Analysis

What Wow Optimizes For

Optimization	How	Cost
Single-aggregate write throughput	In-memory bus dispatch (local subscribers), snapshot-based loading, no distributed lock	Hot aggregate serialization creates a hard ceiling per aggregate ID
Compile-time safety	KSP generates routing tables, eliminating runtime class scanning and reflection	Requires `wow-compiler` KSP dependency in every domain module; incremental compilation overhead
Unified messaging	`MessageBus<M,E>` for both commands and events; `LocalFirst` shared by both	The abstraction is generic — debugging distributed routing requires understanding `LocalFirst` logic
Developer experience	`AggregateSpec` / `SagaSpec` DSL, `@OnCommand` / `@OnEvent` annotations, auto-registered WebFlux routes	Developers must internalize the "command returns events, not state mutation" paradigm
Reactive end-to-end	Project Reactor throughout; no blocking anywhere in the hot path	Every team member must understand `Mono`/`Flux` semantics; debugging reactive stacks is harder
Void commands	Commands can be marked `isVoid` — the aggregate produces no events and no state change	Void commands skip the `LocalFirst` optimization because there is nothing to wait for (LocalFirstCommandBus.kt:41-46)

What Wow Sacrifices

Sacrifice	Impact	Mitigation
No multi-aggregate transaction	Cross-aggregate consistency requires sagas with eventual consistency	First-class saga support with compensation dashboard for monitoring failure
No event store administration UI	MongoDB/Redis event store must be managed via native tools	Compensation dashboard covers the operational gap for failed events
Kotlin/JVM lock-in	KSP compiler, Flow/Mono usage, Kotlin data classes for events	Java can use Wow via the `example-transfer` Java example, but the ergonomics are reduced
No blocking programming model	All command/event paths are reactive; mixing blocking code in handlers causes thread pool starvation	`@Blocking` annotation support for CPU-bound handlers
Optimistic concurrency retries are the caller's responsibility	If `EventVersionConflictException` occurs, the caller (not the framework) must retry	Idempotency via `requestId` deduplication ensures safe retry
Sharding is manual	`AggregateIdSharding` must be configured per aggregate type; no automatic rebalancing	CosId snowflake IDs provide good distribution; static shard maps are common in practice

Dependency coupling analysis

Dependency Direction	Nature	Risk Level
`wow-api` (no dependencies)	Pure contracts — zero external dependencies beyond Kotlin stdlib	None
`wow-core` -> `wow-api`	Engine depends on contracts; hard coupling	Low — the API is the stable contract
`wow-core` -> Project Reactor	Reactive streams are the core abstraction; deeply embedded	Medium — Reactor API changes would be breaking
`wow-core` -> CosId	ID generation for aggregate IDs and global IDs	Low — `GlobalIdGenerator` is a pluggable interface (GlobalIdGenerator.kt)
`wow-spring-boot-starter` -> all backends	Feature variants (`mongo-support`, `kafka-support`, etc.) create optional coupling	Low — Gradle feature variants isolate backend dependencies
`wow-compiler` -> `wow-api` annotations	KSP reads annotations from `wow-api` to generate code	Low — versioned annotation contracts

8. Performance Model

Benchmark Results (from `example/` performance tests)

All benchmarks run with MongoDB + Redis + Kafka on Kubernetes (README.md:56-98).

Operation	Wait Mode	Avg TPS	Peak TPS	Avg Latency	Source
AddCartItem	`SENT`	59,625	82,312	29 ms	README.md:70-74
AddCartItem	`PROCESSED`	18,696	24,141	239 ms	README.md:76-80
CreateOrder	`SENT`	47,838	86,200	217 ms	README.md:88-92
CreateOrder	`PROCESSED`	18,230	25,506	268 ms	README.md:94-98

Performance Interpretation

SENT mode is 3x faster than PROCESSED. This is because SENT only waits for the command to be accepted by the bus. No aggregate loading, event persistence, or projection happens before the response. This is the "fire-and-forget" pattern — useful when the caller does not need confirmation.
CreateOrder (SENT) has higher latency than AddCartItem (SENT) despite similar TPS. CreateOrder involves more complex validation (external CreateOrderSpec specification service is injected into the handler, calling specification.require(it) reactively — see Order.kt:106-138). This is a design choice, not a framework limitation.
Peak TPS can spike 30-80% above average. The reactive pipeline handles bursts well because backpressure is managed by Reactor's Sinks.Many infrastructure.
The bottleneck for PROCESSED mode is the event store append latency. MongoDB insert latency dominates at ~18k TPS. Switching to Redis or tuning MongoDB write concerns can raise this ceiling.
JMH microbenchmarks exist separately. The wow-benchmarks module contains JMH benchmarks for CommandGateway, InMemoryCommandBus, InMemoryEventStore, and backend-specific benchmarks (MongoDB, Redis). These measure raw framework overhead, not end-to-end throughput.

TPS Scaling Model (per node)

Component	Approximate ceiling	Limiting factor
In-memory command bus dispatch	>500,000 ops/s	JVM throughput; not the bottleneck
MongoDB event store append	~20,000 ops/s	MongoDB single-document insert latency
Redis event store append	~50,000 ops/s	Redis ZADD latency
Kafka message publish	>100,000 msgs/s	Kafka partition throughput
Snapshot generation	~50,000 ops/s	Serialization + store write
Projection update (Elasticsearch)	~10,000 ops/s	ES indexing throughput

9. Risk Assessment

What Breaks and How

Failure Mode	Likelihood	Impact	Detection	Mitigation	Source
Event version conflict	High (under concurrent writes to same aggregate)	Command rejected; caller must retry	`EventVersionConflictException` in logs	Idempotent retry via `requestId`; aggregate redesign to reduce write contention	AbstractEventStore.kt:40-52
Duplicate command processing	Low (with Bloom filter or DB check)	Idempotent processing or business logic duplication	`DuplicateRequestIdException`	`AggregateIdempotencyChecker` with Bloom filter or DB-backed check	DefaultCommandGateway.kt:77-88
Snapshot corruption	Low	Aggregate loaded from stale/broken state on restart	Unexpected state after replay	Snapshot is optional; event replay always authoritative; repair by deleting snapshot and replaying	EventSourcingStateAggregateRepository.kt:82-86
Kafka partition leader failure	Medium (in production)	Delayed event delivery; potential message loss if `acks=0`	Kafka consumer lag metrics	Configure `acks=all`, `min.insync.replicas=2`	KafkaAutoConfiguration.kt
Event store backend outage	Medium	All writes blocked; reads from cached projections may continue to serve	Health check on MongoDB/Redis/R2DBC connection	Multi-region replica sets; circuit breaker in reactive pipeline	EventStoreAutoConfiguration.kt
Projection drift	Medium	Read models diverge from event store truth	Automated reconciliation; compensation dashboard tracks failed events	`StateEventCompensator` / `DomainEventCompensator` replay failed events; Dashboard for manual retry	compensation/ core
Saga failure (no compensation)	Medium	Distributed transaction left in inconsistent state	Compensation dashboard shows saga failures	Stateless sagas emit compensating commands on failure; dashboard allows manual retry with spec	StatelessSagaHandler.kt
WaitStrategy memory leak	Medium	Unbounded `Sinks.Many` accumulation if waiters never disconnect	`waitingLast()` timeout; JMX metrics on active waiters	`WaitingFor` uses `Sinks.unsafe().many().unicast().onBackpressureBuffer()` — if the sink is never completed, memory grows	WaitingFor.kt:38-39
JVM garbage collection pause	Low (with G1GC tuning)	Tail latency spike; reactive timeouts triggered	GC logs; OpenTelemetry JVM metrics	`-Xmx2g` minimum (gradle.properties:15); production tuning required

Security Considerations

ABAC authorization is provided via wow-cosec (AbbacTagsMerger.kt). Commands carry @ApplyAbacTags metadata that the CoSec engine evaluates.
Tenant isolation is built-in via TenantId. Every aggregate ID carries a tenant ID. The @StaticTenantId annotation (Cart.kt:34) can fix the tenant at compile time.
Operator tracking is captured via OperatorCapable on events — every event records who initiated the operation.

10. Architectural Decision Log

When adopting Wow, use this format to document architectural decisions. Each decision should record what was considered and why.

Template

markdown

## ADR-{NNN}: {Title}

**Date**: YYYY-MM-DD
**Status**: Proposed | Accepted | Deprecated | Superseded
**Context**: What is the problem we are solving?
**Decision**: What did we decide?
**Alternatives Considered**:
  - Alternative 1: Why rejected
  - Alternative 2: Why rejected
**Consequences**:
  - Positive: What we gain
  - Negative: What we trade off
  - Mitigations: How we manage the negative

Example Decisions When Adopting Wow

ADR-001: Event Store Backend

Context: Choose between MongoDB, Redis, and R2DBC as the primary event store.
Decision: MongoDB for production (document model fits event streams, operational maturity). Redis for high-throughput services that can tolerate data loss on restart (if not configured for persistence).
Alternatives: R2DBC (SQL familiarity but less natural for event append patterns). Redis (highest throughput but higher operational complexity for durability).
Consequences: MongoDB requires replica set for production durability. Single-document insert latency is the throughput ceiling (~20k TPS per node).

ADR-002: Wait Strategy Mode

Context: Choose between SENT and PROCESSED wait modes for command endpoints.
Decision: SENT for high-throughput ingestion endpoints (e.g., shopping cart additions, analytics events). PROCESSED for synchronous business confirmations (e.g., order creation where caller needs order ID + computed fields).
Alternatives: Always PROCESSED (safer but 3x lower throughput). Always SENT (faster but caller must implement separate confirmation polling).
Consequences: A single aggregate may expose different endpoints with different wait modes. API contracts must document which mode each endpoint uses.

ADR-003: Snapshot Strategy

Context: Configure SnapshotStrategy to balance load time vs. storage overhead.
Decision: VersionOffsetSnapshotStrategy(offset=50) — snapshot every 50 versions. For aggregates that rarely exceed 50 events in their lifetime, snapshots may never trigger (acceptable).
Alternatives: Snapshot every version (too much storage overhead). No snapshots (slow load for long-lived aggregates).
Consequences: Load time is O(snapshot_deserialize + events_since_snapshot). Tune the offset based on per-aggregate event frequency.

ADR-004: Projection Store Choice

Context: Choose read model store for projections.
Decision: Elasticsearch for search-heavy read models. MongoDB for simple key-value projections. CoCache for caching hot projections.
Alternatives: Single store for all projections (simpler ops but suboptimal for different query patterns).
Consequences: Multiple projection stores increase operational surface area. Projection handlers must be aware of which store they write to.

ADR-005: Sharding Strategy

Context: Configure aggregate sharding for multi-node deployments.
Decision: Use CosIdShardingDecorator with Modulo sharding over N nodes, keyed on CosId snowflake ID. Aggregate IDs use CosId SnowflakeIdGenerator which provides naturally distributed IDs.
Alternatives: Manual node-per-aggregate-type sharding (SingleAggregateIdSharding). Consistent hashing (more complex rebalancing).
Consequences: Adding nodes changes the modulo, requiring data migration. Pre-allocate enough shards (e.g., 64 shards mapped to 3 nodes initially) to allow rebalancing without data movement.

11. Production Readiness Checklist

Capability	Status	Notes
Event store backup/restore	Depends on backend	MongoDB: use mongodump. Redis: use RDB/AOF. R2DBC: use DB-native tools.
Event store migration / versioning	`EventUpgrader`	Events have a `@Revision` annotation; `EventUpgrader` can transform old event shapes to new ones. See EventUpgrader.kt.
Horizontal scaling	Via Kafka consumer groups	Each Wow instance subscribes to Kafka topics for distributed commands/events. `LocalFirst` ensures co-located aggregates are processed in-process.
Graceful shutdown	Configurable	`wow.shutdown-timeout` (default: 60s) — WowProperties.kt:29. Drains in-flight commands before stopping.
Health checks	Spring Boot Actuator	Standard Spring Boot health endpoints. Backend-specific health indicators for MongoDB, Redis, Kafka.
Metrics export	OpenTelemetry	All buses, handlers, event stores, and projection dispatchers are wrapped in metric decorators. See Metrics.kt directory.
Distributed tracing	OpenTelemetry	Trace context propagates through `CommandRequestHeaderPropagator` and local-first header propagation. See propagation package.
Rate limiting	Not built-in	Must be implemented at the API gateway or via custom `Filter` in the `FilterChain`.
Circuit breaking	Not built-in	Reactor's `Mono.retryWhen()` can be used. Backend connection errors propagate as reactive errors.
Audit log	Event sourcing inherently provides audit	Every state change is recorded as a domain event with operator, timestamp, and request ID.

12. Getting Started as a Staff Engineer

First Week: Understand the Flow

Read the Order aggregate at Order.kt. This is the canonical example that shows: command validation, external service injection, event composition, error handling, and multi-event returns.
Read the Cart aggregate at Cart.kt. Simpler example with state-based conditional logic.
Trace the command path in DefaultCommandGateway.kt: check() -> propagate() -> send() -> next().
Understand the aggregate loading in EventSourcingStateAggregateRepository.kt: snapshot -> create -> replay events.
Study the event bus in DomainEventBus.kt: ordered publication, ordered processing, the contract.

Second Week: Deepen on the Engine

Read the LocalFirstMessageBus at LocalFirstMessageBus.kt. This is the most important implementation detail for understanding distributed vs local routing.
Study the WaitStrategy chain at WaitingFor.kt and CommandStage.kt. Understand how stages form a DAG.
Examine the auto-configuration at WowAutoConfiguration.kt and all sub-configurations. This is how the framework wires itself into Spring Boot.
Review test examples at OrderTest.kt (aggregate tests) and CartSagaTest.kt (saga tests). The Given-When-Expect DSL with fork support is powerful.
Run the benchmark suite: ./gradlew wow-benchmarks:jmh to see raw framework overhead numbers.

Page	Relevance
Architecture Guide	High-level architecture overview for developers
Configuration Reference	Full `wow.*` configuration property reference
Command Configuration	Command bus, gateway, wait strategy configuration
Event Configuration	Event bus, dispatcher configuration
Event Sourcing Configuration	Event store, snapshot, state configuration
Saga Guide	Stateless saga development guide
Event Compensation	Compensation orchestrator and dashboard
Testing Guide	AggregateSpec/SagaSpec DSL reference
Architecture Deep Dive	Detailed component internals
Event Store Deep Dive	Event store design and performance characteristics
Example: Order Service	Complete Order aggregate with saga and projection
Example: Transfer Service	Java-based simple event sourcing example

Staff Engineer Onboarding Guide ​

TL;DR ​

1. The Core Architectural Insight ​

Why This Matters ​

Second-Order Insight: Compile-Time Metadata Generation ​

2. Architecture Pseudocode (Python) ​

3. Complete System Architecture ​

Module Map ​

4. Command Processing Data Flow ​

Key Observations from the Flow ​

5. Event Store Scalability Model ​

Scalability Model Summary ​

6. Comparison with Alternatives ​

Side-by-Side: Wow vs Axon Framework vs Eventuate vs Manual CQRS+ES ​

When to Choose Wow ​

When to Choose Axon ​

7. Design Tradeoff Analysis ​

What Wow Optimizes For ​

What Wow Sacrifices ​

Dependency coupling analysis ​

8. Performance Model ​

Benchmark Results (from example/ performance tests) ​

Performance Interpretation ​

TPS Scaling Model (per node) ​

9. Risk Assessment ​

What Breaks and How ​

Security Considerations ​

10. Architectural Decision Log ​

Template ​

Example Decisions When Adopting Wow ​

11. Production Readiness Checklist ​

12. Getting Started as a Staff Engineer ​

First Week: Understand the Flow ​

Second Week: Deepen on the Engine ​

Related Pages ​