Skip to content

Architecture

Aver implements a three-layer acceptance testing architecture inspired by Dave Farley’s four-layer model and the Screenplay pattern from Serenity.js.

The name “aver” means “to declare with confidence” — your tests aver that the system behaves as intended.

Farley’s model separates acceptance tests into four layers: Tests, DSL, Protocol Driver, and the System Under Test. Aver gives you three of those — the fourth is your system itself.

Test (verify) → Domain (what) → Adapter (how) → [your system]

Test composes domain operations into scenarios. Tests never import adapters — they speak only domain language. The suite resolves adapters from configuration at runtime.

Domain declares the vocabulary of a bounded context — what the system does, in business language. No implementation details. Just names and type signatures for actions, queries, and assertions.

Adapter binds the vocabulary to a real implementation via a protocol. One adapter per interaction mode: unit (direct code interfaces), HTTP (API), Playwright (browser). The adapt() function enforces at compile time that every domain operation has a handler.

Page objects and test helpers give you two layers: tests and implementation. This works until you need the same behavior verified at multiple levels. With two layers, you either duplicate test logic per level or couple tests to a specific protocol.

The domain layer is the missing abstraction. It sits between your tests and your implementation, giving both a shared language. Tests compose domain operations without knowing the protocol. Adapters implement domain operations without knowing the tests. The domain is the contract that lets both sides evolve independently.

Three concepts make up the domain language:

ConceptPurposeReturns
ActionDo something (side effect)void
QueryRead somethingtyped data
AssertionCheck somethingpass/fail

Actions perform operations. Queries extract data. Assertions verify expectations.

Assertions could be expressed as query + expect, but they earn their place because they express intent in domain language, enable protocol-optimized checks (Playwright’s auto-waiting vs manual polling), and produce better traces on failure.

Here’s what a domain definition looks like:

import { defineDomain, action, query, assertion } from '@averspec/core'
export const taskBoard = defineDomain({
name: 'task-board',
actions: {
createTask: action<{ title: string }>(),
moveTask: action<{ title: string; status: string }>(),
},
queries: {
taskDetails: query<{ title: string }, Task>(),
},
assertions: {
taskInStatus: assertion<{ title: string; status: string }>(),
},
})

Tests can use given, when, and then as narrative aliases for act and assert. They call the same adapter handlers — the difference is in step labeling:

Test steps (unit):
[PASS] GIVEN ShoppingCart.addItem({"name":"Widget","qty":2}) 12ms
[PASS] WHEN ShoppingCart.checkout() 45ms
[PASS] THEN ShoppingCart.totalCharged({"amount":35}) 2ms

A protocol manages session lifecycle. It creates a context in setup() that every adapter handler receives as its first argument, and cleans up in teardown().

For the unit protocol, the context is your domain object. For playwright, it’s a Playwright Page. For http, it’s an HTTP client pointed at a running server.

Protocols can also hook into test lifecycle events: onTestStart runs before each test body, onTestFail runs on failure and can return attachments (screenshots, logs), and onTestEnd runs after each test for cleanup.

Aver ships three protocols:

ProtocolContextUse Case
unit(factory)Your objectDirect interface testing
http({ baseUrl })HTTP clientAPI-level testing
playwright()Playwright PageBrowser UI testing

The unit protocol is built into core (zero dependencies). HTTP and Playwright are separate packages.

You can write custom protocols for any interaction mode — WebSocket, gRPC, CLI, whatever your system exposes.

withFixture(protocol, { before?, after? }) wraps a protocol with setup/teardown hooks — useful when an adapter needs external infrastructure (a database, a server process) that sits outside the protocol’s own context:

import { withFixture } from '@averspec/core'
import { http } from '@averspec/protocol-http'
const httpWithServer = withFixture(http({ baseUrl: 'http://localhost:3000' }), {
before: () => startServer(),
after: () => stopServer(),
})

before runs before protocol.setup(). after runs after protocol.teardown(), even if teardown throws. All other protocol behavior (lifecycle hooks, telemetry) passes through unchanged.

When you register multiple adapters for the same domain, every test runs against all of them automatically:

✓ add item to cart [unit] 1ms
✓ add item to cart [http] 12ms
✓ add item to cart [playwright] 280ms

Each test gets an isolated protocol context. Test names are parameterized with the protocol name. When two adapters disagree on a behavior, that disagreement surfaces a real bug.

You can filter adapters at runtime with AVER_ADAPTER=unit to run only the fast tests during development.

Cross-domain composition with named-config suite()

Section titled “Cross-domain composition with named-config suite()”

When a test scenario spans multiple bounded contexts, suite() accepts a named config object instead of a single domain. Each key maps to a [domain, adapter] tuple, and the test callback receives a context with a namespace per key:

const { test } = suite({
cart: [shoppingCart, cartAdapter],
payments: [paymentGateway, paymentAdapter],
})
test('checkout charges the card', async ({ cart, payments }) => {
await cart.given.addItem({ name: 'Widget', qty: 1 })
await cart.when.checkout()
await payments.then.chargeRecorded({ amount: 25 })
})

Each domain gets its own protocol lifecycle. The shared trace() function on the context collects trace entries from all domains in execution order.

Domains can be extended with additional vocabulary for protocol-specific concerns:

const taskBoardUI = taskBoard.extend('task-board-ui', {
assertions: {
showsLoadingSpinner: assertion(),
},
})

Extensions inherit all vocabulary from the parent. An adapter for the extended domain must implement everything from both the parent and the extension. This lets you write shared behavioral tests against the base domain, and protocol-specific tests against the extension.

On failure, Aver shows the test steps — every domain operation leading to the error:

FAIL shopping-cart.spec.ts > full checkout flow [unit]
Test steps (unit):
[PASS] GIVEN ShoppingCart.addItem({"name":"Widget","qty":2}) 12ms
[PASS] THEN ShoppingCart.hasItems({"count":1}) 1ms
[PASS] QUERY ShoppingCart.cartTotal() 0ms
[FAIL] WHEN ShoppingCart.checkout() — Expected order to be confirmed 45ms
Expected order to be confirmed

The trace is recorded automatically as the suite proxies domain calls through the adapter. Each step is logged as it executes, so failures show the full sequence leading to the error.

Observability data is made powerful by context — the relationships between spans matter more than any individual span. A checkout span alone tells you little. A checkout span causally connected to a payment span and a fulfillment span, all sharing an order ID within the same trace, tells you the whole story. When those relational seams break — a refactor drops the trace propagation, a renamed attribute disconnects two spans — your dashboards go dark and your agents can’t validate what they shipped.

Aver treats these relationships as a testable contract. Domains declare expected OTel spans on operations, and the framework verifies both the spans and their connections.

Two verification layers run automatically:

  1. Per-step: After each operation, verify a matching span exists with the expected name and attributes
  2. End-of-test correlation: After all steps, verify that correlated steps (shared attribute key + value) are causally connected — same traceId or span links

This catches instrumentation bugs that behavioral tests miss: the API returns the right data, but the spans are missing, misnamed, or disconnected. The @averspec/telemetry package extends this to production — extract a behavioral contract from passing tests, then verify that production traces conform to the same contract. See the telemetry guide for details.

  • Zero runtime dependencies in core. Protocols are separate packages.
  • Type-safe — the type system enforces that adapters implement every domain item.
  • Adapter authors receive ready-to-use context — protocols handle lifecycle.
  • Tests are protocol-agnostic — they import domains, never adapters.

The cost model determines when Aver earns its keep.

What grows with what: Vocabulary grows with domain surface area — the number of distinct behaviors. Tests grow with scenarios — the number of ways those behaviors compose. Surface area grows slowly; scenarios grow fast. Five domain operations can support fifty tests.

Cost per operation: One vocabulary entry in the domain, plus one handler per adapter. At one adapter, this is comparable to extracting a page object method. At three adapters, it’s a 1:3 ratio — but each handler is isolated.

The breakeven: With a single adapter, Aver’s overhead roughly equals well-structured page objects. The cross-adapter benefit kicks in at the second adapter. By the time you have two adapters, the bugs caught by cross-level verification exceed the cost of maintaining two sets of handlers.

Aver’s domain layer naturally separates what an AI agent is good at (generating code, iterating until tests pass) from what requires human judgment (deciding what the system should do, naming the vocabulary). The human defines the outer loop — domain specs in business language. The agent works the inner loop — implementing code until aver run passes.

See AI-Assisted Testing for the full rationale and setup.