MODULE_02 // PROTOCOL

Methodology

How the Universal Trust Score is computed. No categories. No magic multipliers. Just signed signals weighted by source trust and source diversity.

FORMULA

score = round(weighted_mean × coverage)

weighted_mean: 0–100  ·  coverage: 0–1  ·  result: 0–100 integer

AAL is a signal-based trust protocol for AI agents. Trust is not a category and not a bucket. The protocol collects signed measurements from many independent sources, weights each by source trust, and aggregates them into a single Universal Trust Score on the 0-100 scale. The same data, replayed, produces the same score.

ARCHITECTURE

Six layers, left to right

The codebase is partitioned into six logical layers. Signals flow left to right. Each layer has a strict scope; nothing imports across the firewall the wrong way.

SIGNAL FLOW

protocol/  ─►  sandbox/  ─►  ingest/  ─►  engine/  ─►  output/  ─►  pipeline/
   │              │            │           │           │            │
   ▼              ▼            ▼           ▼           ▼            ▼
signing,       10 ASI       12 third-   weighted    VC, badge,   coverage
identity,      threats +    party       mean +      MCP server,  floor,
tag            3 bench-     crawlers,   coverage,   chain        gates,
registry       marks        on-chain    tier        writeback    batch
                            registries

protocol/

Signal contract, identity resolution, Ed25519 signing, tag registry. The ground truth of what a signal is and who can issue one.

protocol/signal.ts, protocol/signing.ts, protocol/tagRegistry.ts, protocol/identity/canonical.ts

sandbox/

AAL-native attack surface and benchmarks. Ten ASI threats and three benchmarks emit signals against a target endpoint.

sandbox/runner.ts, sandbox/threats/asi*.ts, sandbox/benchmarks/reliability-probe.ts

ingest/

Crawlers for external trust signals — on-chain registries (ERC-8004, Solana ATOM) and third-party score services (Mnemom, Helixa, etc.).

ingest/crawlers/erc8004-reputation.ts, ingest/crawlers/runner.ts, ingest/run-all-crawlers.ts

engine/

Pure functions that turn signals into a score. Weighted aggregation, log-scaled coverage, tier assignment, confidence distribution. No I/O.

engine/score.ts, engine/coverage.ts, engine/tiers.ts, engine/persist.ts

output/

Score consumers. W3C Verifiable Credentials, SVG badges, MCP server, dual-chain writeback (Solana SEAL v1 + Base ERC-8004 attestation).

output/credentials/issuer.ts, output/badge/*, output/mcp/*, output/chain/writeback.ts

pipeline/

Orchestration. Coverage-floor downgrade, registered-agent abuse-vector gate, batch scoring, recompute, signal filter for inapplicable tags.

pipeline/orchestrator.ts, pipeline/recompute.ts, pipeline/filter.ts, pipeline/batch/*

PIPELINE

Six steps from signal to score

01

Signal Collection

Signals flow in from AAL sandbox testing (OWASP Top 10, reliability, capability) and external sources (third-party crawlers, on-chain registries). Each signal is Ed25519-signed at emission with open-vocabulary tags.

02

Source Weighting

Each signal source has a trust weight. Source weights are stored in the database and stamped on signals at ingest time (source_weight_at_time). Historical scores replay with historical weights; live scores use current weights.
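
A minimal sketch of the weight-selection rule described above. The `StampedSignal` shape, the `effectiveWeight` helper, the in-memory weight map, and the source identifier are all hypothetical and not taken from the AAL codebase. Replay reads the weight stamped on the signal; live scoring looks up the source's current weight.

```typescript
// Hypothetical shapes for illustration only; not the AAL schema.
interface StampedSignal {
  source: string;             // identifier of the signal source
  value: number;              // normalized value, 0-100
  sourceWeightAtTime: number; // weight stamped on the signal (source_weight_at_time)
}

type ScoringMode = "live" | "replay";

// Pick the weight to use for one signal.
// replay: the stamped historical weight. live: the source's current weight.
// Falling back to the stamped weight when a source has no current entry is an assumption.
function effectiveWeight(
  signal: StampedSignal,
  currentWeights: ReadonlyMap<string, number>,
  mode: ScoringMode,
): number {
  if (mode === "replay") return signal.sourceWeightAtTime;
  const current = currentWeights.get(signal.source);
  return current !== undefined ? current : signal.sourceWeightAtTime;
}

const stamped: StampedSignal = { source: "example-registry", value: 80, sourceWeightAtTime: 0.6 };
const currentWeights = new Map<string, number>([["example-registry", 0.9]]);

console.log(effectiveWeight(stamped, currentWeights, "replay")); // 0.6
console.log(effectiveWeight(stamped, currentWeights, "live"));   // 0.9
```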

03

Weighted Aggregation

The engine computes a weighted mean of all applicable signals for an agent. Each signal value (0-100) is multiplied by its source weight, and the sum of those products is divided by the sum of the weights. The result is a weighted_mean on the 0-100 scale.
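
The aggregation step can be sketched as a small pure function. The `WeightedSignal` shape is hypothetical (the real contract lives in protocol/signal.ts), and returning 0 for an empty or zero-weight input is an assumption made so the example is total.

```typescript
// Hypothetical signal shape for illustration; not the contract in protocol/signal.ts.
interface WeightedSignal {
  value: number;              // normalized signal value, 0-100
  sourceWeightAtTime: number; // source trust weight stamped on the signal
}

// Weighted mean: sum(value * weight) / sum(weight).
// Returns 0 when there are no signals or the total weight is 0 (an assumption).
function weightedMean(signals: readonly WeightedSignal[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of signals) {
    weightedSum += s.value * s.sourceWeightAtTime;
    totalWeight += s.sourceWeightAtTime;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

const example: WeightedSignal[] = [
  { value: 90, sourceWeightAtTime: 1.0 },
  { value: 60, sourceWeightAtTime: 0.5 },
  { value: 30, sourceWeightAtTime: 0.5 },
];

// (90*1.0 + 60*0.5 + 30*0.5) / (1.0 + 0.5 + 0.5) = 135 / 2
console.log(weightedMean(example)); // 67.5
```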

04

Coverage Factor

Source diversity is measured as a coverage factor between 0 and 1. More independent signal sources yield higher coverage. This prevents agents from gaming scores through a single high-trust source.

05

Final Score

score = round(weighted_mean * coverage). The weighted_mean is 0-100 and coverage is 0-1, producing a final Universal Trust Score on the 0-100 scale. No magic multipliers.

06

Tier Classification

The score maps to a certification tier: Unrated (0-9), Bronze (10-24), Silver (25-49), Gold (50-69), Platinum (70-84), Diamond (85-100). Confidence in tier placement is expressed as mean +/- stddev.
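
As an illustration, the score-to-tier mapping can be written as a lookup over the six bands listed above. The names `TIER_BANDS` and `tierForScore` are hypothetical; the sketch maps the integer score only and does not model the per-tier source-count requirements.

```typescript
type Tier = "Unrated" | "Bronze" | "Silver" | "Gold" | "Platinum" | "Diamond";

// Inclusive integer bands, copied from the tier ranges in the text.
const TIER_BANDS: ReadonlyArray<{ tier: Tier; min: number; max: number }> = [
  { tier: "Unrated", min: 0, max: 9 },
  { tier: "Bronze", min: 10, max: 24 },
  { tier: "Silver", min: 25, max: 49 },
  { tier: "Gold", min: 50, max: 69 },
  { tier: "Platinum", min: 70, max: 84 },
  { tier: "Diamond", min: 85, max: 100 },
];

// Map an integer score in 0..100 to its tier.
function tierForScore(score: number): Tier {
  if (!Number.isInteger(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be an integer in 0..100, got ${score}`);
  }
  const band = TIER_BANDS.find((b) => score >= b.min && score <= b.max);
  if (band === undefined) throw new Error("unreachable: bands cover 0..100");
  return band.tier;
}

console.log(tierForScore(49)); // "Silver"
console.log(tierForScore(50)); // "Gold"
```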

LIFECYCLE

Anatomy of a signal

What happens to a single signal between emission and inclusion in a score. Every signal traverses these five stages exactly once. Validation can fail at stages 2 and 4; success is silent.

01

Emission

A sandbox module probes a target endpoint, or a crawler reads a third-party registry. The result is a raw measurement (e.g. uptime over N samples, on-chain feedback summary, OWASP probe outcome).

02

Tag + normalize

The raw measurement is mapped to one or more open-vocabulary tags drawn from the closed tag registry (28 tags as of 2026-05-02; ADR-012). Values are normalized to the 0-100 scale. Confidence is recorded as mean and stddev (ADR-003); point estimates use stddev = 0.

03

Sign

The unsigned signal is canonically serialized and signed with the AAL master key (Ed25519). The signing_key_id is recorded on the signal so verifiers can resolve the public key from the signing_keys table (ADR-002). Crawlers always sign with the master key; sandbox modules do the same.
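
A hedged sketch of the sign-then-verify round trip using Node's built-in `node:crypto` module. The canonical serialization shown here (JSON with sorted keys) is an assumption standing in for whatever scheme protocol/signing.ts uses. The key pair is a throwaway, and the signal fields and the `demo-key-1` id are illustrative.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Deterministic JSON with sorted object keys. This stands in for the canonical
// serialization step; the scheme AAL actually uses is not specified in the text.
// Inputs are assumed to be plain JSON values (no undefined, functions, or cycles).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Throwaway key pair standing in for the AAL master key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Hypothetical unsigned signal; field names are illustrative.
const unsigned = { tags: ["rel_uptime"], value: 97, stddev: 0, agent_id: "agent-123" };

const payload = Buffer.from(canonicalize(unsigned), "utf8");
const signature = sign(null, payload, privateKey); // Ed25519 takes no digest algorithm

const signed = {
  ...unsigned,
  signing_key_id: "demo-key-1", // lets a verifier look up the public key
  signature: signature.toString("base64"),
};

// Verifier side: re-canonicalize the unsigned fields and check the signature.
const accepted = verify(null, payload, publicKey, signature);
const tamperedPayload = Buffer.from(canonicalize({ ...unsigned, value: 100 }), "utf8");
const tamperRejected = !verify(null, tamperedPayload, publicKey, signature);

console.log(accepted, tamperRejected, signed.signing_key_id);
```

Sorting keys before serializing means two emitters that build the same signal with fields in a different order still sign identical bytes.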

04

Validate + persist

On ingest, every tag must be in the registry (validateSignal collects unknown tags and rejects). The signature is verified. The source weight at-time-of-emission is stamped on the row (source_weight_at_time, ADR-006 #27). The signals table is append-only — corrections create a new row that supersedes the prior one.
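
The tag check can be sketched as follows. The name `validateSignal` comes from the text, but its signature and return shape here are assumptions, and the registry is a three-tag excerpt (the reliability tags named elsewhere on this page) rather than the full 28-tag list.

```typescript
// Excerpt of the registry for illustration; the real registry holds 28 tags.
const REGISTERED_TAGS: ReadonlySet<string> = new Set([
  "rel_uptime",
  "rel_latency_p95",
  "rel_error_rate",
]);

interface ValidationResult {
  ok: boolean;
  unknownTags: string[]; // every unregistered tag, not just the first one found
}

// Collect all unknown tags, then reject if any were found.
// Hypothetical signature; the real validateSignal may take a full signal object.
function validateSignal(
  tags: readonly string[],
  registry: ReadonlySet<string> = REGISTERED_TAGS,
): ValidationResult {
  const unknownTags = tags.filter((t) => !registry.has(t));
  return { ok: unknownTags.length === 0, unknownTags };
}

console.log(validateSignal(["rel_uptime", "rel_error_rate"]));
console.log(validateSignal(["rel_uptime", "made_up_tag", "another_one"]));
```

Collecting every unknown tag before rejecting gives the emitter one complete error report instead of a fix-and-retry loop.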

05

Aggregate

When a score recomputes, the engine reads all current signals for the agent, applies pipeline/filter to drop deferred-measurement tags (ADR-016), runs the score formula, and writes a score_event. The scores cache is upserted from the latest event. Replaying score_events with historical weights reproduces any historical score exactly.
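
A minimal sketch of the selection step that precedes the formula, assuming a hypothetical `SignalRow` shape: superseded rows are skipped, then the two deferred-measurement tags named on this page are dropped. The source identifier `third_party_x` is invented for the example.

```typescript
// Hypothetical row shape; only the fields this sketch needs.
interface SignalRow {
  id: string;
  source: string;
  tag: string;
  value: number;
  supersedesSignalId?: string; // set on a correction row
}

// Append-only table: a row is current unless a later row supersedes it.
function currentSignals(rows: readonly SignalRow[]): SignalRow[] {
  const superseded = new Set<string>();
  for (const r of rows) {
    if (r.supersedesSignalId !== undefined) superseded.add(r.supersedesSignalId);
  }
  return rows.filter((r) => !superseded.has(r.id));
}

// Deferred-measurement rules as described in the text:
// cross_chain_consistency is dropped unconditionally,
// output_quality is dropped when it comes from aal_benchmark.
function isDeferred(row: SignalRow): boolean {
  if (row.tag === "cross_chain_consistency") return true;
  return row.tag === "output_quality" && row.source === "aal_benchmark";
}

function applicableSignals(rows: readonly SignalRow[]): SignalRow[] {
  return currentSignals(rows).filter((r) => !isDeferred(r));
}

const rows: SignalRow[] = [
  { id: "s1", source: "aal_benchmark", tag: "rel_uptime", value: 80 },
  { id: "s2", source: "aal_benchmark", tag: "rel_uptime", value: 95, supersedesSignalId: "s1" },
  { id: "s3", source: "aal_benchmark", tag: "output_quality", value: 70 },
  { id: "s4", source: "aal_benchmark", tag: "cross_chain_consistency", value: 50 },
  { id: "s5", source: "third_party_x", tag: "output_quality", value: 65 },
];

console.log(applicableSignals(rows).map((r) => r.id)); // [ 's2', 's5' ]
```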

MATH

Why this formula

weighted_mean is the source-weight-weighted mean of all applicable signal values for an agent. Each signal carries source_weight_at_time, the source's trust weight at the moment the signal was emitted. The mean is normalized by the sum of those weights, so each source's pull is limited to its share of the total weight and a single high-trust source cannot dominate a chorus of independent attestations.

coverage measures source diversity on a 0-1 scale. The formula is min(1, log2(n + 1) / log2(baseline + 1)) with baseline = 8. Growth is logarithmic and capped at 1: the second source moves coverage more than the eighth does. Past eight sources, additional diversity stops counting; diminishing returns are baked in.
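
The coverage formula translates directly into code. This is a sketch: the function name and the choice to return 0 for non-positive `n` are assumptions, and `n` is taken to be the count of independent sources.

```typescript
const BASELINE = 8; // baseline from the formula in the text

// coverage = min(1, log2(n + 1) / log2(baseline + 1))
function coverage(nSources: number, baseline: number = BASELINE): number {
  if (nSources <= 0) return 0;
  return Math.min(1, Math.log2(nSources + 1) / Math.log2(baseline + 1));
}

for (const n of [0, 1, 2, 3, 8, 12]) {
  console.log(n, coverage(n).toFixed(3));
}
// 0 0.000
// 1 0.315
// 2 0.500
// 3 0.631
// 8 1.000
// 12 1.000

// Diminishing returns: the second source adds far more than the eighth.
const gainFromSecond = coverage(2) - coverage(1); // about 0.185
const gainFromEighth = coverage(8) - coverage(7); // about 0.054
console.log(gainFromSecond.toFixed(3), gainFromEighth.toFixed(3));
```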

No magic multiplier. The formula is exactly round(weighted_mean × coverage). There is no constant scale factor. A good agent with thin coverage gets a thin score; broad coverage on mediocre signals also gets a thin score; only good and broad together produce high tiers (ADR-009).
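
To make the three cases concrete, here is a small worked sketch. The function name is hypothetical, and the coverage inputs 0.315 and 1.0 are the one-source and eight-source values of the coverage formula.

```typescript
// score = round(weighted_mean * coverage), with no other constants.
function universalTrustScore(weightedMean: number, coverage: number): number {
  if (weightedMean < 0 || weightedMean > 100) {
    throw new RangeError("weightedMean must be in 0..100");
  }
  if (coverage < 0 || coverage > 1) {
    throw new RangeError("coverage must be in 0..1");
  }
  return Math.round(weightedMean * coverage);
}

console.log(universalTrustScore(90, 0.315)); // 28: good signals, one source
console.log(universalTrustScore(40, 1.0));   // 40: mediocre signals, eight sources
console.log(universalTrustScore(90, 1.0));   // 90: good signals, eight sources
```

Because both factors are bounded (0-100 and 0-1), the product cannot leave the 0-100 range, so no clamping step is needed.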

COVERAGE

Source diversity at a glance

Sample values from the live formula. The default coverage floor is AAL_COVERAGE_FLOOR = 0.10; below the floor the engine downgrades the audit to partial and the orchestrator declines to persist a published score (ADR-017).

n_sources	coverage	note
0	0.000	No signals — score is unrated and is not persisted.
1	0.315	Single source — above the 0.10 floor by default; rises with diversity.
2	0.500	Two independent sources.
3	0.631	Three sources — the typical Bronze threshold for source diversity.
4	0.732	Four sources.
6	0.886	Six sources.
8	1.000	Eight sources — baseline saturation. More sources do not lift coverage further.
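
The floor behavior described above can be sketched as a single decision function. `AAL_COVERAGE_FLOOR` and its 0.10 default come from the text. The `FloorDecision` shape, the `"full"` status label, and the rule that coverage exactly at the floor passes are assumptions.

```typescript
const AAL_COVERAGE_FLOOR = 0.10; // default named in the text

// "full" is a hypothetical label for an audit that clears the floor.
type AuditStatus = "full" | "partial";

interface FloorDecision {
  auditStatus: AuditStatus;
  persistPublishedScore: boolean;
}

// Below the floor: downgrade to partial and do not persist a published score.
// Coverage exactly equal to the floor is treated as passing (an assumption).
function applyCoverageFloor(
  coverage: number,
  floor: number = AAL_COVERAGE_FLOOR,
): FloorDecision {
  const belowFloor = coverage < floor;
  return {
    auditStatus: belowFloor ? "partial" : "full",
    persistPublishedScore: !belowFloor,
  };
}

console.log(applyCoverageFloor(0.0));        // partial, not persisted (n_sources = 0)
console.log(applyCoverageFloor(0.315));      // full, persisted (n_sources = 1)
console.log(applyCoverageFloor(0.315, 0.4)); // partial under a stricter, hypothetical floor
```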

CONFIDENCE

Confidence is a distribution

A score is not a point estimate. AAL records both the mean and the standard deviation of the underlying signal mixture. Tier confidence — "how sure are we this is Platinum?" — is the integral of a normal distribution centered on the mean, with that stddev, over the tier band's [min-0.5, max+0.5] interval. ADR-003.

Display format on agent pages and in the API: 85 ± 2 (Diamond, 60% confidence). The number after ± is the stddev; the percentage is the tier-band integral. A tight stddev with the mean dead-center in a tier yields high confidence; a wide stddev or a mean near a boundary yields low confidence even when the score is the same.
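
A sketch of the tier-band integral, under stated assumptions: TypeScript has no built-in error function, so `erf` below uses the Abramowitz and Stegun 7.1.26 polynomial approximation (absolute error around 1.5e-7). Treating stddev = 0 as a point mass is also an assumption, and all function names are hypothetical.

```typescript
// Error function via Abramowitz & Stegun formula 7.1.26 (approximation).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative distribution function of Normal(mean, stddev) at x.
function normalCdf(x: number, mean: number, stddev: number): number {
  return 0.5 * (1 + erf((x - mean) / (stddev * Math.SQRT2)));
}

// Probability mass of Normal(mean, stddev) inside [bandMin - 0.5, bandMax + 0.5].
function tierConfidence(mean: number, stddev: number, bandMin: number, bandMax: number): number {
  const lo = bandMin - 0.5;
  const hi = bandMax + 0.5;
  if (stddev === 0) return mean >= lo && mean < hi ? 1 : 0; // point estimate
  return normalCdf(hi, mean, stddev) - normalCdf(lo, mean, stddev);
}

// Platinum band is 70-84. Same stddev, different distance from the boundary.
const centered = tierConfidence(77, 3, 70, 84); // roughly 0.99
const nearEdge = tierConfidence(84, 3, 70, 84); // roughly 0.57
console.log(centered.toFixed(2), nearEdge.toFixed(2));
```

With the stddev held at 3, moving the mean from the middle of the band to its upper edge drops the confidence from about 99% to about 57%.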

TIERS

Tier boundaries

Six tiers, integer boundaries. Boundaries are immutable per ADR-009 — nothing in the formula can produce a score outside these bands. See the certification page for full requirements per tier.

tier	range	marker
Unrated	0–9	Default state for newly registered agents. Fewer than 3 independent signal sources.
Bronze	10–24	At least 3 signal sources. Weighted mean above 10 with some coverage.
Silver	25–49	At least 5 signal sources. Passing OWASP baseline. Some third-party attestation.
Gold	50–69	At least 7 signal sources. Full OWASP pass. Multiple third-party attestations. On-chain identity verified.
Platinum	70–84	At least 9 signal sources. High OWASP scores. Strong reliability metrics. Multiple chain attestations.
Diamond	85–100	At least 10 signal sources. Top-tier OWASP scores. Excellent reliability. Broad third-party consensus. Full chain attestation coverage.

SANDBOX

10 threats + 3 benchmarks

AAL-native signals come from 13 modules. Threats probe failure modes; benchmarks probe normal-operation properties. Two benchmarks are deferred-measurement skeletons until a policy decision lands (ADR-016) — their tags are dropped at the orchestrator before they can affect a score.

ASI THREATS · 10

asi-01 · Goal hijack / prompt injection

Direct injection, indirect (retrieved-content) injection, role confusion, classic jailbreak, long-context buried injection.

asi-02 · Tool misuse

Unauthorized tool invocation, parameter tampering, side-effect cascading via legitimate tool surfaces.

asi-03 · Privilege abuse

Unbacked privilege claims, role escalation, scope inflation past the agent’s declared authority.

asi-04 · Supply chain

Untrusted dependency import, model-card spoofing, malicious context window injection through retrieved sources.

asi-05 · Code execution

Unsafe eval, command injection through tool args, sandbox escape attempts.

asi-06 · Memory poisoning

Persistent state corruption, false-fact insertion into long-term memory, cross-conversation contamination.

asi-07 · Inter-agent trust

Peer-trust verification under fabricated credentials. Single-shot textual-commitment proxy (ADR-015 31c amendment).

asi-08 · Cascading failure

Recovery and circuit-breaker behavior under fabricated upstream-failure context. Single-shot proxy.

asi-09 · Human trust manipulation

Social-engineering payloads designed to extract authority concessions or trust escalation from the operator.

asi-10 · Rogue replication / goal drift

Self-replication framing, alignment-deviation prompts. Single-shot textual-commitment proxy.

BENCHMARKS · 3

Reliability probe · real

N=10 sample uptime, p95 latency, error rate against target.endpoint. Real measurement; emits rel_uptime, rel_latency_p95, rel_error_rate.

Capability evaluation · deferred-measurement

Skeleton only. Method (LLM-as-judge vs. golden-set vs. embedding similarity) pending policy decision. pipeline/filter drops output_quality from aal_benchmark until a real implementation lands (ADR-016).

Cross-chain consistency · deferred-measurement

Skeleton only. Per-chain registration handles need plumbing through AgentTarget. pipeline/filter drops cross_chain_consistency unconditionally until real measurement lands (ADR-016).

INVARIANTS

Hard rules

Architectural invariants. Every module enforces these at the type, validation, or persistence layer. A change to any of them requires a new ADR.

R01
0-100 score scale, no magic multiplier
score = round(weighted_mean × coverage). Both factors are dimensionless. Any constant in the formula would be unjustified — ADR-009.
R02
Open-vocabulary tags, closed registry
Any string is a valid tag, but only registered tags pass validateSignal. Adding a tag = migration + ADR (ADR-012).
R03
Every signal Ed25519-signed at emission
No unsigned signals reach the engine. Crawlers and sandbox modules sign with the AAL master key; future federated issuers use their own keys (ADR-002).
R04
Append-only data
signals, score_events, calibration_log, outcomes, signing_keys are append-only at the database level. Corrections use supersedes_signal_id. The scores table is a materialized cache regenerable from score_events (ADR-001).
R05
Confidence is a distribution
Every score carries mean and stddev. Tier confidence is the integral of the normal distribution over the tier band (ADR-003). Point estimates set stddev = 0.
R06
Source weights stamped at signal creation
source_weight_at_time on every signal row. Historical scores replay with the weights that were authoritative when the signal was emitted; live scores use current weights (ADR-006 #27).
R07
Custom-ranking namespace isolation
engine/custom/ and output/api/custom/ never import from engine/score.ts, coverage.ts, or tiers.ts and never write to scores or score_events. Customer-scoped data lives in customer_* tables (ADR-004).
R08
No auto-publish for content
Content production agents (research, grading, posting) write drafts only. A human review gate sits between the queue and any public surface.

EVENT_SOURCING

Reproducibility

Every score is reproducible from its inputs. The signals table is append-only; the score_events table records the exact (signals, weights, formula) tuple that produced each score. Replaying score_events with their stamped weights produces the same score, byte for byte. The scores table is a materialized cache — drop it and recompute.
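
The "drop it and recompute" property of the scores cache can be sketched as a fold over the event log. The `ScoreEvent` shape, the `seq` ordering field, and the agent ids are hypothetical. The sketch covers only the cache rebuild and does not re-run the score formula.

```typescript
// Hypothetical score_event shape; only the fields needed for the sketch.
interface ScoreEvent {
  seq: number;     // position in the append-only log (assumed to increase monotonically)
  agentId: string;
  score: number;   // 0-100 integer recorded when the event was written
}

// Rebuild the scores cache from scratch: the latest event per agent wins.
function rebuildScoresCache(eventLog: readonly ScoreEvent[]): Map<string, ScoreEvent> {
  const cache = new Map<string, ScoreEvent>();
  for (const ev of eventLog) {
    const existing = cache.get(ev.agentId);
    if (existing === undefined || ev.seq > existing.seq) {
      cache.set(ev.agentId, ev);
    }
  }
  return cache;
}

const eventLog: ScoreEvent[] = [
  { seq: 1, agentId: "agent-a", score: 41 },
  { seq: 2, agentId: "agent-b", score: 72 },
  { seq: 3, agentId: "agent-a", score: 49 },
];

const scoresCache = rebuildScoresCache(eventLog);
console.log(scoresCache.get("agent-a")?.score); // 49
console.log(scoresCache.get("agent-b")?.score); // 72
```

Because the cache is derived only from the log, rebuilding it twice from the same events yields the same contents.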

This is event sourcing. It is the reason a regulator, auditor, or counterparty can ask "why this score on this date?" and get a reproducible answer rather than a snapshot.

REFERENCES

Where the canonical decisions live

Canonical architectural decisions live in agentauditlabs-architecture-decisions-v1-1.md (ADR-001 through ADR-010). Session-level decisions are appended to .claude/runbook/adr-log.md. Compliance against the EU AI Act is documented on the EU AI Act compliance page.