Methodology
How the Universal Trust Score is computed. No categories. No magic multipliers. Just signed signals weighted by source trust and source diversity.
AAL is a signal-based trust protocol for AI agents. Trust is not a category and not a bucket. The protocol collects signed measurements from many independent sources, weights each by source trust, and aggregates them into a single Universal Trust Score on the 0-100 scale. The same data, replayed, produces the same score.
Six layers, left to right
The codebase is partitioned into six logical layers. Signals flow left to right. Each layer has a strict scope; nothing imports across the firewall the wrong way.
protocol/ ─► sandbox/ ─► ingest/ ─► engine/ ─► output/ ─► pipeline/
| layer | contains | scope |
|---|---|---|
| protocol/ | signing, identity, tag registry | Signal contract, identity resolution, Ed25519 signing, tag registry. The ground truth of what a signal is and who can issue one. |
| sandbox/ | 10 ASI threats + 3 benchmarks | AAL-native attack surface and benchmarks. Ten ASI threats and three benchmarks emit signals against a target endpoint. |
| ingest/ | 12 third-party crawlers, on-chain registries | Crawlers for external trust signals: on-chain registries (ERC-8004, Solana ATOM) and third-party score services (Mnemom, Helixa, etc.). |
| engine/ | weighted mean + coverage, tier | Pure functions that turn signals into a score. Weighted aggregation, log-scaled coverage, tier assignment, confidence distribution. No I/O. |
| output/ | VC, badge, MCP server, chain writeback | Score consumers. W3C Verifiable Credentials, SVG badges, MCP server, dual-chain writeback (Solana SEAL v1 + Base ERC-8004 attestation). |
| pipeline/ | coverage floor, gates, batch | Orchestration. Coverage-floor downgrade, registered-agent abuse-vector gate, batch scoring, recompute, signal filter for inapplicable tags. |
Six steps from signal to score
Signal Collection
Signals flow in from AAL sandbox testing (OWASP Top 10, reliability, capability) and external sources (third-party crawlers, on-chain registries). Each signal is Ed25519-signed at emission with open-vocabulary tags.
Source Weighting
Each signal source has a trust weight. Source weights are stored in the database and stamped on signals at ingest time (source_weight_at_time). Historical scores replay with historical weights; live scores use current weights.
Weighted Aggregation
The engine computes a weighted mean of all applicable signals for an agent. Each signal value (0-100) is multiplied by its source weight, and the sum of those products is divided by the sum of the weights. The result is a weighted_mean on the 0-100 scale.
Coverage Factor
Source diversity is measured as a coverage factor between 0 and 1. More independent signal sources yield higher coverage. This prevents agents from gaming scores through a single high-trust source.
Final Score
score = round(weighted_mean * coverage). The weighted_mean is 0-100 and coverage is 0-1, producing a final Universal Trust Score on the 0-100 scale. No magic multipliers.
Tier Classification
The score maps to a certification tier: Unrated (0-9), Bronze (10-24), Silver (25-49), Gold (50-69), Platinum (70-84), Diamond (85-100). Confidence in tier placement is expressed as mean +/- stddev.
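The source-weighting and aggregation steps above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the `WeightedSignal` shape, the `source` field, the `WeightMode` switch, and the fallback to the stamped weight when no current weight is known are all assumptions; only `source_weight_at_time` is named in the text.

```typescript
// Hypothetical signal shape; only source_weight_at_time comes from the text.
interface WeightedSignal {
  source: string;                // issuing source identifier (assumed field)
  value: number;                 // normalized signal value, 0-100
  source_weight_at_time: number; // trust weight stamped at ingest
}

// "historical" replays with stamped weights; "live" uses current weights.
type WeightMode = "historical" | "live";

// Weighted mean: sum(value * weight) / sum(weight).
function weightedMean(
  signals: WeightedSignal[],
  mode: WeightMode,
  currentWeights: Map<string, number> = new Map(),
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of signals) {
    // Assumption: fall back to the stamped weight if no current weight exists.
    const w =
      mode === "live"
        ? (currentWeights.get(s.source) ?? s.source_weight_at_time)
        : s.source_weight_at_time;
    weightedSum += s.value * w;
    totalWeight += w;
  }
  // Assumption: an agent with no usable weight gets a mean of 0.
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}

const demoSignals: WeightedSignal[] = [
  { source: "a", value: 80, source_weight_at_time: 1.0 },
  { source: "b", value: 40, source_weight_at_time: 0.5 },
];
const historicalMean = weightedMean(demoSignals, "historical");
const liveMean = weightedMean(
  demoSignals,
  "live",
  new Map([["a", 0.5], ["b", 0.5]]),
);
```

Dividing by the total weight is what keeps the result on the 0-100 scale regardless of how many signals contribute.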
Anatomy of a signal
What happens to a single signal between emission and inclusion in a score. Every signal traverses these five stages exactly once. Validation can fail at stages 2 and 4; success is silent.
Emission
A sandbox module probes a target endpoint, or a crawler reads a third-party registry. The result is a raw measurement (e.g. uptime over N samples, on-chain feedback summary, OWASP probe outcome).
Tag + normalize
The raw measurement is mapped to one or more open-vocabulary tags drawn from the closed tag registry (28 tags as of 2026-05-02; ADR-012). Values are normalized to the 0-100 scale. Confidence is recorded as mean and stddev (ADR-003); point estimates use stddev = 0.
Sign
The unsigned signal is canonically serialized and signed with the AAL master key (Ed25519). The signing_key_id is recorded on the signal so verifiers can resolve the public key from the signing_keys table (ADR-002). Crawlers always sign with the master key; sandbox modules do the same.
Validate + persist
On ingest, every tag must be in the registry (validateSignal collects unknown tags and rejects). The signature is verified. The source weight at-time-of-emission is stamped on the row (source_weight_at_time, ADR-006 #27). The signals table is append-only — corrections create a new row that supersedes the prior one.
Aggregate
When a score recomputes, the engine reads all current signals for the agent, applies pipeline/filter to drop deferred-measurement tags (ADR-016), runs the score formula, and writes a score_event. The scores cache is upserted from the latest event. Replaying score_events with historical weights reproduces any historical score exactly.
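The tag check in the validate stage can be illustrated with a small sketch. The function shares the name `validateSignal` with the one mentioned above, but its signature, the `UnsignedSignal` and `ValidationResult` shapes, the range checks, and the three-tag registry subset are assumptions for illustration; signature verification is omitted.

```typescript
// Hypothetical subset of the tag registry. The rel_* tags appear in the text;
// the full registry is not reproduced here.
const TAG_REGISTRY: ReadonlySet<string> = new Set([
  "rel_uptime",
  "rel_latency_p95",
  "rel_error_rate",
]);

// Assumed signal shape for this sketch.
interface UnsignedSignal {
  tags: string[];
  value: number;  // normalized, 0-100
  stddev: number; // 0 for point estimates
}

interface ValidationResult {
  ok: boolean;
  unknownTags: string[];
  errors: string[];
}

// Collects every unknown tag (rather than failing on the first) and rejects
// the signal if any problem is found.
function validateSignal(
  signal: UnsignedSignal,
  registry: ReadonlySet<string> = TAG_REGISTRY,
): ValidationResult {
  const unknownTags = signal.tags.filter((tag) => !registry.has(tag));
  const errors: string[] = [];
  if (signal.tags.length === 0) errors.push("signal has no tags");
  if (unknownTags.length > 0) {
    errors.push(`unknown tags: ${unknownTags.join(", ")}`);
  }
  if (!(signal.value >= 0 && signal.value <= 100)) {
    errors.push("value outside the 0-100 scale");
  }
  if (!(signal.stddev >= 0)) errors.push("stddev must be non-negative");
  return { ok: errors.length === 0, unknownTags, errors };
}

const accepted = validateSignal({ tags: ["rel_uptime"], value: 99, stddev: 0 });
const rejected = validateSignal({
  tags: ["rel_uptime", "made_up_tag"],
  value: 99,
  stddev: 0,
});
```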
Why this formula
weighted_mean is the source-weight-weighted mean of all applicable signal values for an agent. Each signal carries source_weight_at_time, the source's trust weight at the moment the signal was emitted. The mean is taken over those weights, so a single high-trust source cannot dominate against a chorus of independent attestations.
coverage measures source diversity on a 0-1 scale. The formula is min(1, log2(n + 1) / log2(baseline + 1)) with baseline = 8. This is logarithmic and capped: the second source moves coverage more than the eighth does, and coverage reaches 1 at eight sources. Past eight sources, additional diversity stops counting; diminishing returns are baked in.
No magic multiplier. The formula is exactly round(weighted_mean × coverage). There is no constant scale factor. A good agent with thin coverage gets a thin score; broad coverage on mediocre signals also gets a thin score; only good and broad together produce high tiers (ADR-009).
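A short sketch of the formula with three illustrative inputs shows how thin coverage or mediocre signals pull the score down. The function name `finalScore`, the range checks, and the use of `Math.round` as the rounding rule are assumptions; the text specifies only round(weighted_mean × coverage).

```typescript
// score = round(weighted_mean * coverage), with no constant scale factor.
function finalScore(weightedMean: number, coverage: number): number {
  if (weightedMean < 0 || weightedMean > 100) {
    throw new RangeError("weightedMean must be within 0-100");
  }
  if (coverage < 0 || coverage > 1) {
    throw new RangeError("coverage must be within 0-1");
  }
  // Assumption: Math.round stands in for the engine's rounding rule.
  return Math.round(weightedMean * coverage);
}

// Good signals, single source (coverage 0.315): 90 * 0.315 = 28.35
const goodButThin = finalScore(90, 0.315);
// Mediocre signals, saturated coverage: 40 * 1.0 = 40
const broadButMediocre = finalScore(40, 1.0);
// Good signals and saturated coverage: 90 * 1.0 = 90
const goodAndBroad = finalScore(90, 1.0);
```

With these inputs the three cases land at 28, 40, and 90, so only the last reaches the top tier.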
Source diversity at a glance
Sample values from the live formula. The default coverage floor is AAL_COVERAGE_FLOOR = 0.10; below the floor the engine downgrades the audit to partial and the orchestrator declines to persist a published score (ADR-017).
| n_sources | coverage | note |
|---|---|---|
| 0 | 0.000 | No signals — score is unrated and is not persisted. |
| 1 | 0.315 | Single source — above the 0.10 floor by default; rises with diversity. |
| 2 | 0.500 | Two independent sources. |
| 3 | 0.631 | Three sources — the typical Bronze threshold for source diversity. |
| 4 | 0.732 | Four sources. |
| 6 | 0.886 | Six sources. |
| 8 | 1.000 | Eight sources — baseline saturation. More sources do not lift coverage further. |
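The coverage values can be recomputed directly from the stated formula. The sketch below is an illustration under assumptions: the function names `coverage` and `isBelowFloor` and the input check are hypothetical, while baseline = 8 and AAL_COVERAGE_FLOOR = 0.10 come from the text.

```typescript
const BASELINE = 8;              // baseline from the coverage formula
const AAL_COVERAGE_FLOOR = 0.10; // default floor named in the text

// coverage = min(1, log2(n + 1) / log2(baseline + 1))
function coverage(nSources: number, baseline: number = BASELINE): number {
  if (!Number.isInteger(nSources) || nSources < 0) {
    throw new RangeError("nSources must be a non-negative integer");
  }
  return Math.min(1, Math.log2(nSources + 1) / Math.log2(baseline + 1));
}

// True when coverage is under the floor, i.e. the audit would be partial.
function isBelowFloor(c: number, floor: number = AAL_COVERAGE_FLOOR): boolean {
  return c < floor;
}

// Print coverage for the same source counts listed in the table.
for (const n of [0, 1, 2, 3, 4, 6, 8, 12]) {
  console.log(n, coverage(n).toFixed(3), isBelowFloor(coverage(n)));
}
```

With the default floor, only the zero-source case falls below it, since a single source already yields about 0.315.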
Confidence is a distribution
A score is not a point estimate. AAL records both the mean and the standard deviation of the underlying signal mixture. Tier confidence — "how sure are we this is Platinum?" — is the integral of a normal distribution centered on the mean, with that stddev, over the tier band's [min-0.5, max+0.5] interval. ADR-003.
Display format on agent pages and in the API: 85 ± 2 (Diamond, 60% confidence). The number after ± is the stddev; the percentage is the tier-band integral. Here the mean sits on Diamond's lower boundary, so roughly 40% of the distribution falls below the band. A tight stddev with the mean dead-center in a tier yields high confidence; a wide stddev or a mean near a boundary yields low confidence even when the score is the same.
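The tier-band integral can be sketched with a normal CDF. This is an illustration rather than the engine's implementation: the function names are hypothetical, the error function uses the Abramowitz and Stegun 7.1.26 polynomial approximation (accurate to roughly 1.5e-7), and treating stddev = 0 as an all-or-nothing point mass is an assumption.

```typescript
// Error function via the Abramowitz-Stegun 7.1.26 approximation.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) *
      t +
      0.254829592) *
    t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative distribution function of Normal(mean, stddev).
function normalCdf(x: number, mean: number, stddev: number): number {
  return 0.5 * (1 + erf((x - mean) / (stddev * Math.SQRT2)));
}

// Probability mass inside [bandMin - 0.5, bandMax + 0.5].
function tierConfidence(
  mean: number,
  stddev: number,
  bandMin: number,
  bandMax: number,
): number {
  const lo = bandMin - 0.5;
  const hi = bandMax + 0.5;
  // Assumption: a point estimate is fully inside or fully outside the band.
  if (stddev === 0) return mean >= lo && mean <= hi ? 1 : 0;
  return normalCdf(hi, mean, stddev) - normalCdf(lo, mean, stddev);
}

// Platinum band is 70-84. A centered mean gives high confidence; a mean on
// the upper boundary gives much lower confidence at the same stddev.
const centered = tierConfidence(77, 2, 70, 84);
const nearBoundary = tierConfidence(84, 2, 70, 84);
```

With a stddev of 2, a mean of 77 puts nearly all of the mass inside the Platinum band, while a mean of 84 leaves only about 60% inside it.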
Tier boundaries
Six tiers, integer boundaries. Boundaries are immutable per ADR-009 — nothing in the formula can produce a score outside these bands. See the certification page for full requirements per tier.
| tier | range | requirements |
|---|---|---|
| Unrated | 0–9 | Default state for newly registered agents. Fewer than 3 independent signal sources. |
| Bronze | 10–24 | At least 3 signal sources. Weighted mean above 10 with some coverage. |
| Silver | 25–49 | At least 5 signal sources. Passing OWASP baseline. Some third-party attestation. |
| Gold | 50–69 | At least 7 signal sources. Full OWASP pass. Multiple third-party attestations. On-chain identity verified. |
| Platinum | 70–84 | At least 9 signal sources. High OWASP scores. Strong reliability metrics. Multiple chain attestations. |
| Diamond | 85–100 | At least 10 signal sources. Top-tier OWASP scores. Excellent reliability. Broad third-party consensus. Full chain attestation coverage. |
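The score-to-tier mapping in the table can be expressed as a lookup over the integer bands. This is a sketch only: the names `TIER_BANDS` and `tierFor` are hypothetical, and the function checks the score range alone, not the per-tier source-count or attestation requirements.

```typescript
type Tier = "Unrated" | "Bronze" | "Silver" | "Gold" | "Platinum" | "Diamond";

// Integer tier bands as listed in the table, inclusive on both ends.
const TIER_BANDS: ReadonlyArray<{ tier: Tier; min: number; max: number }> = [
  { tier: "Unrated", min: 0, max: 9 },
  { tier: "Bronze", min: 10, max: 24 },
  { tier: "Silver", min: 25, max: 49 },
  { tier: "Gold", min: 50, max: 69 },
  { tier: "Platinum", min: 70, max: 84 },
  { tier: "Diamond", min: 85, max: 100 },
];

// Maps an integer score on the 0-100 scale to its tier.
function tierFor(score: number): Tier {
  if (!Number.isInteger(score) || score < 0 || score > 100) {
    throw new RangeError("score must be an integer within 0-100");
  }
  const band = TIER_BANDS.find((b) => score >= b.min && score <= b.max);
  if (band === undefined) {
    // Unreachable while the bands cover 0-100 without gaps.
    throw new Error(`no tier band covers score ${score}`);
  }
  return band.tier;
}
```

Because the bands are contiguous and the score is rounded to an integer, every valid score falls in exactly one tier.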
10 threats + 3 benchmarks
AAL-native signals come from 13 modules. Threats probe failure modes; benchmarks probe normal-operation properties. Two benchmarks are deferred-measurement skeletons until a policy decision lands (ADR-016) — their tags are dropped at the orchestrator before they can affect a score.
Threats (ten modules):
- Direct injection, indirect (retrieved-content) injection, role confusion, classic jailbreak, long-context buried injection.
- Unauthorized tool invocation, parameter tampering, side-effect cascading via legitimate tool surfaces.
- Unbacked privilege claims, role escalation, scope inflation past the agent’s declared authority.
- Untrusted dependency import, model-card spoofing, malicious context window injection through retrieved sources.
- Unsafe eval, command injection through tool args, sandbox escape attempts.
- Persistent state corruption, false-fact insertion into long-term memory, cross-conversation contamination.
- Peer-trust verification under fabricated credentials. Single-shot textual-commitment proxy (ADR-015 31c amendment).
- Recovery and circuit-breaker behavior under fabricated upstream-failure context. Single-shot proxy.
- Social-engineering payloads designed to extract authority concessions or trust escalation from the operator.
- Self-replication framing, alignment-deviation prompts. Single-shot textual-commitment proxy.
Benchmarks (three modules):
- N=10 sample uptime, p95 latency, error rate against target.endpoint. Real measurement; emits rel_uptime, rel_latency_p95, rel_error_rate.
- Skeleton only. Method (LLM-as-judge vs. golden-set vs. embedding similarity) pending policy decision. pipeline/filter drops output_quality from aal_benchmark until a real implementation lands (ADR-016).
- Skeleton only. Per-chain registration handles need plumbing through AgentTarget. pipeline/filter drops cross_chain_consistency unconditionally until real measurement lands (ADR-016).
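The two deferred-measurement rules above could look like the following filter. This is a sketch under assumptions: the `TaggedSignal` shape, the function names, reading aal_benchmark as a source identifier, and dropping a signal entirely once it has no remaining tags are all hypothetical; only the tag names and the two rules come from the text.

```typescript
// Assumed signal shape for this sketch.
interface TaggedSignal {
  source: string;
  tags: string[];
  value: number;
}

// Applies the two deferred-measurement rules described above.
function isDeferredTag(tag: string, source: string): boolean {
  // cross_chain_consistency is dropped unconditionally.
  if (tag === "cross_chain_consistency") return true;
  // output_quality is dropped only when it comes from aal_benchmark.
  if (tag === "output_quality" && source === "aal_benchmark") return true;
  return false;
}

// Strips deferred tags; a signal left with no tags is removed (assumption).
function filterDeferred(signals: TaggedSignal[]): TaggedSignal[] {
  const kept: TaggedSignal[] = [];
  for (const s of signals) {
    const tags = s.tags.filter((tag) => !isDeferredTag(tag, s.source));
    if (tags.length > 0) kept.push({ ...s, tags });
  }
  return kept;
}

const filtered = filterDeferred([
  { source: "aal_benchmark", tags: ["rel_uptime"], value: 99 },
  { source: "aal_benchmark", tags: ["output_quality"], value: 70 },
  { source: "third_party", tags: ["output_quality"], value: 65 },
  { source: "third_party", tags: ["cross_chain_consistency"], value: 50 },
]);
```

Running the filter before aggregation means a skeleton benchmark cannot move a score, whatever value it emits.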
Hard rules
Architectural invariants. Every module enforces these at the type, validation, or persistence layer. A change to any of them requires a new ADR.
- R01: 0-100 score scale, no magic multiplier. score = round(weighted_mean × coverage). Both factors are dimensionless. Any constant in the formula would be unjustified (ADR-009).
- R02: Open-vocabulary tags, closed registry. Any string is a valid tag, but only registered tags pass validateSignal. Adding a tag = migration + ADR (ADR-012).
- R03: Every signal Ed25519-signed at emission. No unsigned signals reach the engine. Crawlers and sandbox modules sign with the AAL master key; future federated issuers use their own keys (ADR-002).
- R04: Append-only data. signals, score_events, calibration_log, outcomes, signing_keys are append-only at the database level. Corrections use supersedes_signal_id. The scores table is a materialized cache regenerable from score_events (ADR-001).
- R05: Confidence is a distribution. Every score carries mean and stddev. Tier confidence is the integral of the normal distribution over the tier band (ADR-003). Point estimates set stddev = 0.
- R06: Source weights stamped at signal creation. source_weight_at_time on every signal row. Historical scores replay with the weights that were authoritative when the signal was emitted; live scores use current weights (ADR-006 #27).
- R07: Custom-ranking namespace isolation. engine/custom/ and output/api/custom/ never import from engine/score.ts, coverage.ts, or tiers.ts and never write to scores or score_events. Customer-scoped data lives in customer_* tables (ADR-004).
- R08: No auto-publish for content. Content production agents (research, grading, posting) write drafts only. A human review gate sits between the queue and any public surface.
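The append-only rule with supersedes_signal_id implies that readers must resolve which rows are still current. A minimal sketch of that resolution follows; the `SignalRow` shape and the `currentSignals` name are assumptions, and only the supersedes_signal_id column is named in the text.

```typescript
// Assumed row shape; only supersedes_signal_id is named in the text.
interface SignalRow {
  id: string;
  supersedes_signal_id: string | null; // id of the row this one corrects
  value: number;
}

// Returns rows that no later row supersedes. Nothing is deleted or updated:
// corrections are new rows, and stale rows are excluded at read time.
function currentSignals(rows: SignalRow[]): SignalRow[] {
  const superseded = new Set<string>();
  for (const row of rows) {
    if (row.supersedes_signal_id !== null) {
      superseded.add(row.supersedes_signal_id);
    }
  }
  return rows.filter((row) => !superseded.has(row.id));
}

const current = currentSignals([
  { id: "s1", supersedes_signal_id: null, value: 70 },
  { id: "s2", supersedes_signal_id: "s1", value: 75 }, // corrects s1
  { id: "s3", supersedes_signal_id: "s2", value: 80 }, // corrects s2
  { id: "s4", supersedes_signal_id: null, value: 55 },
]);
```

In this example only s3 and s4 remain current, while s1 and s2 stay in the table for audit and replay.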
Reproducibility
Every score is reproducible from its inputs. The signals table is append-only; the score_events table records the exact (signals, weights, formula) tuple that produced each score. Replaying score_events with their stamped weights produces the same score, byte for byte. The scores table is a materialized cache — drop it and recompute.
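Regenerating the scores cache from score_events can be sketched as below. The `ScoreEventRow` columns, including the `seq` ordering field, and both function names are hypothetical; the text states only that each event records the inputs that produced the score and that the cache is derived from the latest event.

```typescript
// Assumed event shape; the real score_events columns are not given in the text.
interface ScoreEventRow {
  agent_id: string;
  seq: number;           // monotonically increasing per agent (assumed)
  weighted_mean: number; // 0-100, computed from stamped weights
  coverage: number;      // 0-1
  score: number;         // recorded result
}

// Re-derives the score from the recorded inputs and compares it to the
// recorded result.
function verifyEvent(e: ScoreEventRow): boolean {
  return Math.round(e.weighted_mean * e.coverage) === e.score;
}

// Rebuilds the scores cache: the latest event per agent wins.
function rebuildScoresCache(events: ScoreEventRow[]): Map<string, number> {
  const latest = new Map<string, ScoreEventRow>();
  for (const e of events) {
    const prev = latest.get(e.agent_id);
    if (prev === undefined || e.seq > prev.seq) latest.set(e.agent_id, e);
  }
  const cache = new Map<string, number>();
  latest.forEach((e, agentId) => cache.set(agentId, e.score));
  return cache;
}

const eventLog: ScoreEventRow[] = [
  { agent_id: "a1", seq: 1, weighted_mean: 90, coverage: 0.315, score: 28 },
  { agent_id: "a2", seq: 1, weighted_mean: 40, coverage: 1.0, score: 40 },
  { agent_id: "a1", seq: 2, weighted_mean: 90, coverage: 0.5, score: 45 },
];
const rebuiltCache = rebuildScoresCache(eventLog);
```

Because the cache is a pure function of the event log, dropping and rebuilding it yields the same contents each time.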
This is event sourcing. It is the reason a regulator, auditor, or counterparty can ask "why this score on this date?" and get a reproducible answer rather than a snapshot.
Where the canonical decisions live
Canonical architectural decisions live in agentauditlabs-architecture-decisions-v1-1.md (ADR-001 through ADR-010). Session-level decisions are appended to .claude/runbook/adr-log.md. Compliance against the EU AI Act is documented on the EU AI Act compliance page.