Content Tower build plan (Tier 1: frozen embeddings + calibrated head)

Self-contained build doc. You can clear the conversation and hand this to a fresh agent. It captures the plan and the facts discovered while exploring the live data, so nothing here depends on chat history.

0. What this is

The PRD fuses two independent signals through a monotone gate (not an average):

Tower A — identity trust (per-DID, sybil-resistant, load-bearing): EigenTrust over the vouch graph. Already built (eigentrust.py).
Tower B — content risk (per-PR, identity-blind): how risky this diff is, judged with no knowledge of the author. Today this is only Claude (review.py) at the gate, plus an advisory slop-kNN. This doc builds the learned Tower B.

Tier 1 = run each diff through the already-wired embedding transformer (Featherless Qwen3-Embedding) and train a small calibrated head on clean_merge, using only the diff. Transformer representation power, no fine-tuning, leakage-free by construction (the model never sees identity).

Tier 2 (fine-tuning a code transformer) is deferred until there are ~10³–10⁴ labeled diffs — below that it loses to frozen-embeddings + a linear head. Not in scope here.

Non-negotiable constraints (PRD): must hold for every phase

Content models judge content, never identity: no author handle/DID/history/aggregates feed Tower B. Diff + PR-intrinsic stats (size, files, discussion length) only.
Structural signal (Tower A / EigenTrust) stays load-bearing and sybil-resistant.
The gate is not an average: content can only penalize, never lift an untrusted DID into the fast lane.
Calibrated + explainable. Serve a learned model only if it beats its baseline on a proper holdout (same rule the GNN already follows).
Serviceless: single DuckDB file, large artifacts under DATA_ROOT.
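
One way to read the "not an average" constraint is that identity trust sets the best lane a PR can reach and content risk can only move it down from there. The sketch below illustrates that shape under stated assumptions: the lane names and the three thresholds are hypothetical, not values taken from the PRD or from `fusion.decide`.

```python
# Hypothetical lanes and thresholds, for illustration only.
LANES = ["hold", "review", "fast"]  # ordered worst -> best
TRUST_FAST = 0.6    # assumed identity-trust cutoff for the fast lane
RISK_REVIEW = 0.3   # assumed content-risk cutoff that forces review
RISK_HOLD = 0.8     # assumed content-risk cutoff that forces hold


def gate(identity_trust: float, content_risk: float) -> str:
    """Monotone gate: identity sets the ceiling, content can only lower it."""
    ceiling = "fast" if identity_trust >= TRUST_FAST else "review"
    if content_risk >= RISK_HOLD:
        penalty = "hold"
    elif content_risk >= RISK_REVIEW:
        penalty = "review"
    else:
        penalty = "fast"  # clean content imposes no penalty
    # Take the worse of the two lanes; clean content never raises the ceiling.
    return min(ceiling, penalty, key=LANES.index)
```

With these assumed cutoffs, `gate(0.1, 0.0)` stays in `"review"` however clean the diff is, which is the property an averaged score would not guarantee.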

1. Critical path

Phase 0: fetch diffs (patchBlobs) ─┐
                                   ├─► Phase 2 embed ─► Phase 3 head ─► Phase 4 fuse ─► Phase 5 eval-gate
Phase 1: merged labels ────────────┘

Phase 0 + Phase 1 are prerequisites for ANY content model (transformer or not) and for Claude review and slop-kNN — all three are dead without diffs. Start at Phase 0.

2. Live data facts (as observed)

Live DB: /Volumes/spectrofi-rec/tangled-data/duckdb/trust.duckdb (DATA_ROOT=/Volumes/spectrofi-rec/tangled-data). The repo-local .data/… is a stale dev DB — ignore it.
Backfill is rich but the derived/label layer is stale (it ran before some derive() branches existed). Snapshot:
- events 83,991 · contributors 10,848 · vouches 2,029 (+) / 37 (−) · pulls 5,768
- seeds = 0, pull_status = 0 (collection was not in the old backfill — not even archived), stars = 0 (14,409 feed.star events archived but not re-derived), diff_text = 0 (patchBlobs never fetched), 0 positive clean_merge labels, no trained model.

Read the live DB read-only with retry (single-writer; a held lock blocks every open):

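import duckdb, time
DB = "/Volumes/spectrofi-rec/tangled-data/duckdb/trust.duckdb"
con = None
for _ in range(80):  # up to ~24 s of retries while a writer holds the lock
    try:
        con = duckdb.connect(DB, read_only=True)
        break
    except duckdb.IOException:
        time.sleep(0.3)
assert con is not None, "trust.duckdb is still locked"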

Pause ingest/api/backfill before writing, or writes crawl on the lock.

Record shapes you'll need (confirmed from the network)

sh.tangled.repo.pull (the diff is a gzipped blob, NOT inline):

{ "rounds": [ { "createdAt": "...",
    "patchBlob": { "$type": "blob", "ref": { "$link": "<CID>" },
                   "mimeType": "application/gzip", "size": 49502 } } ],
  "source": { "branch": "..." },
  "target": { "branch": "...", "repo": "did:plc:…", "repoDid": "did:plc:…" } }

pr_id convention (set in ingest.derive): f"{author_did}/{collection}/{rkey}", e.g. did:plc:X/sh.tangled.repo.pull/3mp….
The latest round (rounds[-1]) is the final proposed change — embed/review that.
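
The pr_id convention and the "latest round" rule can be sketched as two small helpers. The function names `pr_id_for` and `latest_patch_cid` are hypothetical (they are not claimed to exist in `ingest.py`); only the record shape and the f-string convention come from the notes above.

```python
from typing import Optional

PULL_COLLECTION = "sh.tangled.repo.pull"


def pr_id_for(author_did: str, rkey: str) -> str:
    """Build a pr_id as author_did/collection/rkey."""
    return f"{author_did}/{PULL_COLLECTION}/{rkey}"


def latest_patch_cid(record: dict) -> Optional[str]:
    """Return the patchBlob CID of the last round, or None if it is absent."""
    rounds = record.get("rounds") or []
    if not rounds:
        return None
    blob = rounds[-1].get("patchBlob") or {}
    return (blob.get("ref") or {}).get("$link")
```

Returning `None` instead of raising keeps a bulk run going when a pull has no rounds or a round lacks a blob.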

sh.tangled.repo.pull.status (authoritative outcome, public; sparse):

{ "pull": "at://did:plc:X/sh.tangled.repo.pull/<rkey>",
  "status": "sh.tangled.repo.pull.status.merged" }  // .merged / .closed / .open

Status author may differ from the pull owner — parse pr_id from the pull field (uri[len("at://"):]), never from the status record's own did/rkey. (derive() already does this.)
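
As an illustration of that rule (not a copy of what `derive()` does internally), a status record can be reduced to a `(pr_id, outcome)` pair using only its `pull` and `status` fields. The helper name `parse_status` is an assumption made for this sketch.

```python
from typing import Optional, Tuple

AT_PREFIX = "at://"


def parse_status(record: dict) -> Optional[Tuple[str, str]]:
    """Map a pull.status record to (pr_id, outcome), or None if malformed."""
    uri = record.get("pull", "")
    status = record.get("status", "")
    if not uri.startswith(AT_PREFIX) or not status:
        return None
    pr_id = uri[len(AT_PREFIX):]          # did/collection/rkey of the pull
    outcome = status.rsplit(".", 1)[-1]   # "merged", "closed" or "open"
    return pr_id, outcome
```

The status record's own DID and rkey never enter the result, so a maintainer closing someone else's pull still resolves to the pull owner's pr_id.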

Knot git clone URL (for the Phase-1 label backstop, git-on-knots): https://{knot}/{owner_did}/{repo} (https, no auth for public repos; git ls-remote returns refs/heads/main).

Existing code to build on

src/trust/embed.py — index_diffs(con, limit=256) already embeds every pull_requests.diff_text into diff_vectors(pr_id, label, embedding DOUBLE[]), idempotent/resumable; embed() returns None without FEATHERLESS_API_KEY; slop_score() cosine-kNN vs clean_merge=0.
src/trust/backfill.py — reuse _pds(did), _get(url), _records(pds,did,coll), the ThreadPoolExecutor fan-out pattern, _archive_and_derive.
src/trust/db.py — pull_requests.diff_text, pull_status, diff_vectors, pr_labels view (clean_merge), connection(read_only=…), ensure_schema().
src/trust/ingest.py — derive() (pull / pull_status / star branches).
src/trust/learned.py — copy its shape: FEATURE_COLS, _vec, train(split), LearnedScorer, isotonic calibration, _reliability, MODEL_PATH = MODEL_DIR/….
src/trust/fusion.py — score_pr, decide, should_review, _features_for.
src/trust/config.py — CFG.embed (Featherless), CFG.review, MODEL_DIR.

3. Phases

Phase 0: Fetch the diffs (new `src/trust/diffs.py`)

The highest-leverage unblock: lights up the content head, Claude review, and slop-kNN.

Steps:

Select pulls needing a diff: SELECT pr_id, author_did FROM pull_requests WHERE diff_text IS NULL, together with each pull's archived record from events. The CID lives in the archived events.record JSON (rounds[-1].patchBlob.ref.$link); join events on (did, collection, rkey) or re-read the record.
For each: resolve _pds(author_did), then GET {pds}/xrpc/com.atproto.sync.getBlob?did={author_did}&cid={cid} → bytes.
gzip.decompress(bytes).decode("utf-8", "replace") → unified-diff text. Cap stored length (~50 KB; embeddings/Claude truncate anyway). UPDATE pull_requests SET diff_text=? WHERE pr_id=?.
Parallelize like backfill: network fetch in a 12-thread pool, DB writes in chunks (single writer). Skip missing/oversized blobs gracefully (never abort the run).

Deliverable: pull_requests.diff_text populated for ~5,768 PRs (minutes of network). Self-check: a demo() that fetches one known blob and asserts it gunzips to text containing diff/@@.
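
The pure parts of these steps (building the getBlob URL and turning blob bytes into capped diff text) can be sketched without any network access. The helper names and the two size limits are assumptions: `MAX_DIFF_CHARS` approximates the ~50 KB cap in characters, and `MAX_BLOB_BYTES` is a hypothetical guard for oversized blobs.

```python
import gzip
import zlib
from typing import Optional
from urllib.parse import urlencode

MAX_DIFF_CHARS = 50_000     # assumed cap, roughly the ~50 KB in the plan
MAX_BLOB_BYTES = 5_000_000  # hypothetical guard against oversized blobs


def blob_url(pds: str, did: str, cid: str) -> str:
    """URL for com.atproto.sync.getBlob on the author's PDS."""
    query = urlencode({"did": did, "cid": cid})
    return f"{pds.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


def decode_patch(raw: bytes) -> Optional[str]:
    """Gunzip a patch blob to capped diff text; None if it is unusable."""
    if not raw or len(raw) > MAX_BLOB_BYTES:
        return None
    try:
        text = gzip.decompress(raw).decode("utf-8", "replace")
    except (OSError, EOFError, zlib.error):  # not gzip, truncated, or corrupt
        return None
    return text[:MAX_DIFF_CHARS]
```

A `None` result maps to "skip this PR and continue", which matches the requirement that a bad blob never aborts the run.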

Phase 1: Merged labels (you need a positive class)

Targeted scrape of sh.tangled.repo.pull.status (already mapped in COLLECTION_KINDS and handled in derive()): python -m trust.backfill --collection sh.tangled.repo.pull.status (capped first with --max-repos).
Measure positives before building the head: SELECT clean_merge, count(*) FROM pr_labels GROUP BY 1.
Risk: pull.status is sparse. If positives number only in the tens, the head is data-starved too. Backstop: git-on-knots merged detection (clone the default branch via the knot URL above and check whether each pull's patch landed), which gives broad merged coverage and can also flag reverted or re-patched changes. Only build the backstop if pull.status coverage proves insufficient.

Deliverable: pr_labels.clean_merge with a real positive class (ideally at least a few hundred positives; the trainer's hard floor is ≥4 rows spanning both classes).
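
The "measure before building" step can be turned into a small readiness check over the counts returned by the GROUP BY query. This is a sketch under assumptions: the function name and the `target_pos=300` default (standing in for "a few hundred") are hypothetical, while the floor of 4 rows spanning both classes is the one stated above.

```python
def label_readiness(counts: dict, min_rows: int = 4, target_pos: int = 300) -> str:
    """Classify label coverage from {clean_merge value: row count}."""
    pos = counts.get(1, 0)
    neg = counts.get(0, 0)
    if pos == 0 or neg == 0 or pos + neg < min_rows:
        return "blocked"       # below the trainer's hard floor
    if pos < target_pos:
        return "data-starved"  # trainable, but consider the backstop
    return "ok"
```

Rows with a NULL label (key `None`) are ignored, since they carry no training signal.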

Phase 2: Embed the diffs (frozen transformer)

Set FEATHERLESS_API_KEY. Run index_diffs until it is caught up (loop while it returns > 0):

from trust.db import connection, ensure_schema
from trust import embed
ensure_schema()
with connection(read_only=False) as con:
    while embed.index_diffs(con, limit=256): pass

Optional GPU: self-host Qwen3-Embedding-4B (fits one GPU) to embed ~6k diffs locally for free instead of the API. The head itself is CPU-trivial.

Deliverable: diff_vectors filled for every PR with a diff.

Phase 3: The calibrated head (new `src/trust/content.py`, Tower B)

_xy(con): X = diff_vectors.embedding for PRs that have a non-NULL clean_merge (join pr_labels); y = clean_merge. Optionally concat PR-intrinsic scalars (additions, deletions, files_touched, discussion_len) and the slop-kNN similarity. Never identity/author features.
Model: L2-normalize the embedding → logistic regression (linear probe, L2-regularized) → isotonic or Platt calibration. A linear probe is the right fit for frozen embeddings at low data volume; LightGBM on raw 2560-dim embeddings tends to overfit, so keep it only as an alternative.
Time-split train/val (order by opened_at). Save content.pkl under MODEL_DIR.
ContentScorer.prob(pr_id) -> P(content safe); expose content_risk = 1 - P.
Self-check demo(): on held-out PRs, a known-bad diff scores higher risk than a clean one; print the reliability curve.

Deliverable: a calibrated content risk for every PR (cheap, no API), not just reviewed ones.
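
The core of the head (L2-normalize, then an L2-regularized linear probe, trained on a time-ordered split) can be sketched in dependency-free Python. This is only an illustration of the technique: a real `content.py` would more likely use a library such as scikit-learn, the function names and hyperparameters here are assumptions, and the calibration step (isotonic or Platt, fit on the validation split) is deliberately left out.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, l2=1e-3, lr=0.5, epochs=300):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gb += err
            for j, xj in enumerate(xi):
                gw[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def prob_safe(w, b, embedding):
    """Uncalibrated P(clean_merge = 1) for one raw embedding."""
    x = l2_normalize(embedding)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def time_split(rows, val_frac=0.25):
    """rows: (opened_at, embedding, clean_merge); the oldest rows train."""
    rows = sorted(rows, key=lambda r: r[0])
    cut = int(len(rows) * (1 - val_frac))
    return rows[:cut], rows[cut:]
```

`content_risk` would then be `1 - prob_safe(...)` after calibration. Splitting by `opened_at` rather than at random keeps later PRs out of training, which is closer to how the head would be used at the gate.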

Phase 4: Fuse into the gate (monotone, unchanged)

In fusion.score_pr: the head supplies content_risk for all PRs; Claude (review_pr, gated by should_review) refines ambiguous/sensitive ones. Combine conservatively: content_risk = max(model_risk, claude_risk) so content still only penalizes.
Win: every PR gets a content signal; today only the Claude-reviewed subset does.
Keep decide() and its thresholds; surface the head's risk in the explanation (build_reason) like the other factors.
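
The conservative combination can be sketched as a helper that takes the riskier of whichever content signals are present. The name `fused_content_risk` and the handling of missing values are assumptions; how `fusion.score_pr` actually represents an absent Claude review is not specified here.

```python
from typing import Optional


def fused_content_risk(model_risk: Optional[float],
                       claude_risk: Optional[float]) -> Optional[float]:
    """Conservative merge: the riskier of the available content signals."""
    risks = [r for r in (model_risk, claude_risk) if r is not None]
    return max(risks) if risks else None
```

Because the result is a maximum, adding the head can only keep or raise the risk that Claude alone would have reported, so content still only penalizes.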

Phase 5: Eval + beat-the-baseline gate

Calibration: reliability curve (reuse learned._reliability). Ranking: AUC / average precision.
Serve only if it beats: (a) majority-class, (b) Claude-alone risk where available, (c) slop-kNN alone — on a time-split AND a repo-holdout (generalize to unseen repos).
Write a verdict (like gnn does); fusion consults it before using the head.
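
The serve-or-not rule can be sketched as a rank-based AUC plus a verdict that requires a win over every baseline on every holdout. The function names, the dictionary layout, and the `margin` parameter are assumptions for illustration; the existing GNN verdict format is not reproduced here.

```python
def auc(scores, labels):
    """Rank-based AUC: P(positive outscores negative), ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def verdict(head_auc: dict, baseline_auc: dict, margin: float = 0.0) -> dict:
    """Serve only if the head beats every baseline on every holdout.

    head_auc maps a split name ("time", "repo") to the head's AUC;
    baseline_auc maps each split name to {baseline name: AUC}.
    """
    failures = [
        (split, name)
        for split, baselines in baseline_auc.items()
        for name, score in baselines.items()
        if head_auc[split] <= score + margin
    ]
    return {"serve": not failures, "failed": failures}
```

A single loss on either split blocks serving, and the `failed` list records which baseline won so the verdict stays explainable.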

4. Effort & runtime

| Phase | Build | Runtime |
|---|---|---|
| 0 diffs (`diffs.py`) | ~1 hr | few min (network) |
| 1 labels (scrape) | wired | ~10 min capped |
| 2 embed (`index_diffs`) | done | few min (API) |
| 3 head (`content.py`) | ~1 hr | seconds |
| 4 fuse (`fusion.py`) | ~30 min | n/a |
| 5 eval-gate | ~30 min | seconds |

≈ half a day of build + minutes of runtime, given FEATHERLESS_API_KEY and enough Phase-1 positives.

5. GPU guidance

Tier 1 needs no GPU — embedding runs on Featherless (remote); the head is CPU-trivial.
Use a GPU now only to self-host Qwen3-Embedding-4B for free bulk embedding of ~6k diffs (skip API cost/limits).
Save the GPU for Tier 2 (fine-tuning CodeBERT/StarEncoder) — deferred until ~10³–10⁴ labeled diffs exist.

6. Definition of done

diffs.py populates diff_text; pr_labels has a positive class; diff_vectors filled.
content.py trains a calibrated head, identity-blind, with a reliability curve.
It beats majority / Claude-alone / slop-kNN on a time + repo holdout, else it doesn't serve.
fusion consumes it monotonically (content only penalizes); explanation shows the content factor.
Smoke test added (mirror tests/test_smoke.py style: importorskip the embedding path; assert a bad diff out-risks a clean one).

7. Parallel unblock (not this tower, but the other gating item)

Structural scoring is still blocked by seeds = 0 + stale derives. Independent of Tower B:

--rederive from archived events (no network) → repopulates stars (and any archived collections) through the current derive().
Seed real maintainer DIDs — top vouch-receivers are the anchors: did:plc:onu3oqfahfubgbetlr4giknc (141 in), did:plc:wshs7t2adsemcrrd4snkeqli (89), did:plc:qfpnj4og54vl56wngdriaxug (56)… → INSERT INTO seeds ….
trust-train once labels (Phase 1) exist. EigenTrust (Tower A) and the content head (Tower B) can be built in either order; the gate needs both.
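
Picking seed candidates by inbound vouches can be sketched as a simple count over vouch edges. The tuple layout `(from_did, to_did, sign)` is an assumption about how vouches are represented, not the actual table schema, and the ranking is only a starting point for choosing real maintainer DIDs by hand.

```python
from collections import Counter


def top_vouch_receivers(vouches, k=3):
    """Rank DIDs by positive inbound vouches.

    vouches: iterable of (from_did, to_did, sign), with sign > 0 for a vouch.
    """
    inbound = Counter(to for _src, to, sign in vouches if sign > 0)
    return inbound.most_common(k)
```

Negative vouches are excluded from the count so that a heavily disputed DID does not rank as an anchor.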
