Sunstead trust scoring project
# Content Tower (Tier 1: frozen embeddings + calibrated head): build plan

Self-contained build doc. You can clear the conversation and hand this to a fresh
agent. It captures the plan **and** the facts discovered while exploring the live data,
so nothing here depends on chat history.

---

## 0. What this is

The PRD fuses two independent signals through a **monotone gate** (not an average):

- **Tower A: identity trust** (per-DID, sybil-resistant, **load-bearing**): EigenTrust
  over the vouch graph. Already built (`eigentrust.py`).
- **Tower B: content risk** (per-PR, **identity-blind**): how risky *this diff* is,
  judged with no knowledge of the author. Today this is only Claude (`review.py`) at the
  gate, plus an advisory slop-kNN. **This doc builds the learned Tower B.**

Tier 1 = run each diff through the **already-wired embedding transformer**
(Featherless Qwen3-Embedding) and train a small **calibrated head** on `clean_merge`,
using only the diff. Transformer representation power, no fine-tuning, leakage-free by
construction (the model never sees identity).

Tier 2 (fine-tuning a code transformer) is **deferred** until there are ~10³–10⁴ labeled
diffs; below that it loses to frozen embeddings + a linear head. Not in scope here.

### Non-negotiable constraints (PRD): must hold for every phase
- Content models judge **content, never identity**: no author handle/DID/history/aggregates
  feed Tower B. Diff + PR-intrinsic stats (size, files, discussion length) only.
- Structural signal (Tower A / EigenTrust) stays load-bearing and sybil-resistant.
- The gate is **not an average**: content can only **penalize**, never lift an untrusted DID
  into the fast lane.
- Calibrated + explainable. Serve a learned model only if it **beats its baseline** on a
  proper holdout (same rule the GNN already follows).
- Serviceless: single DuckDB file, large artifacts under `DATA_ROOT`.
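
The "not an average" constraint can be sketched as below. This is a minimal illustration under stated assumptions, not the logic in `fusion.decide`: the lane names, the `gate` function, and both thresholds are hypothetical. It shows only the shape of the rule, where identity trust picks the best lane a PR can reach and content risk can demote it but never promote it.

```python
# Hypothetical lanes, ordered from fastest to slowest.
LANES = ["fast", "review", "block"]


def gate(identity_trust: float, content_risk: float,
         trust_floor: float = 0.6, risk_ceiling: float = 0.5) -> str:
    """Monotone gate sketch: identity picks the lane, content can only demote."""
    # Tower A alone decides whether the fast lane is reachable at all.
    lane = 0 if identity_trust >= trust_floor else 1
    # Tower B may push the PR one lane down, never up.
    if content_risk >= risk_ceiling:
        lane = min(lane + 1, len(LANES) - 1)
    return LANES[lane]


# Clean content does not lift an untrusted DID into the fast lane.
print(gate(identity_trust=0.1, content_risk=0.0))
```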

---

## 1. Critical path

```
Phase 0: fetch diffs (patchBlobs) ─┐
                                   ├─► Phase 2 embed ─► Phase 3 head ─► Phase 4 fuse ─► Phase 5 eval-gate
Phase 1: merged labels ────────────┘
```

**Phase 0 + Phase 1 are prerequisites for ANY content model** (transformer or not) and for
Claude review and slop-kNN: all three are dead without diffs. Start at Phase 0.

---

## 2. Live data facts (as observed)

- **Live DB**: `/Volumes/spectrofi-rec/tangled-data/duckdb/trust.duckdb`
  (`DATA_ROOT=/Volumes/spectrofi-rec/tangled-data`). The repo-local `.data/…` is a stale dev DB; ignore it.
- Backfill is rich but the derived/label layer is **stale** (it ran before some `derive()`
  branches existed). Snapshot:
  - events 83,991 · contributors 10,848 · **vouches 2,029 (+) / 37 (−)** · pulls 5,768
  - `seeds` = **0**, `pull_status` = **0** (collection was not in the old backfill, not even archived),
    `stars` = 0 (14,409 `feed.star` events archived but not re-derived), `diff_text` = **0**
    (patchBlobs never fetched), **0 positive `clean_merge` labels**, no trained model.
- **Read the live DB read-only with retry** (single-writer; a held lock blocks every open):
  ```python
  import time
  import duckdb

  con = None
  for _ in range(80):  # up to ~24 s of retries while a writer holds the lock
      try:
          con = duckdb.connect("/Volumes/spectrofi-rec/tangled-data/duckdb/trust.duckdb", read_only=True)
          break
      except duckdb.IOException:
          time.sleep(0.3)
  ```
  Pause `ingest`/`api`/`backfill` before writing, or writes crawl on the lock.

### Record shapes you'll need (confirmed from the network)

`sh.tangled.repo.pull` (the diff is a gzipped blob, NOT inline):
```json
{ "rounds": [ { "createdAt": "...",
                "patchBlob": { "$type": "blob", "ref": { "$link": "<CID>" },
                               "mimeType": "application/gzip", "size": 49502 } } ],
  "source": { "branch": "..." },
  "target": { "branch": "...", "repo": "did:plc:…", "repoDid": "did:plc:…" } }
```
- `pr_id` convention (set in `ingest.derive`): `f"{author_did}/{collection}/{rkey}"`,
  e.g. `did:plc:X/sh.tangled.repo.pull/3mp…`.
- The **latest round** (`rounds[-1]`) is the final proposed change; embed/review that.
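
A minimal sketch of pulling the two facts above out of an archived pull record: the CID of the latest round's patch blob, and the `pr_id`. The helper names `latest_patch_cid` and `make_pr_id` are hypothetical (the real `pr_id` is set in `ingest.derive`), and the CIDs and rkey in the sample are placeholders.

```python
from typing import Optional

PULL_COLLECTION = "sh.tangled.repo.pull"


def latest_patch_cid(record: dict) -> Optional[str]:
    """Return the patchBlob CID of the final round, or None if there is none."""
    rounds = record.get("rounds") or []
    if not rounds:
        return None
    blob = rounds[-1].get("patchBlob") or {}
    return (blob.get("ref") or {}).get("$link")


def make_pr_id(author_did: str, rkey: str, collection: str = PULL_COLLECTION) -> str:
    """Build the pr_id in the documented author_did/collection/rkey form."""
    return f"{author_did}/{collection}/{rkey}"


# Placeholder record with two rounds; only the last round's blob is used.
sample = {
    "rounds": [
        {"createdAt": "...", "patchBlob": {"$type": "blob", "ref": {"$link": "cid-round-1"}}},
        {"createdAt": "...", "patchBlob": {"$type": "blob", "ref": {"$link": "cid-round-2"}}},
    ]
}
print(latest_patch_cid(sample))
print(make_pr_id("did:plc:X", "rkey123"))
```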

`sh.tangled.repo.pull.status` (authoritative outcome, public; sparse):
```json
{ "pull": "at://did:plc:X/sh.tangled.repo.pull/<rkey>",
  "status": "sh.tangled.repo.pull.status.merged" }
```
- `status` ends in `.merged`, `.closed`, or `.open`.
- Status author may differ from the pull owner: parse `pr_id` from the `pull` field
  (`uri[len("at://"):]`), never from the status record's own did/rkey. (`derive()` already does this.)
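
The parsing rule above, as a small sketch. The function names are hypothetical and this is not the code in `derive()`; it only shows that the `pr_id` comes from the `pull` URI, and that the outcome is the last dotted component of `status`.

```python
from typing import Optional


def pr_id_from_status(status_record: dict) -> Optional[str]:
    """Derive pr_id from the `pull` AT-URI, never from the status record's own did/rkey."""
    uri = status_record.get("pull") or ""
    if not uri.startswith("at://"):
        return None
    return uri[len("at://"):]


def outcome_from_status(status_record: dict) -> Optional[str]:
    """Return the last dotted component of `status`, e.g. 'merged'."""
    status = status_record.get("status") or ""
    return status.rsplit(".", 1)[-1] if status else None


record = {
    "pull": "at://did:plc:X/sh.tangled.repo.pull/rkey123",  # placeholder rkey
    "status": "sh.tangled.repo.pull.status.merged",
}
print(pr_id_from_status(record), outcome_from_status(record))
```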

Knot git clone URL (for the Phase-1 label backstop, git-on-knots): `https://{knot}/{owner_did}/{repo}`
(https, no auth for public repos; `git ls-remote` returns `refs/heads/main`).

### Existing code to build on
- `src/trust/embed.py`: `index_diffs(con, limit=256)` already embeds every `pull_requests.diff_text`
  into `diff_vectors(pr_id, label, embedding DOUBLE[])`, idempotent/resumable; `embed()` returns
  `None` without `FEATHERLESS_API_KEY`; `slop_score()` runs cosine-kNN vs `clean_merge=0`.
- `src/trust/backfill.py`: reuse `_pds(did)`, `_get(url)`, `_records(pds,did,coll)`, the
  `ThreadPoolExecutor` fan-out pattern, `_archive_and_derive`.
- `src/trust/db.py`: `pull_requests.diff_text`, `pull_status`, `diff_vectors`, `pr_labels`
  view (`clean_merge`), `connection(read_only=…)`, `ensure_schema()`.
- `src/trust/ingest.py`: `derive()` (pull / pull_status / star branches).
- `src/trust/learned.py`: copy its shape: `FEATURE_COLS`, `_vec`, `train(split)`,
  `LearnedScorer`, isotonic calibration, `_reliability`, `MODEL_PATH = MODEL_DIR/…`.
- `src/trust/fusion.py`: `score_pr`, `decide`, `should_review`, `_features_for`.
- `src/trust/config.py`: `CFG.embed` (Featherless), `CFG.review`, `MODEL_DIR`.

---

## 3. Phases

### Phase 0: Fetch the diffs (new `src/trust/diffs.py`)
The highest-leverage unblock: lights up the content head, Claude review, **and** slop-kNN.

Steps:
1. Select pulls needing a diff: `SELECT pr_id, author_did FROM pull_requests WHERE diff_text IS NULL`,
   plus each pull's archived record. The CID lives in the archived `events.record` JSON
   (`rounds[-1].patchBlob.ref.$link`); join `events` on `(did, collection, rkey)` or re-read it.
2. For each: resolve `_pds(author_did)`, then
   `GET {pds}/xrpc/com.atproto.sync.getBlob?did={author_did}&cid={cid}` → bytes.
3. `gzip.decompress(bytes).decode("utf-8", "replace")` → unified-diff text. Cap stored length
   (~50 KB; embeddings/Claude truncate anyway). `UPDATE pull_requests SET diff_text=? WHERE pr_id=?`.
4. Parallelize like `backfill`: network fetch in a 12-thread pool, DB writes in chunks (single writer).
   Skip missing/oversized blobs gracefully (never abort the run).

Deliverable: `pull_requests.diff_text` populated for ~5,768 PRs (minutes of network).

Self-check: a `demo()` that fetches one known blob and asserts it gunzips to text containing `diff`/`@@`.
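
Steps 2 and 3 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the planned `diffs.py`: it uses `urllib` instead of the repo's `_get` helper (whose signature is not shown here), the function names are hypothetical, and the "~50 KB" cap is applied to characters. The thread pool and the chunked `UPDATE` from step 4 are left out.

```python
import gzip
import urllib.parse
import urllib.request
import zlib
from typing import Optional

MAX_DIFF_CHARS = 50_000  # assumption: the "~50 KB" cap, counted in characters


def blob_url(pds: str, did: str, cid: str) -> str:
    """Build the com.atproto.sync.getBlob URL for one patch blob."""
    query = urllib.parse.urlencode({"did": did, "cid": cid})
    return f"{pds.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


def decode_patch(raw: bytes, cap: int = MAX_DIFF_CHARS) -> str:
    """Gunzip a patch blob into capped unified-diff text."""
    return gzip.decompress(raw).decode("utf-8", "replace")[:cap]


def fetch_diff(pds: str, did: str, cid: str, timeout: float = 30.0) -> Optional[str]:
    """Fetch and decode one blob; return None on any failure so the run never aborts."""
    try:
        with urllib.request.urlopen(blob_url(pds, did, cid), timeout=timeout) as resp:
            return decode_patch(resp.read())
    except (OSError, EOFError, zlib.error):
        # Network errors and bad gzip headers are OSError subclasses;
        # EOFError and zlib.error cover truncated or corrupt streams.
        return None
```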

### Phase 1: Merged labels (you need a positive class)
- Targeted scrape of `sh.tangled.repo.pull.status` (already mapped in `COLLECTION_KINDS` and handled
  in `derive()`): `python -m trust.backfill --collection sh.tangled.repo.pull.status` (capped first
  with `--max-repos`).
- **Measure positives** before building the head:
  `SELECT clean_merge, count(*) FROM pr_labels GROUP BY 1`.
- **Risk:** pull.status is sparse. If positives number only in the tens, the head is data-starved too.
  Backstop = **git-on-knots `merged` detection** (clone default branch via the knot URL above,
  check whether each pull's patch landed) for broad `merged` coverage + `reverted`/`re-patched`.
  Only build the backstop if pull.status coverage proves insufficient.

Deliverable: `pr_labels.clean_merge` with a real positive class (ideally at least a few hundred
positives; the trainer requires ≥4 rows spanning both classes as a hard floor).
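
One way to turn the "measure positives" query into a go/no-go check is sketched below. The function name and the three return values are hypothetical, and `recommended_positives=300` is only a stand-in for "a few hundred". The hard floor (at least 4 rows, both classes present) is the one stated above.

```python
def label_readiness(counts: dict, recommended_positives: int = 300) -> str:
    """Classify label coverage from {clean_merge: row_count}, as the GROUP BY returns it."""
    positives = counts.get(1, 0)
    negatives = counts.get(0, 0)
    if positives == 0 or negatives == 0 or positives + negatives < 4:
        return "blocked"       # below the trainer's hard floor
    if positives < recommended_positives:
        return "data-starved"  # trainable, but the git-on-knots backstop may be needed
    return "ready"


# Labels with no positive class cannot train a head.
print(label_readiness({0: 5768}))
```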

### Phase 2: Embed the diffs (frozen transformer)
- Set `FEATHERLESS_API_KEY`. Run `index_diffs` until it has caught up (loop while it returns > 0):
  ```python
  from trust.db import connection, ensure_schema
  from trust import embed

  ensure_schema()
  with connection(read_only=False) as con:
      # Loop until index_diffs reports nothing left to embed (returns 0).
      while embed.index_diffs(con, limit=256):
          pass
  ```
- Optional GPU: self-host Qwen3-Embedding-4B (fits one GPU) to embed ~6k diffs locally for free
  instead of the API. The head itself is CPU-trivial.

Deliverable: `diff_vectors` filled for every PR with a diff.

### Phase 3: The calibrated head (new `src/trust/content.py`, Tower B)
- `_xy(con)`: `X` = `diff_vectors.embedding` for PRs that have a non-NULL `clean_merge`
  (join `pr_labels`); `y` = `clean_merge`. Optionally concat **PR-intrinsic** scalars
  (`additions, deletions, files_touched, discussion_len`) and the slop-kNN similarity.
  **Never** identity/author features.
- **Model: L2-normalize the embedding → logistic regression (linear probe, L2-reg) → isotonic
  or Platt calibration.** A linear probe is the right choice for frozen embeddings at low data;
  LightGBM on raw 2560-dim embeddings overfits, so keep it only as an alternative.
- Time-split train/val (order by `opened_at`). Save `content.pkl` under `MODEL_DIR`.
- `ContentScorer.prob(pr_id) -> P(content safe)`; expose `content_risk = 1 - P`.
- Self-check `demo()`: on held-out PRs, a known-bad diff scores higher risk than a clean one;
  print the reliability curve.

Deliverable: a calibrated content risk for **every** PR (cheap, no API), not just reviewed ones.
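
Two of the preprocessing steps above, L2 normalization and the time-ordered split, are sketched here in plain Python. The logistic-regression fit and the calibration step are deliberately not shown. The names, the row layout, and the 20% validation share are assumptions, as is the idea that `opened_at` values sort chronologically (true for same-format ISO timestamps).

```python
import math
from typing import List, Sequence, Tuple

# Assumed row layout: (opened_at, embedding, clean_merge)
Row = Tuple[str, Sequence[float], int]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def time_split(rows: Sequence[Row], val_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Train on the oldest PRs, validate on the newest, so no future data leaks into training."""
    ordered = sorted(rows, key=lambda r: r[0])
    n_val = max(1, int(len(ordered) * val_fraction))
    return ordered[:-n_val], ordered[-n_val:]


rows = [
    ("2025-03-01T00:00:00Z", [3.0, 4.0], 1),
    ("2025-01-01T00:00:00Z", [1.0, 0.0], 0),
    ("2025-05-01T00:00:00Z", [0.0, 2.0], 1),
    ("2025-02-01T00:00:00Z", [1.0, 1.0], 0),
    ("2025-04-01T00:00:00Z", [2.0, 0.0], 0),
]
train, val = time_split(rows)
X_train = [l2_normalize(emb) for _, emb, _ in train]
```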

### Phase 4: Fuse into the gate (monotone, unchanged)
- In `fusion.score_pr`: the head supplies `content_risk` for all PRs; Claude (`review_pr`, gated by
  `should_review`) refines ambiguous/sensitive ones. Combine conservatively:
  `content_risk = max(model_risk, claude_risk)` so content still only **penalizes**.
- Win: every PR gets a content signal; today only the Claude-reviewed subset does.
- Keep `decide()` and its thresholds; surface the head's risk in the explanation
  (`build_reason`) like the other factors.
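
The conservative combination above, as a small sketch. The function name is hypothetical and this is not the code in `fusion.score_pr`. It assumes both risks lie in [0, 1] and that `claude_risk` is `None` when `should_review` skipped the PR.

```python
from typing import Optional


def combined_content_risk(model_risk: float, claude_risk: Optional[float] = None) -> float:
    """Take the worse of the two risks, so a second opinion can only add penalty."""
    risks = [model_risk]
    if claude_risk is not None:  # Claude reviewed only a subset of PRs
        risks.append(claude_risk)
    # Clamp to [0, 1] in case an upstream score drifts out of range.
    return min(1.0, max(0.0, max(risks)))


# A low Claude risk never lowers the head's risk.
print(combined_content_risk(0.7, claude_risk=0.2))
```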

### Phase 5: Eval + beat-the-baseline gate
- Calibration: reliability curve (reuse `learned._reliability`). Ranking: AUC / average precision.
- **Serve only if it beats**: (a) majority-class, (b) Claude-alone risk where available,
  (c) slop-kNN alone, on a **time-split AND a repo-holdout** (generalize to unseen repos).
- Write a verdict (like `gnn` does); `fusion` consults it before using the head.
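
A sketch of the serve/no-serve rule under stated assumptions. It uses AUC as the single comparison metric, computed pairwise in plain Python, and requires a strict win on every split a baseline was measured on. The function names, the split keys, and the numbers are hypothetical; the verdict format written for `fusion` is not shown.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: chance that a label-1 item outscores a label-0 item (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def should_serve(head: Dict[str, float], baselines: Dict[str, Dict[str, float]]) -> bool:
    """True only if the head strictly beats every baseline on every split it was measured on."""
    return all(
        head[split] > baseline[split]
        for baseline in baselines.values()
        for split in head
        if split in baseline  # e.g. Claude-alone may exist for only some splits
    )


head_auc = {"time": 0.80, "repo": 0.75}  # illustrative numbers, not measurements
baseline_auc = {
    "majority": {"time": 0.50, "repo": 0.50},
    "claude": {"time": 0.78},
    "slop_knn": {"time": 0.70, "repo": 0.76},
}
print(should_serve(head_auc, baseline_auc))
```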

---

## 4. Effort & runtime

| Phase | Build | Runtime |
|---|---|---|
| 0 diffs (`diffs.py`) | ~1 hr | few min (network) |
| 1 labels (scrape) | wired | ~10 min capped |
| 2 embed (`index_diffs`) | done | few min (API) |
| 3 head (`content.py`) | ~1 hr | seconds |
| 4 fuse (`fusion.py`) | ~30 min | n/a |
| 5 eval-gate | ~30 min | seconds |

≈ half a day of build + minutes of runtime, given `FEATHERLESS_API_KEY` and enough Phase-1 positives.

## 5. GPU guidance
- **Tier 1 needs no GPU**: embedding runs on Featherless (remote); the head is CPU-trivial.
- Use a GPU now only to **self-host Qwen3-Embedding-4B** for free bulk embedding of ~6k diffs
  (skip API cost/limits).
- Save the GPU for **Tier 2** (fine-tuning CodeBERT/StarEncoder), deferred until ~10³–10⁴
  labeled diffs exist.

## 6. Definition of done
- `diffs.py` populates `diff_text`; `pr_labels` has a positive class; `diff_vectors` filled.
- `content.py` trains a calibrated head, identity-blind, with a reliability curve.
- The head **beats** majority / Claude-alone / slop-kNN on a time + repo holdout; otherwise it is not served.
- `fusion` consumes it monotonically (content only penalizes); explanation shows the content factor.
- Smoke test added (mirror `tests/test_smoke.py` style: `importorskip` the embedding path; assert a
  bad diff out-risks a clean one).

## 7. Parallel unblock (not this tower, but the other gating item)
Structural scoring is still blocked by **`seeds = 0`** + stale derives. Independent of Tower B:

1. `--rederive` from archived `events` (no network) → repopulates `stars` (and any archived
   collections) through the current `derive()`.
2. Seed real maintainer DIDs (top vouch-receivers are the anchors):
   `did:plc:onu3oqfahfubgbetlr4giknc` (141 in), `did:plc:wshs7t2adsemcrrd4snkeqli` (89),
   `did:plc:qfpnj4og54vl56wngdriaxug` (56)… → `INSERT INTO seeds …`.
3. `trust-train` once labels (Phase 1) exist.

EigenTrust (Tower A) and the content head (Tower B) can be built in either order; the gate needs both.