Tangled Recommendation Engine

A standalone Python/FastAPI service that powers Tangled's Discover feature: given a user's DID, it returns repo and issue recommendations. It reads README and issue embeddings (precomputed with Gemini gemini-embedding-001, 1536-dim) from the shared Postgres + pgvector database, retrieves nearest neighbours, and reranks them.

The Tangled appview connects over HTTP via TANGLED_DISCOVER_ENDPOINT. See schema.md for the wire contract and API.md for all endpoints.

How it works

For each repo the user already works on (owned ∪ collaborations), we run an independent pgvector kNN query over README embeddings. The per-repo results are then merged: a candidate that several of the user's repos point at ranks higher (consensus). The merged list is deduplicated (forks), filtered by a distance floor, and reranked on similarity, consensus, and recency, with a round-robin guard so that one busy interest can't bury the others. The user's own work is excluded. Issues follow the same flow over open-issue embeddings.
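
The merge and rerank steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in app/merge.py or app/rank.py: the `Hit` type, the function names, and the scoring weights are all hypothetical, and fork dedup is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Hit:
    seed: str        # repo the user works on that produced this match
    candidate: str   # repo being recommended
    distance: float  # cosine distance from the kNN query (lower = closer)
    age_days: float  # days since the candidate was last active


def merge(hits, own, floor=0.30):
    """Group hits by candidate, dropping the user's own repos and weak matches."""
    merged = defaultdict(list)
    for h in hits:
        if h.candidate in own or h.distance > floor:
            continue
        merged[h.candidate].append(h)
    return merged


def score(group):
    """Hypothetical blend of similarity, consensus and recency."""
    best = min(group, key=lambda h: h.distance)
    similarity = 1.0 - best.distance
    consensus = 0.05 * (len({h.seed for h in group}) - 1)  # extra seeds agreeing
    recency = 0.05 / (1.0 + best.age_days / 30.0)
    return similarity + consensus + recency


def rerank(merged, limit=40):
    """Score candidates, then interleave the per-seed queues round-robin."""
    queues = defaultdict(list)
    for cand, group in merged.items():
        best_seed = min(group, key=lambda h: h.distance).seed
        queues[best_seed].append((score(group), cand))
    for q in queues.values():
        q.sort(reverse=True)
    # Seeds take turns, strongest seed first, so no single seed fills the top.
    ordered = sorted(queues.values(), key=lambda q: q[0], reverse=True)
    out = []
    for row in zip_longest(*ordered):
        out.extend(item[1] for item in row if item is not None)
    return out[:limit]


hits = [
    Hit("me/a", "x/one", 0.10, 10),
    Hit("me/a", "x/two", 0.12, 10),
    Hit("me/a", "x/three", 0.14, 10),
    Hit("me/b", "x/one", 0.20, 10),   # second seed agrees on x/one
    Hit("me/b", "x/four", 0.25, 10),
    Hit("me/a", "me/b", 0.05, 1),     # user's own repo, excluded
    Hit("me/b", "x/far", 0.45, 1),    # above the distance floor, dropped
]
ranked = rerank(merge(hits, own={"me/a", "me/b"}))
print(ranked)  # ['x/one', 'x/four', 'x/two', 'x/three']
```

In this toy input `x/four` has the lowest score of the four survivors, yet it lands second because the round-robin pass gives the quieter seed `me/b` a turn before `me/a` gets its second pick.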

Layout

app/        FastAPI app + config + db + the pipeline stages
  merge.py dedup.py rank.py profile.py links.py   # pure, unit-tested
  db.py recommend.py schemas.py main.py
tests/      pytest unit tests (+ env-gated integration)
eval/       offline hold-out eval harness (recall@k / nDCG)
reference/  the validated Node .mjs scripts this engine was ported from (oracle)

Configuration (env / `.env`)

| Var | Required | Default | Notes |
| --- | --- | --- | --- |
| `DB_CONNECTION_STRING` | yes | (none) | Shared Postgres. `sslmode=require` is added automatically. |
| `TANGLED_WEB_BASE` | no | `https://tangled.org` | Base for generated repo URLs. |
| `REC_PER_SEED_LIMIT` | no | `25` | kNN neighbours per seed. |
| `REC_DISTANCE_FLOOR` | no | `0.30` | Drop repo matches above this cosine distance. |
| `REC_ISSUE_DISTANCE_FLOOR` | no | `0.40` | Floor for issue matches. |
| `REC_MIN_README_CHARS` | no | `120` | Drop near-empty READMEs as seeds + candidates (filters test/throwaway repos). `0` disables. |
| `REC_QUERY_WORKERS` | no | `8` | Concurrent per-seed kNN queries. The DB is remote, so this cuts request latency. |
| `REC_MAX_REPOS` / `REC_MAX_ISSUES` | no | `40` | Caps per section. |
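
As a rough illustration of how these variables and defaults could be read at startup, here is a minimal standard-library sketch. The `Settings` class and `load_settings` function are hypothetical names, not the service's actual config module, and the automatic `sslmode=require` handling is not reproduced here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Field names are illustrative; defaults mirror the table above.
    db_connection_string: str
    tangled_web_base: str = "https://tangled.org"
    per_seed_limit: int = 25
    distance_floor: float = 0.30
    issue_distance_floor: float = 0.40
    min_readme_chars: int = 120
    query_workers: int = 8
    max_repos: int = 40
    max_issues: int = 40


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    dsn = env.get("DB_CONNECTION_STRING")
    if not dsn:
        # The only required variable; fail early with a clear message.
        raise RuntimeError("DB_CONNECTION_STRING is required")
    return Settings(
        db_connection_string=dsn,
        tangled_web_base=env.get("TANGLED_WEB_BASE", "https://tangled.org"),
        per_seed_limit=int(env.get("REC_PER_SEED_LIMIT", 25)),
        distance_floor=float(env.get("REC_DISTANCE_FLOOR", 0.30)),
        issue_distance_floor=float(env.get("REC_ISSUE_DISTANCE_FLOOR", 0.40)),
        min_readme_chars=int(env.get("REC_MIN_README_CHARS", 120)),
        query_workers=int(env.get("REC_QUERY_WORKERS", 8)),
        max_repos=int(env.get("REC_MAX_REPOS", 40)),
        max_issues=int(env.get("REC_MAX_ISSUES", 40)),
    )


# Example: override one value, keep the remaining defaults.
settings = load_settings({
    "DB_CONNECTION_STRING": "postgresql://user:pass@db.example/tangled",
    "REC_DISTANCE_FLOOR": "0.25",
})
```

Passing a plain dict instead of `os.environ` keeps the loader easy to unit-test without touching the real environment.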

Run locally

uv venv --python 3.12 .venv
uv pip install --python .venv -e ".[dev]"
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
# smoke
curl 'localhost:8000/health'
curl 'localhost:8000/recommendations?handle=did:plc:y7g2koy4nqw7434s67fgfjca'
# interactive docs: http://localhost:8000/docs

Test

.venv/bin/python -m pytest tests/         # unit (no DB)
RUN_DB_TESTS=1 .venv/bin/python -m pytest tests/   # + integration (needs DB_CONNECTION_STRING)
.venv/bin/python eval/harness.py          # offline recall@k / nDCG
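
For readers unfamiliar with the two offline metrics, the sketch below shows one common way to compute them with binary relevance. It is not taken from eval/harness.py: how the harness builds its hold-out set and defines relevance is not described here, so the function names and the example data are assumptions.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the held-out items that appear in the top k results."""
    if not relevant:
        return 0.0
    found = sum(1 for item in ranked[:k] if item in relevant)
    return found / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Hypothetical hold-out: three repos were hidden, one of them is recovered in the top 3.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}
recall = recall_at_k(ranked, relevant, k=3)  # 1 of 3 held-out items found
ndcg = ndcg_at_k(ranked, relevant, k=3)      # roughly 0.296
```

Recall only asks whether held-out items show up at all within the cutoff, while nDCG also rewards placing them nearer the top.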

Deploy

docker build -t tangled-rec .
docker run -p 8000:8000 --env-file .env tangled-rec

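Cloud SQL access: the shared DB only accepts connections from authorized IPs, so add the host's egress IP. Note that `--authorized-networks` replaces the existing list rather than appending to it, so include any networks that are already authorized: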

gcloud sql instances patch <instance> --authorized-networks=$(curl -s ifconfig.me)

Point the appview at the deployment: TANGLED_DISCOVER_ENDPOINT=https://<host>/recommendations.
