Tangled Recommendation Engine#
A standalone Python/FastAPI service that powers Tangled's Discover feature: given a
user's DID it returns repo + issue recommendations. It reads README/issue embeddings
(precomputed with Gemini gemini-embedding-001, 1536-dim) from the shared
Postgres + pgvector database and reranks them.
The Tangled appview calls this service over HTTP at the URL set in
TANGLED_DISCOVER_ENDPOINT. See schema.md for the wire contract and API.md for all endpoints.
How it works#
For each repo the user already works on (owned ∪ collaborations), we run an independent pgvector kNN search over README embeddings. The per-seed results are merged, and a candidate that several of the user's repos point at ranks higher (consensus). Merged candidates are then deduplicated across forks, filtered by the distance floor, and reranked on similarity, consensus, and recency, with a round-robin guard so one busy interest can't bury the others. The user's own work is excluded. Issues go through the same flow over open-issue embeddings.
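The flow above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the code in app/: the names `Candidate`, `merge`, `score`, `round_robin`, and `recommend`, the scoring weights, and the toy neighbour lists are all hypothetical, and fork dedup and the recency term are left out to keep it short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class Candidate:
    repo: str
    best_distance: float                      # smallest cosine distance over all seeds
    seeds: set = field(default_factory=set)   # the user's repos that surfaced it


def merge(per_seed, floor=0.30):
    """per_seed maps seed repo -> [(candidate repo, cosine distance), ...]."""
    merged = {}
    for seed, hits in per_seed.items():
        for repo, dist in hits:
            if dist > floor or repo in per_seed:  # apply floor; skip the user's own repos
                continue
            cand = merged.setdefault(repo, Candidate(repo, dist))
            cand.best_distance = min(cand.best_distance, dist)
            cand.seeds.add(seed)
    return merged


def score(cand, consensus_weight=0.1):
    # Assumed blend: similarity plus a bonus for each additional agreeing seed.
    return (1.0 - cand.best_distance) + consensus_weight * (len(cand.seeds) - 1)


def round_robin(merged):
    """Take one candidate per seed in turn so a single seed cannot fill the top."""
    buckets = defaultdict(list)
    for cand in sorted(merged.values(), key=score, reverse=True):
        buckets[min(cand.seeds)].append(cand.repo)  # arbitrary but stable owner seed
    ranked = []
    for tier in zip_longest(*buckets.values()):
        ranked.extend(repo for repo in tier if repo is not None)
    return ranked


def recommend(seeds, knn, workers=8):
    # One independent kNN lookup per seed, issued concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_seed = dict(zip(seeds, pool.map(knn, seeds)))
    return round_robin(merge(per_seed))


# Toy neighbour lists standing in for the real pgvector queries.
hits = {
    "me/app": [("x/web", 0.12), ("x/cli", 0.10), ("me/lib", 0.05), ("x/far", 0.50)],
    "me/lib": [("x/web", 0.25), ("x/db", 0.15)],
}
print(recommend(list(hits), hits.__getitem__))  # ['x/web', 'x/db', 'x/cli']
```

In the toy run, x/web comes first because both seeds point at it, x/far is dropped by the floor, me/lib is excluded as the user's own repo, and x/db is interleaved ahead of the higher-scoring x/cli because it comes from the other seed.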
Layout#
app/          FastAPI app + config + db + the pipeline stages
  merge.py dedup.py rank.py profile.py links.py   # pure, unit-tested
  db.py recommend.py schemas.py main.py
tests/        pytest unit tests (+ env-gated integration)
eval/         offline hold-out eval harness (recall@k / nDCG)
reference/    the validated Node .mjs scripts this engine was ported from (oracle)
Configuration (env / .env)#
| Var | Required | Default | Notes |
|---|---|---|---|
| DB_CONNECTION_STRING | yes | — | Shared Postgres. sslmode=require is added automatically. |
| TANGLED_WEB_BASE | no | https://tangled.org | Base for generated repo URLs. |
| REC_PER_SEED_LIMIT | no | 25 | kNN neighbours per seed. |
| REC_DISTANCE_FLOOR | no | 0.30 | Drop repo matches above this cosine distance. |
| REC_ISSUE_DISTANCE_FLOOR | no | 0.40 | Floor for issue matches. |
| REC_MIN_README_CHARS | no | 120 | Drop near-empty READMEs as seeds + candidates (filters test/throwaway repos). 0 disables. |
| REC_QUERY_WORKERS | no | 8 | Concurrent per-seed kNN queries. The DB is remote, so this cuts request latency. |
| REC_MAX_REPOS / REC_MAX_ISSUES | no | 40 | Caps per section. |
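As an illustration of how these variables and defaults might be read, here is a minimal sketch. It is not the service's actual config module: `Settings`, `load_settings`, and `ensure_sslmode` are hypothetical names, and the sketch assumes the connection string is a postgresql:// URL and that an explicit sslmode already present in it is left alone.

```python
import os
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_sslmode(dsn: str) -> str:
    """Append sslmode=require to a URL-style DSN unless sslmode is already set."""
    parts = urlsplit(dsn)
    query = dict(parse_qsl(parts.query))
    query.setdefault("sslmode", "require")
    return urlunsplit(parts._replace(query=urlencode(query)))


@dataclass(frozen=True)
class Settings:
    db_connection_string: str
    web_base: str = "https://tangled.org"
    per_seed_limit: int = 25
    distance_floor: float = 0.30
    issue_distance_floor: float = 0.40
    min_readme_chars: int = 120
    query_workers: int = 8
    max_repos: int = 40
    max_issues: int = 40


def load_settings(env=os.environ) -> Settings:
    def get(name, cast, default):
        # Unset or empty variables fall back to the documented default.
        raw = env.get(name)
        return cast(raw) if raw not in (None, "") else default

    return Settings(
        # The only required variable: a missing key raises KeyError.
        db_connection_string=ensure_sslmode(env["DB_CONNECTION_STRING"]),
        web_base=get("TANGLED_WEB_BASE", str, "https://tangled.org"),
        per_seed_limit=get("REC_PER_SEED_LIMIT", int, 25),
        distance_floor=get("REC_DISTANCE_FLOOR", float, 0.30),
        issue_distance_floor=get("REC_ISSUE_DISTANCE_FLOOR", float, 0.40),
        min_readme_chars=get("REC_MIN_README_CHARS", int, 120),
        query_workers=get("REC_QUERY_WORKERS", int, 8),
        max_repos=get("REC_MAX_REPOS", int, 40),
        max_issues=get("REC_MAX_ISSUES", int, 40),
    )
```

Passing a plain dict instead of os.environ, for example `load_settings({"DB_CONNECTION_STRING": "postgresql://u:p@db.example/tangled"})`, makes the loader easy to exercise in unit tests without touching the process environment.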
Run locally#
uv venv --python 3.12 .venv
uv pip install --python .venv -e ".[dev]"
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
# smoke
curl 'localhost:8000/health'
curl 'localhost:8000/recommendations?handle=did:plc:y7g2koy4nqw7434s67fgfjca'
# interactive docs: http://localhost:8000/docs
Test#
.venv/bin/python -m pytest tests/ # unit (no DB)
RUN_DB_TESTS=1 .venv/bin/python -m pytest tests/ # + integration (needs DB_CONNECTION_STRING)
.venv/bin/python eval/harness.py # offline recall@k / nDCG
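For readers unfamiliar with the two metrics the harness reports, the sketch below shows one common way to compute them for a single user. It assumes binary relevance (a held-out repo either appears in the ranking or it does not); the function names are hypothetical and the real eval/harness.py may define the metrics differently.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of held-out items that appear in the top-k recommendations."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the best possible DCG."""
    relevant = set(relevant)
    # A hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places every relevant item at the top.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


ranked = ["a", "b", "c", "d"]   # recommendations, best first
held_out = {"b", "d"}           # items hidden from the engine for this user
print(recall_at_k(ranked, held_out, 3))  # 0.5
print(ndcg_at_k(ranked, held_out, 3))
```

Recall@k only asks whether held-out items made the cut, while nDCG also rewards placing them near the top, which is why the two are usually reported together.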
Deploy#
docker build -t tangled-rec .
docker run -p 8000:8000 --env-file .env tangled-rec
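Cloud SQL access: the shared DB only accepts authorized IPs. Add the host's egress IP by running the command below on the deployment host. Note that --authorized-networks replaces the instance's existing list, so include any networks that are already authorized, comma-separated:
gcloud sql instances patch <instance> --authorized-networks=$(curl -s ifconfig.me)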
Point the appview at the deployment: TANGLED_DISCOVER_ENDPOINT=https://<host>/recommendations.