
Tangled Recommendation Engine

A standalone Python/FastAPI service that powers Tangled's Discover feature: given a user's DID, it returns repo and issue recommendations. It reads README and issue embeddings (precomputed with Gemini gemini-embedding-001, 1536-dim) from the shared Postgres + pgvector database and reranks them.

The Tangled appview connects over HTTP via TANGLED_DISCOVER_ENDPOINT. See schema.md for the wire contract and API.md for all endpoints.
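As a rough illustration of what a caller does, the sketch below builds the request URL and fetches recommendations using only the standard library. The `handle` query parameter is taken from the curl smoke test under "Run locally". The helper names, the localhost default, and the assumption that the response body is JSON are illustrative only; schema.md remains the authority on the response shape.

```python
import json
import os
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: fall back to a local dev server when the variable is unset.
DEFAULT_ENDPOINT = "http://localhost:8000/recommendations"


def build_recommendations_url(did: str, endpoint: Optional[str] = None) -> str:
    """Return the GET URL for a user's recommendations (hypothetical helper)."""
    endpoint = endpoint or os.environ.get("TANGLED_DISCOVER_ENDPOINT", DEFAULT_ENDPOINT)
    # `handle` is the query parameter used in this README's curl example.
    # Colons are left unescaped so the DID stays readable in logs.
    return f"{endpoint}?{urlencode({'handle': did}, safe=':')}"


def fetch_recommendations(did: str, endpoint: Optional[str] = None, timeout: float = 10.0):
    """Fetch and decode the response; assumes a JSON body (see schema.md)."""
    with urlopen(build_recommendations_url(did, endpoint), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `build_recommendations_url("did:plc:abc", "http://localhost:8000/recommendations")` yields `http://localhost:8000/recommendations?handle=did:plc:abc`.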

How it works

For each repo the user already works on (owned ∪ collaborations), we run an independent pgvector kNN query over README embeddings. The per-seed results are merged, and a candidate that several of the user's repos point at ranks higher (consensus). The merged list is then deduplicated (forks), filtered by the distance floor, and reranked (similarity + consensus + recency), with a round-robin guard so one busy interest can't bury the others. The user's own work is excluded. Issues use the same flow over open-issue embeddings.
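The merge, floor, and round-robin steps can be sketched roughly as follows. This is not the code in app/merge.py or app/rank.py: the scoring formula, the `consensus_weight` value, and all names are assumptions made for illustration, and fork dedup and the recency term are left out to keep the example short.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Candidate:
    repo: str
    best_distance: float                      # smallest cosine distance over all seeds
    seeds: set = field(default_factory=set)   # seeds whose kNN returned this repo


def merge(per_seed, own_repos, floor=0.30):
    """Merge per-seed kNN hits; drop the user's own repos and hits past the floor."""
    merged = {}
    for seed, hits in per_seed.items():
        for repo, distance in hits:
            if repo in own_repos or distance > floor:
                continue
            cand = merged.setdefault(repo, Candidate(repo, distance))
            cand.best_distance = min(cand.best_distance, distance)
            cand.seeds.add(seed)
    return merged


def score(cand, consensus_weight=0.05):
    # Assumed formula: similarity plus a small bonus per additional agreeing seed.
    return (1.0 - cand.best_distance) + consensus_weight * (len(cand.seeds) - 1)


def round_robin(merged, limit=40):
    """Let each seed contribute its best unpicked candidate in turn."""
    queues = defaultdict(list)
    for cand in merged.values():
        for seed in cand.seeds:
            queues[seed].append(cand)
    for queue in queues.values():
        queue.sort(key=score, reverse=True)

    picked, seen = [], set()
    while len(picked) < limit and any(queues.values()):
        for seed in sorted(queues):
            queue = queues[seed]
            while queue and queue[0].repo in seen:  # already taken via another seed
                queue.pop(0)
            if queue and len(picked) < limit:
                cand = queue.pop(0)
                seen.add(cand.repo)
                picked.append(cand.repo)
    return picked


per_seed = {
    "a": [("x", 0.10), ("y", 0.12), ("mine", 0.05), ("far", 0.50)],
    "b": [("y", 0.20), ("z", 0.25)],
}
merged = merge(per_seed, own_repos={"mine"})
order = round_robin(merged)
```

In this toy input, `y` is returned by both seeds and so leads the list, and `z` is placed ahead of the higher-scoring `x` because seed `b` gets its turn before seed `a` picks again. That is the effect the round-robin guard is meant to have.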

Layout

app/        FastAPI app + config + db + the pipeline stages
  merge.py dedup.py rank.py profile.py links.py   # pure, unit-tested
  db.py recommend.py schemas.py main.py
tests/      pytest unit tests (+ env-gated integration)
eval/       offline hold-out eval harness (recall@k / nDCG)
reference/  the validated Node .mjs scripts this engine was ported from (oracle)

Configuration (env / .env)

| Var | Required | Default | Notes |
| --- | --- | --- | --- |
| DB_CONNECTION_STRING | yes | | Shared Postgres. sslmode=require is added automatically. |
| TANGLED_WEB_BASE | no | https://tangled.org | Base for generated repo URLs. |
| REC_PER_SEED_LIMIT | no | 25 | kNN neighbours per seed. |
| REC_DISTANCE_FLOOR | no | 0.30 | Drop repo matches above this cosine distance. |
| REC_ISSUE_DISTANCE_FLOOR | no | 0.40 | Floor for issue matches. |
| REC_MIN_README_CHARS | no | 120 | Drop near-empty READMEs as seeds + candidates (filters test/throwaway repos). 0 disables. |
| REC_QUERY_WORKERS | no | 8 | Concurrent per-seed kNN queries. The DB is remote, so this cuts request latency. |
| REC_MAX_REPOS / REC_MAX_ISSUES | no | 40 | Caps per section. |
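One way these variables could be read, with the defaults from the table, is sketched below. It is not the service's actual config module: the `Settings` class, `load_settings`, and `with_sslmode` are hypothetical names, the connection string is assumed to be in URL form (`postgresql://...`), and leaving an explicitly set `sslmode` untouched is an assumption too.

```python
import os
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sslmode(dsn: str) -> str:
    """Add sslmode=require to a postgres URL unless sslmode is already present."""
    parts = urlsplit(dsn)
    query = dict(parse_qsl(parts.query))
    query.setdefault("sslmode", "require")
    return urlunsplit(parts._replace(query=urlencode(query)))


@dataclass(frozen=True)
class Settings:
    db_connection_string: str            # required, no default
    web_base: str = "https://tangled.org"
    per_seed_limit: int = 25
    distance_floor: float = 0.30
    issue_distance_floor: float = 0.40
    min_readme_chars: int = 120          # 0 disables the filter
    query_workers: int = 8
    max_repos: int = 40
    max_issues: int = 40


# Environment variable -> (Settings field, parser)
_ENV_FIELDS = {
    "TANGLED_WEB_BASE": ("web_base", str),
    "REC_PER_SEED_LIMIT": ("per_seed_limit", int),
    "REC_DISTANCE_FLOOR": ("distance_floor", float),
    "REC_ISSUE_DISTANCE_FLOOR": ("issue_distance_floor", float),
    "REC_MIN_README_CHARS": ("min_readme_chars", int),
    "REC_QUERY_WORKERS": ("query_workers", int),
    "REC_MAX_REPOS": ("max_repos", int),
    "REC_MAX_ISSUES": ("max_issues", int),
}


def load_settings(env=os.environ) -> Settings:
    """Build Settings from a mapping; raises KeyError if the DB string is missing."""
    overrides = {
        field_name: parse(env[name])
        for name, (field_name, parse) in _ENV_FIELDS.items()
        if env.get(name)  # unset or empty values keep the dataclass default
    }
    return Settings(
        db_connection_string=with_sslmode(env["DB_CONNECTION_STRING"]),
        **overrides,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to unit-test without touching the process environment.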

Run locally

uv venv --python 3.12 .venv
uv pip install --python .venv -e ".[dev]"
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
# smoke
curl 'localhost:8000/health'
curl 'localhost:8000/recommendations?handle=did:plc:y7g2koy4nqw7434s67fgfjca'
# interactive docs: http://localhost:8000/docs

Test

.venv/bin/python -m pytest tests/         # unit (no DB)
RUN_DB_TESTS=1 .venv/bin/python -m pytest tests/   # + integration (needs DB_CONNECTION_STRING)
.venv/bin/python eval/harness.py          # offline recall@k / nDCG
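For readers unfamiliar with the two metrics the harness reports, here is a minimal sketch of recall@k and nDCG@k with binary relevance. It illustrates the standard definitions only; how eval/harness.py builds its hold-out set or computes relevance is not shown here, and the function names are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the held-out relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(position + 2) for position in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```

Recall@k only asks whether held-out items were recovered, while nDCG also rewards placing them near the top, so the two can move independently when the reranking weights change.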

Deploy

docker build -t tangled-rec .
docker run -p 8000:8000 --env-file .env tangled-rec

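Cloud SQL access: the shared DB only accepts authorized IPs. Add the host's egress IP. Note that --authorized-networks replaces the existing list, so include any networks that are already authorized: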

gcloud sql instances patch <instance> --authorized-networks=$(curl -s ifconfig.me)

Point the appview at the deployment: TANGLED_DISCOVER_ENDPOINT=https://<host>/recommendations.