 };
```

# Bobbin

Bobbin is an API appview for Tangled records. It serves XRPC
endpoints for `sh.tangled.*`, through which you can get repos,
issues, pulls, comments, follows, stars, labels, pipelines,
and profiles. It is read-only and has no auth, since writes
and auth should all be handled direct-to-PDS and direct-to-knot
respectively.

**Bobbin has no permanent storage.**

It is only a glorified edge index, in the graph theory
sense. Additionally, it has a record cache, re-filled on
demand. All other data that Bobbin serves comes live from
PDSes & knots.

## What Bobbin needs

Bobbin pulls off being so stateless by moving state upstream.
Primarily it depends on an instance of
[Hydrant](https://tangled.org/did:plc:6v3ul2ptnqctyxwkz5ti4amn),
the service that provides an event stream for Bobbin to quickly
backfill from on every restart. Backfilling ought to take a
couple of minutes at most. If the upstream instance of Hydrant
fails while Bobbin is live, Bobbin's list/count endpoints stop
advancing and report a stale cursor. Single-lookups
will continue working thanks to the second dependency:
[Slingshot](https://tangled.org/did:plc:c7mc2fn47ihdihul4vjwsuy3/tree/main/slingshot).
Slingshot fetches individual records & resolves identities.
If the upstream instance of Slingshot fails, single-lookups
will fail with a `502` error. Some aggregation endpoints use
Slingshot for hydrating, so those will also fail.

A soft dependency is the plethora of knots out there, which
Bobbin talks to directly for git data and, for knots at
v1.15+, members & collaborators.

## Building Bobbin

Bobbin lives in [Tangled's core monorepo, under bobbin/](https://tangled.org/did:plc:j5hmlfdrwkvtxm7cjmu7j2is/tree/master/bobbin).
Here's an easy local debug build:

```sh
cargo build -p bobbin
```

Bobbin loves being in a container. When using
`bobbin/containerfiles/bobbin.Containerfile`, it runs
`cargo build --release --bin bobbin --package bobbin` within a
little Debian runtime, exposing port 8090.

## Configuration

The best way to configure Bobbin is via a TOML config file.
There's an `example.toml` in [Bobbin's subdir](https://tangled.org/did:plc:j5hmlfdrwkvtxm7cjmu7j2is/blob/master/bobbin/example.toml).
Every value is overridable by a `BOBBIN_*` env var.
Precedence, highest first, is env, then `--config <path>`, then
`/etc/bobbin/config.toml`, then built-in defaults.
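
To picture that precedence, here is a minimal Python sketch. It assumes each layer is consulted per key and that the first layer defining a key wins; Bobbin may instead pick a single file, so treat this as an illustration only. The dotted key names and all values are hypothetical, and the mapping from `BOBBIN_*` names onto keys is not shown because it is not documented here.

```python
from collections import ChainMap


def effective_config(env, config_file, etc_file, defaults):
    """Return a read-through view where the first layer defining a key wins."""
    # ChainMap searches its maps left to right, which mirrors
    # env > --config file > /etc/bobbin/config.toml > built-in defaults.
    return ChainMap(env, config_file, etc_file, defaults)


# Hypothetical, already-normalised layers (illustrative values only).
cfg = effective_config(
    env={"hydrant.url": "wss://hydrant.internal"},
    config_file={
        "hydrant.url": "wss://hydrant.example.com",
        "slingshot.url": "https://slingshot.example.com",
    },
    etc_file={},
    defaults={"server.binds": ["127.0.0.1:8090"]},
)

print(cfg["hydrant.url"])    # taken from the env layer
print(cfg["slingshot.url"])  # falls through to the --config file
print(cfg["server.binds"])   # falls through to the defaults
```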

Load and check a config without starting the server:

```sh
bobbin --config config.toml validate
```

A minimal config is just the two upstream URLs. The Hydrant URL
takes `ws://` or `wss://`. An `http://` or `https://`
URL is rewritten to the matching websocket scheme at
connection time.

```toml
[server]
binds = ["127.0.0.1:8090"]

# Loopback-only. Leave empty to disable debug introspection.
debug_bind = "127.0.0.1:8091"

[hydrant]
url = "https://hydrant.example.com"

[slingshot]
url = "https://slingshot.example.com"
```
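
The scheme rewrite mentioned above is easy to reason about with a small sketch. This is not Bobbin's code (Bobbin is built with cargo); it is a Python illustration of the mapping `http` to `ws` and `https` to `wss`, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit

# http(s) maps to the matching websocket scheme; ws(s) passes through.
_WS_SCHEME = {"http": "ws", "https": "wss", "ws": "ws", "wss": "wss"}


def to_websocket_url(url: str) -> str:
    """Rewrite an http(s) URL to ws(s), leaving host, port, and path untouched."""
    parts = urlsplit(url)
    try:
        scheme = _WS_SCHEME[parts.scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}") from None
    return urlunsplit(parts._replace(scheme=scheme))


print(to_websocket_url("https://hydrant.example.com"))
```

With the config above, the Hydrant URL would therefore be dialled as `wss://hydrant.example.com`.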

> 🦪 Lewis
>
> At time of writing, we (Tangled) don't host public
> instances of Hydrant or Slingshot. You will have to
> find public instances or spin these up yourself! :P

Take a gander at the project's `example.toml` for an
exhaustive list of things to configure.

You will discover fun things such as a configurable adaptive
loop that watches the cgroup memory limit & throttles heavy
requests under pressure. It only kicks in if it detects that a
cgroup limit is present. The config for that is in the
`[backpressure]` block of the config template.
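
For a feel of what "detects that a cgroup limit is present" can mean, here is a hedged Python sketch that reads the cgroup v2 files `memory.max` and `memory.current`. It is not Bobbin's implementation; the function names and the idea of a single pressure ratio are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

CGROUP_V2 = Path("/sys/fs/cgroup")


def parse_memory_max(text: str) -> Optional[int]:
    """cgroup v2 writes the literal 'max' when no limit is set."""
    value = text.strip()
    return None if value == "max" else int(value)


def memory_pressure(current: int, limit: Optional[int]) -> Optional[float]:
    """Fraction of the limit in use, or None when there is no usable limit."""
    if limit is None or limit <= 0:
        return None
    return current / limit


def read_pressure(root: Path = CGROUP_V2) -> Optional[float]:
    """Read both files; any missing or odd file means 'no limit detected'."""
    try:
        limit = parse_memory_max((root / "memory.max").read_text())
        current = int((root / "memory.current").read_text())
    except (OSError, ValueError):
        return None
    return memory_pressure(current, limit)
```

A throttling loop would compare the returned ratio against a configured threshold and do nothing at all when the result is `None`.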

## Running Bobbin

Start the server using a config toml:

```bash
bobbin --config config.toml
```

Bobbin wakes up in a cold sweat and immediately gets to
work:

1. It binds its listeners and connects to the Hydrant stream
   in the background.
2. It serves requests from the first moment it's alive, even
   before the Hydrant stream connects or finishes catching up.
   A cold Hydrant costs only latency and approximate counts.

## The API

**Single lookups** take a record's AT-URI.

- `getRepo` takes the repo URI:

```sh
curl "$BOBBIN/xrpc/sh.tangled.repo.getRepo?repo=at://did:plc:boltless/sh.tangled.repo/squid"
```
```json
{
  "uri": "at://did:plc:boltless/sh.tangled.repo/squid",
  "cid": "bafyrei...",
  "value": { "$type": "sh.tangled.repo", "knot": "knot1.tangled.sh", "description": "...", "createdAt": "..." }
}
```

- `getProfile` takes the full profile record URI, so a bare
  handle or DID will not resolve:

```sh
curl "$BOBBIN/xrpc/sh.tangled.actor.getProfile?actor=at://did:plc:boltless/sh.tangled.actor.profile/self"
```

- If Slingshot cannot serve the record, the response is `502`:

```json
{ "error": "UpstreamFailed", "message": "upstream unavailable: ..." }
```
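
Client code tends to build these URLs programmatically. The following Python sketch shows one way to do it, including the full profile record URI that `getProfile` expects. The helper names and the base URL are hypothetical, and it assumes Bobbin decodes percent-encoded query strings the usual way.

```python
from urllib.parse import urlencode


def xrpc_url(base: str, nsid: str, **params: str) -> str:
    """Join base, /xrpc/<nsid>, and a percent-encoded query string."""
    url = f"{base.rstrip('/')}/xrpc/{nsid}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def profile_uri(did: str) -> str:
    """getProfile wants the profile record URI, not a bare DID."""
    return f"at://{did}/sh.tangled.actor.profile/self"


print(
    xrpc_url(
        "http://localhost:8090",  # hypothetical local Bobbin
        "sh.tangled.actor.getProfile",
        actor=profile_uri("did:plc:boltless"),
    )
)
```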

**Aggregation** endpoints come in `list*` and `count*` pairs,
each with a `*By` sibling, and require a `subject` query param.

- `listRepos` and `countRepos` key on the owner DID:

```sh
curl "$BOBBIN/xrpc/sh.tangled.repo.countRepos?subject=did:plc:boltless"
```
```json
{ "count": 7, "distinctAuthors": 1 }
```

```sh
curl "$BOBBIN/xrpc/sh.tangled.repo.listRepos?subject=did:plc:boltless&limit=3"
```
```json
{ "items": [ { "uri": "at://did:plc:boltless/sh.tangled.repo/squid", "cid": "bafyrei...", "value": { } } ], "cursor": null }
```

- Bobbin validates the subject per collection. Here a repo URI
  is passed where a bare DID is required, so the call returns a
  `400`:

```sh
curl "$BOBBIN/xrpc/sh.tangled.graph.listFollows?subject=at://did:plc:boltless/sh.tangled.repo/squid"
```
```json
{ "error": "InvalidRequest", "message": "invalid request: subject must be a bare did, got at-uri with collection sh.tangled.repo" }
```
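
Since `list*` responses carry `items` and a `cursor`, a client can walk every page with a small loop. The Python sketch below takes the page fetcher as a parameter so it stays independent of any HTTP library. How the cursor is sent back (most likely a `cursor` query param, as is common for XRPC) is an assumption left to that fetcher, and the function name is made up.

```python
from typing import Callable, Iterator, Optional

# A fetcher receives the previous cursor (None for the first page)
# and returns one decoded JSON page.
FetchPage = Callable[[Optional[str]], dict]


def iter_items(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[dict]:
    """Yield items page by page until the server returns a null cursor."""
    cursor = None
    for _ in range(max_pages):  # hard cap guards against a cursor that never ends
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("cursor")
        if not cursor:
            return


# Canned pages standing in for real responses.
pages = {
    None: {"items": [{"uri": "a"}, {"uri": "b"}], "cursor": "2"},
    "2": {"items": [{"uri": "c"}], "cursor": None},
}
uris = [item["uri"] for item in iter_items(lambda cursor: pages[cursor])]
print(uris)
```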

**Search** is a single endpoint over an in-memory full-text
index:

```sh
curl "$BOBBIN/xrpc/sh.tangled.search.query?q=tangled&limit=2"
```
```json
{ "hits": [ { "uri": "at://...", "cid": "...", "nsid": "sh.tangled.repo", "score": 27.1, "value": { } } ], "cursor": null }
```

**Git data** endpoints such as blob, tree, diff, log, and archive
proxy straight to the repo's knot and stream the response back
without caching.

## Coverage and warm-up

- While the edge index is catching up from Hydrant,
  aggregation counts are a lower bound & may still climb.
- One endpoint reports how far along the backfill is:

```sh
curl "$BOBBIN/xrpc/sh.tangled.bobbin.getCoverage"
```

While warming up:

```json
{ "ready": false, "eventsProcessed": 45588, "lastCursor": 51658 }
```

Once caught up, Bobbin flips to ready:

```json
{ "ready": true, "eventsProcessed": 106085, "lastCursor": 116527 }
```
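
A deploy script or test harness may want to block until `ready` flips. Here is a minimal Python sketch of such a poll, with the coverage fetcher injected so no HTTP client is assumed. The function name, timeout, and poll interval are all illustrative choices, not Bobbin defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    get_coverage: Callable[[], dict],
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll getCoverage until "ready" is true, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        coverage = get_coverage()
        if coverage.get("ready"):
            return coverage
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {timeout_s}s, last: {coverage}")
        sleep(poll_s)


# Canned responses mirroring the two payloads above.
responses = iter([
    {"ready": False, "eventsProcessed": 45588, "lastCursor": 51658},
    {"ready": True, "eventsProcessed": 106085, "lastCursor": 116527},
])
final = wait_until_ready(lambda: next(responses), sleep=lambda _: None)
print(final["eventsProcessed"])
```

Because Bobbin serves requests while warming up, a poll like this is only needed by callers that depend on exact counts.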

If you are starting up Hydrant for the first time, Hydrant
itself will take a decent while (a couple of hours) to backfill
from PDSes. Hydrant stores its backfill on disk, so a Bobbin
restart reaches `ready` in minutes by replaying events from
an already-populated Hydrant. If your Hydrant is new, expect
Bobbin to backfill in that same couple of hours that Hydrant
takes.

## Loose ends and not-gonna-impl

- **No coverage signal for per-knot rosters yet.**
  Coverage tracks the Hydrant stream only. When a v1.15 knot
  is unreachable, Bobbin serves a stale or empty member set
  with nothing to flag it.
- **Knot eventstream fan-out isn't pooled.**
  Bobbin opens one websocket per v1.15
  knot on top of the Hydrant subscription. A network with
  thousands of knots wants pooling or a shared subscription.
- **No sequential issue or PR numbers.** Bobbin returns rkeys,
  not `#42`-style ids like the web appview. A client
  deriving a display number does it from creation order. But
  why bother? rkeys are the IDs.
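
If a client does want display numbers, deriving them from creation order could look like the Python sketch below. It assumes each listed record exposes `value.createdAt` as an ISO 8601 timestamp with an offset, as the repo record above does. The URIs are placeholders and the function names are made up.

```python
from datetime import datetime


def _created_at(record: dict) -> datetime:
    """Parse createdAt; a trailing 'Z' is normalised for older Pythons."""
    stamp = record["value"]["createdAt"]
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def display_numbers(records: list) -> dict:
    """Map each record URI to a 1-based number in creation order."""
    # The URI breaks ties so equal timestamps still sort deterministically.
    ordered = sorted(records, key=lambda r: (_created_at(r), r["uri"]))
    return {r["uri"]: n for n, r in enumerate(ordered, start=1)}


# Placeholder records, not real data.
issues = [
    {"uri": "at://did:plc:example/sh.tangled.repo.issue/bbb",
     "value": {"createdAt": "2025-03-02T10:00:00Z"}},
    {"uri": "at://did:plc:example/sh.tangled.repo.issue/aaa",
     "value": {"createdAt": "2025-03-01T09:30:00Z"}},
]
numbers = display_numbers(issues)
print(numbers)
```

Numbers derived this way shift whenever an earlier record is deleted or the list is only partially loaded, which is one more reason to treat rkeys as the real IDs.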

# Hacking on Tangled

We highly recommend [installing