embed
Drains pending embedding_jobs rows by calling the configured embedding provider and writing vectors to message_embeddings.
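A rough sketch of one drain pass, assuming a SQLite connection, an embedding_jobs / message_embeddings schema with the column names shown here (illustrative, not the actual schema), and a provider client exposing an embed(texts) method:

```python
import sqlite3

def drain_once(db: sqlite3.Connection, provider, batch_size: int = 64) -> int:
    # Pull one batch of pending jobs; the real claim step also takes a short
    # lock so overlapping runs do not grab the same rows (see Behavior below).
    jobs = db.execute(
        "SELECT id, message_id, text FROM embedding_jobs "
        "WHERE status = 'pending' LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not jobs:
        return 0

    # One provider request per batch of normalized message texts.
    vectors = provider.embed([text for _, _, text in jobs])

    with db:  # one transaction: write vectors and mark jobs done together
        for (job_id, message_id, _), vector in zip(jobs, vectors):
            db.execute(
                "INSERT OR REPLACE INTO message_embeddings (message_id, vector) "
                "VALUES (?, ?)",
                (message_id, vector),  # vector assumed serialized to a blob
            )
            db.execute(
                "UPDATE embedding_jobs SET status = 'done' WHERE id = ?",
                (job_id,),
            )
    return len(jobs)
```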
# Usage
discrawl embed
discrawl embed --limit 1000
discrawl embed --rebuild --limit 1000
# Flags

- --limit <n> - cap how many jobs this run drains
- --batch-size <n> - provider request batch size
- --rebuild - regenerate vectors for the existing archive after a provider/model/input-version change
# Behavior
- claims jobs with a short lock so overlapping runs do not process the same batch (see the sketch after this list)
- a provider rate limit requeues the batch and ends that drain run cleanly
- provider or validation failures are retried up to three times before the job is marked failed
- messages with no normalized text are marked done and any stale vector for that message is removed
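A minimal sketch of how those locking and retry rules could work, assuming hypothetical locked_until and attempts columns and a 60-second lock; none of these names or values are confirmed by this doc:

```python
import time

LOCK_SECONDS = 60   # short claim lock so overlapping runs skip each other's batch
MAX_ATTEMPTS = 3    # provider/validation failures retry this many times

def claim_batch(db, batch_size: int):
    # Claim pending jobs whose lock is missing or expired, then stamp a new lock.
    now = time.time()
    rows = db.execute(
        "SELECT id FROM embedding_jobs "
        "WHERE status = 'pending' AND (locked_until IS NULL OR locked_until < ?) "
        "LIMIT ?",
        (now, batch_size),
    ).fetchall()
    ids = [r[0] for r in rows]
    db.executemany(
        "UPDATE embedding_jobs SET locked_until = ? WHERE id = ?",
        [(now + LOCK_SECONDS, job_id) for job_id in ids],
    )
    return ids

def record_failure(db, job_id: int, rate_limited: bool) -> bool:
    # Rate limits release the lock untouched and tell the caller to stop this run;
    # other failures count an attempt and fail the job after MAX_ATTEMPTS.
    if rate_limited:
        db.execute(
            "UPDATE embedding_jobs SET locked_until = NULL WHERE id = ?",
            (job_id,),
        )
        return False  # stop draining
    db.execute(
        "UPDATE embedding_jobs SET attempts = attempts + 1, "
        "status = CASE WHEN attempts + 1 >= ? THEN 'failed' ELSE status END "
        "WHERE id = ?",
        (MAX_ATTEMPTS, job_id),
    )
    return True  # keep draining
```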
# Identity
Provider, model, and input version are stored on each job and vector. Changing any of them retargets pending jobs to the new identity and resets prior attempts. Existing vectors for another identity remain in SQLite but are not used by semantic search.
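In SQL terms, the identity switch could look roughly like this, assuming provider, model, and input_version columns on both tables (the column names are illustrative):

```python
def retarget_pending_jobs(db, provider: str, model: str, input_version: int) -> None:
    # Point pending jobs at the new identity and reset their attempt counter.
    # Vectors written under another identity stay in SQLite but are ignored.
    db.execute(
        "UPDATE embedding_jobs "
        "SET provider = ?, model = ?, input_version = ?, attempts = 0 "
        "WHERE status = 'pending'",
        (provider, model, input_version),
    )

def current_vectors(db, provider: str, model: str, input_version: int):
    # Semantic search reads only vectors that match the active identity.
    return db.execute(
        "SELECT message_id, vector FROM message_embeddings "
        "WHERE provider = ? AND model = ? AND input_version = ?",
        (provider, model, input_version),
    ).fetchall()
```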
# When to use --rebuild

Run embed --rebuild after changing the [search.embeddings] provider, model, or any input setting, when you want to regenerate vectors for messages already in the archive.
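For illustration only, a [search.embeddings] block might look something like the following; the key names are assumptions rather than the documented schema, but changing any such value is what calls for a rebuild:

```toml
[search.embeddings]
provider = "openai"                 # assumed key: which embedding provider to call
model = "text-embedding-3-small"    # assumed key: the provider's model id
# input settings (normalization, truncation, etc.) also feed the input version
```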
# Pairing with sync
sync --with-embeddings enqueues; embed drains. The two phases are intentionally separate so a slow provider does not block the hot sync path.
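For example, a typical two-step run pairs the commands shown above:

discrawl sync --with-embeddings
discrawl embed --limit 1000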