Git-backed snapshots
Discrawl can publish the SQLite archive as sharded, compressed NDJSON snapshots in a private Git repo, then auto-import that repo before local read commands. This lets readers search the organization's archived history without needing Discord credentials.
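To make "sharded, compressed NDJSON" concrete, here is a minimal Python sketch of the idea. It is not discrawl's implementation: the use of gzip, the shard size, and the `<table>-<index>.ndjson.gz` file names are all assumptions made for illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_shards(rows, out_dir, table, shard_size=2):
    """Write rows as gzip-compressed NDJSON, at most shard_size rows per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(rows), shard_size):
        # Hypothetical naming scheme: <table>-<shard index>.ndjson.gz
        path = out_dir / f"{table}-{start // shard_size:04d}.ndjson.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for row in rows[start:start + shard_size]:
                # One JSON object per line; sorted keys keep each line deterministic.
                fh.write(json.dumps(row, sort_keys=True) + "\n")
        paths.append(path)
    return paths


def read_shards(paths):
    """Read rows back from the shards, in shard order."""
    rows = []
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows


# Illustrative data only; real archive rows have a different shape.
rows = [{"id": i, "content": f"message {i}"} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    shards = write_shards(rows, tmp, "messages")
    shard_names = [p.name for p in shards]
    restored = read_shards(shards)
```

Splitting a table into small files keeps individual Git objects small, and line-oriented JSON means a subscriber can stream one row at a time during import.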
#Publisher
discrawl publish --remote https://github.com/example/discord-archive.git --push
discrawl publish --readme path/to/discord-backup/README.md --push
The publisher uses your existing bot-synced archive. It exports non-DM tables only.
#Subscriber
discrawl subscribe https://github.com/example/discord-archive.git
discrawl search "launch checklist"
discrawl messages --channel general --hours 24
subscribe is the Git-only setup path. It writes a config with discord.token_source = "none", imports the snapshot, and does not require a Discord bot token. sync and tail remain disabled in this mode because they need live Discord access.
#Auto-update
Once share.remote is configured, read commands auto-fetch and import when the local share import is older than share.stale_after (default 15m):
discrawl subscribe --stale-after 15m https://github.com/example/discord-archive.git
discrawl subscribe --no-auto-update https://github.com/example/discord-archive.git
discrawl update forces the same pull/import step manually.
discrawl sync does not auto-import the share unless --update=auto or --update=force is provided, so routine live refreshes stay fast.
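The staleness rule described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not discrawl's code: the function names are hypothetical, the duration parser handles only a single unit such as "15m", and treating a missing import time as stale is a guess.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse a single-unit duration such as "15m" or "2h" (simplified)."""
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})


def should_auto_update(last_import, now, stale_after="15m", auto_update=True):
    """Return True when a read command should fetch and import first."""
    if not auto_update:
        # Corresponds to subscribing with --no-auto-update.
        return False
    if last_import is None:
        # Assumption: no recorded import counts as stale.
        return True
    # "Older than share.stale_after" is read as a strict comparison.
    return now - last_import > parse_duration(stale_after)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = should_auto_update(now - timedelta(minutes=20), now)
fresh = should_auto_update(now - timedelta(minutes=5), now)
```

With the default of 15m, an import that is 20 minutes old triggers a refresh and one that is 5 minutes old does not.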
#Hybrid mode
Keep normal Discord credentials configured and set share.remote:
discrawl sync --update=auto # import snapshot first, then live deltas
discrawl messages --sync # blocking pre-query sync for matched scope
discrawl sync --all-channels # broader live repair
discrawl sync --full # historical backfill
#What is published
- non-DM archive tables (DM @me rows are always excluded)
- README activity block: latest update time, latest archived message, archive totals, day/week/month activity
- embedding_jobs is never exported
#Backing up vectors
discrawl publish --with-embeddings --push
discrawl subscribe --with-embeddings https://github.com/example/discord-archive.git
discrawl update --with-embeddings
Embeddings are stored under embeddings/<provider>/<model>/<input_version>/.... Import restores only bundles with a matching identity, so Ollama/nomic subscribers do not accidentally pick up OpenAI/text-embedding vectors. Publishing without --with-embeddings omits embedding manifests rather than carrying forward an older bundle.
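The identity matching can be pictured with a short Python sketch. It assumes an identity is exactly the (provider, model, input_version) triple from the storage path. The type, the function name, and the example model and version strings are hypothetical and not taken from discrawl.

```python
from typing import NamedTuple


class EmbeddingIdentity(NamedTuple):
    provider: str
    model: str
    input_version: str

    def path(self):
        # Mirrors the embeddings/<provider>/<model>/<input_version>/ layout.
        return f"embeddings/{self.provider}/{self.model}/{self.input_version}/"


def bundles_to_import(published, local):
    """Keep only published bundles whose identity equals the local one."""
    return [bundle for bundle in published if bundle == local]


# Illustrative values only.
local = EmbeddingIdentity("ollama", "nomic-embed-text", "v1")
published = [
    EmbeddingIdentity("openai", "text-embedding-3-small", "v1"),
    EmbeddingIdentity("ollama", "nomic-embed-text", "v1"),
]
selected = bundles_to_import(published, local)
```

Comparing the whole triple matters because vectors from different models, or from the same model over differently prepared input, are not comparable in a single index.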
#CI
The Docker smoke test installs discrawl in a clean Go container, subscribes to a Git snapshot repo, then checks search, messages, sql, and report:
DISCRAWL_DOCKER_TEST=1 go test ./internal/cli -run TestDockerGitSourceSmoke -count=1
The backup workflows restore and save .discrawl-ci/discrawl.db with actions/cache. On a warm runner cache, scheduled publishers skip the pre-sync snapshot import and go straight to the live latest-message delta before publishing. Cache misses still import the latest published snapshot first so --latest-only has channel cursors to resume from.
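The cache-dependent ordering described above can be summarized in a small Python sketch. The step names are hypothetical labels for this illustration, not discrawl commands or workflow step IDs.

```python
def plan_publish_steps(cache_hit):
    """Return the ordered steps a scheduled publisher would run."""
    steps = []
    if not cache_hit:
        # Cold cache: import the latest published snapshot first so the
        # latest-message sync has channel cursors to resume from.
        steps.append("import_snapshot")
    steps.append("sync_latest_delta")
    steps.append("publish")
    return steps


warm_plan = plan_publish_steps(cache_hit=True)
cold_plan = plan_publish_steps(cache_hit=False)
```

The only difference between the two plans is the initial snapshot import, which is the step a warm runner cache makes unnecessary.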