Configuration

discrawl init writes a complete config, so most users do not need to hand-edit anything at first. This page documents the full shape and the override rules for when you do.

#Default paths

Discrawl follows the normal per-OS storage convention instead of writing a new top-level directory in your home folder.

On Linux and other Unix desktops, Discrawl uses XDG Base Directory paths. In practice, that means:

config: ${XDG_CONFIG_HOME:-~/.config}/discrawl/config.toml
database: ${XDG_DATA_HOME:-~/.local/share}/discrawl/discrawl.db
Git share repo: ${XDG_DATA_HOME:-~/.local/share}/discrawl/share
cache: ${XDG_CACHE_HOME:-~/.cache}/discrawl/
logs: ${XDG_STATE_HOME:-~/.local/state}/discrawl/logs/

On macOS, Discrawl uses the platform's ~/Library locations:

config: ~/Library/Application Support/discrawl/config.toml
database: ~/Library/Application Support/discrawl/discrawl.db
Git share repo: ~/Library/Application Support/discrawl/share
cache: ~/Library/Caches/discrawl/
logs: ~/Library/Application Support/discrawl/logs/

If you set XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, or XDG_STATE_HOME, those variables choose the new default locations on any OS.
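The rules above can be sketched as a small lookup. This is a minimal illustration, not Discrawl's actual implementation: the function name default_paths and its parameters are hypothetical, it assumes an XDG override keeps the same discrawl/... sub-layout on macOS, and it does not model Windows, which this page does not cover.

```python
import os
import sys
from pathlib import Path


def default_paths(env=None, platform=None, home=None):
    """Return the default Discrawl locations described above (illustrative only)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    if platform == "darwin":
        # macOS: ~/Library locations.
        app_support = home / "Library" / "Application Support"
        fallback = {
            "XDG_CONFIG_HOME": app_support,
            "XDG_DATA_HOME": app_support,
            "XDG_CACHE_HOME": home / "Library" / "Caches",
            "XDG_STATE_HOME": app_support,
        }
    else:
        # Linux and other Unix desktops: XDG Base Directory defaults.
        fallback = {
            "XDG_CONFIG_HOME": home / ".config",
            "XDG_DATA_HOME": home / ".local" / "share",
            "XDG_CACHE_HOME": home / ".cache",
            "XDG_STATE_HOME": home / ".local" / "state",
        }

    def base(var):
        # An explicitly set XDG variable chooses the location on any OS.
        value = env.get(var)
        return Path(value) if value else fallback[var]

    return {
        "config": base("XDG_CONFIG_HOME") / "discrawl" / "config.toml",
        "database": base("XDG_DATA_HOME") / "discrawl" / "discrawl.db",
        "share": base("XDG_DATA_HOME") / "discrawl" / "share",
        "cache": base("XDG_CACHE_HOME") / "discrawl",
        "logs": base("XDG_STATE_HOME") / "discrawl" / "logs",
    }
```

Passing env, platform, and home explicitly makes it easy to check what a given machine would resolve to without touching the real environment.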

Upgrades do not move your database automatically, and existing installs keep working until the new locations take over. If ~/.discrawl/config.toml exists and the new default config file does not, Discrawl keeps loading the legacy config. Runtime paths that the config leaves unset likewise keep using existing legacy files or directories under ~/.discrawl while the new location does not exist yet. This avoids breaking users whose desktop already sets XDG variables globally.

To migrate deliberately, copy or create the new config file first, or point Discrawl at it with --config / DISCRAWL_CONFIG. Runtime paths switch one-by-one: once the new database, cache, logs, or share path exists, that path wins over the legacy ~/.discrawl counterpart. Copy the SQLite database before creating the new database path if you want to preserve the existing archive.
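The per-path switch described above can be sketched as follows. The helper name effective_path is hypothetical, and the final fallback (use the new location when neither path exists) is an assumption; the page only states which path wins when at least one of them exists.

```python
from pathlib import Path


def effective_path(new_path, legacy_path):
    """Pick between a new default location and its legacy ~/.discrawl counterpart."""
    new_path, legacy_path = Path(new_path), Path(legacy_path)
    if new_path.exists():
        # Once the new path exists, it wins over the legacy one.
        return new_path
    if legacy_path.exists():
        # Otherwise an existing legacy file or directory keeps being used.
        return legacy_path
    # Assumption: with neither present, a fresh install uses the new location.
    return new_path
```

This is why the order of operations matters when migrating: if an empty database is created at the new path before the old SQLite file is copied over, the new (empty) path already wins.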

#File layout

version = 1
default_guild_id = ""
guild_ids = []
db_path = "~/.local/share/discrawl/discrawl.db" # macOS: "~/Library/Application Support/discrawl/discrawl.db"
cache_dir = "~/.cache/discrawl" # macOS: "~/Library/Caches/discrawl"
log_dir = "~/.local/state/discrawl/logs" # macOS: "~/Library/Application Support/discrawl/logs"

[discord]
token_source = "env" # use "none" for Git-only read access
token_env = "DISCORD_BOT_TOKEN"
token_keyring_service = "discrawl"
token_keyring_account = "discord_bot_token"

[sync]
source = "both" # "discord" for bot-only sync, "wiretap" for desktop-cache-only import
concurrency = 16
repair_every = "6h"
full_history = true
attachment_text = true
attachment_media = false
max_attachment_bytes = 104857600 # 100 MiB

[desktop]
path = "~/.config/discord" # macOS default: "~/Library/Application Support/discord"
max_file_bytes = 67108864 # 64 MiB
full_cache = false

[search]
default_mode = "fts"

[search.embeddings]
enabled = false
provider = "openai"
model = "text-embedding-3-small"
api_key_env = "OPENAI_API_KEY"
batch_size = 64
max_input_chars = 12000
request_timeout = "2m"
vector_backend = "exact"

[share]
remote = ""
repo_path = "~/.local/share/discrawl/share" # macOS: "~/Library/Application Support/discrawl/share"
branch = "main"
auto_update = true
stale_after = "15m"
media = true

[share.filter]
public_only = false
include_channel_ids = []
exclude_channel_ids = []

[remote]
mode = "local" # use "cloud" for Worker-fronted remote archives
endpoint = ""
archive = ""
token_env = "DISCRAWL_REMOTE_TOKEN"
stale_after = ""

concurrency is auto-sized at init to min(32, max(8, GOMAXPROCS*2)).
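As a worked example of that formula, here is a short sketch. The function name auto_concurrency is hypothetical, and the argument stands in for the Go runtime's GOMAXPROCS value at init time.

```python
def auto_concurrency(gomaxprocs: int) -> int:
    """Clamp twice the processor count to the range 8..32."""
    return min(32, max(8, gomaxprocs * 2))


# A few representative machines:
for procs in (2, 8, 32):
    print(procs, auto_concurrency(procs))
# prints:
# 2 8
# 8 16
# 32 32
```

The concurrency = 16 shown in the layout above is what this formula yields when GOMAXPROCS is 8.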

#Token resolution

In order:

DISCORD_BOT_TOKEN, or the env var named in discord.token_env
OS keyring item discrawl / discord_bot_token, or the configured keyring service/account

discrawl accepts either the raw token text or a value prefixed with "Bot " (the word Bot followed by a space). Normalization is automatic.

Set discord.token_source = "keyring" if you want to require keyring lookup and skip env entirely. Set it to "none" for a Git-only reader.
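The lookup order can be sketched like this. It is an illustration under stated assumptions, not Discrawl's code: resolve_token, normalize_token, and the keyring_lookup callable are hypothetical names, normalization is assumed to mean stripping whitespace and an optional leading "Bot " prefix, and any token_source other than "none" or "keyring" is assumed to try the env var first and then the keyring.

```python
def normalize_token(value):
    """Assumed normalization: trim whitespace and drop an optional "Bot " prefix."""
    value = value.strip()
    if value.startswith("Bot "):
        value = value[len("Bot "):].strip()
    return value


def resolve_token(token_source, env, keyring_lookup,
                  token_env="DISCORD_BOT_TOKEN",
                  service="discrawl", account="discord_bot_token"):
    """Return the bot token, or None when no token is available or wanted."""
    if token_source == "none":
        # Git-only reader: no live Discord access.
        return None
    if token_source != "keyring":
        # Step 1: the env var named in discord.token_env.
        raw = env.get(token_env, "")
        if raw.strip():
            return normalize_token(raw)
    # Step 2: the OS keyring item (keyring_lookup is a stand-in callable).
    raw = keyring_lookup(service, account)
    return normalize_token(raw) if raw else None
```

For example, resolve_token("env", {"DISCORD_BOT_TOKEN": "Bot abc"}, lambda s, a: None) returns "abc", while the same call with token_source "keyring" ignores the env var and returns None.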

#Override rules

--config <path> beats everything
DISCRAWL_CONFIG=<path> overrides the default config path
discord.token_source = "none" disables live Discord access for Git-only readers
discord.token_source = "keyring" skips env lookup
remote.mode = "cloud" makes status --json and remote ... read the configured Worker archive without opening SQLite
DISCRAWL_NO_AUTO_UPDATE=1 disables Git snapshot auto-update for read commands in one process
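The config-path precedence in the first two rules can be sketched as below. The function name config_path and its parameters are hypothetical; only the ordering (flag, then DISCRAWL_CONFIG, then the default path) comes from this page.

```python
def config_path(flag_value, env, default_path):
    """Pick the config file: --config, then DISCRAWL_CONFIG, then the default."""
    if flag_value:
        # --config <path> beats everything.
        return flag_value
    from_env = env.get("DISCRAWL_CONFIG")
    if from_env:
        # DISCRAWL_CONFIG overrides the default config path.
        return from_env
    return default_path
```

With both set, config_path("/a.toml", {"DISCRAWL_CONFIG": "/b.toml"}, "/c.toml") returns "/a.toml".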

#Notes

default_guild_id is the implicit scope for sync, tail, digest, and analytics when --guild is not passed
guild_ids is reserved for explicit multi-guild fan-out; usually you do not set this directly
changing [search.embeddings] provider/model/input version retargets pending jobs and resets prior attempts; existing vectors for another identity remain in SQLite but are not used for semantic search
[search.embeddings].vector_backend accepts exact or the optional turbovec backend; turbovec requires Python plus the turbovec package and embedding dimensions divisible by 8
changing db_path does not migrate existing data; copy the file yourself if you want to keep history
sync.attachment_media = true makes sync behave like sync --with-media; media bytes are cached under cache_dir/media, and CDN 404/other fetch failures are recorded on attachment rows
share.media = false makes publish/update/auto-update omit or skip restoring cached media; subscribe --no-media writes this for Git-only readers. With the default share.media = true, publish/update include cached non-DM media as gzip-compressed snapshot files, but publish does not fetch missing Discord files by itself.
[share.filter] narrows only publish output; sync can still keep a richer local archive
share.filter.public_only exports only channels visible to the guild @everyone role after category/channel permission overwrites; private threads are excluded

share.filter.include_channel_ids and share.filter.exclude_channel_ids accept Discord channel ids; exclusions win, and including a forum parent also includes its allowed public threads

filtered publishes cannot write generated README reports, and remove older generated Discrawl share READMEs before committing
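
The [share.filter] rules in the notes above can be sketched as a single predicate. This is an illustration only: channel_is_published is a hypothetical name, the is_public flag stands in for the @everyone visibility check, an empty include list is assumed to mean "no include restriction", and the forum-parent thread rule is not modeled.

```python
def channel_is_published(channel_id, include_ids, exclude_ids,
                         is_public=True, public_only=False):
    """Decide whether a channel would appear in filtered publish output."""
    if channel_id in exclude_ids:
        # Exclusions win over everything else.
        return False
    if public_only and not is_public:
        # public_only drops channels that @everyone cannot see.
        return False
    if include_ids and channel_id not in include_ids:
        # Assumption: an empty include list does not restrict anything.
        return False
    return True
```

A channel listed in both include_channel_ids and exclude_channel_ids is therefore left out of the published snapshot, while sync still keeps it in the local archive.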