Discord History Exporter Spec
A local read-only bot-token exporter is the right design. It should use the existing Discord bot credentials already configured for OpenClaw, enumerate channels and threads in guild 1482433153858535656, page message history directly from the Discord API, checkpoint progress, and render one deterministic markdown transcript. Do not use browser scraping, a user token, or an LLM-heavy extraction loop.
Key decisions:
- Build a local script/tool, not a manual `message.read` workflow.
- Use the bot identity already authorized in the server, not Pete's user session.
- Store normalized raw JSON first, then render markdown, so the export is resumable and auditable.
- Handle threads explicitly, including archived threads where the bot can access them.
- Honor rate limits and add jittered pauses to keep the footprint conservative.
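The last decision above (honor rate limits, add jittered pauses) could be sketched roughly as follows. The function name `pause_seconds` and its default delays are illustrative assumptions, not part of any existing script. The header names are the ones Discord documents for rate limiting, and the sketch assumes the caller passes headers with exactly this casing.

```python
import random


def pause_seconds(status, headers, base_delay=0.5, jitter=0.25, rng=random.random):
    """Return how many seconds to sleep before the next request.

    Hypothetical helper: base_delay and jitter are assumed defaults.
    """
    # rng() returns a float in [0, 1); the jitter keeps requests from
    # landing at a perfectly regular interval.
    extra = rng() * jitter
    if status == 429:
        # On a 429, Discord reports the wait in the Retry-After header (seconds).
        return float(headers.get("Retry-After", 1.0)) + extra
    if headers.get("X-RateLimit-Remaining") == "0":
        # The bucket is exhausted: wait for the reset window before continuing.
        return float(headers.get("X-RateLimit-Reset-After", base_delay)) + extra
    # Normal case: a small conservative pause between requests.
    return base_delay + extra
```

A caller would sleep for the returned value after every response and retry the same request when the status was 429.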
1. Problem Statement and Goal
Pete wants a shareable export of the last 21 days of conversation across the Vinny Discord server, including threads, with no redaction. The existing message tool path works for spot reads, but it is a bad fit for a truthful full-server export because it requires lots of manual pagination, is easy to interrupt, and makes it hard to certify completeness.
The goal is a local scripted exporter that can reliably generate a complete, read-only markdown transcript of a configurable time window from a Discord guild the bot already has access to.
2. Success Metric
This is successful when all of the following are true:
- A single command can export the last `N` days of Discord history for guild `1482433153858535656` into a markdown file.
- The export includes:
  - all accessible text channels
  - all accessible threads, including archived threads when exposed by the API
  - timestamps, author names, message content, and attachment URLs
- The run is resumable after interruption.
- Pete can verify coverage by spot-checking at least three high-volume surfaces, including `#vinny`, `#consulting`, and one thread.
- The script uses no user token and no browser automation.
3. Current State
Today:
- OpenClaw already has Discord bot access configured. The OpenClaw Discord docs describe the bot-token setup and guild access model.
- The server/guild id is `1482433153858535656`.
- The current manual export attempt created a partial artifact at `artifacts/vinny_discord_server_export_last_21_days.md`, but it is materially incomplete.
- Main channels confirmed in scope include:
  - `#job-search`
  - `#consulting`
  - `#openclaw-infra`
  - `#openclaw-releases`
  - `#job-alerts`
  - `#x-news`
  - `#uas-news`
  - `#vinny`
  - `#david`
  - `#cron-status`
  - `#lantronix`
  - `#linkedin-engagement`
  - `#ai-ventures`
  - `#content-ideas`
- Threads exist and matter. Examples already observed include `LI senior roles`, `jobs`, `goals`, `Pete and Vinny overview`, `Chrome remote debugging relay`, `Job search skill improvements`, and `Techmeme / Tech News AI Experiments`.
What is missing:
- A deterministic extraction path
- Resumability/checkpointing
- Full pagination across high-volume channels
- Clean thread handling
- A reliable markdown renderer
4. Platform Capabilities
OpenClaw supports Discord as a bot-connected channel and supports reading messages and enumerating threads through the message tool, documented in OpenClaw Discord docs. That is useful for interactive tasks, but it is not the ideal substrate for a full export utility.
Discord itself supports the needed primitives through the API:
- channel/message history endpoints in the Discord Channels resource docs
- thread behavior and archived thread handling in the Discord Threads docs
- paginated history constraints, including the 100-message page limit, reflected in discord.py API guidance
Native OpenClaw is enough to confirm access and test spot reads, but a full export requires a custom script/tool layer.
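As a rough illustration of the paging primitive, the sketch below walks one channel backward using the `before` cursor and the 100-message page limit. `fetch_page` is a hypothetical callable, assumed to wrap `GET /channels/{channel.id}/messages` and return the decoded message list, newest first. `cutoff_id` is an assumed lower bound expressed as a message id.

```python
from typing import Callable, Iterator

PAGE_LIMIT = 100  # maximum number of messages per history request


def iter_history(fetch_page: Callable[[dict], list], cutoff_id: int) -> Iterator[dict]:
    """Yield messages with id > cutoff_id, newest first.

    fetch_page(params) is a hypothetical wrapper around the channel
    messages endpoint; it must return a list of message dicts.
    """
    before = None
    while True:
        params = {"limit": PAGE_LIMIT}
        if before is not None:
            params["before"] = before
        page = fetch_page(params)
        if not page:
            return  # nothing older exists
        for msg in page:
            if int(msg["id"]) <= cutoff_id:
                return  # reached the start of the export window
            yield msg
        if len(page) < PAGE_LIMIT:
            return  # a short page means the channel is exhausted
        # Continue from the oldest message seen so far.
        before = page[-1]["id"]
```

Injecting `fetch_page` keeps the paging logic testable without network access, and the same loop would serve both channels and threads.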
5. Community Patterns
The strongest community patterns are:
- Bot-token direct export with pagination
  - Common, stable, and aligned with Discord's intended API usage.
  - Matches the general guidance that history exports need iterative pagination because message-history endpoints cap at 100 items per request, as reflected in discord.py docs and community discussion such as r/Discord_Bots export-history thread.
- Third-party exporter tools like DiscordChatExporter
  - Mature and high throughput, with support for guild-wide export and threads, as seen in DiscordChatExporter frontend references.
  - Downsides: more external dependency surface, different formatting assumptions, and less control over the exact artifact output and checkpoint model.
- Browser/UI scraping
  - Fragile and a bad fit for this use case.
  - It is slower, less reliable, harder to resume, and more likely to create weird edge cases than direct API reads.
6. Options
| Option | Approach | Complexity | Token Cost | Reliability | Maintenance | Notes |
|---|---|---|---|---|---|---|
| A | Keep using OpenClaw message.read manually | Low code, high operator burden | Low | Low | Low code, high pain | Good for spot checks, bad for full exports |
| B | Build local Python bot-token exporter | Medium | Near-zero | High | Medium | Best balance of control, safety, and completeness |
| C | Adopt third-party exporter binary/tool | Low-medium | Near-zero | Medium-high | Medium-high | Faster to start, less tailored, more dependency risk |
Option A: Manual message.read export
Pros:
- No new code
- Uses existing access path
Cons:
- Laborious pagination
- Easy to miss channels/threads
- Hard to resume cleanly
- Hard to guarantee completeness
- Chat context gets noisy and brittle
Option B: Local Python bot-token exporter
Pros:
- Deterministic
- Read-only
- Resumable
- Easy to shape output exactly how Pete wants it
- Low model usage
- Best control over pacing, checkpointing, and rendering
Cons:
- Needs implementation work
- Must carefully handle Discord API pagination and rate limits
Option C: Third-party exporter
Pros:
- Fastest path to raw extraction
- Often supports guild-wide export and threads already
Cons:
- Adds trust and dependency surface
- May not match desired markdown structure
- Harder to integrate with OpenClaw workflow and local artifact conventions
7. Recommendation
Build Option B: a local Python bot-token exporter.
Why:
- It uses the bot identity already authorized for this server.
- It avoids browser scraping and user-token risk.
- It minimizes LLM involvement.
- It gives full control over output format, checkpointing, and pacing.
- It is the cleanest long-term tool for repeated export requests, not just this one job.
Opinionated call: do not build this as an LLM-driven workflow, and do not keep trying to brute-force it with message.read pages in chat.
8. Security Considerations
Access model
- Use the existing Discord bot token already configured for OpenClaw.
- Do not use Pete's user token.
- Do not scrape via the browser.
Data exposure
- Output is intentionally unredacted for this request, so the script must make that explicit in the header.
- Because the export may contain confidential or internal content, default output location should remain inside the workspace `artifacts/` directory.
Failure modes
- Partial export if interrupted mid-run
- Missing archived/private threads if the bot lacks access
- Duplicates if pagination/checkpoint logic is wrong
- Large markdown output becoming unwieldy
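To make the archived-thread failure mode visible rather than silent, archived-thread enumeration could look something like the sketch below. `get` is a hypothetical helper, assumed to return a `(status, json_body)` tuple for a Discord API path. The path shape and the `threads`, `has_more`, and `thread_metadata.archive_timestamp` fields follow the Discord API, and active threads would come from a separate guild-level call.

```python
def list_archived_threads(get, channel_id, kind="public"):
    """Return (threads, gap) for one parent channel.

    get(path, params) is a hypothetical HTTP helper returning (status, body).
    gap is None unless the bot was refused access, in which case it is a
    small record that the run summary can report.
    """
    threads, before = [], None
    while True:
        params = {}
        if before:
            params["before"] = before
        status, body = get(f"/channels/{channel_id}/threads/archived/{kind}", params)
        if status in (403, 404):
            # Record the inaccessible surface instead of dropping it silently.
            return threads, {"channel_id": channel_id, "kind": kind, "status": status}
        batch = body.get("threads", [])
        threads.extend(batch)
        if not body.get("has_more") or not batch:
            return threads, None
        # Page backward from the archive time of the oldest thread in the batch.
        before = batch[-1]["thread_metadata"]["archive_timestamp"]
```

Calling this once with `kind="public"` and once with `kind="private"` per parent channel would yield both the thread list and any gaps to surface in the coverage summary.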
Mitigations
- Checkpoint per channel/thread
- Store normalized JSON as intermediate data
- Idempotent reruns
- Coverage summary at end of run
- Rate-limit compliance with conservative pacing and Retry-After handling
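The checkpoint-per-surface mitigation might be implemented along these lines. The checkpoint layout (`surfaces`, `oldest_seen_id`, `done`) and the function names are assumptions made for illustration; the spec does not fix a format.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Load checkpoint state, or start fresh if no file exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"surfaces": {}}


def mark_progress(state, surface_id, oldest_seen_id, done=False):
    """Record how far back a channel or thread has been fetched."""
    state["surfaces"][surface_id] = {"oldest_seen_id": oldest_seen_id, "done": done}


def save_checkpoint(path, state):
    """Write the checkpoint atomically so an interrupt cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)
    # os.replace swaps the file in one step on the same filesystem.
    os.replace(tmp, path)
```

On resume, a surface marked `done` would be skipped and an unfinished one would continue from `oldest_seen_id`. Deduplicating on message id when appending to the JSONL file would keep reruns idempotent.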
9. Implementation Scope
David should build the following:
New files
- `/Users/vinny/.openclaw/workspace/scripts/discord-export-history.py`
- `/Users/vinny/.openclaw/workspace/scripts/discord-export-history.md` or a short usage reference in `references/`
Output locations
- Markdown artifact: `/Users/vinny/.openclaw/workspace/artifacts/vinny_discord_server_export_last_21_days.md`
- Optional raw/intermediate data:
  - `/Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/messages.jsonl`
  - `/Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/checkpoint.json`
Script responsibilities
- Read bot token from existing local OpenClaw config or environment reference.
- Enumerate accessible guild channels for a provided guild id.
- Enumerate threads, including archived threads where exposed.
- Fetch message history per surface with pagination.
- Filter to the last `--days N` window.
- Normalize messages into a stable intermediate schema.
- Render markdown grouped by channel and nested thread sections.
- Write a run summary with counts per channel/thread and any gaps.
- Support resume via checkpoint file.
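For the `--days N` window, one option is to convert the cutoff time into a snowflake id, because Discord ids embed a millisecond timestamp relative to the Discord epoch (2015-01-01 UTC). The helper names below are illustrative assumptions. The bit layout (timestamp in the bits above bit 22) is the documented snowflake format.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def cutoff_snowflake(days, now=None):
    """Return the smallest snowflake id that falls inside the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    # The timestamp occupies the bits above the low 22 bits of a snowflake.
    return (cutoff_ms - DISCORD_EPOCH_MS) << 22


def snowflake_time(snowflake):
    """Recover the UTC creation time encoded in a snowflake id."""
    ms = (int(snowflake) >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

Comparing message ids against the cutoff id avoids parsing a timestamp for every message while paging, and the result can also serve as a stop condition for backward pagination.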
Command-line interface
Proposed CLI:
python3 scripts/discord-export-history.py \
--guild-id 1482433153858535656 \
--days 21 \
--include-threads \
--output artifacts/vinny_discord_server_export_last_21_days.md \
--json-out artifacts/tmp/discord-export/latest/messages.jsonl \
--checkpoint artifacts/tmp/discord-export/latest/checkpoint.json
Message normalization schema
Each normalized message should include:
- `message_id`
- `timestamp_utc`
- `timestamp_local`
- `author_name`
- `author_id`
- `channel_name`
- `channel_id`
- `thread_name` if applicable
- `thread_id` if applicable
- `content`
- `attachments[]`
- `embeds[]` simplified
- `reply_to_message_id` if present
- `reactions[]` optional
- `jump_url` if derivable
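A minimal sketch of mapping a raw Discord message object onto this schema is shown below. The function name, the shape of the `channel` and `thread` arguments (dicts with `id` and `name`), and the choice of which embed fields count as "simplified" are assumptions. The raw field names (`author`, `attachments`, `message_reference`, `reactions`) follow the Discord message object.

```python
from datetime import datetime, timezone


def normalize_message(raw, guild_id, channel, thread=None):
    """Map a raw Discord message dict onto the normalized schema.

    channel and thread are assumed to be dicts with "id" and "name";
    thread is None for messages posted directly in the channel.
    """
    author = raw.get("author") or {}
    # Discord timestamps are ISO 8601 strings with a UTC offset.
    ts = datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc)
    surface_id = thread["id"] if thread else channel["id"]
    ref = raw.get("message_reference") or {}
    return {
        "message_id": raw["id"],
        "timestamp_utc": ts.isoformat(),
        "timestamp_local": ts.astimezone().isoformat(),  # machine-local zone
        "author_name": author.get("global_name") or author.get("username"),
        "author_id": author.get("id"),
        "channel_name": channel["name"],
        "channel_id": channel["id"],
        "thread_name": thread["name"] if thread else None,
        "thread_id": thread["id"] if thread else None,
        "content": raw.get("content", ""),
        "attachments": [a["url"] for a in raw.get("attachments", [])],
        "embeds": [{"title": e.get("title"), "url": e.get("url")}
                   for e in raw.get("embeds", [])],
        "reply_to_message_id": ref.get("message_id"),
        "reactions": [{"emoji": r["emoji"].get("name"), "count": r["count"]}
                      for r in raw.get("reactions", [])],
        "jump_url": f"https://discord.com/channels/{guild_id}/{surface_id}/{raw['id']}",
    }
```

Writing one such dict per line to `messages.jsonl` would give the renderer a stable input that does not depend on the raw API payload.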
10. Validation Criteria
David should verify all of the following before calling it done:
- Smoke test
  - Export 1 day from `#cron-status` and verify readable markdown output.
- High-volume test
  - Export 7 days from `#consulting` and verify multi-page pagination works.
- Thread test
  - Export one known thread and verify thread messages appear under the correct parent section.
- Resume test
  - Interrupt a run midway, rerun with the same checkpoint, and verify it resumes instead of restarting from scratch.
- Coverage summary test
  - End-of-run summary reports channels processed, threads processed, message counts, and any inaccessible surfaces.
- Read-only test
  - No write/mutation calls to Discord are made.
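The thread test above depends on the renderer nesting thread messages under their parent channel. One possible deterministic renderer over normalized messages is sketched here. The heading levels and line format are assumptions about the desired artifact layout, not a settled format.

```python
from collections import defaultdict


def render_markdown(messages):
    """Render normalized messages grouped by channel, with threads nested."""
    by_channel = defaultdict(lambda: defaultdict(list))
    for m in messages:
        by_channel[m["channel_name"]][m.get("thread_name")].append(m)

    lines = []
    for channel in sorted(by_channel):
        lines.append(f"## #{channel}")
        surfaces = by_channel[channel]
        # Channel-level messages (thread name None) come first, then threads by name.
        for thread in sorted(surfaces, key=lambda t: (t is not None, t or "")):
            if thread is not None:
                lines.append(f"### Thread: {thread}")
            # Snowflake ids increase over time, so sorting by id is chronological.
            for m in sorted(surfaces[thread], key=lambda x: int(x["message_id"])):
                lines.append(f"- [{m['timestamp_utc']}] {m['author_name']}: {m['content']}")
                for url in m.get("attachments", []):
                    lines.append(f"  - attachment: {url}")
    return "\n".join(lines) + "\n"
```

Because every grouping and ordering step is a sort on stable keys, the same input set produces byte-identical output regardless of fetch order, which makes reruns easy to diff.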
11. Category
Tool
This is a new local CLI/data-export capability, not a behavioral tweak or a skill.
12. Context Loading
Load these files when building or maintaining the exporter:
- Always load:
  - `/Users/vinny/.openclaw/workspace/AGENTS.md`
  - `/Users/vinny/.openclaw/workspace/ARTIFACT-GUIDE.md`
  - `/Users/vinny/.openclaw/workspace/references/spec-template.md`
- For Discord/OpenClaw context:
  - `/opt/homebrew/lib/node_modules/openclaw/docs/channels/discord.md`
- For environment-specific values:
  - `/Users/vinny/.openclaw/workspace/TOOLS.md`
- Only if token/config resolution details are needed:
  - `~/.openclaw/openclaw.json`
Do not load broad memory files for this tool unless the task changes from implementation to user-context interpretation.
13. Guardrails
- Do not use Pete's Discord user token.
- Do not use browser scraping as the primary extraction path.
- Do not mutate Discord state in any way.
- Do not silently omit inaccessible channels or threads. Report them.
- Do not claim completeness unless the coverage summary supports it.
- Do not hardcode secrets into the script or artifact.
- Do not depend on LLM summarization for extraction or correctness.
- Do not flatten thread messages into main channel flow without labeling them.
- Do not make redaction decisions automatically for this tool. Redaction is a separate post-process.
14. Handoff
David's final handoff should include:
- The implemented script path
- Exact command used to run it
- The generated markdown artifact path
- A short verification summary:
  - channels exported
  - threads exported
  - message count
  - any gaps or inaccessible surfaces
- A plain-English note to Pete on whether the export is complete enough to share as-is
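The verification summary in the handoff could be derived mechanically from the normalized messages plus the list of recorded gaps, along the lines of the sketch below. The function name and the output keys are illustrative assumptions.

```python
from collections import Counter


def coverage_summary(messages, gaps):
    """Summarize an export run from normalized messages and recorded gaps."""
    per_channel = Counter()
    per_thread = Counter()
    for m in messages:
        per_channel[m["channel_name"]] += 1
        if m.get("thread_id"):
            per_thread[m["thread_id"]] += 1
    return {
        "channels_exported": len(per_channel),
        "threads_exported": len(per_thread),
        "message_count": sum(per_channel.values()),
        "gaps": list(gaps),
        # Only reports that no gaps were recorded; it does not prove completeness.
        "no_known_gaps": not gaps,
    }
```

Note that this counts only channels and threads that produced at least one message in the window, so surfaces that were processed but empty would need to be tracked separately.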
Preferred delivery format:
- primary: artifact link using the Mission Control Tailscale URL
- secondary: short Discord reply summarizing coverage and any gaps