Discord History Exporter Spec

Other · Draft · Created Apr 16, 2026

A local read-only bot-token exporter is the right design. It should use the existing Discord bot credentials already configured for OpenClaw, enumerate channels and threads in guild 1482433153858535656, page message history directly from the Discord API, checkpoint progress, and render one deterministic markdown transcript. Do not use browser scraping, a user token, or an LLM-heavy extraction loop.

Key decisions:

  • Build a local script/tool, not a manual message.read workflow.
  • Use the bot identity already authorized in the server, not Pete's user session.
  • Store normalized raw JSON first, then render markdown, so the export is resumable and auditable.
  • Handle threads explicitly, including archived threads where the bot can access them.
  • Honor rate limits and add jittered pauses to keep the footprint conservative.

1. Problem Statement and Goal

Pete wants a shareable export of the last 21 days of conversation across the Vinny Discord server, including threads, with no redaction. The existing message tool path works for spot reads, but it is a poor fit for a faithful full-server export: it requires extensive manual pagination, is easy to interrupt, and makes completeness hard to certify.

The goal is a local scripted exporter that can reliably generate a complete, read-only markdown transcript of a configurable time window from a Discord guild the bot already has access to.

2. Success Metric

This is successful when all of the following are true:

  • A single command can export the last N days of Discord history for guild 1482433153858535656 into a markdown file.
  • The export includes:
    • all accessible text channels
    • all accessible threads, including archived threads when exposed by the API
    • timestamps, author names, message content, and attachment URLs
  • The run is resumable after interruption.
  • Pete can verify coverage by spot-checking at least three high-volume surfaces, including #vinny, #consulting, and one thread.
  • The script uses no user token and no browser automation.

3. Current State

Today:

  • OpenClaw already has Discord bot access configured. The OpenClaw Discord docs describe the bot-token setup and guild access model.
  • The server/guild id is 1482433153858535656.
  • The current manual export attempt created a partial artifact at artifacts/vinny_discord_server_export_last_21_days.md, but it is materially incomplete.
  • Main channels confirmed in scope include:
    • #job-search
    • #consulting
    • #openclaw-infra
    • #openclaw-releases
    • #job-alerts
    • #x-news
    • #uas-news
    • #vinny
    • #david
    • #cron-status
    • #lantronix
    • #linkedin-engagement
    • #ai-ventures
    • #content-ideas
  • Threads exist and matter. Examples already observed include LI senior roles, jobs, goals, Pete and Vinny overview, Chrome remote debugging relay, Job search skill improvements, and Techmeme / Tech News AI Experiments.

What is missing:

  • A deterministic extraction path
  • Resumability/checkpointing
  • Full pagination across high-volume channels
  • Clean thread handling
  • A reliable markdown renderer

4. Platform Capabilities

OpenClaw supports Discord as a bot-connected channel and supports reading messages and enumerating threads through the message tool, as documented in the OpenClaw Discord docs. That is useful for interactive tasks, but it is not a good substrate for a full export utility.

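Discord itself supports the needed primitives through the API: guild channel listing, thread listing (including archived threads where the bot has access), and paginated message history.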

Native OpenClaw is enough to confirm access and test spot reads, but a full export requires a custom script/tool layer.
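
The thread-listing primitive can be sketched as follows. This is a minimal sketch under stated assumptions, not the exporter itself: `get_json` is a hypothetical injected helper that performs an authenticated GET against the Discord REST API and raises `PermissionError` on a 403. The endpoint paths and the `has_more` and `archive_timestamp` fields follow Discord's thread-listing routes as understood at the time of writing and should be checked against the current API reference before use.

```python
from typing import Callable

# Hypothetical transport: (path, query params) -> parsed JSON body.
GetJson = Callable[[str, dict], dict]


def list_threads(get_json: GetJson, guild_id: str, parent_ids: list[str]):
    """Return (threads, gaps): threads keyed by id, plus surfaces that could not be listed."""
    threads: dict[str, dict] = {}
    gaps: list[str] = []

    # Active threads are listed once per guild.
    for t in get_json(f"/guilds/{guild_id}/threads/active", {}).get("threads", []):
        threads[t["id"]] = t

    # Archived threads are listed per parent channel, public and private separately.
    for parent_id in parent_ids:
        for visibility in ("public", "private"):
            path = f"/channels/{parent_id}/threads/archived/{visibility}"
            before = None
            while True:
                params = {"before": before} if before else {}
                try:
                    page = get_json(path, params)
                except PermissionError:
                    gaps.append(path)  # report the gap, never silently omit it
                    break
                batch = page.get("threads", [])
                for t in batch:
                    threads[t["id"]] = t
                if not batch or not page.get("has_more"):
                    break
                # Page backwards using the archive time of the last thread returned.
                before = batch[-1]["thread_metadata"]["archive_timestamp"]
    return threads, gaps
```

Keying by thread id means a thread that shows up in more than one listing is only recorded once, and the `gaps` list feeds the coverage summary directly.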

5. Community Patterns

The strongest community patterns are:

  1. Bot-token direct export with pagination
     • Common, stable, and aligned with Discord's intended API usage.
     • Matches the general guidance that history exports need iterative pagination because message-history endpoints cap at 100 items per request, as reflected in the discord.py docs and in community discussions such as the r/Discord_Bots export-history thread.
  2. Third-party exporter tools like DiscordChatExporter
     • Mature and high throughput, with support for guild-wide export and threads, as seen in DiscordChatExporter frontend references.
     • Downsides: more external dependency surface, different formatting assumptions, and less control over the exact artifact output and checkpoint model.
  3. Browser/UI scraping
     • Fragile and a bad fit for this use case.
     • It is slower, less reliable, harder to resume, and more likely to create weird edge cases than direct API reads.
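
The pagination pattern from the first item can be sketched like this. It is an illustration under assumptions: `fetch_page` is a hypothetical injected callable standing in for one GET on a channel's message-history endpoint with `before` and `limit` parameters, and `cutoff_id` is an optional lower bound on message ids (ids increase over time).

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (channel_id, before_id, limit) -> list of raw message dicts.
FetchPage = Callable[[str, Optional[str], int], list]


def iter_history(fetch_page: FetchPage, channel_id: str,
                 cutoff_id: Optional[int] = None, page_size: int = 100) -> Iterator[dict]:
    """Yield messages walking backwards in time until history or the cutoff is exhausted."""
    before: Optional[str] = None
    while True:
        page = fetch_page(channel_id, before, page_size)
        if not page:
            return
        for msg in page:
            if cutoff_id is None or int(msg["id"]) >= cutoff_id:
                yield msg
        oldest = min(int(m["id"]) for m in page)
        # A short page means no more history; an id below the cutoff means we left the window.
        if len(page) < page_size or (cutoff_id is not None and oldest < cutoff_id):
            return
        before = str(oldest)
```

Using the smallest id on the page as the next cursor keeps the loop correct regardless of the order in which a page is returned.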

6. Options

| Option | Approach | Complexity | Token Cost | Reliability | Maintenance | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| A | Keep using OpenClaw message.read manually | Low code, high operator burden | Low | Low | Low code, high pain | Good for spot checks, bad for full exports |
| B | Build local Python bot-token exporter | Medium | Near-zero | High | Medium | Best balance of control, safety, and completeness |
| C | Adopt third-party exporter binary/tool | Low-medium | Near-zero | Medium-high | Medium-high | Faster to start, less tailored, more dependency risk |

Option A: Manual message.read export

Pros:

  • No new code
  • Uses existing access path

Cons:

  • Laborious pagination
  • Easy to miss channels/threads
  • Hard to resume cleanly
  • Hard to guarantee completeness
  • Chat context gets noisy and brittle

Option B: Local Python bot-token exporter

Pros:

  • Deterministic
  • Read-only
  • Resumable
  • Easy to shape output exactly how Pete wants it
  • Low model usage
  • Best control over pacing, checkpointing, and rendering

Cons:

  • Needs implementation work
  • Must carefully handle Discord API pagination and rate limits

Option C: Third-party exporter

Pros:

  • Fastest path to raw extraction
  • Often supports guild-wide export and threads already

Cons:

  • Adds trust and dependency surface
  • May not match desired markdown structure
  • Harder to integrate with OpenClaw workflow and local artifact conventions

7. Recommendation

Build Option B: a local Python bot-token exporter.

Why:

  • It uses the bot identity already authorized for this server.
  • It avoids browser scraping and user-token risk.
  • It minimizes LLM involvement.
  • It gives full control over output format, checkpointing, and pacing.
  • It is the cleanest long-term tool for repeated export requests, not just this one job.

Opinionated call: do not build this as an LLM-driven workflow, and do not keep trying to brute-force it with message.read pages in chat.

8. Security Considerations

Access model

  • Use the existing Discord bot token already configured for OpenClaw.
  • Do not use Pete's user token.
  • Do not scrape via the browser.

Data exposure

  • Output is intentionally unredacted for this request, so the script must make that explicit in the header.
  • Because the export may contain confidential or internal content, default output location should remain inside the workspace artifacts/ directory.

Failure modes

  • Partial export if interrupted mid-run
  • Missing archived/private threads if the bot lacks access
  • Duplicates if pagination/checkpoint logic is wrong
  • Large markdown output becoming unwieldy

Mitigations

  • Checkpoint per channel/thread
  • Store normalized JSON as intermediate data
  • Idempotent reruns
  • Coverage summary at end of run
  • Rate-limit compliance with conservative pacing and Retry-After handling
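
The rate-limit mitigation could look roughly like the sketch below. It assumes a hypothetical `send` callable that performs one GET and returns `(status, headers, body)` with headers as a plain dict. The pause and jitter values are placeholders, not tuned numbers.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport: performs one GET and returns (status, headers, body).
Send = Callable[[], Tuple[int, Mapping[str, str], object]]


def get_with_retry(send: Send, max_attempts: int = 5, pause: float = 0.5,
                   jitter: float = 0.25, sleep: Callable[[float], None] = time.sleep):
    """Call send() until it is not rate limited, sleeping for as long as the server asks."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status == 429:
            # Retry-After is given in seconds; fall back to 1s if the header is absent.
            wait = float(headers.get("Retry-After", 1.0))
            sleep(wait + random.uniform(0, jitter))
            continue
        # Conservative pacing after every non-429 response, with jitter.
        sleep(pause + random.uniform(0, jitter))
        return status, body
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Injecting `sleep` keeps the function testable without real delays. A real HTTP client may expose headers case-insensitively, in which case the lookup needs no change.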

9. Implementation Scope

David should build the following:

New files

  • /Users/vinny/.openclaw/workspace/scripts/discord-export-history.py
  • /Users/vinny/.openclaw/workspace/scripts/discord-export-history.md or a short usage reference in references/

Output locations

  • Markdown artifact:
    • /Users/vinny/.openclaw/workspace/artifacts/vinny_discord_server_export_last_21_days.md
  • Optional raw/intermediate data:
    • /Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/messages.jsonl
    • /Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/checkpoint.json

Script responsibilities

  1. Read bot token from existing local OpenClaw config or environment reference.
  2. Enumerate accessible guild channels for a provided guild id.
  3. Enumerate threads, including archived threads where exposed.
  4. Fetch message history per surface with pagination.
  5. Filter to the last --days N window.
  6. Normalize messages into a stable intermediate schema.
  7. Render markdown grouped by channel and nested thread sections.
  8. Write a run summary with counts per channel/thread and any gaps.
  9. Support resume via checkpoint file.
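
Step 9 can be illustrated with a small checkpoint helper. This is a sketch only: the checkpoint layout (a `surfaces` map keyed by channel or thread id, each with `oldest_seen_id` and `done`) is an assumption made for illustration, not a schema the spec defines.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: str) -> dict:
    """Return saved state, or an empty state on a first run."""
    p = Path(path)
    if not p.exists():
        return {"surfaces": {}}
    return json.loads(p.read_text(encoding="utf-8"))


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so an interrupt never leaves a half-written file."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    os.replace(tmp, p)


def mark_progress(state: dict, surface_id: str, oldest_seen_id: str, done: bool) -> None:
    """Record how far back a channel or thread has been read."""
    state["surfaces"][surface_id] = {"oldest_seen_id": oldest_seen_id, "done": done}


def pending(state: dict, surface_ids: list[str]) -> list[str]:
    """Surfaces that still need work on a rerun, in their original order."""
    return [s for s in surface_ids if not state["surfaces"].get(s, {}).get("done")]
```

On resume, the exporter would skip surfaces marked `done` and continue unfinished ones from `oldest_seen_id`. Deduplicating by `message_id` when appending to the JSONL file keeps reruns idempotent.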

Command-line interface

Proposed CLI:

```bash
python3 scripts/discord-export-history.py \
  --guild-id 1482433153858535656 \
  --days 21 \
  --include-threads \
  --output artifacts/vinny_discord_server_export_last_21_days.md \
  --json-out artifacts/tmp/discord-export/latest/messages.jsonl \
  --checkpoint artifacts/tmp/discord-export/latest/checkpoint.json
```
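
A possible argument parser for the proposed flags, together with one way to turn `--days` into a message-id lower bound, is sketched below. Which flags are required is an assumption. The cutoff relies on Discord ids being snowflakes whose upper bits encode milliseconds since 2015-01-01 UTC; `cutoff_snowflake` is a hypothetical helper name.

```python
import argparse
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z, the snowflake epoch


def build_parser() -> argparse.ArgumentParser:
    """Mirror the proposed CLI flags; the required/optional split is an assumption."""
    p = argparse.ArgumentParser(description="Read-only Discord history export")
    p.add_argument("--guild-id", required=True)
    p.add_argument("--days", type=int, required=True, help="export window in days")
    p.add_argument("--include-threads", action="store_true")
    p.add_argument("--output", required=True, help="markdown artifact path")
    p.add_argument("--json-out", help="normalized JSONL path")
    p.add_argument("--checkpoint", help="checkpoint file path")
    return p


def cutoff_snowflake(days: int, now: datetime) -> int:
    """Smallest snowflake id whose timestamp falls inside the last `days` days."""
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    return (cutoff_ms - DISCORD_EPOCH_MS) << 22


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(cutoff_snowflake(args.days, datetime.now(timezone.utc)))
```

Comparing ids against this cutoff lets the exporter stop paging a surface as soon as it leaves the window, without parsing every timestamp.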

Message normalization schema

Each normalized message should include:

  • message_id
  • timestamp_utc
  • timestamp_local
  • author_name
  • author_id
  • channel_name
  • channel_id
  • thread_name if applicable
  • thread_id if applicable
  • content
  • attachments[]
  • embeds[] simplified
  • reply_to_message_id if present
  • reactions[] optional
  • jump_url if derivable
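
The schema above might be produced by a function along these lines. It is a sketch: the raw field names (`author`, `message_reference`, `attachments`, `embeds`, `reactions`) are taken from Discord's message object as commonly documented, the fallback from `global_name` to `username` is an assumption about what "author name" should mean, and `normalize_message` is a hypothetical name.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def normalize_message(raw: dict, guild_id: str, channel_id: str, channel_name: str,
                      thread_id: Optional[str] = None, thread_name: Optional[str] = None,
                      local_tz: Optional[tzinfo] = None) -> dict:
    """Flatten one raw Discord message object into the export schema."""
    sent = datetime.fromisoformat(raw["timestamp"])
    author = raw.get("author") or {}
    # Messages inside a thread live under the thread's own id, not the parent channel's.
    surface_id = thread_id or channel_id
    return {
        "message_id": raw["id"],
        "timestamp_utc": sent.astimezone(timezone.utc).isoformat(),
        # local_tz=None falls back to the machine's local timezone.
        "timestamp_local": sent.astimezone(local_tz).isoformat(),
        "author_name": author.get("global_name") or author.get("username", "unknown"),
        "author_id": author.get("id"),
        "channel_name": channel_name,
        "channel_id": channel_id,
        "thread_name": thread_name,
        "thread_id": thread_id,
        "content": raw.get("content", ""),
        "attachments": [a["url"] for a in raw.get("attachments", [])],
        "embeds": [{"title": e.get("title"), "url": e.get("url"),
                    "description": e.get("description")} for e in raw.get("embeds", [])],
        "reply_to_message_id": (raw.get("message_reference") or {}).get("message_id"),
        "reactions": [{"emoji": (r.get("emoji") or {}).get("name"), "count": r.get("count", 0)}
                      for r in raw.get("reactions", [])],
        "jump_url": f"https://discord.com/channels/{guild_id}/{surface_id}/{raw['id']}",
    }
```

Writing one such dict per line to `messages.jsonl` gives the renderer a stable input that does not depend on the raw API shape.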

10. Validation Criteria

David should verify all of the following before calling it done:

  1. Smoke test
     • Export 1 day from #cron-status and verify readable markdown output.
  2. High-volume test
     • Export 7 days from #consulting and verify multi-page pagination works.
  3. Thread test
     • Export one known thread and verify thread messages appear under the correct parent section.
  4. Resume test
     • Interrupt a run midway, rerun with the same checkpoint, and verify it resumes instead of restarting from scratch.
  5. Coverage summary test
     • End-of-run summary reports channels processed, threads processed, message counts, and any inaccessible surfaces.
  6. Read-only test
     • No write/mutation calls to Discord are made.
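
One way to make the read-only test mechanical is to route every request through a guard that refuses anything but GET and keeps an audit log. The sketch below assumes a hypothetical transport with the signature `send(method, url, **kwargs)`; the names `read_only` and `ReadOnlyViolation` are illustrative.

```python
from typing import Any, Callable

# Hypothetical transport signature: send(method, url, **kwargs) -> response.
Send = Callable[..., Any]


class ReadOnlyViolation(RuntimeError):
    """Raised when the exporter attempts anything other than a GET."""


def read_only(send: Send) -> Send:
    """Wrap a transport so non-GET requests fail before reaching the network."""
    calls: list[tuple[str, str]] = []

    def guarded(method: str, url: str, **kwargs: Any) -> Any:
        if method.upper() != "GET":
            raise ReadOnlyViolation(f"{method} {url} is not allowed in a read-only export")
        calls.append((method.upper(), url))
        return send(method, url, **kwargs)

    guarded.calls = calls  # audit log that can be checked at the end of a run
    return guarded
```

At the end of a run, asserting that every entry in `calls` is a GET gives a concrete artifact for the validation step.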

11. Category

Tool

This is a new local CLI/data-export capability, not a behavioral tweak or a skill.

12. Context Loading

Load these files when building or maintaining the exporter:

  • Always load:
    • /Users/vinny/.openclaw/workspace/AGENTS.md
    • /Users/vinny/.openclaw/workspace/ARTIFACT-GUIDE.md
    • /Users/vinny/.openclaw/workspace/references/spec-template.md
  • For Discord/OpenClaw context:
    • /opt/homebrew/lib/node_modules/openclaw/docs/channels/discord.md
  • For environment-specific values:
    • /Users/vinny/.openclaw/workspace/TOOLS.md
  • Only if token/config resolution details are needed:
    • ~/.openclaw/openclaw.json

Do not load broad memory files for this tool unless the task changes from implementation to user-context interpretation.

13. Guardrails

  • Do not use Pete's Discord user token.
  • Do not use browser scraping as the primary extraction path.
  • Do not mutate Discord state in any way.
  • Do not silently omit inaccessible channels or threads. Report them.
  • Do not claim completeness unless the coverage summary supports it.
  • Do not hardcode secrets into the script or artifact.
  • Do not depend on LLM summarization for extraction or correctness.
  • Do not flatten thread messages into main channel flow without labeling them.
  • Do not make redaction decisions automatically for this tool. Redaction is a separate post-process.
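
The thread-labeling guardrail could be honored by a renderer along these lines. It is a sketch that assumes normalized messages shaped like the schema in section 9. The heading layout and the header wording are illustrative choices, not a format the spec fixes.

```python
def render_markdown(messages: list[dict], title: str = "Discord export") -> str:
    """Render normalized messages grouped by channel, with threads as labeled subsections."""
    lines = [f"# {title}", "", "_Unredacted export. Handle as confidential._", ""]

    # channel name -> thread name (None for the main channel flow) -> messages
    grouped: dict = {}
    for m in messages:
        grouped.setdefault(m["channel_name"], {}).setdefault(m.get("thread_name"), []).append(m)

    for channel in sorted(grouped):
        lines += [f"## #{channel}", ""]
        threads = grouped[channel]
        # Main channel flow first, then threads alphabetically.
        for thread in sorted(threads, key=lambda t: (t is not None, t or "")):
            if thread is not None:
                lines += [f"### Thread: {thread}", ""]
            # Snowflake ids sort chronologically, which keeps output deterministic.
            for m in sorted(threads[thread], key=lambda m: int(m["message_id"])):
                lines.append(f"- [{m['timestamp_utc']}] {m['author_name']}: {m['content']}")
                for url in m.get("attachments", []):
                    lines.append(f"  - attachment: {url}")
            lines.append("")
    return "\n".join(lines)
```

Because every ordering decision is a sort on stable keys, rendering the same JSONL twice yields byte-identical markdown.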

14. Handoff

David's final handoff should include:

  1. The implemented script path
  2. Exact command used to run it
  3. The generated markdown artifact path
  4. A short verification summary:
     • channels exported
     • threads exported
     • message count
     • any gaps or inaccessible surfaces
  5. A plain-English note to Pete on whether the export is complete enough to share as-is
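
The verification summary in item 4 could be computed from the normalized messages, for example as below. This is a sketch: the output keys are illustrative, and `inaccessible` is assumed to be the list of surfaces the run could not read.

```python
from collections import Counter


def coverage_summary(messages: list[dict], inaccessible: list[str]) -> dict:
    """Count messages per channel and per thread, and carry known gaps forward."""
    per_channel: Counter = Counter()
    per_thread: Counter = Counter()
    for m in messages:
        per_channel[m["channel_name"]] += 1
        if m.get("thread_name"):
            per_thread[f"{m['channel_name']} / {m['thread_name']}"] += 1
    return {
        "channels_exported": sorted(per_channel),
        "threads_exported": sorted(per_thread),
        "message_count": len(messages),
        "messages_per_channel": dict(per_channel),
        "messages_per_thread": dict(per_thread),
        "gaps": sorted(inaccessible),
        # Only "no known gaps"; this alone does not prove completeness.
        "no_known_gaps": not inaccessible,
    }
```

The per-channel counts include thread messages under their parent channel, so the channel totals sum to the overall message count.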

Preferred delivery format:

  • primary: artifact link using the Mission Control Tailscale URL
  • secondary: short Discord reply summarizing coverage and any gaps