Discord History Exporter Spec

A local read-only bot-token exporter is the right design. It should use the existing Discord bot credentials already configured for OpenClaw, enumerate channels and threads in guild 1482433153858535656, page through message history directly via the Discord API, checkpoint progress, and render one deterministic markdown transcript. Do not use browser scraping, a user token, or an LLM-heavy extraction loop.

Key decisions:

Build a local script/tool, not a manual message.read workflow.
Use the bot identity already authorized in the server, not Pete's user session.
Store normalized raw JSON first, then render markdown, so the export is resumable and auditable.
Handle threads explicitly, including archived threads where the bot can access them.
Honor rate limits and add jittered pauses to keep the footprint conservative.

1. Problem Statement and Goal

Pete wants a shareable export of the last 21 days of conversation across the Vinny Discord server, including threads, with no redaction. The existing message tool path works for spot reads, but it is a poor fit for a trustworthy full-server export: it requires extensive manual pagination, is easy to interrupt, and makes completeness hard to certify.

The goal is a local scripted exporter that can reliably generate a complete, read-only markdown transcript of a configurable time window from a Discord guild the bot already has access to.

2. Success Metric

This is successful when all of the following are true:

A single command can export the last N days of Discord history for guild 1482433153858535656 into a markdown file.
The export includes:
all accessible text channels
all accessible threads, including archived threads when exposed by the API
timestamps, author names, message content, and attachment URLs
The run is resumable after interruption.
Pete can verify coverage by spot-checking at least three high-volume surfaces, including #vinny, #consulting, and one thread.
The script uses no user token and no browser automation.

3. Current State

Today:

OpenClaw already has Discord bot access configured. The OpenClaw Discord docs describe the bot-token setup and guild access model.
The server/guild id is 1482433153858535656.
The current manual export attempt created a partial artifact at artifacts/vinny_discord_server_export_last_21_days.md, but it is materially incomplete.
Main channels confirmed in scope include:
#job-search
#consulting
#openclaw-infra
#openclaw-releases
#job-alerts
#x-news
#uas-news
#vinny
#david
#cron-status
#lantronix
#linkedin-engagement
#ai-ventures
#content-ideas
Threads exist and matter. Examples already observed include LI senior roles, jobs, goals, Pete and Vinny overview, Chrome remote debugging relay, Job search skill improvements, and Techmeme / Tech News AI Experiments.

What is missing:

A deterministic extraction path
Resumability/checkpointing
Full pagination across high-volume channels
Clean thread handling
A reliable markdown renderer

4. Platform Capabilities

OpenClaw supports Discord as a bot-connected channel and supports reading messages and enumerating threads through the message tool, as documented in the OpenClaw Discord docs. That is useful for interactive tasks, but it is not the right substrate for a full export utility.

Discord itself supports the needed primitives through the API:

channel/message history endpoints in the Discord Channels resource docs
thread behavior and archived thread handling in the Discord Threads docs
paginated history constraints, including the 100-message page limit, reflected in discord.py API guidance

Native OpenClaw is enough to confirm access and test spot reads, but a full export requires a custom script/tool layer.
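The pagination constraint above can be sketched as a backward walk through one channel or thread. This is a minimal sketch, not the final implementation: `fetch_page` is a hypothetical callable standing in for whatever wraps the message-history endpoint, and it is assumed to return message dicts newest-first, each with a string `id`. `stop_before_id` is likewise an assumed cutoff expressed as a message ID.

```python
from typing import Callable, Iterator, Optional

PAGE_LIMIT = 100  # page cap cited in this spec

# Hypothetical signature: fetch_page(channel_id, before, limit) -> list of
# message dicts, newest first. Not a real library call.
FetchPage = Callable[[str, Optional[str], int], list]


def iter_history(channel_id: str, fetch_page: FetchPage,
                 stop_before_id: int) -> Iterator[dict]:
    """Yield messages newest-to-oldest until IDs drop below stop_before_id."""
    before: Optional[str] = None
    while True:
        page = fetch_page(channel_id, before, PAGE_LIMIT)
        if not page:
            return  # no more history on this surface
        for msg in page:
            if int(msg["id"]) < stop_before_id:
                return  # older than the export window
            yield msg
        if len(page) < PAGE_LIMIT:
            return  # short page means the history is exhausted
        before = page[-1]["id"]  # oldest ID on this page is the next cursor
```

Keeping the transport behind a callable means the paging logic can be exercised against a fake history without touching Discord.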

5. Community Patterns

The strongest community patterns are:

Bot-token direct export with pagination

Common, stable, and aligned with Discord's intended API usage.
Matches the general guidance that history exports need iterative pagination because message-history endpoints cap at 100 items per request, as reflected in the discord.py docs and in community discussion such as the r/Discord_Bots export-history thread.

Third-party exporter tools like DiscordChatExporter

Mature and high throughput, with support for guild-wide export and threads, as seen in DiscordChatExporter frontend references.
Downsides: more external dependency surface, different formatting assumptions, and less control over the exact artifact output and checkpoint model.

Browser/UI scraping

Fragile and a bad fit for this use case.
It is slower, less reliable, harder to resume, and more likely to create weird edge cases than direct API reads.

6. Options

Option	Approach	Complexity	Token Cost	Reliability	Maintenance	Notes
A	Keep using OpenClaw `message.read` manually	Low code, high operator burden	Low	Low	Low code, high pain	Good for spot checks, bad for full exports
B	Build local Python bot-token exporter	Medium	Near-zero	High	Medium	Best balance of control, safety, and completeness
C	Adopt third-party exporter binary/tool	Low-medium	Near-zero	Medium-high	Medium-high	Faster to start, less tailored, more dependency risk

Option A: Manual `message.read` export

Pros:

No new code
Uses existing access path

Cons:

Laborious pagination
Easy to miss channels/threads
Hard to resume cleanly
Hard to guarantee completeness
Chat context gets noisy and brittle

Option B: Local Python bot-token exporter

Pros:

Deterministic
Read-only
Resumable
Easy to shape output exactly how Pete wants it
Low model usage
Best control over pacing, checkpointing, and rendering

Cons:

Needs implementation work
Must carefully handle Discord API pagination and rate limits

Option C: Third-party exporter

Pros:

Fastest path to raw extraction
Often supports guild-wide export and threads already

Cons:

Adds trust and dependency surface
May not match desired markdown structure
Harder to integrate with OpenClaw workflow and local artifact conventions

7. Recommendation

Build Option B: a local Python bot-token exporter.

Why:

It uses the bot identity already authorized for this server.
It avoids browser scraping and user-token risk.
It minimizes LLM involvement.
It gives full control over output format, checkpointing, and pacing.
It is the cleanest long-term tool for repeated export requests, not just this one job.

Opinionated call: do not build this as an LLM-driven workflow, and do not keep trying to brute-force it with message.read pages in chat.

8. Security Considerations

Access model

Use the existing Discord bot token already configured for OpenClaw.
Do not use Pete's user token.
Do not scrape via the browser.

Data exposure

Output is intentionally unredacted for this request, so the script must state that explicitly in the header of the markdown artifact.
Because the export may contain confidential or internal content, the default output location should remain inside the workspace artifacts/ directory.

Failure modes

Partial export if interrupted mid-run
Missing archived/private threads if the bot lacks access
Duplicates if pagination/checkpoint logic is wrong
Large markdown output becoming unwieldy

Mitigations

Checkpoint per channel/thread
Store normalized JSON as intermediate data
Idempotent reruns
Coverage summary at end of run
Rate-limit compliance with conservative pacing and Retry-After handling
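The rate-limit mitigation can be sketched as a small retry wrapper. This assumes a hypothetical `send` callable that returns `(status_code, headers, body)`, and assumes a rate-limited response arrives as HTTP 429 with a `Retry-After` header in seconds. The pause values are placeholders, not tuned numbers.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical transport: send() -> (status_code, headers, body)
Send = Callable[[], Tuple[int, dict, object]]


def call_with_backoff(send: Send, max_attempts: int = 5,
                      base_pause: float = 0.5,
                      sleep=time.sleep, rng=random.random):
    """Call send(), honoring Retry-After on 429 and pausing with jitter."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            # Conservative jittered pause after every non-rate-limited call.
            sleep(base_pause + rng() * base_pause)
            return status, body
        # Wait at least as long as the server asked, plus a little jitter.
        retry_after = float(headers.get("Retry-After", 1.0))
        sleep(retry_after + rng() * base_pause)
    raise RuntimeError("rate limited on every attempt")
```

Injecting `sleep` and `rng` keeps the wrapper testable without real waiting.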

9. Implementation Scope

David should build the following:

New files

/Users/vinny/.openclaw/workspace/scripts/discord-export-history.py
/Users/vinny/.openclaw/workspace/scripts/discord-export-history.md or a short usage reference in references/

Output locations

Markdown artifact:
/Users/vinny/.openclaw/workspace/artifacts/vinny_discord_server_export_last_21_days.md
Optional raw/intermediate data:
/Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/messages.jsonl
/Users/vinny/.openclaw/workspace/artifacts/tmp/discord-export/<run-id>/checkpoint.json

Script responsibilities

Read bot token from existing local OpenClaw config or environment reference.
Enumerate accessible guild channels for a provided guild id.
Enumerate threads, including archived threads where exposed.
Fetch message history per surface with pagination.
Filter to the last --days N window.
Normalize messages into a stable intermediate schema.
Render markdown grouped by channel and nested thread sections.
Write a run summary with counts per channel/thread and any gaps.
Support resume via checkpoint file.
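For the `--days N` window, one option is to convert the cutoff time into a message-ID lower bound, since Discord IDs are snowflakes that embed a millisecond timestamp relative to 2015-01-01 UTC in their upper bits. The function names below are illustrative, and filtering on the message `timestamp` field would work as well.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_to_datetime(snowflake: int) -> datetime:
    """Recover the UTC creation time embedded in a Discord ID."""
    ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def cutoff_snowflake(now: datetime, days: int) -> int:
    """Smallest ID that could belong to a message inside the window."""
    cutoff = now - timedelta(days=days)
    ms = int(cutoff.timestamp() * 1000) - DISCORD_EPOCH_MS
    return ms << 22
```

A message whose integer ID is below `cutoff_snowflake(now, days)` was created before the window, so pagination on that surface can stop there.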

Command-line interface

Proposed CLI:

python3 scripts/discord-export-history.py \
  --guild-id 1482433153858535656 \
  --days 21 \
  --include-threads \
  --output artifacts/vinny_discord_server_export_last_21_days.md \
  --json-out artifacts/tmp/discord-export/latest/messages.jsonl \
  --checkpoint artifacts/tmp/discord-export/latest/checkpoint.json
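The flags in the proposed command could be parsed with `argparse` roughly as follows. The flag names come from the command above. Which flags are required and what the defaults are is an assumption made here for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="discord-export-history.py",
        description="Read-only Discord history exporter (sketch).",
    )
    parser.add_argument("--guild-id", required=True,
                        help="Guild to export; kept as a string to avoid precision loss")
    parser.add_argument("--days", type=int, default=21,
                        help="Size of the export window in days (default is an assumption)")
    parser.add_argument("--include-threads", action="store_true",
                        help="Also export threads, including archived ones where exposed")
    parser.add_argument("--output", required=True,
                        help="Path of the markdown artifact")
    parser.add_argument("--json-out",
                        help="Optional path for the normalized JSONL intermediate")
    parser.add_argument("--checkpoint",
                        help="Optional checkpoint file used to resume a run")
    return parser
```

Parsing the guild id as a string is deliberate: the value is only ever interpolated into request paths, so no integer arithmetic is needed.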

Message normalization schema

Each normalized message should include:

message_id
timestamp_utc
timestamp_local
author_name
author_id
channel_name
channel_id
thread_name if applicable
thread_id if applicable
content
attachments[]
embeds[] simplified
reply_to_message_id if present
reactions[] optional
jump_url if derivable
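A minimal sketch of the normalization step, mapping one raw message dict onto the fields listed above. The `channel` and `thread` arguments are hypothetical dicts carrying `id` and `name`. The raw field names (`author`, `message_reference`, `attachments`, `reactions`) follow the Discord message object as commonly documented, and should be checked against real payloads before relying on them.

```python
from datetime import datetime, timezone
from typing import Optional


def normalize(raw: dict, guild_id: str, channel: dict,
              thread: Optional[dict] = None) -> dict:
    """Map a raw message dict to the stable intermediate schema."""
    ts = datetime.fromisoformat(raw["timestamp"])
    # Messages in a thread are addressed by the thread's own ID.
    surface_id = thread["id"] if thread else channel["id"]
    author = raw.get("author") or {}
    reference = raw.get("message_reference") or {}
    return {
        "message_id": raw["id"],
        "timestamp_utc": ts.astimezone(timezone.utc).isoformat(),
        "timestamp_local": ts.astimezone().isoformat(),  # system local zone
        "author_name": author.get("global_name") or author.get("username"),
        "author_id": author.get("id"),
        "channel_name": channel["name"],
        "channel_id": channel["id"],
        "thread_name": thread["name"] if thread else None,
        "thread_id": thread["id"] if thread else None,
        "content": raw.get("content", ""),
        "attachments": [a["url"] for a in raw.get("attachments", [])],
        "embeds": [{"title": e.get("title"), "url": e.get("url")}
                   for e in raw.get("embeds", [])],
        "reply_to_message_id": reference.get("message_id"),
        "reactions": [{"emoji": r["emoji"].get("name"), "count": r["count"]}
                      for r in raw.get("reactions", [])],
        "jump_url": f"https://discord.com/channels/{guild_id}/{surface_id}/{raw['id']}",
    }
```

Each normalized dict can be written as one line of `messages.jsonl`, which keeps the intermediate data append-friendly and easy to diff between runs.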

10. Validation Criteria

David should verify all of the following before calling it done:

Smoke test

Export 1 day from #cron-status and verify readable markdown output.

High-volume test

Export 7 days from #consulting and verify multi-page pagination works.

Thread test

Export one known thread and verify thread messages appear under the correct parent section.
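One way to make the thread test checkable is a renderer that always nests thread sections under their parent channel and sorts deterministically. The sketch below assumes messages already follow the normalization schema. The heading levels and line format are illustrative choices, not a fixed output contract, and multi-line content is rendered as-is.

```python
from collections import defaultdict


def render_markdown(messages: list) -> str:
    """Group by channel, then thread; channel-level messages come first."""
    by_channel = defaultdict(lambda: defaultdict(list))
    for m in messages:
        by_channel[m["channel_name"]][m.get("thread_name")].append(m)

    lines = []
    for channel in sorted(by_channel):
        lines.append(f"## #{channel}")
        surfaces = by_channel[channel]
        # Key None (no thread) sorts before named threads.
        for thread in sorted(surfaces, key=lambda t: (t is not None, t or "")):
            if thread is not None:
                lines.append(f"### Thread: {thread}")
            ordered = sorted(surfaces[thread],
                             key=lambda m: (m["timestamp_utc"], int(m["message_id"])))
            for m in ordered:
                lines.append(f"- [{m['timestamp_utc']}] {m['author_name']}: {m['content']}")
                for url in m.get("attachments", []):
                    lines.append(f"  - attachment: {url}")
    return "\n".join(lines) + "\n"
```

Because every grouping and ordering step is sorted, the same input in any order produces byte-identical output, which is what makes reruns comparable.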

Resume test

Interrupt a run midway, rerun with the same checkpoint, and verify it resumes instead of restarting from scratch.
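The resume behavior depends on a checkpoint that survives interruption. A minimal sketch, assuming a JSON file keyed by channel or thread ID that records the oldest message ID fetched so far; the key names `oldest_seen_id` and `done` are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or an empty state for a fresh run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)
    os.replace(tmp, path)


def mark_progress(state: dict, surface_id: str, oldest_seen_id: str,
                  done: bool = False) -> dict:
    """Record how far back one channel or thread has been fetched."""
    state[surface_id] = {"oldest_seen_id": oldest_seen_id, "done": done}
    return state
```

On rerun, surfaces marked `done` can be skipped and the rest can continue paging with `oldest_seen_id` as the cursor.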

Coverage summary test

End-of-run summary reports channels processed, threads processed, message counts, and any inaccessible surfaces.
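The end-of-run summary could be built from one record per channel or thread. The record keys below (`channel`, `thread`, `count`, `error`) are hypothetical; the point of the sketch is that inaccessible surfaces are reported as gaps and never dropped silently.

```python
def build_summary(surfaces: list) -> dict:
    """Summarize coverage from per-surface records.

    Each record is a dict with hypothetical keys:
    channel (str), thread (str or None), count (int), error (str or None).
    """
    ok = [s for s in surfaces if s.get("error") is None]
    gaps = [s for s in surfaces if s.get("error") is not None]

    def label(s: dict) -> str:
        name = f"#{s['channel']}"
        return f"{name} / {s['thread']}" if s["thread"] else name

    return {
        "channels_processed": sum(1 for s in ok if s["thread"] is None),
        "threads_processed": sum(1 for s in ok if s["thread"] is not None),
        "message_count": sum(s["count"] for s in ok),
        "gaps": [f"{label(s)}: {s['error']}" for s in gaps],
        # Completeness is only claimed when no surface failed.
        "complete": not gaps,
    }
```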

Read-only test

No write/mutation calls to Discord are made.
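One way to make the read-only property testable is to route every request through a thin client that refuses anything other than GET and logs what it sent. This is a sketch under that assumption: `ReadOnlyClient` and its `send` callable are hypothetical names, not part of any existing library.

```python
class ReadOnlyViolation(RuntimeError):
    """Raised when code attempts a non-GET request."""


class ReadOnlyClient:
    """Wrap a transport so only GET requests can reach Discord."""

    def __init__(self, send):
        # send(method, path, params) is a hypothetical transport callable.
        self._send = send
        self.calls = []  # audit log of (method, path) actually sent

    def request(self, method: str, path: str, **params):
        verb = method.upper()
        if verb != "GET":
            raise ReadOnlyViolation(f"blocked {verb} {path}")
        self.calls.append((verb, path))
        return self._send(verb, path, params)
```

A test run can then assert that `calls` contains only GET entries, and any accidental write path fails loudly before a request leaves the machine.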

11. Category

Tool

This is a new local CLI/data-export capability, not a behavioral tweak or a skill.

12. Context Loading

Load these files when building or maintaining the exporter:

Always load:
/Users/vinny/.openclaw/workspace/AGENTS.md
/Users/vinny/.openclaw/workspace/ARTIFACT-GUIDE.md
/Users/vinny/.openclaw/workspace/references/spec-template.md
For Discord/OpenClaw context:
/opt/homebrew/lib/node_modules/openclaw/docs/channels/discord.md
For environment-specific values:
/Users/vinny/.openclaw/workspace/TOOLS.md
Only if token/config resolution details are needed:
~/.openclaw/openclaw.json

Do not load broad memory files for this tool unless the task changes from implementation to user-context interpretation.

13. Guardrails

Do not use Pete's Discord user token.
Do not use browser scraping or browser automation for extraction.
Do not mutate Discord state in any way.
Do not silently omit inaccessible channels or threads. Report them.
Do not claim completeness unless the coverage summary supports it.
Do not hardcode secrets into the script or artifact.
Do not depend on LLM summarization for extraction or correctness.
Do not flatten thread messages into main channel flow without labeling them.
Do not make redaction decisions automatically for this tool. Redaction is a separate post-process.

14. Handoff

David's final handoff should include:

The implemented script path
Exact command used to run it
The generated markdown artifact path
A short verification summary:

channels exported
threads exported
message count
any gaps or inaccessible surfaces

A plain-English note to Pete on whether the export is complete enough to share as-is

Preferred delivery format:

primary: artifact link using the Mission Control Tailscale URL
secondary: short Discord reply summarizing coverage and any gaps
