Generate — Podcast
/generate podcast turns wiki pages into a 3–10 minute MP3 explainer. The LLM writes a spoken-word script, TTS renders each line, ffmpeg concatenates them. Single-host monologue by default; --two-voice for NotebookLM-style dialogue.
flowchart TB Start([/generate podcast topic]) --> Resolve Resolve["Resolve topic<br/>(shared select-pages)"] --> Hash Hash["Compute source-hash"] --> Script Script["LLM writes script.md<br/>single-host or two-voice"] --> Pick Pick{"TTS backend<br/>priority"} Pick -- "ELEVENLABS_API_KEY" --> EL["ElevenLabs<br/>studio-grade"] Pick -- "OPENAI_API_KEY" --> OAI["OpenAI TTS<br/>very good"] Pick -- "default" --> Piper["Piper (local)<br/>robotic, free"] EL --> Concat OAI --> Concat Piper --> Concat Concat["ffmpeg concat<br/>MP3 VBR q2"] --> Sidecar Sidecar[".meta.yaml sidecar<br/>+ .script.md kept"] --> Done([Podcast ready])
classDef engine fill:#5bbcd6,stroke:#2e6c7c,color:#0b0f14 classDef output fill:#7dcea0,stroke:#2d6a4f,color:#0b0f14 class Resolve,Hash,Script,Pick,EL,OAI,Piper,Concat engine class Sidecar,Done output/generate podcast <topic> [--vault <name>] [--length short|medium|long] [--two-voice] [--voice <name>]| Flag | Default | Notes |
|---|---|---|
--length | medium | short ≈ 3 min, medium ≈ 6 min, long ≈ 10 min |
--two-voice | off | Dialogue between Host A and Host B |
--voice | backend default | Override the TTS voice (Piper model id / OpenAI voice / ElevenLabs voice id) |
Example
Section titled “Example”/generate podcast transformers --vault llm-wiki-research --two-voice✅ Podcast generated Topic: transformers Format: two-voice TTS: openai Length: medium (~6 min) Pages in: 8 Source hash: 2dd9ed4a003f Script: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.script.md MP3: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.mp3 Sidecar: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.meta.yamlThe .script.md is the primary re-renderable artifact — the MP3 is derived. Edit the script and re-run to change the narration without touching the wiki.
TTS Backend Selection
Section titled “TTS Backend Selection”The handler picks the best available backend, in priority order:
| Priority | Backend | Trigger | Cost per 1k chars | Quality |
|---|---|---|---|---|
| 1 | ElevenLabs | ELEVENLABS_API_KEY set | ~$0.30 | Studio-grade |
| 2 | OpenAI TTS | OPENAI_API_KEY set | ~$0.015 | Very good |
| 3 | Piper | fallback (lazy-installed) | free | Robotic but clean |
Voice defaults:
- Piper:
en_US-lessac-medium(HOST / A),en_GB-alan-medium(B) - OpenAI:
alloy(HOST / A),onyx(B) - ElevenLabs:
Rachel(HOST / A),Adam(B) — override withELEVENLABS_VOICE_A/ELEVENLABS_VOICE_B
The .script.md Artifact
Section titled “The .script.md Artifact”A generated script looks like this:
# Podcast: transformers
_Generated 2026-04-18 · source hash 2dd9ed4a003f · length target medium (~6 min) · format two-voice_
[A]: Alright. Today we're getting into transformers.[B]: Why this, why now?[A]: Because the architecture quietly took over every language task. Here's the shape of it, from *wiki/concepts/attention.md*…[B]: Huh. And how is that different from recurrent networks?…Script rules the LLM follows:
- Spoken-word, not bulleted summaries. Full sentences, natural cadence.
{{cite: path}}placeholders rendered as italic page names before TTS.- ~150 wpm target — short ≈ 450 words, medium ≈ 900, long ≈ 1500.
- Sources list at the end, spoken.
Dependencies
Section titled “Dependencies”Lazy-installed on first run:
| Tool | Install | Purpose | Required? |
|---|---|---|---|
ffmpeg | brew install ffmpeg / apt install ffmpeg | MP3 concat | Yes |
piper | brew install piper-tts | Local TTS | Only when no cloud key present |
Troubleshooting
Section titled “Troubleshooting”| Symptom | Cause | Fix |
|---|---|---|
| ”Piper not found and no cloud TTS key present” | Offline, Homebrew failing | Install Piper manually from rhasspy/piper or set OPENAI_API_KEY |
| Clipped syllables between lines | ffmpeg concat list has missing newlines | Regenerate — the handler rebuilds the concat list each run |
| Robotic-sounding output | Piper is in use | Set OPENAI_API_KEY or ELEVENLABS_API_KEY for a quality upgrade |
| ElevenLabs cost higher than expected | Long script + premium voice | Use --length short or swap to OPENAI_API_KEY (20× cheaper per char) |
Known Limitations (Phase 2C)
Section titled “Known Limitations (Phase 2C)”- No music / intro stingers. Pure voice. Deferred.
- No ID3 chapters. Would be nice for navigation — deferred.
- Cost warnings are documented but not enforced — you can burn ElevenLabs credit on a
longpodcast without a confirmation prompt. Phase 2E adds a pre-render cost check. - Piper voice library is small by default. Custom voices go in
$PIPER_VOICES_DIR.
See Also
Section titled “See Also”- /generate overview — the router
- generate-video — chains this handler for voiceover
- Artifact conventions — sidecar schema