Generate — Podcast

/generate podcast turns wiki pages into a 3–10 minute MP3 explainer. An LLM writes a spoken-word script, a TTS backend renders each line, and ffmpeg concatenates the resulting clips. The default is a single-host monologue; pass --two-voice for a NotebookLM-style dialogue.

flowchart TB
Start([/generate podcast topic]) --> Resolve
Resolve["Resolve topic<br/>(shared select-pages)"] --> Hash
Hash["Compute source-hash"] --> Script
Script["LLM writes script.md<br/>single-host or two-voice"] --> Pick
Pick{"TTS backend<br/>priority"}
Pick -- "ELEVENLABS_API_KEY" --> EL["ElevenLabs<br/>studio-grade"]
Pick -- "OPENAI_API_KEY" --> OAI["OpenAI TTS<br/>very good"]
Pick -- "default" --> Piper["Piper (local)<br/>robotic, free"]
EL --> Concat
OAI --> Concat
Piper --> Concat
Concat["ffmpeg concat<br/>MP3 VBR q2"] --> Sidecar
Sidecar[".meta.yaml sidecar<br/>+ .script.md kept"] --> Done([Podcast ready])
classDef engine fill:#5bbcd6,stroke:#2e6c7c,color:#0b0f14
classDef output fill:#7dcea0,stroke:#2d6a4f,color:#0b0f14
class Resolve,Hash,Script,Pick,EL,OAI,Piper,Concat engine
class Sidecar,Done output

/generate podcast <topic> [--vault <name>] [--length short|medium|long] [--two-voice] [--voice <name>]

| Flag | Default | Notes |
| --- | --- | --- |
| --length | medium | short ≈ 3 min, medium ≈ 6 min, long ≈ 10 min |
| --two-voice | off | Dialogue between Host A and Host B |
| --voice | backend default | Override the TTS voice (Piper model id / OpenAI voice / ElevenLabs voice id) |

/generate podcast transformers --vault llm-wiki-research --two-voice
✅ Podcast generated
Topic: transformers
Format: two-voice
TTS: openai
Length: medium (~6 min)
Pages in: 8
Source hash: 2dd9ed4a003f
Script: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.script.md
MP3: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.mp3
Sidecar: vaults/llm-wiki-research/artifacts/podcast/transformers-2026-04-18.meta.yaml

The .script.md is the primary re-renderable artifact — the MP3 is derived. Edit the script and re-run to change the narration without touching the wiki.
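
The source hash reported in the output above is what ties a script and MP3 back to the wiki pages they came from. The page does not say how it is computed, so the following is only a sketch under assumptions: SHA-256 over page paths and contents in sorted order, truncated to 12 hex characters to match the length of the sample value. The function name `source_hash` and the `pages` mapping are hypothetical.

```python
import hashlib


def source_hash(pages: dict[str, str], length: int = 12) -> str:
    """Return a short, order-independent digest of the selected pages.

    `pages` maps a page path to its text. The algorithm (SHA-256) and the
    12-character truncation are assumptions, not the handler's documented
    behavior.
    """
    digest = hashlib.sha256()
    for path in sorted(pages):  # sort so selection order does not matter
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator keeps path/content boundaries unambiguous
        digest.update(pages[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()[:length]
```

With a digest like this, a changed hash would signal that the wiki moved on since the script was written, while an unchanged hash would mean only the script needs re-rendering.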

The handler picks the best available backend, in priority order:

| Priority | Backend | Trigger | Cost per 1k chars | Quality |
| --- | --- | --- | --- | --- |
| 1 | ElevenLabs | ELEVENLABS_API_KEY set | ~$0.30 | Studio-grade |
| 2 | OpenAI TTS | OPENAI_API_KEY set | ~$0.015 | Very good |
| 3 | Piper | fallback (lazy-installed) | free | Robotic but clean |
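
The priority order above amounts to "first key that is set wins, otherwise Piper". A minimal sketch of that selection follows. The function name `pick_backend` and the backend labels `elevenlabs` and `piper` are assumptions (only `openai` appears in the sample output); the environment variable names come from the table.

```python
import os

# Checked in priority order; the first backend whose key is set wins.
CLOUD_BACKENDS = [
    ("elevenlabs", "ELEVENLABS_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def pick_backend(env=None) -> str:
    """Return the name of the highest-priority available TTS backend."""
    env = os.environ if env is None else env
    for name, key in CLOUD_BACKENDS:
        if env.get(key):  # unset or empty keys are treated as absent
            return name
    return "piper"  # local fallback, needs no key
```

Passing `env` explicitly keeps the function easy to test; in normal use it would read `os.environ`.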

Voice defaults:

  • Piper: en_US-lessac-medium (HOST / A), en_GB-alan-medium (B)
  • OpenAI: alloy (HOST / A), onyx (B)
  • ElevenLabs: Rachel (HOST / A), Adam (B) — override with ELEVENLABS_VOICE_A / ELEVENLABS_VOICE_B
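
The defaults above can be read as a lookup keyed by backend and speaker, with the ElevenLabs environment variables layered on top. The sketch below assumes that reading. `resolve_voice` is a hypothetical helper, the backend labels are assumptions, and the --voice flag is deliberately left out because the page does not say how it interacts with the second speaker.

```python
import os

# Default voices per backend, as listed above. HOST and A share a voice.
DEFAULT_VOICES = {
    "piper": {"A": "en_US-lessac-medium", "B": "en_GB-alan-medium"},
    "openai": {"A": "alloy", "B": "onyx"},
    "elevenlabs": {"A": "Rachel", "B": "Adam"},
}


def resolve_voice(backend: str, speaker: str, env=None) -> str:
    """Return the voice for a speaker tag (HOST, A, or B) on a backend."""
    env = os.environ if env is None else env
    role = "B" if speaker == "B" else "A"  # HOST maps to the A voice
    if backend == "elevenlabs":
        # ELEVENLABS_VOICE_A / ELEVENLABS_VOICE_B override the defaults.
        override = env.get(f"ELEVENLABS_VOICE_{role}")
        if override:
            return override
    return DEFAULT_VOICES[backend][role]
```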

A generated script looks like this:

# Podcast: transformers
_Generated 2026-04-18 · source hash 2dd9ed4a003f · length target medium (~6 min) · format two-voice_
[A]: Alright. Today we're getting into transformers.
[B]: Why this, why now?
[A]: Because the architecture quietly took over every language task. Here's the shape of it, from *wiki/concepts/attention.md*
[B]: Huh. And how is that different from recurrent networks?
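
Because TTS renders the script line by line, the first step on the audio side is splitting the script into (speaker, text) segments. A rough sketch of that split follows, based only on the sample above. The `[HOST]` tag for single-host scripts is an assumption, and `parse_script` is a hypothetical name.

```python
import re

# Matches lines such as "[A]: Alright." and captures speaker and text.
LINE_RE = re.compile(r"^\[(HOST|A|B)\]:\s*(.+)$")


def parse_script(text: str) -> list[tuple[str, str]]:
    """Extract (speaker, text) pairs, skipping the title and metadata lines."""
    segments = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            segments.append((match.group(1), match.group(2).strip()))
    return segments
```

Each segment would then go to the TTS backend with the voice for its speaker, producing one audio clip per line.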

Script rules the LLM follows:

  • Spoken-word, not bulleted summaries. Full sentences, natural cadence.
  • {{cite: path}} placeholders are rendered as italic page names before TTS.
  • Target pace of ~150 wpm: short ≈ 450 words, medium ≈ 900, long ≈ 1500.
  • A spoken list of sources closes the script.
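
Two of the rules above are mechanical enough to sketch: the word budget is the target minutes multiplied by 150 wpm, and cite placeholders become italic text. The helper names below are hypothetical. The italic rendering follows the sample script, where the path appears between asterisks.

```python
import re

WORDS_PER_MINUTE = 150
TARGET_MINUTES = {"short": 3, "medium": 6, "long": 10}

# Matches "{{cite: some/path.md}}" and captures the path.
CITE_RE = re.compile(r"\{\{\s*cite:\s*(.+?)\s*\}\}")


def target_words(length: str) -> int:
    """Word budget for a --length value at ~150 words per minute."""
    return TARGET_MINUTES[length] * WORDS_PER_MINUTE


def render_cites(script: str) -> str:
    """Replace {{cite: path}} placeholders with *path* (markdown italics)."""
    return CITE_RE.sub(lambda m: f"*{m.group(1)}*", script)
```

For example, `target_words("medium")` gives 900, matching the figure in the list.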

Lazy-installed on first run:

| Tool | Install | Purpose | Required? |
| --- | --- | --- | --- |
| ffmpeg | brew install ffmpeg / apt install ffmpeg | MP3 concat | Yes |
| piper | brew install piper-tts | Local TTS | Only when no cloud key present |

| Symptom | Cause | Fix |
| --- | --- | --- |
| "Piper not found and no cloud TTS key present" | Offline, Homebrew failing | Install Piper manually from rhasspy/piper or set OPENAI_API_KEY |
| Clipped syllables between lines | ffmpeg concat list has missing newlines | Regenerate; the handler rebuilds the concat list each run |
| Robotic-sounding output | Piper is in use | Set OPENAI_API_KEY or ELEVENLABS_API_KEY for a quality upgrade |
| ElevenLabs cost higher than expected | Long script + premium voice | Use --length short or swap to OPENAI_API_KEY (20× cheaper per char) |
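
For the concat-list symptom above, it helps to see what a well-formed list looks like. The sketch below assumes the handler uses ffmpeg's concat demuxer, which expects one `file '<path>'` entry per line, and that "MP3 VBR q2" corresponds to `-q:a 2` with the libmp3lame encoder. Both are assumptions about the handler, and the function names are hypothetical.

```python
def build_concat_list(clips) -> str:
    """Return concat-demuxer list text: one "file '<path>'" entry per line."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"  # every entry ends with a newline


def concat_command(list_path, output) -> list[str]:
    """Build the ffmpeg argument list that joins the clips into one MP3."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # -safe 0 permits absolute paths
        "-i", str(list_path),
        "-c:a", "libmp3lame", "-q:a", "2",  # VBR quality 2
        str(output),
    ]
```

The command list can be passed to `subprocess.run(..., check=True)` once the list text has been written to `list_path`. Relative paths in the list are resolved against the list file's directory.
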

Known limitations:

  • No music or intro stingers; output is pure voice. Deferred.
  • No ID3 chapters. They would help navigation, but are deferred.
  • Cost warnings are documented but not enforced: you can burn ElevenLabs credit on a long podcast without a confirmation prompt. Phase 2E adds a pre-render cost check.
  • Piper voice library is small by default. Custom voices go in $PIPER_VOICES_DIR.
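
Until a pre-render cost check exists, the per-character rates from the backend table are enough for a rough estimate before rendering. The sketch below is not the Phase 2E design. The function names and the 1.00 USD threshold are hypothetical, and the rates are the approximate figures quoted earlier.

```python
# Approximate USD per 1,000 characters, taken from the backend table.
RATE_PER_1K_CHARS = {"elevenlabs": 0.30, "openai": 0.015, "piper": 0.0}


def estimate_cost(script_text: str, backend: str) -> float:
    """Rough USD cost of rendering the script on the given backend."""
    return len(script_text) / 1000 * RATE_PER_1K_CHARS[backend]


def needs_confirmation(script_text: str, backend: str,
                       threshold_usd: float = 1.00) -> bool:
    """True when the estimate exceeds the (hypothetical) threshold."""
    return estimate_cost(script_text, backend) > threshold_usd
```

Counting every character, including speaker tags and the header, slightly overestimates what the TTS backend is billed for, which is the safer direction for a warning.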