The Problem
Podcast creators want to go viral on TikTok, Reels, and Shorts. But manually finding clip-worthy moments in 2-hour episodes is tedious.
The vision: Paste a YouTube URL, and 10 minutes later get 5 AI-selected viral clips with hooks, captions, and timestamps ready to post.
The result: Built across 3 days (~12 hours of active work) with Claude Opus 4.6 via Claude Code. Fully functional full-stack app with LLM orchestration, real-time progress, and professional video processing.
From Script to System
I started with a working Python script (pod-clipper) that processed podcasts locally. It worked, but it had no web interface, no async operations, no progress visibility, no database, and global config scattered everywhere.
My prompt to Claude:
"Help me turn this podcast clipper script into a production web app. I want:
- FastAPI backend with async/await
- Next.js frontend with real-time progress
- Database to store jobs and clips
- Clean architecture (no global state)"
Claude analyzed the codebase and proposed:
Backend (FastAPI + SQLite + Async)
↓
Pipeline Modules (Refactored from pod-clipper)
↓
Background Jobs (asyncio.to_thread)
↓
Real-time Progress (SSE)
↓
Frontend (Next.js + shadcn/ui)
Key Decisions
SSE over WebSocket for real-time progress:
- Simpler protocol (no connection upgrade)
- Browser-native EventSource API
- Falls back to polling gracefully
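Part of what makes SSE simpler is the wire format itself: plain newline-delimited text. A minimal sketch of the event framing (the event shape here is illustrative, not the app's exact schema):

```python
import json

def sse_format(event: dict) -> str:
    # Each SSE message is one or more "data:" lines followed by a
    # blank line; the browser's EventSource parses this natively.
    return f"data: {json.dumps(event)}\n\n"

# Example: a progress event as the pipeline might emit it
chunk = sse_format({"status": "analyzing", "pct": 40})
```

On the frontend, `new EventSource(url)` subscribes to the stream and fires one `message` event per chunk.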
Async Bridge Pattern to solve CPU-bound blocking:
class ProgressReporter:
    def report(self, status, pct, message):
        # Called from a blocking worker thread
        event = {"status": status, "pct": pct, "message": message}
        self._events.append(event)
        self._loop.call_soon_threadsafe(self._async_event.set)

    async def listen(self):
        # Async generator consumed by the SSE endpoint
        while True:
            await self._async_event.wait()
            self._async_event.clear()
            while self._events:
                yield self._events.pop(0)
This lets FFmpeg and LLM operations run in threads while the web server streams progress in real-time.
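A runnable miniature of the pattern (MiniReporter and encode_clip are illustrative stand-ins, not the app's real API):

```python
import asyncio
import time

class MiniReporter:
    """Cut-down stand-in for a progress reporter."""
    def __init__(self, loop: asyncio.AbstractEventLoop):
        self._loop = loop
        self._events = []
        self._wake = asyncio.Event()

    def report(self, status, pct, message):
        # Runs on a worker thread: append the event, then wake the
        # async listener via the loop's thread-safe scheduler.
        self._events.append((status, pct, message))
        self._loop.call_soon_threadsafe(self._wake.set)

def encode_clip(reporter: MiniReporter, clip_id: int) -> str:
    time.sleep(0.01)  # stand-in for blocking FFmpeg work
    reporter.report("encoding", 100, f"clip {clip_id} done")
    return f"clip_{clip_id}.mp4"

async def run_job() -> list:
    reporter = MiniReporter(asyncio.get_running_loop())
    # to_thread keeps the event loop free to serve SSE while the
    # blocking encoders run in a thread pool.
    return await asyncio.gather(
        *(asyncio.to_thread(encode_clip, reporter, i) for i in range(3))
    )
```

The key detail is `call_soon_threadsafe`: setting an `asyncio.Event` directly from a worker thread is not safe, so the wakeup is marshalled back onto the loop thread.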
Single commit, 36 files, 2,641 lines:
e33514e Initial commit: PodClip — YouTube viral clip extractor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-Phase LLM Strategy
Sending entire 2-hour transcripts to Claude was expensive and overwhelming. I designed a two-phase pipeline to solve this:
Phase 1: Triage (Sonnet - fast scan)
- Scans full transcript → Top 20 candidates
- Priority score, category, timestamps
- ~$0.10 per video
Phase 2: Deep Analysis (Opus - focused)
- Analyzes small windows around promising moments
- Generates virality scores, hooks, quotes, scroll-stoppers, captions
- ~$0.50 per video
Total LLM cost: ~$0.60 per video
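The handoff between phases is what keeps Opus cheap: only a small slice of transcript travels to Phase 2. A sketch of that windowing step, assuming the transcript is a list of (start, end, text) segments (function name and pad value are illustrative):

```python
def window_transcript(segments, cand_start, cand_end, pad=30.0):
    # Keep only segments overlapping the candidate moment plus `pad`
    # seconds of context on each side; this slice, not the full
    # transcript, is what Phase 2 (Opus) analyzes.
    lo, hi = cand_start - pad, cand_end + pad
    return [(s, e, text) for (s, e, text) in segments if e > lo and s < hi]
```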
The 219-Line Prompt
The DETECTION_PROMPT teaches semantic boundaries:
CRITICAL: Your start_time and end_time should land on NATURAL SENTENCE BOUNDARIES
- Start just before the speaker begins the key thought (not mid-sentence)
- End just after they complete the thought (not mid-word or mid-sentence)
- A 10-second buffer will be added automatically before and after your timestamps
- Your timestamps should capture the ESSENTIAL content; the buffer prevents cutoffs
Combined with the post-processing buffer, clips never cut off mid-sentence.
The FFmpeg Challenge
Naive approach had seeking issues, codec errors, and random failures:
ffmpeg -ss START -i video.mp4 -t DURATION output.mp4 # ❌ Unreliable
Claude's two-step solution:
Step 1: Stream-copy segment (instant, no decoding)
ffmpeg -ss START -i video.mp4 -t DURATION -c copy segment.mp4
Step 2: Re-encode from segment (no seeking needed)
ffmpeg -i segment.mp4 [filters] -c:v libx264 output.mp4
This bypasses the codec issues entirely. For TikTok/Reels, the second pass scales the video to 9:16 with a blurred background.
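In Python, the two steps can be assembled like this. The flags mirror the commands above, but the 9:16 filter graph is one plausible blurred-background recipe, not necessarily the app's exact filter:

```python
def build_clip_commands(src, start, duration, out, vertical=True):
    segment = "segment.mp4"
    # Step 1: stream-copy, so seeking is fast and nothing is decoded
    step1 = ["ffmpeg", "-y", "-ss", str(start), "-i", src,
             "-t", str(duration), "-c", "copy", segment]
    # Step 2: re-encode from the short segment; for vertical output,
    # blur a stretched copy behind the centered original
    vf = ("[0:v]split=2[a][b];"
          "[a]scale=1080:1920,boxblur=20[bg];"
          "[b]scale=1080:-2[fg];"
          "[bg][fg]overlay=(W-w)/2:(H-h)/2")
    step2 = ["ffmpeg", "-y", "-i", segment]
    if vertical:
        step2 += ["-filter_complex", vf]
    step2 += ["-c:v", "libx264", out]
    return step1, step2
```

Each list feeds straight into `subprocess.run(cmd, check=True)`, one call per step.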
Real-Time Progress
Before: Submit URL → black box → clips appear 10 minutes later
After: Live updates via SSE:
[████░░░░░░] 40% Analyzing transcript...
[████████░░] 80% Creating clips: 3/5
[██████████] 100% Complete!
Frontend polls /api/jobs/{jobId}/clips every 3 seconds during clip creation. Clips appear as they're generated, not after all are done.
The Buffer Fix
Testing with real podcasts revealed clips cutting off mid-sentence:
- "...and that's why I think—" [CUT]
My prompt:
"New rule: clips should not cut off mid-sentence. Add 10s buffer before and after"
Claude's hybrid solution:
- Updated prompts to teach LLM about natural boundaries
- Applied post-processing buffer
- Clamped to video duration
buffered_start = max(0.0, start_time - buffer_before) # 10s before
buffered_end = min(end_time + buffer_after, video_duration) # 10s after
Result: perfect clip boundaries. The implementation took 3 hours and touched 5 files.
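Wrapped as a standalone helper (the name and defaults here are illustrative), the pad-and-clamp step looks like:

```python
def apply_buffer(start, end, video_duration,
                 buffer_before=10.0, buffer_after=10.0):
    # Pad the LLM's timestamps, then clamp so the clip never starts
    # before 0 or runs past the end of the video.
    buffered_start = max(0.0, start - buffer_before)
    buffered_end = min(end + buffer_after, video_duration)
    return buffered_start, buffered_end
```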
Tech Stack
Backend: FastAPI + SQLAlchemy Async
- FastAPI for async/await and SSE streaming
- SQLite via aiosqlite (single file, no server)
- Pydantic validation built-in
Frontend: Next.js 15 + shadcn/ui
- Server Components for fast initial render
- Dynamic routing (/jobs/[id])
- API proxy (/api/* → backend:8000)
- shadcn/ui for accessible components
Pipeline: 6 Steps
1. Fetch metadata (yt-dlp)
2. Extract transcript (YouTube captions)
3. Detect highlights (Claude 2-phase)
4. Download video (yt-dlp)
5. Create clips (FFmpeg → Supabase)
6. Notify (Slack, Airtable)
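The six steps compose naturally into a sequential runner that threads a context dict through each stage and reports coarse progress. A sketch with placeholder step functions (the real app's modules differ):

```python
def run_pipeline(url, steps, report=lambda name, pct: None):
    # `steps` is an ordered list of (name, fn) pairs; each fn takes
    # the context dict and returns an updated one.
    ctx = {"url": url}
    for i, (name, fn) in enumerate(steps, 1):
        report(name, int(100 * i / len(steps)))
        ctx = fn(ctx)
    return ctx

# Hypothetical stand-ins for the first two real steps
steps = [
    ("fetch_metadata", lambda c: {**c, "title": "Episode 42"}),
    ("extract_transcript", lambda c: {**c, "segments": 512}),
]
```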
What Got Built
Core:
- YouTube download, transcript extraction, AI highlight detection
- Video clip creation (vertical or original)
- Cloud storage (Supabase S3)
Interface:
- URL submission, live progress, clip gallery
- Job detail pages, library browser
Metadata:
- Virality scores, hooks, quotes, energy levels
- Scroll-stoppers, interstitials, dual captions
Production:
- Docker, Railway deployment
- Error handling, migrations
Performance: 5-10 minutes for 2-hour podcast
- Metadata: ~2s
- Transcript: ~5s
- Highlights: ~2-3m (LLM)
- Download: ~1-2m
- Clips: ~2-3m (parallel)
What I Learned About AI-Assisted Development
The split that worked:
- I owned domain knowledge (what makes clips viral), requirements, and design constraints
- Claude handled architecture patterns (SSE, async bridge, two-phase LLM), implementation, and edge case discovery (bounds checking, codec issues)
Key lesson: Start with architecture, not code. Give clear constraints ("Use shadcn/ui" > "make it look good"). Test with real data — the AI won't catch mid-sentence cutoffs until you report them.
Memory continuity: Claude Code maintained project patterns in MEMORY.md. When I started the buffer enhancement session 3 days later, it knew the project structure, referenced files by path, and followed existing patterns. No ramp-up needed.
Metrics
- Time: ~12 hours of active work across 3 days (vs estimated 3-5 days solo)
- Speedup: ~6-10x
- Code: 2,600 lines of production code
- Stack: 8 technologies seamlessly integrated
- Cost: $0.60 per video processed
The shift: Building with Claude Code changed my question from "Can I build this?" to "Should I build this?"
The limiting factor is no longer technical implementation—it's product vision and user understanding.
Appendix: The Co-Authored Commit
commit e33514e22dee45a45a068cf62b5a20e579f318f1
Author: 0xan000n <0xan000n@protonmail.com>
Date: Tue Feb 10 21:22:12 2026 -0800
Initial commit: PodClip — YouTube viral clip extractor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
36 files changed, 2641 insertions(+)
One commit, two authors. A snapshot of what human-AI collaboration looks like in practice.