Why I Built This
Extracting podcast guest nominations from X threads for dLogos was a manual grind. The process: open a thread with 200+ replies, scroll through every comment hunting for @ mentions, copy nominations into a spreadsheet, deduplicate, fix typos, then upload to Airtable for the team. Each thread took 4-6 hours. We had dozens.
The goal was a system that scrapes, validates with AI, and pushes clean data directly into Airtable. The result: 4-6 hours became under 15 minutes.
Day 1: Ship the MVP
On January 28, 2026, the entire foundation shipped in a single commit. 23 files, 4,193 lines of code:
- Browser automation with Playwright and stealth plugins
- GraphQL API interception instead of HTML scraping
- Express web server with a Twitter-styled dark UI
- Docker setup, deployed to Railway by end of day
The key technical decision was intercepting X's GraphQL API rather than parsing HTML. X's web app makes GraphQL requests internally. Instead of writing fragile CSS selectors that break whenever X redesigns a button, the scraper listens for those same API responses and extracts structured data directly — tweet IDs, timestamps, handles, mentions, nested replies. GraphQL schemas change far less often than HTML, and you get the exact data the frontend sees.
The interception handles both direct tweet content and nested conversation threads, which X structures differently in its response:
// Tweets keyed by rest_id, so repeated responses deduplicate naturally
const allTweets = new Map();

page.on('response', async (response) => {
  if (!response.url().includes('/graphql/')) return;

  // Not every GraphQL response has a parseable JSON body; skip those
  let data;
  try {
    data = await response.json();
  } catch {
    return;
  }

  const instructions = data?.data
    ?.threaded_conversation_with_injections_v2?.instructions;
  if (!instructions) return;

  for (const instruction of instructions) {
    for (const entry of instruction.entries || []) {
      // Direct tweet content
      const tweet = entry.content?.itemContent?.tweet_results?.result;
      if (tweet) allTweets.set(tweet.rest_id, tweet);

      // Nested conversation thread items
      for (const item of entry.content?.items || []) {
        const nested = item.item?.itemContent?.tweet_results?.result;
        if (nested) allTweets.set(nested.rest_id, nested);
      }
    }
  }
});
The endpoint pattern is deliberately generic — /graphql/ rather than a hardcoded endpoint name — because X rotates these frequently. The scraper also uses cursor-based pagination for threads with more replies than a single scroll can capture, supporting up to 500 pages with adaptive rate limiting that doubles the delay every 10 requests.
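A minimal sketch of how the pagination loop and the doubling delay could fit together. Only the 500-page cap and the "double every 10 requests" rule come from the description above; the base delay, the ceiling, and the `fetchPage` callback are hypothetical names chosen for illustration.

```javascript
const MAX_PAGES = 500;        // cap taken from the text
const BASE_DELAY_MS = 500;    // assumed starting delay
const MAX_DELAY_MS = 30_000;  // assumed ceiling so the delay cannot grow without bound

// Delay to apply after request number `requestIndex` (0-based).
// The delay doubles once for every 10 requests already made.
function delayFor(requestIndex, baseMs = BASE_DELAY_MS, maxMs = MAX_DELAY_MS) {
  const doublings = Math.floor(requestIndex / 10);
  return Math.min(baseMs * 2 ** doublings, maxMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchPage is a hypothetical callback: (cursor) => Promise<{ tweets, nextCursor }>.
// `wait` is injectable so the loop can be exercised without real pauses.
async function collectAllPages(fetchPage, { wait = sleep } = {}) {
  const tweets = new Map();
  let cursor = null;
  for (let page = 0; page < MAX_PAGES; page++) {
    const { tweets: batch, nextCursor } = await fetchPage(cursor);
    for (const tweet of batch) tweets.set(tweet.rest_id, tweet);
    if (!nextCursor || nextCursor === cursor) break; // no further pages
    cursor = nextCursor;
    await wait(delayFor(page));
  }
  return tweets;
}
```

Keying the map by `rest_id` means overlapping pages cost nothing: a tweet seen twice simply overwrites itself.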
By end of Day 1, a working scraper was deployed to Railway.
Day 1, Evening
Six hours later, the first bug fixes shipped. Testing against real X threads revealed problems that never surfaced in development: @ symbols inconsistently present in handles, nominator and nominee roles confused in the parser, and aggressive normalization stripping valid data. The fix was small (2 files, 32 lines), but the lesson was immediate: real threads expose problems that development data never does.
Day 2: Schema Clarity
On January 30th, stakeholders pushed back: "What does x_url mean? Is this the X platform or the tweet URL?"
Schema naming matters. The rename was straightforward:
- x_url → reply_url
- nominator_handle → commenter_handle
- tweet_text → full_reply_text
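As an illustration, a rename like this can live in a single mapping applied to every row before export. The three field pairs are the ones listed above; the `renameFields` helper and the sample row are hypothetical.

```javascript
// Old field name -> new field name, as listed in the rename above
const FIELD_RENAMES = new Map([
  ['x_url', 'reply_url'],
  ['nominator_handle', 'commenter_handle'],
  ['tweet_text', 'full_reply_text'],
]);

// Return a copy of `row` with renamed keys; unknown keys pass through unchanged.
function renameFields(row, renames = FIELD_RENAMES) {
  return Object.fromEntries(
    Object.entries(row).map(([key, value]) => [renames.get(key) ?? key, value]),
  );
}

// Usage with a made-up row
const renamed = renameFields({
  x_url: 'https://x.com/example/status/1',
  nominator_handle: '@example',
  tweet_text: 'I nominate @someone',
  nominee_name: 'Someone',
});
console.log(Object.keys(renamed));
```

Keeping the mapping in one place means the next round of stakeholder feedback is a one-line change rather than a search through the codebase.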
Non-technical users need self-documenting field names. A few days of real usage revealed pain points that never came up in development.
The scraper worked. The data was clean enough. But there was a bigger problem: the team was still copy-pasting CSVs into Airtable by hand.
Day 3: The Pivot
January 31st changed everything. Two observations drove the pivot. First, our team's real workflow lives in Airtable, so manual file transfers between tools defeated the purpose of automation. Second, AI could validate data quality in real time.
Two major integrations shipped in a single commit — 430 lines.
The system now reads directly from Airtable: parse a record URL, fetch the Thread URL field, scrape the thread, upload the CSV as an attachment, update the record's status, and advance the workflow stage. Zero manual steps between a scraper run and the team seeing results.
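The first two steps (parsing a record URL and reading the Thread URL field) might look roughly like the sketch below. It assumes the usual Airtable record URL layout, where path segments start with `app`, `tbl`, and `rec`, and it uses the Airtable REST endpoint for retrieving a single record. The function names and the injectable `fetchImpl` parameter are hypothetical; only the `Thread URL` field name comes from the text.

```javascript
const AIRTABLE_API = 'https://api.airtable.com/v0';

// Pull the base, table, and record IDs out of a record URL by their prefixes.
function parseAirtableRecordUrl(recordUrl) {
  const segments = new URL(recordUrl).pathname.split('/').filter(Boolean);
  const byPrefix = (prefix) =>
    segments.find((s) => s.startsWith(prefix) && /^[A-Za-z0-9]+$/.test(s));
  const ids = {
    baseId: byPrefix('app'),
    tableId: byPrefix('tbl'),
    recordId: byPrefix('rec'),
  };
  if (!ids.baseId || !ids.tableId || !ids.recordId) {
    throw new Error(`Not an Airtable record URL: ${recordUrl}`);
  }
  return ids;
}

// Fetch the record and return its 'Thread URL' field, or null if it is empty.
// fetchImpl defaults to the global fetch (Node 18+) and can be stubbed in tests.
async function fetchThreadUrl(recordUrl, token, fetchImpl = fetch) {
  const { baseId, tableId, recordId } = parseAirtableRecordUrl(recordUrl);
  const res = await fetchImpl(`${AIRTABLE_API}/${baseId}/${tableId}/${recordId}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Airtable responded with ${res.status}`);
  const record = await res.json();
  return record.fields?.['Thread URL'] ?? null;
}
```

The remaining steps (attachment upload, status update, stage advance) would be further writes against the same record and are left out here because the source does not show their field names.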
For validation, Grok was the natural choice. Unlike GPT, Grok has real-time access to X through its web_search and x_search tools, so it can verify handles and check context on the platform itself. The validation applies nine normalization rules, including merging duplicates, fixing typos, removing organizations, standardizing handle formatting, and filtering out the original poster's own comments.
But one AI pass was not enough. Grok and GPT have different strengths. Grok excels at web-aware tasks — real-time search, organization verification, handle lookup. GPT-5.2 is better at structured reasoning — producing clean CSVs, handling edge cases, complex merging logic.
The pipeline runs in four stages. First, Grok normalizes the raw CSV using its search tools to verify data against live X. Its output is a JSON changelog of what it changed and why. Second, GPT takes the original CSV plus Grok's changelog and produces a clean, structured CSV. Third, for nominees mentioned by name only — no @handle — Grok searches X to find the correct handles using thread context. Finally, GPT merges those found handles into the CSV and adds a Notes column for uncertain entries.
Each stage saves results to a named Airtable field — [AUTO] Grok Normalization, [AUTO] GPT Normalization, [AUTO] Grok 2 (Handles), [AUTO] GPT 2 (Normalized with Handles) — so the team can audit every step.
import OpenAI from 'openai';

// Grok: real-time validation with X search tools
const grokResponse = await fetch('https://api.x.ai/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${xaiApiKey}`,
    'Content-Type': 'application/json', // needed because the body is JSON
  },
  body: JSON.stringify({
    model: 'grok-4-1-fast',
    tools: [{ type: 'web_search' }, { type: 'x_search' }],
    input: `${csvContent}\n\n${prompt}`,
  }),
});
if (!grokResponse.ok) {
  throw new Error(`Grok request failed: ${grokResponse.status}`);
}

// GPT: structured reasoning pass
const client = new OpenAI({ apiKey: openaiKey });
const gptResponse = await client.responses.create({
  model: 'gpt-5.2',
  input: [{ role: 'user', content: promptText }],
});
const enhanced = gptResponse.output_text;
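Chained together, the four stages could be orchestrated along these lines. This is a sketch under assumptions: `callGrok`, `callGpt`, and `saveField` are hypothetical callbacks standing in for the real API calls and the Airtable write, and the `task` labels are invented for readability. The four field names are the ones listed above.

```javascript
// Airtable field written after each stage (names taken from the text)
const STAGE_FIELDS = {
  grokNormalization: '[AUTO] Grok Normalization',
  gptNormalization: '[AUTO] GPT Normalization',
  grokHandles: '[AUTO] Grok 2 (Handles)',
  gptFinal: '[AUTO] GPT 2 (Normalized with Handles)',
};

// callGrok / callGpt: (request) => Promise<string>; saveField: (name, value) => Promise<void>
async function runPipeline(rawCsv, { callGrok, callGpt, saveField }) {
  // Stage 1: Grok checks the raw CSV against live X and returns a JSON changelog
  const changelog = await callGrok({ task: 'normalize', csv: rawCsv });
  await saveField(STAGE_FIELDS.grokNormalization, changelog);

  // Stage 2: GPT applies the changelog to the original CSV
  const cleanCsv = await callGpt({ task: 'apply-changelog', csv: rawCsv, changelog });
  await saveField(STAGE_FIELDS.gptNormalization, cleanCsv);

  // Stage 3: Grok looks up handles for nominees mentioned by name only
  const handles = await callGrok({ task: 'find-handles', csv: cleanCsv });
  await saveField(STAGE_FIELDS.grokHandles, handles);

  // Stage 4: GPT merges the found handles and adds a Notes column
  const finalCsv = await callGpt({ task: 'merge-handles', csv: cleanCsv, handles });
  await saveField(STAGE_FIELDS.gptFinal, finalCsv);

  return finalCsv;
}
```

Saving after every stage, rather than once at the end, is what makes each intermediate result auditable if a later stage fails.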
The UI evolved alongside the pipeline: step-by-step navigation across all four stages, auto-population of each step's input from the previous step's output, test data loading for development, and browser notifications when each stage completes.
The Normalization Engine
Not everything needs AI. Alongside the multi-model pipeline, a local algorithmic normalizer handles the deterministic work — nine rules applied in sequence, no API calls required.
The normalizer merges split rows where the same nomination appears with @handle in one row and a name in another. It standardizes name variants when the same handle has multiple spellings, choosing the highest-quality version by scoring capitalization, token count, and absence of numbers or URLs. It deduplicates by tweet context. It filters organizations using a keyword list — news, policy, institute, foundation, and twelve others. It removes the original poster from the nominee list. And it ensures every handle starts with @.
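Three of these rules are simple enough to sketch. The code below is illustrative rather than the project's implementation: the scoring weights are assumptions, the keyword list contains only the four keywords named above (the other twelve are not reproduced), and all function names are hypothetical.

```javascript
// Only the four keywords named in the text; the full list has twelve more.
const ORG_KEYWORDS = ['news', 'policy', 'institute', 'foundation'];

// Ensure a handle starts with exactly one @.
function normalizeHandle(handle) {
  const bare = handle.trim().replace(/^@+/, '');
  return bare ? `@${bare}` : '';
}

// Whole-word keyword match, so "Newsome" is not mistaken for "news".
function isOrganization(name) {
  const words = name.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return words.some((word) => ORG_KEYWORDS.includes(word));
}

// Score a name variant; higher is better. Weights are assumed.
function scoreName(name) {
  const tokens = name.trim().split(/\s+/).filter(Boolean);
  let score = 0;
  if (tokens.length > 0 && tokens.every((t) => /^[A-Z]/.test(t))) score += 2; // capitalized
  if (tokens.length >= 2) score += 1;                    // looks like a full name
  if (!/\d/.test(name)) score += 1;                      // no digits
  if (!/https?:\/\/|www\./i.test(name)) score += 1;      // no URL
  return score;
}

// Pick the best variant among spellings that share one handle.
// Ties keep the earliest variant. Expects a non-empty array.
function bestName(variants) {
  return variants.reduce((best, candidate) =>
    scoreName(candidate) > scoreName(best) ? candidate : best);
}
```

For example, given the variants `daniel schmachtenberger`, `Daniel Schmachtenberger`, and `Daniel2`, this scoring prefers the properly capitalized two-token spelling.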
For typo detection, the normalizer uses Levenshtein distance with a space-optimized two-array implementation rather than a full matrix. The namesAreSimilar function adds a first-letter guard — if two names start with different letters, it skips the expensive distance calculation entirely. Threshold: 2 edits for short names, 3 for names longer than 14 characters.
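A sketch of what a two-array Levenshtein with a first-letter guard can look like. The thresholds (2 edits, or 3 for names longer than 14 characters) follow the text. Case-insensitive comparison and measuring length by the longer of the two names are assumptions, and the body of `namesAreSimilar` here is a reconstruction, not the project's code.

```javascript
// Levenshtein distance keeping only two rows of the DP table at a time.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr = new Array(b.length + 1);

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    [prev, curr] = [curr, prev]; // reuse the two arrays instead of allocating
  }
  return prev[b.length];
}

function namesAreSimilar(nameA, nameB) {
  const a = nameA.trim().toLowerCase();
  const b = nameB.trim().toLowerCase();
  if (!a || !b) return false;
  if (a[0] !== b[0]) return false; // first-letter guard: skip the distance calculation
  const threshold = Math.max(a.length, b.length) > 14 ? 3 : 2;
  return levenshtein(a, b) <= threshold;
}

console.log(namesAreSimilar('daniel scchmachtenberger', 'Daniel Schmachtenberger')); // true
```

The two-array form needs memory proportional to the shorter dimension only, which matters little for names but keeps the function allocation-light when it runs across every pair of rows.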
Every change is logged in a transparent JSON audit trail:
{
  "summary": {
    "total_original_rows": 42,
    "final_rows_count": 38,
    "removed_rows_count": 4,
    "updated_rows_count": 15
  },
  "changes": [{
    "field": "nominee_name",
    "old_value": "daniel scchmachtenberger",
    "new_value": "Daniel Schmachtenberger",
    "reason": "Corrected known misspelling",
    "rule": 6
  }]
}
Users can verify every decision the system made. That transparency is what turns automation from a black box into something the team actually trusts.
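One way to produce an audit trail of this shape is a small recorder that every rule reports to. The output keys match the JSON shown above; the `createAuditTrail` helper, its method names, and the choice to count removals without listing them in `changes` are assumptions, since the source does not show how removed rows are logged.

```javascript
function createAuditTrail(totalOriginalRows) {
  const changes = [];
  const updatedRows = new Set(); // row indexes touched by at least one update
  let removedRows = 0;

  return {
    // Log a field-level change on one row; no-op if nothing actually changed.
    recordUpdate(rowIndex, { field, oldValue, newValue, reason, rule }) {
      if (oldValue === newValue) return;
      updatedRows.add(rowIndex);
      changes.push({ field, old_value: oldValue, new_value: newValue, reason, rule });
    },
    // Count a removed row (duplicate, organization, original poster, ...).
    recordRemoval() {
      removedRows += 1;
    },
    // Called automatically by JSON.stringify.
    toJSON() {
      return {
        summary: {
          total_original_rows: totalOriginalRows,
          final_rows_count: totalOriginalRows - removedRows,
          removed_rows_count: removedRows,
          updated_rows_count: updatedRows.size,
        },
        changes,
      };
    },
  };
}

// Usage
const trail = createAuditTrail(42);
trail.recordUpdate(7, {
  field: 'nominee_name',
  oldValue: 'daniel scchmachtenberger',
  newValue: 'Daniel Schmachtenberger',
  reason: 'Corrected known misspelling',
  rule: 6,
});
console.log(JSON.stringify(trail, null, 2));
```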
Lessons
Integration matters more than features. The standalone scraper was useful enough — it saved time on data collection. But the Airtable integration changed the tool's category. It stopped being a utility the team had to remember to use and became part of the workflow they were already in. The lesson applies broadly: build into the systems people already use rather than asking them to adopt a new one.
Different models for different jobs. Grok handles real-time search — verifying handles against live X data, checking whether an account is a person or an organization. GPT-5.2 handles structured reasoning — producing clean CSVs, merging conflicting data, resolving edge cases. GPT-4o-mini handles cheap batch work like topic title enhancement at 20 items per API call. The same principle applies to prompting: small focused requests, test each piece, build incrementally.
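The batching mentioned above can be as simple as the sketch below. The batch size of 20 comes from the text; `chunk`, `enhanceTitles`, and the `enhanceBatch` callback (standing in for one model call that returns one result per input) are hypothetical.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size = 20) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// enhanceBatch: (titles[]) => Promise<titles[]>, one API call per batch.
async function enhanceTitles(titles, enhanceBatch) {
  const results = [];
  for (const batch of chunk(titles, 20)) {
    results.push(...(await enhanceBatch(batch)));
  }
  return results;
}
```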
Transparency builds trust. Every normalization change is logged with what changed, why, which rule applied, and the old versus new values. Without this, the team would need to manually verify the AI's output — which defeats the purpose of automation. With it, they can spot-check the audit trail and trust the rest.
Real data is the only test. The first real X thread broke the parser in ways that never surfaced in development, and the fixes shipped six hours after launch. Stakeholder feedback drove the schema rename on Day 2. The GraphQL interception strategy itself came from watching how X's frontend actually behaves, not from documentation. Ship fast, test against reality, fix what breaks.
Impact
Three days from concept to production, roughly 1,500 lines of code. Before: 4-6 hours per thread, manually scrolling, copying, deduplicating. After: under 15 minutes, automated end to end.
Quality improved across the board — 90%+ duplicate reduction, typo correction via fuzzy matching, organization filtering, handle verification through real-time search, all with a transparent audit trail.
Cost: about $5/month for Railway hosting and roughly $2 per thread across three AI model passes. Against 4+ hours of manual work per thread, the ROI was immediate. The tool has processed dozens of threads for dLogos, extracting thousands of nominations.
Ship fast, iterate faster. Built with Claude Code.