As of May 5, 2026, Podchaser’s English podcast transcript coverage expanded from the top 20,000 podcasts to approximately 150,000, a 7.5× increase that captures the long tail of finance, B2B, health, and niche-topic shows that previously had no searchable transcript. The historical backfill, which transcribes every in-scope episode published since January 1, 2021, completes the week of May 11. All transcripts are available now in Podchaser Pro and via the self-serve REST API; existing Enterprise GraphQL clients can upgrade to the expanded scope through their account team. There are no Pro coverage caps. The expansion is English-only and applies to transcripts only; sponsor and brand-safety detection have not changed.
If you’ve been a Podchaser user for any meaningful length of time, you’ve hit this wall: you search for a brand, an executive, a competitor, a topic, and the show that’s actually moving the conversation isn’t there. Not because we don’t have the podcast in our database. We have 6 million+ of them. But because we hadn’t transcribed it.
That changed today.
We’re growing English transcript coverage from the top 20,000 podcasts to approximately 150,000, and going back to transcribe every in-scope episode published since January 1, 2021. Phase 1 (ongoing transcription) is live as of May 5, 2026. Phase 2 (the historical backfill) follows the week of May 11, 2026.
This post is the canonical reference for what changed, what’s covered, what isn’t, and how to access it depending on whether you’re on Pro, the REST API, Enterprise GraphQL, or DaaS.
The shortest possible version:
| | Before | After |
|---|---|---|
| English podcasts with transcripts | Top 20,000 | ~150,000 |
| Coverage threshold | Top-of-chart bias | Power Score ≥ 1 (any meaningful audience) |
| Historical depth | Forward-looking only | Backfill to Jan 1, 2021 (~5 years) |
| Long-tail finance, health, B2B coverage | Patchy | Comprehensive |
| Pro access | Capped | Fully ungated |
| REST API access | Top-20K only | Full ~150K library |
| Enterprise GraphQL access | Existing scope | New tier-based scopes available |
Headline number: 7.5× as many shows. Plus five years of history that, until now, you couldn’t search at all.
Podcasts are where a lot of the most important conversations happen. Brands, agencies, and investors increasingly want to know what’s being said and where. Not just on the top-of-chart shows everyone already monitors, but on the niche shows where the actual signal lives.
The problem: until now, we only transcribed the top 20K English podcasts. That misses most of the long tail, which is where high-value, non-obvious conversations cluster: independent finance commentary, vertical B2B podcasts, health practitioner shows, regional industry pods, scene-specific culture pods.
Across our API and DaaS book, the same complaint kept coming back: “this podcast doesn’t have a transcript.” Several enterprise clients had explicitly asked for expanded coverage. The expansion is the answer.
All English podcasts with a Power Score of 1 or higher at the time the episode is published.
Power Score is our composite popularity rating on a 0–100 scale, factoring in 30+ signals (chart positions, listener engagement, social reach, production consistency, and more). A Power Score of 1 is our floor for “meaningful audience” — it cuts shows with effectively no listeners while keeping every podcast that matters.
In numeric terms: that’s approximately 150,000 English podcasts. About 7.5× the previous coverage.
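For readers who want to reason about the rule programmatically, here is a minimal sketch of the coverage test as this post describes it: English, Power Score of 1 or higher at publish time, published on or after January 1, 2021, and no longer than 5 hours. The `Episode` record and its field names are hypothetical and are not Podchaser API types. The per-episode exclusions (broken audio, private feeds, and so on) are not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

BACKFILL_START = date(2021, 1, 1)  # earliest publish date covered by the backfill
MIN_POWER_SCORE = 1                # floor for "meaningful audience"
MAX_DURATION_HOURS = 5             # longer episodes are not transcribed


@dataclass
class Episode:
    # Hypothetical record shape for illustration only.
    language: str                             # e.g. "en"
    published: date
    power_score_at_publish: Optional[float]   # None models a null Power Score
    duration_hours: float


def is_in_scope(ep: Episode) -> bool:
    """Return True if the episode meets the coverage rules described in this post."""
    if ep.language != "en":
        return False
    # The score is evaluated at publish time, not at query time.
    if ep.power_score_at_publish is None or ep.power_score_at_publish < MIN_POWER_SCORE:
        return False
    if ep.published < BACKFILL_START:
        return False
    return ep.duration_hours <= MAX_DURATION_HOURS
```

Because the score is checked at publish time, an episode released while its show sat below the threshold stays out of scope even if the show later climbs above it.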
For each in-scope show, transcripts now include:
Worth being explicit, because the carve-outs matter:
Those carve-outs cluster at the very bottom of the Power Score range. They account for about 10% of in-scope episodes in ongoing transcription and roughly 40% of the historical backfill (older audio is more likely to be broken or de-hosted).
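If you are sizing a pull, the two exclusion rates translate into a rough yield estimate. The sketch below is simple arithmetic on the approximate figures quoted above, and the function name is illustrative. The rates are averages, and since exclusions cluster at low Power Scores, a high-score slice of shows will likely do better than this estimate.

```python
ONGOING_EXCLUSION_PCT = 10   # "about 10%" of in-scope ongoing episodes
BACKFILL_EXCLUSION_PCT = 40  # "roughly 40%" of in-scope backfill episodes


def expected_transcripts(in_scope_episodes: int, exclusion_pct: int) -> int:
    """Estimate how many in-scope episodes end up with a transcript."""
    if in_scope_episodes < 0:
        raise ValueError("in_scope_episodes must be non-negative")
    if not 0 <= exclusion_pct <= 100:
        raise ValueError("exclusion_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return in_scope_episodes * (100 - exclusion_pct) // 100
```

For example, 1,000 in-scope backfill episodes would yield an estimated 600 transcripts, while 1,000 in-scope ongoing episodes would yield about 900.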
A few things this expansion explicitly doesn’t do:
Existing API and DaaS clients with transcript access. The most direct value. Several have asked for this for over a year. If you’re on Enterprise GraphQL or DaaS today, talk to your account team about upgrading scope (more on access mechanics below).
Finance, hedge fund, and investment research teams. Long-tail finance and macro commentary (independent analysts, niche markets podcasts, sector-specific shows) is where alpha-generating signal lives. The 5-year backfill matters especially here: you can backtest mention patterns against price moves rather than starting your monitoring from scratch.
Brand monitoring agencies and B2B intelligence platforms. If your product surfaces brand or competitor mentions for clients, the long tail was a known gap. Now it’s not.
Pro power users who live in transcript search and alerts. Especially in niche topics, your alerts will start surfacing shows you didn’t know existed because nothing had been transcribed there before.
Who this isn’t for: clients focused on non-English markets, or clients only working with top-of-chart podcasts. If you only work with top-of-chart shows, you’re already covered; non-English coverage isn’t part of this launch.
Honest take: a few other vendors transcribe at this scale. So scale-of-coverage isn’t, on its own, a defensible moat.
What’s much rarer is a multi-year backfill offered as an ongoing capability: not just transcribing forward from your sign-up date, but giving you 5+ years of history in one query. Most vendors that compete on transcripts don’t offer that.
The bigger story, though, is what the transcripts are next to. At Podchaser, transcripts sit alongside:
You can query all of it together — through Enterprise GraphQL, through the REST API, in S3 dumps via DaaS, or in Pro. A transcript search that filters by audience demographics and brand-safety risk in one query is a different product from a transcript search that just returns a list of mentions.
Transcripts are commoditizing. Our data, in context, is the moat.
Everything is available immediately. No coverage caps, no upsell, no gating. Search, alerts, monitoring reports, episode discovery — all of it now runs against the full ~150K library and the full historical backfill.
If you’ve ever had a Pro alert that consistently returned nothing, this is the moment to revisit it. The shows you were missing probably weren’t missing; they just weren’t transcribed.
All transcripts are available. The full set, no cap, no tier gate. Pull and search across the expanded library starting today.
If you’re not yet building on the API, start at podchaser.com/api.
This is where the access mechanics matter most. A few things to know:
Any transcript can be included in an S3 dump. Existing DaaS clients don’t get the expansion automatically; it requires an upgrade. Unlike the API, DaaS pricing can be split: you can buy just the backfill, just ongoing coverage, or both, scoped to specific shows or Power Score thresholds.
Talk to your account team for the expanded dump.
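To make the DaaS split concrete, here is a hypothetical model of a dump scope: backfill, ongoing, or both, optionally narrowed to specific shows or a Power Score threshold. This is a sketch for thinking through what to ask your account team for. `DumpScope` and its fields are invented for illustration and do not reflect Podchaser's actual configuration format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DumpScope:
    # Hypothetical scope model; not a Podchaser configuration schema.
    include_backfill: bool
    include_ongoing: bool
    show_ids: Optional[FrozenSet[str]] = None   # None means "all shows"
    min_power_score: Optional[float] = None     # None means "no extra threshold"

    def __post_init__(self) -> None:
        if not (self.include_backfill or self.include_ongoing):
            raise ValueError("scope must include backfill, ongoing coverage, or both")

    def covers(self, show_id: str, power_score: float, is_backfill_episode: bool) -> bool:
        """Return True if an episode with these attributes falls inside the scope."""
        if is_backfill_episode and not self.include_backfill:
            return False
        if not is_backfill_episode and not self.include_ongoing:
            return False
        if self.show_ids is not None and show_id not in self.show_ids:
            return False
        if self.min_power_score is not None and power_score < self.min_power_score:
            return False
        return True
```

A backfill-only scope with a threshold would look like `DumpScope(include_backfill=True, include_ongoing=False, min_power_score=10)`; calling `covers("show-a", 25, True)` on it returns `True`, while the same show's newly published episodes fall outside it.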
A few use cases that were marginal at top-20K coverage and become viable at ~150K:
Because the long-tail coverage and the multi-year backfill make this the first time the full Podchaser graph has been usable for those workflows end-to-end.
As of May 5, 2026, Podchaser transcribes approximately 150,000 English podcasts with a Power Score of 1 or higher, including every in-scope episode published since January 1, 2021. Coverage was previously limited to the top 20,000 English podcasts.
Yes. The self-serve REST API has full access to the expanded ~150K-podcast library and the 2021-onward backfill, with no per-tier coverage cap. See podchaser.com/api for documentation.
No. Existing Enterprise GraphQL clients keep their current scope and do not automatically receive the expanded transcript coverage. Account managers can configure expanded scope on a per-client basis.
In the API (REST and GraphQL), no — you receive all transcripts within your scope as a single product. In DaaS / S3 dumps, yes — coverage can be split between just the historical backfill, just ongoing transcription, or both, and scoped to specific shows or Power Score thresholds.
Yes. There are no coverage caps in Pro. All ~150,000 podcasts in scope and the full 2021-onward backfill are available across Pro search, alerts, monitoring reports, and episode discovery.
5 hours per episode. Episodes longer than 5 hours will not be transcribed.
Non-English podcasts, podcasts with a Power Score of 0 or null, and a small set of in-scope episodes that hit specific exclusions: T&C violations, broken or video-only audio, private feeds, podcasts mis-tagged as English, spam or ad-only feeds, music podcasts, extreme-output feeds, and duplicate content. These exclusions account for approximately 10% of in-scope episodes ongoing and 40% of the historical backfill.
No. This expansion covers transcripts only. Sponsor detection and brand-safety scoring continue to run on the original top-20K transcript footprint while the team improves moderation upstream.
No measurable degradation is expected. Podchaser’s new internal Transcript Hub gives the team better pipeline visibility than before and is designed to catch processing issues earlier. Latency is not guaranteed to improve, but it can now be measured and managed precisely.
Existing transcripts remain available to clients who already had access. New episodes released while a podcast is below the Power Score 1 threshold will not be transcribed. Keeping existing transcripts in place protects historical search results from changing under clients.
No. The expansion is English-only. Non-English coverage is not part of this launch.
Trent Anderson is Head of Growth & Strategy at Podchaser, where he leads marketing, demand generation, and product positioning across Pro and the API.