Podchaser Expands Podcast Transcript Coverage to 150K Shows + 5-Year Backfill [2026]

  • Trent Anderson
  • As of May 5, 2026, Podchaser’s English podcast transcript coverage expanded from the top 20,000 podcasts to approximately 150,000, a 7.5× increase that captures the long tail of finance, B2B, health, and niche-topic shows that previously had no searchable transcript. The historical backfill, which transcribes every in-scope episode published since January 1, 2021, completes the week of May 11. All transcripts are available now in Podchaser Pro and via the self-serve REST API; existing Enterprise GraphQL clients can upgrade to the expanded scope through their account team. There are no Pro coverage caps. The expansion is English-only and applies to transcripts only; sponsor and brand-safety detection have not changed.


    If you’ve been a Podchaser user for any meaningful length of time, you’ve hit this wall: you search for a brand, an executive, a competitor, a topic, and the show that’s actually moving the conversation isn’t there. Not because we don’t have the podcast in our database. We have 6 million+ of them. But because we hadn’t transcribed it.

    That changed today.

    We’re growing English transcript coverage from the top 20,000 podcasts to approximately 150,000, and going back to transcribe every in-scope episode published since January 1, 2021. Phase 1 (ongoing transcription) is live as of May 5, 2026. Phase 2 (the historical backfill) follows the week of May 11, 2026.

    This post is the canonical reference for what changed, what’s covered, what isn’t, and how to access it depending on whether you’re on Pro, the REST API, Enterprise GraphQL, or DaaS.

    What changed

    The shortest possible version:

    Feature | Before | After
    English podcasts with transcripts | Top 20,000 | ~150,000
    Coverage threshold | Top-of-chart bias | Power Score ≥ 1 (any meaningful audience)
    Historical depth | Forward-looking only | Backfill to Jan 1, 2021 (~5 years)
    Long-tail finance, health, B2B coverage | Patchy | Comprehensive
    Pro access | Capped | Fully ungated
    REST API access | Top-20K only | Full ~150K library
    Enterprise GraphQL access | Existing scope | New tier-based scopes available

    Headline number: 7.5× more shows. Plus five years of history that, until now, you couldn’t search at all.

    Why we did this

    Podcasts are where a lot of the most important conversations happen. Brands, agencies, and investors increasingly want to know what’s being said and where. Not just on the top-of-chart shows everyone already monitors, but on the niche shows where the actual signal lives.

    The problem: until now, we only transcribed the top 20K English podcasts. That misses most of the long tail, which is where high-value, non-obvious conversations cluster: independent finance commentary, vertical B2B podcasts, health practitioner shows, regional industry pods, scene-specific culture pods.

    Across our API and DaaS client base, the same complaint kept coming back: “this podcast doesn’t have a transcript.” Several enterprise clients had explicitly asked for expanded coverage. This expansion is the answer.

    What’s actually covered

    All English podcasts with a Power Score of 1 or higher at the time the episode is published.

    Power Score is our composite popularity rating on a 0–100 scale, factoring in 30+ signals (chart positions, listener engagement, social reach, production consistency, and more). A Power Score of 1 is our floor for “meaningful audience” — it cuts shows with effectively no listeners while keeping every podcast that matters.

    In numeric terms, that’s approximately 150,000 English podcasts, about 7.5× the previous coverage.

    For each in-scope show, transcripts now include:

    • All new episodes as they publish (Phase 1 — live May 5, 2026)
    • All episodes published since January 1, 2021 (Phase 2 — completes week of May 11, 2026)
    • Up to 5 hours per episode (our max transcript length)
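    The eligibility rules above can be read as a single predicate. The sketch below is a minimal illustration, assuming a hypothetical Episode record (not Podchaser’s actual schema) that carries the detected language, the podcast’s Power Score at publish time, the publish date, and the duration. It models only the stated thresholds; the per-episode carve-outs described in the next section are not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Thresholds as stated in this post.
MIN_POWER_SCORE = 1.0
BACKFILL_START = date(2021, 1, 1)
MAX_DURATION_SECONDS = 5 * 60 * 60  # 5-hour maximum transcript length


@dataclass
class Episode:
    # Hypothetical record shape for illustration, not Podchaser's schema.
    language: str                            # detected audio language, e.g. "en"
    power_score_at_publish: Optional[float]  # None models a null Power Score
    published: date
    duration_seconds: int


def is_in_scope(ep: Episode) -> bool:
    """Approximate the published eligibility rules (carve-outs not modeled)."""
    if ep.language != "en":
        return False
    score = ep.power_score_at_publish
    # Power Score is checked as of publish time; 0 or null is out of scope.
    if score is None or score < MIN_POWER_SCORE:
        return False
    if ep.published < BACKFILL_START:
        return False
    return ep.duration_seconds <= MAX_DURATION_SECONDS
```

    For example, an English episode from June 2022 with a Power Score of 3 and a one-hour runtime is in scope; the same episode with a Power Score of 0, or running one second past five hours, is not.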

    What’s NOT covered

    Worth being explicit, because the carve-outs matter:

    • Non-English podcasts. Coverage remains English-only.
    • Shows with Power Score 0 or null. These are typically inactive feeds, podcasts with no measurable audience, or shows where we don’t have enough signal to score them yet.
    • A small set of in-scope episodes that hit specific exclusions: T&C violations, broken or video-only audio, private feeds, podcasts that say they’re English but actually aren’t, spam or ad-only feeds, music podcasts, “extreme output” feeds (some shows publish hundreds of episodes a week), and duplicate content.

    Those carve-outs cluster at the very bottom of the Power Score range and account for about 10% of in-scope episodes ongoing and roughly 40% of the historical backfill (older audio is more likely to be broken or de-hosted).
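    For rough volume planning, those two exclusion rates translate into a simple estimate. This is a back-of-the-envelope sketch, assuming the quoted ~10% and ~40% figures apply as flat averages; the function name is hypothetical. Because exclusions cluster at low Power Scores, a set of higher-scoring shows will likely see fewer exclusions than this, and a set of low-scoring shows more.

```python
# Approximate exclusion rates quoted in this post, in whole percent.
ONGOING_EXCLUSION_PCT = 10
BACKFILL_EXCLUSION_PCT = 40


def expected_transcripts(in_scope_episodes: int, backfill: bool) -> int:
    """Rough count of in-scope episodes likely to end up with a transcript.

    Integer arithmetic avoids float rounding; the result is rounded down.
    """
    if in_scope_episodes < 0:
        raise ValueError("episode count cannot be negative")
    pct = BACKFILL_EXCLUSION_PCT if backfill else ONGOING_EXCLUSION_PCT
    return in_scope_episodes * (100 - pct) // 100
```

    For a hypothetical 1,000 in-scope episodes, expected_transcripts(1000, backfill=False) returns 900 and expected_transcripts(1000, backfill=True) returns 600.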

    A few things this expansion explicitly doesn’t do:

    • It’s not a Pro feature gate. Pro users get all of it.
    • It doesn’t change sponsor or brand-safety detection; those remain on the original top-20K transcript footprint while we work on better moderation.
    • It doesn’t introduce new data points, surfaces, formats, or products. It’s transcripts. There are just a lot more of them now.

    Who this is for

    Existing API and DaaS clients with transcript access. This is where the value is most direct; several of these clients have asked for this for over a year. If you’re on Enterprise GraphQL or DaaS today, talk to your account team about upgrading scope (more on access mechanics below).

    Finance, hedge fund, and investment research teams. Long-tail finance and macro commentary (independent analysts, niche-market podcasts, sector-specific shows) is where alpha-generating signal lives. The 5-year backfill matters here especially: you can backtest mention patterns against price moves rather than starting your monitoring from scratch.
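    The first step of such a backtest is usually a mention time series. The sketch below assumes you have already exported (publish date, transcript text) pairs through Pro, the API, or a DaaS dump; the record shape and the monthly_mentions name are hypothetical, no Podchaser API call is shown, and aligning the series with price data is left out.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def monthly_mentions(transcripts: Iterable[Tuple[date, str]], term: str) -> Counter:
    """Count case-insensitive, whole-word mentions of `term` per (year, month)."""
    # Word boundaries assume `term` starts and ends with a word character.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts: Counter = Counter()
    for published, text in transcripts:
        hits = len(pattern.findall(text))
        if hits:
            counts[(published.year, published.month)] += hits
    return counts


sample = [  # made-up transcript snippets for illustration only
    (date(2021, 3, 2), "We covered Acme Corp, then acme again."),
    (date(2021, 3, 20), "Nothing relevant here."),
    (date(2024, 7, 9), "ACME earnings came up."),
]
counts = monthly_mentions(sample, "acme")
print(counts[(2021, 3)], counts[(2024, 7)])  # 2 1
```

    Keying by (year, month) keeps the series easy to join against monthly market data; swap in ISO weeks if you need finer resolution.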

    Brand monitoring agencies and B2B intelligence platforms. If your product surfaces brand or competitor mentions for clients, the long tail was a known gap. Now it’s not.

    Pro power users who live in transcript search and alerts. Especially in niche topics, your alerts will start surfacing shows you didn’t know existed because nothing had been transcribed there before.

    Who this isn’t for: clients focused on non-English markets, since coverage remains English-only. And if you work only with top-of-chart podcasts, you were already covered.

    Versus the competition

    Honest take: a few other vendors transcribe at this scale. So scale-of-coverage isn’t, on its own, a defensible moat.

    What’s much rarer is offering a multi-year backfill as an ongoing capability: not just transcribing forward from your sign-up date, but giving you 5+ years of history in one query. Most vendors that compete on transcripts don’t.

    The bigger story, though, is what the transcripts are next to. At Podchaser, transcripts sit alongside:

    • Reach estimates (per-podcast and per-episode audience size)
    • Audience demographics (age, gender, income, parental status, geography, occupations, interests, brand affinities)
    • Power Score (our 0–100 popularity ranking)
    • Verified contacts (host, producer, booker — by role)
    • Credits graph (host and guest credits across shows and episodes)
    • Network rollups (publisher-level views and aggregates)
    • Brand safety scoring (on the original top-20K footprint, for now)

    You can query all of it together — through Enterprise GraphQL, through the REST API, in S3 dumps via DaaS, or in Pro. A transcript search that filters by audience demographics and brand-safety risk in one query is a different product from a transcript search that just returns a list of mentions.

    Transcripts are commoditizing. Our data, in context, is the moat.

    How to access — by surface

    Podchaser Pro

    Everything is available immediately. No coverage caps, no upsell, no gating. Search, alerts, monitoring reports, episode discovery — all of it now runs against the full ~150K library and the full historical backfill.

    If you’ve ever had a Pro alert that consistently returned nothing, this is the moment to revisit it. The shows you were missing probably weren’t missing; they just weren’t transcribed.

    Self-serve REST API

    All transcripts are available. The full set, no cap, no tier gate. Pull and search across the expanded library starting today.

    If you’re not yet building on the API, start at podchaser.com/api.

    Enterprise GraphQL

    This is where the access mechanics matter most. A few things to know:

    • Existing clients keep their current scope. You don’t automatically get the new transcripts. Talk to your account manager about upgrading.
    • New clients can purchase full coverage, or coverage capped at the Top X English podcasts by Power Score. The product team built configurable limits in our admin tooling for this exact reason.
    • Access is set at the moment an episode publishes. If a podcast is in scope when episode A drops, that transcript is permanent for the client. If the podcast falls out of scope before episode B, episode B won’t be available — but episode A still will be. This keeps historical search results stable and predictable.
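    The publish-time rule is easiest to see as a small state machine. This is a minimal sketch, not Podchaser’s implementation: ClientScope, on_episode_published, and the podcast_rank input (the podcast’s Power Score rank among English podcasts when the episode drops) are all hypothetical names. Access is decided once per episode and stored, never recomputed.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ClientScope:
    """Hypothetical model of publish-time transcript entitlement for one client."""

    top_x: Optional[int] = None  # None = full coverage; else cap at Top X by Power Score
    entitled: Set[str] = field(default_factory=set)

    def on_episode_published(
        self, episode_id: str, podcast_rank: Optional[int], power_score: Optional[float]
    ) -> None:
        # Scope is evaluated once, using the podcast's standing at publish time.
        if power_score is None or power_score < 1:
            return
        if self.top_x is not None and (podcast_rank is None or podcast_rank > self.top_x):
            return
        # Once granted, access is never revoked, even if the podcast later drops.
        self.entitled.add(episode_id)

    def can_read(self, episode_id: str) -> bool:
        return episode_id in self.entitled


client = ClientScope(top_x=20_000)
client.on_episode_published("ep-A", podcast_rank=15_000, power_score=42.0)  # in scope
client.on_episode_published("ep-B", podcast_rank=25_000, power_score=38.0)  # out of Top X
print(client.can_read("ep-A"), client.can_read("ep-B"))  # True False
```

    Storing the decision, rather than re-deriving it from the podcast’s current Power Score, is what keeps a client’s historical search results stable when a show later falls out of scope.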

    DaaS (S3 dumps)

    Any transcript can be included in an S3 dump. Existing DaaS clients don’t automatically get the expansion unless they upgrade. Unlike the API, DaaS pricing can be split — you can buy just the backfill, just ongoing coverage, or both, scoped to specific shows or Power Score thresholds.
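    The split options combine into a per-episode filter. The sketch below is illustrative only: the Coverage enum, include_in_dump, and its min_power_score and shows parameters are hypothetical names, and it assumes the Phase 1 launch date (May 5, 2026) is the boundary between “backfill” and “ongoing.” Actual dump configuration is set up with your account team.

```python
from datetime import date
from enum import Enum
from typing import FrozenSet, Optional

BACKFILL_START = date(2021, 1, 1)
ONGOING_START = date(2026, 5, 5)  # assumption: Phase 1 launch splits backfill from ongoing


class Coverage(Enum):
    BACKFILL = "backfill"
    ONGOING = "ongoing"
    BOTH = "both"


def include_in_dump(
    podcast_id: str,
    published: date,
    power_score: float,  # Power Score at publish time
    coverage: Coverage,
    min_power_score: float = 1.0,
    shows: Optional[FrozenSet[str]] = None,
) -> bool:
    """Decide whether one episode belongs in a hypothetical DaaS dump configuration."""
    threshold = max(min_power_score, 1.0)  # nothing below Power Score 1 is transcribed
    if published < BACKFILL_START or power_score < threshold:
        return False
    is_backfill = published < ONGOING_START
    if coverage is Coverage.BACKFILL and not is_backfill:
        return False
    if coverage is Coverage.ONGOING and is_backfill:
        return False
    # An explicit show list narrows the dump further; None means no show filter.
    return shows is None or podcast_id in shows
```

    For example, a February 2023 episode from a show with a Power Score of 12 is included in a backfill-only dump but not an ongoing-only one; raising min_power_score or passing a show list narrows either option further.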

    Talk to your account team for the expanded dump.

    What this unblocks

    A few use cases that were marginal at top-20K coverage and become viable at ~150K:

    • Finance and macro signal monitoring — independent analyst pods, sector-specific shows, niche commentary
    • Vertical B2B competitive intel — small-but-influential industry podcasts that move buying decisions
    • Health and wellness narrative tracking — practitioner pods, niche modality shows, vertical health communities
    • Brand monitoring beyond the top of chart — finding the actual conversations driving sentiment shifts, not just the obvious ones
    • Historical comparative analysis — five years of history you can query against in one shot, instead of monitoring forward from now

    Together, the long-tail coverage and the multi-year backfill make this the first time the full Podchaser graph has been usable for those workflows end-to-end.

    How to get started

    • Pro user: Just log in. Your alerts and searches now run against the full library.
    • Existing API or DaaS client: Contact your account manager about expanding scope.
    • Prospective API user: developers.podchaser.com — keys are self-serve.
    • Prospective Enterprise / DaaS client: Contact sales and we’ll match coverage to your use case.

    Frequently asked questions

    What is podcast transcript coverage at Podchaser today?

    As of May 5, 2026, Podchaser transcribes approximately 150,000 English podcasts with a Power Score of 1 or higher, including every in-scope episode published since January 1, 2021. Coverage was previously limited to the top 20,000 English podcasts.

    Is the new transcript coverage available in the Podchaser REST API?

    Yes. The self-serve REST API has full access to the expanded ~150K-podcast library and the 2021-onward backfill, with no per-tier coverage cap. See podchaser.com/api for documentation.

    Are existing Enterprise GraphQL clients automatically upgraded?

    No. Existing Enterprise GraphQL clients keep their current scope and do not automatically receive the expanded transcript coverage. Account managers can configure expanded scope on a per-client basis.

    Can I buy just the historical backfill?

    In the API (REST and GraphQL), no — you receive all transcripts within your scope as a single product. In DaaS / S3 dumps, yes — coverage can be split between just the historical backfill, just ongoing transcription, or both, and scoped to specific shows or Power Score thresholds.

    Will Podchaser Pro users get the new transcripts?

    Yes. There are no coverage caps in Pro. All ~150,000 podcasts in scope and the full 2021-onward backfill are available across Pro search, alerts, monitoring reports, and episode discovery.

    What’s the maximum transcript length?

    5 hours per episode. Episodes longer than 5 hours will not be transcribed.

    What podcasts are excluded from the expansion?

    Non-English podcasts, podcasts with a Power Score of 0 or null, and a small set of in-scope episodes that hit specific exclusions: T&C violations, broken or video-only audio, private feeds, podcasts mis-tagged as English, spam or ad-only feeds, music podcasts, extreme-output feeds, and duplicate content. These exclusions account for approximately 10% of in-scope episodes ongoing and 40% of the historical backfill.

    Does the expansion include sponsor or brand-safety detection?

    No. This expansion covers transcripts only. Sponsor detection and brand-safety scoring continue to run on the original top-20K transcript footprint while the team improves moderation upstream.

    Will turnaround time get worse with the volume increase?

    No measurable degradation is expected. Podchaser’s new internal Transcript Hub gives the team better pipeline visibility than before and is designed to catch processing issues earlier. Latency is not guaranteed to improve, but it can now be measured and managed precisely.

    What happens if a podcast’s Power Score drops below 1 after we transcribe it?

    Existing transcripts remain available to clients who already had access. New episodes released while that podcast is below the Power Score 1 threshold will not be transcribed. This protects historical search results from changing under clients.

    Is the new coverage available in non-English markets?

    No. The expansion is English-only. Non-English coverage is not part of this launch.


    Trent Anderson is Head of Growth & Strategy at Podchaser, where he leads marketing, demand generation, and product positioning across Pro and the API.