AI-Powered Internet Radio Show Search Engine
Built with Python, Supabase, OpenAI Whisper, and PostgreSQL
This is a search engine I built for the internet radio show Time Crisis with Ezra Koenig, hosted on Apple Music 1.
Essentially, I wanted to be able to look up words, artists, and songs from previous episodes and reference them when they get talked about in future episodes. The show is very self-referential (Borgesian, some might call it). So having a library of transcripts from the show becomes essential if you want to get the full joke of all the callbacks and recurring themes that make Time Crisis with Ezra Koenig so memorable and fun to listen to. I listened back to the show from the very beginning fairly recently, and I was struck by how much of a time capsule it is for the 2015 to 2020 era of American music and culture. Having an easily searchable record of transcripts for this internet radio show lets real TC heads connect the threads, the dots, and the so-and-so of the Time Crisis universe.
Funny enough, the week that I published this search engine on the Time Crisis subreddit, I scrolled back and saw that another TC head had also created a transcript search engine. I want to assure the TC community and the greater world that we arrived at the idea independently, and in any case a transcript search is not the most novel idea. Either way, this started out as my own way of looking up references from previous episodes and finding specific segments so I could share the show with my friends. I hope it continues to enable this sort of lore-sharing among future Time Crisis heads and other listeners.
Apple Music and Apple Podcasts have a transcript feature for individual episodes, but there's no way to search across a show's entire transcript history on the platform, which is why I wanted to create this website.
The show now has 244 episodes, which meant transcribing 488 hours (about two hours per episode) of audio, music, and conversation.
For the transcripts, I initially used whisper.cpp to transcribe the episodes one at a time. Eventually, I moved on to batch processing with various scripts using this setup. Then I found a Mac application called MacWhisper, which made things a hundred times easier. Especially now that I'm beginning to assign speakers and add further detail to the transcripts, MacWhisper has a good suite of features for the job.
The transcripts themselves are broken up into timestamped segments. MacWhisper does a pretty good job of recognizing speakers and separating their segments, which lets the reader see the flow of the conversation. I then used the segment timestamps to implement jump-to-point navigation from the search results to the individual episode page.
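As a rough illustration of the jump-to-point idea, here is a minimal Python sketch. It assumes segment timestamps look like "HH:MM:SS" or "MM:SS" and that an episode page accepts a `t` query parameter in seconds. The URL shape and both function names are hypothetical and not taken from the actual site.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'HH:MM:SS' or 'MM:SS' into whole seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        # Each step to the right is a smaller unit, so scale what we have by 60.
        seconds = seconds * 60 + int(part)
    return seconds


def jump_link(episode_number: int, stamp: str) -> str:
    """Build a hypothetical episode URL that carries the segment start time."""
    query = urlencode({"t": timestamp_to_seconds(stamp)})
    return f"/episode/{episode_number}?{query}"


print(jump_link(12, "01:02:03"))  # /episode/12?t=3723
```

The episode page can then read `t` and seek the player or scroll to the matching segment.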
The current setup is a Supabase backend paired with a Cloudflare Pages front-end deployment. Because the front end talks to the transcript database through an API I created within Supabase, I can upload and update transcripts in Supabase without needing to push updates to the Cloudflare Pages deployment.
To load the transcripts into the site, I set up a simple bucket of transcript CSV files. When a user navigates to an episode's page, the page template uses the episode number in the URL to find the matching file and populate the page.
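To give a sense of what reading one of those CSV files might look like, here is a small Python sketch. The column names (`start`, `speaker`, `text`) and the sample rows are assumptions made for illustration. The real files may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Segment:
    start: str    # segment start time, e.g. "00:00:05"
    speaker: str  # speaker label assigned during transcription
    text: str     # what was said in this segment


def parse_transcript(csv_text: str) -> list[Segment]:
    """Parse transcript CSV text (assumed header: start,speaker,text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Segment(row["start"], row["speaker"], row["text"]) for row in reader]


# Placeholder rows, not real show content.
SAMPLE = """start,speaker,text
00:00:05,Speaker 1,"Welcome back to the show."
00:00:09,Speaker 2,"Great to be here, as always."
"""

segments = parse_transcript(SAMPLE)
for seg in segments:
    print(f"[{seg.start}] {seg.speaker}: {seg.text}")
```

Using `csv.DictReader` rather than splitting on commas keeps quoted text with commas inside it intact, which matters for conversational transcripts.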
I added extra features to the search, like sorting by episode age (so you can see how a term has been used throughout the course of the show) or by the number of hits (which lets you see the episodes where something is discussed the most). These were basic features that were easy to implement with Claude Code and a good knowledge of the codebase.
For the front-end design, I went for a Google imitation, but with the classic Jokerman font, which has been a topic of discussion on the show in the past. There's a dark and light mode with system preference detection. There are browsing pages for looking at the complete episode list and the complete guest list of the show. The design is also responsive, allowing for easy navigation on phones, tablets, and small screens.
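The two sort orders can be sketched in a few lines of Python. This is only an illustration of the idea, assuming segments arrive as `(episode_number, text)` pairs. The function names and sample data are made up, and the live site may do this work in the database instead.

```python
from collections import Counter


def count_hits(segments, term):
    """Count case-insensitive occurrences of term per episode.

    segments is an iterable of (episode_number, text) pairs.
    """
    needle = term.lower()
    hits = Counter()
    for episode, text in segments:
        n = text.lower().count(needle)
        if n:
            hits[episode] += n
    return hits


def sort_results(hits, by="episode"):
    """Return (episode, hit_count) pairs, oldest episode first or most hits first."""
    if by == "hits":
        # Most mentions first; ties are broken by episode number.
        return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return sorted(hits.items())


# Placeholder data, not real transcript text.
SEGMENTS = [
    (1, "guitar talk and more guitar talk"),
    (2, "no match here"),
    (3, "one guitar mention"),
    (3, "guitar, guitar, guitar"),
]

hits = count_hits(SEGMENTS, "Guitar")
print(sort_results(hits, by="episode"))  # [(1, 2), (3, 4)]
print(sort_results(hits, by="hits"))     # [(3, 4), (1, 2)]
```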
I don't have any analytics implemented at the moment, but that would be a future addition to the site in order to understand what TC heads are searching for the most.
When it came to challenges for this project, I initially started with a Flask backend deployed on Google Cloud Platform, which was admittedly too much of a hassle. Once I ran out of free Google Cloud credits, it began charging me way more than it should cost to host something like this. I immediately made the jump to Supabase in order to save money on a not-too-intensive application.
Migrating the site was pretty easy. I had initially purchased the domain through Cloudflare, so I just set up the front end on Cloudflare Pages, made all the front-end changes I wanted, and configured Supabase. With the Supabase and Cloudflare duo working together, all I had to do was shut down the Google Cloud Platform instance and switch the DNS records to point to Cloudflare Pages instead.
As for financial results, the initial deployment of this search engine would have cost me $30 a month for some reason via Google Cloud Platform. I probably didn't choose the right service to minimize costs, but I had free credits with GCP, so I wasn't too worried about it. That is, until I ran out of free credit and the bills started to come in.
As for impact, my friends and I who listen to the show love using the site to reference episodes from the past and to find specific bits that we want to re-listen to. I've shared it with the Time Crisis subreddit as well as with the show itself via their email.
This project as a whole got me familiar with the challenges behind audio transcription and the process of turning transcriptions into a viewable webpage. I think that in the future I could use the basics I built for this site to provide API access for marketing. If companies wanted to see when their products were mentioned on an internet radio show, that information is often not easily accessible via Google, DuckDuckGo, Bing, etc. As for now, I consider the build itself to be finished, and I will keep the site updated as new episodes release.
TC forever.
Experience the power of searchable internet radio show transcripts and connect the threads of the Time Crisis universe.