My primary goal was to make this really easy for anyone to use, but also to explain how I built it in case you'd prefer to build your own.
ElevenLabs is pretty awesome, to be honest. It's powerful: it lets you convert text to speech, speech to speech, and a whole lot more. It has a dead simple API you can use to generate tons of audio files, and it even has one of the coolest features around: