There are hundreds of thousands of podcasts out there, many with hundreds of episodes chock-full of really useful information and awesome songs. The problem is that it's all locked up in a lifetime's worth of audio, unindexed and unsearchable.
Podiki detects songs and transcribes speech in podcasts so that their content can be searched, linked, indexed and updated.
Podiki has two parts: the podcast processing and a wiki.
Podiki crawls new episodes of submitted podcasts and extracts all the speech and song data. As users correct the transcribed text, their corrections create a feedback loop that updates the linguistic model used to transcribe future episodes.
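To make the feedback loop concrete, here is a minimal sketch in Scala. It is not Podiki's actual code: the `Correction` type, the `update` and `updateAll` functions, and the idea of representing the linguistic model as plain word counts are all assumptions made for illustration. The real model used with Sphinx4 is likely richer than this.

```scala
object FeedbackSketch {
  // Hypothetical record of one user correction: the machine transcript
  // and the text the user replaced it with.
  final case class Correction(original: String, corrected: String)

  // Assumed stand-in for the linguistic model: simple word counts.
  type WordCounts = Map[String, Int]

  // Lower-case the text and split it on anything that is not a letter,
  // a digit or an apostrophe.
  private def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}']+").filter(_.nonEmpty).toSeq

  // Fold the corrected text into the counts, so words that users had to
  // type in carry more weight the next time the model is rebuilt.
  def update(counts: WordCounts, c: Correction): WordCounts =
    tokenize(c.corrected).foldLeft(counts) { (acc, word) =>
      acc.updated(word, acc.getOrElse(word, 0) + 1)
    }

  // Apply a batch of corrections in order.
  def updateAll(counts: WordCounts, cs: Seq[Correction]): WordCounts =
    cs.foldLeft(counts)(update)
}
```

Only the corrected text is counted here; the original machine output is kept on the record so that a fuller implementation could also learn which mistakes are common.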
Song information is determined using EchoPrint; speech detection and transcription use the Sphinx4 library.
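As a rough picture of what the extracted data might look like once it is searchable, the sketch below models an episode as a list of timed segments, each either a song or a stretch of speech. All names here (`Segment`, `Song`, `Speech`, `Episode`, `search`) are hypothetical, and neither EchoPrint nor Sphinx4 is called; the segments are assumed to have been produced by them already.

```scala
object SegmentSketch {
  // A timed slice of an episode, in seconds from the start of the audio.
  sealed trait Segment { def startSec: Double; def endSec: Double }

  // Assumed shape of a song match.
  final case class Song(startSec: Double, endSec: Double,
                        artist: String, title: String) extends Segment

  // Assumed shape of a transcribed stretch of speech.
  final case class Speech(startSec: Double, endSec: Double,
                          text: String) extends Segment

  final case class Episode(podcast: String, title: String, segments: Seq[Segment])

  // The text each kind of segment contributes to search.
  def searchableText(s: Segment): String = s match {
    case Song(_, _, artist, title) => s"$artist $title"
    case Speech(_, _, text)        => text
  }

  // Case-insensitive substring search that returns every matching segment
  // together with the episode it belongs to.
  def search(episodes: Seq[Episode], query: String): Seq[(Episode, Segment)] = {
    val q = query.toLowerCase
    for {
      e <- episodes
      s <- e.segments
      if searchableText(s).toLowerCase.contains(q)
    } yield (e, s)
  }
}
```

Because each hit carries its start time, a search result can link straight to the moment in the episode where the words were spoken or the song was played. A linear scan like this would not scale to a large catalogue; a real index would be needed there.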
The background processing is written in Scala and is currently backed by Redis.