HSC2024: Open audio dataset

About

This training dataset was originally designed for the Helsinki Speech Challenge 2024 (HSC2024). Although it was created with the challenge in mind, it is also a useful resource for developing and testing audio algorithms in other applications.

The dataset features clean speech samples generated by OpenAI’s text-to-speech model, paired with corresponding recorded signals. These recorded signals are purposefully distorted by real-world effects such as filtering and reverb, offering a realistic testing ground for your audio processing algorithms.
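To get a feel for what a clean/recorded pair looks like before downloading anything, the kind of degradation described above can be roughly imitated in software. The following is a minimal sketch only: the recorded signals in the dataset come from real-world effects, not from this simulation, and the function names, the 16 kHz sample rate, the moving-average filter, the decaying-noise impulse response and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np

def simulate_distortion(clean, sample_rate=16000, rt60=0.3, kernel_size=8, seed=0):
    """Hypothetical helper: crude low-pass filtering followed by synthetic reverb."""
    rng = np.random.default_rng(seed)
    # Moving-average kernel acts as a simple low-pass filter.
    kernel = np.ones(kernel_size) / kernel_size
    filtered = np.convolve(clean, kernel, mode="same")
    # Synthetic impulse response: noise whose amplitude decays by 60 dB over rt60 seconds.
    n_ir = int(rt60 * sample_rate)
    t = np.arange(n_ir) / sample_rate
    ir = rng.standard_normal(n_ir) * np.exp(-np.log(1000.0) * t / rt60)
    ir[0] = 1.0  # direct sound path
    reverberant = np.convolve(filtered, ir)[: len(clean)]
    # Rescale so the distorted signal has the same peak level as the clean one.
    peak = np.max(np.abs(reverberant))
    if peak > 0:
        reverberant = reverberant * (np.max(np.abs(clean)) / peak)
    return reverberant

def snr_db(clean, degraded, eps=1e-12):
    """Signal-to-distortion ratio in dB, treating (degraded - clean) as the error."""
    error = degraded - clean
    return 10.0 * np.log10((np.sum(clean**2) + eps) / (np.sum(error**2) + eps))

# Stand-in for a clean speech sample: a one-second amplitude-modulated tone.
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * time) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * time))

recorded = simulate_distortion(clean, sample_rate=sample_rate)
print(f"SNR of simulated recording: {snr_db(clean, recorded):.1f} dB")
```

With the real data, `clean` and `recorded` would be replaced by a loaded pair of audio files, and the same kind of comparison could be run before and after applying a restoration algorithm to the recorded signal.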

HSC2024 official website: https://blogs.helsinki.fi/helsinki-speech-challenge/

Creators

Ludvigsen, Martin; Karvonen, Elli; Juvonen, Markus; Siltanen, Samuli