Cem Mil Podcasts: A Spoken Portuguese Document Corpus
This document describes the Portuguese-language podcast dataset released by Spotify for academic research. We give an overview of how the data was sampled, some basic statistics of the collection, and brief information on its distribution across Brazilian and European Portuguese dialects.