The SwissDial dataset is the first annotated parallel corpus of spoken Swiss German across 8 major dialects (AG, BE, BS, GR, LU, SG, VS, ZH). The dataset includes around 3 hours of high-quality audio per dialect, together with Swiss German and High German transcripts.
More details about the dataset can be found in the paper and on the project website.
When using the SwissDial dataset for research purposes, please cite the SwissDial publication.
Update: SwissDial dataset version 1.1 is now available with an additional 7726 recorded GR sentences.