We show the transcription outputs, word error rates (WER), and error patterns of the following models.
Supervised model: trained on labeled solo-singing data from DS1.
Self-TranscriberS2: applies self-training for two iterations, using the labeled data DS1 and the smaller set of unlabeled data DS31.
Self-Transcriber2: applies self-training for two iterations, using the labeled data DS1 and the larger set of unlabeled data DS301.
Decoded Examples from Different Models
Example 1
Reference: YOU ARE GETTING CLOSER IN SLOW MOTION
Supervised model Hypothesis (WER 28.57 % with 0 ins, 1 del, 1 sub): YOU ARE GETTING CLOSER IN SOMOTION
Self-TranscriberS2 Hypothesis (WER 14.29 % with 0 ins, 0 del, 1 sub): YOU WERE GETTING CLOSER IN SLOW MOTION
Self-Transcriber2 Hypothesis (WER 0.00 % with 0 ins, 0 del, 0 sub): YOU ARE GETTING CLOSER IN SLOW MOTION
Example 2
Reference: OVER YOUR SHOULDER I WAS STONE COLD SOBER I PULLED YOU CLOSER TO MY CHEST
Supervised model Hypothesis (WER 33.33 % with 0 ins, 2 del, 3 sub): OVER YOUR SHOULDER ONE AND OVER I PULLED YOU CLOSER TO MY CHEST
Self-TranscriberS2 Hypothesis (WER 20.00 % with 1 ins, 0 del, 2 sub): OVER YOUR SHOULDERS I WAS STONE AND COLD BUT I PULLED YOU CLOSER TO MY CHEST
Self-Transcriber2 Hypothesis (WER 13.33 % with 0 ins, 0 del, 2 sub): OVER YOUR SHOULDER I WAS STONE AND SOME I PULLED YOU CLOSER TO MY CHEST
Example 3
Reference: LIVING JUST TO FIND EMOTION HIDING SOMEWHERE IN THE NIGHT
Supervised model Hypothesis (WER 30.00 % with 1 ins, 0 del, 2 sub): LIVING JUST TO FIND EMOTIONS HIGH HEADING SOMEWHERE IN THE NIGHT
Self-TranscriberS2 Hypothesis (WER 20.00 % with 1 ins, 0 del, 1 sub): A LIVING JUST TO FIND EMOTIONS HIDING SOMEWHERE IN THE NIGHT
Self-Transcriber2 Hypothesis (WER 0.00 % with 0 ins, 0 del, 0 sub): LIVING JUST TO FIND EMOTION HIDING SOMEWHERE IN THE NIGHT
Example 4
Reference: THROUGH IT WE GON' DO IT LAINIE UNCLE'S CRAZY AIN'T HE YEAH BUT HE LOVES YOU GIRL AND YOU BETTER KNOW IT
Supervised model Hypothesis (WER 59.09 % with 1 ins, 3 del, 9 sub): FOR WE WERE ON THE LINE YOUNG WAS CRAZY AND HEAR BUT HE LOVES A GIRL AND YOU BETTER KNOW
Self-TranscriberS2 Hypothesis (WER 31.82 % with 1 ins, 2 del, 4 sub): THROUGH IT WE GONNA DO IT LIKE IT WAS CRAZY AIN'T YEAH BUT HE LOVES A GIRL AND YOU BETTER KNOW
Self-Transcriber2 Hypothesis (WER 22.73 % with 0 ins, 1 del, 4 sub): THROUGH IT WE GON' DO IT LANE YOUNG CRAZY AND HY YEAH BUT HE LOVES YOU GIRL AND YOU BETTER KNOW
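The WER values in the examples above are consistent with the standard definition: the number of insertions, deletions, and substitutions in a minimum-edit-distance word alignment, divided by the number of reference words. The scoring tool actually used is not stated here, so the following is only a minimal sketch of that definition; the function name `wer_counts` is hypothetical. When several alignments share the same minimum cost, the split between insertions, deletions, and substitutions may differ from what a particular toolkit reports, although the total WER does not.

```python
def wer_counts(reference: str, hypothesis: str):
    """Return (WER in percent, insertions, deletions, substitutions).

    Illustrative sketch only: whitespace tokenization and a plain
    Levenshtein alignment over words, not the authors' scoring script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total errors, ins, del, sub) for ref[:i] against hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, j, 0, 0)          # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # words match, no new error
                continue
            t, n_ins, n_del, n_sub = dp[i - 1][j - 1]
            substitution = (t + 1, n_ins, n_del, n_sub + 1)
            t, n_ins, n_del, n_sub = dp[i][j - 1]
            insertion = (t + 1, n_ins + 1, n_del, n_sub)
            t, n_ins, n_del, n_sub = dp[i - 1][j]
            deletion = (t + 1, n_ins, n_del + 1, n_sub)
            # Keep the cheapest option; ties are broken arbitrarily.
            dp[i][j] = min(substitution, insertion, deletion, key=lambda c: c[0])

    total, n_ins, n_del, n_sub = dp[-1][-1]
    return 100.0 * total / len(ref), n_ins, n_del, n_sub


# Example 1, supervised model hypothesis
wer, n_ins, n_del, n_sub = wer_counts(
    "YOU ARE GETTING CLOSER IN SLOW MOTION",
    "YOU ARE GETTING CLOSER IN SOMOTION",
)
print(f"WER {wer:.2f} % with {n_ins} ins, {n_del} del, {n_sub} sub")
# prints: WER 28.57 % with 0 ins, 1 del, 1 sub
```

In Example 1 the reference has 7 words and the supervised hypothesis contains 2 errors, so WER = 2 / 7 = 28.57 %. Note that WER can exceed 100 % when a hypothesis contains many insertions.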