We present audio samples, transcription outputs, and transcription performance (WER) for the following attack approaches.
Clean : Original clean audio from the datasets.
White Noise : Random white noise (the naive baseline) is added to the clean audio before decoding.
PGD : Adversarial samples are generated with the projected gradient descent attack (the strong baseline).
SAGO : Adversarial samples are generated with the proposed speech-aware gradient optimization (SAGO) attack.
MI-FGSM : Adversarial samples are generated with the proposed momentum iterative fast gradient sign method (MI-FGSM).
VMI-FGSM : Adversarial samples are generated with the proposed variance tuning momentum iterative fast gradient sign method (VMI-FGSM).
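To make the momentum-based entries in the list above more concrete, here is a minimal sketch of the generic momentum iterative sign-gradient update as it is commonly described for MI-FGSM. It is not the authors' ASR pipeline: the linear `toy_loss`, the `grad_fn` callable, the L-infinity budget `eps`, the step schedule `alpha = eps / steps`, and the decay factor `mu` are all assumptions chosen for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mi_fgsm(x, grad_fn, eps=0.01, steps=10, mu=1.0):
    """Momentum iterative FGSM sketch under an L-infinity budget `eps`.

    x: clean waveform as a list of floats
    grad_fn: hypothetical callable returning d(loss)/d(x) as a list
    """
    alpha = eps / steps            # per-step size (an assumed schedule)
    x_adv = list(x)
    g = [0.0] * len(x)             # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1 = sum(abs(v) for v in grad) or 1.0   # avoid division by zero
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        # Ascend the loss along the sign of the accumulated gradient.
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy stand-in for an ASR loss: a linear function whose gradient is `w`.
w = [0.5, -1.0, 2.0, -0.25]
toy_loss = lambda x: sum(wi * xi for wi, xi in zip(w, x))
clean = [0.1, -0.2, 0.05, 0.0]
adv = mi_fgsm(clean, grad_fn=lambda x: w)
print(toy_loss(adv) > toy_loss(clean))  # True: the perturbation raises the loss
```

With `mu = 0` the loop reduces to a plain iterative sign-gradient step with projection, which is the usual starting point for PGD-style attacks. In a real setting, `grad_fn` would be replaced by the gradient of the recognizer's loss with respect to the waveform.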
Decoded Examples from Different Attack Approaches
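Each hypothesis in this section is annotated with its word error rate. As a rough guide to how such figures arise, the sketch below computes WER as the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. The page does not state which scoring tool or text normalization was used, so the `tokenize` step (lowercasing and dropping periods) is an assumption, and the function names are illustrative.

```python
def tokenize(text):
    # Assumed normalization: lowercase, drop periods, split on whitespace.
    return text.lower().replace(".", " ").split()


def wer(reference, hypothesis):
    """Return the word error rate in percent."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (r != h)))    # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Reference and PGD hypothesis from Example 4 on this page.
ref = "i say you do know what this means and you must tell us."
hyp = "i said you do know what this means and you must tell us good."
print(f"{wer(ref, hyp):.2f} %")  # 15.38 % (one substitution, one insertion, 13 words)
```

Because insertions count as errors, WER can exceed 100 % when a hypothesis is much longer than its reference.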
Example 1
Reference: so uncas you had better take the lead while i will put on the skin again and trust to cunning for want of speed.
White Noise Hypothesis (WER 0.00 %): so uncas you had better take the lead while i will put on the skin again and trust to cunning for want of speed.
PGD Hypothesis (WER 0.00 %): so uncas you had better take the lead while i will put on the skin again and trust to cunning for want of speed.
SAGO Hypothesis (WER 50.00 %): so uncas be it better take to leave that the fire would put on this gid again and trust depending for want of speed.
MI-FGSM Hypothesis (WER 41.67 %): and so one couldst be a better take to leave while i would put on the skin again and trust to cunning of her want of speed.
VMI-FGSM Hypothesis (WER 41.67 %): so when 1st you had better take to leave for i will put on the skin again and trust you cunning for one to sweep with.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM
Example 2
Reference: but in this friendly pressure raoul could detect the nervous agitation of a great internal conflict.
White Noise Hypothesis (WER 0.00 %): but in this friendly pressure raoul could detect the nervous agitation of a great internal conflict.
PGD Hypothesis (WER 43.75 %): while in this friendly treasure raoul could detect a nervous agitation out of great internocondland.
SAGO Hypothesis (WER 81.25 %): by amos friendly cardinal robbed with the technine nervous agitation out of dray into no conduined.
MI-FGSM Hypothesis (WER 75.00 %): by in this ranly treasure robbed with detecting nervous agitation out of gray internal pond life.
VMI-FGSM Hypothesis (WER 81.25 %): but in the friendly creature ralph could detect being merely is education at her grave interlook on link.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM
Example 3
Reference: i thought we were stumped again when i 1st saw that picture but it has been of some use after all.
White Noise Hypothesis (WER 0.00 %): i thought we were stumped again when i 1st saw that picture but it has been of some use after all.
PGD Hypothesis (WER 33.33 %): i thought we were stumped again when i 1st saw that fiction of bennett and a snail of some native after all.
SAGO Hypothesis (WER 76.19 %): i thought we were stumpy getting and i 1st was all that vexioner the vedder and it would have sung me to fantaker wall.
MI-FGSM Hypothesis (WER 66.67 %): i thought we were something having in my 1st saw map pension or credit spend of some need of fast parov.
VMI-FGSM Hypothesis (WER 52.38 %): i thought we were something in then if you saw that bachelor betting a stand of some needes after all.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM
Example 4
Reference: i say you do know what this means and you must tell us.
White Noise Hypothesis (WER 0.00 %): i say you do know what this means and you must tell us.
PGD Hypothesis (WER 15.38 %): i said you do know what this means and you must tell us good.
SAGO Hypothesis (WER 69.23 %): i say you do know what bess me losing eh give mars a towel after.
MI-FGSM Hypothesis (WER 53.85 %): i say you am clearer wi this minx ere you must halves.
VMI-FGSM Hypothesis (WER 30.77 %): i said you do know what this realist ere do you must tell us.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM
Example 5
Reference: for some time after that i remembered nothing distinctly.
White Noise Hypothesis (WER 0.00 %): for some time after that i remembered nothing distinctly.
PGD Hypothesis (WER 11.11 %): for some time after that i remembered nothing disdainfully.
SAGO Hypothesis (WER 44.44 %): and for some time after that i remembered when nothing is daintily.
MI-FGSM Hypothesis (WER 44.44 %): for some time after that i remembered nothing to stay with me.
VMI-FGSM Hypothesis (WER 44.44 %): for some time after that i remember nothing of his dignity.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM
Example 6
Reference: the delawares are children of the tortoise and they outstrip the deer.
White Noise Hypothesis (WER 0.00 %): the delawares are children of the tortoise and they out strip the deer.
PGD Hypothesis (WER 41.67 %): the delawares are children in the tortoise and ants drifted youth.
SAGO Hypothesis (WER 75.00 %): the delawares i have children in the northern savoyance drift in youth.
MI-FGSM Hypothesis (WER 83.33 %): a delaware as i children of the torres announced drifted youth.
VMI-FGSM Hypothesis (WER 66.67 %): but delawares our children of the tortuous amounts drifted here.
Audio samples: Clean, White Noise, PGD, SAGO, MI-FGSM, VMI-FGSM