I am currently a Research Scientist at I2R, A*STAR. Prior to that, I was a Research Fellow at the Department of Electrical and Computer Engineering (ECE), National University of Singapore (NUS). I received my Ph.D. degree from the National University of Singapore, supervised by Prof. Haizhou Li (IEEE Fellow) and Prof. Shuzhi Sam Ge (IEEE Fellow). During my Ph.D. studies, I was a visiting research scholar at the National Institute of Informatics (Japan), supervised by Prof. Junichi Yamagishi. I also studied at the Speech Processing Courses Summer School at the University of Crete with Prof. Yannis Stylianou (IEEE Fellow). I received a B.Sc. degree from Nanjing University, Nanjing, China, in 2017.

My research interests include speech synthesis, audio large language models, automatic lyrics transcription, speech recognition, speech-to-singing conversion, singing information processing, music information retrieval, and multi-modal processing. I have published more than 15 papers in leading journals and conferences, including IEEE/ACM Transactions on Audio, Speech and Language Processing (TALSP), IEEE Transactions on Multimedia (TMM), EMNLP, IEEE Signal Processing Letters (SPL), Speech Communication, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), INTERSPEECH, IEEE APSIPA ASC, IEEE Spoken Language Technology Workshop (SLT), and Speaker Odyssey.

🔥 News

2025: 🎉🎉 Our ACL paper has been accepted for publication!
2025: 🎉🎉 Our TALSP regular paper has been accepted for publication!
2024: 🎉🎉 Our ICASSP paper has been accepted for publication!
2024: 🎉🎉 Our AAAI paper has been accepted for publication!
2024: 🎉🎉 Our TMM paper has been accepted for publication!
2024: 🎉🎉 Our EMNLP paper has been accepted for publication!
2024: 🎉🎉 Our SLT paper has been accepted for publication!
2024: 🎉🎉 Two IEEE Signal Processing Letters papers have been accepted for publication!
2023: 🎉🎉 Dr. Gao was invited to serve as the lead Guest Editor of the special issue “Modeling of Multimodal Speech Recognition and Language Processing” in Electronics (IF: 2.9, ISSN 2079-9292).
2023: 🎉🎉 Our TALSP regular paper has been accepted for publication!
2023: 🎉🎉 Two papers have been accepted by ICASSP 2023!
2020: 🎉🎉 Won first place in two tasks, Automatic Lyrics-to-Audio Alignment and Automatic Lyrics Transcription, at the Music Information Retrieval Evaluation eXchange (MIREX) International Benchmarking Competition 2020. See NUS ECE news.
2019: 🎉🎉 Received the Best Poster Award Runner-Up Prize at the 4th Workshop for Young Female Researchers at INTERSPEECH, Graz, Austria. See NUS ECE news.

📜 Research Areas

Speech Processing: Automatic speech recognition; Speech-to-singing conversion; Voice conversion; Speech synthesis; Audio security
Singing Processing: Speech-to-singing conversion; Singing voice conversion; Automatic lyrics transcription of solo-singing; Lyrics-to-audio alignment
Music Information Retrieval: Automatic lyrics transcription of polyphonic music; Automatic chord transcription; Music source separation; Automatic musical genre recognition
Multi-modal Processing: Audio-visual active speaker detection
Self-supervised Learning: Self-supervised speech processing; Self-supervised language processing
Large Language Models: Audio large language models; Speech LLMs; Speech synthesis with large language models

💻 Research Experience

2024.02 - Present, Research Scientist, I2R, A*STAR.
2023.11 - 2024.01, Visiting Researcher, Academia Sinica.
2022.11 - 2023.11, Research Fellow, National University of Singapore (NUS), Singapore.
2022.07 - 2022.08, Research Scholar, National Institute of Informatics, Japan.
2019.07, Research Scholar, University of Crete, Greece.
2018.11 - 2021.12, Research Engineer, National University of Singapore (NUS), Singapore.
2018.01 - 2018.11, Research Assistant, National University of Singapore (NUS), Singapore.

📖 Education

2017.08 - 2022.10, Ph.D. in Electrical and Computer Engineering, National University of Singapore (NUS), Singapore.
2013.09 - 2017.07, B.Sc. in Electronic Information Science and Technology, Nanjing University, Nanjing, China.

📝 Publications

– Journal Papers –

Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao and Nancy F. Chen, TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations, TALSP, 2025.
Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang and Haizhou Li, Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training, TMM, 2024.
Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu and Haizhou Li, Transferable Adversarial Attacks against ASR, SPL, 2024.
Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Jiadong Wang and Haizhou Li, Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks, SPL, 2024.
Xianghu Yue, Xiaoxue Gao^*, Xinyuan Qian, Haizhou Li, Adapting Pre-Trained Self-Supervised Learning Model for Speech Recognition with Light-Weight Adapters, Electronics, 2024.
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music, TALSP, 2023.
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, Automatic Lyrics Transcription of Polyphonic Music with Lyrics-Chords Multi-Task Learning, TALSP, 2022.
Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian and Haizhou Li, NHSS: A Speech and Singing Parallel Database, Speech Communication, 2021.

– Conference Papers –

Zhengyuan Liu, Geyu Lin, Hui Li Tan, Huayun Zhang, Yanfeng Lu, Xiaoxue Gao, Stella Xin Yin, He Sun, Hock Huan Goh, Lung Hsiang Wong and Nancy F. Chen, SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning, ACL, 2025.
Xiaoxue Gao, Chen Zhang, Yiming Chen, Huayun Zhang and Nancy F. Chen, Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization, ICASSP, 2025.
Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F Chen, Stefan Winkler, MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues, AAAI, 2025.
Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D’Haro, Robby T. Tan and Haizhou Li, Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models, EMNLP, 2024.
Xiaoxue Gao and Nancy F. Chen, Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models, SLT, 2024.
Xiaoxue Gao, Xianghu Yue, Haizhou Li, Self-Transriber: Few-shot Lyrics Transcription with Self-training, ICASSP, 2023.
Xianghu Yue, Xiaoxue Gao^*, Haizhou Li, token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text, ICASSP, 2023.
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music, ICASSP, 2022.
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, Music-robust Automatic Lyrics Transcription of Polyphonic Music, SMC, 2022.
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, Lyrics Transcription and Lyrics-to-audio Alignment with Music Informed Acoustic Models, MIREX, 2020.
Xiaoxue Gao, Xiaohai Tian, Rohan Kumar Das, Yi Zhou and Haizhou Li, Personalized Singing Voice Generation Using WaveRNN, Speaker Odyssey, 2020.
Xiaoxue Gao, Xiaohai Tian, Rohan Kumar Das, Yi Zhou and Haizhou Li, Speaker-independent Spectral Mapping for Speech-to-Singing Conversion, IEEE APSIPA ASC, 2019.
Chitralekha Gupta, Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao and Haizhou Li, NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion, INTERSPEECH, 2019.
Karthika Vijayan, Xiaoxue Gao and Haizhou Li, Analysis of Speech and Singing Signals for Temporal Alignment, IEEE APSIPA ASC, 2018.
Xiaoxue Gao, Berrak Sisman, Rohan Kumar Das and Karthika Vijayan, NUS-HLT Spoken Lyrics and Singing (SLS) Corpus, IEEE ICOT, 2018.

🎖 Honors and Awards

2020 Ranked first in the Automatic Lyrics-to-Audio Alignment Task at the Music Information Retrieval Evaluation eXchange International Benchmarking Competition 2020. The winning Lyrics-to-Audio Alignment system, NUS Auto Lyrix Align, is now available online as an interactive web interface: https://autolyrixalign.hltnus.org/
2020 Ranked first in the Automatic Lyrics Transcription Task at the Music Information Retrieval Evaluation eXchange International Benchmarking Competition 2020.
2019 Best Poster Award Runner-Up Prize, “Speech-to-Singing Conversion and Synthesis” at the 4th Workshop for Young Female Researchers at INTERSPEECH, Graz, Austria.
2019 ISCA Grants, “Average Modeling for Spectral Mapping in Speech-to-Singing Conversion” at the 2019 Speech Processing Courses in Crete (Conversational Speech Synthesis: from design to evaluation), University of Crete, Heraklion, Crete, Greece.
2016 Meritorious Winner (Top 8% winner), American Mathematical Contest in Modeling.
2015 National Second Prize, National Undergraduate Electronic Design Contest.

💬 Talks

2022.08, Automatic Lyrics Transcription of Polyphonic Music, National Institute of Informatics, Japan.
2022.06, Music-robust Automatic Lyrics Transcription of Polyphonic Music, SMC 2022, France (virtual).
2022.05, Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music, ICASSP, Singapore.
2019.11, Speaker-independent Spectral Mapping for Speech-to-Singing Conversion, IEEE APSIPA ASC, Lanzhou, China.
2018.10, NUS-HLT Spoken Lyrics and Singing (SLS) Corpus, IEEE ICOT, Bali, Indonesia.

💻 Internships

2023.11 - 2024.01, Visiting Researcher, Academia Sinica.
2022.07 - 2022.08, National Institute of Informatics, Japan.
2019.07, Research Scholar at the Speech Processing Courses Summer School, University of Crete, Heraklion, Crete, Greece.

📚 Research Web Platform

Personalized Speech-to-Singing Web 🎼: https://m.youtube.com/watch?v=zjtNUbo-v7w&feature=youtu.be
Speaker-independent Spectral Mapping for Speech-to-Singing Conversion: https://xiaoxue1117.github.io/sample/
Speech and Singing Parallel Database: https://hltnus.github.io/NHSSDatabase/
Personalized Singing Voice Generation Demo: https://xiaoxue1117.github.io/odysseysample/
Lyrics-to-Audio Alignment Interactive Web Interface 🎼: https://autolyrixalign.hltnus.org/
Few-shot Lyrics Transcription Demo: https://xiaoxue1117.github.io/icassp2023/
Integrated Training of Extractor and Lyrics Transcriber Demo: https://xiaoxue1117.github.io/PaperSample/

👔 Projects

Human-Robot Collaborative AI for Advanced Manufacturing And Engineering, NUS, Singapore.
Perfect Singing Vocals, NUS, Singapore.

Gao Xiaoxue