I am currently a Research Scientist at I2R, A*STAR. Prior to that, I was a Research Fellow at the Department of Electrical and Computer Engineering (ECE), National University of Singapore (NUS). I received my Ph.D. degree from the National University of Singapore, supervised by Prof. Haizhou Li (IEEE Fellow) and Prof. Shuzhi Sam Ge (IEEE Fellow). During my Ph.D. studies, I was a visiting research scholar at the National Institute of Informatics (Japan), supervised by Prof. Junichi Yamagishi. I also studied at the Speech Processing Courses Summer School at the University of Crete with Prof. Yannis Stylianou (IEEE Fellow). I received a B.Sc. degree from Nanjing University, Nanjing, China, in 2017.

My research interests include speech synthesis, audio large language models, automatic lyrics transcription, speech recognition, speech-to-singing conversion, singing information processing, music information retrieval, and multi-modal processing. I have published more than 15 papers in leading journals and conferences, including IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), IEEE Transactions on Multimedia (TMM), EMNLP, IEEE Signal Processing Letters (SPL), Speech Communication, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), INTERSPEECH, ACL, IEEE ASRU, IEEE APSIPA ASC, IEEE Spoken Language Technology Workshop (SLT), and Speaker Odyssey.

🔥 News

  • 2025: 🎉🎉 Dr. Gao serves as one of the organizers of the AAAI 2026 workshop on audio AI. Our AAAI 2026 workshop has been accepted and is now open for submissions! 🌟 Check it out at audio-aaai.
  • 2025: 🎉🎉 Dr. Gao serves on the advisory board of the ICASSP 2026 grand challenge. Our ICASSP 2026 grand challenge has been accepted and is now open for submissions! 🌟 Check it out on the ICASSP 2026 Cadenza Challenge website.
  • 2025: 🎉🎉 Our ICMI CCMI paper has been accepted for publication!
  • 2025: 🎉🎉 Our IEEE ASRU paper has been accepted for publication!
  • 2025: 🎉🎉 Our ACL paper has been accepted for publication!
  • 2025: 🎉🎉 Our TASLP regular paper has been accepted for publication!
  • 2024: 🎉🎉 Our ICASSP paper has been accepted for publication!
  • 2024: 🎉🎉 Our AAAI paper has been accepted for publication!
  • 2024: 🎉🎉 Our TMM paper has been accepted for publication!
  • 2024: 🎉🎉 Our EMNLP paper has been accepted for publication!
  • 2024: 🎉🎉 Our SLT paper has been accepted for publication!
  • 2024: 🎉🎉 Two IEEE Signal Processing Letters papers have been accepted for publication!
  • 2023: 🎉🎉 Dr. Gao was invited as the lead Guest Editor of the special issue “Modeling of Multimodal Speech Recognition and Language Processing” in Electronics (IF: 2.9, ISSN 2079-9292).
  • 2023: 🎉🎉 Our TASLP regular paper has been accepted for publication!
  • 2023: 🎉🎉 Two papers have been accepted by ICASSP 2023!
  • 2020: 🎉🎉 Won first place in two tasks of the Automatic Lyrics-to-Audio Alignment Task at the Music Information Retrieval Evaluation eXchange International Benchmarking Competition 2020. Check it out in NUS ECE news.
  • 2019: 🎉🎉 Received the Best Poster Award Runner-Up Prize at the 4th Workshop for Young Female Researchers at INTERSPEECH, Graz, Austria. Check it out in NUS ECE news.

📜 Research Area

Speech Processing :
   Automatic speech recognition; Speech-to-singing conversion; Voice conversion; Speech synthesis; Audio security
Singing Processing :
   Speech-to-singing conversion; Singing voice conversion; Automatic lyrics transcription of solo-singing; Lyrics-to-audio alignment
Music Information Retrieval :
   Automatic lyrics transcription of polyphonic music; Automatic chord transcription; Music source separation; Automatic musical genre recognition
Multi-modal Processing :
   Audio-visual active speaker detection
Self-supervised Learning :
   Self-supervised speech processing; Self-supervised language processing
Large Language Models :
   Audio large language models; Speech LLMs; Speech synthesis with large language models

💻 Research Experiences

  • 2024.02 - Present, Research Scientist, I2R, A*STAR.
  • 2023.11 - 2024.01, Visiting Researcher, Academia Sinica.
  • 2022.11 - 2023.11, Research Fellow, National University of Singapore (NUS), Singapore.
  • 2022.07 - 2022.08, Research Scholar, National Institute of Informatics, Japan.
  • 2019.07, Research Scholar, University of Crete, Greece.
  • 2018.11 - 2021.12, Research Engineer, National University of Singapore (NUS), Singapore.
  • 2018.01 - 2018.11, Research Assistant, National University of Singapore (NUS), Singapore.

📖 Education

  • 2017.08 - 2022.10, Ph.D. in Electrical and Computer Engineering, National University of Singapore (NUS), Singapore.
  • 2013.09 - 2017.07, B.Sc. in Electronic Information Science and Technology, Nanjing University, Nanjing, China.

📝 Publications

– Journal Papers –

– Conference Papers –

🎖 Honors and Awards

  • 2020 Ranked first in the Automatic Lyrics-to-Audio Alignment Task at the Music Information Retrieval Evaluation eXchange International Benchmarking Competition 2020. The winning Lyrics-to-Audio Alignment system NUS Auto Lyrix Align is now available online as an interactive web interface: https://autolyrixalign.hltnus.org/
  • 2020 Ranked first in the Automatic Lyrics Transcription Task at the Music Information Retrieval Evaluation eXchange International Benchmarking Competition 2020.
  • 2019 Best Poster Award Runner-Up Prize, “Speech-to-Singing Conversion and Synthesis” at the 4th Workshop for Young Female Researchers at INTERSPEECH, Graz, Austria.
  • 2019 ISCA Grants, “Average Modeling for Spectral Mapping in Speech-to-Singing Conversion” at the 2019 Speech Processing Courses in Crete (Conversational Speech Synthesis: from design to evaluation), University of Crete, Heraklion, Crete, Greece.
  • 2016 Meritorious Winner (Top 8% winner), American Mathematical Contest in Modeling.
  • 2015 National Second Prize, National Undergraduate Electronic Design Contest.

💬 Talks

  • 2022.08, Automatic Lyrics Transcription of Polyphonic Music, National Institute of Informatics, Japan.
  • 2022.06, Music-robust Automatic Lyrics Transcription of Polyphonic Music, SMC 2022, France (virtual).
  • 2022.05, Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music, ICASSP, Singapore.
  • 2019.11, Speaker-independent Spectral Mapping for Speech-to-Singing Conversion, IEEE APSIPA ASC, Lanzhou, China.
  • 2018.10, NUS-HLT Spoken Lyrics and Singing (SLS) Corpus, IEEE ICOT, Bali, Indonesia.

💻 Internships

  • 2023.11 - 2024.01, Visiting Researcher, Academia Sinica.
  • 2022.07 - 2022.08, National Institute of Informatics, Japan.
  • 2019.07, Research Scholar at the Speech Processing Courses Summer School, University of Crete, Heraklion Crete, Greece.

📚 Research Web Platform

👔 Projects

  • Human-Robot Collaborative AI for Advanced Manufacturing And Engineering, NUS, Singapore.
  • Perfect Singing Vocals, NUS, Singapore.