Kai Yu
Ph.D. (Cantab), FISCA, FIEEE
Distinguished Professor
Head of the Cross-media Language Intelligence (X-LANCE) Lab (formerly SpeechLab)
Director of the Machine Intelligence Institute
School of Computer Science, Shanghai Jiao Tong University
Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn
Address: School of Computer Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
Biography
I am currently a distinguished professor and the director of the Machine Intelligence Institute of the School of Computer Science at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I am a fellow of ISCA (International Speech Communication Association), a fellow of IEEE (Institute of Electrical and Electronics Engineers) and a distinguished member of CCF (China Computer Federation).
My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002 respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K. in 2006 and then worked there as a senior research associate. I joined SJTU in 2012 and founded SpeechLab. SpeechLab was later expanded and renamed the Cross-media Language Intelligence (X-LANCE) Lab, the name it carries today. I have served as a member of the IEEE Speech and Language Processing Technical Committee (2017-2019) and as an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024). I am currently a board member of the IEEE Signal Processing Society Conferences Board and Membership Board. I am also a member of the CCF council and serve as the director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF.
My research interests lie primarily in conversational AI, including many aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a human-like way and evolve by interacting with their environment. I have published over 200 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, as general chair for the National Conference on Man-machine Speech Communication (the largest domestic speech conference in China), and as area chair for speech processing or dialogue systems at Interspeech, ACL, EMNLP and other conferences.
The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the "AI Key Players" list in the Goldman Sachs Equity Research Report on AI in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.
SJTU X-LANCE Lab
We are looking for self-motivated Ph.D./master/undergraduate students and postdocs interested in speech and language processing. Please send your CV to me if you want to join us.
Research Interests
- Speech and Audio Processing: neural speech signal processing, robust speech and speaker recognition, high-fidelity speech synthesis, audio analysis and auditory cognition, multi-modal speech processing and universal audio model
- Natural Language Processing: structured language understanding, KBQA and machine reading comprehension, statistical dialogue systems, multi-lingual language processing, foundation language model, large language model agent
- Multi-modal Interaction: digital avatar, GUI understanding and manipulation, AGI for science
Selected Publications [Google Scholar][More Papers]
Speech and Audio Processing
- [ASR] TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer
  Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and Kai Yu
  ICASSP 2024
- [TTS] Text-To-Speech With Latent Diffusion
  Zhijun Liu, Yiwei Guo and Kai Yu
  ICASSP 2023
- [TTS] VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
  Chenpeng Du, Yiwei Guo, Xie Chen and Kai Yu
  Interspeech 2022
- [RAA] Towards Duration Robust Weakly Supervised Sound Event Detection
  Heinrich Dinkel, Mengyue Wu and Kai Yu
  IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 887-900, 2021
- [Signal] Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking
  Wenbin Jiang and Kai Yu
  IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1758-1770, 2023
Natural Language Processing
- [LLM] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
  Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen and Kai Yu
  AAAI 2024
- [LLM] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
  Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and Kai Yu
  NeurIPS 2023
- [NLP] A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
  Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and Kai Yu
  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023
- [NLP] LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
  Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu and Kai Yu
  ACL 2021
- [NLP] OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue
  Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu and Kai Yu
  Transactions of the Association for Computational Linguistics (TACL), vol. 11, pp. 68-84, 2022
Multi-modal Interaction
- [Avatar] DIFFDUB: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
  Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen and Kai Yu
  ICASSP 2024
- [Avatar] DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
  Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao and Jiang Bian
  ACM-MM 2023
- [GUI] Towards Multi-modal Conversational Agents on Mobile GUI
  Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and Kai Yu
  EMNLP 2022
- [GUI] TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
  Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen and Kai Yu
  NAACL 2022
Professional Qualification and Service
Institute of Electrical and Electronics Engineers (IEEE)
- Fellow of IEEE
- Board Member of IEEE Signal Processing Society Conferences Board
- Board Member of IEEE Signal Processing Society Membership Board
- Member of IEEE Speech and Language Processing Technical Committee (2017-2019)
- Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024)
- General Chair of ICASSP 2025 Satellite Event in Suzhou
International Speech Communication Association (ISCA)
- Fellow of ISCA
- Program Chair of Interspeech 2020
China Computer Federation (CCF)
- Distinguished Member of CCF
- Member of the 13th Council of CCF
- Director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF
- Standing Committee Member of the Large Model Forum of CCF
Chinese Information Processing Society of China (CIPSC)
- Member of the 9th Council of CIPSC
- Associate Director of the Speech Information Processing Technical Committee of CIPSC
Industry Service
- Director of the National AI Open Innovation Platform on Language Computing, Ministry of Science and Technology of China (MOST)
- Member of the AI Key Technology and Application Evaluation Academic Committee of the Key Laboratory of the Ministry of Industry and Information Technology of China
- Member of the Information System User Interfaces Branch (TC28/SC35) of the National Information Technology Standardization Technical Committee
- Member of the 4th National Computer Science and Technology Terminology Approval Committee
- Director of the Academic and Intellectual Property Working Group of the China Artificial Intelligence Industry Alliance (AIIA)
- Associate Director of the Technical Committee of the Alliance of Intelligent Speech Technology Industry of China
Other Service
- Vice President of the Shanghai Overseas Returned Scholar Association (SORSA)
- Chairman of the AI Branch of SORSA
- Member of the Young Scientists Committee of the World Laureates Forum
Academic Conference Service
- ICASSP
  - IEEE SLTC Member
  - General Chair of ICASSP 2025 Satellite Event
- Interspeech
  - Program Chair, Area Chair (Speech Recognition/Dialogue Systems)
- EUSIPCO
  - Area Chair (Speech Processing)
- ACL
  - (Senior) Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems/Spoken Language Technology)
- NAACL
  - Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)
- EMNLP
  - Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)
- NeurIPS
  - Area Chair
- SigDial
  - Program Chair
- ICMI
  - Program Chair
- NCMMSC
  - General Chair, Program Chair
Reviewer Service
- Journal
  - IEEE/ACM Transactions on Audio, Speech, and Language Processing
  - IEEE Transactions on Pattern Analysis and Machine Intelligence
  - IEEE Signal Processing Letters
  - IEEE Signal Processing Magazine
  - Speech Communication
  - Computer Speech and Language
  - Journal of Computer Science (Chinese)
  - Journal of Automation (Chinese)
- Conference
  - ICASSP, Interspeech, IEEE ASRU, IEEE SLT, APSIPA, ISCSLP, NCMMSC
  - ACL/NAACL/EACL, EMNLP, SigDial
  - AAAI, NeurIPS
- Proposal and Award
  - EPSRC, U.K.
  - Science and Engineering Research Council, Agency for Science and Technology Research, Singapore
  - Israel Science Foundation (ISF), Israel
  - Foundation for Polish Science
  - Research Grants Council (RGC) of Hong Kong
  - National Natural Science Foundation of China
  - Ministry of Science and Technology of China
  - Ministry of Industry and Information Technology of China
  - Ministry of Education of China
  - Chinese Academy of Sciences
Awards
Best Paper Award
- EURASIP Speech Communication Best Paper Award
- International Symposium on Chinese Spoken Language Processing Best Paper Award
- ISCA Computer Speech and Language Best Paper Award
- Interspeech Best Paper Award
- IEEE SLT Best Paper Award
- NCMMSC Best Paper Award
National and Provincial Award
- Leading Talents in Scientific and Technological Innovation by Ministry of Science and Technology of China
- Excellent Young Researcher Fund by National Science Foundation of China (NSFC)
- Chinese Patent Excellence Award by China National Intellectual Property Administration
- Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning by Shanghai Municipal Education Commission
Professional Society Academic Award
- Bamboo Award by China Computer Federation (CCF)
- Distinguished Lecturer of Advanced Disciplines Lectures by China Computer Federation (CCF)
- Second Prize for Scientific and Technological Progress, WuWenJun AI Science and Technology Award by Chinese Association for Artificial Intelligence (CAAI)
- First Prize for Natural Science, WuWenJun AI Science and Technology Award by Chinese Association for Artificial Intelligence (CAAI)
Other Award
- Scientific Chinese (2016) Person of the Year by Scientific Chinese Magazine
