I am a full-time research scientist on the Appier AI Research team, where I work with my advisors, Prof. Yun-Nung Chen and Prof. Hung-yi Lee. Previously, I earned my Master’s degree in Computer Science at National Taiwan University (NTU), where I was advised by Prof. Hsin-Hsi Chen, leader of the NLPLab. I am also a licensed medical doctor, having obtained my Doctor of Medicine (M.D.) degree from NTU.
Research Interests
I am broadly interested in natural language processing (NLP), deep learning (DL), and their relationships with psychology. Recently, I have been especially intrigued by the mysteries behind large language models (LLMs), and I find work that uncovers their surprising properties the most exciting, especially studies that draw connections to human cognition. Some of my recent favorite works:
In line with the above interests, my long-term research goal is to uncover how these models acquire knowledge and perform reasoning. I believe this research direction will deepen our understanding of the nature of intelligence.
My Favorite Quote
Research is the search for reality. It is a wonderful search. It keeps us humble. Authentic humility is striving to see things how they are, rather than how we want them to be. - Kevin Gimpel, in his advice on being a happier researcher
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, and 3 more authors
Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs). This study investigates whether such constraints on the generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension. Specifically, we evaluate LLMs' performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks. Surprisingly, we observe a significant decline in LLMs' reasoning abilities under format restrictions. Furthermore, we find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
@article{tam2024let,
  title   = {Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models},
  author  = {Tam, Zhi Rui and Wu, Cheng-Kuang and Tsai, Yi-Lin and Lin, Chieh-Yen and Lee, Hung-yi and Chen, Yun-Nung},
  journal = {arXiv preprint arXiv:2408.02442},
  year    = {2024},
}
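To make the comparison above concrete, here is a minimal, self-contained Python sketch of the two answer-extraction paths the study contrasts: a free-form reply whose answer must be recovered heuristically, versus a JSON-restricted reply whose answer is read from a parsed field. The prompts, parsing heuristics, and example replies are illustrative assumptions of mine, not the paper's actual evaluation code.

import json
import re
from typing import Optional

QUESTION = "If a train travels 60 km in 1.5 hours, what is its average speed in km/h?"

# Free-form prompt: the model may reason however it likes before answering.
FREE_FORM_PROMPT = QUESTION + "\nThink step by step, then state the final answer."

# Format-restricted prompt: the model must reply with a single JSON object.
JSON_PROMPT = (
    QUESTION
    + '\nReply with a JSON object of the form {"reasoning": "...", "answer": "..."} and nothing else.'
)

def extract_from_free_form(text: str) -> Optional[str]:
    """Recover the answer from a free-form reply with a simple last-number heuristic."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None

def extract_from_json(text: str) -> Optional[str]:
    """Parse a JSON-constrained reply and read its 'answer' field."""
    try:
        return str(json.loads(text)["answer"])
    except (json.JSONDecodeError, KeyError):
        return None

if __name__ == "__main__":
    # Hypothetical model replies, used only to exercise the two extraction paths.
    free_form_reply = "Speed = distance / time = 60 / 1.5 = 40, so the answer is 40 km/h."
    json_reply = '{"reasoning": "60 / 1.5 = 40", "answer": "40"}'
    print(extract_from_free_form(free_form_reply))  # -> 40
    print(extract_from_json(json_reply))            # -> 40

The JSON path trades the freedom to reason in-line for an answer that is trivially machine-readable, which is exactly the tension the study measures.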
EMNLP 2024
I Need Help! Evaluating LLM’s Ability to Ask for Users’ Support: A Case Study on Text-to-SQL Generation
Cheng-Kuang Wu*, Zhi Rui Tam*, Chao-Chung Wu, and 3 more authors
This study explores the proactive ability of LLMs to seek user support. We propose metrics to evaluate the trade-off between performance improvements and user burden, and investigate whether LLMs can determine when to request help under varying information availability. Our experiments show that without external feedback, many LLMs struggle to recognize their need for user support. The findings highlight the importance of external signals and provide insights for future research on improving support-seeking strategies.
@article{wu2024need,
  title   = {I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation},
  author  = {Wu, Cheng-Kuang and Tam, Zhi Rui and Wu, Chao-Chung and Lin, Chieh-Yen and Lee, Hung-yi and Chen, Yun-Nung},
  journal = {arXiv preprint arXiv:2407.14767},
  year    = {2024},
}
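As a rough illustration of the performance-versus-burden trade-off mentioned above, the toy Python sketch below tallies accuracy together with the fraction of instances on which the model requested help. The record format and the summary computed here are my own simplifications, not the metrics proposed in the paper.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Interaction:
    """One text-to-SQL instance: did the model ask for help, and was it correct?"""
    asked_for_help: bool
    correct: bool

def summarize(interactions: List[Interaction]) -> Dict[str, float]:
    """Report accuracy alongside user burden, measured here as the help-request rate."""
    n = len(interactions)
    accuracy = sum(i.correct for i in interactions) / n
    help_rate = sum(i.asked_for_help for i in interactions) / n
    return {"accuracy": accuracy, "help_rate": help_rate}

if __name__ == "__main__":
    # Toy run: the model asks for help on two of five queries and answers four correctly.
    log = [
        Interaction(asked_for_help=False, correct=True),
        Interaction(asked_for_help=True, correct=True),
        Interaction(asked_for_help=False, correct=False),
        Interaction(asked_for_help=True, correct=True),
        Interaction(asked_for_help=False, correct=True),
    ]
    print(summarize(log))  # -> {'accuracy': 0.8, 'help_rate': 0.4}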
NeurIPS 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Cheng-Kuang Wu*, Zhi Rui Tam*, Chieh-Yen Lin, and 2 more authors
arXiv preprint arXiv:2406.08747. NeurIPS 2024 (Datasets and Benchmarks), 2024
Recent works have shown that large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment. However, existing benchmarks primarily evaluate their innate capabilities and do not assess their ability to improve over time. To address this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence. StreamBench simulates an online learning environment where LLMs receive a continuous stream of feedback and iteratively enhance their performance. In addition, we propose several simple yet effective baselines for improving LLMs on StreamBench, and provide a comprehensive analysis to identify critical components that contribute to successful streaming strategies. Our work serves as a stepping stone towards developing effective online learning strategies for LLMs, paving the way for more adaptive AI systems in streaming scenarios.
@article{wu2024streambench,
  title   = {StreamBench: Towards Benchmarking Continuous Improvement of Language Agents},
  author  = {Wu, Cheng-Kuang and Tam, Zhi Rui and Lin, Chieh-Yen and Chen, Yun-Nung and Lee, Hung-yi},
  journal = {arXiv preprint arXiv:2406.08747},
  year    = {2024},
}
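To make the input-feedback protocol concrete, the Python sketch below shows the kind of streaming loop described above: an agent answers each incoming instance, receives feedback, and stores it to inform later predictions. The MemoryAgent interface and its lookup-based strategy are illustrative assumptions, not StreamBench's actual API or baselines.

from typing import Dict, Iterable, List, Tuple

class MemoryAgent:
    """A toy agent that remembers past (input, feedback) pairs.

    A real LLM agent would fold such a memory into its prompt or retrieval
    store; here we simply reuse stored feedback for inputs seen before.
    """

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def predict(self, x: str) -> str:
        return self.memory.get(x, "unknown")

    def update(self, x: str, feedback: str) -> None:
        self.memory[x] = feedback

def run_stream(agent: MemoryAgent, stream: Iterable[Tuple[str, str]]) -> List[bool]:
    """Walk an (input, feedback) sequence, always predicting before updating."""
    results = []
    for x, feedback in stream:
        prediction = agent.predict(x)
        results.append(prediction == feedback)
        agent.update(x, feedback)  # the agent only learns after committing to an answer
    return results

if __name__ == "__main__":
    data = [("2+2", "4"), ("3+3", "6"), ("2+2", "4")]  # the repeated query improves over time
    print(run_stream(MemoryAgent(), data))  # -> [False, False, True]

The key point the loop captures is that feedback arrives only after the agent has committed to an answer, so any improvement must come from accumulating experience online rather than from innate capability alone.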