Seungwon Lim
Hi, I'm Seungwon Lim. I'm a researcher at LangAGI (Language & AGI Lab), Yonsei University, advised by Jinyoung Yeo. I received my bachelor's degree in Computer Science, and I am currently enrolled in an integrated MS/PhD program in Computer Science.
Currently, I'm working as a research scientist intern at Exaone Lab, where I conduct research on building advanced foundation large language models.
My research centers on developing reliable agent systems. To that end, I am currently focusing on agents' reasoning and action decisions, and on human-centric AI.
Publications
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu
EMNLP2025 Main
TLDR; We introduce VisEscape, a benchmark inspired by escape room games, and evaluate the reasoning and decision-making of diverse MLLMs in exploration-driven, dynamic environments.
When AI co-scientists fail: SPOT-a benchmark for automated verification of scientific research
Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu, Stella Biderman
Under Review
TLDR; We introduce SPOT, a benchmark for automated verification of scientific research, and show that substantial room for improvement remains in AI-assisted academic verification.
Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games
Seungwon Lim, Seungbeen Lee, Dongjun Min, Youngjae Yu
ACL2025 Main (Oral)
TLDR; We introduce PANDA, which incorporates human personality traits into AI agents for text-based games, and examine how these traits affect the agents' behavior and performance.
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungwon Lim*, Seungbeen Lee*, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu
NAACL2025 Findings
TLDR; We introduce TRAIT, a psychometrics-based benchmark that measures the personality revealed in the behavior patterns of LLMs, and verify its reliability and validity.
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu
EMNLP2024 Main
TLDR; We introduce UNPIE, a new benchmark designed to evaluate how multimodal inputs influence the resolution of lexical ambiguities.
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
Jeongeun Park, Seungwon Lim, Joonhyung Lee, Sangbeom Park, Minsuk Chang, Youngjae Yu, Sungjoon Choi
ICRA2024
TLDR; We introduce CLARA, an LLM-empowered method that lets robots estimate the uncertainty of user commands and disambiguate them by generating clarifying questions.