Featured Publications and Preprints
Ziyi Liu, Bahar Sarrafzadeh, Pei Zhou, Longqi Yang, Jieyu Zhao, Ashish Sharma
Oct 29, 2025 · arXiv Preprint
ProMediate: A Socio-Cognitive Framework for Evaluating Proactive Agents in Multi-Party Negotiation
ProMediate is the first framework designed to evaluate proactive AI mediator agents in
complex, multi-topic, multi-party negotiations. It consists of a simulation testbed and
a socio-cognitive evaluation framework with new metrics that measure consensus change,
intervention latency, and mediator effectiveness. Results show that a socially intelligent
mediator increases consensus and responds faster than a generic baseline.
Ziyi Liu, Priyanka Dey, Jen-tse Huang, Zhenyu Zhao, Bowen Jiang, Rahul Gupta, Yang Liu, Yao Du, Jieyu Zhao
Apr 2025 · arXiv Preprint
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Cultural Intelligence with CQ-Bench
CQ-Bench introduces a benchmark for evaluating large language models’ cultural intelligence by testing their ability to infer implicit cultural values from natural, multi-character conversations. Built from World Values Survey and GlobalOpinions data, CQ-Bench includes three tasks (attitude detection, value selection, and value extraction) and is generated via a rigorous validation pipeline achieving 94.5% human–model agreement. Results show that while frontier models approach human performance in value selection, they still struggle with nuanced attitude inference, and that targeted fine-tuning on small, culturally rich datasets can yield substantial gains.
Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao
Jul 2025 · In the proceedings of ACL 2025
AI Sees Your Location, But With A Bias Toward The Wealthy World
Vision-Language Models (VLMs) can recognize geographic locations from images but exhibit
significant regional biases: they perform better on developed, densely populated areas than on less
developed, sparsely populated regions. The benchmark study also highlights privacy concerns arising
from strong geographic inference performance.
Ziyi Liu, Abhishek Anand, Pei Zhou, Jen-tse Huang, Jieyu Zhao
Jun 2024 · In the proceedings of EMNLP 2024
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
We introduce InterIntent, a framework for evaluating large language models’ social intelligence
by testing their ability to understand and manage intentions within interactive game settings. The
paper proposes four dimensions of social intelligence—situational awareness, self-regulation,
self-awareness, and theory of mind—each linked to specific game tasks such as intention selection,
following, summarization, and guessing. Results show that models perform well on intention
selection but lag behind humans on intention inference, highlighting room for improvement
in LLMs' social reasoning.
Ziyi Liu, Soumya Sanyal, Isabelle Lee, Yongkang Du, Rahul Gupta, Yang Liu, Jieyu Zhao
Nov 2024 · Findings of EMNLP 2024
Self-Contradictory Reasoning Evaluation and Detection
This work investigates self-contradictory reasoning in large language models (LLMs), where the
model’s internal reasoning fails to support its answers. The authors define and measure the
Self-Contra rate across multiple datasets and identify finer-grained categories of contradiction.
Results show that models often produce correct answers via reasoning shortcuts or by ignoring
contextual evidence, compromising reliability. They further evaluate GPT-4’s ability to detect
self-contradictory reasoning and find that even with aided detection, performance (~52.2% F1) lags
behind humans (~66.7% F1), underscoring limitations in current LLM reasoning robustness.