Ryo Kamoi
Ryo Kamoi is a Ph.D. student in Computer Science at Penn State University, advised by Dr. Rui Zhang. He received his master’s degree in Computer Science from UT Austin, where he was advised by Dr. Greg Durrett, and his bachelor’s degree in Statistics from Keio University, where he was advised by Dr. Kei Kobayashi. He previously interned at Microsoft OAR and Amazon Alexa.
His research interests lie in large language models (LLMs), with a particular focus on their reasoning capabilities and self-improvement. Highlights of his recent work appear below.
News
May 2025
I have started my internship at Microsoft OAR!
Apr 2025
I passed my comprehensive exam! Thank you to my advisor, committee members, and co-authors for their support!
Process Reward Models (PRMs) provide step-level verification for LLM reasoning. However, prior work relies on training data with human-annotated or noisy labels. We propose FoVer, a novel method that creates PRM training data using formal verification tools such as Z3 and Isabelle. FoVer is the first method to create accurate PRM training data without relying on human annotation. Although FoVer builds its training data from symbolic tasks compatible with formal verification tools, the resulting LLM-based PRMs improve on a broad range of reasoning tasks.
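To make the core idea concrete, here is a minimal sketch (not FoVer's actual pipeline) of how an SMT solver can assign a step-level label automatically, using the Python bindings of Z3 (the z3-solver package); the helper name step_is_valid is hypothetical. A reasoning step is labeled correct exactly when its conclusion is entailed by its premises, i.e., when "premises AND NOT(conclusion)" is unsatisfiable.

from z3 import Int, Solver, Not, And, unsat

def step_is_valid(premises, conclusion):
    # A step is logically valid iff premises AND NOT(conclusion)
    # is unsatisfiable, which Z3 can decide for these formulas.
    solver = Solver()
    solver.add(And(*premises), Not(conclusion))
    return solver.check() == unsat

x = Int("x")
print(step_is_valid([x > 2], x > 0))  # True: valid step -> label "correct"
print(step_is_valid([x > 0], x > 2))  # False: invalid step -> label "error"

Such automatically derived labels can then serve as accurate step-level supervision without any human annotation.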
We critically survey a broad range of papers and discuss the conditions required for successful self-correction. Our survey indicates that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for studies in tasks that are exceptionally well suited to self-correction; (2) self-correction works well in tasks where reliable external feedback is available; and (3) large-scale fine-tuning enables self-correction.
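Finding (2) can be illustrated with a minimal sketch of a refinement loop in which an external, reliable checker (e.g., a unit test or verifier, standing in for any trustworthy feedback source) decides whether to accept an attempt; generate and check are hypothetical stand-ins for an LLM call and the external verifier.

def self_correct(generate, check, max_rounds=3):
    # Refinement loop: accept an attempt only when an external,
    # reliable checker approves it; otherwise retry with feedback.
    attempt, feedback = None, None
    for _ in range(max_rounds):
        attempt = generate(feedback)
        ok, feedback = check(attempt)
        if ok:
            break
    return attempt

# Toy demo: the "model" answers 3 * 4 wrong first, then corrects.
answers = iter(["11", "12"])
result = self_correct(
    generate=lambda feedback: next(answers),
    check=lambda a: (a == "12", "wrong: recompute 3 * 4"),
)
print(result)  # 12

The survey's point is that this loop works when check is reliable and external; replacing it with a prompted LLM judging its own output has not been shown to help.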
ReaLMistake is a benchmark for evaluating error detection methods that detect errors in LLM responses. It includes errors made by GPT-4 and Llama 2 70B on three tasks (math word problem generation, fine-grained fact verification, and answerability classification). We observe that LLMs still cannot reliably detect mistakes made by LLMs: strong models such as GPT-4 and Claude 3 detect these errors with very low recall, and all LLM-based error detectors perform far worse than humans.
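For readers unfamiliar with the metric, the sketch below shows what "low recall" means in this binary error-detection setting; the labels are hypothetical toy data, not the actual ReaLMistake annotations.

def precision_recall(gold, pred):
    # gold/pred: 1 = "response contains an error", 0 = "no error"
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = [1, 1, 1, 1, 0, 0]  # human error annotations (hypothetical)
pred = [1, 0, 0, 0, 0, 1]  # detector flags only one of four true errors
print(precision_recall(gold, pred))  # (0.5, 0.25): low recall

A detector with low recall misses most of the genuinely erroneous responses, which is the failure mode we observe for LLM-based detectors.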
Selected publications. For the full list, please see Google Scholar or Semantic Scholar.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
PDF
Cite
Dataset
Website
Poster
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang (2025). COLM 2025.
AAAR-1.0: Assessing AI's Potential to Assist Research
PDF
Cite
Renze Lou, Hanzi Xu, Sijia Wang, Jiangshu Du, Ryo Kamoi, Xiaoxin Lu, Jian Xie, Yuxuan Sun, Yusen Zhang, Jihyun Janice Ahn, Hongchao Fang, Zhuoyang Zou, Wenchao Ma, Xi Li, Kai Zhang, Congying Xia, Lifu Huang, Wenpeng Yin (2025). ICML 2025. Best Paper Award at the 2nd AI4Research workshop @ AAAI 2025.
Generalizable Process Reward Models via Formally Verified Training Data
PDF
Cite
Code
Dataset
Model
Website
Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang (2025). arXiv preprint arXiv:2505.15960.
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
PDF
Cite
Video
Slides
Paper List
Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang (2024). TACL 2024. Oral at EMNLP 2024.
Evaluating LLMs at Detecting Errors in LLM Responses
PDF
Cite
Code
Dataset
Poster
Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang (2024). COLM 2024.
WiCE: Real-World Entailment for Claims in Wikipedia
PDF
Cite
Dataset
Slides
Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett (2023). EMNLP 2023. Oral.
Shortcomings of Question Answering Based Factuality Frameworks for Error Localization
PDF
Cite
Dataset
Video
Poster
Ryo Kamoi, Tanya Goyal, Greg Durrett (2023). EACL 2023.
Why is the Mahalanobis Distance Effective for Anomaly Detection?
PDF
Cite
Ryo Kamoi, Kei Kobayashi (2020). arXiv preprint arXiv:2003.00402.
Education
Penn State University, Ph.D. Student in Computer Science (Aug 2023 – Present), State College, PA
Advisor: Rui Zhang
University of Texas at Austin, M.S. in Computer Science (Aug 2020 – Dec 2022), Austin, TX
Advisor: Greg Durrett; Mentor: Tanya Goyal
Keio University, B.E. in Statistics (Apr 2016 – Mar 2020), Tokyo, Japan
Advisor: Kei Kobayashi. Keio Engineering Foundation Award (top student in the Department of Mathematics)
Work Experience
Amazon Alexa: Research on the quality evaluation of Alexa services.
Awards
Scholarship for alumni of Keio University to pursue degrees at overseas graduate schools
Graduation with highest honors - First place in the Department of Mathematics at Keio University
Media Mentions
Interview about our survey paper on LLM self-correction.
Invited Talks
AiTech Seminar at Okazaki Laboratory
Nov 17, 2025
Institute of Science Tokyo
Michinoku Communication Science Seminar (MiCS)
Nov 14, 2025
Tohoku University
AI for Math Minisymposium at SIAM-NNP
Nov 1, 2025
2025 SIAM New York-New Jersey-Pennsylvania Section Conference
How to Pronounce My Name
- Preferred English Pronunciation: (audio)
- Native Pronunciation: (audio)