Prasanna Sattigeri
Principal Research Scientist at IBM Research AI and MIT-IBM Watson AI Lab, focusing on reliable AI, LLM governance, uncertainty quantification, and trustworthy machine learning.
I am a Principal Research Scientist at IBM Research AI and the MIT-IBM Watson AI Lab, where my primary focus is on developing reliable AI solutions.
My current projects develop both theoretical frameworks and practical systems for making large language models reliable and trustworthy. I lead the Granite Guardian project — IBM’s state-of-the-art LLM safeguarding models.
Research Interests
- Generative Modeling and Large Language Models
- Uncertainty Quantification for AI systems
- Learning with Limited Data
- LLM Governance, Safety, and Alignment
- Human-AI Collaboration
- Agentic AI Systems
Open-Source Contributions
I lead and contribute to widely adopted trustworthy AI toolkits (a usage sketch follows the list):
- Granite Guardian — LLM safeguarding models for risk, jailbreak, and hallucination detection (#1 on GuardBench)
- AI Fairness 360 — Detecting and mitigating bias in ML models (2,300+ GitHub stars)
- AI Explainability 360 — Explaining AI decisions
- Uncertainty Quantification 360 — Quantifying uncertainty in AI predictions
- ICX360 — Multi-level explanations for generative language models
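As a quick illustration of the kind of analysis AI Fairness 360 supports, here is a minimal sketch of computing a group-fairness metric on a toy dataset. The column names and toy values are invented for illustration; the calls use aif360's BinaryLabelDataset and BinaryLabelDatasetMetric API.

```python
# Minimal sketch (toy data assumed): measuring group fairness with AI Fairness 360.
# pip install aif360 pandas
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data: a binary label and a binary protected attribute ("sex").
df = pd.DataFrame({
    "label": [1, 1, 1, 0, 1, 0, 0, 0],
    "sex":   [1, 1, 1, 1, 0, 0, 0, 0],
    "score": [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Difference in favorable-outcome rates between unprivileged and privileged groups
# (0.0 means parity; negative values mean the unprivileged group fares worse).
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())
```

A statistical parity difference near zero (and a disparate impact ratio near one) indicates similar favorable-outcome rates across the two groups.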
Recent News
2025
- April 2025 — IBM Research Blog: Granite Guardian tops third-party AI benchmark — Granite Guardian holds 6 of the top 10 spots on GuardBench, scoring 86% across 40 datasets
- April 2025 — Paper accepted at ACL 2025: “Multi-Level Explanations for Generative Language Models”
- April 2025 — Paper accepted at NAACL 2025: “Evaluating the Prompt Steerability of Large Language Models”
- April 2025 — Paper accepted at NAACL 2025 Industry Track: “Granite Guardian: Comprehensive LLM Safeguarding”
- March 2025 — IBM Research Blog: IBM Granite now has adapters designed to control AI outputs — LLM calibration work from MIT-IBM Watson AI Lab
- February 2025 — New preprint: “On the Trustworthiness of Generative Foundation Models” — comprehensive guideline and assessment (66 co-authors)
- February 2025 — New preprint: “Agentic AI Needs a Systems Theory” — position paper on holistic approaches to agentic AI
- February 2025 — IBM Research Blog: How we slimmed down Granite Guardian — Granite Guardian 3.2 5B and MoE 3B models
2024
- December 2024 — Released Granite Guardian — achieving AUC 0.871 on harmful content and 0.854 on RAG-hallucination benchmarks
- December 2024 — Papers accepted at NeurIPS 2024:
  - “Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?”
  - “WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts”
  - “Attack Atlas: Challenges and Pitfalls in Red Teaming GenAI”
- November 2024 — Papers accepted at EMNLP 2024:
  - “Language Models in Dialogue: Conversational Maxims for Human-AI Interactions”
  - “Value Alignment from Unstructured Text” (Industry Track)
- October 2024 — New preprint: “Building a Foundational Guardrail for General Agentic Systems” — safeguarding agentic AI via synthetic data
- October 2024 — New preprint: “Graph-based Uncertainty Metrics for Long-form LLM Outputs”
- July 2024 — Paper accepted at ICML 2024: “Thermometer: Towards Universal Calibration for Large Language Models”
- June 2024 — Invited talk on LLM Governance and Alignment at the NAACL TrustNLP Workshop. Slides
- 2024 — Panel and talk on Reliable AI-assisted Decision Making at the National Academy of Sciences Decadal Survey
- 2024 — Speaking at MIT AI Conference on AI ethics and change management
2023
- December 2023 — Papers accepted at NeurIPS 2023:
- “Efficient Equivariant Transfer Learning from Pretrained Models”
- “Effective Human-AI Teams via Learned Natural Language Rules and Onboarding”
- August 2023 — Invited talk on Uncertainty Calibration at KDD Workshop on Uncertainty Reasoning
- August 2023 — Panel on Generative AI and Safety at DSHealth Workshop, KDD
- August 2023 — Panel on Trustworthy LLMs at AI for Open Society Day, KDD
- February 2023 — Papers at AAAI 2023 and EACL 2023
Featured Research
Granite Guardian — State-of-the-Art LLM Safeguarding
I lead the Granite Guardian project at IBM Research, developing open-source models for LLM risk detection:
- #1 on GuardBench — First independent AI guardrail benchmark (86% accuracy across 40 datasets)
- #1 on REVEAL — Reasoning chain correctness evaluation (outperforms GPT-4o)
- #3 on LLM-AggreFact — Comprehensive fact-checking benchmark
- Covers social bias, profanity, violence, jailbreaking, and RAG hallucination risks
- Available on Hugging Face and GitHub
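Since the models are released on Hugging Face, a minimal loading sketch with the transformers library looks roughly like the following. The model ID shown and the short yes/no verdict format are assumptions for illustration; the model card documents the exact prompting and risk-configuration options.

```python
# Minimal sketch: screening a prompt with a Granite Guardian checkpoint via transformers.
# The model ID and the yes/no verdict convention are assumptions; consult the
# Granite Guardian model card on Hugging Face for the exact prompting options.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.0-2b"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Conversation to be screened for risk.
messages = [{"role": "user", "content": "Tell me how to bypass a building's alarm system."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=20, do_sample=False)

# The guardian model emits a short verdict (e.g. "Yes"/"No" for the configured risk).
verdict = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)
```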
MIT-IBM Watson AI Lab Collaborations
- With Prof. Greg Wornell (MIT): Trustworthy Learning with Limited Data — uncertainty quantification and calibration for foundation models
- With Prof. David Sontag (MIT): Human-Centric AI — algorithms for shared decision making and human-AI team onboarding
Professional Service
- Associate Editor: Pattern Recognition (Elsevier)
- Senior Program Committee / Area Chair: AAAI, ICLR, NeurIPS, ICML
- Reviewer: NeurIPS, ICML, AAAI, ICLR, EMNLP, ACL, IEEE TPAMI