Senior Staff Research Scientist

at Google DeepMind

beenkim at csail dot mit dot edu
I am working to ensure that as machines get smarter, humans do too.

My work empowers humans to maintain agency by extracting useful knowledge from AI and teaching it to humans. We get AI involved in this teaching (agentic interpretability), which may involve new language (neologisms). We have shown that this is possible: we used AlphaZero's knowledge to teach grandmasters, one of whom became the youngest World Chess Champion (PNAS).
This Quanta Magazine article (written by John Pavlus) is a great description of what I do and why.


I gave keynotes at ICLR 2022, ECML 2020, and the G20 meeting in Argentina in 2018. One of my works, TCAV, received the UNESCO Netexplo award and was featured at Google I/O '19 and in Brian Christian's book The Alignment Problem.

Stuff I help with:
        ICLR board
        General Chair at ICLR 2024
        Senior Program Chair at ICLR 2023
        Steering committee SaTML
        Workshop Chair at ICLR 2019
        Senior Area Chair / Area Chair / Senior Program Committee: NeurIPS 2017-present, ICML 2019-present, ICLR 2020-present, AISTATS 2020-present
        Steering committee and Area Chair at the FAccT conference
        Former executive board member and VP of Women in Machine Learning.
        Co-organizer of the multi-year Workshop on Human Interpretability in ML (WHI) at ICML 2016, 2017, 2018, and 2020, and the NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems.


Tutorials on interpretability:

        Machine Learning Summer School MLSS 2021 (slides, video)
        Deep Learning Summer School at the University of Toronto, Vector Institute, in 2018 (slides, video)
        CVPR 2018 (slides and videos)
        Tutorial on Interpretable Machine Learning at ICML 2017 (slides, video)



Blogs

Images
ICLR 2022 keynote
        talk video
        blog post (covers only the intro part of the talk)