Pluralistic Alignment
@ NeurIPS 2024 Workshop
December 14, 2024.
Vancouver Convention Center (West Meeting Room 116, 117).
Exploring Pluralistic Perspectives in AI
Welcome to the Pluralistic Alignment Workshop! Aligning AI with human preferences and values is increasingly important. Yet, today’s AI alignment methods have been shown to be insufficient for capturing the vast space of complex – and often conflicting – real-world values. Our workshop will discuss how to integrate diverse perspectives, values, and expertise into pluralistic AI alignment. We aim to explore new methods for multi-objective alignment by drawing inspiration from governance and consensus-building practices to address conflicting values in pluralistic AI alignment. Discussion will include technical approaches for dataset collection, algorithm development, and the design of human-AI interaction workflows that reflect pluralistic values among diverse populations. By gathering experts from various fields, this workshop seeks to foster interdisciplinary collaboration and push the boundaries of the understanding, development, and practice of pluralistic AI alignment.
Stay tuned by following us on Twitter @pluralistic_ai.
Speakers
Yejin Choi
University of Washington
Melanie Mitchell
Santa Fe Institute
Seth Lazar
Australian National University
Michael Bernstein
Stanford University
Monojit Choudhury
MBZUAI
Hannah Rose Kirk
University of Oxford
Schedule
| Time | Program |
|---|---|
| 9:00-9:10 | Opening remarks |
| 9:10-9:55 | Keynote Talk: Monojit Choudhury, LLMs for a Multi-cultural World: A case for On-demand Value Alignment |
| 9:55-10:40 | Keynote Talk: Hannah Rose Kirk, Interpersonal and Intrapersonal Dilemmas in Achieving Pluralistic Alignment |
| 10:40-11:40 | Poster Session & Coffee Break |
| 11:40-12:25 | Keynote Talk: Yejin Choi, Pluralism, Creativity, and Humanism |
| 12:25-13:30 | Lunch Break |
| 13:30-14:15 | Keynote Talk: Seth Lazar, Philosophical Foundations for Pluralistic Alignment |
| 14:15-15:00 | Keynote Talk: Melanie Mitchell, The Role of Metacognition in Wise and Aligned Machine Intelligence |
| 15:00-15:30 | Coffee Break |
| 15:30-16:45 | 5 Contributed Talks (10 mins talk + 5 mins Q&A) |
| 16:45-17:30 | Keynote Talk: Michael Bernstein, Interactive Simulacra of Human Attitudes and Behavior |
| 17:30-17:40 | Closing Remarks |
Keynote Talks
LLMs for a Multi-cultural World: A Case for On-demand Value Alignment
Speaker: Monojit Choudhury
Aligning AI models to human values is of utmost importance, but to which values, and of which humans? In our multicultural world there is no universal set or hierarchy of values. Therefore, if foundation models are aligned to some particular values, we run the risk of excluding users, usage contexts, and applications that require alignment to conflicting values. I will discuss a set of experiments with moral dilemmas across various LLMs and languages, which shows that while the moral reasoning capability of LLMs grows with model size, this ability is greatly compromised when the model is strongly aligned to a certain set of values. This seriously limits the usability of the model in diverse applications and regions that prefer conflicting value hierarchies. I will use this as a case to argue against generic value alignment for foundation models; instead, foundation models should possess the ability to reason with any arbitrary value system specified in their prompt or through knowledge injection.
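For readers who want a concrete picture of what "on-demand value alignment" could look like in practice, the sketch below is a minimal, hypothetical illustration (not taken from the talk): the caller supplies an explicit value hierarchy, and the prompt asks the model to reason within that hierarchy rather than relying on a single baked-in alignment. The function name and the example values are illustrative assumptions.

```python
# Minimal sketch of value specification via prompt, assuming an instruction-following LLM.
# Everything here (function name, example values) is illustrative, not the speaker's method.

def build_value_conditioned_prompt(value_system: list[str], question: str) -> str:
    """Compose a prompt asking the model to reason under a caller-specified value hierarchy."""
    ranked_values = "\n".join(f"{i + 1}. {v}" for i, v in enumerate(value_system))
    return (
        "Reason about the question below strictly according to this ranked value system,\n"
        "noting which values drive each step of your reasoning:\n"
        f"{ranked_values}\n\n"
        f"Question: {question}\n"
    )

if __name__ == "__main__":
    values = ["community harmony", "individual autonomy", "material well-being"]
    print(build_value_conditioned_prompt(values, "Should the village pool its harvest this year?"))
```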
Interpersonal and Intrapersonal Dilemmas in Achieving Pluralistic Alignment
Speaker: Hannah Rose Kirk
Early work in AI alignment relied on restrictive assumptions about human behaviour to make progress even in simple 1:1 settings with a single operator. This talk addresses two key challenges in developing more pluralistic and realistic models of human preferences for alignment today. In Part I, we challenge the assumption that values and preferences are universal or acontextual through examining interpersonal dilemmas - what happens when we disagree with one another? I'll introduce the PRISM Alignment Dataset as a key new resource that contextualizes preference ratings across diverse human groups with detailed sociodemographic data. In Part II, we challenge the assumption that values and preferences are stable or exogenous by exploring intrapersonal dilemmas - what happens when we disagree with ourselves? I'll introduce ongoing research on anthropomorphism in human-AI interaction, examining how revealed preferences often conflict with stated preferences, especially regarding AI systems' social capabilities and in longitudinal interactions.
Pluralism, Creativity, and Humanism
Speaker: Yejin Choi
The thesis of this talk is that we humans are all different from each other, and that’s a beautiful thing. I’ll share our recent attempts at conceptualizing pluralistic alignments, and visit an orthogonal but related question of the nature of creativity of LLMs compared to that of humans.
Philosophical Foundations for Pluralistic Alignment
Speaker: Seth Lazar
Why does pluralism matter, and how do different arguments for pluralism condition the methods by which we should realise it? This talk considers different possible justifications for pluralistic AI, and argues that, as long as we’re not using AI systems to exercise significant degrees of power, the best way to achieve pluralism is through ensuring a vibrant ecosystem of competing, varied, and (at least in some cases) open models.
The Role of Metacognition in Wise and Aligned Machine Intelligence
Speaker: Melanie Mitchell
I will argue that AI alignment, especially in pluralistic contexts, will require machines to be able to understand and reason about concepts appropriately in diverse situations, and to be able to explain these reasoning processes. In humans, such capacities are enabled by metacognitive abilities: being sensitive to context, grasping others' perspectives, and recognizing the limits of one's own capabilities. I will discuss possible approaches toward AI metacognition and why such abilities may be paramount in developing machines with the “wisdom” needed for pluralistic alignment.
Interactive Simulacra of Human Attitudes and Behavior
Speaker: Michael Bernstein
Effective models of human attitudes and behavior can empower applications ranging from immersive environments to social policy simulation. However, traditional simulations have struggled to capture the complexity and contingency of human behavior. I argue that modern artificial intelligence models allow us to re-examine this limitation. I make my case through computational software agents that simulate human attitudes and behavior. I discuss how we used this approach, which we call generative agents, to model a representative sample of 1,000 Americans and replicate their attitudes and behavior 85% as well as they replicate themselves two weeks later. Extending my line of argument, I explore how modeling human behavior and attitudes can help us design more effective online social spaces, understand the societal disagreement underlying modern AI models, and better embed societal values into our algorithms.
Accepted Papers
Accepted papers are available on OpenReview.
Oral presentation
- MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
- Multilingual Trolley Problems for Language Models
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
- Representative Social Choice: From Learning Theory to AI Alignment
- Toward Democracy Levels for AI
Posters
- Are Large Language Models Consistent over Value-laden Questions?
- Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
- Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation
- Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs
- Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning
- AGR: Age Group fairness Reward for Bias Mitigation in LLMs
- Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
- Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions
- A Case Study in Plural Governance Design
- Contrastive Learning Neuromotor Interface From Teacher
- Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI
- Bridging in Social Media Feeds Censors Controversial Topics
- Pareto-Optimal Learning from Preferences with Hidden Context
- Learning from Personal Preferences
- Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
- Tractable Agreement Protocols
- Model Plurality: A Taxonomy for Pluralistic AI
- PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
- AI, Pluralism, and (Social) Compensation
- Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training
- Selective Preference Aggregation
- PersonalLLM: Tailoring LLMs to Individual Preferences
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
- Plurality of value pluralism and AI value alignment
- Value Alignment from Unstructured Text
- Rules, Cases, and Reasoning: Positivist Legal Theory as a Framework for Pluralistic AI Alignment
- Intuitions of Compromise: Utilitarianism vs. Contractualism
- Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
- Virtual Personas for Language Models via an Anthology of Backstories
- FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness
- "There are no solutions, only trade-offs.'' Taking A Closer Look At Safety Data Annotations.
- Policy Aggregation
- Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study
- Diverging Preferences: When do Annotators Disagree and do Models Know?
- Mechanism Design for LLM Fine-tuning with Multiple Reward Models
- Plurals: A system for pluralistic AI via simulated social ensembles
- Value-Aligned Imitation via focused Satisficing
- Pluralistic Alignment Over Time
- Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI
- Critique-out-Loud Reward Models
- Evaluating the Prompt Steerability of Large Language Models
- Can Language Models Reason about Individualistic Human Values and Preferences?
- Group Robust Best-of-K Decoding of Language Models for Pluralistic Alignment
- Aligning to Thousands of Preferences via System Message Generalization
Call for Papers
Our workshop aims to bring together researchers with diverse scientific backgrounds, including (but not limited to) machine learning, human-computer interaction, philosophy, and policy studies. More broadly, our workshop lies at the intersection of computer and social sciences. We welcome all interested researchers to discuss all aspects of pluralistic AI, from its definition to the technical pipeline to broad deployment and social acceptance.
We invite submissions that discuss the technical, philosophical, and societal aspects of pluralistic AI. We provide a non-exhaustive list of topics we hope to cover below, and we also welcome any submissions that are broadly relevant to pluralistic alignment.
- Philosophy:
- Definitions and frameworks for Pluralistic Alignment
- Ethical considerations in aligning AI with diverse human values
- Machine learning:
- Methods for pluralistic ML training and learning algorithms
- Methods for handling annotation disagreements
- Evaluation metrics and datasets suitable for pluralistic AI
- Human-computer interaction:
- Designing human-AI interaction that reflects diverse user experiences and values
- Integrating existing surveys on human values into AI design
- Navigating privacy challenges in pluralistic AI systems
- Social sciences:
- Methods for achieving consensus and different forms of aggregation
- Assessment and measurement of the social impact of pluralistic AI
- Dealing with pluralistic AI representing values that are offensive to some cultural groups
- Policy studies:
- Policy and laws for the deployment of pluralistic AI
- Democratic processes for incorporating diverse values into AI systems on a broad scale
- Applications:
- Case studies in areas such as hate speech mitigation and public health
Submission Instructions
We invite authors to submit anonymized papers of up to 4 pages, excluding references and appendices. All submissions should be in PDF format and made through the OpenReview submission portal. Submissions must follow the NeurIPS 2024 template. Checklists are not required for submissions. Reviews will be double-blind, with at least three reviewers assigned to each paper to ensure a thorough evaluation process.
We welcome various types of papers, including works in progress, position papers, policy papers, and academic papers. All accepted papers will be available on the workshop website, but are to be considered non-archival.
Travel Support
A limited number of travel grants will be available to cover expenses. Financial support will be made available for lodging and registration, subject to our available funding. Travel expenses are handled via reimbursement. We extend our thanks to OpenAI for their generous sponsorship of our workshop.
Please fill out the travel support application form by October 18, 2024 AOE.
Important Dates
All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).
| Date | Event |
|---|---|
| July 15, 2024 | Call for Workshop Papers |
| | Paper Submission Deadline |
| October 9, 2024 | Notification of Acceptance |
| October 18, 2024 | Travel Support Application Deadline |
| November 4, 2024 | Notification of Travel Support Decisions |
| November 14, 2024 | Camera-Ready Version Due |
| December 14, 2024 | Workshop Date |
Organization
Organizing Committee
Ruyuan Wan
Pennsylvania State University
Mikhail Terekhov
EPFL
Mitchell L. Gordon
OpenAI and MIT CSAIL
Caglar Gulcehre
EPFL
Dongyeop Kang
University of Minnesota
Maarten Sap
CMU LTI
Amy Zhang
University of Washington
He He
New York University
Scientific Advisory Board
Yoshua Bengio
Mila & Université de Montréal
Jeffrey P. Bigham
Carnegie Mellon University
Contact us
Please email pluralistic-alignment-neurips2024@googlegroups.com if you have any questions.