ICML 2024 Workshop
Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
ICML 2024 @ Vienna, Austria, Jul 27 Sat
Straus 1
Schedule
| Time | Session | Description | Duration (mins) |
|---|---|---|---|
| Jul 27, 07:30 | Opening Remarks | Jing Shao (Shanghai AI Lab) | 10 |
| Jul 27, 07:40 | Keynote Talk | A data-centric view on reliable generalization. Ludwig Schmidt (University of Washington; Anthropic) | 30 |
| Jul 27, 08:10 | Keynote Talk | Robust Alignment and Control with Representation Engineering. Matt Fredrikson (Carnegie Mellon University; Gray Swan AI) | 30 |
| Jul 27, 08:40 | Coffee Break | - | 10 |
| Jul 27, 08:50 | Panel Discussion | Theme: Security and Safety of AI Agents. Panelists: Alan Chan (Center for the Governance of AI; Mila - Quebec AI Institute), Tomek Korbak (UK AI Safety Institute), Ivan Evtimov (Meta AI), Kai Greshake (NVIDIA), Matt Fredrikson (Carnegie Mellon University; Gray Swan AI). Moderator: Daniel Paleka (ETH Zurich) | 50 |
| Jul 27, 09:40 | Contributed Talk | The Safety in Large Language Models. Yisen Wang (Peking University) | 20 |
| Jul 27, 10:00 | Outstanding Paper Talk | Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | 10 |
| Jul 27, 10:10 | Outstanding Paper Talk | Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques | 10 |
| Jul 27, 10:20 | Lunch Break | - | 70 |
| Jul 27, 11:30 | Keynote Talk | Agent Governance. Alan Chan (Center for the Governance of AI; Mila - Quebec AI Institute) | 30 |
| Jul 27, 12:00 | Keynote Talk | UK AI Safety Institute: Overview & Agents Evals. Herbie Bradley (UK AI Safety Institute) | 30 |
| Jul 27, 12:30 | Contributed Talk | Summary and Prospect of TiFA Challenge. Lijun Li (Shanghai AI Lab) & Bowen Dong (Shanghai AI Lab) | 20 |
| Jul 27, 12:50 | Break | - | 20 |
| Jul 27, 13:10 | Paper Lightning Talks | Games for AI-Control: Models of Safety-Evaluations of AI Deployment Protocols (Outstanding paper; remote); Decomposed evaluations of geographic disparities in text-to-image models (Outstanding paper; remote); WebCanvas: Benchmarking Web Agents in Online Environments (Dehan Kong); Can Editing LLMs Inject Harm? (Shiyang Lai); MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants (John Heibel); Models That Prove Their Own Correctness (Orr Paradise); Bias Begets Bias: the Impact of Biased Embeddings on Diffusion Models (Marvin Li & Jeffrey Wang) | 40 |
| Jul 27, 13:50 | Poster Session + Social | - | 60 |
| Jul 27, 14:50 | End of Program | - | 0 |
Description/Call for Papers
- Adversarial attack and defense, poisoning, hijacking and security [18, 13, 19, 20, 21]
- Robustness to spurious correlations and uncertainty estimation
- Technical approaches to privacy, fairness, accountability and regulation [12, 22, 28]
- Truthfulness, factuality, honesty and sycophancy [23, 24]
- Transparency, interpretability and monitoring [25, 26]
- Identifiers of AI-generated material, such as watermarking [27]
- Technical alignment/control, such as scalable oversight [29], representation control [26] and machine unlearning [30]
- Model auditing, red-teaming and safety evaluation benchmarks [31, 32, 33, 16]
- Measures against malicious model fine-tuning [34]
- Novel safety challenges with the introduction of new modalities
Submission Guide
Submission Instructions
- Submission site: Submissions should be made on OpenReview.
- Submissions are non-archival: we welcome submissions that are also undergoing peer review elsewhere at the time of submission, but we will not accept submissions that have already been published or accepted for publication at peer-reviewed conferences or journals. Papers presented or to be presented at other non-archival venues (e.g. other workshops) may be submitted. No formal workshop proceedings will be published.
- Social Impact Statement: authors are required to include a "Social Impact Statement" that highlights "potential broader impact of their work, including its ethical aspects and future societal consequences".
- Submission Length and Format: Submissions should be anonymised papers of up to 5 pages, excluding references and the Social Impact Statement (appendices can be added to the main PDF). You must format your submission using the ICML_2024_LaTeX_style_file.
- Paper Review: The review process is double-blind, with at least two reviewers assigned to each paper.
- Camera-Ready Instructions: The camera-ready version consists of a main body of up to 6 pages, followed by unlimited pages for the Social Impact Statement, references and an appendix, all in a single file. Authors should upload the camera-ready versions of all accepted submissions to the OpenReview page for the corresponding submissions. Camera-ready versions will be publicly available to everyone from the Camera-Ready Deadline shown below.
Key Dates
| Milestone | Date |
|---|---|
| Submissions Open | May 11, 2024 |
| Submission Deadline | May 30, 2024 |
| Acceptance Notification | June 19, 2024 (updated from June 17, 2024) |
| Camera-Ready Deadline | July 7, 2024 |
| Workshop Date | July 27, 2024 |
Speakers & Panelists
Program Committee
Frequently Asked Questions
Can we submit a paper that will also be submitted to NeurIPS 2024?
Yes.
Can we submit a paper that was accepted at ICLR 2024?
No. ICML prohibits main conference publications from also appearing at its workshops.
Will the reviews be made available to authors?
Yes.
I have a question not addressed here, whom should I contact?
Email organizers at icmltifaworkshop@gmail.com
References
[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2024). Visual instruction tuning. Advances in neural information processing systems, 36.
[2] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
[3] OpenAI. (2023). GPT-4 with vision (GPT-4v) system card.
[4] Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, and L. Wang. The dawn of LMMs: Preliminary explorations with GPT-4V(ision), 2023.
[5] Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., ... & Sun, M. (2023). ToolLLM: Facilitating large language models to master 16000+ real-world APIs.
[6] C. Zhang, Z. Yang, J. Liu, Y. Han, X. Chen, Z. Huang, B. Fu, and G. Yu. Appagent: Multimodal agents as smartphone users, 2023.
[7] T. Eloundou, S. Manning, P. Mishkin, and D. Rock. Gpts are gpts: An early look at the labor market impact potential of large language models, 2023.
[8] Ormazabal, Aitor, et al. "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." arXiv preprint arXiv:2404.12387 (2024).
[9] Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., ... & Zhou, J. (2023). Qwen-vl: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.
[10] Sora: Creating video from text. (n.d.). https://openai.com/sora
[11] Ma, X., Wang, Y., Jia, G., Chen, X., Liu, Z., Li, Y. F., ... & Qiao, Y. (2024). Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048.
[12] Shavit, Y., Agarwal, S., Brundage, M., Adler, S., O’Keefe, C., Campbell, R., ... & Robinson, D. G. (2023). Practices for Governing Agentic AI Systems. Research Paper, OpenAI, December.
[13] N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramer, and L. Schmidt. Are aligned neural networks adversarially aligned?, 2023.
[14] A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, K. Voudouris, U. Bhatt, A. Weller, D. Krueger, and T. Maharaj. Harms from increasingly agentic algorithmic systems. In 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23. ACM, June 2023. doi: 10.1145/3593013.3594033. URL https://dx.doi.org/10.1145/3593013.3594033.
[15] Gemini Team. Gemini: A family of highly capable multimodal models, 2023.
[16] T. Shevlane, S. Farquhar, B. Garfinkel, M. Phuong, J. Whittlestone, J. Leung, D. Kokotajlo, N. Marchal, M. Anderljung, N. Kolt, L. Ho, D. Siddarth, S. Avin, W. Hawkins, B. Kim, I. Gabriel, V. Bolina, J. Clark, Y. Bengio, P. Christiano, and A. Dafoe. Model evaluation for extreme risks, 2023.
[17] L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, I. Gabriel, V. Rieser, and W. Isaac. Sociotechnical safety evaluation of generative ai systems, 2023.
[18] L. Bailey, E. Ong, S. Russell, and S. Emmons. Image hijacks: Adversarial images can control generative models at runtime, 2023.
[19] Jain, N., Schwarzschild, A., Wen, Y., Somepalli, G., Kirchenbauer, J., yeh Chiang, P., ... & Goldstein, T. (2023). Baseline defenses for adversarial attacks against aligned language models.
[20] Robey, A., Wong, E., Hassani, H., & Pappas, G. J. (2023). SmoothLLM: Defending large language models against jailbreaking attacks.
[21] B. Wang, W. Chen, H. Pei, C. Xie, M. Kang, C. Zhang, C. Xu, Z. Xiong, R. Dutta, R. Schaeffer, et al. DecodingTrust: A comprehensive assessment of trustworthiness in GPT models. 2023.
[22] Chan, A., Ezell, C., Kaufmann, M., Wei, K., Hammond, L., Bradley, H., ... & Anderljung, M. (2024). Visibility into AI Agents. arXiv preprint arXiv:2401.13138.
[23] Huang, Q., Dong, X., Zhang, P., Wang, B., He, C., Wang, J., Lin, D., Zhang, W., & Yu, N. (2023). Opera: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation.
[24] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., ... & Perez, E. (2023). Towards understanding sycophancy in language models.
[25] Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT.
[26] A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowski, S. Goel, N. Li, M. J. Byun, Z. Wang, A. Mallen, S. Basart, S. Koyejo, D. Song, M. Fredrikson, J. Z. Kolter, and D. Hendrycks. Representation engineering: A top-down approach to ai transparency, 2023.
[27] Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning.
[28] Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., ... & Lee, K. (2023). Scalable extraction of training data from (production) language models.
[29] S. R. Bowman, J. Hyun, E. Perez, E. Chen, C. Pettit, S. Heiner, K. Lukošiūtė, A. Askell, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Olah, D. Amodei, D. Amodei, D. Drain, D. Li, E. Tran-Johnson, J. Kernion, J. Kerr, J. Mueller, J. Ladish, J. Landau, K. Ndousse, L. Lovitt, N. Elhage, N. Schiefer, N. Joseph, N. Mercado, N. DasSarma, R. Larson, S. McCandlish, S. Kundu, S. Johnston, S. Kravec, S. E. Showk, S. Fort, T. Telleen-Lawton, T. Brown, T. Henighan, T. Hume, Y. Bai, Z. Hatfield-Dodds, B. Mann, and J. Kaplan. Measuring progress on scalable oversight for large language models, 2022.
[30] Yao, Y., Xu, X., & Liu, Y. (2023). Large language model unlearning. arXiv preprint arXiv:2310.10683.
[31] S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. V. Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, D. Bau, M. Tegmark, D. Krueger, and D. Hadfield-Menell. Black-box access is insufficient for rigorous ai audits, 2024.
[32] M. Bhatt, S. Chennabasappa, C. Nikolaidis, S. Wan, I. Evtimov, D. Gabi, D. Song, F. Ahmad, C. Aschermann, L. Fontana, S. Frolov, R. P. Giri, D. Kapil, Y. Kozyrakis, D. LeBlanc, J. Milazzo, A. Straumann, G. Synnaeve, V. Vontimitta, S. Whitman, and J. Saxe. Purple llama cyberseceval: A secure coding benchmark for language models, 2023.
[33] D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran-Johnson, D. Amodei, T. Brown, N. Joseph, S. McCandlish, C. Olah, J. Kaplan, and J. Clark. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned, 2022.
[34] Henderson, P., Mitchell, E., Manning, C., Jurafsky, D., & Finn, C. (2023). Self-destructing models: Increasing the costs of harmful dual uses of foundation models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23.
[35] Y. Dong, H. Chen, J. Chen, Z. Fang, X. Yang, Y. Zhang, Y. Tian, H. Su, and J. Zhu. How robust is Google's Bard to adversarial image attacks?, 2023.
[36] Yin, Z., Wang, J., Cao, J., Shi, Z., Liu, D., Li, M., Sheng, L., Bai, L., Huang, X., Wang, Z., & others (2023). LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark. arXiv preprint arXiv:2306.06687.