Arjun Akula (Arjun Reddy Akula)
Senior Research Scientist, Google DeepMind
Email: aakula-at-ucla.edu
Previous: UCLA; Amazon Alexa AI; IBM Research; IIIT Hyderabad
[Curriculum Vitae] [Google Scholar]
About
I am a Senior Research Scientist at Google DeepMind in Mountain View, California. I received my PhD from UCLA under the joint guidance of Prof. Song-Chun Zhu (UCLA) and Prof. Joyce Chai (UMich). My research interests are in computer vision, natural language processing (NLP), statistical modeling and inference, and deep learning, with a focus on interpretability, robustness, and trust in vision and language grounding models. I also work closely with Prof. Siva Reddy (Mila & Stanford), Prof. Sinisa Todorovic (Oregon State University), Dr. Spandana Gella (Amazon AI & University of Edinburgh), Dr. Xiaodan Liang (CMU), Dr. Varun Jampani (Google Research), and Dr. Changsong Liu (UCLA & DMAI). Prior to this, I was a Research Software Engineer in the Cognitive Research Department at IBM Research, supervised by Dr. Gargi B Dasgupta. I did my Bachelor's and Master's (by Research) in Computer Science and Engineering at IIIT Hyderabad, India. For my Master's thesis, I worked on question answering and dialogue systems at the Language Technologies Research Center (LTRC) under the supervision of Prof. Radhika Mamidi and Prof. Rajeev Sangal.
Academic Activities
Reviewer/PC: JAIR 2022, EMNLP 2022, ECCV 2022, ACL Rolling Review (ACL 2022, NAACL 2022), CVPR 2022, Artificial Intelligence Review Journal 2022, ACM Computing Surveys 2022, EAAI - AAAI22, AAAI 2022, EMNLP 2021, ACL-IJCNLP 2021, ICCV 2021, CVPR 2021, ACM TiiS 2021, NAACL 2021, AAAI 2021, EACL 2021, ACL 2020, EMNLP 2020, ICON 2018, 17
Sub-Reviewer: CVPR 2019, EMNLP-IJCNLP 2019, ACL 2019, ECCV 2018, EMNLP 2018, EMNLP 2017
Panelist/Organizer: UCLA Data Science Workshop 2018, UCLA ASA DataFest 2017
Workshop Reviewer/Organizer/Program Chair: In2Writing @ ACL 2022, WIT @ ACL 2022, NLP-Power @ ACL 2022, MML @ ACL 2022, CSRR @ ACL 2022, Insights @ ACL 2022, XAI Debugging @ NeurIPS 2021, XAI-AAAI 2021, ALVR 2021, RepL4NLP 2021, RepL4NLP 2020
Graduate Admission Review: CS Department, UCLA, 2020
News
- [2022, Jan] My recent iScience journal article is covered by the UCLA Newsroom.
- [2022, Jan] I completed my PhD at UCLA and joined Google full time as a Research Scientist.
- [2021, Nov] I defended my PhD thesis.
- [2021, Sep] One paper accepted in NeurIPS 2021.
- [2021, Aug] Two papers accepted in EMNLP 2021 (Long Paper, Main).
- [2021, Aug] Our journal work on Explainable AI (X-ToM) is accepted in iScience Cell Press Journal 2021.
- [2021, Mar] I will be interning at Amazon Alexa AI, Sunnyvale, in Summer 2021 as Applied Scientist Intern working with Dr. Spandana Gella, Prof. Jesse Thomason, Prof. Mohit Bansal, and Dr. Dilek Hakkani-Tur.
- [2020, Jul] Our recent vision and language grounding work 'Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions' is accepted in ACL 2020 (acceptance rate: 17.6%) [Oral Presentation].
- [2020, Mar] I will be interning at Google Research, Los Angeles, in Summer 2020 as Research Intern working with Dr. Soravit Changpinyo and Dr. Radu Soricut.
- [2020, Mar] Presented our poster and demos on Explainable AI (XAI) at DARPA XAI Phase-2 vPI Meeting.
- [2020, Feb] Presented our CoCo-X work (Oral, Spotlight) at AAAI 2020, Hilton New York Midtown, New York USA.
- [2019, Nov] Our explainable AI work 'CoCo-X: Generating Conceptual and Counterfactual Explanations via Fault-Lines' is accepted in AAAI 2020 (acceptance rate: 20.5%) [Oral, Spotlight].
- [2019, Sep] Our work 'X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust' is now on arXiv (arXiv:1909.06907).
- [2019, June] Presented our work on XAI at CVPR 2019 Workshop on XAI, Long Beach, California.
- [2019, May] Our paper 'Visual Discourse Parsing' has been accepted in the CVPR 2019 workshop on Language and Vision. It was also selected as one of the three best oral papers. (pdf)
- [2019, Apr] I will be at Amazon AI, Palo Alto, in Summer 2019 as Applied Scientist-I (PhD Research Intern) working with Dr. Spandana Gella, Prof. Siva Reddy and Dr. Yaser Al Onaizan.
- [2019, Mar] Our paper 'Explainable AI as Collaborative Task Solving' has been accepted in the CVPR 2019 workshop on Explainable AI. (pdf)
- [2019, Mar] Our paper 'Natural Language Interaction with Explainable AI Models' has been accepted in the CVPR 2019 workshop on Explainable AI. (pdf)
- [2019, Feb] Presented our work 'Explainable AI as Collaborative Task Solving' at DARPA XAI PI Meeting, UC Berkeley.
- [2019, Feb] Our work 'Natural Language Interaction with Explainable AI Models' is now on arXiv (arXiv:1903.05720v1).
- [2019, Feb] Our work 'Visual Discourse Parsing' is now on arXiv (arXiv:1903.02252v2).
- [2018, May] Presented our demo on Explainable AI (XAI) at DARPA XAI Phase-1 Meeting, Westpoint, New York.
- [2017, Dec] Passed PhD written and oral qualifiers at UCLA. Advanced to Candidacy. I am now a PhD Candidate!
- [2017, Sep] Visited IBM Research Labs, India.
- [2017, Aug] Attended ACL 2017, Vancouver, Canada.
- [2016, Oct] Starting my PhD at UCLA.
- [2014, Mar] Joined IBM Research, India, as Research Software Engineer.
Selected Publications
Robust Visual Reasoning via Language Guided Neural Module Networks
Arjun Reddy Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu. NeurIPS 2021. More details coming soon.
CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization
Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu and Radu Soricut. EMNLP 2021 (Long Paper, Main). More details coming soon.
Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions
Arjun Akula, Spandana Gella, Keze Wang, Song-Chun Zhu and Siva Reddy. EMNLP 2021 (Long Paper, Main). More details coming soon.
X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust
Arjun R Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y Chai, Song-Chun Zhu. iScience Cell Press 2021 (arXiv:1909.06907). (A short version was presented at CVPRW 2019.) [abs] [pdf] [code]
We present a new explainable AI (XAI) framework aimed at increasing justified human trust in and reliance on the AI machine through explanations. We pose explanation as an iterative communication process, i.e., a dialog between the machine and the human user. More concretely, the machine generates a sequence of explanations in a dialog that takes into account three important aspects at each dialog turn: (a) the human's intention (or curiosity); (b) the human's understanding of the machine; and (c) the machine's understanding of the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, as well as the human's mind as inferred by the machine. In other words, these explicit mental representations in ToM are incorporated to learn an optimal explanation policy that takes into account the human's perception and beliefs. Furthermore, we show that ToM facilitates quantitatively measuring justified human trust in the machine by comparing all three mental representations. We applied our framework to three visual recognition tasks, namely image classification, action recognition, and human body pose estimation. We argue that our ToM-based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human-study experiments verify our hypotheses, showing that the proposed explanations significantly outperform state-of-the-art XAI methods in terms of all the standard quantitative and qualitative XAI evaluation metrics, including human trust, reliance, and explanation satisfaction.
CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines
Arjun R Akula, Shuai Wang, Song-Chun Zhu. AAAI 2020 (acceptance rate: 20.5%) [Oral, Spotlight] [abs] [pdf] [code] [poster] [slides]
We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In cognitive psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class c_pred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on a zebra, pointed ears of a dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms state-of-the-art explainable AI models.
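The fault-line idea above can be illustrated with a small sketch. This is not the paper's implementation; the `score` function, concept names, and the greedy strategy are illustrative assumptions standing in for the CNN's class scores over detected semantic concepts.

```python
# Illustrative sketch (NOT the CoCoX implementation): greedily toggle
# explainable concepts until class c_alt outscores class c_pred, yielding a
# small set of add/delete edits -- a stand-in for a "fault-line".

def greedy_fault_line(active, candidates, score, c_pred, c_alt):
    """active:     concepts currently detected in the image
    candidates: concepts that may be added (if absent) or deleted (if present)
    score:      stand-in for the classifier's class score given active concepts
    Returns an ordered list of (action, concept) edits, or None."""
    active = set(active)
    remaining = set(candidates)
    edits = []
    while score(active, c_alt) <= score(active, c_pred):
        if not remaining:
            return None  # no flipping edit set found

        def margin_after(c):
            trial = active ^ {c}  # toggle: delete if present, add if absent
            return score(trial, c_alt) - score(trial, c_pred)

        # pick the toggle that most widens the margin toward c_alt
        best = max(remaining, key=margin_after)
        edits.append(("delete" if best in active else "add", best))
        active ^= {best}
        remaining.discard(best)
    return edits

# Toy score: overlap with hypothetical per-class concept prototypes.
prototypes = {"zebra": {"stripes", "four_legs", "mane"},
              "horse": {"four_legs", "mane", "saddle"}}
score = lambda concepts, cls: len(concepts & prototypes[cls])

edits = greedy_fault_line({"stripes", "four_legs", "mane"},
                          {"stripes", "saddle"}, score, "zebra", "horse")
# e.g. deleting "stripes" and adding "saddle" flips zebra -> horse
```

In the toy run, the returned edits delete "stripes" and add "saddle", mirroring the abstract's "minimal semantic-level features that need to be added to or deleted from I".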
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
Arjun R Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy. ACL 2020 (acceptance rate: 17.6%) [Oral Presentation] [abs] [data] [pdf] [code] [slides] [poster]
Visual referring expression recognition is a challenging task that requires natural language understanding in the context of an image. We critically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure. To measure the true progress of existing models, we split the test set into two sets, one which requires reasoning on linguistic structure and one which does not. Additionally, we create an adversarial dataset, Ref-Adv, that tests a model's ability to generalize to an unseen distribution of target referring objects. Using these datasets, we empirically show that existing methods fail to exploit linguistic structure and are 12% to 23% lower in performance than the established progress for this task. We also propose two methods, one based on negative sampling and the other based on multi-task learning, to increase the robustness of ViLBERT, the current state-of-the-art model for this task.
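The core intuition behind word-order negatives can be sketched in a few lines. This is an illustrative assumption about how such negatives might be generated, not the paper's actual negative-sampling procedure: each negative keeps the exact bag of words of the original expression, so any model that ignores word order cannot tell them apart.

```python
import random

def word_order_negatives(expression, n=3, seed=0):
    """Generate hard negatives for a referring expression by permuting its
    words: every negative has the identical multiset of words but a different
    order, so a bag-of-words model scores it identically to the original."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    tokens = expression.split()
    negatives = set()
    attempts = 0
    while len(negatives) < n and attempts < 100:
        attempts += 1
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        candidate = " ".join(shuffled)
        if candidate != expression:  # must actually change the order
            negatives.add(candidate)
    return sorted(negatives)

negs = word_order_negatives("the man on the left holding a red umbrella")
# each negative reuses exactly the same words in a different order
```

A grounding model trained to prefer the original over such negatives is pushed to attend to linguistic structure rather than word identity alone.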
Visual Discourse Parsing
Arjun R Akula, Song-Chun Zhu. CVPR 2019 workshop on Language and Vision (arXiv:1903.02252v1) [Oral]. Also selected as one of the three best oral papers. [abs] [pdf] [code] [slides] [poster]
Text-level discourse parsing aims to unmask how two segments (or sentences) in a text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that can better summarize the video. In order to collect a dataset for learning discourse cues from videos, one needs to manually identify the scenes from a large pool of video frames and then annotate the discourse relations between them. This is clearly a time-consuming, expensive, and tedious task. In this work, we propose an approach to identify discourse cues from videos without the need to explicitly identify and annotate the scenes. We also present a novel dataset containing 310 videos and the corresponding discourse cues to evaluate our approach. We believe that many multi-discipline Artificial Intelligence problems, such as Visual Dialog and Visual Storytelling, would greatly benefit from the use of visual discourse cues.
Explainable AI as Collaborative Task Solving
Arjun R Akula, Changsong Liu, Sinisa Todorovic, Joyce Y Chai, Song-Chun Zhu. CVPR 2019 workshop on Explainable AI [abs] [pdf] [code] [poster] Other versions: writeup1.pdf
Natural Language Interaction with Explainable AI Models
Arjun R Akula, Sinisa Todorovic, Joyce Y Chai, Song-Chun Zhu. CVPR 2019 workshop on Explainable AI [abs] [pdf] [code]
This paper presents an explainable AI (XAI) system that provides explanations for its predictions. The system consists of two key components, namely the prediction And-Or graph (AOG) model for recognizing and localizing concepts of interest in input data, and the XAI model for providing explanations to the user about the AOG's predictions. In this work, we focus on the XAI model, specified to interact with the user in natural language, whereas the AOG's predictions are considered given and are represented by the corresponding parse graphs (pg's) of the AOG. Our XAI model takes pg's as input and provides answers to the user's questions using the following types of reasoning: direct evidence (e.g., detection scores), part-based inference (e.g., detected parts provide evidence for the concept asked about), and other evidence from spatio-temporal context (e.g., constraints from the spatio-temporal surround). We identify several correlations between the user's questions and the XAI answers using the YouTube Action dataset.
Automatic problem extraction and analysis from unstructured text in IT tickets
S Agarwal, V Aggarwal, Arjun Akula, Gargi Dasgupta, G Sridhara. IBM Journal of Research and Development 2017 [abs] [pdf]
Many large IT service providers, having realized the transformational impact of cognitive technology, are experimenting with cognitive service agents like Amelia and IBM Watson for IT problem resolution. We build a system that extracts knowledge about different classes of problems arising in the IT infrastructure, mines problem linkages to recent system changes, and identifies the resolution activities that mitigate problems. The system, at its core, uses data mining, machine learning, and natural language parsing techniques. By using the extracted knowledge, one can (i) understand the kinds of problems and the root causes affecting the IT infrastructure, (ii) proactively remediate the causes so that they no longer result in problems, and (iii) estimate the scope for automation in service management.
Classification of Attributes in a Natural Language Query into Different SQL Clauses
Ashish Palakurthi, Ruthu S.M., Arjun Akula, and Radhika Mamidi. Recent Advances in Natural Language Processing (RANLP 2015) [abs] [pdf]
Attribute information in a natural language query is one of the key features for converting a natural language query into a Structured Query Language (SQL) query in Natural Language Interface to Database systems. In this paper, we explore the task of classifying the attributes present in a natural language query into the different clauses of a SQL query. In particular, we investigate the effectiveness of various features and Conditional Random Fields for this task. Our system uses a statistical classifier trained on manually prepared data. We report our results on three different domains and also show how our system can be used for generating a complete SQL query.
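To make the task concrete, here is a deliberately tiny illustration of mapping attribute mentions to SQL clauses. The paper's system uses Conditional Random Fields with richer features; the clause labels, cue-word lists, and window heuristic below are illustrative assumptions only.

```python
# Toy illustration of the attribute-to-clause task (the paper uses a CRF):
# assign each attribute mention in a query to a SQL clause based on cue words
# appearing just before it. Cue lists here are illustrative assumptions.
CUES = {
    "ORDER BY": {"sorted", "ordered", "ascending", "descending", "highest", "lowest"},
    "GROUP BY": {"each", "every", "per"},
    "WHERE":    {"with", "whose", "where", "equal", "greater", "less", "than"},
}

def label_attribute(tokens, attr_index, window=3):
    """Label the attribute at tokens[attr_index] with a SQL clause by scanning
    a small window of words to its left; default to SELECT."""
    left = tokens[max(0, attr_index - window):attr_index]
    for clause, cues in CUES.items():
        if any(t.lower() in cues for t in left):
            return clause
    return "SELECT"

tokens = "list the names of students with marks greater than 90".split()
# "names" (index 2) -> SELECT ; "marks" (index 6) -> WHERE (cue word "with")
```

A trained sequence model replaces these hand-written cues with learned feature weights, which is what makes the approach portable across domains.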
Towards Auto-Remediation in Services Delivery: Context-based Classification of Noisy and Unstructured Tickets
Gargi Dasgupta, Tapan Nayak, Arjun Akula, Shivali Agarwal, Shripad Nadgowda. International Conference on Service-Oriented Computing (ICSOC 2014) [abs] [pdf]
Service interactions account for a major source of revenue and employment in many modern economies, and yet the service operations management process remains extremely complex. The ticket is the fundamental management entity in this process, and resolution of tickets remains largely human intensive. A large portion of these human-executed resolution tasks are repetitive in nature and can be automated. Ticket description analytics can be used to automatically identify the true category of a problem. This, when combined with automated remediation actions, considerably reduces the human effort. We look at monitoring data in a big provider's domain and abstract out the repeatable tasks from the noisy and unstructured human-readable text in tickets. We present a novel approach for automatic problem determination from this noisy and unstructured text. The approach uses two distinct levels of analysis: (a) correlating different data sources to obtain richer text, followed by (b) context-based classification of the correlated data. We report on the accuracy and efficiency of our approach using real customer data.
A Novel Approach towards Incorporating Context Processing Capabilities in NLIDB system
Arjun Akula, Rajeev Sangal and Radhika Mamidi. International Joint Conference on Natural Language Processing (IJCNLP 2013) [abs] [pdf]
This paper presents a novel approach to categorize, model, and identify contextual information in natural language interface to database (NLIDB) systems. The interactions between user and system are categorized and modeled based on the way in which contextual information is utilized in the interactions. A relationship schema among the responses (user and system responses) is proposed. We present a novel method to identify contextual information in one specific type of user-system interaction. We report on results of experiments with university-related queries.
A Novel Approach towards Building a Portable NLIDB system using the CPG Framework
Abhijeet Gupta, Arjun Akula, Deepak Malladi, Puneeth Kukkadapu, Vinay Ainavolu and Rajeev Sangal. International Conference on Asian Language Processing (IALP 2012) [abs] [pdf]
This paper presents a novel approach to building a natural language interface to databases (NLIDB) based on Computational Paninian Grammar (CPG). It uses two distinct stages of processing, namely syntactic processing followed by semantic processing. Syntactic processing makes the processing more general and robust. CPG is a dependency framework in which the analysis is in terms of syntactico-semantic relations. The closeness of these relations makes semantic processing easier and more accurate. It also makes the system more portable.
Professional Experience
Amazon Alexa AI, Sunnyvale (Summer 2021, Applied Scientist Intern)
Worked on Vision-Language-Navigation models for the ALFRED benchmark. Mentors: Dr. Spandana Gella, Prof. Jesse Thomason, Prof. Mohit Bansal, Dr. Dilek Hakkani-Tur.
Google Research, Los Angeles (Summer 2020, Research Intern)
Worked on quantifying cross-dataset distribution shifts in VQA. Mentors: Dr. Soravit Changpinyo, Dr. Radu Soricut.
Amazon AI, Palo Alto (Summer 2019, Applied Scientist-I, PhD Research Intern)
Worked on improving the robustness of visual referring expression grounding models. Mentors: Dr. Spandana Gella, Prof. Siva Reddy, Dr. Yaser Al Onaizan.
Worked on contextualizing neural module networks. Mentors: Dr. Spandana Gella, Prof. Siva Reddy.
IBM Research, India (Research Software Engineer, 2014--2016)
Worked on human language technologies and the IBM Watson QA system.
Patents
Analyzing Unstructured Ticket Text Using Discourse Cues in Communication Logs
Arjun Akula, Gargi B Dasgupta, Tapan K Nayak. Disclosure Number: IN920150227, July 2015. US Patent App.
A System and Method for Structured Representation and Classification of Unstructured Tickets in Services Delivery
Shivali A, Arjun Akula, Gargi B, Tapan K, Shripad J. Disclosure Number: IN820140677, Oct 2014. US Patent App.
Measuring Effective Utilization of a Service Practitioner for Ticket Resolution via a Wearable Device
Arjun Akula, Gargi Dasgupta, Vijay Ekambaram, Ramasuri Narayanam. Disclosure Number: IN920160178US1, Aug 2016. US Patent App.
Education
University of California, Los Angeles (UCLA)
Sep. 2016 -- 2021. PhD in Computer Science. Advisors: Prof. Song-Chun Zhu, Prof. Joyce Chai.
IIIT Hyderabad, India
Jun. 2012 -- Mar. 2014. Master's (by Research/Thesis) in Computer Science and Engineering. Advisors: Prof. Radhika Mamidi, Prof. Rajeev Sangal.
IIIT Hyderabad, India
Aug. 2008 -- Jun. 2012. Bachelor's in Computer Science and Engineering.
Awards
Invited Talks
Research Talk on Interpretability, Robustness, and Trust in Vision and Language Grounding Models
Facebook AI Research, Nov 2021.
Research Talk on Interpretability, Robustness, and Trust in Vision and Language Grounding Models
Google Research, Feb 2021.
Research Talk on Interpretability, Robustness, and Trust in Vision and Language Grounding Models
Microsoft AI, Feb 2021.
Research Talk on Interpretability, Robustness, and Trust in Vision and Language Grounding Models
Baidu Research, Oct 2020.
Research Talk on Interpretability, Robustness, and Trust in Vision and Language Grounding Models
Adobe Research, Oct 2020.
Research Talk on Robustness of Grounding Visual Referring Expressions
Google AI, Aug 2020.
Research Talk on Interpretability and Trust in Vision and Language Grounding Models
UCLA, Aug 2019.
Research Talk on Semantic Role Labelling for the Watson Question Answering System
IBM Research, Feb 2016.
Research Talk on Context-based Natural Language Interfaces to Databases (NLIDB) and Dialogue Systems
Sixth IIIT-H Advanced Summer School on NLP (IASNLP 2015).













