My current research interest lies at the intersection of vision and language.
Specifically, I am deeply engaged in developing AI agents
that can understand human language and perceive and comprehend the visual world.
The overarching goal of my research group at IIT Jodhpur, the Vision, Language, and Learning Group (VL2G), is
to advance the development of such intelligent agents and thereby bridge the gap between human and machine interaction.
To learn more about the recent research focus and activities of VL2G, please visit the group's website.
[November 2025] Two of our works have been accepted at the AAAI 2026 Main Track: our debut work on healthcare AI, PatientVLM Meets DocVLM, and a work on few-shot video object detection.
[October 2024] Shreya and Nakul from VL2G received (in absentia) the Director's Prize for the Best Academic Innovation Work among graduating students of all B.Tech. programs of the class of 2024 at the IITJ convocation. See the announcement and the Institute's social media post.
[September 2024] Our latest work on scene-text-to-scene-text translation is now available on our project website.
We have also made the code and data publicly available for those interested in exploring or building upon this work.
[September 2024] Our large multimodal model extension of Text-KVQA (Singh et al., ICCV 2019) has been accepted at the EMNLP 2024 Main Track.
[October 2023] Two of our works, one on AI for education and one on query-guided attention in vision transformers, have been accepted at WACV'24.
[October 2023] Speaking (virtually) at the Distinguished Researcher Speaker Series (DRSS), Accenture Labs, on our video works.
[September 2023] Speaking (virtually) at NISER Bhubaneswar on our works on sketch-guided visual understanding.
[July 2023] Received the Microsoft Academic Partnership Grant (MAPG) 2023 (see the announcement).
[May 2023] Recognized as one of the Outstanding Reviewers at CVPR 2023. The complete list is here.
[February 2022] Speaking at WADLA 2022, IIIT Sri City (Virtual).
[January 2022] Received the SERB-Startup Research Grant.
[December 2021] Two graduating members of my research group, Vaibhav Mishra and Mayank Maheshwari, received the Director's Prize for the Best Academic Innovation Work among all graduating students of the 2021 batch at the IIT Jodhpur convocation (see the award-receiving moment).
[November 2021] My PhD student Abhirama P. was selected as a Prime Minister's Research Fellow.
[September 2021] Speaking on a panel at the DocVQA workshop at ICDAR 2021.
[September 2021] Recognized as one of the outstanding reviewers at ICCV 2021. See the list.
[July 2021] Our work on Few-shot Visual Relationship Co-Localisation with Revant Teotia, Vaibhav Mishra and Mayank Maheshwari has been accepted at ICCV 2021. The paper and code are now available.
[June 2021] Selected for the Microsoft Academic Partnership Grant (MAPG) 2021.
[April 2021] A paper on TextVQG has been accepted at ICDAR 2021. The paper is available here.
[February 2021] Presenting a poster on 'Multimodal Machine Learning for Enhanced Image Understanding' in the 'Machine Learning and Big Data Analytics' track at the 11th Indo-German Frontiers of Engineering Conference (INDOGFE 2021).
[February 2021] Speaking at WADLA 2021 (Virtual event hosted by IIIT Sri City) [Slides].
[December 2020] Our ICCV 2019 paper on Text-KVQA has been accepted for presentation at Vision India.
[August 2020] Received the IIT Jodhpur Teaching Excellence Award 2020 (see announcement).
[July 2020] One paper has been accepted at ECCV 2020 as a spotlight (top 5% of papers). This is joint work with Aditay, Rajath and Anirban.
[June 2020] Received a research grant from Accenture Labs.
[April 2020] Co-organizing the 2nd Workshop on KBMM, co-located with AKBC 2020.
[December 2019] Gave a talk on our recent work on knowledge-aware computer vision to a small group of developers and researchers from Siemens at IISc Bangalore.
[July 2019] Our paper on a "knowledge-enabled" VQA model that can read has been accepted at ICCV 2019 for oral presentation.
[July 2019] Joined IIT Jodhpur.
[June 2019] Gave a talk on "Reading Text in Scene Images, Bridging it to World Knowledge, and Beyond" at the Department of CSE, IIT Guwahati.
[May 2019] Our OCR-VQA paper has been accepted at ICDAR 2019.
[April 2019] Gave a talk on "Knowledge-aware Visual Question Answering" at IIIT Bangalore.
[March 2019] I will be serving as a program committee member for ICDAR 2019.
[February 2019] I will be serving as a reviewer for ICCV 2019.
[February 2019] Gave a lecture on graph representation learning in the Deep Learning for Computer Vision course at IISc Bangalore.
[January 2019] Obtained a research grant from Siemens. I will be working as co-PI with Dr. Anirban and Dr. Partha on this project.
[January 2019] I will be serving as a program committee member for IJCAI 2019.
[December 2018] I am attending AAAI 2019. My travel is supported by a gift of USD 3000 from Google to the Indian Institute of Science. Thank you, Google.
[July 2018] I will be co-teaching an undergraduate introductory course on Algorithms and Programming at IISc Bangalore. [Link]
[August 2017] Joined IISc Bangalore as a postdoc.
Selected Publications
Journals
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding,
Anik De, Abhirama Subramanyam Penamakuri, Rajeev Yadav, Aditya Rathore, Harshiv Shah, Devesh Sharma, Sagar Agarwal, Pravin Kumar, Anand Mishra
[Preprint][Bharat Scene Text Dataset]
Under Peer Review.
Moment Alignment Transformer for Video-to-Video Moment Retrieval,
Yogesh Kumar, Uday Agarwal, Manish Gupta, Anand Mishra
[Preprint]
Under Peer Review.
Bridging language to visuals: towards natural language query-to-chart image retrieval,
Neelu Verma, Anik De, Anand Mishra
Volume: 13 (3), 32, International Journal of Multimedia Information Retrieval, 2024.
[Link]
Multimodal Query-guided Object Localization,
Aditay Tripathi, Rajath R. Dani, Anand Mishra, Anirban Chakraborty
Volume: 83 (5), Pages: 14857-14881, Multimedia Tools and Applications, 2024.
[Link]
DHFML: deep heterogeneous feature metric learning for matching photograph and cartoon pairs,
Anand Mishra
Pages: 1-8, International Journal of Multimedia Information Retrieval, 2018.
[Link][bibtex]
Unsupervised refinement of color and stroke features for text binarization,
Anand Mishra, Karteek Alahari and C. V. Jawahar
Volume: 20, Pages: 105-121, International Journal on Document Analysis and Recognition, 2017.
[PDF][bibtex]
Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues,
Anand Mishra, Karteek Alahari and C. V. Jawahar
Volume: 145, Pages: 30-42, Computer Vision and Image Understanding, 2016.
[PDF][bibtex]
Conference Papers
PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis,
K Lokesh, Uday Agarwal, Abhirama Subramanyam Penamakuri, Apoorva Challa, Shreya K Gowda, Somesh Gupta, Anand Mishra
AAAI 2026. (NEW)
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
Abhirama Subramanyam Penamakuri*, Navlika Singh*, Piyush Arora*, Anand Mishra (*: Equal contribution)
EMNLP 2025. (NEW)
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant,
Abhirama Subramanyam Penamakuri, Anand Mishra.
EMNLP 2024.
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation,
Shreyas Vaidya*, Arvind Kumar Sharma*, Prajwal Gatti, Anand Mishra. (*: equal contribution)
ICPR 2024.
Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch,
Aditay Tripathi, Anand Mishra, Anirban Chakraborty
WACV 2024. (NEW)
[Paper][Project Page][Code]
Semantic Labels-Aware Transformer Model for Searching over a Large Collection of Lecture-Slides,
K.V. Jobin, Anand Mishra, C. V. Jawahar
WACV 2024 (Oral). (NEW: Best Paper Award Finalist)
[Paper][Project Page]
[LecSD Dataset][Short Talk]
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering,
Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra
IJCAI 2023.
[Paper][Project Page][Code]
Towards Making Flowchart Images Machine Interpretable,
Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra
ICDAR 2023.
[Paper][Project Page][Code]
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images,
Soumya Jahagirdar, Shankar Gangisetty, Anand Mishra
ICDAR 2021 (Oral).
[Paper]
From Strings to Things: Knowledge-enabled VQA model that can read and reason,
Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, and Anirban Chakraborty
ICCV 2019 (Oral).
[Paper] [bibtex]
[Project page]
OCR-VQA: Visual Question Answering by Reading Text in Images,
Anand Mishra, Shashank Shekhar, Ajeet Kumar Singh, and Anirban Chakraborty
ICDAR 2019.
[Paper] [bibtex]
[Project page]
Deep Embedding using Bayesian Risk Minimization with Application to Sketch Recognition,
Anand Mishra and Ajeet Kumar Singh
ACCV 2018 (acceptance rate: 28%).
[Paper (arXiv)] [bibtex]
IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild,
Ashutosh Mishra, Shyam N. Roy, Anand Mishra, and C. V. Jawahar
ECCVW 2016 (Oral).
[PDF] [bibtex] [IIIT-CFW dataset]
A Simple and Effective Method for Script Identification in the Wild,
Ajeet Kumar Singh, Anand Mishra, Pranav Dabaral and C. V. Jawahar
DAS 2016.
[Paper] [bibtex]
Scene Text Recognition and Retrieval for Large Lexicons,
Udit Roy, Anand Mishra, Karteek Alahari and C. V. Jawahar
ACCV 2014.
[Paper] [bibtex]
Image Retrieval using Textual Cues,
Anand Mishra, Karteek Alahari and C. V. Jawahar
ICCV 2013.
[Paper] [bibtex]
Whole is Greater than Sum of Parts: Recognizing Scene Text Words,
Vibhor Goel, Anand Mishra, Karteek Alahari and C. V. Jawahar
ICDAR 2013.
[Paper] [bibtex]
Scene Text Recognition using Higher Order Language Priors,
Anand Mishra, Karteek Alahari and C. V. Jawahar
BMVC 2012 (Oral).
[Paper] [bibtex]
[IIIT-5K Word dataset]
Top-down and Bottom-up Cues for Scene Text Recognition,
Anand Mishra, Karteek Alahari and C. V. Jawahar
CVPR 2012.
[Paper] [bibtex]
An MRF Model for Binarization of Natural Scene Text,
Anand Mishra, Karteek Alahari and C. V. Jawahar
ICDAR 2011 (Oral).
[Paper] [bibtex]
Teaching
At IIT Jodhpur
CSL7130: Mathematical Foundations for Computer Science (Monsoon’25)
CSL2050: Pattern Recognition and Machine Learning (Spring’24/25)
CSL7670: Fundamentals of Machine Learning (Monsoon’23/24)
CSL7360: Computer Vision (Spring’23)
CSL2040: Maths for Computing (Monsoon’22/21/AY 20-21–Tri-3)
CSL7410: Graph Theory and Application (AY 20-21–Tri-1, Spring’22)
CS222: Theory of Computation (Spring’20)
CS212: Object-oriented Design and Analysis (Monsoon’19)
At IISc Bangalore
UE101: Algorithms and Programming (Monsoon'18)
-- Co-taught at the Indian Institute of Science with Dr. Sathish Govindarajan and Dr. Viraj Kumar.
At IIIT Hyderabad (during PhD)
Computer Problem Solving (Monsoon'16)
-- Introductory course for M.Tech. Bioinformatics
At IIIT Sri City (during PhD)
Computer Architecture (Spring'15)
-- Co-taught at IIIT Sri City as a visiting instructor with Dr. Suresh Purini and Prof. Govindrajulu
Operating Systems (Monsoon'14)
-- Co-taught at IIIT Sri City as a visiting instructor with Dr. Suresh Purini
Open Positions
We do not offer any short-term (summer/winter) internship positions, except occasional special calls for specific project needs.
I apologize for not being able to respond to individual emails regarding these positions.
For non-IITJ students, the current open positions are listed below.
PhD/MTech-PhD Positions: We currently have openings for full-time PhD and MTech-PhD students. If you are interested in pursuing a PhD or MTech-PhD, please apply through the institute's official route:
Link.
Research Assistant/Research Engineer/Pre-Doc Positions: We hire highly motivated engineering graduates who have completed, or will complete this semester, a BTech/BE, preferably in CS/EE/AI, for full-time (in-person) Research Engineer or Research Assistant positions.
This is a rolling call. Exceptional academic credentials, sound knowledge of machine learning and deep learning, good programming skills, and a passion for world-class research (and development) are essential for these positions. If you are eligible and interested, please apply HERE.
Next cut-off date: December 15, 2024. (NEW)