Anand Mishra, PhD
CSE-210, Department of Computer Science and Engineering
Indian Institute of Technology Jodhpur
Jodhpur - 342030 (RJ), India

Currently, I serve as an Assistant Professor at the Department of Computer Science and Engineering at the Indian Institute of Technology Jodhpur. Prior to this role, I had the opportunity to work as a Postdoctoral Researcher under the mentorship of Dr. Partha Pratim Talukdar at the Indian Institute of Science, focusing on Knowledge-aware Computer Vision for nearly two years. For my doctoral studies, I conducted research on the interpretation of text within scene images at IIIT Hyderabad, where I had the privilege of being supervised by Prof. C. V. Jawahar and Dr. Karteek Alahari.

My current research interest lies at the intersection of vision and language. Specifically, I work on developing AI agents that can understand human language and perceive and comprehend the visual world. The overarching goal of my research group at IIT Jodhpur, the Vision, Language, and Learning Group (VL2G), is to advance these intelligent agents and thereby bridge the gap between humans and machines. To know more about the recent research focus and activities of VL2G, please visit the group's website.

Recent/upcoming professional activities:

Area Chair: CVPR'26.

Program Co-Chair: ICVGIP'25.

Reviewer and/or PC member for: CVPR'20/22/23/24/25, ECCV'20/22/24, ICCV'19/21/23, ACL Rolling Review, ICLR'23, WACV'23, AAAI'20/21/22, IJCAI'19/20, ICDAR'19, IEEE TPAMI, IJCV, IEEE TKDD, CVIU, IJDAR, Pattern Recognition.

Co-organizer: ScalDoc 2023, NCVPRIPG'23, WDAR 2021/WDAR 2023, KBMM 2019/KBMM 2020.

Workshop Co-Chair: ICFHR'22.

News/Activities

[November 2025] Our debut work on healthcare AI -- PatientVLM meets DocVLM and a work on Few-shot Video Object Detection are accepted at AAAI 2026 Main Track.
[November 2025] Our works MATR (ICCV 2025) and MPA (EMNLP 2025) are selected for showcase at Vision India Session.
[August 2025] Our work Model Parity Alignment towards empowering small VLMs is accepted in EMNLP 2025 (Main Track).
[July 2025] Our work on Moment Alignment Transformer (MATR) is accepted in ICCV 2025.
[July 2025] Speaking at CVIT Summer School at IIIT Hyderabad.
[April 2025] Received Google's Gemma Academic Program Award to support our research with cloud resources.
[March 2025] I am a program co-chair for ICVGIP 2025.
[December 2024] Our work PatentLMM is accepted at AAAI 2025.
[December 2024] My upcoming course, CSL2050: Pattern Recognition and Machine Learning, is supported by Google Cloud Teaching Credits.
[October 2024] Shreya and Nakul from VL2G received (in absentia) the Director's Prize for graduating student with Best Academic Innovation Work among students of all B. Tech. Programs of the class of 2024 at IITJ convocation. See the announcement. View Institute Social Media Post.
[September 2024] Our latest work on scene text-to-scene text translation is now available on our project website. We have also made the code and data publicly available for those interested in exploring or building upon this work.
[September 2024] Large Multimodal Model-Extension to our work Text-KVQA (Singh et al., ICCV 2019) is accepted at Main Track of EMNLP 2024.
[July 2024] Speaking at ACM-India ROCS 2024.
[April 2024] Work on sketch-guided image inpainting by undergrad student Nakul Sharma is accepted at CVPR 2024 Workshop.
[December 2023] Our works - QDETRv (query-guided DETR for Video) and CSTBIR (composite sketch+text-based image retrieval) are accepted at AAAI 2024.
[October 2023] Two of our works, one in the domain of AI for Education and one in query-guided attention in vision transformers, are accepted at WACV'24.
[October 2023] Speaking at Distinguished Researcher Speaker Series (DRSS), Accenture Labs (Virtually) on our video works.
[September 2023] Speaking at NISER, Bhubaneswar (virtually) on our sketch-guided visual understanding works.
[July 2023] Received the Microsoft Academic Partnership Grant (MAPG) 2023 (see the announcement).
[May 2023] Recognized as one of the Outstanding Reviewers at CVPR 2023. The complete list is here.
[April 2023] Our works retVQA and Floco-T5 are accepted in IJCAI 2023 (Main Track) and ICDAR 2023, respectively.
[March 2023] Our work on Few-shot Referring Relationships in Videos is accepted in CVPR 2023.
[January 2023] I am in the organizing team of NCVPRIPG'23. Please consider participating.
[January 2023] Speaking at ACM-India ARCS'23.
[October 2022] Thanks to Accenture Labs for a Gift Grant.
[October 2022] Our works COFAR, VisTOT and Scene Graph Grounding are accepted at AACL-IJCNLP 2022, EMNLP 2022 and WACV 2023, respectively.
[March 2022] Speaking at Search Technology Centre India (STCI), Microsoft.
[March 2022] Received IIT J-Research Initiation Seed Grant.
[March 2022] Speaking at my undergraduate alma mater on Fundamentals of Neural Networks, Guru Ghasidas Central University Bilaspur.
[February 2022] Speaking at ICMI Pre-Conference Workshop 2022, IIIT Bangalore (Virtual).
[February 2022] Speaking at WADLA 2022, IIIT Sri City (Virtual).
[January 2022] Received the SERB-Startup Research Grant.
[December 2021] Two of the graduating members of my research group, Vaibhav Mishra and Mayank Maheshwari, received the Director's Prize for the best academic innovation work among all the graduating students of the 2021 batch at the IIT Jodhpur convocation. (See the award receiving moment).
[November 2021] My PhD student Abhirama P. got selected as a Prime Minister's Research Fellow.
[September 2021] Speaking in a panel at DocVQA workshop under ICDAR 2021.
[September 2021] Recognized as one of the outstanding reviewers at ICCV 2021. See the list.
[July 2021] Our work on Few-shot Visual Relationship Co-Localisation with Revant Teotia, Vaibhav Mishra and Mayank Maheshwari got accepted in ICCV 2021. The paper and code are available now.
[June 2021] Got selected for Microsoft Academic Partnership Grant (MAPG) 2021.
[April 2021] A paper on TextVQG got accepted in ICDAR 2021. Paper is available here.
[February 2021] Presenting a poster on 'Multimodal Machine Learning for Enhanced Image Understanding' under the 'Machine Learning and Big Data Analytics Track', at the 11th Indo German Frontier of Engineering Conference (INDOGFE 2021).
[February 2021] Speaking at WADLA 2021 (Virtual event hosted by IIIT Sri City) [Slides].
[December 2020] Our ICCV 2019 paper on textKVQA is accepted for a presentation at Vision India.
[August 2020] Received IIT-Jodhpur Teaching Excellence Award 2020 (see announcement).
[July 2020] One paper got accepted at ECCV 2020 as spotlight (top-5% paper). This was joint work with Aditay, Rajath and Anirban.
[June 2020] Speaking as an invited speaker at the Deep Learning for Computer Vision Session at SPCOM 2020, to be organized virtually at IISc Bangalore.
[June 2020] Got a research grant from the Accenture labs.
[April 2020] Co-organizing 2nd workshop on KBMM co-located with AKBC 2020.
[December 2019] Gave a talk on our recent works on knowledge-aware Computer Vision to a small group of developers/researchers from Siemens at IISc Bangalore.
[July 2019] Our paper on the "Knowledge-enabled" VQA model that can read got accepted in ICCV 2019 for an oral presentation.
[July 2019] Joined IIT-J.
[June 2019] Gave a talk on "Reading Text in Scene Images, Bridging it to World Knowledge, and Beyond" at Department of CSE, IIT Guwahati.
[May 2019] Our OCR-VQA paper got accepted in ICDAR 2019.
[April 2019] Gave a talk on "Knowledge-aware Visual Question Answering" at IIIT Bangalore.
[April 2019]: We (with Sameer Singh and Pouya Pezeshkpour from University of California, Irvine and Partha Talukdar from IISc) are organizing a workshop: "Knowledge Bases and Multiple Modalities (KBMM)" at Automated Knowledge Base Construction (AKBC) 2019. Please consider submitting an extended abstract related to Knowledge Bases and Multiple Modalities work.
[March 2019]: I will be serving as a program committee member for ICDAR 2019.
[February 2019]: I will be serving as a reviewer for ICCV 2019.
[February 2019]: Offered a lecture on Graph Representation Learning in Deep Learning for Computer Vision course at IISc Bangalore.
[January 2019]: Obtained research grant from Siemens. Will be working as co-PI with Dr. Anirban and Dr. Partha on this project.
[January 2019]: I will be serving as a program committee member for IJCAI 2019.
[December 2018]: I am attending AAAI 2019. My travel is supported by a gift of USD 3000 by Google to the Indian Institute of Science. Thank you Google.
[November 2018]: I am serving as a program committee member for Workshop on Document Analysis and Recognition (DAR) 2018. (to be held as part of ICVGIP 2018)
[November 2018]: Our work on "Knowledge-Aware Visual Question Answering" got accepted in AAAI 2019 (acceptance rate = 16 %)
[October 2018]: My solo-author paper on deep heterogeneous metric learning got accepted for publication at International Journal of Multimedia Information Retrieval (IJMIR). This paper addresses the problem of matching cartoon and real faces.
[September 2018]: Our paper on deep metric learning got accepted at ACCV 2018. Pre-print will be available soon!
[September 2018]: I will be attending Amazon Research Day on September 28.
[July 2018]: I will be co-teaching an undergraduate introductory course on Algorithms and Programming at IISc Bangalore. [Link]
[August 2017]: Joined IISc Bangalore as a PostDoc.

Selected Publications

Journals

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding,
Anik De, Abhirama Subramanyam Penamakuri, Rajeev Yadav, Aditya Rathore, Harshiv Shah, Devesh Sharma, Sagar Agarwal, Pravin Kumar, Anand Mishra
[Preprint][Bharat Scene Text Dataset]
Under Peer Review.

Moment Alignment Transformer for Video-to-Video Moment Retrieval,
Yogesh Kumar, Uday Agarwal, Manish Gupta, Anand Mishra
[Preprint]
Under Peer Review.

Bridging language to visuals: towards natural language query-to-chart image retrieval,
Neelu Verma, Anik De, Anand Mishra
Volume: 13 (3), 32 , International Journal of Multimedia Information Retrieval, 2024.
[ Link ]

Multimodal Query-guided Object Localization,
Aditay Tripathi, Rajath R. Dani, Anand Mishra, Anirban Chakraborty
Volume: 83 (5), Pages: 14857-14881, Multimedia Tools and Applications, 2024.
[ Link ]

DHFML: deep heterogeneous feature metric learning for matching photograph and cartoon pairs
Anand Mishra
pages: 1-8, International Journal of Multimedia Information Retrieval, 2018.
[Link][bibtex]

Unsupervised refinement of color and stroke features for text binarization
Anand Mishra, Karteek Alahari and C. V. Jawahar
Volume 20:105–121, International Journal on Document Analysis and Recognition 2017
[PDF][bibtex]

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues
Anand Mishra, Karteek Alahari and C. V. Jawahar
Volume 145: 30-42, Computer Vision and Image Understanding 2016
[PDF][bibtex]

Conference Papers

PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis
K Lokesh, Uday Agarwal, Abhirama Subramanyam Penamakuri, Apoorva Challa, Shreya K Gowda, Somesh Gupta, Anand Mishra
AAAI 2026.(NEW)

[Paper] [Code] COMING SOON

Temporal Object-Aware Vision Transformer for Few-Shot Video Object Detection
Yogesh Kumar, Anand Mishra
AAAI 2026.(NEW)

[Paper] COMING SOON

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
Abhirama Subramanyam Penamakuri*, Navlika Singh*, Piyush Arora*, Anand Mishra (*: Equal contribution)
EMNLP 2025.(NEW)

[Paper] [Code]

Aligning Moments in Time using Video Queries
Yogesh Kumar*, Uday Agarwal*, Manish Gupta, Anand Mishra (*: Equal contribution)
ICCV 2025.(NEW)

[Paper] [Code]

AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
Suyash Maniyar*, Vishvesh Trivedi*, Ajoy Mondal, Anand Mishra, C.V. Jawahar (*: Equal contribution)
ICDAR 2025.(NEW)

[Paper] [Project Page] [Code]

PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures,
Shreya Shukla*, Nakul Sharma*, Manish Gupta, Anand Mishra (*: Equal contribution).
AAAI 2025.(NEW)

[Paper] [Project Page] [Code] [Short Talk]

Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant,
Abhirama Subramanyam Penamakuri, Anand Mishra.
EMNLP 2024.

[Paper] [Project Page] [Code] [Short Talk]

Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation,
Shreyas Vaidya*, Arvind Kumar Sharma*, Prajwal Gatti, Anand Mishra. (*: equal contribution)
ICPR 2024.

[Paper] [Project Page] [Code] [Poster] [Short Talk]

QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos,
Yogesh Kumar, Saswat Mallick, Anand Mishra, Sowmya Rasipuram, Anutosh Maitra, Roshni Ramnani
AAAI 2024.

[Paper]

Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions,
Prajwal Gatti, Kshitij Parikh, Dhriti Paul, Manish Gupta, Anand Mishra.
AAAI 2024.

[Paper] [Project Page] [CSTBIR Dataset] [Code]

Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch,
Aditay Tripathi, Anand Mishra, Anirban Chakraborty
WACV 2024.(NEW)
[Paper][Project Page][Code]

Semantic Labels-Aware Transformer Model for Searching over a Large Collection of Lecture-Slides,
K.V. Jobin, Anand Mishra, C. V. Jawahar
WACV 2024 (Oral). (NEW: Best Paper Award Finalist)
[Paper][Project Page] [LecSD Dataset][Short Talk]

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering,
Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra
IJCAI 2023.
[Paper][Project Page][Code]

Towards Making Flowchart Images Machine Interpretable,
Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra
ICDAR 2023.
[Paper][Project Page][Code]

Few-Shot Referring Relationships in Videos,
Yogesh Kumar, Anand Mishra
CVPR 2023.
[Paper][Project Page][Code]

Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
Aditay Tripathi, Anand Mishra, Anirban Chakraborty,
WACV 2023.
[Paper][Project Page][Code]

VISTOT: Vision-Augmented Table-to-Text Generation,
Prajwal Gatti, Anand Mishra, Manish Gupta, Mithun Das Gupta,
EMNLP 2022.
[Paper][Project Page][Code]

COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani
AACL-IJCNLP 2022.
[Paper][Project Page][Code]

Few-shot Visual Relationship Co-localization
Revant Teotia*, Vaibhav Mishra*, Mayank Maheshwari*, Anand Mishra,
ICCV 2021.
[Paper][Project Page][Code] (*: equal contribution)

Look, Read and Ask: Learning to Ask Questions by Reading Text in Images ,
Soumya Jahagirdar, Shankar Gangisetty, Anand Mishra,
ICDAR 2021 (Oral).
[Paper]

Sketch-Guided Object Localization in Natural Images,
Aditay Tripathi, Rajath R. Dani, Anand Mishra, Anirban Chakraborty
ECCV 2020 (Spotlight Presentation).
[Paper] [bibtex] [Project page][Code] [Know the paper in 90 seconds] [Know the paper in ten minutes]

From Strings to Things: Knowledge-enabled VQA model that can read and reason,
Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, and Anirban Chakraborty
ICCV 2019 (oral).
[Paper] [bibtex] [Project page]

OCR-VQA: Visual Question Answering by Reading Text in Images
Anand Mishra, Shashank Shekhar, Ajeet Kumar Singh, and Anirban Chakraborty
ICDAR 2019.
[Paper] [bibtex] [Project page]

KVQA: Knowledge-aware Visual Question Answering
Sanket Shah*, Anand Mishra*, Naganand Yadati and Partha Pratim Talukdar
(*: equal contribution) AAAI 2019. (acceptance rate: 16.1%)
[Paper] [bibtex] [Project page]

Deep Embedding using Bayesian Risk Minimization with Application to Sketch Recognition
Anand Mishra and Ajeet Kumar Singh
ACCV, 2018. (acceptance rate: 28%)
[Paper (arXiv)] [bibtex]

IIIT-CFW: A Benchmark database of Cartoon Faces in the Wild
Ashutosh Mishra, Shyam N. Roy, Anand Mishra, and C. V. Jawahar
ECCVW, 2016. (Oral)
[PDF] [bibtex][ IIIT-CFW dataset]

A Simple and Effective method for Script Identification in the Wild
Ajeet Kumar Singh, Anand Mishra, Pranav Dabaral and C. V. Jawahar
DAS, 2016.
[Paper] [bibtex]

Scene Text Recognition and Retrieval for Large Lexicons
Udit Roy, Anand Mishra, Karteek Alahari and C. V. Jawahar
ACCV 2014.
[Paper] [bibtex]

Image Retrieval using Textual Cues
Anand Mishra, Karteek Alahari and C. V. Jawahar
ICCV, 2013.
[Paper] [bibtex]

Whole is Greater than Sum of Parts: Recognizing Scene Text Words
Vibhor Goel, Anand Mishra, Karteek Alahari and C. V. Jawahar
ICDAR, 2013.
[Paper] [bibtex]

Scene Text Recognition using Higher Order Language Priors
Anand Mishra, Karteek Alahari and C. V. Jawahar
BMVC 2012. (Oral)
[Paper] [bibtex] [ IIIT-5K Word dataset]

Top-down and Bottom-up cues for Scene Text Recognition
Anand Mishra, Karteek Alahari and C. V. Jawahar
CVPR 2012.
[Paper] [bibtex]

An MRF model for Binarization of Natural Scene Text
Anand Mishra, Karteek Alahari and C. V. Jawahar
ICDAR 2011. (Oral)
[Paper] [bibtex]

Teaching

At IIT Jodhpur

CSL7130: Mathematical Foundations for Computer Science (Monsoon’25)

CSL2050: Pattern Recognition and Machine Learning (Spring’24/25)

CSL7670: Fundamentals of Machine Learning (Monsoon’23/24)

CSL7360: Computer Vision (Spring’23)

CSL2040: Maths for Computing (Monsoon’22/21/AY 20-21–Tri-3)

CSL7410: Graph Theory and Application (AY 20-21–Tri-1, Spring’22)

CS222: Theory of Computation (Spring’20)

CS212: Object-oriented Design and Analysis (Monsoon’19)

At IISc Bangalore

UE101: Algorithms and Programming (Monsoon'18)

- Co-taught at Indian Institute of Science with Dr. Sathish Govindrajan and Dr. Viraj Kumar.

At IIIT Hyderabad (during PhD)

Computer Problem Solving (Monsoon'16)

-- Introductory course for M.Tech. Bioinformatics

At IIIT Sri City (during PhD)

Computer Architecture (Spring'15)

-- Co-taught at IIIT Sri City as a visiting instructor with Dr. Suresh Purini and Prof. Govindrajulu

Operating Systems (Monsoon'14)

-- Co-taught at IIIT Sri City as a visiting instructor with Dr. Suresh Purini

Open Positions

We do not offer any short-term (summer/winter) internship positions, except occasional special calls for specific project needs. I apologize for not being able to respond to individual emails regarding these positions.

For Non-IITJ students: below are the current open positions:

PhD/MTech-PhD Position: We have open positions for Full-time PhD and MTech-PhD currently. If you are interested in pursuing PhD or MTech-PhD, please consider applying through the institute's official route: Link.
Research Assistant/Research Engineer/Pre-Doc Position: We hire highly motivated engineering graduates (BTech/BE, preferably in CS/EE/AI), including those graduating this semester, for "Full-Time (in-person)" Research Engineer or Research Assistant positions. This is a rolling call. Exceptional academic credentials, sound machine learning and deep learning knowledge, good programming skills, and passion for doing world-class research (and development) are essential for these positions. If you are eligible and interested, please consider applying HERE. Next cut-off date: December 15, 2024. (NEW)
