Curriculum Vitae
Aishwarya Agrawal
aishwarya.agrawal@mila.quebec • https://www.iro.umontreal.ca/~agrawal/
Honors and Awards
• Young Alumni Excellence Award 2023 for Outstanding Academic Achievements 2023
(Awarded by IIT Gandhinagar to recognize alumni who have made outstanding contributions to their
respective fields.)
• Runner-Up, 2019 AAAI / ACM SIGAI Dissertation Award 2020
(One of 2 runners-up from nominations from universities across the globe)
• Georgia Tech 2020 College of Computing Dissertation Award 2020
(One of 4 awardees from nominations from across College of Computing at Georgia Tech)
• Georgia Tech 2020 Sigma Xi Best Ph.D. Thesis Award 2020
(One of 10 awardees from nominations from across all schools at Georgia Tech)
• Canada CIFAR AI Chair Award 2019
(One of 34 awardees from nominations from across Canada)
• Google Fellowship in Machine Perception, Speech Technology and Computer Vision (declined) 2019
(One of 54 awardees selected from PhD applicants worldwide)
• Facebook Fellowship in Computer Vision (declined) 2019
(One of 21 awardees selected from more than 900 PhD applicants worldwide)
• Foley Scholar Finalist 2018
(One of 8 finalists of the 2018 Foley Scholars Awards – an annual fellowship awarded to outstanding
students involved in exceptional research at the GVU center at Georgia Tech)
• Rising Stars in EECS 2018
(An Academic Career Workshop for Women, hosted at MIT)
(One of 60+ selected candidates with funded accommodation and travel)
• NVIDIA Graduate Fellowship 2018
(One of 11 awardees from more than 120 applicants from a host of countries)
• Invitation to Women in Research Lean In event 2018
(An annual event bringing together outstanding computer science and engineering Ph.D. students and
post-docs, organized by Facebook)
• Outstanding Reviewer award, NIPS 2017
(Awarded to ∼3.6% of reviewers)
• Travel award, Women in Machine Learning Workshop 2017
• Finalist, Microsoft Research PhD Fellowship 2017
• Finalist, Adobe Research PhD Fellowship 2017
• Outstanding Reviewer award, CVPR 2017
(Awarded to ∼8.5% of reviewers)
• Travel award, Women in Computer Vision Workshop, CVPR 2017
• Travel award, Women in Machine Learning Workshop 2016
• Best Discussion Participant Award, Advanced Computer Vision Course, Virginia Tech 2016
• Best Poster award, Object Understanding for Interaction Workshop, ICCV 2015
(Awarded for VQA: Visual Question Answering)
• Travel award, Women in Computer Vision Workshop, CVPR 2015
• Institute Silver Medal, IIT Gandhinagar 2014
(Awarded for second highest cumulative performance index in the batch (∼45 students))
• Scholarship for academic excellence, IIT Gandhinagar 2011, 2012
(Awarded to branch topper – 1 out of ∼45 students)
• Dean’s List for academic excellence, IIT Gandhinagar 2011, 2012, 2013, 2014
• Among top 0.76% of students in the IIT-JEE examination 2010
(Out of ∼0.4 million students who appeared for the examination)
Teaching
• IFT 6765 (UdeM) - Links between Computer Vision and Language Winter 2022, 2023, 2024
• IFT 6135 (UdeM) - Representation Learning Fall 2021, 2022, 2023, 2024
Research Group
PhD students
• Oscar Manas Fall 2021 - Present
• Le Zhang Fall 2023 - Present
• Sarvjeet Singh Ghotra Fall 2023 - Summer 2024 (change of supervisor)
(co-supervised with Aaron Courville)
• Qian Yang Fall 2023 - Present
• Kanishk Jain Fall 2023 - Present
• Rabiul Awal Fall 2024 - Present
MSc students
• Saba Ahmadi Winter 2022 - Dec 2023 (graduated; started internship with me)
• Le Zhang Fall 2022 - Summer 2023 (fast-tracked to PhD program)
• Sarvjeet Singh Ghotra Fall 2022 - Summer 2023 (fast-tracked to PhD program)
• Shravan Nayak Fall 2023 - Present
• Ankur Sikarwar Fall 2024 - Present
Research interns
• Oscar Manas Summer 2021 (started PhD with me)
• Rabiul Awal Winter 2023 - Summer 2024 (started PhD with me)
• Saba Ahmadi Winter 2024 - Present
Publications
• O. Manas, P. Astolfi, M. Hall, C. Ross, J. Urbanek, A. Williams, A. Agrawal, A. Romero-Soriano, M.
Drozdzal. Improving Text-to-Image Consistency via Automatic Prompt Optimization. Accepted to the
Transactions on Machine Learning Research (TMLR), 2024. Featured Certification.
• R. Awal, L. Zhang, A. Agrawal. Investigating Prompting Techniques for Zero- and Few-Shot Visual
Question Answering. Accepted to the Workshop on Multimodal Algorithmic Reasoning, NeurIPS, 2024.
• R. Awal, S. Ahmadi, L. Zhang, A. Agrawal. VisMin: Visual Minimal-Change Understanding.
Accepted to the Conference on Neural Information Processing Systems (NeurIPS), 2024.
• S. Nayak, K. Jain, R. Awal, S. Reddy, S. van Steenkiste, L. Anne Hendricks, K. Stanczak, A. Agrawal.
Benchmarking Vision Language Models for Cultural Understanding. Accepted to the Conference on
Empirical Methods in Natural Language Processing (EMNLP), 2024. Oral Presentation.
• Q. Yang, W. Yan, A. Agrawal. Decompose and Compare Consistency: Measuring VLMs’ Answer
Reliability via Task-Decomposition Consistency Comparison. Accepted to the Conference on Empirical
Methods in Natural Language Processing (EMNLP), 2024.
• F. Bordes et al. An Introduction to Vision-Language Modeling. arXiv preprint, arXiv:2405.17247, 2024.
• L. Zhang, R. Awal, A. Agrawal. Contrasting intra-modal and ranking cross-modal hard negatives to
enhance visio-linguistic compositional understanding. In the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2024.
• S. Ahmadi, A. Agrawal. An Examination of the Robustness of Reference-Free Image Captioning
Evaluation Metrics. In the Findings of the Association for Computational Linguistics: EACL, 2024.
• O. Manas, B. Krojer, A. Agrawal. Improving Automatic VQA Evaluation Using Large Language
Models. In the 38th Annual AAAI Conference on Artificial Intelligence, 2024.
• L. Zhang, Y. Wu, F. Mo, J. Nie, A. Agrawal. MoqaGPT: Zero-Shot Multi-modal Open-domain
Question Answering with Large Language Model. In the Findings of the Association for Computational
Linguistics (EMNLP), 2023.
• E. Bugliarello, L. Sartran, A. Agrawal, L. Anne Hendricks, A. Nematzadeh. Measuring Progress in
Fine-grained Vision-and-Language Understanding. In the Association for Computational Linguistics
(ACL), 2023.
• O. Manas, P. Rodríguez*, S. Ahmadi*, A. Nematzadeh, Y. Goyal, A. Agrawal. MAPL:
Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot
Prompting. In the European Chapter of the Association for Computational Linguistics (EACL), 2023.
Oral Presentation.
• A. Agrawal*, I. Kajic*, E. Bugliarello*, E. Davoodi, A. Gergely, P. Blunsom, A. Nematzadeh*.
Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution
Generalization. In the Findings of the European Chapter of the Association for Computational
Linguistics (EACL), 2023.
• A. Agrawal. Visual Question Answering and Beyond. PhD Dissertation, 2019.
• A. Agrawal, M. Malinowski, F. Hill, A. Eslami, O. Vinyals and T. Kulkarni. Generating Diverse
Programs with Instruction Conditioned Reinforced Adversarial Learning. arXiv preprint,
arXiv:1812.00898, 2018.
• S. Ramakrishnan, A. Agrawal and S. Lee. Overcoming Language Priors in Visual Question Answering
with Adversarial Regularization. In Neural Information Processing Systems (NIPS), 2018.
• A. Agrawal, D. Batra, D. Parikh and A. Kembhavi. Don’t Just Assume; Look and Answer:
Overcoming Priors for Visual Question Answering. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2018.
• G. Christie*, A. Laddha*, A. Agrawal, S. Antol, Y. Goyal, K. Kochersberger and D. Batra. Resolving
Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution
in Captioned Scenes. In the journal of Computer Vision and Image Understanding (CVIU), 2017.
• A. Agrawal, A. Kembhavi, D. Batra and D. Parikh. C-VQA: A Compositional Split of the Visual
Question Answering (VQA) v1.0 Dataset. arXiv preprint, arXiv:1704.08243, 2017.
• A. Agrawal*, J. Lu*, S. Antol*, M. Mitchell, C. L. Zitnick, D. Parikh and D. Batra. VQA: Visual
Question Answering. In Special Issue on Combined Image and Language Understanding, International
Journal of Computer Vision (IJCV), 2017.
• A. Agrawal, D. Batra and D. Parikh. Analyzing the Behavior of Visual Question Answering Models.
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
• C. L. Zitnick, A. Agrawal, S. Antol, M. Mitchell, D. Batra and D. Parikh. Measuring Machine
Intelligence Through Visual Question Answering. In AI Magazine, 2016.
• G. Christie*, A. Laddha*, A. Agrawal, S. Antol, Y. Goyal, K. Kochersberger and D. Batra. Resolving
Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution
in Captioned Scenes. In Conference on Empirical Methods in Natural Language Processing (EMNLP),
2016.
• T. Huang, F. Ferraro, N. Mostafazadeh, I. Misra, A. Agrawal, J. Devlin, R. Girshick, X. He, P. Kohli,
D. Batra, C.L. Zitnick, D. Parikh, L. Vanderwende, M. Galley and M. Mitchell. Visual Storytelling. In
North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
• S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick and D. Parikh. VQA: Visual
Question Answering. In International Conference on Computer Vision (ICCV), 2015.
• A. Agrawal and S. Raman. A Novel LBP Based Operator for Tone Mapping HDR Images. In
International Conference on Signal Processing and Communications (SPCOM-2014) and IEEE Xplore,
2014.
• R. Das, A. Agrawal, M. Upton and E.J. Seibel. Optically clearing tissue as an initial step for 3D
imaging of core biopsies to diagnose pancreatic cancer. In SPIE BiOS, pp. 89410N-89410N.
International Society for Optics and Photonics, 2014.
Invited Talks
Visual-Language Learning
• Tutorial on “Visual Recognition Beyond the Comfort Zone: Adapting to Unseen Concepts on the Fly”
at ICCV [Video] 2023
Career Panel
• CIFAR Deep Learning + Reinforcement Learning Summer School 2023
Roundtable lead: Robustness
• Montreal AI Symposium (MAIS) 2022
Are current Vision-language models learning to solve the task or merely learning to solve the dataset?
• Tea Talks @ Mila [Video] 2022
Vision and Language: Progress and Challenges
• Research Week with Google (India) 2022
• Microsoft Research Montreal 2021
• Montreal AI Symposium 2021
• Mila Techaide AI Conference 2021
Panel: Paths to Leadership in Data Science
• Women in Data Science Conference (WiDS) [Video] 2021
Research Experience
Allen Institute for Artificial Intelligence, Seattle, USA January 2017 - May 2017
Collaborators: Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
• Proposed a new setting for VQA and created a new split of the VQA v1.0 dataset – Visual Question
Answering under Changing Priors v1.0 (VQA-CP v1.0); a toy sketch of the changing-priors idea follows this list.
• Evaluated the performance of existing VQA models on VQA-CP v1.0.
• Developed Grounded VQA (GVQA) – a novel model that outperforms existing VQA models on
VQA-CP v1.0.
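For illustration only, here is a minimal Python toy sketch of the changing-priors idea: regroup question-answer pairs so that, for a given question type, the answers that dominate training are not the ones seen at test time. The example dictionary keys and the greedy alternation are hypothetical simplifications, not the released VQA-CP construction.

```python
from collections import defaultdict

# Toy sketch (not the released VQA-CP code): split examples so that, for every
# question type, train and test see different answer distributions, making
# language priors learned on train unhelpful at test time.
# Each example is assumed to be a dict with hypothetical keys:
# 'question_type', 'answer', 'question', 'image_id'.

def changing_priors_split(examples):
    groups = defaultdict(list)  # (question_type, answer) -> list of examples
    for ex in examples:
        groups[(ex['question_type'], ex['answer'])].append(ex)

    train, test = [], []
    # Alternate (question_type, answer) groups between the two splits so that
    # the per-question-type answer priors differ between train and test.
    for i, (_, group) in enumerate(sorted(groups.items())):
        (train if i % 2 == 0 else test).extend(group)
    return train, test

# Usage with toy data:
toy = [
    {'question_type': 'what color', 'answer': 'red',  'question': 'What color is the bus?', 'image_id': 1},
    {'question_type': 'what color', 'answer': 'blue', 'question': 'What color is the sky?', 'image_id': 2},
    {'question_type': 'is there',   'answer': 'yes',  'question': 'Is there a dog?',        'image_id': 3},
    {'question_type': 'is there',   'answer': 'no',   'question': 'Is there a cat?',        'image_id': 4},
]
train_split, test_split = changing_priors_split(toy)
```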
Microsoft Research, Redmond, USA May 2015 - August 2015
Collaborators: Xiaodong He, Margaret Mitchell, Dhruv Batra, Devi Parikh, Larry Zitnick
• Played an active role in releasing the VQA dataset to the public. Developed and released the VQA API
and evaluation code (https://github.com/VT-vision-lab/VQA); a simplified sketch of the accuracy metric follows this list.
• Implemented Deep Structured Semantic Model (DSSM) based initial approaches to solve VQA.
• Trained and tested DSSM models for image sequences and stories, for an ongoing project (Visual
Storytelling) at MSR.
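A simplified sketch of the consensus-based accuracy metric behind the released evaluation code: a predicted answer counts as fully correct if at least three of the ten human annotators gave it. The official implementation additionally normalizes answer strings and averages over annotator subsets; the function below keeps only the core formula.

```python
# Simplified form of the VQA accuracy metric (the official evaluation code at
# https://github.com/VT-vision-lab/VQA also normalizes answers and averages
# over subsets of the ten human annotations).
def vqa_accuracy(predicted_answer, human_answers):
    """human_answers: list of the 10 ground-truth answer strings for one question."""
    matches = sum(1 for a in human_answers if a == predicted_answer)
    # An answer is 100% accurate if at least 3 humans gave it.
    return min(matches / 3.0, 1.0)

# Usage:
print(vqa_accuracy('yes', ['yes'] * 8 + ['no'] * 2))   # 1.0
print(vqa_accuracy('no',  ['yes'] * 8 + ['no'] * 2))   # ~0.67
```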
University of Washington, Seattle, USA May 2013 - July 2013
Collaborators: Ronnie Das, Eric Seibel
• Conducted an investigation comparing the degree of optical clearing achieved in pancreatic tissue with
glycerol and FocusClear against a formalin control.
• Implemented a filtered backprojection algorithm for 3D reconstruction of OPTM images using MATLAB
and VolView (a minimal sketch of the algorithm follows this list).
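A minimal NumPy sketch of parallel-beam filtered backprojection, the algorithm used for the reconstruction. The original work used MATLAB and VolView; the array shapes, nearest-neighbour interpolation, and synthetic sinogram below are illustrative assumptions.

```python
import numpy as np

# Minimal parallel-beam filtered backprojection: ramp-filter each projection in
# the frequency domain, then smear it back across the image along its angle.
def filtered_backprojection(sinogram, angles_deg):
    """sinogram: array of shape (n_detectors, n_angles); angles_deg: projection angles."""
    n_det, n_ang = sinogram.shape

    # Ramp filter applied per projection in the frequency domain.
    ramp = np.abs(np.fft.fftfreq(n_det))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * ramp[:, None], axis=0))

    # Backprojection over a square image grid centred on the rotation axis.
    recon = np.zeros((n_det, n_det))
    mid = n_det // 2
    ys, xs = np.mgrid[:n_det, :n_det] - mid
    for k, ang in enumerate(np.deg2rad(angles_deg)):
        # Detector coordinate of every pixel for this projection angle.
        t = xs * np.cos(ang) + ys * np.sin(ang) + mid
        t = np.clip(np.round(t).astype(int), 0, n_det - 1)  # nearest-neighbour lookup
        recon += filtered[t, k]
    return recon * np.pi / (2 * n_ang)

# Usage: reconstruct a centred point-like object from its synthetic sinogram.
angles = np.linspace(0., 180., 60, endpoint=False)
sino = np.zeros((128, len(angles)))
sino[64, :] = 1.0
image = filtered_backprojection(sino, angles)
```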
Indian Institute of Technology (IIT), Gandhinagar, India May 2012 - July 2012
Collaborator: Ragavan K.
• Programmed an FPGA in Verilog to generate firing pulses for the SCRs of a rectifier for open-loop speed
control of a DC motor.
• Developed Verilog code for automatic regulation of the SCR firing angle for closed-loop speed control of
a DC motor (the firing-angle timing is sketched below).
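A back-of-the-envelope sketch of the timing the firing-pulse generator has to realize: the counter delay between an AC zero crossing and the SCR gate pulse for a desired firing angle. The 50 Hz supply and 50 MHz FPGA clock are assumed values; the actual design was written in Verilog.

```python
# The gate pulse is issued a fraction (firing angle / 360 degrees) of one AC
# period after the zero crossing; on an FPGA this becomes a counter threshold.
def firing_delay_ticks(firing_angle_deg, line_freq_hz=50.0, fpga_clock_hz=50e6):
    period_s = 1.0 / line_freq_hz                      # one full AC cycle
    delay_s = (firing_angle_deg / 360.0) * period_s    # delay after the zero crossing
    return int(round(delay_s * fpga_clock_hz))         # clock ticks to count before firing

# Example: a 90-degree firing angle on a 50 Hz supply with a 50 MHz clock.
print(firing_delay_ticks(90))   # 250000 ticks (5 ms)
```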
Service Roles and Academic Activities
• Area Chair @ NeurIPS 2024, ECCV 2024, CVPR 2024, NeurIPS 2023, ICCV 2023, NAACL 2021
• Co-organizer, Workshop on Green Foundation Models @ ECCV 2024
• Co-organizer, tutorial on Visual Recognition Beyond the Comfort Zone: Adapting to Unseen Concepts
on the Fly @ ICCV 2023
• Co-organizer, CIFAR Deep Learning + Reinforcement Learning Summer School 2023
• Co-organizer, tutorial on Vision-Language Pretraining: Current Trends and the Future @ ACL 2022
• Co-organizer, annual VQA Challenge and Workshop @ CVPR 2016 - 2021
• Reviewer, CVPR 2016, ECCV 2016, IJCV 2017, ICLR 2017, CVPR 2017, ICCV 2017, NIPS 2017, ICLR
2018, CVPR 2018, NeurIPS 2019, ACL 2020, CVPR 2020, ECCV 2020, ACL 2023
• Graduate Teaching Assistant (Virginia Tech) Fall 2014, Spring 2015
Course: Introduction to Data Structures and Algorithms
• Teaching Assistant (IIT Gandhinagar) Fall 2013
Course: Computing (Python and C)
Press Coverage
Visual Storytelling
• New Artificial Intelligence Can Tell Stories Based on Photos. Live Science.
• Will Artificial Intelligence Win the Caption Contest? MIT Technology Review.
• Teaching computers to describe images as people would. Microsoft Blog.
• Microsoft researchers are teaching AI to write stories about groups of photos. Venture Beat.
Visual Question Answering (VQA)
• A Giant Leap for Machine Kind; When Robots Can See. Interview with WVTF Radio IQ.
• What’s in This Picture? AI Becomes as Smart as a Toddler. Bloomberg.