Sumit Bhatia

Hello, World!

Sumit Bhatia, PhD

Senior Machine Learning Research Scientist

Media & Data Science Research Lab, Adobe Inc.

Adjunct Faculty, IIIT-Delhi

Information Retrieval · Agentic Systems · RAG and Conversational Systems · Large Language Models · Knowledge Graphs

About

I am a Senior Machine Learning Research Scientist at the Media and Data Science Research Lab at Adobe Inc. My primary research interests are in information retrieval, natural language processing, semantic web, and knowledge graphs. My recent work also spans large language models, instruction tuning, and multimodal retrieval.

I am always on the lookout for students interested in working on research problems. Have a look at my publications page and feel free to get in touch if our interests align.

Previously, I was a Research Staff Member at IBM's India Research Laboratory in New Delhi. Before that, I was part of the Watson group at IBM Almaden Research Centre, leading analytics efforts in the Watson Knowledge Graph team. I did my post-doctoral research at Xerox Research Centre Webster in upstate New York. I obtained my PhD in Computer Science and Engineering from The Pennsylvania State University, advised by Dr. Prasenjit Mitra. I completed my undergraduate studies at IIT Roorkee.

Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.

2026
  • C55 Conference
    Building, Serving, and Growing a Conversational AI Assistant for Enterprise. The 2026 ACM SIGMOD/PODS Conference (SIGMOD Industry Track), 2026. Sumit Bhatia, Vidit Bhatia, Uttaran Bhattacharya, Victor Soares Bursztyn, Liqun Chen, Xiang Chen, Sally Fang, Horia Galatanu, Manas Garg, Shaddy Garg, Rachel Hanessian, Alex Hodorogea, Shun Jiang, Bhumir Jhaveri, Nishant Kapoor, Yongsung Kim, Eunyee Koh, Namita Krishnan, Yunyao Li, Zifan Liu, Akash Maharaj, Tung Mai, Saayan Mitra, Vaishnavi Muppala, Ramasuri Narayanam, Soumyabrata Pal, Kun Qian, Sajjadur Rahman, Ken Russell, Bikas Saha, Siddhartha Sahai, Som Satapathy, Rohan Saxena, Sai Sree Harsha, Anirudh Sureshan, Shashank Tandon, Mehrab Tanjim, Saurabh Tripathy, Anirudh Verma, Fei Wu, Tong Yu
  • C54 Conference
    Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026. Prachi J, Sumit Bhatia, and Srikanta Bedathur.

Work Experience

Senior Machine Learning Research Scientist
May 2021 — Present

Media and Data Science Research Lab.

Senior Research Scientist
Feb. 2017 — Mar. 2021

Member of the Knowledge and Data Engineering team.

Watson Research Scientist
July 2014 — Feb. 2017

Member of the Watson Discovery Analytics Service. Developed query understanding and disambiguation APIs for the Watson analytics service.

Post-doctoral Researcher
July 2013 — July 2014

Developed a social media analytics platform for consumer demographic prediction (age, gender, marital and parental status, personality type, technical expertise, etc.).

Feb. 2013 — June 2013

Developed models for detecting customer complaints via analysis of user tweets. Also developed an algorithm for extracting relevant portions of text from long social media documents.

Research Intern
May 2012 — Aug. 2012

Designed and developed "IM an Expert" — a real-time social question answering service allowing users to find answers by connecting with other knowledgeable users on Facebook.

Research Intern
Sep. 2011 — Dec. 2011

Analyzed Yandex's query logs to study user query intents and proposed a hierarchy of query diversification requirements.

Research Intern
May 2011 — Sep. 2011

Developed a scalable algorithm for verbose patent query retrieval. Achieved 700% faster response times while maintaining the quality of search results.

Research Intern
May 2010 — Aug. 2010

Developed a query-log-oblivious probabilistic query suggestion mechanism that generates suggestions directly from the corpus.

Tutorials

Knowledge Graphs: In Theory and Practice

We are transitioning from the era of Big Data to the era of Big Knowledge, and semantic knowledge bases such as knowledge graphs play an important role in this transition. This is evident from the increased investment in knowledge graph research and development by major industrial players, which has resulted in widely used systems such as IBM's Watson, Google's entity search, Apple's Siri, and Amazon's product graph.

Knowledge graphs can be constructed either manually (facts authored by humans) or automatically (facts extracted from text using machine learning tools). In this tutorial, we cover state-of-the-art approaches to knowledge graph construction from various types of data using both manual and automated methods, review applications that benefit from the structure and semantics offered by knowledge graphs, and present case studies describing experiences in constructing enterprise knowledge graphs.


Contact

Email: Sumit.Bhatia [at] adobe.com

I am always open to discussing research collaborations, student projects, and speaking opportunities. Feel free to reach out!