Hello, World!

Sumit Bhatia, PhD

Senior Machine Learning Research Scientist

Media & Data Science Research Lab, Adobe Inc.

Adjunct Faculty, IIIT-Delhi

Information Retrieval · Agentic Systems · RAG and Conversational Systems · Large Language Models · Knowledge Graphs

About

I lead Information Retrieval and NLP Research at Adobe's Media and Data Science Research Lab, managing a team pushing the boundaries of information retrieval, RAG and conversational AI, agentic systems, and knowledge graphs. My recent work spans large language models, instruction tuning, and multimodal retrieval.

I am always on the lookout for students interested in working on research problems. Have a look at my publications page and feel free to get in touch if our interests align.

Previously, I was a Research Staff Member at IBM's India Research Laboratory in New Delhi. Before that, I was part of the Watson group at IBM Almaden Research Centre, leading analytic efforts in the Watson Knowledge Graph team. I did my post-doctoral research at Xerox Research Centre, Webster in upstate NY. I obtained my PhD in Computer Science and Engineering from The Pennsylvania State University, advised by Dr. Prasenjit Mitra. I completed my undergraduate studies at IIT Roorkee.

Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.

2026

C57 Conference
Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck? Findings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL), 2026. H S V N S Kowndinya Renduchintala and Sumit Bhatia.
C56 Conference
Enhancing Enterprise Assistant Responses with Rich Multimodal Artifacts from Product Documentation. The 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR Industry Track), 2026. Sohan Patnaik, Sai Sree Harsha, Kowndinya Renduchintala, Milan Aggarwal, Sumit Bhatia and Yunyao Li.
C55 Conference
Building, Serving, and Growing a Conversational AI Assistant for Enterprise. The 2026 ACM SIGMOD/PODS Conference (SIGMOD Industry Track), 2026. Sumit Bhatia, Vidit Bhatia, Uttaran Bhattacharya, Victor Soares Bursztyn, Liqun Chen, Xiang Chen, Sally Fang, Horia Galatanu, Manas Garg, Shaddy Garg, Rachel Hanessian, Alex Hodorogea, Shun Jiang, Bhumir Jhaveri, Nishant Kapoor, Yongsung Kim, Eunyee Koh, Namita Krishnan, Yunyao Li, Zifan Liu, Akash Maharaj, Tung Mai, Saayan Mitra, Vaishnavi Muppala, Ramasuri Narayanam, Soumyabrata Pal, Kun Qian, Sajjadur Rahman, Ken Russell, Bikas Saha, Siddhartha Sahai, Som Satapathy, Rohan Saxena, Sai Sree Harsha, Anirudh Sureshan, Shashank Tandon, Mehrab Tanjim, Saurabh Tripathy, Anirudh Verma, Fei Wu, Tong Yu.
C54 Conference
Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026. Prachi J, Sumit Bhatia, and Srikanta Bedathur.

2025

J11 Journal
On the Effect of Instruction Tuning Loss on Generalization. Transactions of the Association for Computational Linguistics (TACL), Vol. 13: 1360–1380, 2025. Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty.
J10 Journal
Benchmarking Neuro-Symbolic Description Logic Reasoners: Existing Challenges and A Way Forward. Neurosymbolic Artificial Intelligence, IOS Press Journal, Vol. 1, 2025. Gunjan Singh, Riccardo Tommasini, Sumit Bhatia, Raghava Mutharaju.
J9 Journal
Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems. Natural Language Processing, Cambridge University Press; 31(3):874–912, 2025. Shivani Kumar, Sumit Bhatia, Milan Aggarwal, Tanmoy Chakraborty.
C53 Conference
Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts. 14th International Joint Conference on NLP & 4th Asia-Pacific Chapter of ACL (IJCNLP-AACL), 2025. Raavi Gupta, Pranav Hari Panicker, Sumit Bhatia, and Ganesh Ramakrishnan.
C52 Conference
Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning. 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025. Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy.
C51 Conference
Answering Multimodal Exclusion Queries with Lightweight Sparse Disentangled Representations. 11th ACM SIGIR International Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), 2025. Prachi J, Sumit Bhatia, and Srikanta Bedathur.
C50 Conference
Exploring the Role of Diversity in Example Selection for In-Context Learning. 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025. Janak Kapuriya, Manit Kaushik, Debasis Ganguly and Sumit Bhatia.
C49 Conference
It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation. Thirteenth International Conference on Learning Representations (ICLR), 2025. Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy.

2024

C48 Conference
POSIX: A Prompt Sensitivity Index For Large Language Models. Findings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty.
C47 Conference
Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models. 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Shaz Furniturewala, Surgan Jandial, Abhinav Java, Pragyan Banerjee, Simra Shahid, Sumit Bhatia, Kokil Jaidka.
C46 Conference
SMART: Submodular Data Mixture Strategy for Instruction Tuning. Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024. H S V N S Kowndinya Renduchintala, Sumit Bhatia, Ganesh Ramakrishnan.
C45 Conference
CABINET: Content Relevance-based Noise Reduction for Table Question Answering. Twelfth International Conference on Learning Representations (ICLR), 2024. ★ Spotlight. Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumit Bhatia, Yaman Kumar, Balaji Krishnamurthy.
C44 Conference
All should be equal in the eyes of LMs: Counterfactually Aware Fair Text Generation. Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024. Pragyan Banerjee, Abhinav Java, Surgan Jandial, Simra Shahid, Shaz Furniturewala, Balaji Krishnamurthy, Sumit Bhatia.
C43 Conference
GenACT: An Ontology based Temporal Web Data Generator. 43rd International Conference on Conceptual Modeling (ER), 2024. Gunjan Singh, Udit Arora, Shashikant Kumar, Riccardo Tommasini, Pieter Bonte, Sumit Bhatia and Raghava Mutharaju.

2023

J8 Journal
Neuro-Symbolic RDF and Description Logic Reasoners: The State-Of-The-Art and Challenges. Compendium of Neurosymbolic Artificial Intelligence, pp 29–63, 2023. Gunjan Singh, Sumit Bhatia, Raghava Mutharaju.
C42 Conference
INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models. Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), 2023. H S V N S Kowndinya Renduchintala, Krishnateja Killamsetty, Sumit Bhatia, Milan Aggarwal, Ganesh Ramakrishnan, Rishabh K Iyer, Balaji Krishnamurthy.
C41 Conference
Explain Like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation. 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023. Michael Llordes, Debasis Ganguly, Sumit Bhatia and Chirag Agarwal.
C40 Conference
HyHTM: Hyperbolic Geometry-based Hierarchical Topic Model. Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023. Simra Shahid, Tanay Anand, Nikitha Srikanth, Sumit Bhatia, Balaji Krishnamurthy and Nikaash Puri.
W11 Workshop
Graph-Guided Unsupervised Knowledge Identification for Dialogue Agents. 3rd Workshop on Document-Grounded Dialog and Conversational QA (Doc2Dial) at ACL, 2023. Shrinivas Khiste, Tathagata Raha, Milan Aggarwal, Sumit Bhatia, and Simra Shahid.

2022

J7 Journal
Information asymmetry in Wikipedia across different languages: A statistical analysis. Journal of the Association for Information Science and Technology (JASIST), 73(3), 347–361, 2022. Dwaipayan Roy, Sumit Bhatia and Prateek Jain.
C39 Conference
CyCLIP: Cyclic Contrastive Language-Image Pretraining. Annual Conference on Neural Information Processing Systems (NeurIPS), 2022. ★ Oral/Spotlight. Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover.
C38 Conference
LM-CORE: Language Models with Contextually Relevant External Knowledge. Findings of the Association for Computational Linguistics (NAACL-HLT), 2022. Jivat Neet Kaur, Sumit Bhatia, Milan Aggarwal, Rachit Bansal, and Balaji Krishnamurthy.
C37 Conference
CoSe-Co: Text Conditioned Generative CommonSense Contextualizer. 2022 Conference of the North American Chapter of ACL: Human Language Technologies (NAACL-HLT), 2022. Rachit Bansal, Milan Aggarwal, Sumit Bhatia, Jivat Neet Kaur, and Balaji Krishnamurthy.
C36 Conference
Why Did You Not Compare With That? Finding Papers for Use as Baseline. 44th European Conference on Information Retrieval (ECIR), 2022. Manjot Bedi, Tanisha Pandey, Sumit Bhatia and Tanmoy Chakraborty.

2021

W10 Workshop
No Need to Know Everything! Efficiently Augmenting Language Models With External Knowledge. Workshop on Commonsense Reasoning and Knowledge Bases (CSKB) at AKBC, 2021. Jivat Neet Kaur, Sumit Bhatia, Milan Aggarwal, Rachit Bansal, and Balaji Krishnamurthy.
W9 Workshop
CoSe-Co: Sentence Conditioned Generative CommonSense Contextualizer for Language Models. Workshop on Commonsense Reasoning and Knowledge Bases (CSKB) at AKBC, 2021. Rachit Bansal, Milan Aggarwal, Sumit Bhatia, Jivat Neet Kaur, Balaji Krishnamurthy.
C35 Conference
EmEL++: Embeddings for EL++ Description Logic. AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering (AAAI MAKE), 2021. Sutapa Mondal, Sumit Bhatia and Raghava Mutharaju.
C34 Conference
Neuro-Symbolic Techniques for Description Logic Reasoning. Thirty-Fifth AAAI Conference on Artificial Intelligence – Student Abstracts (AAAI), 2021. Gunjan Singh, Sutapa Mondal, Sumit Bhatia and Raghava Mutharaju.
C33 Conference
SERC: Syntactic and Semantic Sequence based Event Relation Classification. 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2021. Kritika Venkatachalam, Raghava Mutharaju, and Sumit Bhatia.

2020

C32 Conference
OWL2Bench: A Benchmark for OWL 2 Reasoners. 19th International Semantic Web Conference (ISWC), 2020. Gunjan Singh, Sumit Bhatia and Raghava Mutharaju.
C31 Conference
Schema Aware Semantic Reasoning for Interpreting Natural Language Queries in Enterprise Settings. 28th International Conference on Computational Linguistics (COLING), 2020. Jaydeep Sen, Tanaya Babtiwale, Kanishk Saxena, Yash Butala, Sumit Bhatia, Karthik Sankaranarayanan.
C30 Conference
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. 12th International Conference on Language Resources and Evaluation (LREC), 2020. Dwaipayan Roy, Sumit Bhatia, and Prateek Jain.

2019

C29 Conference
A Persistent Homology Perspective to the Link Prediction Problem. 8th International Conference on Complex Networks and their Applications (Complex Networks), 2019. Sumit Bhatia, Bapi Chatterjee, Deepak Nathani and Manohar Kaul.
C28 Conference
Towards a Concurrent Approximate Description Logic Reasoner. 18th International Semantic Web Conference (ISWC), 2019. ★ Best Poster Nomination. Raj Kamal Yadav, Gunjan Singh, Raghava Mutharaju, and Sumit Bhatia.
C27 Conference
Selecting Discriminative Terms for Relevance Model. 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019. Dwaipayan Roy, Sumit Bhatia, and Mandar Mitra.
C26 Conference
Go Wide, Go Deep: Quantifying the Impact of Scientific Papers through Influence Dispersion Trees. ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2019. ★ Best Student Paper. Dattatreya Mohapatra, Abhishek Maiti, Sumit Bhatia and Tanmoy Chakraborty.
B1 Book Chapter
Entity Linking in Enterprise Search: Combining Textual and Structural Information. Book Chapter in: P D., Jurek-Loughrey A. (eds) Linking and Mining Heterogeneous and Multi-view Data. Springer, Cham, 2019. Sumit Bhatia.

2018

C25 Conference
That's Interesting, Tell Me More! Finding Descriptive Support Passages for Explaining Knowledge Graph Relationships. 17th International Semantic Web Conference (ISWC), 2018. ★ Best Paper Award. Sumit Bhatia, Purusharth Dwivedi and Avneet Kaur.
C24 Conference
Know Thy Neighbors, and More! Studying the Role of Context in Entity Recommendation. 29th ACM Conference on Hypertext and Social Media (HT), 2018. ★ Best Paper Nominee. Sumit Bhatia and Harit Vishwakarma.
C23 Conference
Bernoulli Embeddings for Graphs. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018. Vinith Misra and Sumit Bhatia.
C22 Conference
Using Word Embeddings for Information Retrieval: How Collection and Term Normalization Choices Affect Performance. 27th International Conference on Information and Knowledge Management (CIKM), 2018. Dwaipayan Roy, Debasis Ganguly, Sumit Bhatia, Srikanta Bedathur and Mandar Mitra.
C21 Workshop
Topic-Specific Sentiment Analysis Can Help Identify Political Ideology. 9th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) at EMNLP, 2018. Sumit Bhatia and Deepak P.
C20 Conference
Scalable Reasoning Infrastructure for Large Scale Industrial Applications. 17th International Semantic Web Conference (ISWC), 2018 (Poster). Hima Karanam, Sumit Neelam, Udit Sharma, Sumit Bhatia, et al.

2017

C19 Conference
Tools and Infrastructure for Supporting Enterprise Knowledge Graphs. Advanced Data Mining and Applications, 2017. Sumit Bhatia, Nidhi Rajshree, Anshu Jain and Nitish Aggarwal.

2016

J6 Journal
AlgorithmSeer: A System for Extracting and Searching for Algorithms in Scholarly Big Data. IEEE Transactions on Big Data, 2016. Suppawong Tuarob, Sumit Bhatia, Prasenjit Mitra and C. Lee Giles.
J5 Journal
A Picture Tells a Thousand Words — About You! User Interest Profiling from User Generated Visual Content. Signal Processing (SIGPRO), Vol. 124, July 2016, Special Issue on Big Data Meets Multimedia Analytics. Quanzeng You, Sumit Bhatia, and Jiebo Luo.
J4 Journal
Identifying the Role of Individual User Messages in an Online Discussion and its Applications in Thread Retrieval. Journal of the Association for Information Science and Technology (JASIST), 67(2): 276–288, 2016. Sumit Bhatia, Prakhar Biyani and Prasenjit Mitra.
C18 Conference
Connecting the Dots: Explaining Relationships Between Unconnected Entities in a Knowledge Graph. 13th Extended Semantic Web Conference (ESWC), 2016. Nitish Aggarwal, Sumit Bhatia and Vinith Misra.
C17 Conference
Separating Wheat From the Chaff — A Relationship Ranking Algorithm. 13th Extended Semantic Web Conference (ESWC), 2016. Sumit Bhatia, Alok Goel, Elizabeth Bowen and Anshu Jain.
C16 Conference
Context Sensitive Entity Linking of Search Queries For Enterprise Knowledge Graphs. 13th Extended Semantic Web Conference (ESWC), 2016. Sumit Bhatia and Anshu Jain.

2015

C15 Conference
Using Subjectivity Analysis to Improve Thread Retrieval in Online Forums. 37th European Conference on Information Retrieval (ECIR), 2015. Prakhar Biyani, Sumit Bhatia, Cornelia Caragea and Prasenjit Mitra.
C14 Conference
Predicting Future Scientific Discoveries Based on a Networked Analysis of the Past Literature. 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015. Meenakshi Nagarajan, Angela Wilkins, Benjamin Bachman, et al.

2014

J3 Journal
Using Non-lexical Features For Identifying Factual and Opinionative Threads in Online Forums. Elsevier Knowledge-Based Systems (KBS), Vol. 69, Oct 2014, pp. 170–178. Prakhar Biyani, Sumit Bhatia, Cornelia Caragea and Prasenjit Mitra.
C13 Conference
Summarizing Online Forum Discussions – Can Dialog Acts of Individual Messages Help? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. Sumit Bhatia, Prakhar Biyani and Prasenjit Mitra.
W8 Workshop
The eyes of the beholder: Gender prediction using images posted in Online Social Networks. SMDM'14: Workshop on Social Multimedia Data Mining at ICDM, 2014. Quanzeng You, Sumit Bhatia, Tong Sun and Jiebo Luo.
W7 Workshop
Feature Analysis for Computational Personality Recognition Using YouTube Personality Data set. WCPR'14: Workshop on Computational Personality Recognition at ACM Multimedia, 2014. Chandrima Sarkar, Sumit Bhatia, Juan Li and Arvind Agarwal.

2013

C12 Conference
Automatic Detection of Pseudo-codes in Scholarly Documents Using Machine Learning. 12th International Conference on Document Analysis and Recognition (ICDAR), 2013. Suppawong Tuarob, Sumit Bhatia, Prasenjit Mitra and C. Lee Giles.
W6 Workshop
Monitoring and Analyzing Customer Feedback Through Social Media Platforms for Identifying and Remedying Customer Problems. BASNA'13: Workshop on Business Applications of Social Network Analysis at ASONAM, 2013. Sumit Bhatia, Jingxuan Li, Wei Peng and Tong Sun.

2012

J2 Journal
Summarizing Figures, Tables and Algorithms in Scientific Publications to Augment Search Results. ACM Transactions on Information Systems (TOIS), 30(1), 2012. Sumit Bhatia and Prasenjit Mitra.
J1 Journal
Specialized Research Datasets in the CiteSeerX Digital Library. D-Lib Magazine, Vol. 18, No. 7/8, 2012. Sumit Bhatia, Cornelia Caragea, Hung-Hsuan Chen, et al.
C11 Conference
Thread Specific Features are Helpful for Finding Subjectivity Orientation of Online Forum Threads. 24th International Conference on Computational Linguistics (COLING), 2012. Prakhar Biyani, Sumit Bhatia, Cornelia Caragea and Prasenjit Mitra.
C10 Conference
A Scalable Approach for Performing Proximal Search for Verbose Patent Search Queries. 21st ACM Conference on Information and Knowledge Management (CIKM), 2012 (poster). Sumit Bhatia, Bin He, Qi He and Scott Spangler.
C9 Conference
Analysis and Automatic Classification of Web Search Queries for Diversification Requirements. 75th Annual Meeting of the American Society for Information Science and Technology (ASIST), 2012. Sumit Bhatia, Cliff Brunk and Prasenjit Mitra.
W5 Workshop
Classifying User Messages For Managing Web Forum Data. WebDB'12: 15th International Workshop on the Web and Databases at SIGMOD, 2012. Sumit Bhatia, Prakhar Biyani and Prasenjit Mitra.
W4 Workshop
A Query Classification Scheme for Diversification. DDR'12: 2nd International Workshop on Diversity in Document Retrieval at WSDM, 2012. Sumit Bhatia, Cliff Brunk and Prasenjit Mitra.

2011

C8 Conference
Query Suggestions in the Absence of Query Logs. 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2011. Sumit Bhatia, Debapriyo Majumdar and Prasenjit Mitra.
C7 Conference
Multidimensional search result diversification: diverse search results for diverse users. 34th International ACM SIGIR Conference (SIGIR), 2011 (doctoral consortium). Sumit Bhatia.
W3 Workshop
An Algorithm Search Engine For Software Developers. SUITE '11: ICSE Workshop on Search-driven Development, 2011. Sumit Bhatia, Suppawong Tuarob, Prasenjit Mitra and C. Lee Giles.

2010

C6 Conference
Adopting Inference Networks for Online Thread Retrieval. Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2010. Sumit Bhatia and Prasenjit Mitra.
C5 Conference
Utilizing Context in Generative Bayesian Models for Linked Corpus. Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2010. Saurabh Kataria, Prasenjit Mitra and Sumit Bhatia.
C4 Conference
Finding Algorithms in Scientific Articles. 19th International World Wide Web Conference (WWW), 2010 (poster). Sumit Bhatia, Prasenjit Mitra and C. Lee Giles.

2009

C3 Conference
Generating Synopses for Document-Element Search. 18th ACM Conference on Information and Knowledge Management (CIKM), 2009. Sumit Bhatia, Shibamouli Lahiri and Prasenjit Mitra.
W2 Workshop
Synopsis Generation for Specialized Document-Element Search Engines. Workshop on Web Search Result Summarization and Presentation at WWW, 2009. Sumit Bhatia and Prasenjit Mitra.

2008

C2 Conference
SVM Based Decision Support System for Heart Disease Classification with Integer-Coded Genetic Algorithm to Select Critical Features. World Congress on Engineering and Computer Science (WCECS), 2008. Sumit Bhatia, Praveen Prakash and G. N. Pillai.
W1 Workshop
A Retrievable GA for Solving Sudoku Puzzles. Technical Report, Department of Mathematics, IIT Roorkee, 2008. Kedar Nath Das, Sumit Bhatia, Shubhin Puri, and Kusum Deep.

2007

C1 Conference
Harmonic analysis of time-series NOAA/AVHRR images for hotspot detection and land features classification. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2007. Rohit Singh Gautam, Sumit Bhatia, Dharmendra Singh, and Ankush Mittal.

Work Experience

Adobe Inc.

Senior Machine Learning Research Scientist

May 2021 — Present

Media and Data Science Research Lab.

IBM India Research Labs

Senior Research Scientist

Feb. 2017 — Mar. 2021

Member of the Knowledge and Data Engineering team.

IBM Almaden Research Centre

Watson Research Scientist

July 2014 — Feb. 2017

Member of the Watson Discovery Analytics Service. Developed query understanding and disambiguation APIs for the Watson analytics service.

Xerox Palo Alto Research Centre

Post-doctoral Researcher

July 2013 — July 2014

Developed a social media analytics platform for consumer demographic prediction (age, gender, marital and parental status, personality type, technical expertise, etc.).

Xerox Palo Alto Research Centre

Research Intern

Feb. 2013 — June 2013

Developed models for detecting customer complaints via analysis of user tweets. Also developed an algorithm for extracting relevant portions of text from long social media documents.

Microsoft Research

Research Intern

May 2012 — Aug. 2012

Designed and developed "IM an Expert" — a real-time social question answering service allowing users to find answers by connecting with other knowledgeable users on Facebook.

Yandex Labs

Research Intern

Sep. 2011 — Dec. 2011

Analyzed Yandex's query logs to study user query intents and proposed a hierarchy of query diversification requirements.

IBM Almaden Research Centre

Research Intern

May 2011 — Sep. 2011

Developed a scalable algorithm for verbose patent query retrieval. Achieved 700% faster response times while maintaining the quality of search results.

IBM Research, Bangalore

Research Intern

May 2010 — Aug. 2010

Developed a query-log-oblivious probabilistic query suggestion mechanism that generates suggestions directly from the corpus.

Datasets

The following datasets are available for research purposes.

💾

Entity & Context Query Pairs

HyperText 2018

Entity and context query pairs with relevant answers.

💾

KG Edge List

AAAI 2018

Knowledge graph edge list used in AAAI 2018 paper.

💾

Forum Thread Retrieval

Online Forum Research

Dataset for forum thread retrieval experiments.

💾

Forum Subjectivity

Online Forum Research

Forum thread and discussion subjectivity dataset.

💾

Forum Post Classification

Dialog Act Labels

Forum posts labeled with dialog act categories.

💾

Forum Summarization

Online Forum Research

Dataset for forum discussion summarization research.

Tutorials

Knowledge Graphs: In Theory and Practice

We are transitioning from the era of Big Data to Big Knowledge, and semantic knowledge bases such as knowledge graphs play an important role in this transition. This is evident from the increased investments in Knowledge Graph research and development by major industrial players resulting in widely used systems such as IBM's Watson, Google's entity search, Apple's Siri, and Amazon's product graph.

Knowledge Graphs can be constructed either manually (facts authored by humans) or automatically (facts extracted from text using Machine Learning tools). Through this tutorial, we cover state-of-the-art approaches in Knowledge Graph Construction from various types of data using both manual and automated methods, review applications that benefit from the structure and semantics offered by knowledge graphs, and present case studies describing experiences in construction of enterprise Knowledge Graphs.

Slides

Past Editions

Presenters

Sujan Perera, IBM Watson
Nitish Aggarwal, IBM Watson
Sumit Bhatia, IBM Research, India
Saeedeh Shekarpour, Knoesis Research Centre, Ohio, USA
Amit Sheth, Knoesis Research Centre, Ohio, USA

Contact

Email: Sumit.Bhatia [at] adobe.com

I am always open to discussing research collaborations, student projects, and speaking opportunities. Feel free to reach out!