Research at NLIP Lab

Our research group is dedicated to advancing the state-of-the-art in Natural Language Processing (NLP) and Information Retrieval (IR) by exploring various research directions. These directions include, but are not limited to, language modeling, text classification, natural language generation, multilingual NLP, code-mix modeling, machine translation, conversational AI, information retrieval, social network analysis, graph neural networks, image captioning, and other related domains.

Natural Language Understanding (NLU) is a branch of NLP that focuses on mapping linguistic forms onto a representation of their meanings. It involves the analysis of textual language to extract meaning, intent, context, and relationships between words and phrases. NLU has diverse applications, including Named Entity Recognition (NER), Sentiment Analysis, Intent Recognition, Relationship Extraction, Coreference Resolution, Text Classification, and Question Answering.

In this domain, we have been actively engaged in addressing numerous complex real-life challenges. These challenges encompass diverse areas such as the identification of hostile content from social media platforms, prompt and effective disaster response during emergency situations such as earthquakes, dialogue state tracking for conversational agents, natural language inference, and a myriad of other related tasks.

Conversational AI is a set of technologies that enable computers to understand, process, and respond to human language in a natural and personalized way. It encompasses voice assistants, chatbots, and other AI-powered systems that can carry out meaningful conversations with users. In this research space we have undertaken problems such as building chatbot systems for traffic control, creating conversational agents, and developing dialogue state tracking systems, among others.

Multilingual Natural Language Processing (NLP) involves the development of NLP systems capable of handling multiple languages. Among the notable challenges in this domain is addressing low-resource languages (LRLs), characterized by limited available data or resources. Our research group is dedicated to the advancement of modeling frameworks that facilitate cross-lingual/multilingual transfer across various NLP tasks, specifically focusing on LRLs. Our primary objective is to create efficient and effective solutions that can be practically applied in real-world multilingual scenarios. We have achieved notable progress in enabling zero-shot technologies for several LRLs by employing methodologies such as language structure analysis, cross-lingual transfer learning, meta-learning, and other innovative approaches.

Code-mixed NLP is a specialized research direction that deals with developing NLP systems capable of handling multiple languages within a single conversation or text. Our research group has explored several aspects of this field, particularly in the areas of text classification and hostility detection in conversational settings for social media posts, among others.

The augmentation of knowledge in large language models (LLMs) refers to the process of enhancing the existing knowledge of the model by incorporating new data, information, or context in order to achieve a specific desired goal or downstream task. This procedure is of utmost importance in the current era, as retraining LLMs presents a significant challenge. Our team has made notable contributions in this area, including (a) the development of a hybrid query auto-completion system for Bing that augments Trie suggestions with natural language generation (NLG), and (b) the augmentation of non-toxic (clean) data in LLMs to enable safe text generation for a targeted task.

Controllable Text Generation with Large Language Models (LLMs) refers to the ability to manipulate and direct the output of a language model according to specific requirements or constraints. It involves generating text that adheres to certain predefined characteristics, such as sentiment, style, topic, or context, while maintaining coherence and natural language fluency. Within our laboratory setting, we have undertaken investigations into several tasks related to controllable text generation using LLMs. These include sentiment steering, detoxification to steer away from toxicity in LLMs, and empathetic text generation.
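One common family of steering techniques works at decoding time by biasing the model's next-token distribution away from undesired tokens. The sketch below is a toy illustration of that idea, not our lab's actual method; the token scores and the word list are invented for the example.

```python
import math

def steer_distribution(logits, avoid_tokens, penalty=5.0):
    """Bias next-token scores away from undesired tokens, then normalize.

    logits: dict mapping token -> raw score from a language model.
    avoid_tokens: tokens to steer away from (e.g. a toxic-word list).
    penalty: how strongly to suppress the avoided tokens.
    """
    biased = {t: (s - penalty if t in avoid_tokens else s)
              for t, s in logits.items()}
    # Softmax over the biased scores.
    z = sum(math.exp(s) for s in biased.values())
    return {t: math.exp(s) / z for t, s in biased.items()}

# Toy example: suppress the token "awful" during decoding.
probs = steer_distribution(
    {"great": 2.0, "awful": 2.0, "fine": 1.0},
    avoid_tokens={"awful"},
)
```

Because the bias is applied before normalization, probability mass shifts smoothly toward the remaining tokens rather than being zeroed out, which helps preserve fluency.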

Parameter-Efficient Fine-Tuning (PEFT) of LLMs refers to techniques for refining (updating) a language model while training only a small fraction of its parameters, keeping computational costs low. Fine-tuning, in general, entails adapting a pre-trained model to a specific task or domain by subjecting it to additional training with task-specific data. Within our laboratory, we have extensively explored approaches such as adapters, prefix tuning, and auxiliary intermediates for achieving parameter efficiency during fine-tuning across diverse domains and languages.
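An adapter is a small bottleneck module inserted inside a frozen transformer layer: it down-projects the hidden state, applies a nonlinearity, up-projects back, and adds a residual connection, so only the two small projection matrices are trained. The following is a minimal dependency-free sketch of that structure; the dimensions and initialization are illustrative only.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.

    Only these two small matrices would be trained; the surrounding
    backbone layer stays frozen.
    """
    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        init = lambda rows, cols: [[rng.gauss(0, 0.02) for _ in range(cols)]
                                   for _ in range(rows)]
        self.down = init(bottleneck_dim, hidden_dim)  # d_h -> d_b
        self.up = init(hidden_dim, bottleneck_dim)    # d_b -> d_h

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]      # ReLU
        return [hi + ui for hi, ui in zip(h, matvec(self.up, z))]

adapter = Adapter(hidden_dim=8, bottleneck_dim=2)
out = adapter([0.1] * 8)  # same dimensionality in and out
```

With a bottleneck of, say, 2 against a hidden size of 8, the adapter adds 2×8 + 8×2 = 32 trainable weights per layer, versus the full layer's parameters, which is the source of the efficiency.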

The process of query formulation can prove to be time-consuming, especially for inexperienced users or those with intricate information requirements. To address this issue, a Query Auto-Completion (QAC) module is designed to aid users in effectively expressing their information needs through search queries. The primary objective is to assist users in completing their search tasks more efficiently by accurately comprehending their query intent based on the partially-typed prefix. Within the NLIP lab, in collaboration with the Microsoft Auto-suggest team, our research efforts have been focused on developing a higher-quality QAC system. Furthermore, we have focused on scaling these systems to non-English languages for the Bing search engine.
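The classical backbone of QAC is a trie over past queries: completions of the typed prefix are retrieved from the trie and ranked by popularity. Here is a minimal sketch of that data structure, assuming a simple frequency-based ranking (production systems use far richer signals); the example queries are invented.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # how many times a stored query ended at this node

class QACTrie:
    """Minimal query auto-completion trie: store past queries with
    frequencies, then rank completions of a prefix by popularity."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, query, freq=1):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += freq

    def complete(self, prefix, k=3):
        # Walk down to the node matching the typed prefix.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every stored query below it, then rank by frequency.
        results = []
        def dfs(n, suffix):
            if n.count:
                results.append((prefix + suffix, n.count))
            for ch, child in n.children.items():
                dfs(child, suffix + ch)
        dfs(node, "")
        return [q for q, _ in sorted(results, key=lambda r: -r[1])[:k]]

trie = QACTrie()
trie.add("weather today", 5)
trie.add("weather tomorrow", 2)
trie.add("web mail", 7)
```

A hybrid system of the kind described above would fall back to a generative model when the trie has no completions for a rare prefix.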

Responsible NLP encompasses the ethical and socially responsible aspects of NLP technologies. It entails guaranteeing that NLP models and applications are developed and employed in manners that do not inflict harm upon individuals or communities. Furthermore, it emphasizes the significance of respecting privacy and data protection, as well as addressing issues related to bias and fairness. In alignment with this objective, our research has focused on developing modeling approaches that generate safe (non-toxic) completions for the QAC system. Additionally, we have endeavored to create a safe text generation mechanism with LLM, even in non-English languages.

Information retrieval is the process of retrieving relevant information from a large collection of data, such as documents or web pages. The goal is to provide users with a ranked list of documents that are most relevant to their query. The field of information retrieval encompasses various techniques, including indexing, query processing, and ranking algorithms. Information retrieval has applications in many areas, including search engines, recommendation systems, digital libraries, and e-commerce.
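A standard instance of the ranking algorithms mentioned above is BM25, which scores a document by combining term frequency, inverse document frequency, and a document-length normalization. The sketch below implements the common BM25 formula over tokenized documents; the toy corpus is invented for illustration.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank documents for a query with BM25.

    query: list of query tokens; docs: list of token lists.
    Returns document indices sorted by descending relevance score.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    tfs = [Counter(d) for d in docs]     # term frequencies per document
    scores = []
    for i, d in enumerate(docs):
        s = 0.0
        for term in query:
            if df[term] == 0:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            tf = tfs[i][term]
            s += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append((s, i))
    return [i for _, i in sorted(scores, reverse=True)]

docs = [["flood", "relief", "camp"],
        ["earthquake", "relief", "supplies", "earthquake"],
        ["election", "results"]]
ranking = bm25_rank(["earthquake", "relief"], docs)  # doc 1 ranks first
```

The length normalization (the `b` term) keeps long documents from winning on raw term counts alone, which matters when retrieving from heterogeneous collections such as tweets and web pages.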

In this space, we work on event retrieval, disaster response, and related problems. With the growth of social media platforms, event-related tweet retrieval has become an important research area in information retrieval. The challenge lies in identifying tweets that are related to a specific event and filtering out irrelevant or noisy tweets. Techniques such as query expansion, topic modeling, and sentiment analysis have been used to improve the effectiveness of event-related tweet retrieval. Such techniques have applications in areas such as disaster management, political analysis, and marketing research.

Recommendation systems are a type of information filtering system that provide personalized recommendations to users based on their preferences and past behaviors. These systems have become increasingly popular in e-commerce, social media, and content streaming platforms. Collaborative filtering and content-based filtering are the two main approaches to building recommendation systems. Collaborative filtering uses the past behavior of users and item ratings to recommend new items, while content-based filtering uses item attributes and user preferences to make recommendations. Hybrid approaches that combine both methods have also been developed to improve the accuracy of recommendations.
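The collaborative filtering idea described above can be made concrete in a few lines: score each item a user has not seen by the similarity-weighted ratings of other users. This is a minimal user-based sketch with cosine similarity; the users, items, and ratings are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den

def recommend(target, ratings, k=2):
    """User-based collaborative filtering: rank unseen items by
    similarity-weighted ratings from the other users."""
    sims = {u: cosine(ratings[target], r)
            for u, r in ratings.items() if u != target}
    scores = {}
    for u, sim in sims.items():
        for item, rating in ratings[u].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

ratings = {
    "alice": {"concert": 5, "workshop": 3},
    "bob":   {"concert": 4, "workshop": 4, "meetup": 5},
    "carol": {"workshop": 1, "hackathon": 4},
}
recs = recommend("alice", ratings)  # "meetup" ranks above "hackathon"
```

A content-based variant would replace the user-user similarity with similarity between item attributes and the target user's profile, and a hybrid system combines the two scores.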

One focus area for our research group is event recommendation systems, which aim to suggest events to users based on their preferences, interests, and location. These systems can benefit both users and event organizers by increasing attendance and engagement. Techniques such as collaborative filtering, content-based filtering, and hybrid models have been used in event recommendation systems. Challenges in this area include the cold-start problem, diversity, and scalability.

Graph Neural Networks (GNNs) are a class of deep learning models that operate on graphs or networks. They have shown great promise in a wide range of applications, including social networks, recommendation systems, and drug discovery. GNNs can learn node and edge representations that capture the structural and semantic information of the graph, allowing for more accurate predictions and efficient processing. However, challenges remain in training and interpreting GNNs, particularly in the context of large, complex graphs with noisy or incomplete data. Ongoing research in this field aims to address these challenges and advance the development and deployment of GNNs in practical applications.
One application we have explored with GNNs is disaster management. GNNs have shown great potential in this field, particularly in tasks such as damage assessment, resource allocation, and evacuation planning. GNNs can effectively model the complex relationships between different entities such as buildings, roads, and people, and capture the dependencies between them. By analyzing and predicting the impact of natural disasters using GNNs, emergency responders and policymakers can make informed decisions that can potentially save lives and minimize damage. However, further research is needed to develop GNN-based approaches that can handle the dynamic and uncertain nature of disaster scenarios.
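The core operation underlying GNNs is message passing: each node updates its representation by mixing its own features with an aggregate of its neighbours' features. The sketch below shows one round of mean aggregation on a toy graph; the "damage" signal and the node names are invented to echo the disaster setting, and real GNN layers would use learned weight matrices rather than fixed mixing coefficients.

```python
def message_pass(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of mean-aggregation message passing.

    features: dict node -> feature vector (list of floats).
    edges: list of undirected (node, node) pairs.
    Each node's new feature is a fixed mix of its own feature and the
    mean of its neighbours' features.
    """
    neighbours = {n: [] for n in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for n, f in features.items():
        ns = neighbours[n]
        if ns:
            mean = [sum(features[m][i] for m in ns) / len(ns)
                    for i in range(len(f))]
        else:
            mean = f  # isolated node keeps its own feature
        updated[n] = [w_self * fi + w_neigh * mi
                      for fi, mi in zip(f, mean)]
    return updated

# Toy graph: nodes carry a one-dimensional "damage" signal.
feats = {"hospital": [1.0], "road": [0.0], "bridge": [0.0]}
new_feats = message_pass(feats, [("hospital", "road"), ("road", "bridge")])
```

After one round, the damage signal at the hospital has partly propagated to the adjacent road but not yet to the bridge; stacking rounds is how GNNs capture longer-range dependencies across a graph.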

Image captioning is the task of generating a textual description of an image. This is a challenging task that requires both computer vision and natural language processing techniques. Typically, an image captioning system takes an image as input and generates a sentence or a short paragraph that describes the contents of the image. Image captioning has various applications, such as assisting visually impaired individuals and improving image retrieval systems. Deep learning techniques, such as convolutional neural networks and recurrent neural networks, have shown promising results in image captioning.