Trie-NLG: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes

Abstract

Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest the most popular completions. In this context, two specific scenarios are difficult for any QAC system to handle: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personalized Natural Language Generation (NLG) models have been proposed to address these two challenges by leveraging previous session queries as context. However, such NLG models suffer from two drawbacks: (1) some of the previous session queries could be noisy and irrelevant to the user intent for the current prefix, and (2) NLG models cannot directly incorporate historical query popularity. This motivates us to propose Trie-NLG, a novel NLG model for QAC that jointly leverages popularity signals from the trie and personalization signals from previous session queries. We train the Trie-NLG model by augmenting the prefix with rich context comprising recent session queries and top trie completions. This simple modeling approach overcomes the limitations of trie-based and NLG-based approaches and leads to state-of-the-art performance. We evaluate the Trie-NLG model on two large QAC datasets. On average, our model achieves MRR gains of ∼57% and ∼14% over the popular trie-based lookup and the strong BART-based baseline methods, respectively.
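The abstract describes the model input only at a high level: the prefix is augmented with recent session queries and the top trie completions. The following minimal sketch illustrates that idea under stated assumptions. The `QueryTrie` class, the `build_augmented_input` helper, the `[SEP]` separator, the field order, and the defaults (three session queries, three completions) are hypothetical choices for illustration. They are not the authors' published input format. The sketch also includes `mean_reciprocal_rank`, a plain implementation of MRR, which is the metric the abstract reports.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


class TrieNode:
    """One character position in the trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # times the query ending at this node appears in the log


class QueryTrie:
    """Hypothetical prefix trie over a historical query log, weighted by query frequency."""

    def __init__(self, query_log: Sequence[str]) -> None:
        self.root = TrieNode()
        for query, freq in Counter(query_log).items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
            node.count += freq

    def top_k(self, prefix: str, k: int) -> List[str]:
        """Return up to k logged queries that start with prefix, most popular first."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []  # unseen prefix: the trie has no completion to offer
            node = node.children[ch]
        found: List[Tuple[int, str]] = []
        stack = [(node, prefix)]
        while stack:  # iterative DFS over the subtree below the prefix
            cur, text = stack.pop()
            if cur.count:
                found.append((cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        # Highest count first; ties broken alphabetically so output is deterministic.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:k]]


def build_augmented_input(prefix: str, session: Sequence[str], trie: QueryTrie,
                          n_session: int = 3, k: int = 3, sep: str = " [SEP] ") -> str:
    """Hypothetical input layout: prefix, recent session queries, top-k trie completions."""
    # Guard n_session == 0, since list[-0:] would return the whole session.
    recent = list(session)[-n_session:] if n_session > 0 else []
    completions = trie.top_k(prefix, k)
    return sep.join([prefix, " ; ".join(recent), " ; ".join(completions)])


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Average of 1/rank of each target query; a target that is not suggested adds 0."""
    if not targets:
        return 0.0
    total = 0.0
    for candidates, target in zip(ranked, targets):
        for rank, cand in enumerate(candidates, start=1):
            if cand == target:
                total += 1.0 / rank
                break
    return total / len(targets)


if __name__ == "__main__":
    log = ["weather today", "weather today", "weather tomorrow", "web mail", "weather today"]
    trie = QueryTrie(log)
    print(trie.top_k("wea", 2))  # ['weather today', 'weather tomorrow']
    print(build_augmented_input("wea", ["flights to paris", "paris weather"], trie, k=2))
    print(mean_reciprocal_rank([["a", "b"], ["c", "d"]], ["b", "x"]))  # 0.25
```

For an unseen prefix, `top_k` returns an empty list, which is exactly the gap the abstract attributes to pure trie lookup. In a setup like the one described, the augmented string would be passed as the encoder input to a sequence-to-sequence model such as BART, which then generates the completion. Collecting the whole subtree on every call is the simplest correct approach. Production tries commonly precompute and cache the top completions at each node instead.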

BibTeX

@article{10.1007/s10618-023-00966-0,
    author     = {Maurya, Kaushal Kumar and Desarkar, Maunendra Sankar and Gupta, Manish and Agrawal, Puneet},
    title      = {{Trie-NLG}: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes},
    year       = {2023},
    issue_date = {Nov 2023},
    publisher  = {Kluwer Academic Publishers},
    address    = {USA},
    volume     = {37},
    number     = {6},
    issn       = {1384-5810},
    url        = {https://doi.org/10.1007/s10618-023-00966-0},
    doi        = {10.1007/s10618-023-00966-0},
    abstract   = {Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest the most popular completions. In this context, two specific scenarios are difficult for any QAC system to handle: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personalized Natural Language Generation (NLG) models have been proposed to address these two challenges by leveraging previous session queries as context. However, such NLG models suffer from two drawbacks: (1) some of the previous session queries could be noisy and irrelevant to the user intent for the current prefix, and (2) NLG models cannot directly incorporate historical query popularity. This motivates us to propose Trie-NLG, a novel NLG model for QAC that jointly leverages popularity signals from the trie and personalization signals from previous session queries. We train the Trie-NLG model by augmenting the prefix with rich context comprising recent session queries and top trie completions. This simple modeling approach overcomes the limitations of trie-based and NLG-based approaches and leads to state-of-the-art performance. We evaluate the Trie-NLG model on two large QAC datasets. On average, our model achieves MRR gains of $\sim$57\% and $\sim$14\% over the popular trie-based lookup and the strong BART-based baseline methods, respectively. We make our code publicly available at .},
    journal    = {Data Min. Knowl. Discov.},
    month      = aug,
    pages      = {2306--2329},
    numpages   = {24},
    keywords   = {Transformers, Pre-trained Models, Trie-NLG, Natural Language Generation, Query Auto Completion, AutoSuggest}
}