Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity

by   Ankush Chopra, et al.

Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic Textual Similarity models work fine for well-formed search queries, but their performances degrade when applied to a domain-specific setting having incomplete or grammatically ill-formed search queries in prevalence. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question which matches to it the most. We have used it for the financial domain, but the framework is generalized for any domain-specific search engine and can be used in other domains as well. We use Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier which generates unnormalized and normalized similarity scores for a given pair of questions. Moreover, for each of these question pairs, we calculate three other similarity scores: cosine similarity between their average word2vec embeddings [15], cosine similarity between their sentence embeddings [7] generated using RoBERTa [17] and their customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] for combining these five scores to detect if a given pair of questions is similar. We benchmark our model's performance against existing State Of The Art (SOTA) models on Quora Question Pairs (QQP) dataset as well as a dataset specific to the financial domain.


page 1

page 2

page 3

page 4


Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection

Frequently asked question (FAQ) retrieval, with the purpose of providing...

LSTM-based Deep Learning Models for Non-factoid Answer Selection

In this paper, we apply a general deep learning (DL) framework for the a...

A Syntactic Approach to Domain-Specific Automatic Question Generation

Factoid questions are questions that require short fact-based answers. A...

How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

We present a large-scale dataset for the task of rewriting an ill-formed...

MK-SQuIT: Synthesizing Questions using Iterative Template-filling

The aim of this work is to create a framework for synthetically generati...

Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading

Automatic Short Answer Grading (ASAG) is the process of grading the stud...

WordNet and Cosine Similarity based Classifier of Exam Questions using Bloom’s Taxonomy

Assessment usually plays an indispensable role in the education and it i...

Please sign up or login with your details

Forgot password? Click here to reset