Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents

08/05/2016
by Sun Kim, et al.

The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents in which few or none of the query words appear. Semantic analysis methods such as latent semantic analysis (LSA) and latent Dirichlet allocation (LDA) have been proposed to address this issue, but their performance has not been superior to that of common IR approaches. Here we present a query-document similarity measure motivated by the Word Mover's Distance. Unlike other similarity measures, the proposed method relies on neural word embeddings to compute the distance between words. This helps identify related words when no direct matches are found between a query and a document. Our method is efficient and straightforward to implement. Experimental results on TREC Genomics data show that our approach outperforms the BM25 ranking function by an average of 12 […] average precision. Furthermore, for a real-world dataset collected from the PubMed search logs, we combine the semantic measure with BM25 using a learning-to-rank method, which improves ranking scores by up to 25 […]. The experiment demonstrates that the proposed approach and BM25 complement each other and together produce superior performance.
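The core idea above, scoring a document by how close its words are to the query words in embedding space rather than by exact matches, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' measure: it uses the relaxed Word Mover's Distance (each query word moves all of its weight to its single nearest document word, with cosine distance as the cost), uniform word weights, and a tiny hand-made embedding table. The names `EMBEDDINGS`, `term_overlap`, `relaxed_wmd`, and `semantic_similarity`, and all vector values, are hypothetical; a real system would use embeddings trained on biomedical text.

```python
import math

# Hypothetical toy 3-d embeddings chosen for illustration only; a real system
# would load vectors trained on a large corpus (e.g. PubMed abstracts).
EMBEDDINGS = {
    "cancer":  [0.90, 0.10, 0.00],
    "tumor":   [0.85, 0.20, 0.05],
    "breast":  [0.10, 0.90, 0.10],
    "mammary": [0.15, 0.85, 0.15],
    "gene":    [0.00, 0.10, 0.95],
    "protein": [0.05, 0.20, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_overlap(query, doc):
    """Exact-match baseline: fraction of query words found verbatim in doc."""
    doc_words = set(doc)
    return sum(w in doc_words for w in query) / len(query)


def relaxed_wmd(query, doc):
    """Relaxed Word Mover's Distance from query to document (assumption).

    Each known query word carries weight 1/len(q) and sends all of it to its
    nearest document word; cost is cosine distance, clamped at 0 to absorb
    floating-point error. This is a lower bound on full WMD, not the exact
    optimal-transport solution.
    """
    q = [w for w in query if w in EMBEDDINGS]
    d = [w for w in doc if w in EMBEDDINGS]
    if not q or not d:
        return float("inf")  # nothing comparable: treat as infinitely far
    total = 0.0
    for w in q:
        total += min(
            max(0.0, 1.0 - cosine(EMBEDDINGS[w], EMBEDDINGS[x])) for x in d
        )
    return total / len(q)


def semantic_similarity(query, doc):
    """Map the distance to a similarity in [0, 1]; larger means closer."""
    return 1.0 / (1.0 + relaxed_wmd(query, doc))


query = ["breast", "cancer"]
related_doc = ["mammary", "tumor", "protein"]
unrelated_doc = ["gene", "protein"]

print(term_overlap(query, related_doc))  # 0.0: no shared words
print(round(semantic_similarity(query, related_doc), 3))
print(round(semantic_similarity(query, unrelated_doc), 3))
```

With the query `breast cancer` and a document containing only `mammary tumor protein`, exact-match overlap is zero, yet the embedding-based similarity is close to 1 because each query word has a near neighbour in the document, while the `gene protein` document scores clearly lower. Combining such a score with BM25, as the abstract describes, would be a separate step (for example, a learning-to-rank model that takes both scores as features), which this sketch does not attempt.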

research
10/06/2020

Beyond [CLS] through Ranking by Generation

Generative models for Information Retrieval, where ranking of documents ...
research
09/04/2019

Affect Enriched Word Embeddings for News Information Retrieval

Distributed representations of words have shown to be useful to improve ...
research
06/23/2016

Toward a Deep Neural Approach for Knowledge-Based IR

This paper tackles the problem of the semantic gap between a document an...
research
05/03/2023

Understanding Differential Search Index for Text Retrieval

The Differentiable Search Index (DSI) is a novel information retrieval (...
research
07/14/2021

A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU

The Word Movers Distance (WMD) measures the semantic dissimilarity betwe...
research
05/14/2020

An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance

The Word Mover's Distance (WMD) is a metric that measures the semantic d...
research
09/27/2018

Consistency and Variation in Kernel Neural Ranking Model

This paper studies the consistency of the kernel-based neural ranking mo...