Accelerating Real-Time Question Answering via Question Generation
Existing approaches to real-time question answering (RTQA) learn representations of only the key phrases in the documents, then match them against the question representation to derive the answer. However, such an approach is bottlenecked by the time needed to encode real-time questions, and thus suffers from noticeable latency when deployed for large-volume traffic. To accelerate RTQA for practical use, we present Ocean-Q (an Ocean of Questions), a novel approach that leverages question generation (QG) for RTQA. Ocean-Q introduces a QG model to generate a large pool of question-answer (QA) pairs offline, then in real time matches an input question against the candidate QA pool to predict the answer without question encoding. To further improve QG quality, we propose a new data augmentation method and leverage multi-task learning and diverse beam search to boost RTQA performance. Experiments on the SQuAD(-open) and HotpotQA benchmarks demonstrate that Ocean-Q accelerates the fastest state-of-the-art RTQA system by 4X, with only a 3+
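The offline/online split described above can be illustrated with a minimal sketch. It is not Ocean-Q's implementation: the abstract does not specify the matching function, so Jaccard token overlap is assumed here purely for illustration, and the QA pairs, `build_pool`, and `answer` are hypothetical stand-ins for the output of a QG model and the real-time matcher.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_pool(qa_pairs):
    """Offline step: precompute a token set for every generated question."""
    return [(tokenize(question), question, ans) for question, ans in qa_pairs]


def answer(question, pool):
    """Online step: return the answer paired with the most similar pooled question.

    Similarity is Jaccard overlap of token sets (an assumption, not the
    paper's method), so no neural question encoder runs at query time.
    Raises ValueError if the pool is empty.
    """
    q_tokens = tokenize(question)

    def jaccard(tokens):
        union = q_tokens | tokens
        return len(q_tokens & tokens) / len(union) if union else 0.0

    best = max(pool, key=lambda entry: jaccard(entry[0]))
    return best[2]


# Hypothetical QA pairs standing in for what a QG model would generate offline.
POOL = build_pool([
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("When did World War II end?", "1945"),
])

print(answer("Which city is the capital of France?", POOL))
```

All per-question work on the pool happens once in `build_pool`; at query time only the incoming question is tokenized and compared, which mirrors the idea of moving cost offline. A real system would need a far larger pool and a more robust matcher than token overlap.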