A Divide-and-Conquer Approach to the Summarization of Academic Articles
We present a novel divide-and-conquer method for the summarization of long documents. Our method processes the input in parts and generates a corresponding summary for each part. These partial summaries are then combined to produce a final complete summary. Splitting the problem of long document summarization into smaller and simpler problems reduces the computational complexity of the summarization process and yields more training examples, which at the same time contain less noise in the target summaries than the standard approach of producing the whole summary at once. Using a fairly simple sequence-to-sequence architecture that combines LSTM units and Rotational Units of Memory (RUM), our approach achieves state-of-the-art results on two publicly available datasets of academic articles.
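The divide-and-conquer flow described above can be illustrated with a minimal sketch. It is not the model from the paper: the names `lead_sentence` and `summarize_document` are hypothetical, the per-part summarizer is a trivial first-sentence placeholder standing in for the trained sequence-to-sequence model, and combining the partial summaries by concatenating them in document order is an assumption made for illustration.

```python
from typing import Callable, List


def lead_sentence(text: str) -> str:
    """Placeholder per-part summarizer (an assumption, not the paper's model).

    Returns the first sentence of the part, using a naive split on ". ".
    """
    first = text.strip().split(". ")[0].strip()
    if not first:
        return ""
    return first if first.endswith(".") else first + "."


def summarize_document(
    parts: List[str],
    summarize_part: Callable[[str], str] = lead_sentence,
) -> str:
    """Summarize each part independently, then combine the partial summaries.

    Combination here is plain concatenation in document order (assumed).
    """
    partial_summaries = [summarize_part(part) for part in parts]
    return " ".join(s for s in partial_summaries if s)


if __name__ == "__main__":
    sections = [
        "Intro sentence one. More detail here.",
        "Method is simple. It uses LSTMs.",
    ]
    print(summarize_document(sections))
```

Because each part is summarized independently, any per-part model can be passed in as `summarize_part` without changing the combining step.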