Extracting Procedural Knowledge from Technical Documents

by Shivali Agarwal, et al.

Procedures are an important knowledge component of documents that cognitive assistants can leverage for automation, question answering, or driving a conversation. Parsing large, dense documents such as product manuals and user guides to automatically determine which parts describe procedures, and then extracting them, is a challenging problem. Most existing research has focused on extracting flows from given procedures or on understanding procedures in order to answer conceptual questions. Identifying and extracting multiple procedures automatically from documents of diverse formats remains a relatively less addressed problem. In this work, we cover some of this ground by: 1) providing insights on how structural and linguistic properties of documents can be grouped to define types of procedures, 2) analyzing documents to extract the relevant linguistic and structural properties, and 3) formulating procedure identification as a classification problem that leverages the features of the document derived from the above analysis. We first implemented and deployed unsupervised techniques, which were used in different use cases. Evaluating them in those use cases revealed the weaknesses of the unsupervised approach, so we then designed an improved, supervised version. We demonstrate that our technique is effective in identifying procedures from big and complex documents alike by achieving accuracy of 89
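To make the third contribution concrete, the following is a minimal sketch of what "procedure identification as classification" could look like. It is not the authors' method: the abstract does not name the features or the classifier, so the two features (share of lines with a step marker, share of lines starting with an imperative verb), the verb list, the toy training data, and the perceptron used as a stand-in classifier are all assumptions made for illustration.

```python
import re

# Hypothetical verb list for illustration; the source does not specify one.
IMPERATIVE_VERBS = {"click", "select", "open", "enter", "run",
                    "install", "press", "type", "restart"}

# Structural cue (assumed): a line starting with "1.", "2)", "-", "*", "•" or "Step 3:".
STEP_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•]|step\s+\d+[:.]?)\s+", re.IGNORECASE)


def block_features(lines):
    """Return one structural and one linguistic feature for a block of lines."""
    if not lines:
        return {"marker_ratio": 0.0, "imperative_ratio": 0.0}
    marked = 0
    imperative = 0
    for line in lines:
        match = STEP_MARKER.match(line)
        if match:
            marked += 1
        # Look at the first word after any step marker.
        body = line[match.end():] if match else line
        words = body.strip().split()
        if words and words[0].lower().strip(".,:;") in IMPERATIVE_VERBS:
            imperative += 1
    return {"marker_ratio": marked / len(lines),
            "imperative_ratio": imperative / len(lines)}


def to_vector(features):
    return [features["marker_ratio"], features["imperative_ratio"]]


def train_perceptron(vectors, labels, epochs=20):
    """Stand-in supervised classifier: a plain perceptron with 0/1 labels."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - (1 if score > 0 else 0)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def is_procedure(lines, weights, bias):
    x = to_vector(block_features(lines))
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0


# Toy labeled blocks (invented for this sketch): 1 = procedure, 0 = not a procedure.
TRAIN_BLOCKS = [
    (["1. Open the settings page", "2. Click Save"], 1),
    (["The product was released in 2019.", "It supports several platforms."], 0),
]
weights, bias = train_perceptron(
    [to_vector(block_features(block)) for block, _ in TRAIN_BLOCKS],
    [label for _, label in TRAIN_BLOCKS],
)
```

Calling `is_procedure(["- Install the package", "- Restart the server"], weights, bias)` then classifies an unseen bulleted block with the trained weights. A real system would need far richer structural and linguistic features and a proper labeled corpus; the sketch only shows how document-derived features feed a supervised classifier.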




