i-HUMO: An Interactive Human and Machine Cooperation Framework for Entity Resolution with Quality Guarantees
Although many approaches have been proposed for entity resolution (ER), finding one that offers quality guarantees remains very challenging. To this end, we propose an interactive HUman and Machine cOoperation framework for ER, denoted by i-HUMO. Like the existing HUMO framework, i-HUMO enforces both precision and recall levels by dividing an ER workload between the human and the machine: the machine labels the easy instances, while the more challenging instances are assigned to the human. However, i-HUMO is a major improvement over HUMO in that it is interactive: its selection of the human workload is optimized based on real-time risk analysis of human-labeled results as well as pre-specified machine metrics. In this paper, we first introduce the i-HUMO framework and then present the risk analysis technique used to prioritize instances for manual labeling. Finally, we empirically evaluate i-HUMO's performance on real data. Our extensive experiments show that i-HUMO is effective in enforcing quality guarantees and that, compared with the state-of-the-art alternatives, it achieves better quality control at reduced human cost.
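The division of work between human and machine described above can be illustrated with a minimal sketch. It is not the i-HUMO algorithm: it assumes each candidate pair carries a machine similarity score, that two hypothetical thresholds (`lower`, `upper`) bound the zone sent to the human, and that human labels are always correct. The function names and the toy data are illustrative only, and the interactive, risk-based selection of the human workload is not modeled.

```python
from typing import List, Tuple


def split_workload(scores: List[float], lower: float, upper: float
                   ) -> Tuple[List[int], List[int], List[int]]:
    """Partition pair indices into machine-unmatch, human, and machine-match zones."""
    machine_unmatch = [i for i, s in enumerate(scores) if s < lower]
    human = [i for i, s in enumerate(scores) if lower <= s <= upper]
    machine_match = [i for i, s in enumerate(scores) if s > upper]
    return machine_unmatch, human, machine_match


def precision_recall(scores: List[float], truth: List[bool],
                     lower: float, upper: float) -> Tuple[float, float]:
    """Precision and recall of the combined human and machine labeling.

    Assumption: the human labels every pair in the middle zone correctly,
    so errors can only come from the two machine-labeled zones.
    """
    _, human, machine_match = split_workload(scores, lower, upper)
    # Pairs declared "match": everything above `upper`, plus true matches found by the human.
    predicted = set(machine_match) | {i for i in human if truth[i]}
    true_pos = sum(1 for i in predicted if truth[i])
    total_pos = sum(truth)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / total_pos if total_pos else 1.0
    return precision, recall


# Toy workload (made-up scores and ground truth, for illustration only).
scores = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]
truth = [False, True, False, True, False, True]

narrow = precision_recall(scores, truth, lower=0.3, upper=0.7)
wide = precision_recall(scores, truth, lower=0.15, upper=0.85)
print(narrow, wide)
```

On this toy data, widening the human zone from two pairs to four raises both precision and recall, which is the trade-off between human cost and quality that the framework aims to control.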