Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding

03/21/2019
by   Christian J. Mahoney, et al.
0

Training documents have a significant impact on the performance of predictive models in the legal domain. Yet, there is limited research that explores the effectiveness of the training document selection strategy - in particular, the strategy used to select the seed set, or the set of documents an attorney reviews first to establish an initial model. Since there is limited research on this important component of predictive coding, the authors of this paper set out to identify strategies that consistently perform well. Our research demonstrated that the seed set selection strategy can have a significant impact on the precision of a predictive model. Enabling attorneys with the results of this study will allow them to initiate the most effective predictive modeling process to comb through the terabytes of data typically present in modern litigation. This study used documents from four actual legal cases to evaluate eight different seed set selection strategies. Attorneys can use the results contained within this paper to enhance their approach to predictive coding.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/11/2019

Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding

Active learning is a popular methodology in text classification - known ...
research
04/03/2019

Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness

Predictive coding, once used in only a small fraction of legal and busin...
research
03/24/2023

Predictive modeling for limited distributed targets

Many forecasting applications have a limited distributed target variable...
research
04/03/2019

Explainable Text Classification in Legal Document Review A Case Study of Explainable Predictive Coding

In today's legal environment, lawsuits and regulatory investigations req...
research
12/19/2019

Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review

Research has shown that Convolutional Neural Networks (CNN) can be effec...
research
10/07/2022

To tree or not to tree? Assessing the impact of smoothing the decision boundaries

When analyzing a dataset, it can be useful to assess how smooth the deci...

Please sign up or login with your details

Forgot password? Click here to reset