Cross-project Classification of Security-related Requirements
We investigate the feasibility of using a classifier for security-related requirements trained on requirement specifications available online. This is helpful in case different requirement types are not differentiated in a large existing requirement specification. Our work is motivated by the need to identify security requirements for the creation of security assurance cases that become a necessity for many organizations with new and upcoming standards like GDPR and HiPAA. We base our investigation on ten requirement specifications, randomly selected from a Google Search and partially pre-labeled. To validate the model, we run 10-fold cross-validation on the data where each specification constitutes a group. Our results indicate the feasibility of training a model from a heterogeneous data set including specifications from multiple domains and in different styles. However, performance benefits from revising the pre-labeled data for consistency. Additionally, we show that classifiers trained only on a specific specification type fare worse and that the way requirements are written has no impact on classifier accuracy.
READ FULL TEXT