The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms

The RareDis corpus contains more than 5,000 rare diseases and almost 6,000 clinical manifestations are annotated. Moreover, the Inter Annotator Agreement evaluation shows a relatively high agreement (F1-measure equal to 83.5 exact match criteria for the entities and equal to 81.3 Based on these results, this corpus is of high quality, supposing a significant step for the field since there is a scarcity of available corpus annotated with rare diseases. This could open the door to further NLP applications, which would facilitate the diagnosis and treatment of these rare diseases and, therefore, would improve dramatically the quality of life of these patients.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset