Nested Biomedical Named Entity Recognition

Document Type: Original Article

Authors

1 Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University

2 Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt

Abstract

Named entity recognition is an important task in natural language processing, and extracting biomedical entities such as RNAs, DNAs, cell lines, proteins, and cell types is particularly challenging. Most existing research focuses on extracting flat named entities and ignores nested ones. Nested entities, however, are common in real-world biomedical applications because they can represent the semantic meaning of a named entity. This paper proposes an approach that improves nested biomedical named entity recognition by combining diverse feature types, namely morphological, orthographical, context, part-of-speech, and word representation features, with a Structured Support Vector Machine as the learning technique. The proposed approach is evaluated on the widely used GENIA dataset and compared with popular benchmark approaches; it achieves recall, precision, and F1-measure of 84.033%, 85.946%, and 84.113%, respectively.

Keywords