Machine learning model for clinical NER

Ravikumar J., Dr. Ramakanth Kumar P.


Named Entity Recognition (NER) is the most widely used NLP task for extracting important concepts (named entities) from clinical notes. The literature shows that machine learning models have been used extensively for clinical NER. Databases such as PubMed, which index medical publications, have generated considerable interest among researchers in applying information extraction techniques to medical literature. These techniques offer substantial advantages for both medical applications and research. The most fundamental medical data mining tasks are medical named entity recognition and normalization. The task of medical named entity recognition is to identify the boundaries of entity mentions in text that contains medical data. It differs from general NER in several ways. The large number of alternate spellings and synonyms causes the vocabulary to grow explosively, which reduces the effectiveness of medical dictionaries. Entities often consist of long sequences of tokens, making their boundaries harder to detect exactly. Notes written by clinicians are less structured, use minimal grammar, and contain cryptic shorthand, all of which pose challenges for named entity recognition. Generally, NER systems have been either rule based or pattern based, but rules and patterns do not generalize well because clinicians' writing styles are diverse. These issues can be addressed with machine learning. NER approaches fall into three groups: machine learning based, rule based, and dictionary based. Systems that use a machine learning based approach focus on choosing effective features for building a classifier. In this work, a machine learning based approach is used to extract clinical data in the required form.
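
To illustrate what choosing features for a token-level classifier can look like, the following is a minimal Python sketch. It is not the feature set used in this work: the function names, the feature names, and the sample tokens are all hypothetical and chosen only for illustration.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    The features here (hypothetical, not taken from the paper) capture
    surface form and one token of context on each side.
    """
    word = tokens[i]
    return {
        "lower": word.lower(),                      # normalised surface form
        "suffix3": word[-3:].lower(),               # last three characters
        "is_upper": word.isupper(),                 # e.g. abbreviations
        "is_title": word.istitle(),                 # capitalised word
        "has_digit": any(ch.isdigit() for ch in word),  # e.g. dosages
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example of terse clinical shorthand.
tokens = ["Pt", "c/o", "SOB", "after", "5mg", "lisinopril"]
features = sentence_features(tokens)
```

Feature dictionaries of this kind could be passed, one list per sentence, to a sequence classifier trained on token-level entity labels. Which features are effective for clinical text is an empirical question that this sketch does not answer.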


clinical NER; machine learning; named entity recognition; natural language processing


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 2088-8708, e-ISSN 2722-2578