Hello,
I'm working on a project that involves clinical named entity recognition, relation extraction, and related tasks. I'm currently using the scispaCy library for NER. However, I'm looking for an open-source package for relation extraction from clinical notes. For example, given the sentence "Dementia due to Alzheimer disease.", I expect a model to recognize that this is not just dementia but dementia due to Alzheimer disease.
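To make the expected output concrete, here is a minimal rule-based sketch of the kind of triple I am after. It is not one of the packages listed below and not a trained model; the pattern, the function name extract_due_to, and the relation name "due_to" are hypothetical names I made up for illustration, and it only handles the literal phrase "due to".

```python
import re

# Hypothetical pattern: "<effect> due to <cause>" within a single sentence.
DUE_TO = re.compile(r"^(?P<effect>.+?)\s+due to\s+(?P<cause>.+)$", re.IGNORECASE)


def extract_due_to(text):
    """Return (effect, relation, cause) triples for 'X due to Y' sentences."""
    triples = []
    # Naive sentence split on periods; real clinical notes need a proper splitter.
    for sentence in text.split("."):
        match = DUE_TO.match(sentence.strip())
        if match:
            triples.append((match.group("effect"), "due_to", match.group("cause")))
    return triples


text = "Dementia due to Alzheimers disease. Kidney failure due to liver disease."
print(extract_due_to(text))
# [('Dementia', 'due_to', 'Alzheimers disease'), ('Kidney failure', 'due_to', 'liver disease')]
```

A learned relation extractor would ideally produce the same kind of triples without depending on a fixed phrase.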
After spending some time reading articles and searching Google, I found the following packages:
1. SemRep
2. BioBERT
3. Clinical BioBERT
etc.
From the articles, I also gathered that Clinical BioBERT is the most suitable model. However, when I tried running the model with the transformers library, I got the following output:
Code:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the clinical BERT checkpoint with a token-classification head
model = AutoModelForTokenClassification.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
# Build a token-level NER pipeline and run it on two example sentences
nlp = pipeline('ner', model=model, tokenizer=tokenizer)
text = "Dementia due to Alzheimers disease. Kidney failure due to liver disease."
print(nlp(text))
Output:
[{'entity': 'LABEL_1', 'index': 1, 'score': 0.562394917011261, 'word': 'dementia'}, {'entity': 'LABEL_0', 'index': 2, 'score': 0.5325632691383362, 'word': 'due'}, {'entity': 'LABEL_1', 'index': 3, 'score': 0.5473843812942505, 'word': 'to'}, {'entity': 'LABEL_1', 'index': 4, 'score': 0.5070908069610596, 'word': 'alzheimer'}, {'entity': 'LABEL_0', 'index': 5, 'score': 0.5742462873458862, 'word': '##s'}, {'entity': 'LABEL_1', 'index': 6, 'score': 0.5498184561729431, 'word': 'disease'}, {'entity': 'LABEL_1', 'index': 7, 'score': 0.5163406133651733, 'word': '.'}, {'entity': 'LABEL_1', 'index': 8, 'score': 0.5038259625434875, 'word': 'kidney'}, {'entity': 'LABEL_1', 'index': 9, 'score': 0.5872519612312317, 'word': 'failure'}, {'entity': 'LABEL_0', 'index': 10, 'score': 0.523786723613739, 'word': 'due'}, {'entity': 'LABEL_1', 'index': 11, 'score': 0.5193214416503906, 'word': 'to'}, {'entity': 'LABEL_1', 'index': 12, 'score': 0.5457456707954407, 'word': 'liver'}, {'entity': 'LABEL_1', 'index': 13, 'score': 0.5755748748779297, 'word': 'disease'}, {'entity': 'LABEL_1', 'index': 14, 'score': 0.5418881177902222, 'word': '.'}]
From the above output, I expected labels such as disease, organ, etc. However, the model labeled every token as 'LABEL_1' or 'LABEL_0'.
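As far as I can tell, 'LABEL_0' and 'LABEL_1' look like the placeholder names that transformers assigns when a configuration carries no task-specific label map. That would suggest this checkpoint has no fine-tuned NER head and the classification layer was freshly initialized (the scores near 0.5 seem consistent with that), but I have not confirmed it. A minimal check of the defaults, assuming a default BertConfig behaves like the configuration loaded above; it needs no model download:

```python
from transformers import BertConfig

# Assumption: a default config stands in for the checkpoint's config,
# which (as far as I can tell) defines no label map of its own.
config = BertConfig()

# With no labels supplied, the config falls back to two generic labels.
print(config.num_labels)  # 2
print(config.id2label)    # {0: 'LABEL_0', 1: 'LABEL_1'}
```

If that reading is right, getting labels such as disease or organ would presumably require a checkpoint fine-tuned on annotated clinical data rather than the pre-trained model alone.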
How do I use Clinical BioBERT to extract relations? Please advise.