Rule-Based Spelling Correction for Clinical Texts in Gynecology Using Domain-Specific Terminology

Authors

  • Maryam SABAH (Author)

Keywords:

Natural Language Processing (NLP), Medical NLP, Spell Correction, Clinical Text Processing, Gynecology, Text Normalization, Domain-Specific Terminology, Lexical Augmentation

Abstract

Spelling errors and domain-specific expressions are common in clinical texts, especially in gynecology, and they impair the performance of Natural Language Processing (NLP) tools applied in medical use cases. In this study, we customized a spelling-correction module to handle the errors typical of gynecological medical texts. The module relies on a rule-based methodology and uses a structured dataset of gynecological symptoms, diagnostic tests, chronic diseases, and medications, along with additional words and characters that express the patient's condition.

The system relies not only on general spelling rules but also on medical expressions and keywords to improve correction performance. It was applied to sample sentences generated with intentional misspellings, mimicking the type of input expected from physicians. The system proved effective at finding and correcting misspelled clinical terms and words that appear in its internal dataset. However, it left unchanged any terms that were absent from or unrelated to the dataset, except in a few cases where they were very close to known terms.
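The behaviour described above (correct words that closely resemble a known domain term, leave everything else alone) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the rules, so string similarity from the standard-library `difflib` module stands in for them, and the lexicon, the category names, the `cutoff` value, and the function names `correct_token` and `correct_text` are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical mini-lexicon grouped by the categories named in the abstract.
# The actual dataset used in the study is not reproduced here.
LEXICON_BY_CATEGORY = {
    "symptoms": ["dysmenorrhea", "amenorrhea", "bleeding"],
    "diagnostic_tests": ["ultrasound", "colposcopy"],
    "chronic_diseases": ["endometriosis", "hypertension", "diabetes"],
    "medications": ["metformin", "letrozole"],
}
LEXICON = sorted({t for terms in LEXICON_BY_CATEGORY.values() for t in terms})


def correct_token(token, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon term, or the token unchanged.

    Assumption: difflib's similarity ratio stands in for the paper's rules.
    Corrected tokens are returned in lowercase.
    """
    lower = token.lower()
    if lower in lexicon:
        return token  # already a known term
    matches = difflib.get_close_matches(lower, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token  # unknown words are left as-is


def correct_text(text, lexicon=LEXICON, cutoff=0.8):
    """Correct every alphabetic token in a sentence, keeping punctuation."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: correct_token(m.group(0), lexicon, cutoff),
        text,
    )


if __name__ == "__main__":
    print(correct_text("dysmenorhea and ultrasund, on metformine"))
```

A high `cutoff` keeps the sketch conservative, which mirrors the reported behaviour: out-of-lexicon words such as "aspirin" pass through untouched, while near-misses of known terms are replaced.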

These results confirm that combining rule-based methods with medical domain knowledge improves correction accuracy and prepares clinical text for reliable NLP analysis.

Published

2025-08-21

Section

Articles