-
SimpMedLexSp (Simple Medical Lexicon for Spanish)
A medical lexicon of 14 013 pairs of technical word forms and their corresponding simplified synonyms or definitions. It is aimed at automatic text simplification in Spanish. A subset of the lexicon (4642 term entries) was also normalized to Unified Medical...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CLARA-MeD simplified sentences
This dataset contains 1200 manually simplified sentences (144 019 tokens) from clinical trials in Spanish. A total of 1040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences with ambiguities or...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based Medicine in Spanish
A collection of 1200 texts (292 173 tokens) about clinical trial studies and clinical trial announcements in Spanish: - 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
Medical Lexicon for Spanish (MedLexSp)
MedLexSp is a unified medical lexicon for Medical Natural Language Processing in Spanish. It includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs and number/gender variants), and 42 958 Unified Medical Language System (UMLS) Concept...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CLARA-MeD corpus
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts,...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC