-
CLARA-MeD simplified sentences
This dataset contains 1200 manually simplified sentences (144 019 tokens) from clinical trials in Spanish. A total of 1040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences with ambiguities or...
Organization: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CLARA-MeD corpus
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts,...
Organization: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC