-
Botany and Literature in Ancient Mesopotamia—Primary Sources
The data used and generated during project PID2021125678NB-I00 are of two kinds, both belonging to the study of the languages and cultures of the ancient Near East. On the one hand, there are cuneiform clay tablets,...
Institute: Instituto de Lenguas y Culturas del Mediterráneo y Oriente Próximo (ILC), CSIC
-
Corpus for Complex Word Identification in Medical Spanish Texts (CWI-Med-Sp)
[Description of methods used for collection/generation of data] The corpus statistics and methods are explained in the following article: Federico Ortega-Riba, Leonardo Campillos-Llanos, Doaa Samy (2025) "Lexical Simplification in Spanish Texts For...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CLARA-MeD simplified sentences
This dataset contains 1200 manually simplified sentences (144 019 tokens) from clinical trials in Spanish. A total of 1040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences with ambiguities or...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
-
CLARA-MeD corpus
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts,...
Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
