Dataset-HCSDatos

Botany and Literature in Ancient Mesopotamia—Primary Sources

The data used and generated during project PID2021125678NB-I00 are of two kinds, both related to the study of the languages and cultures of the ancient Near East. On the one hand, there are cuneiform clay tablets,...

Institute: Instituto de Lenguas y Culturas del Mediterráneo y Oriente Próximo (ILC), CSIC
- xlsx
- txt
Cuneiform texts; Medical texts; Magical texts; Lexical texts; 1st millennium BC

Corpus for Complex Word Identification in Medical Spanish Texts (CWI-Med-Sp)

[Description of methods used for collection/generation of data] The corpus statistics and methods are explained in the following article: Federico Ortega-Riba, Leonardo Campillos-Llanos, Doaa Samy (2025) "Lexical Simplification in Spanish Texts For...

Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
- zip
- txt
Patient information documents; Annotated corpus; Medical text simplification; Biomedical natural language processing; Consent forms; Clinical trials

CLARA-MeD simplified sentences

This dataset contains 1200 manually simplified sentences (144 019 tokens) from clinical trials in Spanish. A total of 1040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences with ambiguities or...

Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
- txt
- tsv
- pdf
Biomedical natural language processing; Parallel sentences; Medical text simplification

CLARA-MeD corpus

A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts,...

Institute: Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
- txt
- zip
Comparable corpus; Parallel sentences; Medical text simplification; Biomedical natural language processing


