
CLARA-MeD simplified sentences

Data and Resources

Interoperability

RDF/XML (DCAT-AP), application/rdf+xml

Groups

Additional Info

Field	Value
Identifier	http://hdl.handle.net/10261/346579
Author	Rocío Bartolomé Rodríguez; Ana Rosa Terroba Reinares; Leonardo Campillos-Llanos
Project	PID2020-116001RA-C33
Name	CLARA-MeD simplified sentences
Description	This dataset contains 1200 manually simplified sentences (144 019 tokens) from clinical trials in Spanish. A total of 1040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences that contain ambiguities or exceed 25 words. Simplification criteria are defined in an annotation guideline, which is released publicly alongside the dataset. This resource was collected in the CLARA-MeD project, whose goal is to simplify medical texts in Spanish and reduce the language barrier to patients' informed decision-making. In particular, the project aims at developing linguistic resources for automatic medical term simplification in Spanish and at conducting experiments in automatic text simplification.
Themes	Science and technology; Healthcare
Tags	Biomedical natural language processing; Parallel sentences; Medical text simplification
Creation date	2024-02-09T00:00:00
Last updated
Refresh rate
Languages	Spanish; English
Geographic coverage	Spain
Geographic coverage (International)	Europe
Time coverage	From 2024-02-09 to 2024-02-09
Effective resource
Related resources	https://github.com/lcampillos/CLARA-MeD/ http://hdl.handle.net/10261/269887 http://hdl.handle.net/10261/359759 http://hdl.handle.net/10261/359770
Normative
Institute	Instituto de Lengua, Literatura y Antropología (ILLA), CSIC
Publisher	Publicador - Digital.CSIC
Observations	Recommended citation: Bartolomé Rodríguez, Rocío; Terroba Reinares, Ana Rosa; Campillos-Llanos, Leonardo; 2024; CLARA-MeD simplified sentences [Dataset]; DIGITAL.CSIC; https://doi.org/10.20350/digitalCSIC/16110