CLARA-MeD simplified sentences

Additional Information

Field Value
Identifier https://doi.org/10.20350/digitalCSIC/16110
Authorship
Project
Name CLARA-MeD simplified sentences
Description

This dataset contains 1,200 manually simplified sentences (144,019 tokens) from Spanish-language clinical trials. A total of 1,040 trial announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences that contained ambiguities or exceeded 25 words. The simplification criteria were set out in an annotation guideline, which is released publicly along with the dataset.
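As an illustration of the length criterion only, the following sketch flags sentences longer than 25 words. It assumes a simple regex-based word count; the names `count_words` and `select_long_sentences` are hypothetical, the ambiguity criterion is not modelled, and the project's actual selection procedure may differ.

```python
import re

# Threshold taken from the dataset description: sentences exceeding 25 words.
MAX_WORDS = 25


def count_words(sentence: str) -> int:
    """Count words with a simple Unicode-aware regex.

    Assumption: the project's own tokenizer may split words differently.
    """
    return len(re.findall(r"\w+", sentence))


def select_long_sentences(sentences, max_words=MAX_WORDS):
    """Return only the sentences whose word count exceeds max_words."""
    return [s for s in sentences if count_words(s) > max_words]
```

For example, `select_long_sentences(["El paciente debe tomar la medicación."])` returns an empty list, because that sentence has only six words.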

This resource was collected in the CLARA-MeD project, whose goal is to simplify medical texts in Spanish and reduce the language barriers to patients' informed decision-making. In particular, the project aims to develop linguistic resources for the automatic simplification of medical terms in Spanish and to conduct experiments in automatic text simplification.

Topics
  • Science and technology
  • Health
Tags
Creation date 2024-02-09T00:00:00
Last updated
Update frequency
Languages
  • Spanish
  • English
Geographic coverage Spain
Geographic coverage (international) Europe
Temporal coverage
  • From 2024-02-09 01:00
Resource validity
Related resources
Regulations
    Publisher Digital.CSIC
    Notes

    Recommended citation for this dataset: Bartolomé Rodríguez, Rocío; Terroba Reinares, Ana Rosa; Campillos-Llanos, Leonardo; 2024; CLARA-MeD simplified sentences [Dataset]; DIGITAL.CSIC; https://doi.org/10.20350/digitalCSIC/16110