Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish version 3 (CT-EBM-SP v3)

Additional Info

Field Value
Identifier http://hdl.handle.net/10261/416915
Author
Project
Name Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish version 3 (CT-EBM-SP v3)
Description

This is version 3 of the CT-EBM-SP corpus, comprising 1200 clinical trials (292173 tokens) annotated with 23 entity types and 18 relation types. The annotations cover Unified Medical Language System (UMLS) semantic groups, drug-related information, temporal data, and negation/speculation. The corpus also encodes 11 attributes (e.g., event temporality and experiencer status) and normalizes entities to UMLS Concept Unique Identifiers (CUIs). It contains 87037 entities (including nested and discontinuous entities), 16597 attributes, and 68206 relations. Inter-annotator agreement (IAA) reached average F1 values of 0.861 for entities, 0.810 for attributes, and 0.791 for relations. Overall, 81.75% of entities were normalized (IAA: F1 = 0.966).

The repository includes code for benchmarking the dataset by fine-tuning Transformer models on two tasks: relation extraction and medical concept normalization. On relation extraction, the average F1 ranged from 0.858 to 0.879. On medical concept normalization, the accuracy at rank 1 was 0.896.
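For readers unfamiliar with the two metrics reported above, the following is a minimal, generic sketch of how F1 and accuracy at rank 1 are usually computed. It is not taken from the repository: the function names, the input layout, and the placeholder concept labels are assumptions made for illustration, and the record does not state the exact scoring procedure (for example, how averages were taken).

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall, from raw counts.

    tp/fp/fn are true positives, false positives, and false negatives.
    """
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy_at_1(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked candidate equals the gold label.

    ranked[i] is the model's candidate list for mention i, best first
    (hypothetical layout); gold[i] is the reference concept identifier.
    """
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if cands and cands[0] == g)
    return hits / len(gold)


# Toy data with placeholder labels (not real UMLS identifiers).
toy_ranked = [["CUI_A", "CUI_B"], ["CUI_C"], ["CUI_D", "CUI_E"]]
toy_gold = ["CUI_A", "CUI_X", "CUI_E"]

toy_f1 = f1_score(tp=8, fp=2, fn=2)
toy_acc = accuracy_at_1(toy_ranked, toy_gold)
```

In the toy data only the first mention has the gold label at rank 1; the third mention has it at rank 2, which does not count toward accuracy at rank 1.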

Themes
  • Science and technology
  • Healthcare
Tags
Creation date 2026-02-03T00:00:00
Last updated 2026-02-04T08:44:42
Refresh rate
Languages English
Geographic coverage
Geographic coverage (International)
Time coverage
Effective resource
Related resources
Normative
Institute
Publisher Publicador - Digital.CSIC
Observations

Recommended citation: Campillos-Llanos, Leonardo; Valverde Mateos, Ana; Capllonch Carrión, Adrián; Zakhir Puig, Sofía; González-Quevedo, David; López-Urbán, María Rosa; Hernando-Tundidor, Soledad; Heras, Jónathan; 2026; Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish version 3 (CT-EBM-SP v3); Zenodo; https://doi.org/10.5281/zenodo.18048413