Sharing genomic data: GCAT Genomes for life and EGA experience

Sharing genomic and epidemiological related data with industry and research

GCAT is a biomedical research project that gathers biological samples, genomic data, lifestyle and health information, and access to the Electronic Health Records of up to 50,000 volunteers. Together these amount to a very large body of data. The project provides open access to its data, regulated by an Access Committee. The data is also provided through the European Genome-phenome Archive (EGA). The best practices and barriers for data sharing described here have been learnt from the experience of the GCAT project and the EGA.

Process Main Stages: 

1. Set up a data access protocol that fits the requirements of the research project. In the particular case of GCAT, the requirements are:

  • Transparency: The GCAT publishes its policy and procedures for access to the samples and data and their use in research. The GCAT will maintain total control over access and use of data and samples in the project in accordance with its commitment to public use of the collection and with the framework of its own regulations. The Executive Committee of the GCAT will constantly revise the policy of use of the collection to ensure it complies with the access policy and to ensure that the resource is used for the public benefit.
  • Acceptance of GCAT Policy of Use: Access will be defined in the terms and conditions established and accepted by the GCAT participants, including the fulfillment of the consent granted.
  • Research within GCAT Strategic Lines: The projects will fit within the scientific objectives of the GCAT Project.

2. Set up a data access committee. This committee is responsible for granting or denying access to the data after a scientific and ethical examination of the research behind the request. It is an independent body that, among other tasks, evaluates the requests for access to samples and data.

The External Board is composed of at least four members with expertise in the field of biomedical research and led by a member of the GCAT Steering Committee. To guarantee an independent view with no conflict of interest, this board is run according to strict rules. Members are designated by the GCAT Steering Committee for renewable periods of 2 years.

3. Evaluate data access requests. All proposals will be evaluated by the relevant committees (scientific and ethical) to ensure that they are consistent with the consent of the participants, that they have the pertinent ethical approval, and that they meet the desired standards of excellence.

4. Accept or reject access to GCAT data.
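The four stages above can be sketched as a simple checklist evaluation. This is only an illustrative sketch, not GCAT's actual procedure: the `AccessRequest` fields, the `evaluate` function and the all-criteria-must-pass rule are assumptions introduced here to show how the criteria named in the text (strategic lines, policy of use, consent, ethical approval, scientific standards) could be recorded and combined into a decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    """Hypothetical summary of one data access application."""
    applicant: str
    within_strategic_lines: bool     # project fits the GCAT scientific objectives
    accepts_policy_of_use: bool      # applicant accepts the terms and conditions
    consistent_with_consent: bool    # use is covered by the participants' consent
    has_ethical_approval: bool       # pertinent ethical approval is in place
    meets_scientific_standard: bool  # judgement of the scientific committee


def evaluate(request: AccessRequest) -> Tuple[Decision, List[str]]:
    """Return the decision together with the list of unmet criteria.

    Assumption: a request is accepted only if every criterion is met.
    """
    checks = {
        "fits GCAT strategic lines": request.within_strategic_lines,
        "accepts GCAT policy of use": request.accepts_policy_of_use,
        "consistent with participant consent": request.consistent_with_consent,
        "has ethical approval": request.has_ethical_approval,
        "meets scientific standards": request.meets_scientific_standard,
    }
    unmet = [name for name, ok in checks.items() if not ok]
    decision = Decision.REJECTED if unmet else Decision.ACCEPTED
    return decision, unmet
```

Returning the unmet criteria alongside the decision means a rejected applicant can be told exactly which requirement was not satisfied.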

Touchpoints & Bottlenecks: 

To access GCAT data, applicants must fill in an application, which is evaluated by the GCAT access committee. Contact can also be established through a third party that provides data access (the EGA in this case).

Regarding bottlenecks, some experience has been gathered from similar projects that provide access to genomic data. Often the Data Access Committee (DAC) has no data management professional, which makes the evaluation difficult. Other problems arise from informed consent protocol documents, which are often complex and difficult to apply correctly. Data quality is not always as good as it should be: the data often lacks additional information on donors, on the process used to obtain the biological samples, or on the analysis methods used. Sometimes not all the information needed to make proper use of the data is shared, in order to avoid the problems of sharing sensitive information. Sometimes the amount of data is very large, so handling it requires data management skills.

Success Factors / Barriers: 

The main success factor is that GCAT makes genetic data broadly available for further scientific research, while access controls help protect the privacy of the individuals who have agreed to have their genomic sequencing data placed in the GCAT database, as well as that of their family members.

The GCAT samples and data can be used for biomedical research by universities and public institutions, as well as by for-profit and non-profit private companies. The shared use of the biological samples is one of the objectives of the GCAT project.

As for the barriers, the economic cost of generating the information is reflected in some minimum access fees, and this is not always well perceived by the scientific community.


Sharing genomic data requires clearly established access requirements and a professional data access committee. The quality and scope of the data offered can determine the success or failure of providing access to research data.

  • Clearly define the data sharing protocol.
  • Include a data management professional in the Data Access Committee.
  • Use standardized Informed Consent protocols with categorized ontologies for access conditions.
  • Ensure the availability of nearby, trusted cloud infrastructures for data sharing.
  • For publicly funded research projects it should be mandatory that part of the budget is duly devoted to data management activities, ensuring that data becomes available through open public infrastructures beyond the project life cycle.
  • Do not delay the Data Access Committee's answers to applications.
  • Do not leave data maintenance to somebody unfamiliar with the project that created the data.
  • Avoid access fees if possible, or minimize them for research uses of particular interest.
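The recommendation above on standardized consent with categorized access conditions can be illustrated with a small sketch. It assumes that each dataset carries machine-readable conditions which a request is checked against. The category names, the `ConsentConditions` and `RequestedUse` types, and the `is_permitted` function are all hypothetical, and they do not reproduce the GCAT consent form or any particular ontology.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ConsentConditions:
    """Hypothetical machine-readable access conditions attached to a dataset."""
    allowed_purposes: FrozenSet[str]  # categorized research purposes
    commercial_use_allowed: bool      # may for-profit applicants use the data?
    ethics_approval_required: bool    # must the applicant hold ethical approval?


@dataclass(frozen=True)
class RequestedUse:
    """Hypothetical categorized description of what an applicant wants to do."""
    purpose: str
    commercial: bool
    has_ethics_approval: bool


def is_permitted(conditions: ConsentConditions, use: RequestedUse) -> bool:
    """Check a requested use against the categorized consent conditions."""
    if use.purpose not in conditions.allowed_purposes:
        return False
    if use.commercial and not conditions.commercial_use_allowed:
        return False
    if conditions.ethics_approval_required and not use.has_ethics_approval:
        return False
    return True


# Illustrative values only: biomedical research, open to private companies,
# with ethical approval required.
example_conditions = ConsentConditions(
    allowed_purposes=frozenset({"biomedical_research"}),
    commercial_use_allowed=True,
    ethics_approval_required=True,
)
```

With conditions expressed as categories rather than free text, a committee member who is not a consent specialist can see at a glance which requests fall clearly inside or outside the consent, and reserve detailed review for borderline cases.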