UK Biobank enables medical research worldwide through vast database powered by AWS

Digital Leaders
Jun 6, 2024

Naomi Allen and Sir Rory Collins, Chief Scientist and Leader, UK Biobank

UK Biobank is the world’s most comprehensive source of health data used for research. It houses a vast, continuously growing dataset of biological, health, and lifestyle information. Between 2006 and 2010, UK Biobank recruited 500,000 UK citizens aged 40 to 69 to supply biological samples (blood, urine, and saliva) and to provide information about their lifestyle on an ongoing basis. Participants also consented to linkage to their health-related records.

Today, UK Biobank has about 10,000 variables per volunteer, from simple lifestyle information to physical measures, electronic health records (EHRs), genetic sequencing, biomarker data, and full-body scan images. But to reach its goal of a safe and accessible database, UK Biobank had to work out how best to accommodate all of the data in a way that met researchers’ needs. With the inclusion of whole-genome sequencing data for all 500,000 participants, the sheer size of the dataset (currently around 30 petabytes) meant that researchers needed a way to analyze the data where it is stored, instead of downloading huge volumes of it.
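To give a sense of why downloading is impractical at this scale, here is a rough back-of-the-envelope sketch. It assumes decimal petabytes and a sustained, uninterrupted link speed; the 1 Gbps figure is purely illustrative and does not describe any researcher’s actual connection.

```python
def download_days(size_petabytes: float, link_gbps: float) -> float:
    """Days needed to transfer a dataset at a sustained link speed.

    Assumes decimal units (1 PB = 1e15 bytes, 1 Gbps = 1e9 bits/s)
    and ignores protocol overhead, retries, and storage costs.
    """
    size_bits = size_petabytes * 1e15 * 8      # petabytes -> bits
    seconds = size_bits / (link_gbps * 1e9)    # bits / (bits per second)
    return seconds / 86_400                    # seconds -> days


# Illustrative assumption: a sustained 1 Gbps connection.
days = download_days(30, 1.0)
print(f"{days:.0f} days (~{days / 365:.1f} years)")  # 2778 days (~7.6 years)
```

Even at ten times that speed the transfer would take most of a year, which is why bringing the analysis to the data is the more workable design.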

UK Biobank needed a purpose-built data platform with compute and data-storage capabilities that provided analysis tools in a centralized environment and the flexibility to manage increasing quantities of data, allowing researchers to work on the dataset with ease. This led to the establishment and launch in 2021 of the secure, cloud-based UK Biobank Research Analysis Platform (RAP), which is hosted on Amazon Web Services (AWS) in the Europe (London) Region and enabled by DNAnexus. This post highlights UK Biobank’s journey to becoming a globally accessible dataset for health researchers.

Health data for the public interest

The altruism of research participants is at the heart of UK Biobank’s existence. The dataset’s founders and core funders honor contributors’ generosity by making the data available to researchers worldwide, thereby maximizing its value as an enabler of new drug discovery, diagnostics, and treatments.

All the data is de-identified and available to approved researchers for health-related research that is in the public interest. Since the database opened in 2012, more than 30,000 researchers from 90 countries have registered to use UK Biobank. So far, there have been more than 10,000 scientific publications based on researchers’ discoveries using UK Biobank data.

These include discoveries about cancers, heart disease, chronic kidney disease, stroke, type 2 diabetes, and Alzheimer’s disease. For example, a PhD student in Boston, Massachusetts, used UK Biobank’s genotyping data (around 800,000 markers across the genome) to establish the value of polygenic risk scores (a measure of a person’s disease risk due to their genes). This kind of analysis could support earlier and more targeted interventions for heart disease or aggressive forms of cancer, for instance.
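For readers unfamiliar with polygenic risk scores, the sketch below shows the basic idea in its simplest common form: a weighted sum of how many copies of each risk allele a person carries. This is a generic illustration, not the method used in the study mentioned above; the function name, the three markers, and their weights are all made up for the example.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts across genetic markers.

    dosages: copies of the risk allele per marker (0, 1, or 2).
    weights: per-marker effect sizes, e.g. estimated in a prior
             association study (hypothetical values below).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same markers")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three markers (real scores use far more).
weights = [0.10, -0.05, 0.20]
# One synthetic person's risk-allele counts at those markers.
person = [2, 1, 0]

score = polygenic_risk_score(person, weights)
print(round(score, 2))
```

In practice a score like this is only meaningful relative to the distribution of scores in a reference population, which is where a large cohort becomes valuable.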

New findings continue to come thick and fast: there were more than 3,000 published reports in 2023. Each enhancement to the data adds to its potential for other scientists.

Continuous, collaborative innovation

UK Biobank is exploring the possibility of adopting new technologies, such as generative artificial intelligence (AI), to make its database even more accessible and digestible to researchers. Initially, generative AI algorithms may simplify and accelerate interrogating the database, for instance, through direct questions such as “How many people in UK Biobank have had a heart attack under the age of 65?” This may progress to predictive analysis, for example, “Given the cholesterol level of men over the age of 65 with obesity, what will their projected cholesterol level be in five years?”
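As a rough illustration of what the first kind of question involves, a natural-language query such as the heart attack example above would ultimately need to be translated into a filter-and-count over participant records. The sketch below uses a tiny synthetic cohort; the `Participant` class and its field names are hypothetical and do not reflect the actual UK Biobank RAP data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    participant_id: int
    # Age at first recorded heart attack; None if there is no record.
    age_at_first_heart_attack: Optional[int]


def count_heart_attack_before(participants, age_limit):
    """Count participants whose first heart attack occurred before age_limit."""
    return sum(
        1
        for p in participants
        if p.age_at_first_heart_attack is not None
        and p.age_at_first_heart_attack < age_limit
    )


# Synthetic records, for illustration only.
cohort = [
    Participant(1, 58),
    Participant(2, None),
    Participant(3, 71),
    Participant(4, 64),
]

print(count_heart_attack_before(cohort, 65))  # 2
```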

UK Biobank hopes to see the development of complementary biobanks in more countries, as these are essential for capturing detail about disease progression in diverse demographics and environments. The dataset’s leadership team continues to advise scientists on how to set up similar studies and looks forward to the continued, transformative impact of the UK Biobank RAP on diagnoses, treatments, and cures around the world.

Originally published at https://digileaders.com on June 6, 2024.
