Cracking the Code: DNA Sequencing to Improve Human Health

Genetics and Human Health | Emily Mackie

Inside almost all our cells, wrapped up inside the nucleus, is a copy of the entire length of our DNA, called the genome. If you’re a human reading this, your genome will be made up of around 3.2 billion bases (adenine, guanine, thymine and cytosine), all ordered in a way which is completely unique to you. Proteins inside your cells, called polymerases, can then read these bases like a set of instructions and use them to create all your individual traits. Ultimately, this means that an organism’s genome is the key to its life. Until recently, being able to read an organism’s genome was a privilege reserved only for those DNA-reading proteins; however, with the invention of DNA sequencing, we can now look inside cells and understand their genetic instructions ourselves. This allows us to link these instructions to the final biological product, revolutionising our understanding of how and why living organisms survive, die, and evolve. In the medical and health science fields, DNA sequencing is a fundamental tool in the quest to understand and tackle significant challenges to human health, such as cancer and infectious disease.

The first DNA sequencing technologies were developed in the decades following the discovery of the structure of DNA [1]. The first breakthrough in DNA sequencing was achieved by Fred Sanger and his colleagues in 1977, with the technique now known as ‘Sanger sequencing’. This process uses labelled dideoxynucleotides (ddNTPs), which mimic normal DNA bases but lack a 3’ hydroxyl group, so the elongation of a growing strand is cut short wherever one is incorporated. During Sanger sequencing, the DNA is copied many times so that there are a massive number of copies of the template in the reaction mixture. These copies are then pulled apart into single strands and copied again by adding complementary nucleotides. At random points, a complementary ddNTP will be added instead of a regular base as the strand is elongated, which cuts the copying of the template short. Once this copying process is complete, the products are run through a thin tube (a capillary), which sorts the DNA fragments by size. These are then read in order by a computer, which uses a laser to produce a colour signal from the fluorescently labelled ddNTP at the end of each fragment. Each colour corresponds to a different kind of base, which allows the computer to put the signals from all the fragments together to form a complete sequence [2]. Sanger, or ‘first generation’, sequencing is by no means obsolete and is still commonly used in research laboratories. However, while effective, it is limited to producing short lengths of sequence, around 1000 bases (1 kilobase) at a time. This means that projects requiring large amounts of DNA sequencing can’t be completed efficiently with this technique, which paved the way for newer, higher-throughput technologies.
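To make the chain-termination idea concrete, here is a minimal Python sketch of it. It is a toy simulation, not laboratory software: the function name, the number of copies and the termination probability are all assumptions chosen for illustration, and the template is simply read left to right without modelling strand direction.

```python
import random

# Base-pairing rules used to build the complementary strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_read(template, copies=2000, stop_prob=0.1, seed=0):
    """Toy simulation of chain-termination sequencing (illustrative only).

    Returns the complementary strand that is read out, with "N" at any
    position for which no terminated fragment of that length was produced.
    """
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(copies):
        for position, base in enumerate(template):
            # Assumption: at each step a labelled ddNTP is added instead of
            # a normal base with probability stop_prob, ending this copy.
            if rng.random() < stop_prob:
                terminal_base_by_length[position + 1] = COMPLEMENT[base]
                break
    # Walking through fragment lengths in order mimics sorting by size;
    # the label on the last base of each fragment spells out the sequence.
    return "".join(
        terminal_base_by_length.get(length, "N")
        for length in range(1, len(template) + 1)
    )


print(sanger_read("ATGCCGTA"))  # TACGGCAT
```

In this sketch, long fragments become rarer with every extra base, because a copy must avoid termination at every earlier position. That gives a rough intuition for why the read length of this approach is limited.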

More recent developments in sequencing mean that much larger quantities of DNA can be sequenced for a lower cost. The Human Genome Project used Sanger sequencing to assemble a complete human genome, and took around 13 years and roughly $3 billion to complete [3]. Comparatively, newer sequencing technologies mean that a whole genome can now be sequenced in a fraction of that time, for no more than a few thousand dollars. ‘Next generation sequencing’ (NGS) allows for higher throughput and, unlike Sanger sequencing, can sequence up to millions of DNA fragments in parallel [4]. A commonly used form of NGS is Illumina sequencing, which involves running fragmented DNA samples through channels containing short DNA molecules which can ‘capture’ these fragments, allowing them to be copied repeatedly on the surface of the channel to produce clusters of thousands of identical copies. These copies are then sequenced in a similar way to Sanger sequencing, by attaching complementary nucleotides with a fluorescent tag that emits a signal corresponding to a specific base. All the clusters are sequenced in parallel, and the resulting short sequences can then be overlapped at regions of similarity to construct a complete sequence. Even more advanced is third-generation sequencing technology, such as nanopore sequencing, which allows for sequencing of much longer molecules of DNA than first or second-generation technologies do [4]. This involves threading a length of DNA through a tiny nanopore, and as each type of base passes through, a different electrical current will be produced. These changes are detected and put together to build a sequence. These techniques allow us to gain insight into an organism’s genetic makeup, whether that be just a particular gene or its entire genome. This has a vast array of useful applications across many areas, including tackling formidable threats to human health, such as genetic disease, cancer or viral outbreaks.
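The idea of stitching short sequences together at regions of similarity can be sketched in a few lines of Python. This is a simplified greedy approach under stated assumptions: the reads are error-free, all come from the same strand, and overlap by at least `min_len` bases. The function names and the three example reads are hypothetical, and real assembly software is far more sophisticated.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps; return separate pieces
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))  # ['ATGCGTTACGGA']
```

Here the first two reads share "CGT" and the last two share "TAC", so the three fragments collapse into a single longer sequence.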

By sequencing an individual’s DNA, mutations can be identified which might provide an explanation for an uncharacterised pathology, or could even help to predict and prevent severe disease before its onset. As an example, diseases resulting from a mutation at a single point in the DNA are responsible for a large number of infant deaths [5]; however, these conditions often go undiagnosed. By sequencing the genomes of these patients, the disease can be diagnosed, and potentially life-saving treatments can be implemented before it is too late. Similarly, genome-wide association studies (GWAS), which compare the genomes of many individuals with and without a certain condition or disease, can identify genetic variants which elevate an individual’s risk of developing a particular disease later in life [6]. For example, mutations in the BRCA1/2 genes are associated with an increased risk of breast cancer, and an expanded CAG trinucleotide repeat in the HTT gene can indicate the future onset of Huntington’s Disease [7]. By identifying these genetic aberrations early, it may be possible to manage the disease more effectively when it manifests, or even to prevent it altogether.
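As a small illustration of how a repeat such as the CAG expansion might be spotted once a sequence is in hand, the Python sketch below counts the longest uninterrupted run of CAG units in a string of bases. The function name and the example sequence are made up for illustration, and this is not how repeat lengths are measured clinically.

```python
import re


def longest_cag_run(sequence):
    """Number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical stretch of sequence containing two separate CAG runs.
print(longest_cag_run("TTCAGCAGCAGAACAGCAGTT"))  # 3
```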

Furthermore, once the disease has begun, DNA sequencing can help to identify its patient-specific characteristics, and these features can be used to personalise treatment to target the disease most effectively. For example, an up-and-coming prospect for the treatment of metastatic melanoma is the development of personalised neoantigen vaccines [8]. These are developed by extracting a sample of a patient’s melanoma tumour and then sequencing it to identify mutations which cause neoantigens, or cancer-modified proteins, to be produced. These neoantigens can be recognised as new and potentially dangerous by the immune system; however, the immune response isn’t always as strong as it needs to be to enact sustained control of the cancer. Neoantigen vaccines stimulate immune cells around the body to launch a response against these neoantigens, which ultimately allows for a more robust attack on cancer cells.

Many people who aren’t well-versed in the field of genetics may have heard of the term ‘genome sequencing’ from tuning into the daily COVID-19 lockdown updates. In the context of an outbreak of infectious disease, genome sequencing becomes not just a way to understand the genetics of a virus or bacterium, but also a key method of understanding what the pathogen is and how it is spreading through the population. At the onset of the COVID-19 pandemic, the genetic makeup of the novel coronavirus was quickly identified through whole-genome sequencing [9]. This allowed researchers to compare the genome sequence of this virus to those of viruses with similar clinical manifestations, and by assessing how similar their genomes were, they established that the new virus was a coronavirus, and identified molecular features which might help it to transmit and cause disease [10]. Furthermore, genome sequencing played an important role in the ongoing management and mitigation of the spread of the virus [11]. At the height of the pandemic, new cases of SARS-CoV-2 were identified from nasal swab samples in laboratories around the country. If the virus was detected in a sample, these laboratories would extract and sequence its genome and then compare the genome to other sequences in a database of SARS-CoV-2 genomes. Again, by analysing how similar the genome of an isolated virus from a new case was to that of previously isolated viruses, researchers could determine how closely related the two viral isolates were, and therefore whether their transmission was likely to be linked. By layering genomic data with spatial and temporal epidemiological information, they could understand how that individual caught the virus and help to inform how further spread could be attenuated.
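The comparison step can be pictured with a short Python sketch that scores how similar a new sequence is to each entry in a small database and reports the closest one. Everything here is an assumption for illustration: the isolate names and sequences are invented, the sequences are taken to be already aligned to the same length, and real genomic surveillance relies on dedicated alignment and phylogenetic tools.

```python
def identity(a, b):
    """Fraction of positions at which two aligned sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def closest_match(query, database):
    """Name of the database sequence most similar to `query`."""
    return max(database, key=lambda name: identity(query, database[name]))


# Hypothetical, very short 'genomes' from previously sequenced cases.
database = {
    "isolate_A": "ATGCATGCAT",
    "isolate_B": "ATGCATGGTT",
}
new_case = "ATGCATGCAA"

print(closest_match(new_case, database))            # isolate_A
print(identity(new_case, database["isolate_A"]))    # 0.9
```

A new case whose genome differs from an earlier isolate at only a handful of positions is more likely to sit close to it in a chain of transmission than one that differs at many.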

These are just a few examples of how DNA sequencing impacts the health sphere; however, its range of applications reaches much further than this. DNA sequencing, whether it be of a single gene or an entire genome, allows us to read the very biological instructions with which living organisms operate, and as such, provides us with invaluable insight into how genetics influences the form of living organisms across time. This understanding has already given us the necessary tools to understand and treat significant threats to human health. As we look to the future, and as DNA sequencing technologies continue to evolve and become more efficient, these will be key players in elucidating answers to the big health-related questions that we haven’t yet been able to answer. Is there a cure for cancer? Can we predict and prevent genetic conditions? These questions remain unanswered, their answers encoded in a string of As, Cs, Ts, and Gs. However, DNA sequencing is our code-cracker, and by further deciphering our genes, we may one day find answers.

[1] J. M. Heather and B. Chain, “The sequence of sequencers: The history of sequencing DNA,” Genomics, vol. 107, no. 1, pp. 1-8, 2016, doi: 10.1016/j.ygeno.2015.11.003.

[2] J. Shendure and H. Ji, “Next-generation DNA sequencing,” Nature Biotechnology, vol. 26, no. 10, pp. 1135-1145, 2008, doi: 10.1038/nbt1486.

[3] “The Cost of Sequencing a Human Genome.” National Human Genome Research Institute. https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost (retrieved Jul. 3, 2025).

[4] H. Satam et al., “Next-Generation Sequencing Technology: Current Trends and Advancements,” Biology, vol. 12, no. 7, p. 997, 2023, doi: 10.3390/biology12070997.

[5] S. F. Kingsmore, R. Nofsinger, and K. Ellsworth, “Rapid genomic sequencing for genetic disease diagnosis and therapy in intensive care units: a review,” npj Genomic Medicine, vol. 9, no. 1, 2024, doi: 10.1038/s41525-024-00404-0.

[6] J. W. Prokop et al., “Genome sequencing in the clinic: the past, present, and future of genomic medicine,” Physiological Genomics, vol. 50, no. 8, pp. 563-579, 2018, doi: 10.1152/physiolgenomics.00046.2018.

[7] M. Macdonald, “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971-983, 1993, doi: 10.1016/0092-8674(93)90585-e.

[8] A. Mehta, M. Motavaf, I. Nebo, S. Luyten, K. D. Osei-Opare, and A. A. Gru, “Advancements in Melanoma Treatment: A Review of PD-1 Inhibitors, T-VEC, mRNA Vaccines, and Tumor-Infiltrating Lymphocyte Therapy in an Evolving Landscape of Immunotherapy,” Journal of Clinical Medicine, vol. 14, no. 4, p. 1200, 2025, doi: 10.3390/jcm14041200.

[9] R. Harfoot et al., “Characterization of the First SARS-CoV-2 Isolates from Aotearoa New Zealand as Part of a Rapid Response to the COVID-19 Pandemic,” Viruses, vol. 14, no. 2, p. 366, 2022, doi: 10.3390/v14020366.

[10] W. Tan et al., “A Novel Coronavirus Genome Identified in a Cluster of Pneumonia Cases — Wuhan, China 2019−2020,” China CDC Weekly, vol. 2, no. 4, pp. 61-62, 2020, doi: 10.46234/ccdcw2020.017.

Emily is completing her Honours degree in Biomedical Science, where she uses CRISPR-Cas9 gene editing technology to research the molecular basis of fragile skin conditions. Emily is fascinated by the intersection of science with society, and outside of the lab, is a keen writer, reader, and artist.

Emily Mackie - BBiomedSc (Hons), Molecular Biology and Genetics