Study Opens Door to Cancer Research in the ‘Dark Matter’ of the Genome

Key Takeaways:

  • Mutations in non-protein-coding regions of the genome, its so-called “dark matter,” have been difficult to link to cancer.
  • A new study provides a blueprint for future research into the role of mutations across the genome and their possible link to cancer.

When a dam collapses because of a design flaw, tracing the error to the original blueprint is a relatively straightforward matter. When the breach happens for another reason — shoddy workmanship, insufficient oversight during construction, a failure to follow design specifications, or any of a range of other possibilities — the investigation can be much more challenging.

Something of the same problem confronts scientists exploring how mutations in vast stretches of the genome contribute to cancer. Researchers have identified hundreds of mutations that drive or sustain cancer growth. Virtually all of these occur in regions of the genome that hold the code for cell proteins. Collectively, these regions account for only about 2% of the entire genome. Mutations in the other 98%, which don’t result in the production of abnormal proteins, have been far more difficult to tie to cancer.

In a recent study in the journal Science, Dana-Farber researchers offer a new technique for making those connections. Using this approach — a three-part method of assessing whether mutations anywhere in the genome are likely involved in cancer — they discovered distinct patterns of mutations in 19 major types of cancer.

The paper provides a blueprint for future research into the role of mutations in cancer tissue, whether in protein-coding stretches of DNA or in the much vaster but less-understood non-coding regions, sometimes called the dark matter of the genome. Once researchers have a handle on how dark matter mutations affect cancer growth and spread, they’ll be in a position to develop therapies to counter them.

A matter of interpretation

“It was initially thought that the statistical techniques used to identify cancer-related proteins in protein-coding sections could be extended to non-coding regions as well,” says the study’s first author, Felix Dietlein, MD, PhD, who conducted the research at Dana-Farber and is now a Harvard Medical School Assistant Professor in the Computational Health Informatics Program at Boston Children’s Hospital. “Over the past few years, however, it has become clear that we need a completely new way of interpreting data about mutations in non-coding areas.”


The discrepancy relates to the very different roles of coding and non-coding portions of the genome. In coding areas, also known as exons, the letters of the DNA code dictate which amino acids the cell will assemble into proteins. Mutations or other errors in the genetic code can result in clunky, misshapen proteins that may contribute to cancer. Non-coding regions, by contrast, are a more diverse group: some raise or lower the activity of specific genes; others, apparently, are idle, having no impact on cell life.

Researchers use statistical techniques to determine whether mutations in coding areas are cancer “drivers,” which help promote cancer, or “passengers,” which are irrelevant to the cancer process. “The algorithms used to determine whether coding region mutations are drivers or passengers may not apply to all portions of the non-coding genome,” Dietlein states.

Mutations that matter

A 2020 study known as the Pan-Cancer Analysis of Whole Genomes (PCAWG) made a start in elucidating the role of non-coding mutations. The study, organized by the International Cancer Genome Consortium, scanned the entire genomes of cancer cells from more than 2,600 patients, exploring the nature and consequences of abnormalities in both coding and non-coding regions.

“PCAWG laid the groundwork for much of the way we now think about mutations across the genome,” Dietlein observes, “but we still saw a need to develop additional techniques for studying these in a systematic manner.”

The Science study sets forth three principles to guide such research.

  • Because mutations appear in a variety of patterns in non-coding regions, researchers may need several computational tools to probe them. (The technique used by Dietlein and his colleagues in the current study is a composite of three such tools.) In coding areas, where genetic information has just one function — to be a template for proteins — a single algorithm may suffice.
  • In contrast to the traditional “gene-centric” approach to research — which studies mutations solely in the context of the genes in which they arise — scientists need to think more broadly. Instead of viewing mutations primarily through the lens of whether they are cancer drivers or passengers, scientists should consider other mechanisms by which they might contribute to cancer.
  • Scientists should be open to the possibility that mutations may contribute to cancer in ways that have yet to be identified. “A misspelling in the genetic code that results in an abnormal protein is not the only important category of mutational event,” Dietlein says. “Because of the heterogeneous nature of non-coding regions, there are probably different processes by which mutations there can lead to cancer.”

Mutations by the millions

For the Science study, Dietlein and his colleagues used their composite method to search for mutations throughout the genomes of cancer cells from 3,949 patients with 19 types of cancer. They detected 61.2 million somatic — non-inherited — mutations.

The mutations appeared in regions across the genome, including:

  • Protein-coding regions, including many mutations already well established as cancer drivers;
  • Regulatory regions, which control gene expression. Many of these involved cancer-relevant genes such as BCL6, FGFR2, RAD51B, and XBP1, suggesting they may turn out to be driver mutations;
  • Tissue-specific genes, which are active only in certain types of tissue, such as stomach or liver tissue;
  • Sectors of the genome whose function or purpose is unknown.

The findings offer an important reference point for future efforts to identify cancer-related genes, says the study’s senior author, Eliezer Van Allen, MD, Harvard Medical School Associate Professor and Division Chief of Population Sciences at Dana-Farber.

“This type of work is key to unlocking the potential of understanding how events in non-coding regions play a role in the development of cancer,” he says. “It also underscores how much remains to be learned about the relationship between genetic abnormalities and the biological processes involved in this disease.”