When I’m asked my opinion about tests like Galleri from the company Grail (a spin-off from Illumina), I look at my watch to determine if there’s enough time left in the day. These tests are generically referred to as MCED (multi-cancer early detection) tests, or “liquid biopsy,” or ctDNA (circulating tumor DNA) tests. The science is sound. Understanding the test is another thing entirely.
On the surface, it looks great. A single blood sample can detect more than 50 types of cancer, hopefully at an early stage. Already, we’re in trouble. There’s a huge amount of data for each of the 50+ cancers. We are accustomed to using at least 7 “performance characteristics” as we discuss screening tests one cancer type at a time. But blood test performance is different for each type of cancer. Multiply 7 performance characteristics by 50 cancer types, and we have about 350 data points to start the review if we want to know how good the test is for each type of cancer. Calling it a “major challenge” doesn’t do it justice.
What do we know so far? At this point, a “large” prospective study, called PATHFINDER, enrolled a little over 6,500 patients, not nearly enough to get a handle on this new approach to screening. The REACH trial is underway, aiming for 50,000 patients, and a UK trial is aiming to enroll 100,000 patients. This should provide some clarity, but numerous controversies will be generated. Consider the patient with a cancer signal that indicates a positive test, but no cancer can be found after multiple radiologic procedures. That patient will be called a “false-positive.” But what if, 12 months later, that patient develops a cancer that fits the original blood test results? In reality, that patient is a “true positive”; we just didn’t have the means to confirm the diagnosis with the first blood sample. But where do you place that patient in your statistical analysis? He or she ends up registered as a “false-negative,” since the final disposition after the negative work-up was “no further action needed.” (By the way, this scenario is already occurring.)
Consider the fact that mammographic screening is a single test, and its “performance characteristics” are still hotly debated. Now, we shoot for 50 cancer types, and the troubles have already come in the form of class action suits stemming from, in brief summary, over-hype followed by a false-positive or false-negative.
The PATHFINDER STUDY was conducted at 7 major health care facilities using one of the MCED tests (Galleri) to screen prospectively for 50+ cancer types. The results? It would take a book filled with data to explain everything. In fact, the authors gave us Supplementary Data online (15 Tables and 5 Figures) because there wasn’t enough room in the main body of the article.
One thing we already know from earlier articles on the development of Galleri (and other similar tests) — it DOES NOT WORK WELL FOR BREAST CANCER. For some reason, breast cancer cells don’t shed detectable ctDNA as reliably as other types of cancer (more on this later). And, for full disclosure, I’ve been in the breast cancer blood test business for 30 years, so I’m looking at the ctDNA approach with a bias toward sticking with one type of cancer unless future developments indicate otherwise.
The test’s most promising role is for those cancer types where screening is not routinely performed or recommended. Take pancreatic cancer, for instance. Finding pancreatic cancer earlier does not automatically save a life. The biology is aggressive and might not lend itself to early detection. Then again, the test might prove to be the answer for pancreatic cancer, as well as for other cancer types not being screened. However, the data is being presented to clinicians in a confusing way that spins the test as more promising than reality supports. The marketing department is not fabricating the facts; rather, it is simply molding the perspective (see the Addendum at the END on “performance characteristics”).
For instance, we are told that Specificity (one of the “performance characteristics”) is 99% for multiple types of cancer. That’s great, but what does that mean? That the test will find 99% of cancers? No, not by a long shot. Specificity asks, “Given the patients in a population who do not have cancer, what percentage will the test correctly call negative?” The small remainder are the false-positives. Because only a small percentage of patients are carrying occult cancer, the large number of “negatives” in a study dilutes the formula and makes Specificity look really, really good at 99% (only a 1% chance of a false-positive if you undergo the test). In the PATHFINDER study, there were more than 6,500 patients, and that large number of true negatives sits in both the numerator and denominator of the Specificity formula, generating the tiny 1% false-positive rate.
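To make the dilution concrete, here is a toy calculation in Python. The counts are invented for illustration (they are not PATHFINDER’s actual tallies), chosen only so the arithmetic lands near 99%:

```python
# Toy counts, invented for illustration (not PATHFINDER's actual tallies).
# Screen ~6,500 people; almost all of them do not have cancer.
true_negatives = 6370   # no cancer, test correctly negative
false_positives = 64    # no cancer, test incorrectly positive

specificity = true_negatives / (true_negatives + false_positives)
print(f"Specificity: {specificity:.1%}")  # ~99.0%

# The mountain of true negatives sits in both the numerator and the
# denominator, so Specificity looks superb before we ever ask whether
# the test finds any cancers at all.
```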
But ask the question differently, and in a more clinically relevant fashion — “If I get a POSITIVE result on the Galleri test, what are the odds that I’ll actually have one of the 50 types of cancer?” This is a different performance characteristic called Positive Predictive Value (PPV), and it is this number that is critical for good, informed consent when discussing the test with laymen and health care providers. Here, the large number of 6,500 patients is not part of the PPV formula, so one might get a PPV of 43%, leaving a 57% chance of the test being a false-positive. The false-positive burden with Galleri is reasonable, but the company’s focus is on the 99% Specificity. Potential users must understand that the 99% means that, prior to testing, there is only a 1% chance of a false-positive. Yet, given a positive test, there is only a 43% chance that some type of cancer will be confirmed on further investigation (usually radiologic studies), leaving a 57% false-positive rate among positive tests. At this level of performance, false-positives outnumber true positives.
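Again in Python, with invented counts chosen only to reproduce the ~43% PPV quoted above:

```python
# Invented counts chosen only to reproduce the ~43% PPV quoted above.
true_positives = 43    # positive signal, cancer confirmed on work-up
false_positives = 57   # positive signal, no cancer found

ppv = true_positives / (true_positives + false_positives)
print(f"PPV: {ppv:.0%}")                                   # 43%
print(f"False-positives among positives: {1 - ppv:.0%}")   # 57%

# Note that the ~6,500 screen-negative patients appear nowhere in this
# formula; PPV is conditioned only on the tests that came back positive.
```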
Now, each cancer type has its own Specificity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV). (The NPV is hugely diluted by the 6,500 negatives, so it’s not very helpful here.) Additional performance characteristics are mentioned below. Then it gets more complicated: depending on the cancer type, some detected cancers are not early at all. So, when the results are divided into stages, one can get frustrated when told “we find 60% of a certain type of cancer,” only to learn that half of them are advanced stage, where screening doesn’t help. Leaving the larger numbers for advanced stage in place, however, can lead to easier acceptance of the test.
Even when a test is finding early (smaller) cancers reliably, eventually it will be necessary to document a mortality reduction or one of the surrogates for mortality reduction (fewer interval cancers, stage shift, fewer advanced cancers). We might think we’ve solved the problem of aggressive cancers by finding them smaller, but they could prove to be just as deadly. That’s why a mortality reduction, or surrogate, must eventually be confirmed.
Hang on…here’s the punchline. I’ve not mentioned the most important performance characteristic of all — SENSITIVITY. This is the percentage of cancers that will be detected by a screening test. Stated alternatively, given a certain number of cancers in a cohort, it’s the percentage that will be found by the screening test. Sensitivity is how we generate our mortality reductions. This is where lives are saved, finding more cancers earlier. It is the first characteristic on everyone’s mind when a new screening test is introduced. Specificity is very important when it comes to the feasibility factors, but it doesn’t save lives. To save lives, one has to detect as many cancers as possible, as small as possible, and this is entirely through Sensitivity. The other way to state Sensitivity is through a “false-negative rate.” If mammography is detecting 70% of breast cancers, then its Sensitivity is 70% and its false-negative rate is 30% (false-negative = mammo negative, but cancer present).
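Using the mammography example just given, the arithmetic is simple (a minimal sketch; the counts are illustrative):

```python
# The mammography example above: 100 cancers in a cohort, 70 flagged.
true_positives = 70    # cancer present, screen positive
false_negatives = 30   # cancer present, screen negative (missed)

sensitivity = true_positives / (true_positives + false_negatives)
print(f"Sensitivity: {sensitivity:.0%}")              # 70%
print(f"False-negative rate: {1 - sensitivity:.0%}")  # 30%
```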
For those of us working on a blood test that “specializes” in one cancer type — breast cancer — Sensitivity vastly exceeds what is being seen in the MCED tests. Scientists are approaching 80% to 90% Sensitivity with breast-only blood testing using different methodologies. And this is where it turns bizarre when it comes to breast cancer, which is only one of the 50+ cancer types covered by the Galleri test. In nearly all the MCED studies, the Sensitivity for breast cancer early detection is low. Instead of acknowledging an issue here, amazingly, SENSITIVITY is barely mentioned. I can calculate overall sensitivity for all cancer types from an algorithm in Figure S3 — 21%. FYI…21% is terrible (only finding 21% of cancers), and that’s across all types of cancer and all stages. Limited to smaller Stage I & II disease, the number would be even lower. This explains why the company focus is on cancer types for which there is no routine screening.
When it comes to breast cancer and MCED testing, the shocking point is not simply that Sensitivity is left out of the Discussion, but that the breast cancer data is published only deep in the Supplementary materials. There, one is surprised to find 14 breast cancers identified through routine screening (exam and mammography) and ZERO identified with the MCED blood test. Repeat: 14 breast cancers were identified per the usual screening methodology, and not a single one of the 14 had a positive Galleri blood test. There were 5 breast cancer recurrences detected, but that’s not what a screening blood test is for. The medical oncologists might be able to use the information gleaned from recurrences, but I’m totally focused on asymptomatic screening. We’ve lost our way when marketing departments tell us that MCED testing has 99% Specificity (technically true), yet the practical ability of the test to find early, occult breast cancer was ZERO in the PATHFINDER STUDY.
I’ve barely scratched the surface here, but try to imagine explaining all of the above to a patient or clinician. I didn’t even get into the other performance characteristics (added as an Addendum) — 1) Accuracy (a combination of Specificity and Sensitivity), 2) the Cancer Detection Rate (CDR), and 3) the Number Needed to Screen to save one life (NNS). Multiply the 7 items by 50 to get 350 data points from which to draw an informed consent. If you’re trying to tackle undiagnosed cancer, the Galleri test (and the other MCEDs) is going to help most with those cancers that are not routinely screened. As for breast cancer, it’s not going to help at all unless major improvements are made.
“One Ring to Rule Them All” is a famous phrase from J.R.R. Tolkien and his fantasy world. But when it comes to breast cancer, that’s what we need — a single test that generates Sensitivity of 80-90%, with Specificity of 90%, and good results in the remaining performance characteristics.
So, here’s a breast cancer screening approach for the future:
3D mammography, and if 3D is negative, proceed to a blood test designed for breast cancer. If the blood test is positive, then proceed to contrast-enhanced mammography, breast MRI, or molecular imaging. A positive blood test turns screening into a diagnostic work-up. The most logical place to begin blood test screening is with women who have dense breast tissue, since the miss rate is much higher for them. And, so far, the blood tests being developed are not affected by density levels.
When used primarily for women with dense tissue and a negative mammogram, we are talking about a cohort in which mammography has, by definition, already demonstrated ZERO Sensitivity (mammos negative), yet we know that if we proceed with MRI, we will find 10 to 20 occult cancers per 1,000 women screened. With this knowledge, one could move directly to MRI or the other contrast-enhanced technologies. But this is where the blood test comes in — sparing many women from having an MRI (et al.) while still finding most of the hidden cancers.
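For clarity, here is a minimal sketch of that triage flow in Python, gated on dense tissue as described above. The function name and its inputs are my own hypothetical shorthand, not a clinical protocol or anyone’s published algorithm:

```python
# Hypothetical sketch of the proposed screening flow; the function name
# and inputs are my own shorthand, not a clinical protocol.

def next_step(mammo_3d_positive: bool,
              dense_tissue: bool,
              blood_test_positive: bool) -> str:
    if mammo_3d_positive:
        return "diagnostic work-up"           # imaging found something
    if not dense_tissue:
        return "routine interval screening"   # 3D mammogram alone suffices
    # Dense tissue + negative mammogram: the cohort where mammographic
    # sensitivity is lowest, so add the breast-specific blood test.
    if blood_test_positive:
        # A positive blood test converts screening into diagnosis.
        return "contrast-enhanced mammography, breast MRI, or molecular imaging"
    return "routine interval screening"

print(next_step(mammo_3d_positive=False, dense_tissue=True,
                blood_test_positive=True))
```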
You can see now why I look at my watch whenever I’m asked what I think about MCED blood testing. The PowerPoint version lasts an hour.
END (almost)
A streamlined view of performance characteristics, as they pertain to cancer screening:
The BUILDING BLOCKS for performance are true-positives (TP), false-positives (FP), true-negatives (TN), and false-negatives (FN). These 4 outcomes (expressed as counts) are then inserted into 4 basic formulas, each of which has a specific message for the clinician. In lay terms for blood testing (rather than confusing formulas), these 4 primary performance characteristics are:
SENSITIVITY — Given a population with cancers, sensitivity is the percentage of those cancers that will be identified by the screening test in question.
SPECIFICITY — Given a population of patients without cancer, what is the percentage that will be correctly identified as negative?
NEGATIVE PREDICTIVE VALUE (NPV) — Given a negative result on the test in question, what is the probability that cancer is absent in such a patient?
POSITIVE PREDICTIVE VALUE (PPV) — Given a positive result on the test in question, what is the probability that cancer will be identified?
Note: in the top 2 characteristics, it is the patients, some of whom harbor cancer, who make up the premise. In the bottom 2 characteristics (NPV and PPV), it is the test result that forms the premise. Without going into too much detail, there are some relationships among the Big 4. For instance, a very high Sensitivity will give a test a very good NPV, helping to rule out cancer. And high Specificity will give a test a good PPV, helping to “rule in” cancer.
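All four can be computed from a single set of counts. Here is a Python sketch with invented numbers (consistent with the toy examples earlier):

```python
# The Big 4 from one set of invented counts.
TP, FP, TN, FN = 43, 57, 6370, 30

sensitivity = TP / (TP + FN)  # of the cancers, how many were flagged?
specificity = TN / (TN + FP)  # of the cancer-free, how many called negative?
ppv = TP / (TP + FP)          # given a positive test, odds cancer is found
npv = TN / (TN + FN)          # given a negative test, odds cancer is absent

for name, value in [("Sensitivity", sensitivity),
                    ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")

# The huge pile of true negatives inflates NPV (and Specificity), which
# is exactly the dilution problem described in the main text.
```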
Ideally, we want all 4 characteristics to be good, or even excellent. That said, there is frequently a teeter-totter relationship between Sensitivity and Specificity. As one tweaks the experimental method to elevate Sensitivity, ground will be lost by worsening Specificity. Or, vice versa. So, in the article reviewed above, you can understand my concern that Sensitivity seems to be neglected except for the online Tables and Figures where one can do their own math.
We’re not done yet. There are also “DERIVATIVE performance characteristics” that are unique expressions dependent upon the Big 4 above. Far and away, the most confusing is “ACCURACY.” You’d think Accuracy meant Accuracy, but it doesn’t. This favorite number is much beloved by the marketing departments, where great weaknesses can be hidden. Accuracy, to the research clinician, combines Sensitivity and Specificity into one number. Just for kicks, here’s the basic formula: TP + TN, divided by TP + TN + FP + FN. The version that gives equal weight to Sensitivity and Specificity (sometimes called balanced accuracy) is simply their average. When the trade-off is expressed as a graph instead of a single number, the summary is the Area Under the ROC Curve, often expressed as AUC.
The MCED tests are perfect for demonstrating how bad results can be hidden. If Specificity is 99%, and Sensitivity is an unacceptable 50%, then the equal-weight Accuracy is roughly 75%. Specificity is dragged down, Sensitivity is inflated, but hey, 75% looks fairly good for detecting cancer. But this example doesn’t detect cancer at a 75% rate…it detects it at a 50% rate. Accuracy is not the same as the ability to detect cancer (Sensitivity is the key for detecting cancer). Those who read an article with Accuracy as an endpoint might fail to realize that Sensitivity could be good, bad, or ugly; you don’t know unless Sensitivity has been reported separately. While Accuracy might be helpful in comparing two or more different tests, it has huge abuse potential when it comes to the promotional literature for the MCED tests. In other words, the 99% Specificity is so powerful that it easily covers low Sensitivity levels that are quite concerning. This confusion about “performance characteristics” is one of the many reasons that class action suits have arisen describing the MCED tests as “over-hyped” and “over-promoted.”
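The equal-weight version of the arithmetic, as described above, in a few lines of Python:

```python
# Equal-weight ("balanced") accuracy, as described above.
sensitivity = 0.50   # unacceptable for a screening test
specificity = 0.99

balanced_accuracy = (sensitivity + specificity) / 2
print(f"Balanced accuracy: {balanced_accuracy:.1%}")  # 74.5%, roughly 75%

# 75% looks respectable, but the test still misses half the cancers.
# Accuracy blends two numbers; only Sensitivity is the detection rate.
```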
Another DERIVATIVE performance characteristic based entirely on Sensitivity is the CDR — the CANCER DETECTION RATE. By convention, this is expressed in terms of cancers identified per 1,000 screened patients. There are many variables here, from risk status, to age, to the screening interval, to the modality used for screening. But it is very helpful in comparing different cohorts of patients or comparing different technologies.
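The arithmetic behind a CDR is simple. A short Python example with invented but plausible numbers:

```python
# CDR is conventionally expressed per 1,000 screened patients.
cancers_detected = 5       # invented, but a plausible screening yield
patients_screened = 1000

cdr = cancers_detected / patients_screened * 1000
print(f"Cancer detection rate: {cdr:.1f} per 1,000 screened")  # 5.0
```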
Although there are many DERIVATIVE characteristics used to judge the effect of screening, the last one I’ll mention here is the “NUMBER NEEDED TO SCREEN to save one life” (NNS). For example, NNS might be 1,500 for screening mammograms in the general population, though NNS will be paradoxically lower for high-risk patients (example: NNS = “only” 900 mammograms rather than 1,500) and higher for low-risk women (NNS = 2,000 mammograms rather than 1,500). This derivative is subject to manipulation of the data, as it is not always clear when a life has been saved through early detection.
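One common construction of NNS is the reciprocal of the absolute reduction in cancer deaths. A Python sketch with rates invented only to land near the 1,500 figure quoted above:

```python
# NNS as the reciprocal of the absolute reduction in cancer deaths.
# Rates are invented to land near the 1,500 figure quoted above.
deaths_per_1000_unscreened = 4.000
deaths_per_1000_screened = 3.333

absolute_risk_reduction = (deaths_per_1000_unscreened
                           - deaths_per_1000_screened) / 1000
nns = 1 / absolute_risk_reduction
print(f"Number needed to screen: {nns:,.0f}")  # ~1,499

# A higher baseline risk yields a larger absolute reduction and thus a
# lower NNS, which is why NNS falls for high-risk patients and rises
# for low-risk women.
```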
The Big 4 plus derivatives are an integral part of medical practice, from screening to diagnosis to treatment. Yet, without a working knowledge, “industry” can pull the wool over the clinician’s eyes. Even after all these years, in the collaborative group where I work, we sometimes obsess over the various published numbers in a conference setting trying to make sure things are in order. It’s not for the faint of heart.
In conclusion, would I ever order an MCED test like Galleri (there are several), right now as the technology stands? Yes. There are certain germline mutations (DNA variants you are born with, present in every cell in the body) that elevate cancer risk throughout the body, across many cancer types. The most powerful of the multi-cancer genes is TP53, mutations of which cause Li-Fraumeni Syndrome. The cancers in these families can occur anywhere, in any organ, and the current guidelines include “total body MRI.” But not all patients can have an MRI, or they might have trouble getting insurance to pay for frequent MRIs, or they might want reassurance more frequently than current guidelines allow. The same goes for families who test negative for known germline mutations but are loaded with all types of cancer. In other words, they remain undiagnosed even though we suspect a germline mutation that might pertain only to that one family. For these patients, I think serial MCED testing will prove helpful.
We’re going to be hearing a lot about MCED testing, or “liquid biopsy,” or ctDNA over the next decade, and additional confusion will be encountered as the tests are potentially useful for reasons other than screening (identifying recurrent disease, guiding therapy, long-term follow-up, etc.). Medical oncologists will be following the events from their standpoint, but I wonder which specialty will take charge of these many options when used for screening?
END