Luddites and Butterflies

NEJM

In the March 15, 2018, issue of The New England Journal of Medicine, an editorial from the Stanford School of Medicine (Danton Char, MD, et al.) offers a cautionary note about the ethical concerns that will accompany artificial intelligence in medicine. Key points include:

–Data used to create algorithms can contain bias, with skewing of results depending on the motives of the programmers and whoever pays them.

–Physicians must understand how algorithms are created and how these models function in order to guard against becoming overly dependent upon them.

–Big Data becomes part of “collective knowledge” and might be used without regard for clinical experience and the human aspect of patient care.

–Machine-learning-based clinical guidelines might introduce a third-party “actor” into the doctor-patient relationship, challenging the dynamics of responsibility in the relationship.

 

As a bona fide Luddite, I’m relieved to see others are concerned about the bold promises of artificial intelligence (AI) in medicine. You are probably familiar with the Luddites, a group of English textile workers who, from 1811 to 1816, organized a rebellion in which they destroyed the newfangled weaving machines as a form of protest. Eventually, mill owners began shooting protesters, and the rebellion was quashed with military might.

 

Over the years, the term Luddite has been applied to any anti-progress position. But with the advent of the computer age, the moniker has enjoyed a resurgence and is commonly used in a pejorative sense to reference those of us who are skeptical of Big Data, data mining, machine learning, AI, and anything else that is potentially dehumanizing and threatens to manipulate and change the world beyond all recognition. Well, maybe I’ve slipped into hyperbole here, but you get the picture.

 

So, you might be surprised to learn that this Luddite-author’s name is on a recent paper with this Luddite-unfriendly title: “Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm,” published in Phys Med Biol, Jan. 30, 2018 (doi:10.1088/1361-6560/aaa1ca), with co-authors M. Heidari, A.Z. Khuzani, G. Danala, S. Mirniaharikandehei, Y. Qiu, H. Liu, and B. Zheng. Furthermore, I admit without shame that I am in the midst of reviewing 4,000 mammograms where the images have been converted into “risk score” numbers at the pleasure of algorithms, all through an NCI R01 grant that will last through June 2020.

 

The term “locality preserving projection algorithm” in our title above prompted me to visit Wikipedia to see if something had changed in the world of algorithms. I recommend you do the same. This is not your grandmother’s algorithm. The complexity is daunting, and I don’t think many of us clinicians are going to have a clue as to how these programmed algorithms work, in contrast to the admonition from the Stanford School of Medicine. (My contribution to the article, by the way, was strictly clinical, accepting the “numberized” mammograms at face value.)

 

And what exactly is machine learning? In simple terms (my only refuge), machine learning involves the construction of algorithms that can “learn” without being explicitly programmed and can thus make predictions based on incoming data.
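To make that definition concrete, here is a toy example (mine alone, far simpler than anything in our paper): the program below is never told the rule connecting input to output; it infers the rule from example data and then predicts outcomes for inputs it has never seen.

```python
# A toy "machine learner": it is never explicitly programmed with the
# rule y = 2x + 1; it estimates the rule from example data and then
# predicts outcomes for inputs it has never seen.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # generated by the hidden rule y = 2x + 1

# Closed-form least-squares fit for slope and intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def predict(x):
    return slope * x + intercept

print(predict(10.0))  # 21.0 -- the program "learned" the rule
```

Swap in new data points and the same code “learns” a different rule. That, in miniature, is the whole idea; the methods in our paper apply it to thousands of mammographic image features, which is where the daunting complexity comes in.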

 

My introduction to the term “machine learning” was entirely negative, as any good Luddite is proud to point out. It came a few years back when I was wearing my other hat – that of a novelist. In 2001, I had the good fortune of a successful novel, which you can read about on this web site or elsewhere. Amazon Customer Reviews were a new concept at the time, and it was quite unsettling that anyone could post anything about my work while I had no recourse whatsoever. Harsh criticism is difficult to swallow for the novelist who has spent many years on one manuscript, given that all responsibility is singular, that is, there is only one set of shoulders. If you rate a movie as bad, a myriad of individuals take responsibility, but a book – all the poundage settles on the author’s back.

 

But I was lucky in that regard. I had nearly all 5-star reviews and a 4.5-star average. Yes, one Angry Bird gave me a puny 1-star, adding the word “horrifying,” as my novel was apparently the worst thing he or she had ever read. Such an outlier, though, doesn’t carry much weight (or shouldn’t), and it barely affected the 4.5 average. The book was a bestseller, it stayed in film option for 15 years with 3 different groups, and I made 3 trips to Beverly Hills with each new option. No movie (yet), but it was all fun and successful well beyond my expectations.

 

After the hoopla died down, I re-visited the Amazon web site many years later, gearing up for the release of my new book, Killing Albert Berch, a true crime/memoir. Newly released books prompt backlist purchases, and I thought there might be renewed interest in my novels, Flatbellies and University Boulevard. I wanted to make sure copies were still available through online retailers. Barnes & Noble Online had both my novels with 100% 5-star reviews, and so far, Killing Albert Berch has had 100% 5-star reviews.

 

But when I went to the Amazon page for Flatbellies (by far, my most popular work), the ranking had dropped to 3.7 stars, the lowest of any of my writings. I assumed some mediocre reviews levied during the intervening years had prompted the drop, but there were no recent reviews. They were the same reviews I had seen years ago. The highly offended One-Star reviewer was still there, along with a single 3-star and a single 4-star. Yet, 11 other reviewers had given the book 5 stars. One doesn’t have to do the math in order to do the math. The average should be well above 3.7.

 

Then I read how the Amazon rating system had changed to machine learning from the old-fashioned mathematical law of averages that has worked so well for several thousand years. And this is how Amazon’s machine learning works, based on 3 parameters (with my comments in parentheses).

–The Age of a Review (Do opinions become wine or vinegar with age?)

–The “Helpfulness” votes by customers, created when you click on the button that asks, “Was this review helpful?” (Okay-y-y-y).

–Whether or not a review is accompanied by a “verified purchase” (In other words, did the reviewer buy their book on Amazon? You can fill in the blanks here.)

 

Mystery remained, however. The 1-Star Angry Bird could not have dragged my rating down as a single reviewer, as only one person had found the review helpful and the purchase was not verified. As it turns out, the 3-star review carried more dragging weight, as it was a verified purchase and thus a powerful component of the machine-learned algorithm. Still, it’s bizarre that the novel has a 3.7-star ranking, with only 2 reviews below that level and 12 reviews well above it. A mathematical average would be 4.5, the median would be 5.0, the mode would be 5.0, and the Olympic approach – drop the highest and lowest, then average – would yield 4.75.
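For the skeptical, the arithmetic is easy to verify, and a weighted average shows how a single heavily weighted review can drag the score. To be clear, the weights below are pure invention on my part for illustration – Amazon’s actual machine-learned weighting is secret:

```python
from statistics import mean, median, mode

# The 14 reviews described above: one 1-star, one 3-star,
# one 4-star, and eleven 5-star reviews.
ratings = [1, 3, 4] + [5] * 11

print(mean(ratings))             # 4.5 -- the old-fashioned average
print(median(ratings))           # 5
print(mode(ratings))             # 5
olympic = sorted(ratings)[1:-1]  # drop the highest and lowest
print(mean(olympic))             # 4.75 -- the Olympic approach

# A hypothetical weighted average: give the verified-purchase 3-star
# review a large weight and the helpful-vote 1-star a modest bump.
# (These weights are invented; the real ones are proprietary.)
reviews = [(1, 3.0), (3, 8.0), (4, 1.0)] + [(5, 1.0)] * 11
weighted = sum(r * w for r, w in reviews) / sum(w for _, w in reviews)
print(round(weighted, 2))        # 3.74 -- down near Amazon's 3.7
```

Same 14 reviews, wildly different score, depending entirely on who gets to pick the weights.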

 

But the Amazon machine knows better – it renders a lackluster 3.7. What does it matter? Not much when it comes to this particular situation, a novel so many years removed from its heyday. But these masterminds and their ilk are gearing up to dictate the future of medicine.

 

What will you do when AI tells your patient that she has no business going to the doctor when her “score” didn’t qualify her for the trip? Or, that her cancer surgery won’t be allowed because AI has determined that risks exceed benefits, in that her tumor biology scored a number too low to worry about? Or, that your patient should not be screened at your breast center because it has been determined that your facility deserves only a 1-Star ranking, and is, indeed, a “horrifying” location for mammography? And consider, too, that it will be the third-party payors, including our government, which will be most eager to program the algorithms.

 

We’re already getting obsessed with number-based guidelines, paving the way for the AI patrols to enter the scene with their truthiness based on positioning the data somewhere along the shadowy “training-validation-test” continuum. The next step will be relinquishing our brains at the altar of AI, simply because the stupefying complexity must warrant some form of worship.

 

But in contrast to all of the posturing above, here’s a Luddite dream-quote from a Nov. 10, 2016, Bloomberg.com article addressing machine learning: “Effective machine learning is difficult because finding patterns is hard and often not enough training data is available; as a result, machine-learning programs often fail to deliver.”

 

You undoubtedly have heard of the term “butterfly effect,” and thank goodness someone gave it that name because the ten-dollar word that preceded it was “concatenation.” Hollywood scriptwriters have had a field day with this concept, applying it to every plot ever conceived about time travel (although mathematical philosophers have dallied with the concept at least since 1800). But the story behind the name “butterfly effect” is a bit concerning when it comes to algorithm-driven AI and machine learning.

 

Edward Lorenz was a meteorologist and mathematician who was using computer models early on to predict the weather. In 1961, he decided to re-calculate outcomes from a prior study. Given the slow speed of computers at the time, he decided to start in the middle of the project as a shortcut, rather than go back to the beginning. He typed in the initial condition as 0.506 from the earlier printout, rather than the complete and precise value of 0.506127. (Honestly, how much difference could 0.000127 possibly make?) He went down the hall for a cup of coffee and returned an hour later, during which time the computer had simulated two months of weather. The conditions were completely different from the first time around. He thought of every possible malfunction before finally realizing that the tiny alteration in data entry had magnified exponentially over time, until the difference “dominated the solution.”
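You can reproduce Lorenz’s experience in miniature. The logistic map below is not his weather model – it’s a standard one-line stand-in for chaotic systems – but the effect is the same: two runs that start 0.000127 apart, exactly Lorenz’s truncation, end up bearing no resemblance to each other.

```python
def step(x, r=4.0):
    """One iteration of the logistic map, a textbook chaotic system."""
    return r * x * (1 - x)

a, b = 0.506, 0.506127   # truncated vs. precise initial condition
max_gap = 0.0
for _ in range(60):
    a, b = step(a), step(b)
    max_gap = max(max_gap, abs(a - b))

# The 0.000127 difference magnifies until it dominates the solution.
print(max_gap > 0.1)  # True -- the trajectories have fully diverged
```

The gap roughly doubles with each iteration, so it takes only a dozen or so steps before the two “weather forecasts” part company for good.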

 

In 1963, Lorenz published his seminal work as “Deterministic Nonperiodic Flow,” and in that first iteration, he used the metaphor of the “flap of a sea gull’s wing” to note the powerful impact based on chaos theory. In 1972, he failed to provide a title for a talk that he was giving at the American Association for the Advancement of Science, and a colleague suggested “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” And a new phrase – butterfly effect – was added to our lexicon.

 

Note the distinction between a minor tweak in a chaotic universe versus direct causation. The butterfly’s wings do not cause the tornado, but the end result is dramatically altered with the tiniest of data input. So now, my question is this – if such tiny alterations early in algorithmic progression can cause such major differences in outcomes, do we really want to put the future of medicine on the wings of butterflies?

 

Oh, one other thing about the Luddites. As is frequently the case, convenient and capsulized summaries have distorted the original truth. It turns out that Luddites were not anti-machinery at all. Many were highly skilled machine operators in the textile industry, and furthermore, the technology was not new. In truth, Luddites were opposed to the way in which management was using the machines in what was called “a fraudulent and deceitful manner.” And, they were opposed to poor working conditions in general. In short, they wanted machines that made high quality goods, and they wanted the operators of those machines to be well-trained and well-paid. Destruction of machines during the Industrial Revolution was a common method of general protest, established well before the Luddite rebellion and should in no way imply that the Luddites were anti-machinery or anti-progress.

 

As is often the case, it’s not the technology that’s the problem – it’s the users.

 

 

“Most Breast Cancer Screening MRIs are Unnecessary According to Guidelines,” …or is it Civil Disobedience?


A 1.0cm Invasive Ductal Carcinoma, Grade 3, discovered on screening MRI.  Even in retrospect, mammography and ultrasound were both negative.

 

In a recent issue of the Journal of General Internal Medicine (Hill et al, 2018; 33:275-283), five regional imaging registries were analyzed, and more than 80% of patients undergoing screening MRI did not meet guidelines. The medical journalists then stretched the issue by converting “not meeting guidelines” into “unnecessary,” a much more provocative word that equates this violation to the knife-happy surgeon.

My response is two-fold: 1) Kudos to those using rational thought as opposed to irrational guidelines (civil disobedience), and 2) How are they getting insurance coverage for their patients? (In OKC, we are chained to pre-authorizations, since the “cash option” is still too high for truly low-cost MRI screening.)

After the oddly-conceived guidelines for screening MRI were introduced in 2007 by the American Cancer Society, few addressed the flaws. Instead, other organizations such as the NCCN patterned their recommendations after the original, maintaining nearly all of its flaws. And as one would predict, guidelines were converted over time to canon, whereupon all those who didn’t believe were labeled as heretical.

We have lost our way. And those who keep claiming that there is “no data” to support screening women at lower levels of risk have simply chosen not to look at available data. And certainly, critics don’t think through MRI performance characteristics applied to various sub-groups based on disease prevalence and incidence, where outcomes can be accurately predicted.

One could start with the 15% lifetime risk threshold (Claus model) utilized for entry to The Netherlands study (Dutch MRISC), by far the largest of the 7 international MRI screening trials that led to the 2007 guidelines. In that study, MRI identified twice as many invasive cancers as mammography in both the modest-risk and the high-risk patients.

But far more incriminating as to the weakness of our current “20% lifetime” guidelines is the work of Dr. Christiane Kuhl in Germany where she has screened a general population risk cohort (under 15% lifetime) with MRI (Kuhl CK et al. Supplemental Breast MR Imaging Screening of Women With Average Risk of Breast Cancer. Radiology 2017; 283:361-370).

Participants in Dr. Kuhl’s study are “cleared” with normal mammograms and, for most, screening ultrasound as well. Her ongoing series of more than 2,000 women is now larger than any of the international MRI screening trials based on high-risk protocols. After excluding those cancers found on mammography, ultrasound, and clinical exam, MRI was still able to generate a cancer detection rate (CDR) of 1.55%, or 15.5/1,000, roughly triple the mammographic yield for cancer detection. And if one looks only at the prevalence (first) MRI screen, the CDR was 2.26%, or 22.6/1,000, comparable to what one finds in the MRI screening studies performed in high-risk populations. No cancers were found with mammography alone or ultrasound alone. And what about the old bugaboo of mammographic screening – the 25% interval cancer rate? Zero in Dr. Kuhl’s study. That’s right. The “inherent” problem of screening, that is, the more aggressive tumors that pop up in between screens…well, they are relegated to historical interest only in this average-risk population.

What should be our goal for breast cancer screening with MRI? If we are shooting for cost-effective yields, we’re out of luck. Studies of cost-effectiveness for MRI screening require 3% CDRs at a minimum, and there is no possible way to maintain that number over the long term. Who has a 100% risk of breast cancer over the next 33 years (3% × 33 years)? No one. Even BRCA-positivity only generates long-term risks in the 2%/year range, which is what you will find on long-term incidence screens with MRI. And for the women who barely squeeze above 20% risk for MRI justification, long-term screening will yield under 1%/year, nowhere close to cost-effective. So, unless we can get the cost of screening MRI down to $300 or so (or $600 and perform the MRI biennially), forget cost-effectiveness. Breast MRI will never be cost-effective for long-term screening if we rely on risk levels alone.
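The arithmetic behind that paragraph is worth laying out, using the same simplification as the 3% x 33 figure (annual rates simply summed, ignoring compounding):

```python
def sustainable_annual_cdr(lifetime_risk_pct, years):
    """Average annual cancer detection rate (in %) that a cohort's
    cumulative risk can support -- a simple division, ignoring
    compounding, just like the 3% x 33 figure in the text."""
    return lifetime_risk_pct / years

# Sustaining the 3%/year CDR demanded by cost-effectiveness studies
# for 33 years implies a 99% cumulative risk -- a cohort that doesn't exist.
print(sustainable_annual_cdr(99, 33))            # 3.0
# BRCA-level risk (~2%/year over the same span) still falls short.
print(sustainable_annual_cdr(66, 33))            # 2.0
# A woman who barely clears the 20% lifetime threshold:
print(round(sustainable_annual_cdr(21, 33), 2))  # 0.64 -- under 1%/year
```

No amount of risk-model refinement changes this division; only lower cost or longer intervals can.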

So, with cost-effectiveness currently impossible with MRI, are there alternative goals? What is the best way to offer MRI to the correct patients, opening up the possibility to all women, but without having to screen all women with MRI? How can we best recommend MRI to get as close to cost-effectiveness as possible without excluding the majority of candidates, as we currently do with our guidelines?

Let me mention an idea that I proposed during that short interval after the clinical introduction of breast MRI and before guidelines were handed down from on high. My breast radiology counterpart (Dr. Rebecca Stough) and I developed a simple point system that obviates all the current problems with our guidelines, based on that old principle known as “common sense,” or in the world of philosophy – rational thought. Two parameters were used – breast density and risk. We had 3 levels of risk (baseline, high and very high, taken from a prior publication of a working group – Am J Surg 2004; 187: 349-362), and this was combined with the 4 levels of breast density as commonly defined by the BIRADS system, yielding a single score. Importantly, this score defined the interval as well – annual, biennial or triennial MRI – rather than the bizarre all-or-nothing annual MRI (such that 21% lifetime risk tells us to perform MRI annually, but for 19%, no MRI at all, or else it will be “unnecessary”).
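The published point values are in the Semin Breast Dis paper; the sketch below uses invented points and cutoffs purely to show the structure – two inputs combined into one score that sets an interval, rather than an all-or-nothing verdict:

```python
# A sketch of the density-plus-risk point system described above.
# The structure is real (3 risk levels x 4 BI-RADS density levels
# -> one score -> a screening interval); the point values and
# cutoffs below are invented for illustration, not the published ones.

RISK_POINTS = {"baseline": 1, "high": 2, "very high": 3}
DENSITY_POINTS = {"A": 1, "B": 2, "C": 3, "D": 4}  # BI-RADS density

def mri_interval(risk, density):
    score = RISK_POINTS[risk] + DENSITY_POINTS[density]
    if score >= 6:
        return "annual MRI"
    if score >= 4:
        return "biennial MRI"
    if score >= 3:
        return "triennial MRI"
    return "no adjunct MRI"

# Density counts even at baseline risk, and risk counts even in
# fatty breasts -- no 21%-yes / 19%-no cliff anywhere in sight.
print(mri_interval("very high", "D"))  # annual MRI
print(mri_interval("baseline", "D"))   # biennial MRI
print(mri_interval("baseline", "A"))   # no adjunct MRI
```

Note that every woman lands somewhere on the grid; the only question is the interval, not admission to the club.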

We published our first 6 MRI cancer discoveries (Semin Breast Dis 2008; 11:67-75) using this method, but we were too late by a few months, our article coming out on the heels of the American Cancer Society 2007 guidelines. Of those 6 cancers, using the 3 mathematical models suggested by the ACS, 0/6 would have been identified with the Gail, 0/6 using Claus, and only 3/6 using Tyrer-Cuzick. Clearly, we were targeting a different population for MRI success that was being missed by the ACS guidelines. When we updated our experience in 2014 with 33 MRI discoveries (The Breast Journal 2014; 20:192-197), the Gail model would have selected only 9/33, Claus 1/33, and Tyrer-Cuzick 12/33. Combining all 3 models, and using the model that calculated the highest risk, only 16 of 33 cancers would have been identified had we followed ACS guidelines.

Today (after 50 MRI-detected cancers), with our spirit crushed by insurers that are enamored by the “magical” 20% lifetime risk, nearly all of our patients officially qualify for MRI as we were forced to surrender to the guidelines-turned-canon. At the same time, our total number of MRI-detected cancers has dropped precipitously as MRI is closed to women who don’t meet the “risk-only” guidelines. Where are the women we used to diagnose? Their discoveries come later, perhaps much later, in the form of mammographic screening with its attendant 25% interval cancer rate. Or, they have mammographically invisible tumors and are discovered well on down the pike with palpable cancers and node positivity rates far in excess of our node-negative MRI discoveries.

What did our formula do to identify so many MRI-discovered cancers missed by current guidelines? It took into account a major variable – every bit as powerful as risk – that is, a feature that should be considered in all women – breast density. This is in sharp contrast to the ACS guidelines that relegated density to some sort of quasi-risk issue in the “needs more research” category, treating it as an isolated issue. It is not isolated. It is an integral feature of every woman considering auxiliary imaging. By tossing it into the 15-19% risk grouping in the ACS guidelines, it was turned into “just another risk factor.”

Again, what is our goal in proper patient selection to improve efficiency? We are trying to identify those patients who, on the day of the planned MRI, have a heightened risk for developing breast cancer (best reflected by short-term risk, not lifetime), AND – with equal importance – are more likely to have that cancer missed by mammography (density level).

Why is the Level D patient denied MRI without additional risk factors? They certainly anticipated this issue in The Netherlands, where the Level D Trial is underway, using density as the SOLE entry criterion for breast MRI. And does the Level A patient with 21% risk really need annual MRI? Come on. Admit it. Our current guidelines are in a bad state of affairs, predicated on so-called empirical data that has nothing to do with true empiricism.

And when it comes to pseudo-empiricism, how did we get locked into lifetime risks anyway? Well, it was lifetime risks that were mostly used in the MRI screening trials, so that defect simply got transferred into the 2007 guidelines, a la canon.

But if one wants to use a surrogate for higher CDRs on screening MRI, why would it be important to include the breast cancer risk 20, 30, 40 or even 50 years from now? Wouldn’t you really want to know the short-term risk, calculated let’s say, for the next 5 years? Does it make any difference in patient selection whether you use short-term or lifetime? Actually, it makes an enormous difference.

For starters:

–Lifetime risks are not as accurate as short-term risks, often being extrapolations of known risk beyond what has ever been observed.

–Lifetime risks have a wide range of calculations, depending on the models used.

–Risk models might be accurate for populations, but are poor predictors at the individual level.

–The most widely utilized model for MRI screening (Tyrer-Cuzick) has the narrowest application, that is, white Western European ethnicity only.

–Picking a number like 20% turns risk assessment into gamesmanship to use the highest number possible for insurance coverage (even if it means using inappropriate models).

–Lifetime risk is actually remaining lifetime risk, which decreases over time, while at the same time short-term incidence increases. By ignoring incidence as distinct from long-term risk, lifetime risks are highly age-discriminatory. Younger women qualify for MRI, even though an older patient with the same risk factors won’t qualify, in spite of the fact that the older patient might be at triple the risk of the younger patient in the short term. Another way age discrimination manifests itself: the patient who barely qualifies for MRI at age 45 with “21%” will no longer qualify at age 55 (19%), when she is at peak short-term probability of breast cancer. Remaining lifetime risks are deeply flawed, yet have become ingrained as the only option.

–Another indirect way lifetime risks discriminate against older patients is the “needs more research” category that includes ADH, ALH and LCIS. The young patient with these findings on biopsy will bypass the “needs more research” designation because the Gail and/or the Tyrer-Cuzick model will elevate them to a “greater than 20%” status. In contrast, the 65-year-old patient with ADH won’t make it without other risk factors, even though her chance of getting breast cancer is roughly the same at 1%/year.

–The (remaining) lifetime risk approach being used in current guidelines gives us a very wide range, from 20% to 85% lifetime, and everyone in that group is advised to consider annual MRI. This is analogous to the original BIRADS 4 lesion on mammography where the risk of malignancy was 3% to 90%. Once the clinical futility of such a range was recognized, the system was modified to 4a, 4b and 4c where there was some clinical help in PPV estimates. And so it is with our very wide range of risk for MRI, with the lower end of the spectrum clustering most patients around 20-25% where they are given the same recommendations as the patients at 85% risk.

 

The unfortunate part of all of this is that our simple point system proposed at Mercy-OKC 10 years ago erases all of these issues, yet will never make it to prime time. It opens up the possibility of MRI screening to all women. It takes away age discrimination totally. It uses categorical risk levels rather than the bizarre “20% lifetime” and all its attendant problems. And incidentally, it was created without any data whatsoever, simply applying performance characteristics of MRI in sub-groups of patients where 1) disease prevalence and incidence are known, coupled to 2) the probability that a cancer will be missed on mammography.

We were not the only ones thinking along these lines. If you want to see how rational thought, unfettered by pseudo-empiricism, can create a very nice model for selecting patients for auxiliary imaging, review the entry requirements for ACRIN 6666. This was a study primarily for proving the benefit of screening ultrasound. However, one-fourth of the participants underwent a screening MRI at the end of the study. This single MRI found more cancers than 3 screening sessions at 0, 12 and 24 months using double-modality screening. It’s an incredible testimony to the power of MRI over mammography and ultrasound combined.

But I digress. My purpose here is to show how ACRIN 6666 used well-informed rational thought to design accrual criteria that are very well-conceived. Granted, mathematical models were used, but included a short-term risk calculation that, in effect, erases the age discrimination problems.

In its simplest form: ACRIN 6666 included women with mammographic density, plus one additional risk factor.     

In its actual form:    Five levels of BREAST DENSITY were used (not correlating exactly with BIRADS), such that one could have relatively low density overall if there was significant density in at least one quadrant – a qualitative aspect in addition to the usual quantitative attempts to define density. One can see the sophistication used here, in that the goal is to identify those women most likely to have a cancer missed on mammography, rather than the simplified dense vs. non-dense.

RISK FACTORS for accrual:

Mutation in BRCA1 or 2

History of prior chest, mediastinal or axillary radiation

ADH/ALH/LCIS/atypical papilloma

Personal history of breast cancer treated with conservation (over 50% of trial participants, btw, for those who believe there is no data to support MRI after breast conservation)

Lifetime risk greater than or equal to 25% (Gail or Claus)

5-year risk using the Gail model – greater than or equal to 2.5% (or 0.5%/year)

5-year risk greater than or equal to 1.7% + extremely dense breasts (note the interplay of risk and density — the greater the density, the lower the required risk. More than any other accrual feature, this adjustment reflects the principle I’m espousing in this article — it’s the interplay of risk and density that should currently be used for auxiliary imaging selection.)

 

Results of ACRIN 6666 for the total cohort (n=2,321) undergoing both mammography and ultrasound screening at 0, 12 and 24 months:

Sensitivity of mammography = 52% (55% invasive)

Sensitivity of ultrasound = 52% (94% invasive)

Sensitivity of both modalities together = 76% (not that great when you think about it)

But for the 612 patients who opted for a single MRI at the end of the study:

The combined sensitivity of both mammography and ultrasound = 44% (ouch!)

That is, 7/16 cancers were detected with mammo + US performed at 0, 12 & 24 months.  Then, at the study’s end, MRI detected an additional 9 cancers (8/9 MRI-detected cancers were invasive with a mean size of 0.85cm, all node-negative).
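The percentages quoted above follow directly from the raw counts in that subgroup:

```python
cancers_total = 16   # cancers among the 612 patients who had the MRI
found_mammo_us = 7   # caught by mammography + ultrasound over 3 rounds
found_mri = 9        # additional cancers caught by the single MRI

# Combined mammography + ultrasound sensitivity in this subgroup:
print(round(100 * found_mammo_us / cancers_total))  # 44 -- the "ouch"

# The single end-of-study MRI swept up everything the other two missed:
print(found_mammo_us + found_mri == cancers_total)  # True
```

Three rounds of double-modality screening versus one MRI, and the one MRI won.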

As the study was designed with the primary interest being ultrasound’s potential, it was counted as a victory for the concept of auxiliary screening using that modality.  However, many saw it as an incidental victory — by a wide margin — for MRI screening.

Then, the Principal Investigator of ACRIN 6666 decided that perhaps she should undergo an MRI herself, given her family history. A 1.0cm mammographically invisible invasive cancer was identified on MRI, prompting this PI to form a new breast density organization to increase awareness about the hazard of breast density.

The answer to proper patient selection is not refinement of risk assessment. Add SNPs to the Tyrer-Cuzick model if you must in order to get insurance coverage for the MRI, but we shouldn’t deceive ourselves that this approach is the long-term solution for MRI screening. Using risks alone can never reach the state of cost-effectiveness (unless costs drop dramatically and intervals longer than annual are validated).

No, the answer for cost-effectiveness is R&D with the sole purpose being to develop a post-mammography tool (e.g., a blood test) that will properly select patients for screening MRI (or MBI, or contrast-enhanced mammography). Such a tool would convert the patient from screening to diagnostic status, and the purpose of the adjunct imaging would be to confirm cancer (yes or no), and if yes, then to locate the cancer.

This approach would obviate the need for asymptomatic screening using these expensive modalities at pre-determined intervals in only a select few, as is currently done. Only those patients at extremely high risk, such as BRCA-positivity, would maintain fixed intervals (unless such a post-mammography tool were found that was nearly flawless). Such an approach would be all-inclusive and would largely do away with fixed interval screening where CDRs drop to 2% or less once the steady state is reached. CDRs obtained through a post-mammography tool would remain high as they would always be based on disease prevalence, which for breast cancer, is roughly 3-fold disease incidence.

To that end, I’m involved with 3 approaches (and very little else): 

–Blood testing

–Ultra-CAD analysis of normal mammograms (currently under an R-01 grant)

–Artificial Intelligence applied to non-contrast breast MRI, with gadolinium only if AI signals a problem.

 

Most of my experience is with #1, starting in 1993. This is a research agenda once considered at the lunatic fringe (Why are you doing that? We already have mammography!). Today, many groups are after what is now called a “liquid biopsy,” which was originally applied to tumor DNA fragments in the circulation, but is now applied to any blood-based test for cancer.

As prospects and performance get better and better, with proprietary implications, I work with one group at a time, most recently aligning with Syantra, Inc., working in laboratories at the University of Calgary. A prospective study is planned for Canada and the U.K., with possible sites in the U.S. If retrospective results can be duplicated in the prospective trial, we will be looking at clinical utility that could make MRI screening cost-effective.

Our goal doesn’t have to be perfection, no matter what the tool. With peak MRI cancer yields at “below 4%,” even in the highest risk patients (and lower for incidence screens), a CDR of 5% will be a wild success as the cost-effective threshold will be crossed. If this sounds like a low bar, recall that the CDR for screening mammography in a mixed population of prevalence and incidence screens hovers around 5/1,000 or 0.5%. Thus, our target CDR of 5% would be 10-fold more effective than screening mammography, a very high bar indeed. But then imagine the ease with which cost-effectiveness will be achieved with CDRs of 10% or even 20%.

One thing is for sure: we’re going nowhere if we conform to the outdated MRI guidelines currently mandated, lest we be accused of doing something “unnecessary.” So, here’s to Civil Disobedience that, hopefully, will lead to the revolution in how we properly select patients for multi-modality breast cancer screening.

Finalist, Oklahoma Book Awards

Killing Albert Berch is sponging up the time now, so breast cancer topics are once again on hold. I had a very nice feature article by Ken Raymond in the Book Review section of The Oklahoman last week. http://newsok.com/prominent-oklahoma-city-doctor-pens-true-crime-tale/article/5584295

And, the book moved into the #8 position on the non-fiction bestseller list in Oklahoma.

Then, today (March 3, 2018), I received notice by snail mail that Killing Albert Berch was going to be a Finalist for the Oklahoma Book Award in the non-fiction category for books published in 2017.  Tough year to win, however.  Specifically, Killers of the Flower Moon is the #1 true crime national bestseller right now, and it is a Finalist in the same category as my book.  Oh, well, as they say, “It was an honor to be nominated.”  (winners to be announced April 7)

A Fisher Among Men — Feel the Bern

Fisher

The Editor-in-Chief of a popular breast cancer journal was recently reviewing my commentary in response to a new paradigm proposed to explain the overarching biology of breast cancer. In my invited comments about the proposal, I stated that this new theory addressing breast cancer biology is really just an extension of Fisher Theory. In his suggestions to me, the Editor asked that I explain Fisher Theory, as many readers today might not be familiar with its tenets that include dormant cell theory, common vascular channel theory, host:tumor relationships, and so forth.

Wow. I hadn’t thought about it. He was correct. New clinicians getting on board the train today don’t really need to think about such things. Yet, once upon a time, every respectable breast cancer conference included Dr. Fisher discussing his theory and how he proved its value.

Every time a woman undergoes lumpectomy for breast cancer, the procedure is based on the Fisher Theory of breast cancer biology that originated in the 1950s, largely confirmed in clinical trials in the 1970s and 1980s. Breast conservation emerged – not as a gradual downscaling of doing less and less over time – rather, in response to governing principles as to how breast cancer “worked.” That is, its biology. Thus, one could stop performing Halsted radicals one day, and move directly to lumpectomy the next. And this was based on a biologic paradigm, not a gradual de-escalation of surgical warfare.

Paradigm? The first time I ever heard the word – overworked and misused as it is today – was from Dr. Bernie Fisher at the podium at one of the early breast cancer meetings in the 1980s. In fact, he would sometimes stop to explain the word to audience members unfamiliar with its meaning. Composed of many principles, the broader paradigm of Fisher Theory can be condensed thusly – “Breast cancer is systemic at its inception.” That wording didn’t go over well, and was later more commonly expressed as, “Breast cancer is either local or systemic at its discovery, and is unlikely to progress from local to systemic during the clinical phase of therapy.” As a result, the primary implication is this: In Dr. Fisher’s own words, “Variations in locoregional management are unlikely to have a substantial impact on survival.”

In this paradigm, breast cancer interacts with the immunologic response of its host (the patient), and either the tumor wins or the host wins, but the winner is decided upon tumor removal, no matter what method is used to remove it. Think of it as “biologic predestination.” (my words, not Dr. Fisher’s)

Some distort Fisher Theory as, “Local control doesn’t matter” (not true…think about untreated breast cancer). Or, “Local recurrence rates are the same with mastectomy and conservation” (not true – only the conservation group can have a recurrence within breast tissue). This confusion about “local recurrence” is a function – once again – of imprecise language and semantics. But consider this: in the early days of conservation, the NSABP called recurrences within the breast tissue “cosmetic failures.” Indeed, this dismissive terminology was in use before survival equivalency had been proven. When the outrage died down, given that this “cosmetic issue” resulted in (ahem) mastectomy, the NSABP gave it a new name – ipsilateral breast tumor recurrence, or IBTR. And, IBTR can only occur in the conservation group, so this is a parameter NOT equal to mastectomy. When “local recurrence” is defined as “chest wall and axilla,” then yes, the methods are equal. But this definition conveniently excludes the most likely event after breast conservation – in-breast recurrence at the lumpectomy site.

The IBTR (in-breast recurrence) is, granted, held harmless (unless you’re the one undergoing completion mastectomy several years after you thought you were done, and then you discover that the prior radiation is going to prevent you from having tissue expanders, and you will be reconstructed with autologous flaps that no one mentioned as a possibility when they were telling you how conservation and radiation was equal in all respects to mastectomy).

But when it comes to survival, the IBTR is thought to be a “marker” of increased tumor aggressiveness, not the cause of the newly calculated survival rate, which is not as good as the affected patient was originally told. The convincing evidence for association rather than causation lies in the fact that women in multiple clinical trials who were randomized to conservation have the same survival as mastectomy patients, even when the women with IBTRs followed by mastectomy remain in their assigned “lumpectomy” grouping (thanks to the intent-to-treat principle). If IBTRs were causing breast cancer deaths, then the mortality would have been greater in the conservation groups. But that wasn’t the case. Mortality was not affected by IBTRs. It’s only when the individual learns that her prognosis is dimmed somewhat that clinicians struggle to explain this.

Well, to the point. At the time of this writing, Bernard Fisher, MD is still alive, age 99, and would ordinarily have received the Nobel Prize for his courageous stand against rigid Halstedians, and for the prescience to introduce prospective randomized trials into the surgical management of cancer. More than any single individual in the history of breast cancer, he overhauled treatment strategies.

However, after one of his high-volume investigators in Canada falsified data, Dr. Fisher was dragged through the mud by the NCI, and worse, his alma mater, the University of Pittsburgh. In essence, he was demonized. Although he later sued and won a cash settlement and an apology from the University of Pittsburgh, some believe this is why the Nobel committee has egregiously refused to entertain his nomination. Dr. Fisher has won every other award that medicine has to offer, but it’s looking more and more like he won’t be going to Stockholm, given that the Nobel Prize is not awarded posthumously (unless you die in that sweet spot after the announcement but before the presentation).

Well, over the years, some cracks developed in Fisher Theory. Early detection of breast cancer through screening mammography shouldn’t have made a difference in the face of biologic predestination (nihilistic Fisher purists still claim early detection is a total illusion). And, tumor size should not have remained a prognostic indicator, yet look at Dr. Tabar’s 15mm data that suggests this is the watershed point, before which early detection makes its mark. And then, more recent data indicates that radiation after lumpectomy provides a slight survival advantage, something that would not be seen in pure Fisher Theory (although remember those italics I used at the beginning of this editorial — Dr. Fisher did qualify his statement that variations are unlikely to have a substantial impact on survival).

Whether they know it or not, most surgeons and radiologists and radiation therapists operate today under Spectrum Theory (outlined by radiation oncologist, Sam Hellman, MD), which states that, yes, some tumors follow Fisher Theory, yet other tumors form a separate biologic group, located in between “local” and “systemic.” Here, tumors have a Goldilocks biology wherein they progress from local to systemic during the clinical window of opportunity. This window was opened wider with the introduction of screening mammography. These are the cancers that lend themselves to early detection. They are also the cancers that should be carefully eradicated with local therapy. Unfortunately, we don’t know which ones they are, any more than we know which ones are biologically local (today, the “local only” tumors are referred to with the pejorative term – overdiagnosis).

These overarching biologies are not really spoken of much anymore, as everyone ignores the forest and focuses on the wonderful trees of precision medicine, with Luminal A/B, triple negatives, HER2 positives, etc. Yet, these “old” theories guide every step for every physician treating every breast cancer patient today.

So did Dr. Fisher get it right? Well, he was certainly close. And, he was much closer than Halsted whose theory of a predictable, orderly spread of breast cancer prompted the radical mastectomy. Given that Dr. Fisher stood up to an army of surgeons that were protecting the past and the memory of Halsted, he deserves the Nobel Prize for sheer courage, not to mention his tenacity for doing the right thing even when his alma mater betrayed him.

 

For an in-depth look at Fisher Theory and its implications for breast cancer screening, read Mammography and Early Breast Cancer Detection by Alan B. Hollingsworth, MD.  (The best parts are in the Endnotes.)  Order from Amazon or directly from the publisher using Links on this website.

SNPs Sneak Into Breast Cancer Risk Assessment

courtesy Marshall.edu — http://knowgenetics.org

 

They’ve been knocking at the door for nearly 20 years. Insulted as inadequate, misleading and of no clinical utility, the much-maligned SNPs stood outside waiting their turn until – finally – they were allowed to enter the world of breast cancer risk assessment.

What are they? SNPs (pronounced “Snips”) are single nucleotide polymorphisms, arbitrarily defined as variants occurring in greater than 1% of the population. So, what would the same variation be called if it occurs in less than 1%? Answer: a mutation.

Right off the bat, we have problems arising with ethnicity, in that one group’s SNP can be another group’s mutation. And that’s just for starters. (Note: I’m going to use the old terminology here that includes “polymorphism” and “mutation,” given that the recommendation to label everything in genetics as “variants” has not yet been completely adopted – see October 2017 blog.)

SNPs can be substitutions, deletions or insertions, and might occur in coding or non-coding areas. They might be silent (same amino acid in spite of the nucleotide change); they might yield a different amino acid; or might even generate a stop codon. They can be disease-causing, just like a mutation (ergo, the call to revise our terminology, as many reserve the term “polymorphism” for non-disease-causing alterations).

When studied in breast cancer risk assessment, however, SNPs are complements to traditional risk factors. And, individually, SNPs are weak. That is, the power of any single SNP is small, with RRs and ORs barely above 1.0 in most cases. So, the big question is: How do you combine SNPs in order to yield clinically important information? And the bigger question: How do those combinations of SNPs interact with established risk factors?
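One standard way to combine weak SNPs is to assume independent, log-additive effects: each SNP contributes its risk-allele count times the log of its per-allele odds ratio, and the logs are summed. This is a textbook polygenic-score assumption, not necessarily how any commercial panel does it, and the ORs below are made up for illustration.

```python
import math

# Sketch of the log-additive polygenic model (an assumption, not the
# method of any specific commercial test). ORs are hypothetical.
def combined_or(per_allele_ors, risk_allele_counts):
    """Combined OR = exp( sum of count_i * log(OR_i) )."""
    log_or = sum(n * math.log(or_)
                 for or_, n in zip(per_allele_ors, risk_allele_counts))
    return math.exp(log_or)

# Three weak SNPs (ORs barely above 1.0), heterozygous at each locus:
print(combined_or([1.1, 1.08, 1.12], [1, 1, 1]))  # ~1.33
```

Under this assumption, many individually trivial SNPs can still add up: with one risk allele at each locus, the combined OR is simply the product of the per-allele ORs.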

Although their presence has been known for many years under a variety of descriptors, the SNP acronym emerged in the mid-1990s, and then the concept of SNPs having a potential use in breast cancer risk assessment was on a roll by 2000. At the same time, home DNA kits were being developed using SNPs to determine if the user had an aptitude for playing the accordion or perhaps becoming an Olympic diver, prompting SNP-promoters to be labeled as “the used car salesmen of science.”

Early on, it was clear that no single SNP would do the trick for refining risk assessment. Groups of SNPs became the goal. But how do you define the proper groupings when you’re dealing with this tidbit: By 2001, a map of the human genome sequence variation was published in Nature (409:928-933) containing 1.42 million SNPs. As for those SNPs that pertain to breast cancer risk, in the Oct. 23, 2017 issue of Nature, a massive project added 65 new SNPs to those known to impact risk levels, bringing the total to approximately 180 breast-related SNPs.

In Oklahoma City, a biotech company was established to improve breast cancer risk assessment by using key SNPs that were clustered into triplets, drawn from those SNPs that might be involved in hormonal or carcinogenic pathways. In keeping with the buzz phrase, “personalized medicine,” this company actually did a nice job from the laboratory science standpoint, generating RRs for a very large number of specific triplet combos. While some SNP combos exceeded RR=3.0 (and a few hit RR=5.0), the number of volunteers with those specific combos was small, leaving very wide confidence intervals even when statistically significant. For the vast majority of patients, the degree of risk imparted by the SNP triplet was subclinical (that is, a RR less than 2.0).

But then comes a bigger problem – how do you blend SNPs into the standard mathematical models, when those models are already on shaky ground? In our current models, traditional risks are merged mathematically, with little regard for biologic interactions. Ideally, risks should be studied in couplets and triplets to better understand these interactions. Even then, however, studies can be conflicting.

Consider the interaction of atypical hyperplasia and family history. The Mayo Clinic data indicates that atypical hyperplasia is the primary driver for risk calculations, largely unaffected by family history. Yet, the original Page and DuPont data showed an impressive degree of synergism (9-fold risk) with atypical hyperplasia and a first-degree relative with breast cancer, a feature that made it into both the Gail and Tyrer-Cuzick models. Now, into this confusion, including a broad range of calculated risks depending on the model used, we add SNPs?

Well, the company in Oklahoma City added their SNP results to the Gail model exclusively. This created the unfortunate situation of bad conclusions if the Gail model was inappropriate in the first place, a not uncommon situation, yet rarely recognized by those unfamiliar with the construct and contraindications for the Gail. Patients with LCIS (not included in Gail) were told they were at normal risk because of their “gene test” results, but it was the Gail model that misinformed, when in fact, the SNP results had been neutral (RR=1.0). The same was true for extensive family histories positive for breast cancer, but only in second degree relatives. Results were not separated into the Gail component and the SNP component. Instead, for the patient and her doctor, there was a single “score” to reflect risk. And if the Gail was wrong, the score was wrong, and the counseling was wrong.

Although the company is no longer in business, we are still working through its aftermath in that many women in Oklahoma (and in certain locations nationwide) still believe they have completed “genetic testing” as a result of their SNP results years ago. Imagine their surprise to learn that they have had no analysis whatsoever of the cancer predisposition genes.

We tend to forget that these mathematical models are at their best in predicting the number of cancers in a large cohort for clinical trials. Here, the Gail works nicely. At the individual level, though, the original Gail had a c-statistic of 0.58 (barely better than flipping a coin). To attach SNPs to the Gail is akin to putting lipstick on a pig if you are counseling individual patients.
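For readers unfamiliar with the c-statistic: it is the probability that a randomly chosen cancer case receives a higher model score than a randomly chosen non-case, so 0.5 is pure coin-flipping and 1.0 is perfect discrimination. A toy sketch, with entirely made-up scores:

```python
import itertools

# c-statistic = P(random case scores higher than random non-case),
# counting ties as half. Scores below are invented for illustration.
def c_statistic(case_scores, control_scores):
    pairs = list(itertools.product(case_scores, control_scores))
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return wins / len(pairs)

cases    = [0.9, 0.6, 0.55, 0.4]   # model scores for women who developed cancer
controls = [0.8, 0.5, 0.45, 0.3]   # model scores for women who did not

print(c_statistic(cases, controls))  # -> 0.6875
```

A model scoring 0.58 on this measure separates future cases from non-cases only slightly better than chance, which is why the individual-level counseling caveat matters.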

Enter Tyrer-Cuzick. Based on western European whites, this model has a sharp restriction when it comes to using it to calculate risks in other ethnicities. And, it tends to overestimate probabilities when tissue risks are paired with family history. Its attractive features, of course, are the inclusion of 2nd and 3rd degree relatives, paternal history, reproductive risks, tissue risks and prior genetic testing results.

Version 6.0 of the Tyrer-Cuzick (T-C) model calculated lifetime risk through age 80, so when we moved to version 7.0, the calculations were slightly higher, in that 5 additional years were added through age 85.

And, now, with version 8.0, breast density has been added, which can work both ways, generating numbers higher or lower than v. 7.0. This introduces an element of gamesmanship, of course, as many of us are trying to reach the magic 20% lifetime risk threshold to justify screening with breast MRI (another subject entirely). Be that as it may, you’re going to get higher numbers by sticking with v. 7.0 if your patient has A or B density, but then using 8.0 if density levels are C or D. (I’m embarrassed to admit the hoops we jump through to overcome moth-eaten guidelines.)

If you ever toyed with the beta version of Tyrer-Cuzick 8.0, you might have discovered under TOOLS that you could apply SNP risk. That’s right. If you had access to SNP data, you could plug it into the T-C model on your own. The problem was obvious, at least in the U.S.: “Where do you get this free-standing SNP data?” The commercial entities would only provide their “final score,” which was a Gail model foundation with SNPs thrown in. How could you arrive at an independent SNP risk? And that’s the reason I’m writing. Myriad Genetics has started to do this for you – automatically, at no cost, with caveats to follow – by combining SNP results with the Tyrer-Cuzick model, version 7.0, resulting in the riskScore®.

Here are the key points about this semi-major development:

First of all, when I heard this was going to happen, I made the mistake of thinking it would apply to all women who had tested negative for the predisposition genes. This would have been the same errant step that prior companies had taken with SNPs. Instead, I was relieved to learn that, at first, the application of SNPs will only occur in a select group, and using Tyrer-Cuzick rather than the Gail.

Because the T-C model is based on women of European ancestry, Myriad will not report SNPs in the form of riskScore® for other ethnicities (until data emerges). Further inclusion/exclusion criteria: patients must be under age 85 (the T-C model does not include calculations for immortality), no personal history of breast cancer, no LCIS, no atypical hyperplasia or proliferative change on a prior biopsy, not even a prior biopsy with unknown results. And, the riskScore® will not be calculated if a blood relative is known to carry a mutation in a breast cancer risk gene. Furthermore, breast density will not be included as an independent variable, as evidenced by the choice to go with T-C version 7.0 rather than 8.0.

So, is there anyone left? Yes, quite a few actually. And I have to hand it to Myriad for being so careful with riskScore® whereas prior companies have bulldozed their way into risk assessment with their SNPs. By offering the service at no charge (at least for now), Myriad will report the data only when the inclusion/exclusion criteria are met. But even when they don’t report SNP impact, they will be collecting data so that eventually the exclusions will peel away from the process.

Why so many exclusions? Well, what if a particular SNP is responsible for proliferative change on a biopsy? To count the SNP risk and the biopsy risk would be counting the same risk twice, artificially raising risk above what it should be (as if T-C doesn’t already trend toward higher numbers). Or, what if certain SNPs work together to cause breast density? Again, we would be combining different manifestations of the same risk factor.

In practice, when one receives a riskScore® that includes 80-plus markers (mostly SNPs), it is not readily apparent what the risk would have been using the Tyrer-Cuzick alone. However, it’s there on the report, so you can appreciate the impact of the SNPs by comparing the T-C alone to the riskScore® that includes the SNPs. The impact of the SNPs is usually going to be modest, especially when one considers the difference spread out over many remaining years. However, in this world of “20% lifetime risk” for determining insurance coverage of MRI screening, a few percentage points can make the difference.

More importantly, from the scientific standpoint, it will allow Myriad to accumulate SNP data and learn how it interrelates with the known risk factors.

With the recent discovery of 65 new breast cancer-related SNPs (reported shortly after Myriad’s announcement), it begs the question: Where are we headed? Are we even going in the right direction? Is the data going to accumulate so fast that our risk calculations are obsolete within months after we document calculated risk?

And that brings me to the place where we older doctors congregate – back porch philosophy.

My risk assessment program predates the commonly used mathematical models. Starting one of the first such programs in the country, I had to rely on epidemiologic models that had not yet broken through to clinicians. For example, the Ottman tables were published in Lancet in 1983 by Ruth Ottman, Malcolm Pike, Mary-Claire King and Brian Henderson. With far more sophistication than the Gail model (albeit focused only on family history), taking into account “age at diagnosis” for relatives, bilaterality in affected relatives, and specific age intervals of the unaffected proband, one could counsel patients as to their absolute risk for breast cancer over a defined period of time.

And…we had the DuPont tables published in 1989 (Stat Med 1989; 8:641-651) that converted relative risks to absolute risks over a defined period of time, preferably using the 20-year table rather than “lifetime.” You could estimate an overall RR from available literature at the time, then select the patient’s age on the graph and follow the curves to the absolute risk for breast cancer.
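The spirit of that conversion can be sketched as scaling the baseline cumulative hazard by the relative risk and converting back to a probability. This is a generic textbook approximation, not the DuPont method itself, and the baseline 20-year risk below is a made-up placeholder, not a value from the tables.

```python
import math

# Generic RR -> absolute-risk conversion over a defined interval
# (an approximation; the baseline risk is a hypothetical placeholder).
def absolute_risk(baseline_risk, rr):
    """Scale the baseline cumulative hazard by RR, convert back to risk."""
    baseline_hazard = -math.log(1 - baseline_risk)
    return 1 - math.exp(-rr * baseline_hazard)

# e.g., a hypothetical 4% baseline 20-year risk and an overall RR of 2.5:
print(round(absolute_risk(0.04, 2.5), 3))  # -> 0.097, i.e., ~9.7% over 20 years
```

Note that the answer is slightly less than a naive 4% × 2.5 = 10%, because risks compound on the hazard scale rather than adding linearly; the graphical tables accomplished the same conversion by eye.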

Then came the Gail model where there was a subtle shift away from science and toward a mathematical pragmatism that few considered as awkward at the time. Risk factors were merged through math, not biology. Yes, there were articles that studied risk factors in couplets, and a few in triplets. But, of course, it was impossible to perform a defined cohort study on every possible combination of risks. Instead, we took a leap of faith toward mathematical models that were very good for predicting the number of cancer cases in a clinical trial (the overestimates largely neutralized the underestimates) but admittedly, at the individual level, the models were marginal, at best, when it came to predicting future breast cancer.

Originally, breast cancer risk assessment began as a fledgling science. Efforts were made to understand “Why?” Why did nulliparity increase risk? Why did a late first full-term pregnancy actually impart slightly greater risk than nulliparity? And on and on. If we could figure out the “why,” then we’d be that much closer to prevention strategies. We had theories to explain these things: estrogen “window” theory, the ovulatory theory, the cellular differentiation theory, circulating hormone theories, etc.

But theories – no more. And this is where we depart from science and move toward technology. And when the dust settles, breast cancer risk assessment might end up as pure technology – that is, we know it works, even though we don’t know how or why.

When the announcement was made about the 300-site study of SNPs (they never used the term SNPs in the article) wherein 65 new ones were proposed, I wrote a brief comment on the website www.breastcarenetwork.com that we are generating data faster than we can process it. Dr. Barry Rosen followed my comment with his own, implicating the need for IBM’s Watson to intervene. Dr. Rosen is absolutely correct. As more and more SNP data emerge, the blending with known risk factors will progress and prediction should improve. But just so we remember – if we don’t understand the “why,” then we will be at the mercy of technology rather than revelatory science. If technology is to be our new master, I hope it treats us well.


The Shameless Art of Book Promotion

 

1962 selfie age almost 13

The economics of a book signing don’t add up. You can easily spend more on travel than you’ll receive in royalties from book sales (only the mega-sellers have expense-paid book tours). So, it begs the question, “Why do it?”

Primarily, for the “free” PR that comes with it, that is, the notice in the local newspaper, the signs at the bookstore prior to the actual signing, and the signed copies you leave behind. But the fact remains, it’s an old-fashioned vestige of a rapidly changing industry.

Author and Publisher have a strange, semi-synergistic relationship. The author expects the publisher to go to extraordinary lengths to promote the new release, while the publisher expects the author to be the primary motivator of sales. Both benefit from the other’s actions, but usually the fingers of blame point away from one toward the other.

“Over a million copies sold” might be what you remember best when reading about successful books, but those are few and far between. Depending on the genre, a small publisher might be happy with a few thousand in sales. The author’s cut with a standard royalty publisher is 15%, give or take. That’s a few bucks per book. When you consider the time spent writing the book and then finding a publisher, followed by production and promotion, you can rest assured the income generated is going to be less than a dollar per hour. Or, approximately the same that I made mowing lawns in 1962 when I took the pioneering selfie (above) with my Kodak Brownie Starflash, one month before turning 13.

Even a successful book can be a shock to the author looking at his or her first sales report. I had the good fortune to hit a home run with my novel Flatbellies, published in 2001. On paper, the royalties should have matched nicely to the nationwide sales figures…except for an innocent little clause in the contract that stated that when the books were sold at “deep discount,” no royalties would be paid. In my naïve mind, I thought this meant that after all routine sales outlets and bookstores had their field day, and when only a stack of rock-bottom-remainders was left, there would be no royalties. But no-o-o-o. Deep discount sales were routine and common from the start, so for the majority of Flatbellies sales, I received exactly nothing. So much for earning one’s living by writing (even successful writers usually keep their day jobs).

Today, writers have the option of self-publishing with far more efficiency and hope for success than in the days of old, when the self-publishing author had to buy a minimum of 1,000 to 3,000 copies of his or her own book, then store them in the garage upon discovering that bookstores rarely stock self-published books. With print-on-demand now available, there are self-publishing success stories, especially among those who know how to use social media to maximal benefit. The author retains a much larger cut, but since there’s no distribution access to bookstores, the emphasis becomes online sales, theoretically to an established and loyal readership.

But for those of us who see the traditional royalty publisher as the only measure of validation for quality writing, we have one option – promote your book. Not because of the financial windfall, which is most unlikely, but to generate sales figures that become part of your writer’s résumé. That is, when you pitch a book idea to a new publisher, first and foremost they want to know your track record. What have you published in the past and what were the sales figures? And that’s why you do everything possible to jump-start sales.

In spite of the changes in publishing, one thing has never changed. Without a movie to accompany a book (and excluding celebrity tomes), success comes through word of mouth. Today, “word of mouth” is greatly facilitated and magnified by social media. But the usual approach by publishers is to send out a blast of PR upon releasing the book, then after that brief blast, the book is on its own. The working premise is: word-of-mouth will generate sales if the book is received well. If not, the publisher won’t throw good money after bad. The author might hang on a little longer, pitching the book wherever possible, to get that exponential sales curve going, but without word-of-mouth (including all forms of social media), sales will stagnate, the run will be over, and the track record will make it that much harder to find a publisher for the next book.

I’ve had the wonderful luxury of writing a novel under contract wherein I knew it would be released by a major New York publisher upon completion. I doubt it will ever happen again. (University Boulevard was a contracted deal as a sequel to Flatbellies.) Nevertheless, this doesn’t keep me from trying. I have a “nearly completed” trilogy with the first book of three being a stand-alone novel with which I hope to get my foot in the door, allowing the trilogy to be published later. It took from 1999 to 2010 to write the trilogy, and I didn’t quite finish due to medical writing and then my shifted focus toward Killing Albert Berch. However, I’m back to putting the finishing touches on this trilogy, while working on two other non-fiction projects at the same time. There is a fairly high chance that none of these books I’m working on will ever go to print.

And that’s why I shamelessly promote Killing Albert Berch, along with my prior books that are still available on Amazon. And, it’s why Amazon customer reviews are important, it’s why Barnes & Noble customer reviews are important, and it’s why a serious reader should participate along with a million other readers at sites like www.GoodReads.com where customer reviews are welcome as well.

So, it’s simple really. While I’m promoting one book, I’m actually saying, “I have three completed novels and two other non-fiction books in the works. Nearly every free minute for the past two decades has gone into this body of work, and I’d really like to get these things in print before I drop dead.”

Truth is, of course, Killing Albert Berch was a labor of love to finish what my grandmother and mother started. I had no great visions of its sales potential when I began. I didn’t even know if it would be worth publishing at all. It was surprising to me, as much as anyone, that I would discover so many odd twists in the story that it morphed from an interesting family plot to a Grade A soap opera.

So now, let me tease you with the titles of my three novels that may or may not ever be published, collectively called – The Brainbow Trilogy. Book One is called Nutshell. Book Two is Cannibal Club. Book Three is Heavenly Blues. Setting: a medical school located next door to a mental institution in the 1970s, in a desolate spot in far west Texas. The school is known for its cutting edge mental health program and experimental psychosurgery. Plot: A mad scientist insists on doing good in spite of himself, and the greater his deterioration and the wider his path of destruction, the greater the good.

But to get there, it’s Killing Albert Berch and my backlist of prior works for now. Visit: www.KillingAlbertBerch.com and if so inspired after reading the book, I’d certainly welcome a customer review, somewhere.


Time Out

Jesse Berch et al

WHAT’S WRONG WITH THIS PICTURE?  IT’S FROM THE LIBRARY OF CONGRESS CATALOG, AND WAS RECENTLY USED IN A PBS DOCUMENTARY TO ILLUSTRATE “SLAVE TRACKERS.”  BUT IF YOU STUDY THE PHOTO, THINGS ARE NOT QUITE RIGHT FOR THAT EXPLANATION. 

INDEED, THE TRUE BACK STORY TO THIS PHOTO CRACKED THE CASE IN MY NEW BOOK, KILLING ALBERT BERCH, EXPLAINING WHY MY GRANDFATHER, AT AGE 30, DECIDED TO VIOLATE THE SUNDOWN LAW IN 1923 MARLOW (OK), AND PAID FOR IT WITH HIS LIFE AND THE LIFE OF HIS AFRICAN-AMERICAN HOTEL PORTER. 

(SEE BELOW FOR MORE INFORMATION ABOUT THE PHOTO)    

Although I’ve intended this blog to be dedicated to breast cancer controversies, it’s time for a time out.  This web site has been in operation since April 2015, doubling readership each year, and I haven’t missed a month of blogging yet — until now.  As you might know, I juggle two lives — physician and author.  Every now and then, the latter takes over, as is happening now with the November 7 release of my new book, Killing Albert Berch.

The book has its own web site, if you’re interested: www.KillingAlbertBerch.com.  And for those who live in the area, here’s the current line-up for book signings:

November 18, 2017 — 7:30 to 9:00pm — Decopolis — Tulsa, OK

November 19, 2017 — 2:00 to 4:00pm — Barnes & Noble (@Mem & May) — OKC

November 28, 2017 — 6:30 to 8:00pm — Full Circle Book Store — OKC

November 30, 2017 — 6:00 to 7:30pm — Best of Books — Edmond, OK

December 6, 2017 — 5:00 to 7:00pm — The Twig Book Shop — San Antonio, TX

AND NOW, BACK TO THE PHOTO ABOVE.  FIRST OF ALL, WOULD YOU EXPECT AN ESCAPED SLAVE TO BE DRESSED IN HER SUNDAY BEST?  AND WHILE IT LOOKS LIKE THE MAN ON THE LEFT IS AIMING HIS GUN AT THE POOR GIRL’S HEAD, TAKE A LOOK AT THE ANGLE OF THE OTHER MAN’S GUN.  THIS IS AN 1862 STUDIO PORTRAIT AND THE SUBJECTS HAVE BEEN “ARRANGED” IN LIKE POSE BY THE AFRICAN-AMERICAN PHOTOGRAPHER. 

SLAVE TRACKERS?  NO.  THESE TWO MEN HAVE JUST COMPLETED A DANGEROUS ASSIGNMENT TO ESCORT THE GIRL TO SAFETY.  THE MAN ON THE LEFT IS MY GREAT-GREAT GRANDFATHER, JESSE BERCH, GRANDFATHER TO THE SUBJECT OF MY BOOK, ALBERT WELDON BERCH, MURDERED BY A MOB THE SAME YEAR THAT KKK MEMBERSHIP PEAKED — 1923.

 

A Variety of Variants

Watson, ABH & wife Barbara, 1994 — Dr. James Watson (left), Nobel Laureate for discovering DNA structure, visits OKC where he enjoys casual conversation with Barbara Hollingsworth (right) far more than topics about genetic testing proposed by Dr. Alan Hollingsworth (looking on from afar).

 

Just when you thought it was safe to go back into the room and tell a patient that her variant found on genetic testing was “of uncertain significance, but essentially negative,” and “we’ll keep you posted if there’s an update,” the lexicon changes. Gone is the oppressive and abrasive “deleterious mutation.” Gone is the adorable word that sounds like the attributes of a superhero – “polymorphism.” Instead, we now have a variety of variants. At least, that’s what’s in the works. All this on the heels of the umpteenth study that shows persistent confusion about the term “variant of uncertain significance.”

Nothing has caused more headaches and confusion in the world of genetic testing for cancer predispositions than the “variant of uncertain significance” (VUS). The VUS has become so commonplace with multi-gene panels (36% for us) that we coach patients ahead of time about its probability and to “let us do the worrying.” I don’t know if that bromide helps or not, but we certainly feel better. The very word (“variant”) that has caused more bizarre counseling than anything else in genetic testing is – rather than getting demoted or kicked out of our lexicon – getting a makeover, so that it will be word-of-choice for all possible outcomes with genetic testing that are not completely “normal.”

That’s right. More variants rather than fewer. Most of us who do genetic testing have brushed aside our angst about the VUS rate in multi-gene panels, under the presumption that, as more data accumulates, the probability of getting a VUS will become smaller and smaller, just as it did with BRCA test results. And that would be a fair assumption (although upgrades to “probably deleterious” have actually increased). Problem is, we are in the middle of a transition to a new lexicon that is going to use the word “variant” (or “alteration” or “change”) for everyone who is not composed of perfectly normal DNA. Of course, with whole exome sequencing, everyone will have variants.

We have polysemy to blame for this. Recall from my July 2017 editorial https://alanhollingsworth.wordpress.com/2017/07/ that I first discovered the word “polysemy” after being referenced in the May 2017 issue of the Journal of Volcanology and Geothermal Research. My theme had been precision language as the first stepping stone to precision medicine. To repeat, “polysemy” is when a word has more than one distinct meaning, even though the definitions are related. More specifically, in science, polysemy is when the same word has different meanings in different disciplines. And that’s exactly why the words “mutation” and “polymorphism” are to be phased out. The words are used differently by different disciplines.

But the point of recognizing polysemy is to tighten up our definitions with more precision, not more confusion. More clarity, not ambiguity. In the case of the new lexicon, “variant” rules.

Just as we clinicians have ASCO, NCCN, ACS, ACR, ASTRO, ASBrS, etc. that generate guidelines, the geneticists around the world have the Human Genome Variation Society (HGVS), which, with singular authority, dictates the lexicon for all. Did you ever wonder what all those little symbols are that describe mutations, in addition to the numbers and letters? Well, there are pages of instruction from the HGVS in the use of those symbols, numbers and letters. But in their 2016 consensus statement, they were joined by the Human Variome Project (HVP) and the Human Genome Organisation (HUGO), and it’s here that they came up with this “radical” proposal for a change in the lexicon (published in Human Mutation 2016; 37:564-569):

In contrast to the original recommendations, the terms “polymorphism” and “mutation” are no longer used because both terms have assumed imprecise meanings in colloquial use. Polymorphism is confusing because in some disciplines it refers to a sequence variation that is not disease causing, whereas in other disciplines it refers to a variant found at a frequency of 1% or higher in a population. Similarly, mutation is confusing since it is used both to indicate a “change” and a “disease-causing change.” In addition, “mutation” has developed a negative connotation (Condit et al., 2002; Cotton, 2002), whereas the term “variant” has a positive value in discussions between medical doctors and patients by dedramatizing the implication of the many, often largely uncharacterized, changes detected. Therefore, following recommendations of the Human Genome Variation Society (HGVS) and American College of Medical Genetics (ACMG) (Richards et al., 2015), we only use neutral terms such as “variant,” “alteration,” and “change.”

 The fact that the statement says, “We only use neutral terms…” (as opposed to “You must…”) leaves an opening for us to continue our imprecise, confusing and dramatizing terminology in attestation to the bright side of polysemy. Nevertheless, I’ve seen some softening on my genetic testing reports lately, at least from some of the labs. Others are steadfastly reporting, “Deleterious mutation.”

But are we headed for the following in our testing reports?

I – Variant with strong propensity for phenotypic disease that might be considered not-benign

II – Variant with probable propensity…

III – Variant with possible or unknown propensity…

IV – Variant that is unlikely to be associated with phenotypic expression….

V – Normal

(feel free to substitute “alteration” or “change” for “variant”)

 

Will this improve the situation, or is it just patchwork polysemy that is going to make things worse?

I’d be attacking my own position on the need to eradicate polysemy (the word just rolls from the tongue when you use it at every possible opportunity) if I were to take a stance against this new lexicon. Nevertheless, painting all DNA scratch marks with the word “variant” (or “alteration” or “change”) could homogenize the significance of these alterations, making the confusion even worse.

But who knows? In this politically correct, kinder, gentler world, maybe a new lexicon is exactly what we need to get more people to sign up for genetic testing where we can manage disease before it happens. A “disease-causing variant” seems much more vulnerable to interventions than a rock solid mutation.

Yet, sometimes, these terminology proposals simply don’t fly. Clinicians are fickle and unpredictable when told to change their tune. Can we really practice medical genetics, though, without using the word “mutation”? And what’s next? Will the next superhero movie alter its title to the Teen-age Variant Ninja Turtles? Will the next blockbuster in 2018 be X-Men: The New Variants? Maybe it’s time for a change (or an alteration), but do we really want to camouflage the underlying clinical importance when one’s DNA is misspelled?

 

The Alchemy of Overdiagnosis (Part 3)

A 47-year-old female has a screen-detected invasive ductal carcinoma, 1.0cm, Grade 2, node negative (Luminal B if we delve a little deeper). She undergoes breast conservation and radiation therapy. Was this cancer overdiagnosed?

Of course, we don’t know, but there are three possible scenarios regarding her tumor biology:

Biologic A – her tumor was local at the time of mammographic discovery and would still be local in a year or two when it became palpable. Thus, screening and “early diagnosis” were of no benefit in the long run. She will be cured whether the diagnosis is early or later on.

Biologic B – her tumor was already systemic at the time of mammographic discovery, so screening was of no benefit, and “early diagnosis” was only an illusion.

Note: The original Fisher Theory of breast cancer biology ends here with only two options – Biologic A and Biologic B. That said, Fisher Theory was developed prior to screening mammography, back when diagnosis occurred at a single point in time upon tumor palpation. Early detection through asymptomatic screening converted that single point into a window of opportunity by gaining a jump on the natural history of the disease. The success of the mammographic screening trials introduced a third scenario that plays out in the minority of patients, although enough to achieve a measurable mortality reduction, that is:

Goldilocks Biology – her tumor is local at the time of mammographic discovery, but its natural history is such that, if left in the breast until it becomes palpable, it will become systemic. This is the only scenario where screening benefits the patient. Even with Goldilocks biology, the screening must occur at the right point in time.

Some are astounded to think that only a minority of patients have Goldilocks biology. Stated alternatively, most so-called “early detections” don’t save a life. This is not a serious point of contention among screening epidemiologists (although there are wide variations in quantification). If Goldilocks biology dominated, then the mortality reduction with mammographic screening would be far greater than what we actually see.

As for Biologic A, this is the fodder for anti-screening epidemiologists and guardians of public health who have used the alchemy of indirect reasoning and a warehouse of assumptions to convert this well-known group of patients, perhaps one-third of mammographic discoveries, into “Overdiagnosis.” The distinction is this: Biologic A patients will eventually progress (if they live long enough) and will still be diagnosed and treated for breast cancer without screening.

Not true in Overdiagnosis. With overdiagnosis, the Biologic A tumors never progress. Rather than the length bias of slow-growing tumors, overdiagnosed cancers are “pseudocancers” that would never harm the patient if it weren’t for that darned screening obsession in countries like the U.S. Dr. H. Gilbert Welch makes the claim that this is the case for 70,000 women every year in the U.S., such that more than a million women are walking around today thinking they’ve been cured of their cancer when, in fact, they only had pseudocancers in the first place. (The only pseudocancers, I would submit, are actually misdiagnoses rather than overdiagnosis, e.g. the adenosis family of lesions and radial scars/CSLs, no small problem…but I’ll save that for another day.)

Now, returning to our 47-year-old newly diagnosed patient, let’s assign her to Goldilocks biology and an optimally-timed mammogram that found the lesion while still local in the breast. This patient is one of the lucky minority whose life is indeed saved by screening mammography. On the final day of her radiation treatment, however, she walks out of the hospital and is struck by the proverbial bus and dies on the spot. What do the strange brews and concoctions of Big Data do with this patient? She will appear in the column of Overdiagnosis.

This scenario is obviously not going to explain 70,000 cases a year of pseudocancers, but the point is this: Life expectancy allows for gross manipulation in this debate, while Big Data has severe shortcomings. Amazingly, the SEER data is used routinely and frequently to endorse and solidify overdiagnosis, in spite of the fact that “method of detection” is not part of the SEER data. How can you draw reliable conclusions about screening mammography when you have no idea about mammographic utilization, compliance or method of detection?

The SEER data is often used to explore the biology of small cancers, pointing out that aggressive tumors occur as interval cancers. This is where the assumptions kick in, with smaller tumors “assumed” to have been discovered by mammography. It is true that there is a modest trend toward lower grade cancers detected by screening and higher grade cancers appearing during screening intervals. Thus, we have length bias that is very real, but this is still overpowered by the mortality reductions seen in the prospective, randomized screening trials.

That said, the difference between the biology of screen-detected and interval cancers is not all that it’s made out to be. The Overdiagnosis Mob tries to convince us that interval cancers are the only killers and that they pop up in between screens. I’ve spent a great deal of time over the years reviewing the interval cancer data (and there’s a lot of it), and it’s not exactly what many claim. The majority of interval tumors have nothing to do with aggressiveness. The reason that they “pop up” in between screens is that they were “garden variety” breast cancers buried in the white patches of mammography, finally emerging as palpable. They were present on the prior mammogram and large enough to be detected had they interfaced with fatty tissue. But by virtue of their location in a dense patch, they were not visible. They would have been easily detectable, however, had ultrasound or MRI been used. MOST interval cancers are mammographic failures that have nothing to do with the inherent biology of the cancer.

For many years, this was conjecture supported by reasonably good evidence, but Dr. Christiane Kuhl put the argument to rest with her landmark study of screening MRI in a normal risk population. With the reliable detection of MRI, guess how many cancers were left to “pop up” in the interval between screenings? ZERO. That’s right. The dreaded interval cancer disappeared by simply using a tool that detects cancer more reliably than mammography. Again, the majority of interval cancers are mammographic misses that have nothing to do with biology. Even the high-risk MRI screening trials, where interval cancers should have been in the 30-40% range, saw, with few exceptions, single-digit rates of interval cancers.

And what is the response to MRI (and other modalities) by the Overdiagnosis Cabal? Here’s the reason why we need to make the distinction between Biologic A tumors with length bias vs. true overdiagnosis — the Overdiagnosis Squad condemns improvements in screening with multi-modality imaging because “you will only make the overdiagnosis problem worse!” And there you have it, in print, by anti-screeners as well as the U.S. Preventive Services Task Force – “you will only make the overdiagnosis problem worse.”

Definition, please. “Overdiagnosis” is the detection of disease that would NEVER cause symptoms or death during a patient’s expected lifetime. Aye, there’s the rub – expected lifetime. This is where strange brews and concoctions are used to convert indolent cancers into pseudocancers.

In the epidemiology of cancer screening, overdiagnosis is a given. It’s only a matter of quantifying the degree. And if your entire career is devoted to the anti-screening position, such quantification can run amok. One of the quickest ways for the overdiagnosis rate to escalate is through patients dying earlier than expected. How much of overdiagnosis is related to biology and how much is due to premature death of the patient?

This entire debate, however, would be moot if it weren’t for the evil cousin of overdiagnosis, that is, overtreatment. Who cares what you name it? We are overtreating too many patients.

That’s a legitimate point, and finally, after many years of neglect, we are seeing a huge push in the direction of limiting overtreatment. But the reason I am often railing against the Overdiagnosis Club is the fact that overdiagnosis (with presumed overtreatment) is being used to denigrate screening – not simply mammographic screening with all its known warts, but the technologies that allow us to find the cancers currently missed by mammography – ultrasound, PET, MBI, contrast-enhanced mammography and MRI. All of them are under attack for their potential for making the “overdiagnosis crisis even worse than it is already.”

The focus on overdiagnosis of invasive breast cancer has risen among public health guardians from an obscure sect 10 years ago into the dominant religion of the day for many. Belief in the unseen. Faith in the miraculous regression of invasive cancers (70,000 times a year) if we would only leave them alone and keep them out of the hands of breast radiologists and their clinical parasites.

But it is not a religion of passivity. The Overdiagnosis Cult (my favorite moniker) attacks from all angles. If you say that they have failed to correct for lead time at the end of the randomized screening trials (which tends to make overdiagnosis disappear), they will point out that this criticism is based on the false assumption that all tumors progress. If you say they are not accounting for patient deaths before the natural history plays out, they will show you the overdiagnosis rate they have calculated for women in their 40s who are screened and have plenty of time left for the natural history to emerge.

But there’s an odd paradox to their impressive arguments – the more that the Overdiagnosis Cult insists that these lesions NEVER progress, the more they are backed into the corner of having to explain why – unlike prostate cancer – we can’t find the direct evidence for overdiagnosis in breast cancer. Autopsy data doesn’t support it. And now, the regression study by Dr. Sickles’ group doesn’t support it. A true believer, however, is never backed into a corner — Dr. Welch begins his presentation by explaining there is no possible way to generate direct evidence for overdiagnosis since these tumors are removed after discovery. Thus, like “black holes” in science, we must use indirect methodology.

As we’ve seen in Part 1 and Part 2, however, this claim of “indirect only” is not entirely true. We get to directly observe what would happen if overdiagnosis of invasive breast cancer were as common as Dr. Welch and others have claimed. We would see either 1) tumor quiescence, or 2) tumor regression.

Tumor quiescence has been knocked out of the saddle by Dr. Welch’s own autopsy data from 20 years ago when his target was merely DCIS (he showed invasive cancers present in autopsy series only 1.3% of the time, the same as disease prevalence in the living, effectively ruling out quiescence as an explanation). Then, with the recent study by the Society of Breast Imaging, we know that tumor regression doesn’t happen either.

Okay, how about a 0.7cm tubular cancer found with screening mammography in a 78-year-old with co-morbidities? From a practical standpoint, it doesn’t matter whether you call it never-progressing or slowly-progressing. She should have a wide excision and nothing else, perhaps endocrine therapy. If we did that as our routine, we would not be suffering the attacks of the Overdiagnosis Cult to the same degree (and the same can be said for other presentations as well).

In the case of this 78-year-old, we have life expectancy supervening. But what is the natural history of the small, pure tubular cancer? Can you remember the last time you treated a 3.0cm pure tubular cancer? Probably not. Why are tubular cancers always small? And, as a corollary, why are they usually mammographic discoveries? Certainly, overdiagnosis applies here, you say. Maybe so, but only when life expectancy is part of the definition. So, yes, this 78-year-old would likely have been fine without screening, but what if she lives another 20 years? If tubular cancers don’t progress and don’t regress, then they should be identified frequently at autopsy. But they are not.

As it turns out, historical literature is available, generated in the early days of mammographic screening when tubular cancers started to emerge on a regular basis. What pathologists found was a direct correlation between size and “purity.” That is, as tumor diameter increased, less and less of the tumor surface area had strictly tubular features. Only remnants of tubular persisted above 2.0cm. This was judged from many cases, simply by recording percentage tubular vs. percentage non-tubular (or “garden variety” IDC) plotted against size. The larger the tumor, the less percentage was tubular. I can only think of one explanation – tubular cancers de-differentiate as they grow. It may take a long time, but they are not stagnant and thus are not overdiagnosed in the strictest sense. They are progressing. The most indolent invasive cancer known to exist progresses very slowly, not merely by an increase in size, but by additional mutations that allow the cancer to become “more malignant” over time (that is, the basic scientists’ definition of “progression,” as in “initiation, promotion, progression,” the three steps of carcinogenesis).

The Overdiagnosis Cult are iconoclasts, and they revel in their position, authoring articles and books and generating headlines based on “man bites dog,” while the breast radiologists of the world go about their business saving lives with mammographic screening, all the while enduring constant media attacks.

There has been an enormous shift these past 10 years toward anti-screening sentiment, linked to the false belief that systemic therapies have advanced to the point where we can de-escalate screening efforts. It is true that once a “cure” is established for breast cancer, we will no longer need to screen. But the call to cut back on screening is premature. Way too many women are still dying of breast cancer, and if you understand the unique power of the “reverse randomized trial” from Mass General (Webb ML, Cady B, Michaelson JS, et al. Cancer 2014; 120:2839-2846), then you know that most of those deaths (71%) occur in unscreened women.

I am not in favor of the status quo. My position is exactly opposite the anti-screeners – that is, we should be doing more. More, that is, in properly selected patients. The sensitivity of mammography has been overstated for years. Now that we have multi-modality imaging studies, we know the truth. And the truth is that the mortality reductions we see in the historic clinical trials of screening mammography are relatively modest only because detectable cancers have gone undetected. Goldilocks biology doesn’t help if the cancer is hidden by mammographic density. We ought to be doing everything possible to find those cancers. The mortality reduction through early detection (not limited to mammography) is actually greater than we currently believe because we are using data from obsolete technology that probably missed half of what we could detect today. Yet, we are told to “back off” on screening?

Sadly, we have the technology to find nearly all breast cancers when still small and node-negative, including the aggressive ones. Biologic B cancers, of course, will still be in the mix, so the mortality can never be reduced to zero with early detection. But instead of capitalizing on this incredible technology, we are being told that we will do more harm than good.

If you don’t think this is true, then read the fine print in the 2015 report from the U.S. Preventive Services Task Force. For women with dense breasts, the Task Force has issued a Grade “I” (Insufficient Evidence) – that is, in their own words: “The USPSTF concludes that the current evidence is insufficient to assess the balance of benefits and harms of adjunctive screening for breast cancer using breast ultrasound, magnetic resonance imaging (MRI), tomosynthesis, or other modalities in women identified to have dense breasts on an otherwise negative screening mammogram.” The Task Force goes on to explain their concern about the biggest harm of them all – overdiagnosis.

Now, if you happen to have Level D breast density, then the sensitivity for cancer detection with 2-D digital mammography is somewhere around 30-40%. Then, if you rank order sensitivity with available technologies (US, PET, MBI, MRI, 3D tomosynthesis, Contrast-enhanced Mammography), then the Task Force recommendation – mammography – comes in dead last. As a result, the Task Force is exclusively endorsing the worst possible imaging modality for breast cancer detection in these women.

They are not ignorant of this fact, but they refuse to budge from their no-win situation of relying entirely on the results from the mortality reduction endpoint in prospective-randomized trials (which can’t possibly keep up with developing technology). Of course, there are no prospective trials with a mortality endpoint for these modalities, so the slaves to RCTs, who refuse to yield to the quiet voice of rational thought, are stuck in the technology of the 1970s and 1980s.

The Task Force sounds semi-apologetic about their plight, noting wistfully that it does appear that the other modalities increase cancer detection rates through improved sensitivity, but then comes their own chilling retort as they hold themselves in check – unfortunately, overdiagnosis has a direct and unavoidable relationship to sensitivity. If you improve the sensitivity of cancer detection, then the overdiagnosis problem is made worse. Consider the twisted mindset here – the fear of overdiagnosis is greater than the benefit of early detection (this, in spite of the fact that we already know that lives are saved, that is, the benefit outweighs the risk).

The endpoint of this warped logic is what should be terrifying to everyone – we must stop overdiagnosis in its tracks, and if stopping screening entirely is the only way to do it, then so be it. And if you think I’m being my usual hyperbolic self – witness the Swiss Medical Board, which in 2014 issued that very decree, recommending the cessation of screening mammography in Switzerland, given that harms outweigh benefit. And the greatest of those harms is overdiagnosis.

And that is why I rail against the inflated impact of overdiagnosis in breast cancer screening. Are we really going to let women die of breast cancer in the name of ethereal pseudocancers that no one can see or feel or document? So those of us who use breast MRI for screening and see clear-cut tumor downstaging (and incredibly low mortality rates emerging) are supposed to throw our hands up in the air and say, “Well, we must consider the alternative explanation – even though we can’t prove it – our outcomes are good only because we are diagnosing pseudocancers. And we should accept this as truth over the other straightforward explanation that early detection saves lives, even in our population of high risk women who have watched their relatives die of the disease, which is, in fact, why they come to us for screening MRI in the first place.”

I’m not endorsing mass multi-modality screening. Currently, we use risk levels to identify candidates for MRI screening, and we use density levels to select for ultrasound screening. Sounds good, but cost-effectiveness is still an issue, even though this is what some would term “precision medicine.” Cancer Detection Rates (CDRs) with multimodality imaging have impressive numbers for the prevalence screen, but then over time, as one slips into routine incidence screens, the cancer detection rates fall to a level that is harder to justify.

Why did we jump to an MRI policy of annual or nothing? Shouldn’t we study biennial or triennial screening for different levels of risk? Shouldn’t we be looking at strategies for post-mammography testing that could select patients for US or MRI or whatever, based on blood testing? (Then the auxiliary imaging is only done with a positive blood test.) This could open up the possibility of MRI for all women, independent of risk levels, yet not “automatically” performing MRI on a routine basis. Potentially, a positive blood test would convert the situation from screening to diagnostic. Instead of cancer detection rates plummeting as one moves to incidence screens with multi-modality approaches, there would be no scheduled, routine adjunct screen – and CDRs would remain high, with multimodality imaging performed only when the post-mammography test is positive.

We have barely tested the power of early diagnosis of breast cancer. Screening mammography has only teased us with its potential, leaving many cancers behind for early detection. But we’ll never find out the truth, if the policy-makers and Overdiagnosis Cult have their way – placing the fear of overdiagnosis in greater priority than saving lives.

 


Lessons from the Grave (Part 2) — Occult DCIS or Invasion?

In the ongoing controversy about overdiagnosis as one of the harms of breast cancer screening, one of the most important observations is the disease reservoir at autopsy. With the decline in autopsies as a standard practice, our data is historical. But in this case, the historicity works to our advantage, in that autopsy disease reservoirs were established largely in the pre-mammographic era, such that we get to see how often subclinical disease lingered silently in the natural state.

The numbers generated from these old studies have been hanging around a long time, only to resurface in the current era, wildly distorted through a double misstep. First: When these studies are referenced, DCIS and invasion are often lumped together and called “breast cancer” (see Part One July editorial on “Polysemy” – a word or phrase with more than one related meaning). If the two entities were present at autopsy in their usual ratio seen through screening, I would probably not be writing this article. But as we will see shortly, the autopsy series reveal DCIS almost entirely. Second: The next misstep is quoting only the very highest numbers (again, applicable only for DCIS), so it’s not uncommon to hear that the disease reservoir for “breast cancer” in autopsy series is 30-40% (“comparable to prostate or thyroid cancer”). Few seem to remember that there is a wide range for DCIS found at autopsy, and the low end of the published range happens to be zero.

Why is accuracy suddenly so important when discussing the disease reservoir? In recent years, critics of screening are generating sky-high numbers for overdiagnosis, once applied only to DCIS, but now with invasive carcinoma in the crosshairs. All methodologies used to generate these whopping numbers are through indirect observations, accompanied by the claim that “it’s all we’ve got” since there is “no way to directly observe overdiagnosis.” But that’s not entirely true.

Overdiagnosis, as distinguished from length bias, is based on the concept that tumors either regress or become quiescent with no further growth. In a nutshell, with true overdiagnosis, tumors never progress during the life of the patient. Note: this is a shade different from “slow-growing tumors” that are unlikely to kill the host, wherein screen-detection does not improve outcomes. It is these slow-growers that are responsible for length bias in screening studies (with the bias overpowered by mortality reductions). While length bias is a cousin concept to overdiagnosis, it doesn’t have nearly the same firepower when drawing media attention. Who stays awake at night worrying about length bias?

If overdiagnosis of invasive breast cancer is real, then most agree that we can’t recognize it in an individual patient newly diagnosed. However, we are able to make some generalizations about the existence of overdiagnosis through direct observations regarding the natural history of invasive breast cancer. If some breast cancers never progress and therefore never kill the host, then there are only two possible scenarios, and they should be observable: regression or quiescence. “Slow growing” doesn’t cut it – that’s the biology that generates length bias. In contrast, overdiagnosis translates to “pseudocancers.”

How could we directly observe tumor regression or quiescence? Regression is the more challenging scenario because the evidence “disappears.” Some epidemiologists claim, by the way, that they have proven the existence of regression such that it is now a known “fact.” But upon reviewing these studies, you’ll find layer upon layer of indirect evidence sandwiched between two slices of precarious assumptions. In contrast to this deductive process, we do have a way to directly observe the natural history of untreated invasive breast cancer to see if it can regress.

There are some patients who go untreated after a positive biopsy of a screen-detected cancer, yet for whatever reason, still opt to return for their next mammogram. Until recently, there was no systematized approach to track the natural history of screen-detected breast cancer after these long delays. We relied on radiology leaders and colleagues who told us they’d “never seen a single case of cancer regression.” This was a data deficit that I described in my book – Mammography and Early Breast Cancer Detection (devoting 2 chapters to overdiagnosis). Essentially, we were relying on hearsay to support the notion that tumor regression of invasive cancer was pure bunk.

Well, it’s no longer hearsay. In the July 2017 issue of the Journal of the American College of Radiology, Ed Sickles led a group of breast radiologists in a massive effort to identify tumor regression rates (Arleo EK, Monticciolo DL, et al. J Am Coll Radiol 2017; 14:863-867). While certainly not a prospective randomized trial for Level One evidence, this still goes down in history as the first organized attempt to document the validity of breast cancer regression in screen-detected tumors.

From 42 actively practicing Society of Breast Imaging fellows, entire practice results were tallied for a 10-year period, looking for untreated, screen-detected cancers. Of the 6,865,324 screening mammograms performed, there were 25,281 screen-detected invasive cancers and 9,360 screen-detected in situ cancers. Of these, there were 240 cases of untreated invasive breast cancer and 239 cases of untreated DCIS that had imaging follow-up. The number of cases that either decreased in size or regressed completely on the next mammogram was ZERO.
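The sheer scale of that denominator deserves a pause. A minimal arithmetic sketch using only the tallies reported above (the derived rates are my own back-of-the-envelope figures, not reported in the paper):

```python
# Tallies reported from the Society of Breast Imaging survey (Arleo et al., 2017)
screens = 6_865_324              # screening mammograms over the 10-year period
invasive = 25_281                # screen-detected invasive cancers
dcis = 9_360                     # screen-detected in situ cancers
untreated_followed = 240 + 239   # untreated cases with imaging follow-up

# Cancer detection rate per 1,000 screens (derived, not stated in the paper)
cdr = (invasive + dcis) / screens * 1000
print(f"Detection rate: {cdr:.1f} per 1,000 screens")   # approx. 5.0

# Untreated-with-follow-up cases as a fraction of all screen-detected cancers
frac = untreated_followed / (invasive + dcis)
print(f"Untreated, followed: {frac:.2%} of screen-detected cancers")  # approx. 1.4%

# Regressions observed in that untreated cohort, per the survey
print("Regressions observed:", 0)
```

Even though the untreated cohort is a small slice of all screen-detected cancers, 479 directly observed natural histories with zero regressions is, as the text notes, the first organized evidence of its kind.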

Could some of these still have been “overdiagnosed”? Yes, but recall that we’re answering critics like H. Gilbert Welch et al, who have announced repeatedly and publicly that we currently diagnose “pseudocancers” with mammographic screening to the tune of 70,000 cases of invasive cancer every year. While the authors of this new survey of breast radiologists shy away from a direct assault on the pseudocancer claim and overdiagnosis, they point out the inescapable conclusion: while it is conceivable that some of these cancers might be overdiagnosed, the mere fact that NONE of them regressed on imaging after their next mammogram means that extending the screening interval from one year to every 2 years will have ZERO impact on reducing the rate of overdiagnosis. The cancers will still be there next year, waiting.

If invasive cancers don’t regress (note that DCIS didn’t regress in the Sickles study either) then how could there still be overdiagnosis in the group of 479 cancers noted above? Why didn’t the authors go ahead and kick the overdiagnosis enthusiasts while they were down? Because there still could be tumor quiescence. That is, the tumors didn’t regress, but they didn’t grow either – a tumor-host stand-off. And this is where the autopsy data are so critical. The lack of regression does not rule out tumor quiescence. It takes a ONE-TWO punch to knock out overdiagnosis – no regression (as strongly indicated above) and no quiescence (as supported by the invasive disease reservoir, or lack thereof, in autopsy studies).

If tumor quiescence is so incredibly common that it accounts for 70,000 (invasive) pseudocancers every year (over 1,000,000 women walking around today having been treated for fake cancer), then we should see high numbers of invasive cancer in the pre-mammographic autopsy series where disease reservoir was measured. Welch et al can generate incredibly high numbers by capitalizing on slow-growing tumors and length bias, re-christening these tumors as “pseudocancers” through skillful statistical shenanigans. But if we’re going to look for 70,000 pseudocancers a year, it is unavoidable – there must be either: 1) frequent tumor regression, or 2) tumor quiescence as reflected in a high disease reservoir at autopsy.

Yes, it is probably true that one-third of screen-detected invasive cancers would have been cured without screen-detection, but this is a different concept than true overdiagnosis where tumors never emerge clinically. When length bias is the culprit, the favorable tumors will be cured with or without screen-detection. If the patient lives long enough, these tumors will eventually emerge without screening (therein lies the rub – life expectancy). This is why it is so important to correct for lead time at the end of a screening study.

Without knowing the autopsy data, it seems much more probable that tumor quiescence would be the mechanism behind overdiagnosis, rather than tumor regression, the latter now known to be fiction through a direct observation rate of ZERO. If tumors are stagnating at a certain point in their biologic history, then they will accumulate over time and appear in autopsy series in disproportionate numbers compared to what is seen clinically. And if overdiagnosis of invasive breast cancer is as common as is being touted (70,000 per year is a “conservative” calculation, according to Welch et al), then back-of-the-envelope calculations indicate that autopsy findings ought to be similar to what is seen in prostate cancer, or even worse. As it turns out, they’re not even close. Importantly, the autopsy series need to come from a population that has not undergone mammographic screening during life, given that removal of tumors would falsely lower autopsy incidence.
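The back-of-the-envelope logic can be made explicit. This is only a sketch: the annual figure is the Welch et al claim as quoted in the text, while the years-carried value and the population denominator are rough assumptions of my own, chosen only to show the direction of the arithmetic:

```python
# If 70,000 invasive "pseudocancers" arise yearly and never regress, they
# must sit quiescent in the body until death. In an unscreened population,
# then, they should accumulate and surface in autopsy series.
annual_pseudocancers = 70_000     # Welch et al's claimed annual overdiagnoses
years_carried = 30                # assumed mean years a quiescent tumor persists
us_women_40_plus = 75_000_000     # rough population denominator (my assumption)

standing_reservoir = annual_pseudocancers * years_carried   # 2,100,000 tumors
predicted_prevalence = standing_reservoir / us_women_40_plus
observed_autopsy_median = 0.013   # median invasive reservoir, 1997 autopsy review

print(f"Predicted quiescent prevalence: {predicted_prevalence:.1%}")     # 2.8%
print(f"Observed at autopsy (median):   {observed_autopsy_median:.1%}")  # 1.3%
```

Even under these deliberately modest assumptions, quiescence predicts a standing reservoir roughly double the observed autopsy median, and longer accumulation times only widen the gap.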

In sharp contrast to invasive breast cancer, the disease reservoir for prostate cancer at autopsy is roughly proportional to a man’s age. As seen in a review of the world literature addressing autopsy studies (Haas GP, et al. Canadian Journal of Urology 2008; 15:3866-3871), by age 30, one finds 31% of men with foci of prostate cancer, and by age 80, over 80% will have prostate cancer. And what about the disease reservoir in the living? In a prostate cancer prevention trial of finasteride (New Engl J Med 2003; 349:215-224), participants with normal PSA and normal digital exam were offered a random biopsy – a rather startling 15% of those participating had prostate cancer present in the very limited tissue sampling.

Speakers at the podium, still today, will routinely cast breast cancer in the same light as prostate and thyroid cancer, claiming comparable autopsy findings. So, what are the numbers for invasive breast cancer at autopsy? Take a guess. You’ve seen the slides at meetings. You’ve seen the reference made in published articles. You’ve heard the CME tapes. “Autopsy studies show up to 35-40% of patients have occult breast cancer when they die.” The correct number? The incidence of invasive breast cancer at autopsy is 1%. Disengaging DCIS from invasive breast cancer is an eye-opener!

Nearly all the “breast cancers” found in autopsy studies are DCIS. Even DCIS is overstated: when one looks at the photomicrographs from those historic studies, many of the lesions would today be called Atypical Ductal Hyperplasia. Furthermore, a small area of low grade DCIS without calcium is not going to be a clinical problem, so yes, disease might be indolent, but it’s not going to be identified on mammography – thus, it’s not an issue. You can’t overdiagnose if you can’t see the lesion on breast imaging. Still, these small non-calcific areas of low grade DCIS (and borderline lesions) greatly inflate the autopsy numbers. In contrast to mammographically occult low grade lesions, elevated PSAs in men prompt biopsies, which then reveal the occult indolent cancers, making overdiagnosis/overtreatment a significant problem.

Borderline lesion

In the photo above, the lesion is virtually identical to Duct Lesion #10 in Rosai’s 1991 landmark article (Am J Surg Path 1991; 15:209-221), where interobserver variability in borderline epithelial lesions was initially described. For lesion #10, three experienced pathologists called it hyperplasia, one called it ADH, and one called it DCIS. Among the 5 pathology experts, there was not complete agreement on a single lesion in the study of both ductal and lobular lesions. The prevalence of DCIS in autopsy series is inflated to an unknown degree by these difficult, borderline lesions. (Note: Dr. Page later countered the Rosai article to a degree by showing that interobserver variability is greatly reduced when criteria are standardized – complete agreement among 6 pathologists in 58% of the lesions. However, while the variability was remarkably tight from the viewpoint of pathology, from the clinical standpoint where decisions are made, 42% of the cases still had some disagreement among experts.)

I’m not going to deny that overdiagnosis occurs in DCIS. Most are in agreement that some (an unknown number) of these lesions do not progress, or would not progress during the lifetime of the patient, especially lower grade DCIS. Hopefully, the much-anticipated “Comparison of Operative to Monitoring and Endocrine Therapy Trial” (COMET) will sort this out. Good luck to Dr. Hwang and all those participating.

My intent is to draw attention to the remarkably low disease reservoir of invasive breast cancer, no different from disease prevalence in the living. If you perform the initial mammographic screen (prevalence screen) on 1,000 women at age 50, you will find that 1% of them have breast cancer, the same number as found at autopsy (albeit some of these are DCIS). But now add the mammographically occult cancers we can find using screening MRI: Dr. Christiane Kuhl’s series (in women with normal mammograms, normal exams, and most with normal ultrasound) identified an additional 2.2% with cancer on the prevalence screen, again some with DCIS. However, if you remove the DCIS from both mammographic screening and MRI screening, you’ll still have 2% of women with invasive breast cancer as detectable disease prevalence, higher than what is found at autopsy. These numbers do not support overdiagnosis of invasive cancer. They do not support tumor quiescence.
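The comparison in that paragraph can be laid out numerically. A small sketch per 1,000 women, using only the percentages quoted above (the DCIS/invasive split beyond what the text states is not modeled):

```python
# Detectable disease prevalence in the living vs the autopsy reservoir,
# per 1,000 women around age 50 (figures as quoted in the text).
N = 1000
mammo_prevalence_screen = 0.010 * N  # 1.0% found on first (prevalence) mammogram
mri_additional = 0.022 * N           # Kuhl: +2.2% on MRI after normal mammogram
invasive_in_living = 0.020 * N       # ~2.0% invasive once DCIS is removed
invasive_at_autopsy = 0.013 * N      # 1.3% median invasive reservoir (1997 review)

print(f"Detectable cancers per 1,000 (mammo + MRI): "
      f"{mammo_prevalence_screen + mri_additional:.0f}")
print(f"Invasive, living: {invasive_in_living:.0f}  vs  "
      f"invasive, autopsy: {invasive_at_autopsy:.0f}")
```

If quiescent invasive tumors were accumulating over a lifetime, the autopsy reservoir should exceed the detectable prevalence in the living, not trail behind it as these numbers show.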

Stated more emphatically, the autopsy data is the strongest evidence we have against widespread overdiagnosis when it comes to invasive cancer, especially in light of the new information that excludes tumor regression by direct observation. So why do we shoot ourselves in our collective feet by playing into the hands of the epidemiologists who are using every trick in the book to malign breast cancer screening? Why, in the non-radiologist world, have many adopted the 30% overdiagnosis rate, not based on the epidemiologic claims or reasoning, but by believing and promoting incorrect autopsy data? Say what you want about DCIS, but when it comes to invasive cancer, the 30% overdiagnosis rate (by strict criteria where tumors are considered “pseudocancers”) is simply not true.

Now – to my source for the 1% disease reservoir for invasive breast cancer. In the 1990s, when many of us (incorrectly) believed that the key to eradicating breast cancer was to find all cases of DCIS, a young internist (with a specific interest in the hazards of screening healthy populations for a variety of diseases) decided to stake his claim to fame on exposing overdiagnosis in DCIS. He decided to perform a combined review of all available autopsy studies at the time, which consisted of 852 patients total, mostly from the pre-mammography era. The title of his article was: “Using Autopsy Series to Estimate the Disease ‘Reservoir’ for Ductal Carcinoma In Situ of the Breast: How Much More Breast Cancer Can We Find?” The article was published in a journal that has always been a haven to anyone willing to shed light on the harms of screening – Ann Intern Med 1997; 127:1023-1028.

In the 7 autopsy series in this comprehensive combined analysis, the disease reservoir rate for DCIS ranged from 0 to 14.7%, with the higher rates seen when more slides were reviewed (the range here was 9 slides per breast to 275 slides per breast). Granted, one of the 7 studies revealed a 39% incidence of DCIS (borderline photomicrographs notwithstanding) in 109 autopsies when considering only those patients who were of screening age (one must keep in mind that these are often forensic autopsies that include younger patients). However, the median prevalence of DCIS when all patients were considered in the seven studies was only 8.9%.

Seemingly forgotten today, this autopsy review secondarily included invasive cancers as well. The range was very tight, including the “275 slides per breast” study. For invasive disease, the range was from 0 to 1.8% with a median of 1.3%, a conclusive indictment against the alleged phenomenon of tumor quiescence. As for the outlier study above, where 39% had DCIS with extensive sampling, even 275 slides per breast identified invasive cancer in only 0.9%.

So, if we don’t see invasive cancers regress clinically, and we don’t see excessive invasive cancers that are quiescent until death of the host, there is no observable support for overdiagnosis of invasive breast cancer when one examines untreated natural history. And while one can argue that this is still indirect evidence, it is far closer to direct observation of the “black hole of overdiagnosis” than are the large scale epidemiologic reviews that don’t include specific mammography data as to who did and who didn’t comply. And, the lack of disease reservoir is much closer to the truth than the abuse of the historical screening trials for mammography where lead time is not corrected at the end of the study, allowing an “excess” of cancers in the screened group.

Oh, I nearly forgot. The lead author of the DCIS exposé was Dr. H. Gilbert Welch, prominent screening expert from Dartmouth who, subsequent to his little-noted 1997 autopsy article, has gone on to fame by exposing the extraordinarily high rates of overdiagnosis with regard to screen-detected invasive cancer. His indirect evidence of the “black hole” proved beyond a shadow of doubt to many, including what seems like every science journalist in America (hyperbole notwithstanding), that mammographic screening did more harm than good, by diagnosing “pseudocancers” in 70,000 women every year, totaling 1.3 million women over the past 30 years (Bleyer and Welch, New Engl J Med 2012; 367:1998-2005). “All those unfortunate women undergoing mutilating surgery, unnecessary radiation and chemotherapy for benign disease,” wrote one journalist.

But Dr. Welch didn’t stop with a measly 70,000 overdiagnosed cancers per year, or one-third of invasive cancers. Abandoning the concept of tumor quiescence (perhaps due to his own 1997 findings), he has embraced tumor regression as the likely mechanism at work, and in 2016, he upped the ante to an indirect estimate of overdiagnosis that exceeded 80% (N Engl J Med 2016; 375:1438-1447). And this is why the Society of Breast Imaging study quoted above is so critical, and so wonderfully timed – ZERO regression. And then, from Dr. Welch’s own work – ZERO tumor quiescence of invasive disease reflected at autopsy.

While we can’t determine if a specific cancer is a “pseudocancer” or not, there are direct observations that can be made, as we’ve seen above, that can confirm or deny overdiagnosis. Indeed, the very first trick played upon an audience listening to exaggerated claims of overdiagnosis is the mistaken belief that one can only calculate overdiagnosis indirectly. That’s where the magic begins – you see with your own eyes that the elephant on the stage has disappeared, but you also know it’s a trick.

“But it’s all moot,” you might say. “What difference does it make what you call it, if a screen-detected cancer is not going to be a threat to a patient’s life? Who cares whether it’s length bias or overdiagnosis? You’re splitting hairs. The problem is overtreatment.”

And you would be justified in your criticism…at first glance. In September, I’ll conclude with Part 3 – The Alchemy of Overdiagnosis.

For Dr. Hollingsworth’s book on the nuances of screening (including two chapters on Overdiagnosis), Click on: http://www.mcfarlandpub.com/book-2.php?id=978-1-4766-6610-5