Vaccine Efficacy, Statistical Power and Mental Models

The Immune System is Not a Sea Wall With a Fixed Height

Vaccines, Variants and why Mental Models and Statistical Power Matter

Shortly after Facebook opened up to all colleges, after originally being exclusive to Harvard and a few other elite universities, I had a surprising encounter with my students. More accurately, with their mental model of Facebook and the reality of it. I used to teach introduction to sociology, which is a wonderful class to teach, but unfortunately, too often it’s relegated to large lectures (as are most “introductory” classes). I’d break the lectures up with in-class assignments, so that it wasn’t just me talking all the time. One such assignment was an exercise meant to encourage students to use their “sociological imagination”: I’d ask them to imagine that they had been born in a different time or a different social class, or with something else different about their social location. What might be different about their lives? We’d have them write up a few answers, discuss them with a small group, and then have a quick discussion in class. We’d collect the written assignments, too, not because we really graded them (there was no wrong answer) but as a way to provide credit for participation that went beyond attendance.

Then my sharp-eyed teaching assistant noted that a handful of them had the same handwriting. It made no sense. You basically had to scribble a few coherent sentences and you got the grade, which was basically 1/0: there or not. But if people were filling them in for others, that wasn’t cool. She didn’t think that’s what had happened, though. The attendance and sign-up sheet counts matched up. We had people write their names on the sign-up sheets, and those didn’t have the same handwriting.

So I called the group with the identical handwriting—all five of them—into my office to ask what was going on. They were good students, too, so I wanted to understand. They vehemently denied any wrongdoing. They said things like: Coincidence! Maybe you are squinting too hard! They aren’t all similar! I asked them if they were a friend group. “No, no, we don’t know each other,” they claimed.

I looked them up on Facebook, which was, as noted above, a nascent platform. Unsurprisingly, they were all friends with each other. I called them back and confronted them with the obvious. 

Their initial response was… utter shock that this information was visible to me. They had no idea. Not the smallest inkling. “You can see all that information?” they exclaimed. One of them sat down, unable to compose herself. Long story short, they were pre-med students preparing for the MCAT (the high-stakes medical school entrance exam) and had decided that they were going to use my class as more study time. They were all there, but only one of them did the scribbling on paper and put all their names on it. They were, in fact, excellent, hardworking and ambitious students, but they were completely dumbfounded by how visible Facebook actually made them to a cursory search. The problem was that their mental model of how the platform operated, including the visibility of information there, was completely different from how it actually operated.

Mental models matter, because they are our conceptualization of what the world—and dynamics within it—are like. If they are wrong, we can quickly get in trouble. Refining them through experience and information is the work of a lifetime, and it’s crucial in so many aspects.

Mental models are likely to go awry when we hit specialized or unfamiliar territory. That’s what happened with early Facebook, when a novel digital platform flattened social roles. It was very weird and unnatural, treating all relationships as essentially in the same visibility space. It essentially put everyone in a single room, whereas we normally expected time and space to separate us, and to protect us from everyone else’s gaze. Here’s an article from 2010, exploring this dilemma:

“The problem with traditional social networks 1.0 is all the relationships are flat,” said Charlene Li, founder of the Altimeter Group, which researches Web technologies and advises companies on how to use them. “Everyone is the same level, whether I’m married to you or you’re someone I went to high school with or somebody I met at a conference.”

That online reality does not reflect human nature, said Zeynep Tufekci, an assistant professor of sociology at the University of Maryland, Baltimore County who studies the social impacts of technology.

“Your mom and your boyfriend are rarely in the same room,” she said, “and that’s why Christmas and Thanksgiving are such a stressful time for people, because their worlds collapse. On Facebook you’re in a long extended Thanksgiving dinner with everyone you ever knew, and people find that difficult to deal with.”

After a decade or more of this kind of flattening, it may seem like an obvious thing. But apparently it wasn’t obvious then to many, including Facebook’s CEO. Back then, Mark Zuckerberg would give interviews claiming the social flatness of his platform was not only natural but also a sign of integrity:

“You have one identity,” he emphasized three times in a single interview with David Kirkpatrick in his book, “The Facebook Effect.” “The days of you having a different image for your work friends or co-workers and for the other people you know are probably coming to an end pretty quickly.” He adds: “Having two identities for yourself is an example of a lack of integrity.”

Of course we’re not exactly the same to our friends, our co-workers, our parents and to strangers. That’s called having social roles, and it’s not a lack of integrity to treat your close friends differently, and to reveal different kinds of information to them, than you would your workplace acquaintances. Zuckerberg’s mental model of human relationships was wrong, misguided and dangerous. The platform he designed reflected this mental model, which, in turn, was a shock to my normal students.

So I made two decisions right there and then. I decided never, ever to look up my students on social media again. It’s a transgression that I decided I could live without, and it wasn’t conducive to the relationship I wanted to have with them. I also promptly launched a research project on Facebook, privacy and our individual and societal mental models of it, and on how the divergence was about to create a crisis.

I’ve recently been thinking of another mental model divergence, one that is coloring our interpretation of (and worries around) vaccines, variants and their efficacy. 

In a nutshell, here’s the divergence. A lot of our discussions seem to treat the immune system like a wall with a fixed height: if a taller wave comes along, it will wash over the wall. Think of the sea wall that was meant to protect the Fukushima nuclear reactor from tsunami waves but was too short to do so.

Another facility, the Onagawa Nuclear Power Plant, had a higher sea wall and survived despite being closer to the epicenter of the earthquake that triggered the tsunami.

If that’s what the vaccine trials were measuring (the height of the wall that is our immune system), comparing vaccine efficacy numbers would make a lot of sense. Many high-profile, highly-credentialed people have been (misleadingly) describing it in exactly that manner: that if a vaccine is 95% effective, the remaining 5% are left “unprotected.” If Moderna and Pfizer are 95% efficacious, and if Johnson & Johnson is 66%, well, that would mean that 34% of the people are left “unprotected,” right?

Wrong. To get to why that assumption is not right, and why those vaccine efficacy numbers are not the height of the wall that represents the immune system, let me first mention something important. The two mRNA vaccines do appear to be spectacular, but they were tested under conditions where those pesky “variants of concern” (B.1.1.7, first identified in the UK; B.1.351, first identified in South Africa; and P.1) were not widespread. If all the vaccines were tested now, under equal conditions, those numbers may be closer. Plus, Johnson & Johnson is a single-shot vaccine, with a trial of a booster dose underway. So those efficacy numbers may well be much closer in reality than they appear from the trial results. But let’s leave that aside for a moment.

The trials have predefined “endpoints” that we measure for statistical comparison. In all these trials, the endpoint is any symptomatic disease. Not any infection (though a few have measured this), and not deaths or hospitalizations or even severe disease (though, obviously, that’s what we care about!). Why? Pretty simple, actually: it is the best compromise between a good indicator of a vaccine that works well and statistical power.
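To make the endpoint idea concrete, here is a minimal sketch of how an efficacy number is commonly computed from endpoint counts: as the relative reduction in the rate of symptomatic disease in the vaccinated arm compared with the placebo arm. The counts below are made up for illustration and do not come from any real trial; the function and variable names are my own.

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo):
    """Efficacy as relative risk reduction: 1 - (vaccinated rate / placebo rate)."""
    rate_vaccinated = cases_vaccinated / n_vaccinated
    rate_placebo = cases_placebo / n_placebo
    return 1.0 - rate_vaccinated / rate_placebo


# Hypothetical counts, NOT taken from any actual trial.
N_PER_ARM = 20_000
CASES_PLACEBO = 200     # symptomatic cases among placebo recipients
CASES_VACCINATED = 10   # symptomatic cases among vaccine recipients

efficacy = vaccine_efficacy(CASES_VACCINATED, N_PER_ARM, CASES_PLACEBO, N_PER_ARM)
share_vaccinated_sick = CASES_VACCINATED / N_PER_ARM

print(f"efficacy: {efficacy:.0%}")
print(f"vaccinated participants who got symptomatic disease: {share_vaccinated_sick:.2%}")
```

In this made-up example the efficacy comes out at 95%, yet only a tiny fraction of the vaccinated arm got sick at all. The number compares case rates between the two arms; it is not the share of vaccinated people left "unprotected."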

Simply put, statistical power is how likely a trial is to detect a real difference between two groups, and a power calculation gives you the trial size you need to make a meaningful comparison. The rarer the thing you are measuring, the larger the groups need to be. Consider a randomized trial to figure out if one can create a “national player of the year” award-winning basketball player at the college level with a summer boot camp teaching MAGIC TECHNIQUE in the senior year of high school, before the player goes to college and joins a team. Let’s say our control is a summer boot camp that teaches REGULAR TECHNIQUE. There are currently about six such awards per year, from six different organizations. Sometimes they go to the same person: Zion Williamson won all six in 2018-2019. Sometimes they don’t.

So let’s say we took 10,000 top-playing high school players, randomized them into the two groups, and then watched what happened. Let’s say that one person who attended the MAGIC TECHNIQUE camp won two of the awards in a year, and nobody from the group that got the REGULAR TECHNIQUE won any of the six awards. Are we confident that those results will always be the case? Of course not. Let’s say we repeat it next year, and this time a person who attended the MAGIC TECHNIQUE camp won all six awards. Any more confident? Not really. The outcome is just too rare. We’d have to keep repeating this for many years to get even the slightest inkling of what’s really going on.

A more sensible way to do this would be to measure something a lot more common. Say: being recruited to play college basketball at all. Take the same top 10,000 promising high school players, randomize them in their junior year of high school to the MAGIC TECHNIQUE or REGULAR TECHNIQUE summer camp. Then check how many got recruited to play basketball at the college level, which appears to be about 1,500 total per year. Now we are getting into better statistical territory, though here, too, we still wouldn’t be able to feel super confident if the results were close. It appears that about 3.5% of high school players are recruited each year, so we expect 175 out of each group (of 5,000) to be recruited on average, if things are happening just by chance. If it is 180 vs 170? We won’t be that sure this is a real difference. If the MAGIC TECHNIQUE group has 250 recruits to 100 from the REGULAR TECHNIQUE group? Looks more like a statistically significant effect.
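For readers who want to check the two comparisons above, here is a rough sketch using a pooled two-proportion z-test with a normal approximation. That is one common way to do this, and it is my choice for illustration rather than the only valid test. The group size of 5,000 and the recruit counts come from the example; the function name is made up.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


GROUP_SIZE = 5000

# 180 vs 170 recruits: a small gap relative to chance variation.
p_close = two_proportion_p_value(180, GROUP_SIZE, 170, GROUP_SIZE)
# 250 vs 100 recruits: a large gap.
p_wide = two_proportion_p_value(250, GROUP_SIZE, 100, GROUP_SIZE)

print(f"180 vs 170: p = {p_close:.2f}")
print(f"250 vs 100: p = {p_wide:.1e}")
```

The first p-value lands well above the conventional 0.05 threshold, so a gap of that size is easily explained by chance. The second is vanishingly small, which matches the eyeball judgment in the text.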

I gave eyeball numbers here, but these are, in fact, fairly precise statistical calculations. We can calculate these things, and we have an idea of what kind of trial size we need to have the kind of statistical power to differentiate things we expect to be different at what kind of level.
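As a sketch of what such a calculation looks like, the snippet below uses a standard textbook approximation for the sample size needed to compare two proportions. The 5% significance level, the 80% power target, and the base rates are assumptions I picked for illustration, and `sample_size_per_group` is a made-up name. In each scenario the intervention is assumed to cut the outcome rate in half.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate per-group size for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    spread_null = math.sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = math.sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return math.ceil(numerator / (p_control - p_treated) ** 2)


# Illustrative base rates: each outcome is ten times rarer than the one before.
BASE_RATES = [0.01, 0.001, 0.0001]

sizes = {rate: sample_size_per_group(rate, rate / 2) for rate in BASE_RATES}
for rate, size in sizes.items():
    print(f"outcome rate {rate:.2%}: about {size:,} people per group")
```

Each tenfold drop in how common the outcome is pushes the required group size up roughly tenfold, even though the relative effect is the same. That is the sense in which rarer outcomes demand much larger trials.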

Which brings me back to vaccine efficacy. The vaccine trials have as their endpoint any symptomatic disease, rather than hospitalization or death, for the same reason it is easier to measure the effectiveness of a technique by looking at whether a high-schooler makes it onto a college basketball team at all rather than whether they become an MVP college basketball player for that year: it’s easier and quicker to look at something that’s a lot more common.

But here’s the twist: There is, of course, a relationship between mild COVID (including breakthrough cases, meaning getting symptoms of COVID despite being vaccinated) and severe COVID, in that if you aren’t even getting mild disease, you are certainly not getting severe consequences. But the converse is not true: an inability to prevent mild disease does not necessarily signal an inability to prevent severe disease, hospitalizations and death. That would be the case if vaccinations and the immune system did, indeed, operate like a wall; if a wall can’t stop a five-foot wave because it is too short, it certainly can’t stop a nine-foot wave.

Instead, though, the immune system is a tiered (and very complicated) system. The first line of defense is those antibodies that we keep hearing about (which are important and significant for our purposes here, and also easier to measure) that attach to the invading pathogen and, well, neutralize it (hence “neutralizing antibodies”). But there is another component to our immune system, called T cells, which kick into action after an infection has occurred. They work to clear out the infected cells (which have become little factories producing the virus).

Alongside antibodies, the immune system produces a battalion of T cells that can target viruses. Some of these, known as killer T cells (or CD8+ T cells), seek out and destroy cells that are infected with the virus. Others, called helper T cells (or CD4+ T cells) are important for various immune functions, including stimulating the production of antibodies and killer T cells.

T cells do not prevent infection, because they kick into action only after a virus has infiltrated the body. But they are important for clearing an infection that has already started. In the case of COVID-19, killer T cells could mean the difference between a mild infection and a severe one that requires hospital treatment, says Annika Karlsson, an immunologist at the Karolinska Institute in Stockholm. “If they are able to kill the virus-infected cells before they spread from the upper respiratory tract, it will influence how sick you feel,” she says. They could also reduce transmission by restricting the amount of virus circulating in an infected person, meaning that the person sheds fewer virus particles into the community.

Now, obviously, I’m not an immunologist, and neither are most of you. (And those of you who are: please feel free to correct and add any insights, because what I’m attempting to do here is the part that matters for the rest of us—the lay people—about understanding why statistical power/endpoints/measurement choices matter to evaluating the vaccine trial results).

I certainly don’t have to pretend to understand all the complexities of the immune system, which my Atlantic colleague Ed Yong referred to as the place where “intuition goes to die.” But his whole article is absolutely worth reading for the non-specialist because the important message is this: THE IMMUNE SYSTEM IS NOT A WALL WITH A SPECIFIC HEIGHT THAT FAILS IF THE WAVE IS TALLER. IT’S A TIERED SYSTEM WITH VERY COMPLEX INTERACTIONS. IF THE INITIAL RESPONSE FALLS SHORT AND “BREAKTHROUGH” DISEASE OCCURS DESPITE VACCINATION, THAT DOES NOT MEAN THAT SEVERE DISEASE IS NECESSARILY MORE LIKELY TO OCCUR, as it would if the wall had been overcome the way a wave washed over the sea wall protecting the Fukushima plant.

So not this:

Very much not this! That’s because the post-infection phase is when those killer T cells come in. If they still work well, we might not be able to avoid symptoms, much as if we had caught a cold (many colds are caused by other endemic coronaviruses), but we will defeat the disease without letting it progress to a severe stage!

Here’s Yong again:

Both T-cells and antibodies are part of the adaptive immune system. This branch is more precise than the innate branch, but much slower: Finding and activating the right cells can take several days. It’s also long-lasting: Unlike the innate branch of the immune system, the adaptive one has memory.

After the virus is cleared, most of the mobilized T-cell and B-cell forces stand down and die off. But a small fraction remain on retainer—veterans of the COVID-19 war of 2020, bunkered within your organs and patrolling your bloodstream. This is the third and final phase of the immune response: Keep a few of the specialists on tap. If the same virus attacks again, these “memory cells” can spring into action and launch the adaptive branch of the immune system without the usual days-long delay. Memory is the basis of immunity as we colloquially know it—a lasting defense against whatever has previously ailed us.

So where does this leave us? Basically, just looking at vaccine efficacy—measuring breakthrough symptomatic COVID—isn’t enough to determine the outcomes we really care about, like severe disease, hospitalization and death. So far, in all six trials, there are no deaths reported among the vaccinated group, almost no hospitalizations (a few definitions are wonky), and little to no severe disease. That is very encouraging! Is this a guarantee that they are all equally good at that side of the equation? Maybe. Possibly. 

It’s back to the same problem of finding the right training to produce the MVP player: so far, while we have great evidence that the vaccines are excellent at preventing these terrible outcomes, we know that there might be slight differences between them. But given the rarity of the events we care about (and given that we had zero(!) deaths so far in trials), it is hard to compare that minute difference between the vaccines. Can we rule out differences? Not yet. Can we assume they exist just because there are different efficacy numbers between the vaccines? Not really, either.
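One way to see why zero deaths in a trial arm cannot settle small differences: when no events are observed among n people, the data still leave room for a true event rate of up to roughly 3/n (the so-called rule of three). Here is a small sketch of that bound. The arm size of 20,000 is a hypothetical round number, not a figure from any of the trials, and the function name is my own.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper bound on the event rate when 0 events occur among n people."""
    # Solve (1 - p) ** n = 1 - confidence for p.
    return 1 - (1 - confidence) ** (1 / n)


N_PER_ARM = 20_000  # hypothetical arm size, for illustration only

upper = upper_bound_zero_events(N_PER_ARM)
rule_of_three = 3 / N_PER_ARM

print(f"exact 95% upper bound: {upper * 10_000:.2f} per 10,000")
print(f"rule-of-three estimate: {rule_of_three * 10_000:.2f} per 10,000")
```

Under these assumptions, two vaccines that each recorded zero deaths would both be compatible with any true rate between zero and that upper bound, so the trial data alone cannot rank them on this outcome.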

So this goes back to the point: for now, there is no reason for anyone not to take the first vaccine they are offered, despite, yes, uncertainty about whether there are differences among them that may matter at the margins. This will also greatly impact something I’ll cover in my next post: what should we, as laypeople, know about vaccines, variants and T-cells? (Hint: it’s been really good news lately).