About me


Postdoc at UC Berkeley School of Information.

PhD in Statistics and Public Policy.

One-time semi-professional passista.

These days I study semi-parametric causal inference.  I work on developing estimators and techniques which draw on machine learning tools and expert knowledge to carefully answer policy questions.

Email: jacqueline dot mauro at berkeley dot edu

More info: jacquelinemauro.com


SF Data Blog #3: Make It Right

Make It Right

Today we’re talking about one of my favorite things: permutation tests!


Permutation tests are amazing because they don’t require ~any~ assumptions about the data except that when I assigned treatment, I did it randomly.

Juvenile Justice

The context for this is George Gascon’s policy experiment to reduce juvenile recidivism, called “Make It Right” (MIR). The idea is basically to give juveniles a chance to make amends rather than being charged for a crime. The DA’s office describes it thusly:

Through Make it Right, eligible young people are given the option, before their cases are charged, to participate in “restorative community conferencing.” In this process, the youth come together with their victim and their supporters (including family/caregivers, youth services, schools, coaches, and others) in a community-based facilitated dialogue to develop an agreement for the young person to repair harm, address root causes, and make amends. This collective agreement identifies concrete actions the youth will take to address harm caused to the victim, the community, the youth’s family, and him/herself. With support from a community-based case manager, the young person has a six-month period to complete their agreement. If successful, the case is not prosecuted.

Along with an innovative idea, Gascon did something relatively rare in policy–an experiment. Instead of rolling the program out for a specific subset of people or all at once, his office (in collaboration with California Policy Lab and others) designed an experiment to decide who gets the new program randomly. This means on the backend you can be really confident about whether the treatment had an effect or not.

Their experiment–strictly speaking–was a randomized block design. But I’m gonna ignore that for now. Let’s just imagine they had a two step process where in step 1 they decided if someone was eligible for MIR and then in step 2 they flipped a coin to decide whether to send people through the usual process or the new one.

The Results

Tldr; looks like a raging success:

Of the 47 youths who completed the program, about 13% re-offended within two years, according to data provided by the district attorney. By comparison, of the 43 youths in the control group who went through the traditional court process, about 53% re-offended.

Just looking at it, it seems like we should probably implement this more widely. After all, the estimated average treatment effect is:

ATE = Average recidivism of treated – Average recidivism of control 

= 0.13 – 0.53

= –0.40

A 40 percentage point fall in recidivism!! That’s HUGE.

But! we’re only looking at 90 people–how confident can we be that we’re gonna see a change this big when we roll this out on a larger scale? Maybe these 90 people were unusual in some way. Maybe we just got unlucky and sent lots of people who were likely to recidivate anyway to the traditional court process. That’s what policy-makers have to worry about, and what statistics can help with.

Enter permutations!

Permutation tests start in the same spot as most statistical tests: we assume there’s no effect of treatment. Meaning, we assume that all the people who we see recidivating would have done that no matter what treatment they got.

Said yet another way, we assume that in our dataset there are 29 youths who were destined to recidivate whether they went through MIR or not, and 61 who were destined not to recidivate. The 40 percentage point difference we observed between treated and untreated is just a fluke: we accidentally assigned way more “doomed” youths to the current treatment and way more non-recidivating youths to MIR. If we rolled this out to the whole population, we’d see no improvement in recidivism rates.

We can figure out how likely luck this bad is… and how likely every other permutation of treatment and control is.

Toy Example

Let’s illustrate this with a smaller toy example. Say instead of 90 people I just have 5. Then let’s assume 2 of them—person A and person B—are going to recidivate no matter what (the remaining 3–C, D and E—are not). I assign a useless treatment randomly. The possible treatment assignments are given in the table below, along with the treatment effect we would estimate:

(1 = got treatment; 0 = no treatment)

Person        v1     v2     v3     v4     v5     v6     v7     v8     v9    v10
A              1      1      1      1      0      0      0      0      0      0
B              1      0      0      0      1      1      1      0      0      0
C              0      1      0      0      1      0      0      1      1      0
D              0      0      1      0      0      1      0      1      0      1
E              0      0      0      1      0      0      1      0      1      1
Effect size   1.00   0.17   0.17   0.17   0.17   0.17   0.17  -0.67  -0.67  -0.67

In v1, we randomly gave treatment to A and B, who are destined to recidivate. So in this setting, 100% of the treated recidivate and 0% of the untreated do, giving an estimated treatment effect of 100 percentage points. In v2 through v7, 50% of the treated recidivate, and 33% of the untreated do, so the estimated effect size is 0.17. Finally, in v8 through v10, none of the treated recidivate, because we give neither A nor B treatment, and 67% of the untreated do, giving a -0.67 treatment effect. Remember that in all cases, the treatment has no effect, so the true effect size is 0.
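If you want to check the toy table, a few lines of code can enumerate every possible assignment. This is just a sketch of the toy setup above, not the real MIR data:

```python
from itertools import combinations

# Toy setup from the table above: A and B recidivate no matter what,
# C, D and E never do, and 2 of the 5 get the (useless) treatment.
recidivates = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
people = list(recidivates)

effects = []
for treated in combinations(people, 2):  # the 10 columns v1..v10
    control = [p for p in people if p not in treated]
    treated_rate = sum(recidivates[p] for p in treated) / len(treated)
    control_rate = sum(recidivates[p] for p in control) / len(control)
    effects.append(round(treated_rate - control_rate, 2))

print(sorted(effects))
# [-0.67, -0.67, -0.67, 0.17, 0.17, 0.17, 0.17, 0.17, 0.17, 1.0]
```

One assignment gives 1.00, six give 0.17, and three give -0.67, matching the table.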

Say I ran my experiment and I assigned treatment like in v8, v9 or v10. The estimated treatment effect is a 67 percentage point drop in recidivism. That’s even bigger than what we’re observing for MIR, but it’s totally due to chance. If we ran a permutation test, we would be warned of this; we would know there’s a 3 in 10 chance of seeing a 67 percentage point drop in recidivism with a useless treatment. Knowing this, we wouldn’t use this data as proof we should roll out the treatment to the whole population.

What’s it look like for us?

We don’t have 5 people, we have 90. And we’re going to treat 47 of them. Just like before, we’re going to start from the assumption that treatment has no effect: 29 people are going to recidivate no matter what and 61 won’t, no matter what.

With 47 out of 90 people getting treatment, we have 90 choose 47 ≈ 9.5 × 10^25 possible combinations (a 26-digit number) of treatment and control, not just 10 like in the toy example. I’m definitely not gonna write all of those possible combinations out in a table like we did before.

Luckily, we don’t need all of the possible combinations; a pretty big random sample of them will do just fine. That is, instead of writing out the whole table, I just randomly pick a bunch of columns from the table and use those. Each column is called a permutation.

I run 50,000 permutations where I take my 29 recidivators and my 61 non-recidivators and flip around who I call treatment and who I call control. Then for each permutation I calculate what the estimated treatment effect would be. Remember the treatment effect we observed was -0.4, so I draw a red line there to show where it lies on the distribution.
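Here’s roughly what that procedure looks like in code. One hedge: the article only reports the recidivism rates as rounded percentages (“about 13%” and “about 53%”), so the observed effect below is an approximation, not the exact figure the real analysis would use:

```python
import random

random.seed(0)

# Under the null there are 29 people who recidivate no matter what
# and 61 who don't; "treatment" and "control" are just labels.
outcomes = [1] * 29 + [0] * 61
indices = list(range(90))

perm_effects = []
for _ in range(50_000):
    random.shuffle(indices)        # re-deal who we *call* treated vs control
    treated = indices[:47]
    control = indices[47:]
    treated_rate = sum(outcomes[i] for i in treated) / 47
    control_rate = sum(outcomes[i] for i in control) / 43
    perm_effects.append(treated_rate - control_rate)

observed = 0.13 - 0.53  # roughly the -0.40 from the real data (rounded rates)
p_value = sum(e <= observed for e in perm_effects) / len(perm_effects)
print(p_value)  # tiny: effects this extreme almost never happen under the null
```

The fraction of permutations with an effect at least as extreme as the observed one is exactly the “2 in 50,000”-style chance described below.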


What I get at the end is that out of those 50,000 permutations, only 2 end up with a 40 percentage point or bigger effect size. So whereas in the toy example there was a 3 in 10 chance of seeing a large drop in recidivism with a useless treatment, here there is (about) a 2 in 50,000 chance. If the treatment didn’t really reduce recidivism, we got incredibly unlucky with our randomization.

So what?

What’s it all mean? It means that even with just 90 people, we can feel pretty good that the treatment is having a real effect on recidivism.

Also, good work Gascon! It looks like this program kept a bunch of youths out of a horrible situation and now there’s strong evidence to keep it going.



SF Data Blog #2: Operation Peacemaker

Operation Peacemaker

A new paper by Matthay et al (and accompanying Chronicle article) claims that Richmond’s firearm reduction policy has led to 55% fewer gun deaths since its inception in 2010. Read it here. That’s a *big* decrease in a problem that seems to be quite sticky, and very pressing.


The paper relies on a cool technique called synthetic controls, where you create an imagined identical-twin version of Richmond (or, really, a bunch of twins) that didn’t get the treatment. Then you compare what happened in real-Richmond to what happened in alternate-universe Richmond to figure out how much the treatment affected things.

In addition to finding that alternate-universe Richmond had more gun deaths, the authors also find a sort of surprising dynamic: Operation Peacemaker seems to have increased non-firearm violence. They posit this may be because people are more likely to get into fights when they suspect their opponent doesn’t have a gun. I mean, I would be too I guess.

Peacemaker background

The program is likely unique in the world, and focuses on the very few individuals considered to be at the root of a large share of the violence in the community. The program is described as,

The core components of Operation Peacemaker are individually tailored mentorship, 24-hour case management, cognitive behavioral therapy, internship opportunities, social service navigation, substance abuse treatment, excursions, and stipends up to $1000 per month for successful completion of specific goals set by the fellowship and ONS staff, including nonparticipation in firearm violence (a conditional cash transfer).

Treating 30 people or the population?

In this program, about 30 people are intensively treated, but the impact is meant to be felt by them and by the whole community. This is not the usual setup we imagine when we think about treatment effects. This isn’t prescribing some people aspirin to see if their headaches go away. It’s much more like a vaccine trial.

That means we have to be really careful about how we define things like treatment and population. Are we treating 30 people and estimating the effects on the community? Or are we treating the community as a whole?

If we think of it as treating 30 people and estimating the broader effects, we’re in the world of networks! It’s a scary world that I stay out of where the basic assumption we barely deign to mention most of the time– “independent and identically distributed” (iid)–no longer holds. That makes things a lot harder and means you have to make a bunch of other assumptions we might not feel great about.

In the synthetic controls approach, you get to sort of bypass this and imagine treating not 30 people, but one giant unit–the whole community. The “treatment” isn’t at the person-level, it’s at the community level. The sort of weird thing about this setup though is you end up with n=1. That is, instead of having, say, 100 people (n=100) and treating half of them and seeing how they do compared to the untreated, you have one “person” who you treat and compare how they do relative to how they would have done if you hadn’t treated them. With synthetic controls you manufacture a bunch of clones of the treated individual to replicate what a population would look like, but how well you can do this makes a big difference to your estimates.

TL;DR n=1 makes statisticians squirmy but we may not have good alternatives.
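To make the clone-manufacturing idea concrete, here’s a minimal sketch on made-up data. A real synthetic control analysis constrains the donor weights to be nonnegative and sum to 1; unconstrained least squares is used here only to keep the sketch dependency-light, and the numbers are entirely invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 10 pre-treatment years of gun-death rates for 5 untreated
# "donor" cities (columns), plus a Richmond built as a noisy mix of them.
donors_pre = rng.normal(20, 2, size=(10, 5))
richmond_pre = donors_pre @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) \
    + rng.normal(0, 0.1, 10)

# Find donor weights that best reproduce pre-treatment Richmond.
weights, *_ = np.linalg.lstsq(donors_pre, richmond_pre, rcond=None)

# "Alternate-universe Richmond" after treatment: apply the same weighted
# combination to the donors' post-treatment outcomes.
donors_post = rng.normal(20, 2, size=(4, 5))
synthetic_post = donors_post @ weights
print(weights.round(2), synthetic_post.round(1))
```

The estimated treatment effect is then real Richmond’s post-treatment outcomes minus `synthetic_post` — and the whole exercise hinges on how well the weighted donors actually mimic Richmond.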

Sensitivity Analysis

There’s a famous case of causal inference in an observational setting that pitted one of the founding fathers of statistics (Fisher, what a jerk) against the medical community. Fisher didn’t think there was enough evidence to prove smoking caused lung cancer, because there was no experimental evidence. He argued there could be a confounder — maybe genetics made people more likely to smoke and more likely to get cancer. In this case, telling them to stop smoking would have no effect on their chances of getting cancer.

One of the most compelling refutations of this theory didn’t come from fancy methods or controlling for any other variables. It was the simple argument that this confounder would have to have an incredibly strong effect on both smoking and lung cancer to account for the higher rates of cancer among smokers, and that no one has found a plausible culprit that fits the bill. So it’s just more likely that it’s the smoking causing the cancer, and not some other mysterious force. This is a very good comment on that (very good) original paper.

I think that since the drop in murder rates is so large after Operation Peacemaker, a similar sensitivity analysis might be useful here too. It wouldn’t argue other cities are similar to Richmond or that we can make plausible Richmond clones, just that the drop in violence we see in Richmond and not elsewhere is hard to explain without Operation Peacemaker.

All that to say

I think it’s a really interesting program that seems to be very successful. It will be interesting to see if any other cities replicate the experiment and what kind of results they see.

Also, synthetic controls are pretty neat, but they still make me nervous. But I’m a nervous type.



I live in San Francisco, where there is a lot of homelessness and a lot of NIMBY-ism. I just want to point out a simple bit of logic that seems oddly contentious:

A) You cannot expel people from a city or refuse them entry into the city

B) You can hinder a city’s ability to build more housing

C) If there are more people in a city than there is housing, you will have homeless people

Ergo: If you stand in the way of new housing, you should stop complaining about people not having houses.


SF data blog (aka justifying my podcast addiction)

Episode 1: presumption laws


OK so I’m gonna try something new. I have started listening to the SF Chronicle’s podcast and I am loving it. But also theoretically I should probably not be listening to podcasts quite so much. But listen, sometimes code is slow.

In any case, they bring up a lot of interesting topics about my beloved city and so I’m thinking I’ll pick up on some of these and go wild on the data side. We’ll see what happens.

The first post I wanna dive into is this idea of presumption laws. This came up in the August 13 episode of City Insider, which interviewed (bamf) Fire Chief Jeanine Nicholson. Chief Nicholson had previously battled aggressive breast cancer, and mentioned in the interview that the laws around workers compensation for firefighters are unusual.

In most cases, a worker has to prove that something about their job caused their injury or illness. With a presumption law, the employer has to prove that the job was not the cause of the injury/illness. Now,

{Job caused illness} = {Job didn’t cause illness}^C

meaning the “event” that the job caused the illness is just the complement of the “event” that the job didn’t cause the illness. This implies that the probability that the job caused the illness is just,

P(Job caused illness) = 1-P(Job didn’t cause illness)

This sort of makes it look like there isn’t a huge difference between what the employer has to prove under presumption laws and what the employee has to prove in the usual case, and so the number of cases that go for the employee might be about the same.

That is, for example, a trial without presumption laws might find that the probability the job caused the illness was .4 and the employee would not get worker’s comp. The same case tried with the presumption laws would find the probability the job didn’t cause the illness was .6, and the outcome would be the same.

Not so. Let’s do this.


Firefighting and cancer

First, the numbers. Firefighters get a lot of cancer. In Boston, the rates are estimated to be about twice the overall average [1]. A CDC report [2] found increased risk of cancer, especially mesothelioma.

But is the cancer caused by the firefighting? On that point, we get a lot more hemming and hawing. See,

If you are a fire fighter and have cancer this study does not mean that your service caused your cancer. This study cannot determine if an individual’s specific cancer is service-related. In addition to exposures that you may have encountered as a fire fighter there are other factors that may influence whether or not you developed a particular cancer, and this study was not able to address many of these factors. [2]

And the classic, “may be/association/suggests” language, which gives authors a good way to avoid making statements about causes,

We report that firefighting may be associated with increased risk of solid cancers. Furthermore, we report a new finding of excess malignant mesothelioma among firefighters, suggesting the presence of an occupational disease from asbestos hazards in the workplace. [3]

Proving Causation

The classic and most trusted way to prove one thing causes another is to run an experiment. Experiments should have the exposure randomly assigned and compare the exposed people to a control group who weren’t exposed, which is why we call these sorts of experiments “Randomized Controlled Trials” (RCTs). In this case, an ideal experiment would involve randomly making some people firefighters. Or randomly sending some firefighters out to inhale/roll around in more toxins than others. In other words, not a workable solution. But in the absence of an experiment, proving causation gets way, way harder. Some even think it’s impossible [4]. Some will trot out the old “correlation is not causation” chestnut. Sigh. I know.


On top of all this, the worker doesn’t just have to prove firefighting can cause cancer in general–they have to prove firefighting caused their own cancer. Statistics is generally much better at looking at large samples than small samples, and a sample of one is about as small as it gets.

So hopefully we’re starting to get a sense of why proving cancer is caused by the job is way harder than keeping your employer from proving the cancer wasn’t caused by the job. That is, suing your employer when the presumption is that the cancer was caused by the job makes it the employer’s job to prove causation and not yours. And that makes all the difference.

Impact of the law

Unfortunately, we don’t have a lot of data readily available about how many more cases are won in states with presumption laws than without.

In Pennsylvania, the number of claims has increased since their presumption law was passed.

Before Pennsylvania enacted its presumption law, in July 2011, Kachline said, Philadelphia firefighters with cancer filed only about two or three cases a year. By December 2013, 62 had filed claims. [5]

In Texas, the law seems to be giving firefighters very little of a leg up,

Of 117 workers comp cancer claims filed by firefighters in the state since 2012, 91 percent have been denied, according to the Texas Department of Insurance. [6]

Probability of Causation

Alright so we’ve sort of seen that proving that something caused your cancer is hard, but I want to talk about why that’s true mathematically. Turns out, it’s a pretty interesting question. Here, I’m going to lean heavily on the work of my brilliant, generous heroine Prof. Maria Cuellar of Penn.

Anyone who has had the (mis)fortune to take a stats 101 course has heard of hypothesis testing. We set our “null” hypothesis as sort of what conventional knowledge says about the world. We call the null hypothesis H0 (“H naught”). The alternative Ha (…”H A”) is the new theory we’re trying out. With a presumption law, our null is exactly what we “presume” to be true–that the job caused the cancer:

H0: job caused cancer; Ha: job did not cause cancer

Without the presumption law, this is flipped:

H0: job did not cause cancer; Ha: job caused cancer

As many of us have heard (and some of us have repeated) a hundred times, we can’t prove the null, we can only reject it in favor of the alternative. So in the presumption law setting, the employer has to bring enough evidence to reject the null that the job caused the cancer. In the usual setting, the employee has to bring enough evidence to reject the null that the job didn’t cause the cancer–and only then do we give the employee some money. Whew.

The reason these two settings are so different is that we need sufficient evidence in order to reject the null. In order to fail to reject the null, there just has to be not enough evidence to reject it. The problem is not symmetric. It’s much easier to not have enough evidence than it is to have enough evidence.

Now that we’ve established our null and alternative hypotheses, we have to figure out what we mean by “enough evidence.” Often in hypothesis testing we’ll have a question like, “is the mean of group 1 bigger than the mean of group 2?” This looks like:

H0: E(X1) = E(X2) ; Ha: E(X1) > E(X2)

We can estimate the true means E(X1), E(X2) using the average of the data we have from each group. We reject the null hypothesis if the sample average of group 1 is big enough compared to the sample average from group 2 to make us suspicious that group 1 and group 2 have the same means.
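For concreteness, here’s what that comparison of group means looks like on made-up data (the group sizes, means, and spread below are all invented for illustration):

```python
import random
import statistics

random.seed(2)

# Made-up data: two groups of 50 measurements; group 1 really does
# have a slightly bigger population mean here.
x1 = [random.gauss(10.5, 2) for _ in range(50)]
x2 = [random.gauss(10.0, 2) for _ in range(50)]

# Two-sample t statistic: the difference in sample averages, scaled by
# its estimated standard error.
se = (statistics.variance(x1) / len(x1)
      + statistics.variance(x2) / len(x2)) ** 0.5
t = (statistics.mean(x1) - statistics.mean(x2)) / se
print(t)  # reject H0: E(X1) = E(X2) only if this is comfortably large
```

With a small true difference and only 50 people per group, the statistic can easily come out too small to reject — which is exactly the “enough evidence” problem.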

In our case, we aren’t interested in the means of groups. We want to know the probability of cancer for an individual if they hadn’t been exposed. We can’t use sample averages anymore; instead we have to look at something a bit more delicate.

Under the usual laws, you have to prove that the cancer you got was from the exposure. Meaning, if you had not been exposed, but everything about you was the same, you would not have gotten cancer. In causal inference these types of “had X, would have Y” statements are called potential outcomes. So for example,

Y(A=1) = 1 and Y(A=0) = 0

would mean that if I’m exposed (A=1) I get cancer (Y=1) and if I’m not exposed (A=0), I don’t get cancer (Y=0). The employee has to show that the probability of causation (PC),

PC = P( Y(A=0) = 0| Y=1, A=1, X ) 

is high. That is, they need to show that the chances are good that the firefighter would not have gotten cancer if they hadn’t been exposed (Y(A=0) = 0) knowing that this firefighter was exposed (A=1), did get cancer (Y=1) and has some other characteristics (X) like their age, their smoking habits, gender, etc. So our new hypothesis test looks like:

(i) H0: PC < t1 ; Ha: PC > t1

where t1 is some legally determined threshold for “likely enough.” For example, if t1=.5, the employee wins their case if it seems any more likely that the exposure caused the cancer than that something else did. If t1=.75, it means we rule for the employee if there is at least a 75% chance that the cancer is due to the exposure and not something else.

The reason estimating this is so difficult is that for a person who was exposed (A=1), we can’t see what would have happened if they hadn’t been. Aforementioned heroine Prof. Cuellar has some cool estimands for this problem [7].

In the presumption law case, the employer wants to show that the employee would probably have gotten cancer anyway. Meaning, they want to show

P( Y(A=0) = 1| Y=1, A=1, X )  = 1-PC

is high. That’s what they want to prove, which means we start with the assumption of the opposite: the probability they would have gotten sick anyway is small. This gets translated into a hypothesis test of,

(ii) H0: 1-PC < t2 ; Ha: 1-PC > t2

We can just move the terms around to make:

H0: PC > 1-t2; Ha: PC < 1-t2

Let’s say t1 = 0.75 and t2 = 0.25. That means case (i)–no presumption law–is:

H0: PC < 0.75 ; Ha: PC > 0.75

And case (ii)–with a presumption law–becomes:

H0: PC > 0.75; Ha: PC < 0.75

Just to reiterate (because I’m confusing even myself here; there’s a reason we’re told to stay away from double negatives): case (i) now means, “I assume the probability that this person would have stayed healthy if they hadn’t been exposed is less than 75%.” To win the case, the employee has to show enough evidence that the chances they would have stayed healthy if they hadn’t been exposed are high.

Case (ii) says, “I assume the probability this person would have stayed healthy if they hadn’t been exposed is at least 75%.” To win the case, the employer has to show that the chances they would have stayed healthy without exposure are relatively low. In some sense these seem symmetric, but statistics are always measured with uncertainty.

So let’s say we Do Math and come up with a best guess of the probability of causation that’s 0.7, and we’re pretty sure the truth is somewhere between 0.6 and 0.9. We have this range because 0.7 is just a guess, an estimate. We used modeling and sampled data and whatever else to get to it, so we can’t be totally sure it’s exact. But based on probability distributions, we can get a good idea of what a likely range is. So we’re not sure it’s exactly 0.7, but we’re pretty sure it’s not anything higher than 0.9 or lower than 0.6.

In case (i), this means I don’t have enough evidence to reject the null that the probability of causation is less than 0.75, because I said there’s a pretty healthy chance it goes all the way down to 0.6. The employer wins. In case (ii), it means I don’t have enough evidence to reject the null that the probability is greater than 0.75–it could be as high as 0.9 and I wouldn’t be surprised. The EMPLOYEE wins! As long as the range we estimate isn’t totally outside the null hypothesis, the null wins.
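The decision rule in both cases boils down to checking which side of the threshold the whole uncertainty range sits on. A tiny sketch using the made-up 0.6–0.9 interval from above:

```python
# Made-up numbers from the text: the uncertainty range around our
# estimated probability of causation, and the legal threshold.
ci_low, ci_high = 0.6, 0.9
threshold = 0.75

# Case (i), no presumption law: H0 is PC < 0.75. The employee wins only
# by rejecting H0, which needs the whole range above the threshold.
employee_wins_case_i = ci_low > threshold

# Case (ii), presumption law: H0 is PC > 0.75. The employer wins only
# by rejecting H0, which needs the whole range below the threshold.
employer_wins_case_ii = ci_high < threshold

print(employee_wins_case_i, employer_wins_case_ii)  # False False
```

Neither rejection goes through, so the null wins both times — same data, opposite verdicts, purely because of who carries the burden of proof.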

It’s harder to have enough evidence than it is to not have enough evidence.


Hopefully this made sense. The long and short of it is, things like presumption laws should theoretically make it way, way easier for employees to win these kinds of cases. That’s because proving you wouldn’t have gotten sick in the alternate universe where you weren’t exposed is much harder than sitting back and letting the other guy try to prove there’s an alternate universe out there where you got sick anyway.

Thanks for listening, all. Hopefully now there’s an army out there rooting for me to keep listening to podcasts and so I’m obliged to continue.

Sources (I’m being lazy with citations, I’m sorry):

[1] https://www.nbcnews.com/health/cancer/cancer-biggest-killer-america-s-firefighters-n813411

[2] https://www.cdc.gov/niosh/pgms/worknotify/pdfs/ff-cancer-factsheet-final-508.pdf

[3] Daniels RD, Kubale TL, Yiin JH, et al. “Mortality and cancer incidence in a pooled cohort of US firefighters from San Francisco, Chicago and Philadelphia (1950–2009).” https://oem.bmj.com/content/71/6/388.full

[4] My mother

[5] https://www.pewtrusts.org/en/research-and-analysis/blogs/stateline/2015/12/07/special-treatment-for-firefighters-with-cancer-some-states-say-yes

[6] https://www.houstonchronicle.com/news/houston-texas/houston/article/Despite-Texas-law-nine-in-10-firefighters-with-13144635.php

[7] https://arxiv.org/pdf/1810.00767.pdf

Some other useful info:


Presumptive Legislation for Firefighter Cancer


Prediction v Inference v Causal Inference

Maria Cuellar and I were on a long drive back from a conference recently, and to keep ourselves entertained we had a wide-ranging argument about the difference between prediction, inference and causal inference. Yea, this really is how statisticians have fun.

I was confused about where inference fit in the whole story. I figured, prediction is just fitting a model to get the best \hat{Y}, regardless of the “truth” of the model. If I find some coefficients and predict \hat{Y} = \hat{\beta_0} + \hat{\beta_1}X, I’m only saying that if I plug in some new X, I’ll predict a new \hat{Y} according to this model. Easy.

If I care what the real relationship is between variables, I’m doing inference, right? That is, I claim Y = \beta_0 + \beta_1X + \epsilon because I think that every increase in X really implies a \beta_1 increase in Y, with some normal error. In other words, I think that when Y was being generated, it really was  generated from a normal distribution with mean \mu = \beta_0 + \beta_1X and some variance. I’ll get confidence intervals around my coefficients and say that I’m 95% sure about my conclusions.

But I’m playing fast and loose with language here. When I say “implies” do I mean “causes”? Most people will quickly and firmly say no to that can of worms. But! when people talk about regression, they will often say that X affects Y–affects is just a different word for causes so… what’s the deal? How is this not (poor) causal inference?

Well, it’s sort of still my impression that it is. But that doesn’t mean there isn’t such a thing as inference that’s totally separate from causal inference.

Inference asks the question — from this sample, what can I learn about a parameter of the entire population? So if I estimate the median of a sample, I can have some idea of what the median is in the whole population. I can put a confidence interval around it and be totally happy. This isn’t the same as prediction and prediction intervals, because I’m not asking about the median for some future sample and how sure I am that my guess of the median will be in the right range. I’m asking about the real, true, underlying median in the population.
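As an example of inference about a population median, here’s a bootstrap confidence interval sketch. The data are simulated (an exponential population), so we actually know the true median is ln 2 ≈ 0.69:

```python
import random
import statistics

random.seed(3)

# Simulated "sample": 200 draws from a skewed exponential population
# whose true median is ln(2), about 0.69.
sample = [random.expovariate(1.0) for _ in range(200)]

# Bootstrap: resample the data over and over, tracking how the sample
# median bounces around, to get a confidence interval for the
# *population* median.
boot_medians = sorted(
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(2000)
)
ci = (boot_medians[50], boot_medians[1949])  # middle 95% of bootstrap medians
print(ci)
```

The interval is a statement about the true underlying median, not about where some future sample’s median will land — which is the prediction-versus-inference distinction in miniature.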

So what about that regression example? Well, inference will say, there is a true \beta_1 in the population, such that if I took (X^TX)^{-1}X^TY I would get back \beta_1. Does that mean that \beta_1 has any real meaning? No. It’s some number that exists and I can get a confidence interval around. But if my model is wrong, the coefficients don’t say anything particularly interpretable about the relationship between X and Y.

All that to say, Maria was right and I’m sorry.


Not good science

I often read The American Conservative, a conservative outlet which I think is generally careful, smart and honest. I recommend it, especially if you’re a liberal who is looking for another viewpoint.

With that said, this article fails in its interpretation of data. Spectacularly. The author presents this figure:

He then concludes from it: “for communities who wish for their children to remain heterosexual, to form heterosexual marital unions, traditional families, etc., neutrality on the matter of sexuality will result in five to eight times as many people claiming homosexuality or bisexuality as would have otherwise been the case.”

Slow down.

This leap is not warranted. Setting aside any ideological disagreements, the scientific argument being made has a number of statistical issues that anyone who has dealt with data should identify at a glance. They are:

  1. The figure has no confidence intervals — we have no way to know if the trends we are looking at would be wiped out by randomness and/or missingness.
  2. We have no information on missingness, coverage errors or the many other issues that arise with survey taking.
  3. We have no idea how these lines were generated (splines? linear smoothers?)
  4. The figure shows the share identifying as LGB by age, not the number who are LGB. If older people are more likely to call themselves straight regardless of their underlying orientation, we would see the same pattern.
  5. This figure tells us nothing about the cause of the trend. To assert that this figure tells us that “neutrality on the matter of sexuality” is the reason behind any trend shown here is way premature.

The author looked at a figure and jumped to a conclusion he likely already believed, because it seemed to lend some support to his beliefs. I think we are all vulnerable to this kind of thinking. Luckily downer statisticians are here to remind you that a scatterplot of a survey can only tell you so much. And that so much is really not that much.


Even stats 101 is better with gifs

Everything is better with a gif.

I made some figures for TA’ing last fall, and I like them. Basically frequentist statistics can seem weird, but computers can sample from the same distribution/population over and over again, so I think that’s a handy way to think of it when people talk about repeated experiments.

In this first one you sample from a distribution that is not Normal (meaning it doesn’t look like a bell curve) a bunch of times and each time calculate the sample average. If you keep track of your sample averages in the figure on the right, they start to look Normal. Magic.
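For anyone who wants to replicate the first figure’s idea without the animation, here’s a sketch. The exponential distribution and the sample sizes are my choices for illustration, not necessarily what the original figures used:

```python
import random
import statistics

random.seed(4)

# Repeatedly sample from a decidedly non-Normal (exponential) population
# and keep each sample's average.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(30)])
    for _ in range(5000)
]

# Even though the raw draws are heavily skewed, the averages pile up
# around the true mean (1.0) in a bell-ish curve.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
# mean near 1.0, spread near 1/sqrt(30) ~ 0.18
```

Plot a histogram of `sample_means` and you get the bell curve from the gif. Magic, reproduced.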




In the second one, you sample from a distribution a bunch of times and each time you calculate the sample average and the 95% confidence interval. The line turns red each time the confidence interval doesn’t contain the true mean (which is 10 in this case). The confidence interval misses about 5% of the time. MAGIC.
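The second animation can be sketched the same way — again, the population (Normal with mean 10) and sample size are my stand-ins for whatever the original figure used:

```python
import random
import statistics

random.seed(5)

true_mean = 10
n_experiments = 2000
misses = 0

for _ in range(n_experiments):
    # Each "experiment": 50 draws from a population whose true mean is 10.
    data = [random.gauss(true_mean, 3) for _ in range(50)]
    center = statistics.mean(data)
    half_width = 1.96 * statistics.stdev(data) / 50 ** 0.5
    # Flag the experiments whose 95% confidence interval misses the truth.
    if not (center - half_width <= true_mean <= center + half_width):
        misses += 1

print(misses / n_experiments)  # close to 0.05, as advertised
```

Those flagged experiments are the red lines in the gif: about 1 in 20 intervals misses, exactly as the 95% label promises.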





I’ve recently gone from having very few publications to having a couple, so I’m posting them up here.

In the first, we studied stressors on the US ICBM (Inter-Continental Ballistic Missile) force. The second looked at the Los Angeles Fire Department’s hiring practices, which had come under considerable… fire. In the third I lent a small hand looking at publishing trends in China and the last few are some articles I wrote as a fresh-faced college student.


  1. Hardison, C. M., Rhodes, C., Mauro, J. A., Daugherty, L., Gerbec, E. N., Ramsey, C. (2014). Identifying Key Workplace Stressors Affecting Twentieth Air Force: Analyses Conducted from December 2012 Through February 2013. Santa Monica, CA: RAND Corporation, RR-592-AF.
  2. Chaitra M. Hardison, Nelson Lim, Kirsten M. Keller, Jefferson P. Marquis, Leslie Adrienne Payne, Robert Bozick, Louis T. Mariano, Jacqueline A. Mauro, Lisa Miyashiro, Gillian S. Oak, Lisa Saum-Manning. (2015) Recommendations for Improving the Recruiting and Hiring of Los Angeles Firefighters. Santa Monica, CA: RAND Corporation, RR-687-LAFD. (http://www.rand.org/pubs/research_reports/RR687.html)
  3. Xin S, Mauro J, Mauro T, Elias P, Man M (2013). Ten-year publication trends in dermatology in mainland China. Report: International Journal of Dermatology, 1-5.
  4. Columbia Political Review: “Seeing Through the Fog: San Francisco Provides a Model for Health Care that Works” (http://goo.gl/Fas4t) and “Empowe(red): Ethical Consumerism and the Choices We Make” (http://goo.gl/g7GzF)