Hey, everyone! Try your hand at graphic reporting and see if you can win the Gelman Cup!

Score!

Former Freud expert & current stats legend Andrew Gelman posted a blog entry (one he likely wrote in the late 1990s; he stockpiles his dispatches, so probably by the time he sees mine he’ll have completely forgotten this whole thing, & even if he does respond I’ll be close to 35 yrs. old by then & will be interested in other things like drinking and playing darts) in which he said he liked one of my graphics!

Actually, he said mine was “not wonderful”—but that it kicked the ass of one that really sucked!

USA USA USA USA!

Alright, alright.

Celebration over.

Time to get back to the never-ending project of self-improvement that I’ve dedicated my life to.

The question is, How can I climb to that next rung—“enh,” the one right above “not wonderful”?

I’m going to show you a couple of graphics. They aren’t the same ones Gelman showed, but they use the same strategy to report more interesting data. Because the data are more interesting (not substantively, but from a graphic-reporting point of view), they’ll supply us with even more motivation to generate a graphic-reporting performance worthy of an “enh”—or possibly even a “meh,” if we can get really inspired here.

I say we because I want some help. I’ve actually posted the data & am inviting all of you—including former Freud expert & current stats legend Gelman (who also is a bully of WTF study producers, whose only recourse is to puff themselves up to look really big, like a scared cat would)—to show me what you’d do differently with the data.

Geez, we’ll make it into a contest, even! The “Gelman Graphic Reporting Challenge Cup,” we’ll call it, which means the winner will get—a cup, which I will endeavor to get Gelman himself to sign, unless of course he wins, in which case I’ll sign it & award it to him!

Okay, then. The data, collected from a large nationally representative sample, show the relationship between religiosity, left-right political outlooks, and belief in human-caused climate change.

It turns out that religiosity and left-right outlooks actually interact. That is, the impact of one on the likelihood someone will report “believing in” human-caused climate change depends on the value of the other.

Wanna see?? Look!!

That’s a scatter plot with left_right, the continuous measure of political outlooks, on the x-axis, and “belief in human-caused climate change” on the y-axis.

Belief in climate change is actually a binary variable—0 for “disbelief” and 1 for “belief.”

But in order to avoid having the observations completely clumped up on one another, I’ve “jittered” them—that is, added a tiny bit of random noise to the 0’s and 1’s (and a bit to the left_right scores too) to space the observations out and make them more visible.

Plus I’ve color-coded them based on religiosity! I’ve selected orange for people who score above the mean on the religiosity scale and light blue for those who score below the mean. That way you can see that religiosity matters at the same time that you can see that political outlook matters in determining whether someone believes in climate change.
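For anyone who wants to play along at home, here’s roughly how I’d produce that kind of jittered, color-coded scatter in Stata. Just a sketch: the filename is hypothetical, but the variable names match the posted data.

```stata
* Jittered, color-coded scatter of the raw data
* (sketch only; "gelman_cup.dta" is a hypothetical filename)
use gelman_cup, clear

twoway (scatter AGW left_right if relig_category == 0, jitter(7) mcolor(eltblue)) ///
       (scatter AGW left_right if relig_category == 1, jitter(7) mcolor(orange)), ///
    ytitle("belief in human-caused climate change (jittered)") ///
    xtitle("left_right") ///
    legend(order(1 "below-mean religiosity" 2 "above-mean religiosity"))
```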

Or at least you can sort of see that. It’s still a bit blurry, right?

So I’ve added locally weighted regression lines to add a little resolution. Locally weighted regression is a nonmodel way to model the data. Rather than assuming the data fit some distributional form (linear, sigmoidal, whatever) and then determining the “best fitting” parameters consistent with that form, locally weighted regression basically slices the x-axis predictor into zillions of tiny intervals, fits an individual regression over each of those little intervals, and stitches the results together.

It’s the functional equivalent of getting a running tally of the proportion of respondents “agreeing” at many, many contiguous points along left_right (and hence my selection of the label “proportion agreeing” for the y-axis, although “probability of agreeing” would be okay too; the lowess regression can be conceptualized as estimating that).
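If you want to draw those lines yourself, lowess is a built-in twoway plot type in Stata, so the group-wise fits can be layered right onto the scatter from the sketch above (or plotted on their own, as in this equally hedged sketch):

```stata
* Group-wise lowess fits of AGW on left_right
* (layer these onto the scatter above, or plot them alone as here)
twoway (lowess AGW left_right if relig_category == 0, lcolor(eltblue)) ///
       (lowess AGW left_right if relig_category == 1, lcolor(orange)), ///
    ytitle("proportion agreeing") xtitle("left_right") ///
    legend(order(1 "below-mean religiosity" 2 "above-mean religiosity"))
```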

What the lowess lines help us “see” is that in fact the impact of political outlooks is a bit more intense for subjects who are “low” in religiosity. The slope for their S-shaped curve is a bit steeper, so that those at the “top,” on the far left, are more likely to believe in human-caused climate change. Those at the “bottom,” on the right, seem comparably skeptical.

The difference in those S-shaped curves is what we can model with a logistic regression (one that assumes the probability of “agreeing” will be S-shaped in relation to the x-axis predictor). To account for the possible difference in the slopes of the curves, the model should include a cross-product interaction term that indicates how differences in religiosity affect the impact of differences in political outlooks on “believing” in human-caused climate change.
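In Stata, factor-variable notation spares you from constructing the cross-product by hand. A minimal sketch, using the continuous variables as posted:

```stata
* Logistic regression with a political-outlook x religiosity interaction
* (c. marks continuous variables; ## adds main effects plus the cross-product)
logit AGW c.left_right##c.religiosity
```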

I’ve fit such a model, the parameters of which are in the table in the inset.

That regression actually corroborates, as it were, what we “saw” in the raw data: the parameter estimates for both religiosity and political outlooks “matter” (they have values that are practically and statistically significant), and so does the parameter estimate for the cross-product interaction term.

But the output doesn’t in itself show us what the estimated relationships look like. Indeed, precisely because it doesn’t, we might get embarrassingly carried away if we started crowing about the “statistically significant” interaction term and strutting around as if we had really figured out something important. Actually, insisting that modelers show their raw data is the most important way to deter that sort of obnoxious behavior, but graphic reporting of modeling definitely helps too.

So let’s graph the regression output:

Here I’m using the model to predict how likely a person who is relatively “high” in religiosity—1 SD above the population mean—and a person who is relatively “low”—1 SD below the mean—are to agree that human-caused climate change is occurring. To represent the model’s measurement precision, I’m using solid bars—25 of them, evenly placed along the x-axis—each marking a 0.95 confidence interval around the predicted probability.
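If you’d rather let Stata do the predicting and plotting, margins and marginsplot after the logit above get you most of the way there. Again a sketch: the grid of left_right values below is a placeholder you’d match to the actual scale of the variable.

```stata
* Predicted probability of agreeing at +/-1 SD of religiosity, with 0.95 CIs
* (run after the logit above; the left_right grid is a placeholder)
quietly summarize religiosity
local lo = r(mean) - r(sd)
local hi = r(mean) + r(sd)
margins, at(left_right = (-2(0.25)2) religiosity = (`lo' `hi'))
marginsplot, ytitle("predicted probability of agreeing")
```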

Well, that’s a model of the raw data.

What good is it? Well, for one thing it allows us to be confident that we weren’t just seeing things. It looked like there was a little interaction between religiosity and political outlooks. Now that we see that the model basically agrees with us—the parameter that reflects the expectation of an interaction is actually getting some traction when the model is fit to the data—we can feel more confident that’s what the data really are saying (I think this is the right attitude both when one has hypothesized the observed effect and when one is doing exploratory analysis). The model disciplines the inference, I’d say, that we drew from just looking at the data.

Also, with a model, we can refine, extend, and appraise the inferences we draw from the data.

You might say to me, e.g., “hey, can you tell me how much more likely a nonreligious liberal Democrat is to accept human-caused climate change than a religious one?”

I’d say, “Well, about 12%, ± 6, based on my model.” I’d add, “But realize that even the average religious liberal Democrat is awfully likely to believe in human-caused climate change—73%, ± 5%, according to the model.”

“So there is an interaction between religiosity & political outlooks, but it’s nothing to get excited about—the way someone trained only to look at the ‘significance’ of regression model coefficients might—huh?” you’d say.

“Well, that’s my impression as well. But others might disagree with us. They can draw their own conclusions about how important all of this is, if they look at the data and use the model to make sense of it.”

Or whatever!
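By the way, numbers like those don’t have to be taken on faith: margins can report the estimated differences, with confidence intervals, directly. Another hedged sketch; the left_right value standing in for “liberal Democrat” is a placeholder.

```stata
* Estimated difference in probability of agreeing between low- and high-
* religiosity respondents at a left-wing outlook (placeholder value: -1.5)
quietly summarize religiosity
margins, at(left_right = -1.5 religiosity = (`=r(mean)-r(sd)' `=r(mean)+r(sd)')) ///
    pwcompare(effects)
```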

Now.

What’s Gelman’s reservation? How come my graphic rates only “not wonderful” instead of “enh” or “meh”?

He says “I think all those little bars are misleading in that they make it look like it’s data that are being plotted, not merely a fitted model . . . .”

Hm. Well, I did say that the graphic was a fitted model, and that the bars were 0.95 CIs.

The 0.95 CIs *could* mislead people—if they were being generated by a model that didn’t fairly convey what the actual data look like. But that’s why one starts by looking at, and enabling others to see, what the raw data “look like.”

But hey–I don’t want to quibble; I just want to get better!

So does anyone have a better idea about how to report the data?

If so, speak up. Or really, much, much better, show us what you think is better.

I’ve posted the data. The relevant variables are “left_right,” the continuous political outlook scale; “religiosity,” the continuous religiosity scale; and “AGW,” belief in human-caused climate change (belief = 1, disbelief = 0). I’ve also included “relig_category,” which splits the subjects at the mean on religiosity (0 = below the mean, 1 = above). Oh, and here’s my Stata .do file, in case you want to see how I generated the analyses reported here.
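And if you want to double-check the religiosity split before you start, the mean split described above takes only a couple of lines to recreate (sketch; filename hypothetical):

```stata
* Recreate the mean split on religiosity described above
* (sketch; "gelman_cup.dta" is a hypothetical filename)
use gelman_cup, clear
quietly summarize religiosity
generate byte relig_split = religiosity > r(mean) if !missing(religiosity)
* relig_split should reproduce the posted relig_category variable
```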

So … either link to your graphics in the comments thread for this post or send them to me by email.  Either way, I’ll post them for all to see & discuss.

And remember, the winner—the person who graphically reports the data in a way that exceeds “not wonderful” by the greatest increment—will get the Gelman Cup!
