Okay, here’s a new feature for the blog: “Wanna see more data? Just ask!”
The way it works is that if anyone sees interesting data in one of my posts, or in any of our studies (assuming it was one I worked on; for others, I’ll pass on requests but don’t necessarily expect an answer; some of my colleagues have actual lives), and has some interesting question that could be addressed by additional analyses, that person can post a request (in the comments section or by email to me) & I’ll do the analyses and post the results.
Now notice I said the question has to be “interesting.” Whether it meets that standard is something I’ll decide, using personal judgement, etc. But here are some general, overlapping, related criteria:
1. The request has to be motivated by some conjecture or question. Basically, you have to have some sort of theoretically grounded hypothesis in mind that can be tested by the analysis you’d like to see. The most obvious candidate would be a conjecture/question/hypothesis that’s in the spirit of a plausible alternative explanation for whatever conclusion it was that I reached (or the study did) in presenting the data in the first place. But in any case, give some indication (can be brief; should be!) of what the question/hypothesis/conjecture that you are curious about is & why.
2. Tell me how I can do the analysis and why doing it that way can be expected to generate some result that gives us more reason to accept a particular answer to the motivating question, or more reason to accept or reject the motivating hypothesis, than we would have had without the analysis. The “how to do” part obviously will be constrained by what sorts of variables are in the dataset. Usually we have lots of demographic data as well as our cultural outlook variables and so forth. The “why” question requires specifying the nature of the causal inference that you think can be drawn from the analysis. It’s gotta make sense to be interesting.
3. No friggin’ fishin trips! Don’t ask me to correlate global warming with the price of cottage cheese just because you think that would be an amusing thing to do.
4. Don’t even think of asking me to plug every conceivable variable into the right-hand side of a regression and see what sort of gibberish pops out. Of course, I’m happy to do multivariate analyses, but each variable has to be justified as part of a model that relates in a specifiable way to the interesting conjecture motivating the request and to the nature of the inference that can be drawn from the analysis. Or to put it another way, the analysis has to reflect a cogent modelling strategy. Overspecified regression analyses are usually a signature of the lack of a hypothesis — people just see what turns out to be significant (something always will with enough variables) & then construct a post-hoc, just-so story for the result. In addition, the coefficients for overspecified models are often meaningless phantoms– the impact of influences “holding constant” influences that in the real world are never “constant” in relation to those influences…. I’ll write another post on why “over-controlling” is such a pernicious, mindless practice….
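To make the “something always will with enough variables” point concrete, here is a minimal simulation sketch. It is not an analysis of our data: every variable in it is pure random noise, the sample size of 800 just mirrors the survey discussed below, and the cutoff is a rough large-sample stand-in for “p < .05.” It also uses simple pairwise correlations rather than a full regression, which is a simplification on my part.

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_spurious_hits(n_obs, n_predictors, rng):
    """Count pure-noise predictors that pass a rough p < .05 cutoff."""
    # Large-sample approximation: |r| > 1.96 / sqrt(n) is roughly p < .05.
    cutoff = 1.96 / math.sqrt(n_obs)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(noise, outcome)) > cutoff:
            hits += 1
    return hits

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
hits = count_spurious_hits(n_obs=800, n_predictors=100, rng=rng)
print(f"{hits} of 100 noise variables look 'significant'")
```

With a 5% false-positive rate you should expect about five “hits” per hundred unrelated variables on average, each one available for a just-so story.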
Okay. This first installment is responsive to questions posed in response to “part 3” of the GM food risk series. Discussants there were curious about whether the “middling” mean score for the GM food risk item was best understood as “not sure; huh?,” as I proposed, or as a genuine, mid-level of concern. One suggested seeing some more raw data might help, and on reflection I can think of some ways to look at them that might, at least a bit.
Consider these histograms, which reflect the distribution of responses to the 8-point industrial-strength risk perception item for “Global warming” (left) and “Genetically modified foods” (right):
Here are some things to note. First, the GM food distribution is much more “normal” (bell shaped) than the global warming distribution. Indeed, if you compare the black line (the statistical “normal density distribution” given the mean & SD for the global warming data) with the red one (the kernel density plot, which smooths the observed responses with a locally weighted kernel rather than forcing a bell shape on them), you can see that the distribution for global warming risk perceptions is closer to bimodal, meaning that the subjects are actually pretty divided between those who see “low risk” and those who see “high.” There’s not so much division for GM foods.
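For anyone who wants to see the normal-curve vs. kernel-density comparison in miniature, here is a rough sketch. The response counts are made up for illustration (they are not the survey data), and the bandwidth of 0.8 and the helper names are my own assumptions. The point is only that a normal curve fit to the mean & SD can never show two humps, while a kernel density estimate can.

```python
from statistics import NormalDist, fmean, pstdev

def expand(counts):
    """Turn counts for scale points 0..7 into one response per person."""
    return [point for point, n in enumerate(counts) for _ in range(n)]

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - d) for d in data) / len(data)

def count_modes(curve):
    """Count interior local maxima in a list of density values."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] >= curve[i + 1]
    )

GRID = [-2.0 + 0.1 * i for i in range(111)]  # runs a bit past the 0..7 scale
BANDWIDTH = 0.8  # hypothetical smoothing choice

def describe(counts):
    """Return (modes of fitted normal curve, modes of kernel density)."""
    data = expand(counts)
    normal = NormalDist(fmean(data), pstdev(data))
    normal_modes = count_modes([normal.pdf(x) for x in GRID])
    kde_modes = count_modes([kde(x, data, BANDWIDTH) for x in GRID])
    return normal_modes, kde_modes

# Hypothetical counts: a divided sample and a clustered-in-the-middle one.
divided = [30, 10, 2, 0, 0, 2, 10, 30]
clustered = [2, 8, 20, 30, 30, 20, 8, 2]
print("divided:", describe(divided))
print("clustered:", describe(clustered))
```

For the divided sample the normal curve still shows a single hump in the middle of the scale, where almost nobody actually is, while the kernel density picks up both camps.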
Second, the GM foods distribution has a kind of a fat mid-point (low kurtosis). That’s because a lot of survey respondents picked “3,” “4,” & “5.” Because an excess of “middle choices” is a signature of “umm, not sure” for risk perception measures of this sort, I am now even more persuaded that the 800 members of this nationally representative sample didn’t really have strong views about GM foods in relation to the other risks, all of which were ones that displayed substantial cultural polarization.
But my confidence in this conclusion is only modest. The cases in which a middling mean signifies generalized “don’t know” often have much more dramatic concentrations of responses toward the middle of the scale (high kurtosis); indeed, the labels that were assigned to each point on the Likert item risk-perception measure were designed to mitigate the middle/don’t-know effect, which is usually associated with scales that ask respondents to estimate a probability for some contingency (in which case people who don’t know mean to convey that with “50%”).
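If you want to check the kurtosis talk against numbers, here is a small sketch that computes excess kurtosis straight from the counts at each scale point. The two sets of counts are invented for illustration; the function name is mine.

```python
def excess_kurtosis(counts):
    """Excess kurtosis (normal = 0) from counts at scale points 0, 1, 2, ..."""
    n = sum(counts)
    mean = sum(point * c for point, c in enumerate(counts)) / n
    m2 = sum(c * (point - mean) ** 2 for point, c in enumerate(counts)) / n
    m4 = sum(c * (point - mean) ** 4 for point, c in enumerate(counts)) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical counts on an 8-point item.
flat = [10, 10, 10, 10, 10, 10, 10, 10]   # responses spread evenly
peaked = [1, 1, 2, 36, 36, 2, 1, 1]       # nearly everyone in the middle

print("flat:", round(excess_kurtosis(flat), 2))
print("peaked:", round(excess_kurtosis(peaked), 2))
```

The evenly spread item comes out negative and the sharply peaked one comes out strongly positive, which is the contrast between a “fat” middle and a dramatic pile-up at the midpoint.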
Now consider these two figures:
These are the kernel density estimates for responses to these two risk-perception items when the sample is split at the mean of the “individualism-communitarianism” scale. Basically, the figures allow us to compare how “individualists” and “communitarians” are divided on global warming (left) and GM foods (right).
Do you see what I do? The individualists and communitarians are starkly divided on climate change: the latter is skewed strongly toward high risk, and the former toward low (although perhaps a bit less so; if I looked at “hierarch individualists,” you’d really see skewing). That division (which, again, is compounded when the hierarchical disposition of the subjects is taken into account as well) is the source of the semi-bimodal distribution of responses to the global warming item.
Now look at individualists & communitarians on GM foods. They see more or less eye-to-eye. This is corroboration of my conclusion in the last post that there isn’t, at least not yet, any meaningful cultural division over GM foods. (BTW, the pictures would look the same if I had divided the subjects into “hierarchs” and “egalitarians”; I picked one of the two worldview dimensions for the sake of convenience and clarity).
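The split-at-the-mean comparison is simple enough to sketch. Everything here is hypothetical: the six respondents, their scores, and the assumption that a higher score on the scale means “more individualist.” It only illustrates the mechanics of dividing a sample at the scale mean and comparing the two halves.

```python
from statistics import fmean

def split_at_mean(scores, responses):
    """Split responses into below-mean and at-or-above-mean score groups."""
    cut = fmean(scores)
    low = [r for s, r in zip(scores, responses) if s < cut]
    high = [r for s, r in zip(scores, responses) if s >= cut]
    return low, high

def mean_gap(scores, responses):
    """Mean response of the high-score group minus that of the low-score group."""
    low, high = split_at_mean(scores, responses)
    return fmean(high) - fmean(low)

# Hypothetical worldview scores (assumed: higher = more individualist).
scores = [-1.2, -0.8, -0.3, 0.2, 0.9, 1.2]
warming = [7, 6, 6, 2, 1, 1]   # made-up 0..7 risk ratings
gm_food = [4, 3, 5, 4, 3, 5]

print("global warming gap:", mean_gap(scores, warming))
print("GM food gap:", mean_gap(scores, gm_food))
```

A big gap on one item and essentially none on the other is the pattern the two figures show, although the density plots tell you more than a difference in means can.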
Whaddya think? Wanna see some more? Just ask!
Reference
de Bruin, W.B., Fischhoff, B., Millstein, S.G. & Halpern-Felsher, B.L. Verbal and Numerical Expressions of Probability: “It’s a Fifty–Fifty Chance”. Organizational Behav. & Human Decision Processes 81, 115-131 (2000)
NiV, whose initial conjectures motivated this episode of “Wanna see more data? Just ask!”, offers some observations, the order of which I’ll rearrange a bit.
I was also intrigued by your final pair of graphs, in particular that the red H-I line on the climate change distribution seemed to be bimodal, and more broadly spread. I suspect the histogram would show this even more clearly. While the common stereotype is that it is H-I people are unusually and strongly climate sceptical, that it is now a cultural litmus test; it would be more accurate to say C-E people are strongly believers, while the H-I people are comparatively uniform, covering the whole range of possible opinions with just a leaning towards scepticism. (Of course, the histograms might give a different impression.) Who would feel the greater pressure to conform to their cultural peer group?
I found it interesting because I’ve seen the same thing before in poll results, but never as clearly. I’ve always been more interested in why people believe than what they believe, but it’s not something pollsters usually ask.
I don’t understand why the kernel density individualist lines extended so far to the left of zero, and further than the communitarian lines on the last two plots.
Well, 1st off, the reason the red “individualist” kdensity lines extended back so far behind “0” was my hasty/indecisive use of graphic-dispersion techniques. First, I “shifted” the individualists to the “left” a fixed amount (0.17), so I could plot “overlaid” histograms, the technique used in the graphic below. But then I decided I didn’t like that graphic so much; too blocky, too much color, too much “ink…” So I started to play w/ kernel density plotting w/o the bars. In order to make the density plots smooth over rather than form little bubbles at each point along the 8-point categorical measure, I “jittered”: I added some noise to create visual spread around the observations that otherwise were clustering at each point on the Likert scale. But before jittering to produce the kernel density plots, I should have remembered to put back the 0.17 that I had subtracted from all the individualists’ scores (or noticed the funny look that jittering on top of that shift produced!).
2d, on HI vs. EC. The kernel density plots for the individualists & communitarians understate both the magnitude and the asymmetry of the cultural polarization on global warming. “Individualists” include both egalitarian individualists (EIs) & hierarch individualists (HIs); “communitarians” include both egalitarian communitarians (ECs) & hierarch communitarians (HCs). On environmental & technological risks, the polarization is greatest between HIs & ECs, but I wanted to include the entire sample in the “polarization” graphic, since the histogram graphic had been based on the entire sample.
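Here is a little sketch of that slip. The 0.17 shift is the one described above; the jitter width of 0.3, the uniform noise, and the handful of responses are assumptions for illustration, not what the plotting software actually did.

```python
import random

SHIFT = 0.17         # the fixed leftward shift described above
JITTER_WIDTH = 0.3   # hypothetical amount of noise

def jitter(values, width, rng):
    """Add uniform noise in [-width, width] to each value."""
    return [v + rng.uniform(-width, width) for v in values]

responses = [0, 0, 1, 3, 7]  # hypothetical individualist responses

# What I did: jitter scores that were still shifted left by 0.17.
as_posted = jitter([v - SHIFT for v in responses], JITTER_WIDTH, random.Random(7))
# What I meant to do: undo the shift first, then jitter.
intended = jitter(responses, JITTER_WIDTH, random.Random(7))

for posted, meant in zip(as_posted, intended):
    print(round(posted, 3), round(meant, 3))
```

Every point ends up 0.17 further left than it should be, so the lowest responses can land well below zero even before the kernel smoothing spreads them out further.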
So you can get a more accurate picture, here is what we see if we just look at HIs & ECs (actually, I would only split the sample up like this for illustrative purposes; for statistical testing, it’s better to estimate the “culture” effects based on the continuous cultural-worldview factor scores as applied to the entire sample).
As you surmised, we can see the bimodal nature of the distribution more clearly this way. But you can also see, now that I’ve removed the less divided HCs & EIs, that culture is exerting a pretty symmetrical effect in driving ECs toward high risk & HIs toward low on global warming.
We can also see that the HIs & ECs are pretty smushed together (as are HCs & EIs, btw) on GM foods.
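For the curious, picking out the HIs & ECs amounts to something like the sketch below. The assumption that each group is defined by which side of the scale mean (here, zero after centering) a person falls on for both worldview dimensions is mine, for illustration; the respondents are made up. As noted above, this kind of carving is for pictures only, not for statistical testing.

```python
def quadrant(hierarchy, individualism):
    """Label a respondent by the sign of two mean-centered worldview scores."""
    first = "H" if hierarchy >= 0 else "E"
    second = "I" if individualism >= 0 else "C"
    return first + second

# Hypothetical respondents: (hierarchy score, individualism score, risk rating)
sample = [
    (0.9, 1.1, 1),
    (-1.0, -0.7, 7),
    (0.4, -0.6, 4),
    (-0.3, 0.8, 3),
]

# Keep only the two most opposed groups for the illustrative plot.
extremes = [
    (quadrant(h, i), risk)
    for h, i, risk in sample
    if quadrant(h, i) in ("HI", "EC")
]
print(extremes)
```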
Here is what you had to say, based on the histograms & kdensity plots, about the issue of whether the middling mean risk score for GM foods — the mid-range placement of it among the various risk perceptions measured — should be read as a collective “mmmm… not sure” or instead as a genuine middling level of concern.
what I was getting at was essentially the spread in the distribution. You said:
“Second, I’d say that I don’t think there’s very much concern about GM foods generally. The “middling” score likely just means that members of the sample didn’t feel nearly as strongly about GM foods as they felt, one way or the other, about the other risks.”
But when you look at the distribution, you see a lot of people giving 7s sticking up above the bell curve, (as well as a few people giving 0s,) who are surely people who feel strongly about the issue. Although really strength of feeling is a different question. My point was that the mean is a characteristic of the population as a whole, not individual members of the sample. Some members do feel strongly that it is dangerous, some feel strongly that it is safe, and they cancel one another out on the danger, but not necessarily the strength of feeling. In this case it’s true there are a greater number (but far from all) who appear to be ‘not sure’ and signalled this with intermediate values – I assume this is what you meant.
It’s difficult to be certain, though, because somebody may feel very strongly that the risk is intermediate. Or it may be that some people who didn’t know of (or care about) any danger might signal this by scoring 1s and 0s, on the basis they tend to assume things are safe unless told otherwise. It would be easier to tell if people were asked their strength of feeling, or offered a “don’t know” box, or were allowed to tick several boxes to indicate a range, if they were uncertain. It’s just a thought.
Well, I agree that it’s hard to know what to make of the middling mean after looking at the picture.
Surely, too, some people are scared. But we still don’t have a good picture of who they are, much less why. All we know is that the division isn’t the classic cultural one that we are familiar with for climate change, nuclear power, guns, etc.
So we need some more hypotheses to test on what else might explain the GM food risk-perception variance. If it’s hard to explain (if it looks like just a lot of noise), then that would be more weight for the “huh .. say wha?…” interpretation. But if we can form a conjecture about who/why that then gains some support in testing, we’ll have more evidence in favor, now, not of a “uniform level of genuine moderate concern,” but of some sort of more reflective state of anxiety on someone’s part & possibly, even likely, a corresponding reflective state of nonconcern.
Actually, religiosity is probably a good thing to try testing … But I think I’ll go for a run now!