Revisiting Limitations, Reliability and Validity of Large Database Research

I am often asked for a copy of the remarks I made as part of a presidential panel at the Association for the Study of Higher Education (ASHE) in 2011.  Here is an edited version of that talk.

I know many of you from the applied side of educational research: institutional research.  I don’t think I know anyone here from another hat I have worn in the past, that of a cognitive psychologist, which is where I first learned the literature on the complications of trying to investigate any human phenomenon.  Given the complications, it’s a wonder any of us try at all.

Yet we do.  I cannot imagine what an association for the study of higher education might do if it were not filled with people who, despite the complications, tried to gather comprehensive and systematic data from the many different types of institutions of higher education that we have, so that we can assess, and attempt to improve, the experience for all students who aspire to hang a college diploma or certificate on their wall.

So, I welcome debate on this topic.  Everything we do should be reliable and valid, or at least as reliable and valid as we can make it without being paralyzed by doubt.  I wonder whether, if someone back in 1965 had said to Sandy Astin, “you know, you better like this CIRP thing, because it is going to dominate your life for the next 40 years…”, he would have had second thoughts about his grand plan to inform higher education on the impact that college has on students.

Yet we cannot become like Congress: so divided and so convinced of our own certitude that we not only accomplish nothing but also anger a lot of people who rely upon us.  I do not want to see educational researchers on the chart that is making the rounds now that shows approval of Congress at 11%, lower than Paris Hilton and BP during the oil spill.

Let me go back to Sandy.  If there is anything I have learned in six years of directing CIRP, it is that you cannot go wrong by referring to Sandy Astin.  CIRP was started to answer big questions.  Big questions that could not be answered by a study over here that had one institution with their questionnaire and a study over there at a very different institution that phrased things differently.  Big questions that cross-sectional design could not and would never answer.  Big questions that needed a lot of little questions to get at the Big answers.  If Sandy Astin had waited until those questions were absolutely perfect and everyone agreed on how perfect they were, well, there would be a big hole in higher education research these days.  And Pat Terenzini and Ernie Pascarella’s book would have only been six inches thick instead of seven.

But, as science does, we build upon the past.

What concerns me about some of the current debate is that it reminds me of the debate in Congress, which has, as we know, accomplished nothing except making a lot of us really upset.  Telling schools not to do NSSE is not the answer.  We should recognize the limitations in NSSE.  And CIRP.  And every other study of higher education out there.  But we should also realize that we are in better shape because of how the research from these tools has informed higher education in general as well as hundreds of institutions.

As scholars, we sometimes forget that the ultimate reason for this kind of work is to inform institutional change.  Not to get published, not to get grants, not to get tenure, not to get invited to conferences, but to actually have an impact on what our students get out of college.  National results from NSSE, and CIRP, and NCES, and Ernie and Pat’s work get institutions talking about why and how they do what they do.  Having a local version of those results, as CIRP and NSSE offer, provides institutions with a great service that they cannot accomplish themselves.  We should be looking at how to improve these tools, not tearing them down completely.

As to reliability and validity, I can tell you a few things about CIRP research.  I can tell you how we have looked at student self-reports of things like GPA and SAT scores and found them highly accurate when compared to the actual scores.  I can tell you how we have found good correlations between self-report measures of academic self-efficacy and subsequent performance on some standardized tests such as the California Critical Thinking Skills Test.  I could also tell you that I would love to do more of this kind of work.  If anyone has a few million out there to help me with that, please come see me after the panel.

For every paper you can show me about how invalid self-report surveys are, I can show another that says they are valid.  The key is not to make a blanket statement that people cannot remember what they did last week and so we should stop asking, but to craft questions that they can answer with a reasonable degree of validity.  And remember, even in physics, there is uncertainty.

Let’s look at a typical CIRP question that asks for recall.  When incoming first-year students are asked to reflect on the past year and indicate how often they, for instance, were bored in class, they are given only three response options: “frequently,” “occasionally,” or “not at all.”  These qualifiers are fairly straightforward in themselves, and the instructions also specify: “if you engaged in an activity one or more times, but not frequently, mark ‘occasionally,’” and go on to tell students to mark ‘not at all’ “if you have not performed the activity during the past year.”  The wording of these questions provides sufficient direction to respondents and not enough latitude to wander off into vagueness.

I have personally administered this questionnaire to thousands of students over many years.  In the room with them, offering to answer questions.  They had questions, but they were more like “I just got my student ID and I don’t remember the number, what should I do?” and “why do they ask these questions?” and “can I go to the bathroom?”  Nobody ever asked me what this section of the questionnaire meant. 

Even so, what level of specificity in results do we really need in order to provide useful information, and how should we interpret results?  I am sure that all of you were as eager as I was to get up yesterday morning and read about the new NSSE results.  There is important information in there about a number of things, but let’s take majors.  One of the findings the media picked up on was that on average, engineering majors spent more hours studying than business or, pause for effect, social science majors.  To be more specific, engineering students studied an average of 19 hours a week and education majors an average of 15 hours a week.  Do I believe that engineering majors tend to spend more time studying than education majors?  Sure.  Do I for a minute think that, if we had a perfect way of recording hours spent studying (putting aside for a moment the Heisenberg Uncertainty Principle, which of course tells us that this is never going to happen), the final all-knowing population value for engineers would be exactly 19 hours?  Not at all.

But the important piece of information here is the relationship between the groups.  That we can get without putting all the engineers in a box with Schroedinger’s cat.

Another important recognition here is that some questions just cannot be compared against outside standards.  Respondent opinions, perceptions, values, and beliefs about themselves are important aspects of their everyday experiences, and have value.  Certainly the questions should be crafted with care, by people familiar with all the potential sources of bias that can impact results.  But just because perceptions and values are not easily verified does not mean that they are not important or reliable in predicting student achievement.

There are scores of studies that examine the connection between perceived campus climate and outcome measures, such as graduation, that are backed up by triangulating with observations and interview studies.  This is why it is a common practice in research to also use multiple questions that examine the same trait from different perspectives to create constructs that attempt to describe a phenomenon.  More sophisticated methods, such as how we at HERI are using item response theory in creating constructs, also have moved the field forward in this regard.

There is a very rich body of literature on survey questions.  I encourage those of you interested in this topic to attend the annual conference of the American Association for Public Opinion Research.  They are way ahead of us in the area of survey methodology.  Don Dillman’s work, in particular, is masterful.  These people live and breathe the impact of question wording, response options and even horizontal or vertical formatting of the responses.  But they all believe that if we apply what we know about creating questionnaires, we can effectively utilize questionnaires to collect useful data.

So, in summary, what do I believe?

1)    There are important questions that only large scale surveys can answer.

2)    As with any line of research, there is uncertainty.  We need to recognize it and acknowledge it in interpreting our results.

3)    We need to always move forward in making our measures better.

Some of you might have heard this quote from George Eliot.  We can perhaps let the sexism slide a bit, since George really was Mary Anne, and hope that were she writing today she would do so under her own name and without the male pronoun:

The important work of moving the world forward does not wait to be done by perfect men.

Of course, maybe she was thinking that it would be the perfect women that did it, right?

Let us by all means strive to be perfect. But let us not let our failings in that area mean that we do not continue to try ourselves and support those who battle beside us.

Too Many Applications? Think Again

Do we have a problem with too many high school seniors applying to too many colleges?

That’s what the New York Times thinks.  A front-page article on Sunday (November 15, 2014) about college admissions (Applications by the Dozen, as Anxious Seniors Hedge College Bets) claims that a lot of high school seniors these days are applying to “more colleges than anyone would have previously thought possible.”  The sidebar proclaims that there is “a perfect storm of ambition, neuroses and fear among high school students.” Yikes!

Well, there must be pretty good data behind this, right?  It was on the front page of the New York Times, after all.

To shore up this claim, the reporter cites two high school seniors, one who applied to 29 colleges, and another who applied to 18. Two cases. An N of two. 

OK, that’s the human interest side (we have names, a back story, and in one case, a picture of a young woman on her laptop, presumably writing application number 29).  What else?  A high school staff member tells a story of one person who applied to 56 schools.  Naviance (a company that, among other things, has a web-based program that helps high school students with the application process) says that one student in the US has 60 colleges they are thinking about applying to.

So far I am not really impressed. Two interviews with students and two examples of hearsay.

Finally we get some actual data based on more than a few conversations.  The reporter tells us that the National Association for College Admission Counseling (NACAC) has a survey that says that in 1990 nine percent of college freshmen had applied to seven or more colleges, and by 2011 (which the reporter tells us is the most recent data), this had risen to 29%.  Now we’ve got some data. 

Only it’s not quite right, as this is not a NACAC survey, it’s the CIRP Freshman Survey, which NACAC clearly credits on their website as the source. It looked very familiar to me, since I directed the CIRP Freshman Survey for eight years, and provided the information to NACAC at the time.  We would typically have around 200,000 students represented in the CIRP Freshman Survey database each year (note to reporters, that is not a “2,” it’s “200,000”).

While the source is wrong, the numbers cited are correct.

The reporter did not, however, use the most recent data or the most relevant data.  Figures for the class entering in fall of 2013 (not 2011) have been released, and the percentage of four-year college first-year students who applied to seven or more schools rose to 31.6%.

But wait, seven schools isn’t what this is about.  It’s about 18, or 56, or maybe even 60 if that student using Naviance applies to all the ones being considered.  The CIRP data doesn’t tell us about such high numbers because we topped out the available responses by asking about 12 or more applications.  And that’s at 5.9% of the college freshmen for 2013.

So make a reasonable guess about how many of those are sending 18, or 56, or even 60 applications.  It’s not very many, is it?  And that same database tells us that the median number of applications per student is still just, well, four. Which seems pretty reasonable.  
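For readers who want to see that guess as rough arithmetic, here is a minimal Python sketch.  The two shares are the fall 2013 figures quoted above.  The respondent count is an assumption: it simply reuses the round "around 200,000" typical annual figure mentioned earlier, not the actual 2013 count, and the function and variable names are made up for illustration.  Treat the results as ballpark numbers, not published counts.

```python
# Shares quoted in this post for the class entering in fall 2013.
SHARE_7_OR_MORE = 0.316   # applied to seven or more colleges
SHARE_12_OR_MORE = 0.059  # top response category: "12 or more"

# Assumption: a round 200,000 respondents, the "typical" annual size
# mentioned in this post, not the actual 2013 respondent count.
ASSUMED_RESPONDENTS = 200_000


def approx_headcount(share: float, respondents: int = ASSUMED_RESPONDENTS) -> int:
    """Convert a share of respondents into an approximate headcount."""
    return round(share * respondents)


seven_plus = approx_headcount(SHARE_7_OR_MORE)
twelve_plus = approx_headcount(SHARE_12_OR_MORE)

# Anyone who sent 18, 56, or 60 applications is already counted inside
# the "12 or more" category, so that category is a ceiling on them.
print(f"7 or more:  about {seven_plus:,} of {ASSUMED_RESPONDENTS:,}")
print(f"12 or more: about {twelve_plus:,} of {ASSUMED_RESPONDENTS:,} (ceiling for 18+)")
```

Because the survey stops at "12 or more," the data cannot say how many students sent 18 or more applications.  It can only say that the number is no larger than the top category, and presumably a good deal smaller.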

Why is this on the front page of the New York Times?  The headline was “Applications by the Dozen, as Anxious Seniors Hedge College Bets.”  And while the article does have quotes from guidance counselors that explain that this is not a good strategy, that wasn’t the headline, was it? Why not have a headline of “A Very Small Number of Anxious Seniors are Sending in Too Many College Applications in a Practice that May Actually Hurt Their Chances of Admission”? The message in the headline is that some seniors are hedging their bets by applying to a lot of colleges. Who doesn’t want to hedge a bet?  That’s good.

But this article is not good.  It’s playing on the fears of already anxious students (and, speaking as the father of a high school junior, on the fears of their families too).  I expect better from the New York Times.

So, don't worry that we have hordes of students applying to 56 (or 60!) colleges.  Worry about how to pay for college these days.  That's the scary part.