Revisiting Limitations, Reliability and Validity of Large Database Research

I am often asked for a copy of these remarks I made as part of a presidential panel at the Association for the Study of Higher Education (ASHE) in 2011.  Here is an edited version of that talk.

I know many of you from the applied side of educational research: institutional research.  I don’t think anyone here knows me from another hat I have worn in the past, that of a cognitive psychologist, which is where I first learned the literature on the complications of trying to investigate any human phenomenon.  Given the complications, it’s a wonder any of us try at all.

Yet we do.  I cannot imagine what an association for the study of higher education might do if it were not filled with people who, despite the complications, tried to gather comprehensive and systematic data from the many different types of institutions of higher education that we have, so that we can assess, and attempt to improve, the experience for all students who aspire to hang a college diploma or certificate on their wall.

So, I welcome debate on this topic.  Everything we do should be reliable and valid, or at least as reliable and valid as we can make it without being paralyzed by doubt.  I wonder, if someone back in 1965 had said to Sandy Astin, “you know, you’d better like this CIRP thing, because it is going to dominate your life for the next 40 years…”, whether he would have had second thoughts about this grand plan to inform higher education on the impact that college has on students.

Yet we cannot become like Congress: so divided and so convinced of our own certitude that we not only accomplish nothing but anger a lot of people who rely upon us.  I do not want to see educational researchers on the chart making the rounds now that shows approval of Congress at 11%, lower than Paris Hilton and BP during the oil spill.

Let me go back to Sandy.  If there is anything I have learned in six years of directing CIRP, it is that you cannot go wrong by referring to Sandy Astin.  CIRP was started to answer big questions.  Big questions that could not be answered by a study over here at one institution with its questionnaire and a study over there at a very different institution that phrased things differently.  Big questions that cross-sectional designs could not and would never answer.  Big questions that needed a lot of little questions to get at the big answers.  If Sandy Astin had waited until those questions were absolutely perfect and everyone agreed on how perfect they were, well, there would be a big hole in higher education research these days.  And Pat Terenzini and Ernie Pascarella’s book would have been only six inches thick instead of seven.

But, as science does, we build upon the past.

What concerns me about some of the current debate is that it sounds to me like some of the debate in Congress, which, as we know, has not accomplished anything except making a lot of us really upset.  Telling schools not to do NSSE is not the answer.  We should recognize the limitations in NSSE.  And CIRP.  And every other study of higher education out there.  But we should also realize that we are in better shape because of how the research from these tools has informed higher education in general as well as hundreds of institutions.

As scholars, sometimes we forget that the ultimate reason for this kind of work is to inform institutional change.  Not to get published, not to get grants, not to get tenure, not to get invited to conferences, but to actually have an impact on what our students get out of college.  National results from NSSE, and CIRP, and NCES, and Ernie and Pat’s work get institutions talking about why and how they do what they do.  Having a local version of those results, as CIRP and NSSE offer, provides a great service to institutions that they cannot accomplish themselves.  We should be looking at how to improve these tools, not tearing them down completely.

As to reliability and validity: I can tell you a few things about CIRP research.  I can tell you how we have looked at student self-reports of things like GPA and SAT scores and found them highly accurate when compared to the actual scores.  I can tell you how we have found good correlations between self-report measures of academic self-efficacy and subsequent performance on standardized tests such as the California Critical Thinking Skills Test.  I could also tell you that I would love to do more of this kind of work.  If anyone out there has a few million to help me with that, please come see me after the panel.

For every paper you can show me about how invalid self-report surveys are, I can show you another that says they are valid.  The key is not a blanket statement that people cannot remember what they did last week and so we should stop asking, but crafting questions that respondents can answer with a reasonable degree of validity.  And remember, even in physics there is uncertainty.

Let’s look at a typical CIRP question that asks for recall.  When asking incoming first-year students to reflect on the past year and indicate how often they, for instance, were bored in class, students are given only three response options: “frequently,” “occasionally,” or “not at all.”  The qualifiers are fairly straightforward in themselves, and the instructions also specify that students should mark “occasionally” “if you engaged in an activity one or more times, but not frequently,” and mark “not at all” “if you have not performed the activity during the past year.”  The wording of these questions provides sufficient direction to respondents and not enough latitude to wander off into vagueness.

I have personally administered this questionnaire to thousands of students over many years.  In the room with them, offering to answer questions.  They had questions, but they were more like “I just got my student ID and I don’t remember the number, what should I do?” and “why do they ask these questions?” and “can I go to the bathroom?”  Nobody ever asked me what this section of the questionnaire meant. 

Even so, what level of specificity in results do we really need in order to provide useful information, and how should we interpret results?  I am sure that all of you were as eager as I was to get up yesterday morning and read about the new NSSE results.  There is important information in there about a number of things, but let’s take majors.  One of the findings the media picked up on was that, on average, engineering majors spent more hours studying than business or, pause for effect, social science majors.  To be more specific, engineering students studied an average of 19 hours a week and education majors an average of 15 hours a week.  Do I believe that engineering majors tend to spend more time studying than education majors?  Sure.  Do I think for a minute that, if we had a perfect way of recording hours spent studying (putting aside the Heisenberg Uncertainty Principle for a moment, which of course tells us that this is never going to happen), the final all-knowing population value would be 19 hours?  Not at all.

But the important piece of information here is the relationship between the groups.  That we can get without putting all the engineers in a box with Schrödinger’s cat.

Another important recognition here is that some questions just cannot be compared against outside standards.  Respondents’ opinions, perceptions, values, and beliefs about themselves are important aspects of their everyday experiences, and have value.  Certainly the questions should be crafted with care, by people familiar with all the potential sources of bias that can affect results.  But just because perceptions and values are not easily verified does not mean that they are not important or reliable in predicting student achievement.

There are scores of studies that examine the connection between perceived campus climate and outcome measures, such as graduation, that are backed up by triangulating with observations and interview studies.  This is why it is a common practice in research to also use multiple questions that examine the same trait from different perspectives to create constructs that attempt to describe a phenomenon.  More sophisticated methods, such as how we at HERI are using item response theory in creating constructs, also have moved the field forward in this regard.

There is a very rich body of literature on survey questions.  I encourage those of you interested in this topic to attend the annual conference of the American Association for Public Opinion Research.  They are way ahead of us in the area of survey methodology.  Don Dillman’s work, in particular, is masterful.  These people live and breathe the impact of question wording, response options, and even the horizontal or vertical formatting of responses.  And they all believe that if we apply what we know about creating questionnaires, we can effectively use them to collect useful data.

So, in summary, what do I believe?

1)    There are important questions that only large scale surveys can answer.

2)    As with any line of research, there is uncertainty.  We need to recognize it and acknowledge it in interpreting our results.

3)    We need to always move forward in making our measures better.

Some of you might have heard this quote from George Eliot.  We can perhaps let the sexism slide a bit, since George really was Mary Anne, and hope that were she writing today she would do so under her own name and without the male pronoun:

The important work of moving the world forward does not wait to be done by perfect men.

Of course, maybe she was thinking that it would be the perfect women that did it, right?

Let us by all means strive to be perfect. But let us not let our failings in that area mean that we do not continue to try ourselves and support those who battle beside us.

Too Many Applications? Think Again

Do we have a problem with too many high school seniors applying to too many colleges?

That’s what the New York Times thinks.  A front-page article on Sunday (November 15, 2014) about college admissions (Applications by the Dozen, as Anxious Seniors Hedge College Bets) claims that a lot of high school seniors these days are applying to “more colleges than anyone would have previously thought possible.”  The sidebar proclaims that there is “a perfect storm of ambition, neuroses and fear among high school students.” Yikes!

Well, there must be pretty good data behind this, right?  It was on the front page of the New York Times, after all.

To shore up this claim, the reporter cites two high school seniors, one who applied to 29 colleges, and another who applied to 18. Two cases. An N of two. 

OK, that’s the human interest side (we have names, a back story, and in one case a picture of a young woman on her laptop, presumably writing application number 29).  What else?  A high school staff member tells a story of one person who applied to 56 schools.  Naviance (a company that, among other things, has a web-based program that helps high school students with the application process) says that one student in the US is thinking about applying to 60 colleges.

So far I am not really impressed. Two interviews with students and two examples of hearsay.

Finally we get some actual data based on more than a few conversations.  The reporter tells us that the National Association for College Admission Counseling (NACAC) has a survey showing that in 1990 nine percent of college freshmen had applied to seven or more colleges, and that by 2011 (which the reporter tells us is the most recent data) this had risen to 29 percent.  Now we’ve got some data.

Only it’s not quite right: this is not a NACAC survey, it’s the CIRP Freshman Survey, which NACAC clearly credits on its website as the source.  It looked very familiar to me, since I directed the CIRP Freshman Survey for eight years and provided the information to NACAC at the time.  We would typically have around 200,000 students represented in the CIRP Freshman Survey database each year (note to reporters: that is not a “2,” it’s “200,000”).

While the source is wrong, the numbers cited are correct.

And the reporter did not actually use the most recent, or the most relevant, data.  Figures for the class entering in fall of 2013 (not 2011) have been released, and the percentage of four-year college first-year students who applied to seven or more schools rose to 31.6 percent.

But wait, seven schools isn’t what this is about.  It’s about 18, or 56, or maybe even 60 if that student using Naviance applies to all the ones being considered.  The CIRP data doesn’t tell us about such high numbers because we topped out the available responses at “12 or more” applications.  And that stands at 5.9% of college freshmen for 2013.

So make a reasonable guess about how many of those are sending 18, or 56, or even 60 applications.  It’s not very many, is it?  And that same database tells us that the median number of applications per student is still just, well, four. Which seems pretty reasonable.  

Why is this on the front page of the New York Times?  The headline was “Applications by the Dozen, as Anxious Seniors Hedge College Bets.”  And while the article does have quotes from guidance counselors that explain that this is not a good strategy, that wasn’t the headline, was it? Why not have a headline of “A Very Small Number of Anxious Seniors are Sending in Too Many College Applications in a Practice that May Actually Hurt Their Chances of Admission”? The message in the headline is that some seniors are hedging their bets by applying to a lot of colleges. Who doesn’t want to hedge a bet?  That’s good.

But this article is not good.  It’s playing on the fears of already anxious students (and, as the father of a high school junior, I can tell you it’s scary to their families too).  I expect better from the New York Times.

So, don't worry that we have hordes of students applying to 59 (or 60!) colleges. Worry how to pay for college these days. That's the scary part.

Who Are You, And Who Knows?

On a recent visit to a top research university (ok, it was Caltech), I was struck with how much the institution’s mission was reflected in everything that I experienced there.

Why was a building being renovated?  To provide more space to facilitate active learning among the students.  Why were academic departments located next to each other with shared space?  To encourage interdisciplinary research.  Even the trinkets in the bookstore send the message that the connection between research and education is essential to who they are (I picked up an awesome model of the Mars rover “Curiosity” made by Hot Wheels there).

Institutional identity drives the kinds of students who apply for admission, the types of faculty who apply for jobs, the organizations that are interested in funding research, and the benefactors who donate to support that mission.  The faculty, students, and administrators at this school know their institutional identity and live it on a daily basis.

In a time when colleges and universities are experiencing declining enrollment and financial difficulties, institutions that understand and promote their identity are the ones that stand out among the pack. They attract applications and donations. And yet, despite being so important, most college and university presidents think that their institutional identity is not very well understood by the very people they need to understand it.

In a recently released survey, Gallup and Inside Higher Ed presented college presidents with a list of 12 constituent groups and asked how well they thought each group understood institutional identity at that president’s college.  The results show that most college presidents don’t think any group understands this crucial element very well.  In fact, of the 12, there was only one group that more than half of the college presidents believed knows the institution’s identity extremely well: administrators.  Although barely above half, at 51%, administrators are seen by more presidents as understanding institutional identity extremely well than their own trustees (43%), tenured faculty (28%), alumni (26%), and current students (21%).  Prospective students, who when they become matriculants inject life into an institution both with their presence and with their tuition dollars, are rated extremely low, with only 3% of college presidents thinking they understand institutional identity extremely well.

Clearly there is huge wasted potential here, in that many constituents are not perceived to have a very good understanding of what these colleges stand for.  As competition for new students and new sources of funding increases, what attracts people to your institution if they do not know what is special about it?  Are you an institution that prides itself on a liberal arts orientation and also focuses on the importance of civic engagement?  Do you have the attention to student learning of a small college but with the resources of a research university?  Or are you pretty much just like the college across town, except that they are known for an emphasis on applied learning through college-sponsored internships?  If, as a college president, trustee, faculty member, or alumnus, you are not sure who you are and what you stand for, how will anyone else be?  And if nobody knows what you are passionate about, and what distinguishes you from all the other post-secondary institutions in the United States, then what are your chances of existing in the next decade?

You Can't Change What You Can't Measure

This is an adaptation of remarks I made on a panel October 29 at the SUNY Critical Thinking in Higher Education Conference. The panel was billed as "You Can't Change What You Can't Measure" and so I had to slightly disagree right off the bat. 

You can change what you can't measure.  But you might not be as effective at that change if you don't measure it somehow.  My career in higher education has, for 25 years, focused on conducting research and evaluation in order to provide information for decision makers at colleges and universities.  At the local level, at Dartmouth, I created the office of student affairs planning, evaluation, and research.  I worked with faculty, administrators, and trustees at all levels of the spectrum of change.  Starting with defining and refining the basic question, we conducted research through surveys, focus groups, interviews, and observation; refined what we knew; and then created ways to measure whether we were successful in making change happen.

People love innovation, but people hate change.  

Change disrupts the status quo. And in higher education, we like the way things are. So there will be those who do not want change. Having data that effectively demonstrates the problem can be very helpful. Be sure to collect that and use it when you communicate why change needs to happen.  Explain your thought process and refer to the research you have done to support your conclusion.  Lead with that.  Don't lead with the end.

In the late 90s at Dartmouth the word came down that the president and trustees wanted to improve student social life.  This was in a boom period, and there were millions of dollars coming to this initiative.  There were committees and working groups and consultants, and I worked with a lot of them, providing information from surveys and focus groups on social life.  I had detailed data on student behaviors, attitudes, and beliefs from surveys we created in my office, as well as that all-important comparison data on social life at other schools.  There were plans for new residence halls, a new student center, new dining areas, and new extracurricular opportunities.  Great, right?

Most of that never happened.  

It never happened because instead of leading with “we want to improve student life because it is an integral part of the experience that is 1) impeding student learning 2) not providing enough opportunities for students to experience leadership, and 3) in some cases leading to unhealthy behaviors and attitudes and let me illustrate what the issues are…” the lead skipped all that and was “the fraternity system as we know it will end.”

It was a bold statement.  But it never happened.  Because the conversation was then not about why we needed to change and how things would be better; it was about emotion.  And one of the college presidents most popular with students and alumni became, overnight, one of the least popular.  The administration had been part of the thought process, and some of the faculty had as well, but key stakeholders were omitted from that process and were blindsided.

Dartmouth still has fraternities. And the grand plans were scrapped.  The student life initiative went forward, but it was a shadow of what had been envisioned.

The funny thing was that I had a presentation that lined it all up beautifully.  I had years of survey information that compared us with our peers.  I had detailed local information.  It was one of the most popular presentations I gave.  The alumni office saw my presentation and had me give it to alumni groups, such as the alumni council, when they visited campus.  Alumni loved it. Students loved it. We had great conversations about student life and what could change. (I also had a legendary talk I gave about the same time on the phenomenon of beer pong that at one time was referenced in the Wikipedia entry on beer pong, but that's another story.)

So, lead with why. That is how to persuade people to consider change. 

It also gives you the benchmarks from which to measure change.  It's great to have national data from peer institutions to give perspective to your situation.  It's easy to make abstract statements like: “Our students have academically rigorous experiences here!”  In looking at the data, however, I might reply (and this is a fictional case, by the way), "Well, actually, we have fewer students going on to graduate school, lower scores on the Collegiate Learning Assessment, and on our senior survey fewer students report academic gains than at our peer institutions.  Our analysis shows that our students study on average four hours less a week and produce five fewer papers of over 20 pages during the academic year when compared with our peers.  One thing we have noticed, though, is that students who take more blended learning classes than strict lecture classes report higher levels of engagement and, when we track them, score higher on the CLA than those without those experiences."

Ok.  So now we have laid out the problem not in the abstract, but as specific issues.  We want more students going on to grad school.  We want a better demonstration of learning, and we are going to use the Collegiate Learning Assessment as one measure to look at change over time.  Those are our outcome measures.  And, this is important: having students study more, write longer papers, and take more blended learning courses are not the outcomes.  Those are methods that you think, based upon research, will help you achieve your real outcomes.  Don't confuse process with the goal.

But you need to measure both.  Because that tells you if you are on the right track.  If all you measured was the final outcome, you would have no way of telling which experiences made it more likely that you would achieve your goal.  Because, frankly, sometimes it doesn't matter what the intermediate steps are.  Don't tie your initiative to increasing study time.  That's not the goal.  But be sure to measure all those possible behaviors and experiences so you can make informed choices about how to reach your goal.  And of course, you need to effectively measure whether you achieve your goals.

Which means measuring change.  Use nationally available tools like CIRP or NSSE to measure change.  Use them longitudinally, with the same students over time, to make sure you are looking at actual change and not comparing different groups.  For instance, if you want to see if you have had an impact on your students, survey them coming in with the CIRP Freshman Survey and then survey those same students as seniors with the CIRP College Senior Survey.  Or BCSSE and NSSE.  But don't compare first-year students in 2014 with seniors in 2014 and think you are measuring change.  You are not.  You are looking at different cohorts.  Useful information, but it doesn't measure change.

And you need to measure change from day one.  That means getting people who can do that for you, and giving them the resources to help you both gauge whether you are achieving your outcomes and determine, based upon the data, which of the programs or policies you enact make success more, or less, likely.