Recalled behaviour is not fact

There are essentially two types of forms in the world:

Administrative forms
These forms are predominantly used to collect data at an individual level, in order to provide a good or service.
Survey forms
These forms are predominantly used to collect data for aggregation, in order to understand the actions or perspective of a group.

For example, a form for renewing a driver's licence is an administrative form: it enables an individual to ensure their driver's licence remains current. A form asking for opinions on how the government is handling a particular issue is a survey form: the government wants to know the feelings of the constituency as a whole, not follow up with individual voters.

In this article we are going to focus on survey forms, which, in our experience, broadly contain three types of questions:

Demographic
Items such as sex, weight, height, employment status, marital status, etc.
Behavioural
Questions that ask the respondent what they did in the past or are planning to do in the future.
Opinion
Questions that ask the respondent what they think or feel.

Often you will hear people call the second type of question "factual", referencing the notion that these questions collect information that is less amorphous and changeable—and therefore more measurable—than thoughts or emotions.

Behavioural survey questions under the microscope

While the behaviours themselves may be stable and concrete—and even that is not always the case—measuring them is certainly not. This is because such measurement relies so heavily on the respondent's ability to retrieve the relevant information from memory.

This point was brought home by the little-known and hard-to-find book "Survey Responses: An Evaluation of their Validity" by Ellen J. Wentland and Kent W. Smith. The book contains a meta-analysis of all eligible studies from the previous 40 years that compared behavioural survey results with corresponding records. (Not all studies from the period were eligible because, for example, their authors may not have provided enough information about their comparison methodology.)

The authors go into the findings of each comparison study in great detail, and the results are often surprising. For example, one study by Clancy, Ostlund & Wyner (1979) involved showing magazine subscribers fake advertisements as well as fake articles, both brief and full-length. The subscribers were then asked whether they had read any of the ads or articles in the current issue of the magazine. Amazingly, "76% of the respondents claimed to have read at least one advertisement, 55% at least one brief article, and 64% at least one full-length article, all of which had actually not yet been published" (p. 26, emphasis added).

One could certainly argue that social desirability and/or acquiescence effects influenced these results. Nonetheless, such error rates should at least give pause to all the market researchers out there. Moreover, such factors cannot entirely account for the findings of another study quoted by Wentland & Smith.

This 1955 study involved asking British public servants about any sick leave they had taken in the previous four and a half months, with comparisons made against human resources records for the same period. Of the 228 people who had taken sick leave in the reference period, 11% said they hadn't taken any, 10% gave the correct month but the wrong amount, and 47% gave the correct amount but the wrong month. Furthermore, of the 205 workers who had not taken any sick leave in the period, 6% reported that they had.
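It is worth pausing on those numbers: the three error categories together account for 68% of the 228 leave-takers (11% + 10% + 47%), which means at most 32% of them, around 73 people, could have reported both the month and the amount of their sick leave correctly.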

The reliance on recall

In the case of this second study, the likely cause of error is forgetting: there is no real reason for individuals to remember such details as the amount of sick leave they have taken when there are systems that do this for them. But all too often our forms include questions for which the answers are similarly inaccessible to our respondents. Add this to the fact that responses are going to be influenced by other considerations—such as the desire to conform to social norms—and hopefully it becomes clear that behavioural questions are rarely factual.

Think this is starting to read like a storm in a teacup? You might be surprised to learn that only 15% of the questions included in Wentland & Smith's meta-analysis yielded a 95% confidence interval that contained the true population result (p. 126). That's only 12 questions out of 79 that resulted in survey estimates in the ballpark of what was really going on. So even after spending a wealth of resources on ensuring representative samples, developing unbiased questions, minimising processing errors and so on, the people using the results of the other 67 questions would have been led astray.
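To spell the arithmetic out: 12 ÷ 79 ≈ 0.15, so roughly 15% of the questions produced estimates in the right territory, and the remaining 79 − 12 = 67 did not.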

This is not a call to throw up our hands and turn our backs on forms and surveying. Rather, the aim of this article is to draw more attention to the degree to which cognitive factors influence the quality of the responses our forms collect. Much as decision makers may not like it, we need to be more realistic about what data can be reliably collected on a form.

References

Wentland, E.J. & Smith, K.W. (1993). "Survey Responses: An Evaluation of their Validity". Academic Press, Inc., California.