Research Blog

By Craig Kolb, Acentric, 18 November 2018

What if you only have space for one question? Perhaps you need results that accurately represent the population, and you've decided to place your question on an omnibus that uses a probability sample. Such omnibuses are expensive relative to online omnibuses, and because omnibuses charge per question, you need to keep it as short as possible. There may be other reasons, but whatever the context, the key problem is the same: you have several questions and budget for only one. What do you do?

A step-by-step example

Let's say you have the questions shown in Figure 1: an awareness question, an ever-purchased question, a grid of agreement ratings (Likert), and a purchase intention scale. Your initial reaction may be that this is far too much to fit into one question, but there is a way around the problem. The solution is to rephrase the questions as statements within a single multiple response question, worded so that respondents indicate which statements they agree with.

Figure 1: Questions that need to be compressed



Rephrasing questions to compress them into a single question

Because all of your questions will be included in one checklist, each must be phrased as a statement that makes sense in terms of agreement (for example, "I have heard of Brand X" rather than "Have you heard of Brand X?").

Don't forget to randomise the response order, so that the statements appear in a different random order for each respondent. Attention may correlate with position in the list, and a minority of respondents will not bother to read the whole list; rotating the order gives every statement a chance to appear near the top. This is especially important on small screens, where respondents may have to scroll to see every statement.
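Most survey platforms offer randomisation as a built-in setting, but if you script the questionnaire yourself, a minimal Python sketch of per-respondent rotation might look like the following. The statement texts are hypothetical, and seeding the shuffle with a respondent ID is an assumption chosen so that each respondent's order is reproducible (for example, if they reload the page).

```python
import random

# Hypothetical checklist statements; replace with your own items.
STATEMENTS = [
    "I have heard of Brand X",
    "I have bought Brand X at some point",
    "Brand X is good value for money",
    "I will probably buy Brand X in the next month",
]


def order_for_respondent(statements, respondent_id):
    """Return a random ordering of the statements for one respondent.

    A private Random instance seeded with the (hypothetical) respondent ID
    gives the same order every time that respondent is served the question,
    without disturbing the global random state.
    """
    rng = random.Random(respondent_id)
    shuffled = list(statements)  # copy so the master list keeps its order
    rng.shuffle(shuffled)
    return shuffled


for rid in (101, 102, 103):
    print(rid, order_for_respondent(STATEMENTS, rid))
```

Storing the order each respondent saw alongside their answers would also let you check later whether selection rates vary with position.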

Figure 2: Compressed into a single multiple response question

Coding for analysis

There are two common ways of coding a multiple response question: categorical and binary. Categorical coding is fine if you only need to report percentages, but it becomes very cumbersome if you want to analyse relationships between answers. In that situation binary coding is preferable. In Figure 3, categorical coding is shown on the left, with a separate code for each possible response and a separate column for each response instance. On the right, binary coding is used, with one column per statement: a '1' indicates selection and a '2' (often a '0', depending on the software) indicates the absence of selection.

Figure 3: Categorical versus binary coding

Types of analysis

To analyse the relationships between your binary variables you can take a number of approaches. To examine bivariate relationships you could use the phi coefficient, which in essence measures the degree to which responses fall on the diagonal of a 2 x 2 cross-tabulation (both statements selected, or neither selected). The more the diagonal counts outweigh the off-diagonal counts, the stronger the positive correlation; the more the off-diagonal counts outweigh the diagonal, the stronger the negative correlation. When the two are roughly in balance, the correlation is close to zero.
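As a sketch of the calculation: writing n11 and n00 for the diagonal cells (both selected, neither selected) and n10 and n01 for the off-diagonal cells, phi is (n11·n00 − n10·n01) divided by the square root of the product of the four row and column totals. It gives the same value as a Pearson correlation computed on the two 0/1 columns. The function below assumes 0/1 coding, so data coded 1/2 would need recoding first.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient for two equal-length sequences of 0/1 codes."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    if any(v not in (0, 1) for v in (*x, *y)):
        raise ValueError("recode to 0/1 first (e.g. 2 -> 0)")

    # Cells of the 2 x 2 cross-tabulation
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # diagonal
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # diagonal
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # off-diagonal
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # off-diagonal

    # Diagonal product minus off-diagonal product, scaled by the margins
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator


# Hypothetical example: two statements ticked by six respondents
aware = [1, 1, 1, 0, 0, 0]
bought = [1, 1, 0, 0, 0, 1]
print(phi_coefficient(aware, bought))
```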

If you really want to squeeze the maximum out of the data collected, you could consider multivariate techniques such as logistic regression, or clustering based on the simple matching coefficient or another distance measure suitable for binary data.
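The simple matching coefficient is the proportion of statements on which two respondents gave the same answer, counting both "selected" and "not selected" as a match; one minus it gives a distance that a clustering routine can use. The minimal sketch below builds that distance matrix for a few hypothetical respondents.

```python
def simple_matching_distance(a, b):
    """1 minus the simple matching coefficient for two 0/1 response vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must be the same length")
    matches = sum(1 for u, v in zip(a, b) if u == v)  # joint 1s and joint 0s
    return 1 - matches / len(a)


def distance_matrix(rows):
    """Full pairwise simple matching distance matrix."""
    n = len(rows)
    return [[simple_matching_distance(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical binary-coded answers (one row per respondent, one column per statement)
respondents = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]
for row in distance_matrix(respondents):
    print([round(d, 2) for d in row])
```

For 0/1 data this equals the `hamming` metric in SciPy's `scipy.spatial.distance.pdist`, whose condensed output can be passed to `scipy.cluster.hierarchy.linkage` for hierarchical clustering. Note that simple matching treats two respondents who both left a statement unticked as similar; if joint non-selection should not count, the Jaccard distance is the usual alternative.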


There are disadvantages, though. Firstly, you lose sensitivity, which certain questions may need. For instance, replacing a 5-point Likert scale with a binary agree/not-agree choice reduces the sensitivity of the rating. If the omnibus charges the same for a Likert grid as for a multiple response question, consider using the grid instead.
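To illustrate the loss of sensitivity, the sketch below uses two hypothetical groups of 5-point ratings and assumes that anyone rating 4 or 5 would tick the statement in a binary checklist. Both groups produce the same share agreeing, even though one is polarised and the other lukewarm.

```python
from statistics import mean

# Hypothetical 5-point ratings (1 = strongly disagree, 5 = strongly agree)
polarised = [5, 5, 5, 1, 1]
lukewarm = [4, 4, 4, 3, 3]


def share_agreeing(ratings, threshold=4):
    """Share who would tick 'agree', assuming ratings >= threshold map to agree."""
    return sum(r >= threshold for r in ratings) / len(ratings)


for name, ratings in (("polarised", polarised), ("lukewarm", lukewarm)):
    print(name, "mean:", mean(ratings), "share agreeing:", share_agreeing(ratings))
```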

Secondly, you introduce context effects. You may or may not want this, but either way respondents will see the statements side by side and so will be more inclined to compare them.

Lastly, fitting many long statements into one question may slow respondents down, and attention may correlate with position in the list, so it is wise to randomise the response order.