Marking the Questions

by Haydn Thompson

At the 2007 AGM, I described a system that I had developed for awarding "objective" marks to questions. I proposed this as an alternative to the system we used in the 2006–7 season, when marks were awarded for question balance as well as the traditional "entertainment value". That earlier system, while a good idea in theory, proved ineffective in practice.

The 2007 AGM agreed to use my system for a trial period; it's been used ever since, and is now well into its second decade. The page you are now reading is an edited version of a document that I prepared at the start of the 2007–8 season, to explain how the system works. As I said at the time, we might call these FAQs – although "questions we thought people might ask" might be more accurate!

What is the new system?

The system awards three separate marks:

•	Difficulty
•	Balance
•	Entertainment value

The first two are calculated, as explained below; the third is awarded, as previously, by each team giving the questions marks out of ten on the night.

The three marks are simply added up, and the trophy goes to the team whose questions get the highest total.

How are the marks awarded?

Entertainment value

As stated above, this mark is awarded by each team giving the questions marks out of ten on the night. This is a totally subjective mark, but teams are reminded that the original purpose of the trophy was to reward the most entertaining questions. The Difficulty and Balance marks are as objective as we can make them, but the Entertainment Value mark should reflect how much (or how little) the teams enjoyed answering the questions.

Difficulty

This is the most controversial of the measures, and it's true that the easiest questions aren't necessarily the best. I do believe, however, that every set of questions (160 per week, plus supplementaries) will contain a range of difficulty, and I'd always prefer the sets that include a few "gimmes" to those that contain a few real stinkers.

To put it another way, I'd rather be asked which team Wayne Rooney plays for (to cite the example that one Manchester United supporter gave during discussion at the AGM) than how far away a tiger's roar can be heard at night, or how many feathers a swan has (both of which have been asked in the past).

People do like easy questions; any analysis of the subjective marks awarded, under either the old (original) system or this later version, will show a strong tendency for easier questions to get better marks.

The way we mark difficulty is simply to award ten points to the easiest questions of the season and zero to the most difficult. Other sets are marked on a pro rata basis; so, for example, those of average difficulty will get 5 out of 10. Calling this a mark for "Difficulty" is in fact a bit of a misnomer; it's actually a mark for Easiness. Difficulty just sounds better, I think.

How does the system decide how difficult each set of questions was?

By comparing each team's score on the night (i.e. the total number of points they scored for this set of questions) with their average over the season. This gives us the "Average score (%)" column in the spreadsheet.
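For readers who find code easier to follow than prose, this calculation can be sketched in Python. It is only an illustration, not the spreadsheet itself: the function name, the team names and the scores are all made up, and it assumes that each team's season average is already known.

```python
def average_score_pct(night_scores, season_averages):
    """Mean, across teams, of the night's score as a percentage of the
    team's season average.

    Both arguments are dicts keyed by team name (a hypothetical layout).
    """
    percentages = [
        100 * night_scores[team] / season_averages[team]
        for team in night_scores
    ]
    return sum(percentages) / len(percentages)


# Made-up figures: one team did better than usual (110%), the other worse (90%).
night = {"Team A": 44, "Team B": 36}
season = {"Team A": 40, "Team B": 40}
print(average_score_pct(night, season))  # 100.0
```

A result above 100 suggests an easier-than-usual set; a result below 100 suggests a harder one.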

Last season, for example, the easiest set of questions (according to the trial system) was the Ox–fford's General Knowledge questions, where teams scored on average 126% of their average over the season. The most difficult was the Plough Taverners' set (also General Knowledge), where teams scored on average 81% of their average. So the Ox–fford scored ten for difficulty, and the Plough Taverners scored zero.

The Sutton Church House's questions were of exactly average difficulty – on average, teams scored exactly 100% of their average. The Church House's questions scored exactly 5 points for difficulty.
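The exact formula used in the spreadsheet isn't given on this page, so the following Python sketch is an assumption rather than a description of it. It scales in a straight line on either side of 100%, which is one reading that reproduces the three figures quoted above (126% scores 10, 100% scores 5, 81% scores 0); the real calculation may well differ, for instance by being based on rank. The function and parameter names are invented for the illustration.

```python
def difficulty_mark(pct, hardest_pct, easiest_pct, average_pct=100.0):
    """Map an "Average score (%)" figure to a mark out of ten.

    ASSUMPTION: straight-line scaling on each side of average_pct.
    Requires hardest_pct < average_pct < easiest_pct.
    """
    if pct >= average_pct:
        # Upper half: average_pct maps to 5, easiest_pct maps to 10.
        return 5 + 5 * (pct - average_pct) / (easiest_pct - average_pct)
    # Lower half: hardest_pct maps to 0, average_pct maps to 5.
    return 5 * (pct - hardest_pct) / (average_pct - hardest_pct)


# The three figures quoted for the 2006-7 season.
for pct in (126, 100, 81):
    print(pct, difficulty_mark(pct, hardest_pct=81, easiest_pct=126))
```

Under this reading, the season's easiest and hardest sets fix the two ends of the scale, and every other set falls somewhere between them.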

Balance

This is the most complicated of the three measures.

As with Difficulty, we start by comparing each team's score on the night with its average over the season. We then take the resultant percentages for each team that went first, and average them out; and we do the same for the teams that went second. This gives us two separate measures of difficulty: one for the teams that went first, and one for the teams that went second.

If these two figures are exactly the same, we have (in theory at least!) a perfectly balanced set of questions. The greater the difference, the more unbalanced were the questions.

In order to rank the sets of questions, we divide the higher figure by the lower. This gives us the "Relative %" column in the spreadsheet.

The most unbalanced questions in the 2006–7 season, according to the spreadsheet, were the Harrington 'B''s General Knowledge and the Castle's Specialists. For both of these sets, the "Relative %" figure was 1.38. In the case of the Harrington 'B', the teams going first scored on average 108% of their average over the season, while those going second scored only 78% of their average. In the Castle's case, the figures were 75% and 104% respectively. (On the Dragons' General Knowledge questions, they were 97% and 98%; on the Albion's, both figures were 93%.)

Each of these two most unbalanced sets of questions (the Harrington 'B''s and the Castle's) scored zero for Balance. Any set where there was no difference between the two average scores (for the teams that went first and the ones that went second) would score 10.
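As a rough sketch in Python, the "Relative %" figure is the larger of the two averages divided by the smaller. The second function, which turns that figure into a mark out of ten, is an assumption: the page only says that a perfectly balanced set would score 10 and the most unbalanced set scores zero, so the straight line in between is a guess. All names here are invented for the illustration.

```python
def relative_pct(first_pcts, second_pcts):
    """Higher of the two group averages divided by the lower.

    first_pcts / second_pcts: percentages (score on the night against
    season average) for the teams that went first / second.
    """
    first_avg = sum(first_pcts) / len(first_pcts)
    second_avg = sum(second_pcts) / len(second_pcts)
    return max(first_avg, second_avg) / min(first_avg, second_avg)


def balance_mark(relative, worst_relative):
    """ASSUMPTION: linear scale from 1.0 (mark 10) to worst_relative (mark 0)."""
    return 10 * ((worst_relative - relative) / (worst_relative - 1.0))


# Harrington 'B' General Knowledge: 108% going first, 78% going second.
print(round(relative_pct([108], [78]), 2))  # 1.38
# The Albion's set: 93% for both groups, so perfectly balanced.
print(relative_pct([93], [93]))  # 1.0
print(balance_mark(1.0, worst_relative=1.38))  # 10.0
```

Dividing the higher figure by the lower means the result is never less than 1, whichever group found the questions easier.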

Overall scores

Having worked out the three marks, we simply add them together to give a total out of 30.

The highest score in the 2006–7 season was for the Ox–fford's Specialist questions: 8.92 for Difficulty, 9.13 for Balance, and 7.89 for Entertainment Value, giving a total of 25.95 (all figures are rounded). This was also the most popular set according to the system that was in force during that season, and so won the Cars & Vans 4U Trophy.
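Anyone adding up the three rounded marks will get 25.94 rather than 25.95. That is what we'd expect if the total is calculated from the unrounded marks and only then rounded. The short Python sketch below illustrates the effect; the unrounded values in it are invented purely for the demonstration and are not the real figures.

```python
# The three marks as published (already rounded to two decimal places).
published = [8.92, 9.13, 7.89]
print(round(sum(published), 2))  # 25.94

# HYPOTHETICAL unrounded marks, each of which rounds to a published figure.
unrounded = [8.924, 9.134, 7.894]
print([round(mark, 2) for mark in unrounded])  # [8.92, 9.13, 7.89]
print(round(sum(unrounded), 2))  # 25.95
```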

So what's the point of all the changes and the arcane calculations?

I was pleased that my system came up with the same winner as the traditional one. I think this shows that previously, we tended to take balance and difficulty into account when giving our marks out of ten. But in the past, I think the marks lacked focus; if your questions scored a low mark, you could put it down to prejudice. I hope that under the new system, question setters will look at their marks for difficulty and balance, as well as entertainment value, and that this will help us all to set better questions next time.

How does it work in practice?

Mark sends me the match scores each week, separated into Specialist and General Knowledge, identifying which team went first and which team went second in each case. I feed this information into an Excel spreadsheet, which calculates the scores for Difficulty and Balance. I also enter the average scores for Entertainment Value (which Mark also sends me) into the same spreadsheet, which adds the three scores together to give the total. A summary is published on the website each week, in the News & Views and on a page dedicated to the Cars & Vans 4U trophy; and the full spreadsheet is also published on the website each week.

And just one more thing ...

At the start of each season, we obviously have nothing from that season to compare with. We therefore bring in the previous season's figures for comparison in the first half of each season; so each set of questions is marked according to its position relative to all the sets from the whole of the previous season plus those from the current season so far. At the halfway mark (from Week 9 onwards) we switch to comparing against the current season's marks only.
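The switch can be sketched in Python as follows. The function name, the list layout and the sample figures are all invented for the illustration; only the rule itself (previous season plus current season so far before Week 9, current season only from Week 9) comes from the description above.

```python
def comparison_pool(week, previous_season, current_season_so_far, switch_week=9):
    """Return the figures that a new set of questions is compared against.

    previous_season / current_season_so_far: lists of per-set figures,
    for example "Average score (%)" values (hypothetical layout).
    """
    if week < switch_week:
        # First half of the season: last season plus this season so far.
        return previous_season + current_season_so_far
    # From the halfway mark onwards: this season's figures only.
    return list(current_season_so_far)


last_season = [81, 100, 126]   # made-up figures
this_season = [95, 104]        # made-up figures
print(comparison_pool(3, last_season, this_season))  # [81, 100, 126, 95, 104]
print(comparison_pool(9, last_season, this_season))  # [95, 104]
```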
