
PREDICTING THE FUTURE: PRIMARY RESEARCH EXPLORING THE SCIENCE OF PREDICTION

Jon Puleston • Hubertus Hofkirchner • Alexander Wheatley

ESOMAR Congress, 2014


INTRODUCTION

This paper is an exploration of the science of prediction. It examines what we are good and bad at predicting and why, what influences our ability to make reliable predictions, and how the data differs when we ask questions predictively. It provides advice on the best ways to ask predictive questions and the best techniques for conducting predictive research. It is based on the learnings from over 30 research-on-research experiments conducted over a three-year period, a meta-analysis of over 500 predictions and a review of the prominent academic literature published on this topic. We hope this will act as a guide for the market research industry wishing to exploit prediction protocols in its research.

BACKGROUND: THE ROUTE FROM GAMIFICATION INTO THE WORLD OF PREDICTION SCIENCE

In 2010 we began to explore the role of gamification in market research and started to experiment with guessing games in surveys, where respondents had to predict things and bet virtual money on different choices. We quickly discovered that respondents loved to play games like these, especially if there was an outcome where they could find out if they were right or wrong (see figure 1).

FIGURE 1. [Chart: "fun" ratings for a normal survey vs. a prediction game]

We observed that participants would invest much more effort into answering these types of questions because of an innate desire to win. In turn we were often able to demonstrate remarkable improvements in data quality from these predictive, game-style questioning techniques. With advertising evaluation tasks, for example, we found that a switch from monadic rating to a prediction investment game delivered data that correlated at a level of 0.89 with traditional monadic rating, but with five times the level of differentiation (figure 2).

FIGURE 2. [Chart: evaluation of 20 different ads, monadic rating vs. performance prediction (net investment)]

We found that using this approach the same answers could be reliably derived using less than half the sample (see figure 3).

FIGURE 3.

So we began to experiment with switching all sorts of survey questions to more predictive approaches. For example: instead of asking people what they thought about a brand, we asked them to predict what other people thought; instead of asking them to quantify what they had done, we asked them to predict their behaviour in the future; instead of asking people what they would be prepared to pay for products, we asked them to predict the selling price; and instead of asking whether respondents were aware of brands, we showed them a facet of the brand's logo, strapline or a de-branded ad and asked them to predict the brand.

We started to amass quite a large volume of data on prediction protocols in surveys that we could compare directly with more traditional questioning techniques. This process began to throw up a number of questions about how the data compared with traditional research questioning techniques. So began our journey into the world of prediction science more generally.
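Before moving on, a note on how a "net investment" measure of the kind shown in figure 2 can be compared with monadic ratings. The sketch below uses invented data and an assumed definition of net investment (money staked for an ad minus money staked against it, over total staked); it is an illustration of the comparison, not the actual metric or data behind our experiments.

```python
# Minimal sketch (invented data): comparing monadic ratings with a
# "net investment" score derived from a prediction betting game.
import numpy as np

rng = np.random.default_rng(0)
n_ads = 20

# Hypothetical mean monadic ratings (1-10 scale) for 20 ads: tightly bunched.
ratings = rng.normal(6.5, 0.4, n_ads)

# Hypothetical betting data: money staked "for" and "against" each ad.
invested_for = rng.gamma(2.0, 50, n_ads) * ratings   # loosely tracks appeal
invested_against = rng.gamma(2.0, 50, n_ads)

# Assumed definition: balance of money for vs. against, over total staked.
net_investment = (invested_for - invested_against) / (invested_for + invested_against)

# Correlation between the two measures, and their relative differentiation
# (coefficient of variation) across the 20 ads.
r = np.corrcoef(ratings, net_investment)[0, 1]
cv = lambda x: np.std(x) / abs(np.mean(x))
print(f"correlation between measures: {r:.2f}")
print(f"differentiation ratio (investment vs. rating): {cv(net_investment) / cv(ratings):.1f}x")
```

The point of the comparison is that betting measures spread the ads out much more than ratings do, even when the two broadly agree on rank order.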
To investigate this whole topic further we decided to embark on a series of larger-scale dedicated prediction experiments in which we got groups of research participants to make a diverse range of predictions, including predicting football scores, the weather, the price of consumer goods, who would win TV game shows, product sales figures, and political and economic forecasts. Over the course of 12 weekly experiments we amassed over 500 predictions. Our aim was to understand what we are good and bad at predicting and what influences our ability to make good predictions.

This dalliance with the topic of prediction led inevitably into exploring the amazing world of prediction market trading, which sits at the pinnacle of prediction methodologies. We started thinking about how prediction market trading techniques could be integrated into more mainstream research. This led to a collaboration with Prediki, a prediction markets software company, and a series of over 50 micro experiments comparing the predictions made in traditional market research with prediction market trading.

This paper is a summary of what we have learnt along this journey. Whilst we have, by our own standards, done a considerable amount of research into this subject, this paper must be read as very much a scouting exploration of the topic. We have looked horizontally across the field of prediction rather than digging too deeply into any one issue, and as a result we have amassed as many new questions as answers. Furthermore, it only took a short while delving into the science of prediction in preparing this paper to realise that we were a bit of a rowing boat approaching an oil tanker of research conducted in this area by other very eminent researchers in both the commercial and academic arenas. Any paper on this topic must pay tribute to the brilliant exploratory work already done by Brainjuicer on prediction markets; to the astonishing work of Philip Tetlock, who has been studying this topic for a couple of decades in the academic arena and has amassed data on 15,000+ geo-political and economic predictions; and to the work of Nate Silver, who has single-handedly made prediction science a cool topic of conversation at dinner parties. For those already familiar with this topic, we hope there is something fresh to learn from this paper; for those not familiar with the world of predictions, hang onto your hats – I predict you are going to find some of this paper interesting reading.

UNDERSTANDING THE DATA

So where do we start? By taking a step back and looking at what we have learnt from the prediction experiments we have run. This much we know.

The maths of prediction
The quality of a prediction depends on a number of variables: the quality of information available to make the prediction, the effort invested in analysing that information, how difficult the information is to interpret, the objectivity of the person analysing it, and ultimately the inherent randomness of the event. The formula looks like this:

Prediction Quality = Information × Effort × Objectivity × (1 − Difficulty) × R

where R represents the inherent randomness of the event being predicted. You might note as a market researcher that this equation is not directly dependent on sample size; one well-informed, objective person with access to the right information and the skills to interpret it is potentially able to make a perfect prediction (e.g. Nate Silver's feat of correctly predicting the result in all 50 US states in the 2012 presidential election.1)
What are we good at predicting?
To understand this better, we analysed the predictive power of our respondents across topics ranging from the price of consumer products, the behaviour and opinions of other people, and forecasts of things like the weather and future purchasing, alongside a range of predictions that could only really be solved by guesswork, such as which products would achieve the highest value at an upcoming auction. In total we were able to aggregate around 400 different predictions under seven classifications (see figure 4).

FIGURE 4.

What is clear is that we are better at predicting some things than others. In particular, and perhaps not surprisingly given that we are human beings who have evolved to interact socially with each other, the thing we are best at is predicting the behaviour and opinions of other people, though we could not be described as perfect at it! What we are poorer at as individuals is making more complex estimates that require more analytical thinking, like estimating consumer purchasing behaviour and the price of things.

Predictions feed off knowledge and information
The problem with most predictions is the scarcity, and often scattered nature, of the information needed to make them; in many cases no individual making a prediction in isolation has enough information to make it accurately, and some people have more knowledge than others. Our ability as individuals to predict the price of household products using prompted option selection, for example, is on average only 8% better than random (in a series of tests we showed people four different price estimates for a range of household goods and asked them to pick the right one).

On the other hand, groups of people can potentially make better predictions than single individuals. With the right aggregation mechanisms, predictions can be a lot more accurate, thanks to a phenomenon popularly described as the wisdom of the crowd. This was first documented by Francis Galton in the early 1900s and made famous in James Surowiecki's book The Wisdom of Crowds. Galton asked a crowd of people at a county fair to guess the weight of an ox and discovered that their median prediction was within 1% of the ox's actual weight.

Academics have done a lot of work on the wisdom-of-the-crowd process (e.g. Surowiecki 2005; Simmons et al. 2011), which has led to an understanding that each person's prediction is made up of two components: information and error. If each individual's judgement is independent and unbiased, the errors will largely cancel each other out and the aggregation process then distils off the inherent knowledge.

Philip Tetlock has been able to statistically quantify the wisdom of crowds of experts by analysing the 15,000+ medium- and long-term political and economic forecasts he has gathered. He estimates that a Bayesian-style aggregation of the opinions of a group of predictors is on average at least 15% better than any single individual's prediction. His book Expert Political Judgment is a must-read for anyone wishing to get up to speed on the science of prediction.
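To make the information-plus-error idea concrete, here is a minimal simulation sketch (with made-up numbers, assuming independent and unbiased individual errors) showing how aggregation cancels the error that afflicts any single guesser:

```python
# Minimal sketch: each guess = truth + independent error, so aggregating
# a crowd cancels much of the error of any individual guesser.
import numpy as np

rng = np.random.default_rng(1)
truth = 1198                                      # e.g. the ox's weight in lb
guesses = truth + rng.normal(0, 150, size=800)    # 800 noisy individual guesses

individual_error = np.mean(np.abs(guesses - truth) / truth)
crowd_error = abs(np.median(guesses) - truth) / truth

print(f"average individual error: {individual_error:.1%}")  # roughly 10%
print(f"crowd (median) error:     {crowd_error:.1%}")       # typically well under 1%
```

If individual errors share a common bias (everyone guessing low, say), no amount of aggregation removes it – a point we return to below.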
In an attempt to bring the original wisdom-of-the-crowd experiment into the 21st century, we asked our respondents to predict the UK selling price of the new Apple iPad mini a month prior to its launch. We informed them of the price of the existing model and asked them to predict how much the new one would retail at, typing the value into an open-ended box (to avoid anchoring effects). Only 10% of respondents were able to get within 5% of the actual selling price, but the collective wisdom of the crowd produced a prediction that was within 1% (see figure 5).

FIGURE 5. Median guess = £316; actual selling price = £319; error = 1%

Please note, crowds can also be stupid
Out of pure curiosity, we also challenged our respondents to predict the weight of an ox (as per the original wisdom-of-the-crowd experiment) using a similar technique. But today we have no real collective knowledge of a cow's weight – many of our panellists may never have seen a cow close up – and their collective prediction was woefully inaccurate, out by 36% (see figure 6).

FIGURE 6. 1906 Plymouth County fair: median guess = 1,207 lb, actual weight = 1,198 lb, error < 1%. 2014 online survey: median guess = 350 kg, actual weight = 550 kg, error = 36%.

And this is an important point: a crowd is only wise if it has some collective basis of knowledge and adequate information from which to make its prediction. Galton's original experiment was conducted amongst farmers, who all brought along some experience and knowledge of the weight of cows; without collective knowledge the crowd can be plain stupid. This is probably the most important consideration when asking groups of people to make predictions in market research: what knowledge can they bring to bear collectively on the topic, and what information can we provide to help them make an accurate prediction?

Crowd predictions can be a bit behind the times
When we ask people to predict the price of existing products in the market, their price predictions are often based upon years of historical interactions with these products, and so their memories of prices tend to be a few years out of date. We also have built-in experiences of discounts in some areas, which can corrupt our memory of the actual retail price of things. We have asked participants in our experiments to predict the price of a whole range of household products and, consolidating data from over 30 of these price tests, we estimate the average price prediction is 9% lower than the average selling price. Figure 7 shows a range of these price predictions to give an idea of how they can vary.

FIGURE 7.

We have also observed significant differences between men and women in their price predictions. Men's price predictions are on average 6% lower than women's for household goods, perhaps evidence that they do less of the shopping. In the luxury arena the differences can be even stronger (see figure 8). For example, women's expected price for a diamond ring is 40% higher than men's and actually exceeds the real selling price, so there are some other emotional factors involved, not just experience. We are not alone in observing these differences: recent work by BDRC Continental2) has shown that women consistently attach a higher value to branded products across a range of sectors than men.
FIGURE 8. [Charts: price prediction accuracy of household goods (females 94%, males 87%) and of luxury goods (leather sofa, diamond ring, Chanel No 5 perfume), split by gender]

There are also some minor age-group differences: younger people, perhaps because of lack of experience, and older people, because of longer histories of purchasing products for less money, both predict prices slightly lower, but these differences are relatively minor by comparison.

Tip: median, mode or mean for calculating the wisdom of crowds?
In Francis Galton's original wisdom-of-the-crowd experiment, the figure was estimated using the median prediction, to avoid corruption by irrational outliers – say, someone guessing a cow weighs less than a feather. Today it is easy to deal with outliers in Excel, and we found that taking the mean after first removing the top and bottom 10% of guesses produces a slightly more reliable prediction than the median. A sketch of this calculation follows below.
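As a minimal illustration of the trimmed-mean aggregation rule (the guesses here are hypothetical), SciPy's trim_mean implements the same calculation:

```python
# Minimal sketch: a 10% trimmed mean as a crowd-aggregation rule,
# compared with the median. The guesses are hypothetical.
import numpy as np
from scipy import stats

guesses = np.array([310, 315, 299, 350, 9999, 320, 1, 305, 330, 316], dtype=float)

median = np.median(guesses)
trimmed = stats.trim_mean(guesses, 0.10)   # drop top and bottom 10%, then average

print(f"median:           {median:.0f}")
print(f"10% trimmed mean: {trimmed:.0f}")  # the outliers (1 and 9999) are excluded
```

With heavy-tailed guess distributions, the trimmed mean keeps more of the crowd's information than the median while still discarding the wild outliers.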
We tend to assume that the crowd is not quite as smart as we are when we make predictions
When we asked people to predict the future of a range of mobile phone brands back in 2010 – just prior to the rapid decline of the Blackberry brand, the rise of Samsung and the already apparent decline of Motorola – many respondents were personally able to predict the future successes and failures of these three brands. Yet when we asked them to predict what other people would think, they assumed the crowd was less informed than themselves, predicting that others would see Motorola's and Samsung's futures as flat and Blackberry as more likely to grow (see figure 9).

FIGURE 9. [Charts: personal predictions vs. predicted crowd opinion (down/flat/up) for Blackberry, Motorola and Samsung]

This is an important consideration, as many of the most successful survey gamification approaches we have explored utilise projective prediction protocols. However, we have found a solution: changing the question away from asking people to predict what the crowd thinks to predicting what a group of "experts" thinks.

Without collective knowledge or information, predictions can easily be steered off course by tiny nudges and anchoring effects
To make predictions we make use of just about any information that is available, often subconsciously. When there is no adequate information, our predictions can be influenced by very small prompts and nudges, often subconsciously processed, which anchor people's opinions. This is well catalogued in behavioural science as a known cognitive bias: the common human tendency to rely too heavily on the first piece of information offered (the "anchor") when making decisions.

A good example can be observed by simply asking any room full of people to predict whether a coin toss will come up heads or tails. On average around 70% of people will choose heads, presumably through simple order-anchoring bias, because heads is mentioned first. Basing your prediction on the group's collective prediction in this way would therefore contain a massive systematic error: a 70:30 bias towards an outcome that we all know is actually 50:50.

To explore just how sensitive predictions are to anchoring effects and nudges, we conducted a series of experiments in which we changed the wording of a question to emphasise one choice or another. We found that simply switching the question wording to emphasise one choice or the other could shift predictions by up to 20%, a far more significant shift than we had anticipated (see figure 10).

FIGURE 10. [Charts: asking "What percentage of people prefer white wine?" vs. "...red wine?" produced splits of 54:46 vs. 46:54; asking "...prefer cats?" vs. "...prefer dogs?" produced 49:51 vs. 42:58]

How predictions are influenced by other people's opinions
We then began to look at how other people's opinions affect predictions. What impact does one person's opinion have on another? If, for example, a person is asked to predict which of two ads, A or B, will be the most popular, and they are told that other people thought ad A was better, how many people's opinions would be changed by this nudge? (We define a "nudge" as a push towards one opinion or another as a result of some information or external influence.)

So we conducted a very simple experiment. We asked a control group to make a series of predictions independently, and then in a test group we "nudged" respondents in various ways with the opinions of other people: either telling them what one other respondent had predicted, or showing them a comment specifically supporting one choice or another. In total we tested opinion nudges across 20 different topics and found that on average 9% of opinions could be shifted by a single other person's opinion or comment. But the effect varied significantly, linked to the amount of information and received knowledge available to make the judgement in the first place, and to the level of certainty. Figure 11 provides some examples.

FIGURE 11.

Type of prediction | Prediction question | Votes | Certainty | Nudge effect
Complex calculation | More have visited America than watch Eastenders? | 30/70 | Low | 20%
Self-evident | People prefer ad A or B | 58/42 | Low | 9%
Self-evident | People prefer ad C or D | 70/30 | High | 2%
Observed opinion | Think more people prefer celebrity Clarkson or Cowell | 53/47 | Low | 18%
Observed opinion | Think more people prefer celebrity Gaga or Bolt | 52/48 | High | 2%
Expert | Think we will have cured cancer by the year 2050 | 56/44 | Low | 9%
Expert | Think we won't have visited Mars by 2050 | 70/30 | Low | 6%
Observed behaviour | Predict if more people drive car A or B | 54/46 | Low | 27%
Observed behaviour | Predict if more people bank with A or B | 65/35 | High | 0%

Very roughly averaged out, there appear to be some observable trends that need a lot more research to properly quantify. However, here are some insights that may be worth exploring further. If the problem is complex and difficult to work out, nudges have a greater influence. Predictions based on extrapolating our own feelings are a lot less influenced by nudges than predictions of inverted personal preferences, e.g. asking respondents to guess which of Simon Cowell and Jeremy Clarkson (two UK celebrities renowned for being disliked by some people) other people least dislike. When the subject is less emotional and more factual we rely on nudges more than when we make self-evident judgements like speculating about which ads people prefer; an example would be asking whether more people drive a Ford Focus or a VW Golf. (See figure 12.)

FIGURE 12.

A way to think about this is that information is like gravity: a tiny nudge in a gravity-less environment will send an object spinning into outer space.
The more information and knowledge we have, the more force is needed to knock us out of orbit.

Tip: You could think about a nudge as a way of testing how reliable a prediction is. By measuring how much opinions change based on a simple nudge, you have a means of assessing how stable your prediction is.

This experiment explored only the influence of one person's opinion on another; you can imagine the impact that a crowd of people's opinions might have, and we touch on this later in the paper.

Our personal perspective on things can dominate and badly distort our predictions
We tend to think that the majority of people agree with us, and we base a lot of our predictions on this assumption. To illustrate this point, in another experiment we asked people to make a range of predictions about what life might be like in the year 2050, presenting them with some prophecies with which they had to agree or disagree. We then asked these two separate groups of people to predict how many others would agree with their personal judgements. To put a figure on it, the average person assumes that 60% of other people agree with their own point of view (see figure 13).

FIGURE 13. [Charts: believers' estimates of how many people share their belief vs. non-believers' estimates of how many people share their disbelief]

Interestingly, those who held the negative minority viewpoints in this experiment tended to assume that a higher proportion of people agreed with them (75% for non-believers vs. 55% for believers) – something Philip Tetlock might have a view on, as he has identified that people with firmer opinions are generally less able to predict things. It is also interesting to see how stable this perception is across topics.

Political polling is another good example of this. We asked a group of 500 of our respondents to predict who will form the next UK government, and their predictions were so heavily biased towards their own political affiliations that you might as well just have asked which way they will vote (see figure 14).

FIGURE 14.

So one of the challenges when designing prediction protocols is working out how to encourage respondents to be more objective. One of the best weapons to achieve this, as we will see later in the paper, is giving people the chance to win real money for making objective predictions.

Crowds can also be cognitively challenged
The crowd's wisdom can also be badly disrupted by cognitive biases; our emotions in particular can get badly in the way of making objective predictions. This can be beautifully exemplified by comparing the football score predictions of "fans" of teams with those of non-fans. When, for example, we asked English people to predict the results of England vs. Montenegro and Germany vs. Republic of Ireland, they could not help but envisage a bigger victory for England than for Germany, yet on the relative rankings Germany was statistically more likely to win by the higher scoreline. (We are sure this experiment could be replicated in every football-playing nation in the world, such is the over-optimism about a home nation's footballing performance.) Our UK respondents predicted:

 England to win by a 3-goal margin over Montenegro
 Germany to win by a 1-goal margin over Republic of Ireland

Looking at domestic football matches, we observed the same emotional bias, as figure 15 illustrates.
No supporter of a team, it seems, can ever envisage losing by more than two goals, no matter how good the opposition, even if the teams are 20 places apart in the league; and vice versa, very few fans of a dominant team can envisage losing to a lower-placed team. Often these cognitive biases shake out in the wash, but if a group of people are all subject to the same cognitive bias, major systematic errors can corrupt the group prediction.

FIGURE 15. RESULT PREDICTIONS BY DIFFERENT FAN GROUPS [Charts: score predictions by rival fan groups for Newcastle vs. Liverpool, Chelsea vs. Cardiff, Man U vs. Southampton and Arsenal vs. Norwich]

The area where we are most emotionally biased is in assessing ourselves
A group of students attending a meeting were asked to predict whether they personally would tidy up afterwards: fifty percent said yes. Another group was asked to predict how many people in total would clean up, and their average prediction was 15%. When researchers observed how many students actually tidied up, it was close to 15%. What we have here is a strong cognitive bias towards holding a nice opinion of ourselves: we all like to think we are the type of person who would tidy up after a meeting. We see evidence of this type of bias scattered across market research surveys – will you vote, will you recycle your household goods, how much alcohol do you drink, and so on. There is also simple acquiescence bias, which corrupts most "would you buy this product?" predictions.

We can get so tangled up in our emotions that it becomes difficult to judge our own behaviour objectively. Parents and friends are apparently far more able to predict the long-term future of a couple's relationship3) than the couple themselves. We also struggle to see the wood for the trees. In one of our own experiments we asked a group of men why they liked football, and got back rational answers: it's exciting and dramatic, they like the skill and the thrill of winning. When we asked a group of women why men like football, one woman observed that "it gives them something to talk about in the pub", which is probably closer to the truth.

It's not all about emotions though; it's also about being able to differentiate the signal from the noise
This is the basic tenet of Nate Silver's book The Signal and the Noise, another must-read for any market researcher. He cites several examples, including a famous behavioural economics experiment by Gerd Gigerenzer, who observed that people in Germany are better able to predict the relative size of two neighbouring Californian cities than people in California itself. Our opinions, motivations and behaviour are often obvious to other people standing back from the noisy details, yet we so often cannot see ourselves objectively because we are unable to differentiate the real driving factors from all the noise of subconscious feelings and thoughts. Likewise, the reason Californians cannot so easily tell the difference between two neighbouring cities is that they hear a lot of "noisy" mentions of and references to both of them all the time in their state; in Germany it is only the signal from the loudest city that people hear.
Figure 16 illustrates the results from one of our experiments where we asked people to rate how strongly different factors influenced their personal choice of shampoo, and then asked a separate group of respondents to predict how influential those factors were on buyers in general. You will notice, for example, that self-assessment underplays the value of advertising, packaging and branding, and places more emphasis on price and ingredients. It is difficult to know exactly how important these different factors really are, but it is clear that asking only for the personal perspective does not give a fully objective viewpoint. We all, for example, like to downplay the influence of advertising on ourselves.

FIGURE 16. REASONS FOR PURCHASING SHAMPOO

Our predictions can tell us what we instinctively think and feel
Used in a different way, predictions can tap into our unconscious feelings too and help us evaluate brands more objectively. Figure 17 illustrates an example from an experiment where we asked our respondents to rate a range of banks and then asked which bank they personally used; as you can see, the ratings correlated quite closely with usage. The rating was as much as anything a proxy measure of relative market size, which is not particularly insightful. We then asked them to predict which bank would be most likely to still be around in the year 2050, and this revealed some nice, more subconscious feelings about the strengths of each bank: a significantly different pattern of answers emerged.

FIGURE 17. [Charts: 7-point rating and "mainly bank with" share vs. predictions of which single bank will still be around in 2050, for Barclays, Co-op, First Direct, Halifax, HSBC, Lloyds, Metro, Nationwide, NatWest, RBS, Santander and TSB]

On a more practical level, the simple use of prediction protocols to assess the relative value of products is extremely useful, and in many respects far more valuable than standard monadic price tests (see figure 18).

FIGURE 18. PREDICT THE PRICE OF THESE SHAMPOOS

We can predict the woods, but it's harder to predict the trees!
Whilst predictive viewpoints might allow us to observe other people's behaviour more objectively and reveal what we really think about things (the woods), what we are less able to anticipate predictively is the detail and the subtleties of personal taste and thought (the trees). This can be beautifully illustrated by data from an experiment where we asked one group of people to go virtual shopping, and another to do the same task predictively: we asked them to try to predict the choices of a close friend or family member (see figure 19).

FIGURE 19. WHICH INSTANT COFFEE WOULD YOU BUY?

Throughout this process we added a range of varying offers to see which stimulated the biggest uplift in sales, and we compared the impact of the offers on those making personal choices with those making predictive judgements. Whilst the overall brand selections from the personal and predictive choices were pretty similar in 9 out of 10 cases, the impact of the promotions was significantly different. When making personal choices there were bigger uplifts for promotions on the more expensive and self-indulgent products, while those making projective choices responded more often to offers on bigger brand-name goods.
(See figure 20.)

FIGURE 20.

You can spend some time trying to understand why this is the case, but what is clear is that we are not able to predictively pick up on some of the emotional decisions people make, and it is very useful to see both answers in context with each other. This kind of differentiation is useful in many pieces of research. Differences like this are commonly seen whenever you compare the personal and predictive perspectives in detail. See in figure 21 the results from another experiment where we asked people to pick the words they associated with certain mobile phone brands, compared with a group of respondents who were asked to guess the words other people would have picked.

FIGURE 21.

Many of the words correlate closely, but the words we projectively select might be described as more "marketing speak" phrases – cool, stylish, user-friendly are all phrases in many respects imposed on us by marketers.

We are also a lot more negative in the personal dimension
When we asked people to pick words they associated with some advertising messages, and a separate group to predict which words would be selected, people were a lot more negative in the personal dimension, picking words like "boring" and "annoying" more often, compared with the predictive perspective, where people more often selected positive words like "curious" and "exciting". (A lesson, perhaps, for us all when assessing what other people will think about the things we produce!) See figure 22.

FIGURE 22. WORDS CHOSEN TO DESCRIBE ADS

HOW TO WORK THE CROWD

In the second part of this paper we explore the best techniques for extracting reliable predictions from a crowd. The three key issues when considering prediction protocols in your research are how to motivate, how to educate and how to weight the feedback from the crowd. It all starts with understanding how to ask a prediction question properly.

How to ask a good prediction question
The key to asking a good prediction question is objectively pitching the problem and depersonalising it.

Ask a crowd a stupid question and it will deliver back a stupid answer
The poor quality of the predictions from many of the questions asked using traditional market research techniques really boils down to the way we ask the question. Questions about who will win the election, whether the respondent will clean up after a meeting, whether you will buy this product, or heads or tails, all contain emotional, cognitive or systematic biases of one form or another. The answer to nearly every question we ask people in surveys is affected by one or other of these types of bias.

Turn questions into problems that you ask participants to solve
Many basic prediction biases can be addressed simply by re-pitching the question as a problem that you challenge people to solve. You could ask people "heads or tails?", or you could ask: "if a coin were tossed 100 times, how many times do you think it would come up heads?" This is now a clear problem that everyone can solve. The cognitive bias involved in asking people to predict whether they will tidy up is solved by pitching the question as a problem: "predict how many people in the meeting will tidy up". It sounds obvious, but if you read through a lot of surveys you will see how rarely this simple piece of thinking is applied.
The trick really is moving away from thinking of a respondent as a bit of information, and towards treating respondents as a networked group of brains.

Tip: Never ask a question without considering which problem you are trying to solve by asking it. What is the problem you are trying to solve when you ask people "How do you rate this brand?"?

Information is key
A simple shift of question from "Will you buy this product?" to "Predict whether people will buy this product" potentially deals with personal cognitive biases, but it does not deal with the lack of information needed to make such a prediction. To make an effective prediction it would be helpful to know a range of things: who the product is targeted at, what they currently purchase, how much the product costs compared with others, what makes it different or better, and so on. Without this information we rely totally on gut instinct, which may or may not be helpful.

Take this example from an experiment we conducted to predict which of four mugs would sell the most (figure 23).

FIGURE 23. CAN YOU GUESS WHICH OF THESE MUGS SELLS THE MOST?

The mug participants predicted would sell the most was the blue camping mug. But this prediction was wrong, badly wrong. Examining the manufacturer's sales figures revealed that the blue camping mug sold the least; it was the green gardening and red DIY mugs that were the best sellers. We wanted to find out why, and so conducted a second experiment, this time asking respondents what they would buy themselves; this revealed that the blue mug was their personal preference too. So why the error? We talked to the person who sells these mugs and found out that they are mostly purchased as gifts, e.g. for friends who are gardeners, or for male partners as a "thank you" for doing DIY jobs. Not many people thought about this when making their predictions. So we conducted a third experiment in which we added "please note that these mugs are often purchased as gifts" to the end of the question. You can see in figure 24 the impact this had: the predictions were still not perfect, but as a result of adding this information "nudge" they became considerably closer, as people began to speculate about which of these mugs would make better gifts. Without this information they were basing their judgements mostly on their own feelings; the information helped steer respondents towards a better, more objective prediction.

FIGURE 24.

What this example hopefully illustrates is that without access to the right information, predictions can be steered badly off course; information is probably the key ingredient in gathering accurate predictions. It also underlines that people need to think much harder than they normally would in a survey, which in turn underlines the importance of the incentives you offer participants to do so.

Extracting wisdom from the crowd and sharing it around!
In the absence of hard facts and evidence – say you are testing a new product and have no information about how popular it will be – thoughtful debate and argument can also help educate a crowd. This in many respects is the essence of a focus group: the cross-communication helps educate the group, enabling it to make better collective judgements. Many of the most efficient decision-making protocols rely on discussion and debate.
This, for example, is essentially how parliaments make decisions, and indeed how elections are won (you could see an election as a country's collective prediction about who will most effectively run the country in the future). Imagine a board of directors all silently making their decisions in isolation: no, they discuss and share their thoughts before deciding, and this is a valuable part of the decision-making process. Yet so much of what we do in quantitative research is devoid of any information sharing amongst the crowd, and this could be seen as a real shortcoming of traditional market research techniques when it comes to making effective predictions.

Encouraging crowds to self-generate the insights needed to solve prediction conundrums
What corrupts most predictions is the guesswork individual people have to do. Often, knowledge is massively unevenly distributed across a crowd: some people may be sitting in isolation with pockets of knowledge that aggregation can tease out, but think how much more effective a prediction process could be if this information were shared around. Information sharing has the potential to make crowds wiser. For this reason there is potentially a significant role for a moderator in effective prediction protocols, to help stimulate this debate – a point we will return to when we discuss prediction market trading later in this paper.

Dialectical bootstrapping
There has been significant academic work by Herzog and Hertwig (2009) on allowing crowds to improve their own predictions through a process of reasoned debate and self-learning, and they have demonstrated that these techniques can improve the quality of predictions. The process has been given a wonderful name, "dialectical bootstrapping", which basically means self-learning through dialogue.

Realising the importance of motivation and reward feedback
In most good predictions we are asking people to think, and we want them to think as hard as they can about a problem or issue; motivation is critical for this. One of the most important weapons we have to motivate people to make a correct prediction is to tell them afterwards whether they got it right or wrong, and the experiment illustrated in figure 25 is a beautiful demonstration of this. We designed a way of measuring brand awareness in the form of a multiple-choice prediction quiz, in which respondents play a game of guessing the brand from a visual clue. To test this out, we asked one group the questions without any feedback on how well they had done, and told a second group after each question whether they were right or wrong, challenging them to see how many they could get correct. To make it more competitive, we told them that the faster they answered, the more points they could accumulate.

FIGURE 25.

As you can see, the people answering the survey in the standard way, with no feedback, became bored with the task; the time they invested in each subsequent prediction steadily declined, and this was reflected in the quality of their predictions. Those given feedback and encouraged to make their decisions faster actually spent slightly more time and gave more accurate answers.

FIGURE 26.
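As an illustration of the speed-scoring idea, here is a minimal sketch of one possible points rule (the numbers and the rule itself are hypothetical, not the exact scheme we used): a fixed base score for a correct answer, plus a bonus that decays as response time grows.

```python
# Minimal sketch of a speeded-quiz scoring rule: correct answers earn a
# base score plus a time bonus that shrinks the longer the respondent takes.
def score(correct: bool, response_secs: float,
          base: int = 100, bonus: int = 100, window_secs: float = 10.0) -> int:
    """Return points for one quiz answer (hypothetical rule)."""
    if not correct:
        return 0
    time_left = max(0.0, window_secs - response_secs)
    return base + round(bonus * time_left / window_secs)

# A fast correct answer beats a slow correct one:
print(score(True, 2.0))   # 180 points
print(score(True, 9.0))   # 110 points
print(score(False, 1.0))  # 0 points
```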
Note: imposing time limits encourages "System 1" gut reactions
Some predictions, particularly in advertising evaluation or brand-recognition style research, benefit from respondents making quick, gut-reaction predictions using their System 1 thinking processes. The simplest ways to do this, as figure 26 illustrates, are to reward respondents with more points for reacting faster, or to challenge them to see how many evaluations they can complete in a fixed period of time, say 30 seconds. We have found both techniques to be very effective weapons.

Gamifying the whole process of prediction
We have applied this thinking to whole surveys, and the simple device of "find out in the next survey how you got on in this survey" has proved an immensely successful means of gathering ongoing predictions from a group of respondents over a series of separate surveys while keeping them motivated to give thoughtful predictions. This is the technique we used to gather many of the predictions in this very paper. We have named this process "surveytainment".4)

Enhancing the wisdom of the crowd by carefully weighting individual predictions
Nate Silver famously cottoned on to the fact that if you go one step further and not just aggregate the predictions of the crowd, but weight each individual prediction source by its historical reliability, you can make much more reliable predictions. When predicting election results, for example, he gathered together a set of other forecasters' predictions, looked at how reliable each forecaster had been in the past, and then weighted each prediction by that forecaster's historical accuracy. Hey presto: he was able to produce super-forecasts that distilled the wisdom of a crowd of forecasts and were far more accurate than any single poll – by about 20%, he estimated.

In the academic world of prediction markets this has been known for quite some time too, and several papers have been published on the topic; one or two are listed in the appendix. One of the most impressive, published only recently, is "The Wisdom of Smaller, Smarter Crowds" (Goldstein et al., 2014), which looked at thousands of fantasy football league predictions and demonstrated that crowd wisdom could be significantly improved by looking at historical prediction performance and aggregating only the predictions of the best historical predictors.5)

So in theory we could borrow this thinking to make predictions in traditional market research more accurate too. To do this, though, we need to identify and differentiate the good and bad predictors.

Identifying good predictors
We started exploring the possibilities by asking whether there is such a thing as a consistently good predictor in the types of research we were doing. As part of this research we conducted a weekly series of prediction experiments using the same group of respondents, so we were able to track their prediction performance over the course of the experiments. What we found was a clear stratification of performance across groups of respondents, with the top group consistently outperforming the bottom groups (figure 27).

FIGURE 27. PREDICTION PERFORMANCE OVER 7 WAVES
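With per-respondent accuracy tracked over waves like this, a Silver-style weighted aggregate can be formed. Here is a minimal sketch (the weighting scheme and data are illustrative assumptions, not Silver's actual model) in which each respondent's vote on a binary prediction is weighted by their historical hit rate:

```python
# Minimal sketch: weight each respondent's binary prediction (1 = "A will
# win", 0 = "B will win") by their historical accuracy, then compare the
# weighted vote with a simple unweighted vote.
def weighted_vote(predictions: list[int], hit_rates: list[float]) -> float:
    """Share of historical-accuracy 'weight' backing option A."""
    total = sum(hit_rates)
    backing_a = sum(w for p, w in zip(predictions, hit_rates) if p == 1)
    return backing_a / total

predictions = [1, 1, 0, 0, 0]                  # three respondents back B...
hit_rates   = [0.80, 0.75, 0.55, 0.50, 0.45]   # ...but they are the weakest trackers

print(f"unweighted share for A: {sum(predictions) / len(predictions):.0%}")  # 40%
print(f"weighted share for A:   {weighted_vote(predictions, hit_rates):.0%}")  # 51%
```

In a tracked panel, the hit rates would come from earlier waves of the kind shown in figure 27.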
What makes a good predictor?
We then started to think about whether we could identify good and bad predictors using some sort of personality test, with a view to cherry-picking the good ones or weighting individuals' predictions. Philip Tetlock has certainly been able to isolate a personality type that makes stronger long-term political and economic forecasts.6) He calls these people Foxes: the type who are a bit more open-minded and have fewer fixed opinions, compared with Hedgehogs, who are often more outspoken in their views and clouded by inflexible opinions. We wondered whether we could segment our panellists in the same way. (Before you get too excited, the answer is no: in the one or two experiments we have conducted so far we have not been able to do this on pure personality traits. That is not to say it is impossible, but we have not been able to pinpoint anything yet.)

What we have observed in our data is that men generally appear to be slightly better at making accurate predictions than women, by a margin of around 10% (figure 28). This is one of those statistics we would definitely not want cited out of context: the prediction experiments we have run so far cover a limited range of topics which may well favour the male point of view, and we have found prediction skills vary considerably by topic. As we have seen, when predicting the price of consumer goods, women were better.

FIGURE 28. INDEX OF AVERAGE PREDICTION SCORES BASED ON C.200 PREDICTIONS

What we have also divined is that when people make faster, more instinctive decisions, they seem to make slightly more accurate estimates for self-evident predictions like judging advertising; but for many of the more complex types of prediction, such as predicting the behaviour and opinions of other people, increased consideration time leads to much more accurate predictions.

Making correct predictions requires a funny cocktail of thinking processes. We often use heuristics as shortcuts when the thinking process is too complex to calculate analytically. This issue has been well explored in Daniel Kahneman's behavioural economics experiments, and it basically boils down to the respective roles of System 1 and System 2 thinking in making instinctive and considered judgements, and to the volume of information needed to make a calculation. There is a point, for example when working out the price of a bat and ball,7) where System 2 cognitive calculation can outfox our System 1 guesswork (a bat and ball cost £1.10 together and the bat costs £1 more than the ball; intuition shouts 10p for the ball, but a moment's calculation gives 5p). But there is another point where an actual calculation is so difficult that System 1 gut instinct becomes more useful, like estimating the number of pennies in a jar.

To explore this we conducted an experiment to measure the innate thinking style of individuals, to see whether they were better or worse at predicting what products people would buy. We asked our respondents to complete a standard self-efficacy personality test, which is a way of measuring styles of thinking, and looked at whether the high and low scorers in this test performed any better at making these predictions. This experiment did not, we are afraid, pull apart many measurable differences, perhaps because of the confined nature of the task. So more work is needed to understand this area better.
Prediction confidence
However, we have found a good proxy for identifying better predictors: simply asking people how confident they are in the predictions they have made! The most confident predictors were indeed a third more accurate at making predictions (figure 29).

FIGURE 29. PREDICTION ACCURACY

The only problem is that fewer than 10% of any sample will stick their necks out and say they are very confident, so this is not very practical in reality. What we discovered to be a better and more usable proxy for confidence was how much people were prepared to bet in virtual gaming exercises. As outlined at the start of the paper, we conducted an experiment where we gave people a budget and asked them to bet money on whether they thought an ad would be popular or not. As we had data on over 100 ads, we were able to do some basic correlation analysis on the relationship between how much respondents bet and the actual outcome. The results were interesting (see figure 30). Because we forced people to bet something, a small positive bet was the default choice of people who were not very confident in their choice, and as you can see, their bets correlated far less with the outcome than those of people who committed larger virtual sums. Interestingly, it was actually the small negative bets that correlated most with the overall outcome, and excessive bets were slightly less successful than more measured bets.

FIGURE 30. BET AMOUNT: CORRELATION WITH OUTCOME
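To illustrate the kind of analysis this involves, here is a minimal sketch with invented data: for each bet-size band, we correlate the direction of the bet with the actual outcome across a set of ads. The bands and accuracy figures are assumptions for illustration, not our experimental results.

```python
# Minimal sketch (invented data): do bigger bets track outcomes better?
# For each bet-size band, correlate the bet direction (+1 = ad will be
# popular, -1 = unpopular) with the actual outcome across 100 ads.
import numpy as np

rng = np.random.default_rng(7)
n_ads = 100
outcome = rng.choice([-1, 1], n_ads)      # +1 = the ad proved popular

def simulate_bets(accuracy: float) -> np.ndarray:
    """Bet direction agrees with the outcome with probability `accuracy`."""
    agree = rng.random(n_ads) < accuracy
    return np.where(agree, outcome, -outcome)

bands = {"small bets": 0.55, "medium bets": 0.70, "large bets": 0.80}
for band, accuracy in bands.items():
    bets = simulate_bets(accuracy)
    r = np.corrcoef(bets, outcome)[0, 1]
    print(f"{band}: correlation with outcome = {r:.2f}")
```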
The value of this insight becomes important later when we discuss prediction market trading.

What are the best methodologies for gathering predictions?
The next big question is what the best methodology for gathering predictions is, and this is where we move into the world of "prediction market" techniques. Without doubt, prediction markets sit at the pinnacle of prediction methodologies. Made famous by the Iowa Electronic Markets research,8) a prediction market is essentially a process of gambling on the outcome of events.

Understanding the different "prediction market" techniques in traditional research
Firstly, for those not totally familiar with these techniques, it may help to explain some of the different approaches. The term "prediction markets" tends to be bandied around rather loosely, and there are some specific distinctions between different prediction market techniques that are worth clarifying.

 Basic prediction markets in traditional market research: The prediction markets process pioneered in traditional research by the likes of Brainjuicer allows participants in a research study to buy or sell virtual "shares" in an option, idea or set of ideas. You then compare the net number of times participants bought or sold shares in each idea, and the volume of trading activity. This approach to decision making has been shown convincingly to deliver greater differentiation of opinion than standard monadic rating approaches.9)

 Gamified prediction markets: This can be enhanced by giving people a virtual trading budget, letting them decide how much virtual money to invest in each idea, and then gamifying the exercise by letting people know at the end of the trading process how much money they have won or lost. The competitive element, as we have shown earlier, is a very effective means of encouraging more active participation in the task.

 Prediction market trading: The key difference between standard prediction market question protocols in surveys and proper prediction market trading is that in a trading process there is normally real money to be won for making correct predictions, and the price of shares in each opinion choice increases or decreases with the preference for one opinion or another, as in a stock market: if people "invest" in an idea, its value goes up. The value of this is that it helps predictions self-correct. For example, if you saw that the odds on heads and tails were stacked 70:30 in favour of tails, you would be incentivised to bet on heads, and in this way the market would drive the odds naturally back to the 50:50 we all know is correct. (A sketch of one such price mechanism appears at the end of this section.)

 Open or dark markets: Some prediction market technology also lets participants see trends in the trading value of each choice, and gives free exposure to comments, debate and information that allows participants to make more informed decisions, which as we explained earlier is very important. Participants are encouraged to explain their decisions as evidence. If participants are unable to see what others have traded and said, this is referred to as a "dark market"; you could describe the prediction market approaches used in traditional market research as a dark-market technique, in contrast to "open" prediction market trading. There are some issues surrounding open-market trading to do with the overt influence individual opinions can have in steering the market – a phenomenon known as the herd effect. As we saw earlier in this paper, when participants are making tough predictions with little information to guide them, they tend to rely more heavily on other people's opinions, and this can lead to the opinions of a few early participants leading those of the follow-on participants. Think of it as cumulative nudging: one person's opinion influencing the next, causing a cascade of opinion.

It's all about the money!
The big difference between prediction markets in traditional research and prediction market trading is really all about the rewards. When I answer a survey question, it does not really matter to me one way or another what answer I give: I get a reward simply for doing the survey, and as we have shown, this can render certain types of prediction, like "predict who will form the next government", fairly useless. We have seen that making the process more rewarding by gamifying it can dramatically improve the effort put into answering questions in a traditional survey, and with it the quality of answers; prediction market trading goes a step further by offering real financial rewards, not simply for participating but for getting the predictions correct.
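To show what a self-correcting price mechanism can look like, here is a minimal sketch of Hanson's logarithmic market scoring rule (LMSR), a standard automated market maker for prediction markets. We are not claiming this is the mechanism Prediki uses; it simply illustrates how buying shares in an outcome pushes its price up.

```python
# Minimal sketch: Hanson's logarithmic market scoring rule (LMSR).
# Prices behave like probabilities and rise as shares of an outcome are bought.
import math

B = 100.0   # liquidity parameter: higher values mean slower-moving prices

def cost(q: list[float]) -> float:
    """LMSR cost function C(q) = B * ln(sum_i exp(q_i / B))."""
    return B * math.log(sum(math.exp(qi / B) for qi in q))

def prices(q: list[float]) -> list[float]:
    """Instantaneous price (implied probability) of each outcome."""
    z = sum(math.exp(qi / B) for qi in q)
    return [math.exp(qi / B) / z for qi in q]

q = [0.0, 0.0]                  # shares outstanding for ["heads", "tails"]
print(prices(q))                # [0.5, 0.5] before any trading

charge = cost([q[0] + 50, q[1]]) - cost(q)   # a trader buys 50 "heads" shares
q[0] += 50
print(f"trader pays {charge:.1f}")           # about 28.1
print(prices(q))                # heads price has risen to about 0.62
```

If the true probability is 50:50 but the market price drifts to 70:30, the mispriced outcome becomes cheap relative to its expected payout, and that is the incentive that pulls prices back towards 50:50, as described above.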
How do the answers compare between these techniques?

Comparing prediction market trading to traditional market research prediction protocols
We wanted to understand more about the relative benefits of using prediction market trading to make forecasts compared with traditional market research techniques. To do this we set up a unique series of experiments in collaboration with Prediki, a company that provides an open prediction market trading platform. We started out with some small-scale experiments where we asked respondents in a survey to make some predictions and then directed them into prediction markets where we asked them to make the same predictions for a prize.

We tested a range of topics, from predicting products' market share to reasons for purchase. From what we could see, the two approaches generally generated the same results (see figure 31).

FIGURE 31.

But there were some notable exceptions. In testing advertising, for example, we ran seven head-to-head tests: on five occasions both approaches generated the same predictions, but in two the prediction market trading threw up different answers. The one big difference between the two techniques was the relative time and effort put into making the predictions, and the number of people involved. In a survey it was on average around three seconds of thinking time, with 150+ people making the predictions. In the prediction markets the number of people trading was much smaller, around 30 on average, but these participants would often return several times to revise their trades and refine their predictions. The average respondent revised their predictions three times, based on the movements in the market and the dialogue, and responded very rapidly to any extra information we provided (figure 32).

FIGURE 32.

The opinions in prediction market trading were in most cases reached very quickly (like the trading chart in figure 33).

FIGURE 33.

Some markets were subject to quite significant fluctuations of opinion in the first couple of days of trading, and some quite heated debate (see figure 34); but once an opinion was reached it was very stable, unless we added more information, at which point opinions could move quite rapidly. This was probably down to the nature of the questions we were asking about brands; unlike, say, a politically based prediction market, there were few new external factors influencing the market from day to day.

FIGURE 34. EXAMPLES OF THE VOLUME OF DEBATE ASSOCIATED WITH THIS FLUCTUATING TRADE

To investigate further, we embarked on a larger-scale experiment in which we asked participants to make a series of predictions in a traditional online survey and in the market trading platform, as before; but this time we collaborated with a couple of retailers and asked respondents to predict which consumer products generated the most sales. This meant we could judge the comparison against real sales data. To make things more comparable, we also examined the statistical reliability of predictions from like-for-like samples. This is a summary of our findings.

First out of the box: advantage to prediction market trading
In straight head-to-head comparisons using matched sample sizes, we compared the accuracy of micro-samples of 15 random respondents making predictions in traditional surveys with the accumulated predictions of the first 15 traders in 32 separate prediction market experiments. The reason we chose such small samples is that many of the most successful prediction markets tend to operate at this scale: in the case of the famous Iowa Electronic Markets predictions, many came from groups of fewer than 20 predictors. The average prediction accuracy of 15 randomly selected respondents10) in the traditional research was 55%, against a 37% probability of getting these predictions correct by random chance. The average prediction accuracy of the first 15 traders in the prediction markets was 65%.
The inherent advantage of prediction market trading was coming into play: people with the most confident opinions were betting more money on their choices, and this helped to self-weight the answers. How much you bet is a measure of how confident you are, and, as we have seen, confidence does appear to correlate with prediction accuracy, as does the extra effort and thought participants put into the process.
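A minimal sketch of this self-weighting idea in Python: each prediction counts in proportion to the money bet behind it, so the most confident traders pull the aggregate hardest. The trade data and the explicit weighting rule are hypothetical illustrations; in a real market this weighting happens implicitly through the price.

```python
from collections import defaultdict

def invest_weighted_prediction(trades):
    """Aggregate (choice, amount_bet) trades so that each trader's
    vote counts in proportion to the money they put behind it."""
    totals = defaultdict(float)
    for choice, amount in trades:
        totals[choice] += amount
    winner = max(totals, key=totals.get)
    share = totals[winner] / sum(totals.values())
    return winner, share  # predicted choice and its share of the money

# Hypothetical trades: two confident traders outweigh three hesitant ones.
trades = [("green mug", 80), ("green mug", 60),
          ("blue mug", 10), ("blue mug", 15), ("red mug", 5)]
print(invest_weighted_prediction(trades))  # ('green mug', ~0.82)
```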
Second stage: advantage to traditional market research

However, when we started to evaluate larger samples and more trades, the prediction accuracy of the prediction markets began to fall behind traditional market research. As we included more sample, the prediction accuracy of traditional market research crept ahead, reaching the point where 75% of the predictions were correct. We found that after around 20 people had made their trades, and without any additional input of information into the prediction markets, opinions tended to solidify like concrete: adding more sample did not shift opinions much at all, so prediction accuracy did not improve and got stuck below 70%. This can be seen as a direct example of the impact of herd effects. In these experiments we had not imposed any dark trading periods, so what we were observing was the influence of other people’s opinions diverting some of the “weaker” markets. Herding has been recognised as one of the key shortcomings of prediction markets for several years. In most of these predictions there was very little real information to go by other than the strength of the internal debate, so the most valuable information people had was what the rest of the crowd thought, and once a crowd’s collective opinion has been established it is hard to shift. Even though you could bet against the market, once the odds reach a certain length it takes a bold gambler to do so.

Increasing the pool of information

From the academic work conducted on prediction markets trading, it is known that increasing the information supply up front and invoking a dark trading period at the start of the market can help alleviate the herding effect. Encouraging more participants to explain their decisions, through more active moderation, also helps the market make more “rational” decisions.

Third stage: advantage to prediction markets trading

To test the value of adding more information into prediction markets, we set up a separate series of 16 prediction market trading experiments that we pre-populated with some of the real comments and opinions gathered from the traditional market research, and we encouraged active discussion about the choices. This did indeed improve prediction accuracy. Across the 16 tests, using the same micro samples of around 15 participants, the markets made 13 correct predictions: an aggregated prediction accuracy of 80%, pushing them ahead of the traditional research approach. Obviously this is just one simple experiment, but it does provide some evidence that the information was helping the crowd make more effective decisions, and that those with confidence had a motive to use this information, and think about it, in order to profit. Testing out the mugs, for example, participants were able to correctly predict that the green mug would be the best seller, with comments like this: “Gardening is a popular pastime, a garden tools type mug would be a good present”. (See figure 35.)

FIGURE 35.

Whilst adding information certainly improved the predictions, we were still not satisfied with a prediction accuracy of 80%; herd effects, we felt, were still having a measurable influence. We wanted to see if there was a way of driving prediction accuracy even higher.

Dividing the herds!

We came up with a very simple, and we think very effective, idea for improving prediction reliability even further: the concept of dividing the herd (figure 36). Instead of running a single prediction market, what would happen if we divided the herd into more than one prediction market, let the markets operate independently, and then aggregated the results? It is a technique that has been successfully adopted in qualitative research almost since its beginning.

FIGURE 36.

To test this idea out, we created three separate prediction markets, all making the same 16 predictions independently of each other, and at the end we aggregated their collective trading opinion. Eight of the 16 predictions were identical across the three markets, in seven there were split decisions, and in one all three produced different answers. However, when we aggregated the results, weighting each prediction by each market’s relative confidence, we found that 15 of the 16 predictions were correct, which equates to a 90% prediction accuracy (figure 37).

FIGURE 37. PREDICTION PERFORMANCE

Again we must stress that as this was a small experiment it must only be treated as anecdotal evidence, but it does point towards the potential value of a divided herd technique. What is more, for the more cautious, herd dividing has a fail-safe misprediction mechanism built in: if you are running three markets and they all predict the same thing, there is an 88% chance of this being a reliable prediction. If there is some conflict, you can either aggregate the predictions or go further and run more markets, as they are so quick and easy to set up.
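As a sketch, divided-herd aggregation might look like the following Python. We assume each market’s “relative confidence” is the share of money behind its leading choice, which is one plausible reading rather than a documented formula; the market data are hypothetical.

```python
from collections import defaultdict

def divided_herd_forecast(markets):
    """Aggregate several independent prediction markets.
    Each market is (predicted_choice, confidence), where confidence is
    assumed to be the money share behind that market's leading choice."""
    votes = defaultdict(float)
    for choice, confidence in markets:
        votes[choice] += confidence  # confidence-weighted vote
    return max(votes, key=votes.get)

# Hypothetical split decision over three mug designs: two weakly
# confident herds pick blue, one highly confident herd picks green,
# and the confidence weighting lets the confident market carry it.
markets = [("blue mug", 0.40), ("blue mug", 0.45), ("green mug", 0.90)]
print(divided_herd_forecast(markets))  # 'green mug' (0.90 vs 0.85)
```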
The academic work exploring prediction markets trading

Whilst most of this paper has focused on our own primary research, prediction markets have also been a fascinating subject for hard-core scientific research for many years, and so in rounding up this paper we felt it important to highlight some of the key pieces of academic work relating to the main issues raised here.

Investment size: Does a larger investment by one respondent in a prediction market, as a measure of confidence, indicate better knowledge than that of somebody who invests less, and do such respondents make better predictions as a result? Prof. Bernardo Huberman, an HP Fellow, tested this in 2003. Strangely, he found no conclusive evidence that a trader’s larger investment means more knowledge or a better prediction. However, he then had an idea: some people like to gamble larger amounts than others. So he first ran respondents through multiple markets to obtain a risk profile for each individual respondent, then ran his numbers a second time, adjusting individual bets for the risk preference calculated in the first stage. Doing this, Huberman found a significant improvement in forecast accuracy versus the imperfect market price: excluding outliers, the gains amounted to roughly a 20 to 30 percent reduction in forecast error.

Gender issues: Prediki’s own extensive historical trading data demonstrate that women, on average, invest noticeably less than men in any single prediction market. They have therefore developed internal prediction optimisation algorithms that improve the market price into an optimised forecast by taking each user’s past investment pattern into consideration. These optimised forecasts can be used as an additional information point, often helpful when analysing forecasts in greater detail.
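A minimal sketch of this kind of calibration in Python: each trader’s current bet is normalised by their own typical stake, so bet size signals conviction rather than bankroll or staking habits. This is our reading of the general idea, not Huberman’s or Prediki’s published algorithm, and the trader data are hypothetical.

```python
from collections import defaultdict

def risk_adjusted_weights(history, current_bets):
    """Normalise each trader's current bet by their average past stake,
    so bet size reflects conviction rather than how big they bet by habit."""
    past = defaultdict(list)
    for trader, amount in history:
        past[trader].append(amount)
    weights = {}
    for trader, amount in current_bets.items():
        stakes = past[trader]
        baseline = sum(stakes) / len(stakes) if stakes else amount
        weights[trader] = amount / baseline  # >1 = unusually confident
    return weights

# Hypothetical traders: 'amy' stakes 5 on average, 'bob' stakes 50.
history = [("amy", 5), ("amy", 5), ("bob", 50), ("bob", 50)]
print(risk_adjusted_weights(history, {"amy": 15, "bob": 50}))
# {'amy': 3.0, 'bob': 1.0} -> amy's bet signals far more conviction
```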
Real money vs. play money: Do prediction markets work better when respondents have to put real money where their mouths are, or is play money sufficient? This is an important question, because issues with the legality of betting in a particular country – e.g. the USA – would prevent us from using the prediction market method there. Prof. Justin Wolfers from Wharton took a look at this, also in 2003. He and his colleagues asked the readers of USA Today to participate in a stock market game to forecast the outcomes of the American football league. To their surprise, there was no perceptible quality difference: on some predictions the play-money markets performed better, on others the real-money markets. However, both did much better than individual experts and polls. For market researchers this is good news: prediction markets can be run strictly with play money, avoiding all kinds of legal hassles and geographic limitations.

Importance of incentives: Do prediction markets need monetary incentives at all, then? Wouldn’t it be great if we could tap into the wisdom of the crowd for free? In practice, however, we regularly see prediction markets languish because a client fails to provide enough incentive money or attractive prizes. Prediction markets need to attract an informed crowd for good results, and that crowd must be motivated to reflect in Kahneman’s System 2 style. In 2006, Prof. Stefan Luckner of the University of Karlsruhe tested three variations: rank prizes for first, second and third; performance-based prizes; and fixed incentives, as in traditional surveys. Not surprisingly, rank prizes gave better results than fixed incentives: participants had to work harder. Counter-intuitively, performance-based prizes performed worse than fixed ones. Prediki believes this is because the experimental setup allowed a small but safe gain for inaction in the performance-prize version, and players hate to lose a gain they already have.

Number of traders: How many participants do you need for good results? As market researchers you know the ropes: statistical significance, demographics, sampling and all. But how does the maths work for prediction markets? In 2007, Jed Christiansen of the University of Buckingham took a look. Cleverly, he used a future event with very little general coverage and impact – rowing competitions – and asked participants to predict the winners. A daunting task, as there are no clever pundits airing their opinions in the press, as in soccer. However, Christiansen recruited his participant pool from the teams and their (smallish) fan base through a rowing community website; in other words, he found experts. His results were truly astonishing. The magic number was as little as 16: markets with 16 traders or more were well calibrated, while below that number prices could not be driven far enough.

Experts: are there generally good and bad predictors? And what does “expert” mean – are there people who merely make noise but do not contribute to, or worse, detract from, prediction accuracy? One of Prediki’s own studies from 2009, conducted with Prof. Andreas Mild of the Vienna University of Economics and Business, demonstrated that this can be a significant factor. They used data from seven years of political, sports and corporate prediction markets to identify Supertraders, Donkeytraders and Average Joes, and observed that good performance in one market indicated good performance in others. They then took the next step and analysed market prices, correcting for Donkeytraders and giving more weight to Supertraders. The result was an improvement of 10 to 20% in forecasting accuracy. More recently they have taken this approach a step further and measured past trader performance segregated by topic, since a trader who is good on automotive questions will not necessarily make good predictions about toothpaste sales or election results.
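A minimal sketch of Supertrader/Donkeytrader correction in Python: score each trader’s historical hit rate, then weight their current predictions accordingly. The labels, thresholds and the zero-weighting of sub-50% traders are our own illustrative choices, not the method of Waitz & Mild (2009).

```python
def trader_hit_rates(past_calls):
    """past_calls: {trader: [(prediction, outcome), ...]}.
    Returns each trader's historical accuracy."""
    return {t: sum(p == o for p, o in calls) / len(calls)
            for t, calls in past_calls.items()}

def performance_weighted_forecast(current_calls, hit_rates):
    """Weight each trader's current prediction by how far their track
    record sits above coin-flip (0.5); Donkeytraders with sub-50%
    records get zero weight rather than a negative one."""
    scores = {}
    for trader, choice in current_calls.items():
        weight = max(hit_rates.get(trader, 0.5) - 0.5, 0.0)
        scores[choice] = scores.get(choice, 0.0) + weight
    return max(scores, key=scores.get)

# Hypothetical track records: one Supertrader, one Average Joe,
# and one Donkeytrader whose vote is effectively ignored.
past = {"sue": [("A", "A")] * 9 + [("B", "A")],        # 90% hit rate
        "joe": [("A", "A")] * 6 + [("B", "A")] * 4,    # 60% hit rate
        "don": [("B", "A")] * 8 + [("A", "A")] * 2}    # 20% hit rate
rates = trader_hit_rates(past)
print(performance_weighted_forecast({"sue": "X", "joe": "Y", "don": "Y"}, rates))
# 'X': sue's 0.4 weight beats joe's 0.1; don contributes nothing
```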
SUMMARY OF FINDINGS

We appreciate there is a lot of content in this paper, so here is a summary of some of the key insights we have learnt from these experiments:

1. Crowds can make very wise predictions, but only if they have some collective knowledge to bring to bear on the topic; otherwise they can be plain ignorant.
2. Crowd price predictions tend to lag behind actual prices, and women appear to be more in touch with the prices of household goods than men.
3. In the absence of collective knowledge we rely more heavily on the opinions of others, which can make some predictions very sensitive to nudge effects and herd behaviour.
4. Our emotions can get badly in the way of us making objective predictions.
5. How you ask prediction questions is extremely important: they need to be objectively pitched as “problems” for the audience to “solve”.
6. The predictive perspective can be more objective, but it misses out on some of the detail, so prediction needs to be used hand in hand with personal-style questions.
7. Prediction market techniques can differentiate opinions more efficiently than standard rating protocols, and making the most of this through gamification approaches really does appear to boost the quality of predictions further.
8. Predictions can become more accurate through weighting the answers from different prediction sources based upon their reliability.
9. There are clearly good and bad predictors, but they are hard to differentiate with a simple test (or rather, we have not discovered such a test yet).
10. Our prediction powers are based on our knowledge and expertise on a topic, so predictions can be improved by taking account of the relative expertise (or reliability) of the predictors.
11. Personal confidence does seem to correlate with the quality of predictions, as does the amount of money committed to a prediction bet, so these can be treated as proxy indicators of knowledge – but this requires respondent-level calibration to take account of baseline confidence levels.
12. Prediction markets trading does appear to improve the reliability of predictions over traditional research using micro samples, if the markets are effectively fed with information and well moderated.
13. Dividing the herd seems to be a good way of alleviating general herding effects.
14. Because prediction market techniques require, in market research terms, micro samples to provide stable predictions, we believe they are potentially ideal for patching onto qualitative-scale projects and MROC-style community research.

CONCLUSIONS

What are our main conclusions from digesting all this data?

To make the most of prediction protocols in surveys, our advice would be to move away from thinking of survey participants as bits of data and towards thinking of them as a network of brains that you can use to help solve your problems. Pitch them the problem you are trying to solve objectively, and provide them with all the information you can to allow them to solve it. (See figure 38.)

FIGURE 38.

Information is clearly the key ingredient of effective predictions. The more information you share with participants, and the more you encourage them to share knowledge with the other prediction participants, the better your predictions will be.

We would also advocate a more stereoscopic approach to conducting research all round. Prediction cannot be used in isolation from asking people about their own personal perspective on an issue, too. So we would advocate integrating more prediction protocols into research in tandem with more personal-based questions. As we have seen, these two ways of looking at things can provide different perspectives: one looks at the wood, the other at the trees (figure 39).

FIGURE 39.

HOW COULD ALL THIS THINKING BE USED TO UPDATE THE WAY WE THINK ABOUT CONDUCTING RESEARCH?

To finish off this paper, here is a snapshot of how its learnings could be applied to update our approach to tracking research…

Traditional tracker:
• A respondent is treated as a bit of data
• You assemble a picture of the market like building a jigsaw puzzle from bits of data
• Most questions are backward-looking, aiming to quantify behaviour
• Respondents are asked a fixed set of questions
• No collaboration or sharing of opinions

Predictive tracker:
• Uses respondents’ brains to help solve problems
• Their collective opinion represents the wisdom of the crowd
• Shift to a more future-orientated questioning approach
• Survey pitched to respondents as a game to predict future trends in the market
• Open sharing of thoughts and opinions

Traditional tracking survey questions:
• What brands are you aware of?
• Have you purchased this brand?
• Rate this brand
• Which of these statements about this brand do you agree or disagree with?
• Will you buy this brand in the future?

Predictive tracker style survey questions:
• Which of these brands do you believe has the biggest capacity to grow?
• Can you explain the reasons why you think that?
• What are the strengths of this brand?
• How is your behaviour changing?
• What trends do you observe?
• Here are the thoughts of other respondents based…

ACKNOWLEDGEMENTS

Special thanks to Walker Smith of the Futures Company for pointing us in the right direction and for the useful advice he provided in the writing of this paper. Special thanks also to Ray Poynter at Vision Critical, as it was his philosophical musings on what influences and disrupts our ability to make accurate predictions that stimulated the journey to write this paper. Lastly, special thanks to the members of the 2014 MRS BIG Conference, who helped in part to crowd-source some of the insights included in this paper.

ENDNOTES

1. http://en.wikipedia.org/wiki/Nate_Silver
2. Source: http://www.marketingweek.co.uk/analysis/essential-reads/how-to-stop-your-brand-becoming-acommodity/4010334.article
3. Source: http://psp.sagepub.com/content/25/11/1417.short
4. If you are interested to know more about this technique, this is a link to a presentation on the topic given at the Amsterdam IIEX event: http://www.insightinnovation.org/2014/04/23/surveytainment-how-can-you-makesurveys-more-addictive/
5. http://dl.acm.org/citation.cfm?doid=2600057.2602886
6. Source: http://press.princeton.edu/titles/7959.html
7. A famous experiment described in the award-winning ESOMAR paper by Stephen Phillips & Abigail Hill, “Research in a World of Irrational Expectation”.
8. http://en.wikipedia.org/wiki/Iowa_Electronic_Markets
9. Source: BrainJuicer predictive markets papers.
10. Note: we did this by randomly selecting 15 people from a sample of 400, 20,000 times, using a macro routine, and looking at the average accuracy of each group’s predictions.

REFERENCES

Chen, K., Fine, L. and Huberman, B. 2003. Predicting the future. Information Systems Frontiers 5, 1, 47–61.
Christiansen, J.D. 2007. Prediction markets: Practical experiments in small markets and behaviours observed. Journal of Prediction Markets 1, 17–41.
Galton, F. 1907a. Letters to the editor. Nature 75, 1952.
Galton, F. 1907b. Vox populi. Nature 75, 1949, 450–451.
Gigerenzer, G. 2008. Why heuristics work. Perspectives on Psychological Science 3, 1, 20–29.
Goldstein, D.G., McAfee, R.P. and Suri, S. 2014. The wisdom of smaller, smarter crowds. 15th ACM Conference on Economics and Computation.
Gordon, K. 1924. Group judgements in the field of lifted weights. Journal of Experimental Psychology 7, 389–400.
Herzog, S.M. and Hertwig, R. 2009. The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychological Science 20, 231–237.
Klugman, S.F. 1945. Group judgement for familiar and unfamiliar materials. Journal of Genetic Psychology 32, 103–110.
Knight, H. 1921. A comparison of the reliability of group and individual judgements. M.S. thesis, Columbia University, unpublished.
Levitt, S.D., Miles, T.J. and Rosenfield, A.M. 2012. Is Texas hold’em a game of chance? A legal and economic analysis. The Georgetown Law Journal 101, 581–636.
Lorenz, J., Rauhut, H., Schweitzer, F. and Helbing, D. 2011. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences 108, 22, 9020–9025.
Lorge, I., Fox, D., Davitz, J. and Brenner, M. 1958. A survey of studies contrasting the quality of group performance and individual performance, 1920–1957. Psychological Bulletin 55, 6, 337–372.
Lovie, A.D. and Lovie, P. 1986. The flat maximum effect and linear scoring models for prediction. Journal of Forecasting 5, 3, 159–168.
Luckner, S. 2007. How do incentive schemes affect prediction accuracy? Dagstuhl Seminar Proceedings 06461.
Mannes, A.E., Soll, J.B. and Larrick, R.P. 2013. The wisdom of small crowds. Manuscript in preparation.
McMurray, J.C. 2013. Aggregating information by voting: The wisdom of the experts versus the wisdom of the masses. The Review of Economic Studies 80, 1, 277–312.
Muchnik, L., Aral, S. and Taylor, S.J. 2013. Social influence bias: A randomized experiment. Science 341, 6146, 647–651.
Puleston, J. and Sleep, D. 2011. The game experiments: Researching how game techniques can be used to improve the quality of feedback from online research. ESOMAR Congress, Amsterdam.
Servan-Schreiber, E., Wolfers, J., Pennock, D.M. and Galebach, B. 2004. Prediction markets: Does money matter? Electronic Markets 14, 3.
Simmons, J.P., Nelson, L.D., Galak, J. and Frederick, S. 2011. Intuitive biases in choice vs. estimation: Implications for the wisdom of crowds. Journal of Consumer Research 38, 1, 1–15.
Surowiecki, J. 2005. The Wisdom of Crowds.
Tetlock, P.E. 2006. Expert Political Judgment: How Good Is It? How Can We Know?
Treynor, J.L. 1987. Market efficiency and the bean jar experiment. Financial Analysts Journal 43, 3, 50–53.
Waitz, M. and Mild, A. 2009. Improving forecasting accuracy in corporate prediction markets. Journal of Prediction Markets 3, 3, 49–62.

THE AUTHORS

Jon Puleston is VP Innovation, GMI, United Kingdom.
Hubertus Hofkirchner is CEO, Prediki, Austria.
Alex Wheatley is a Researcher, GMI, United Kingdom.