May the Best Analyst Win

Last May, Jure Žbontar, a 25-year-old computer scientist at the University of Ljubljana in Slovenia, was among the 125 million people around the world paying close attention to the televised finale of the annual Eurovision Song Contest. Started in 1956 as a modest battle between bands or singers representing European nations, the contest has become an often-bizarre affair in which some acts seem deliberately bad (France’s 2008 entry involved a chorus of women wearing fake beards and a lead singer altering his vocals by sucking helium), and the outcome, determined by a tally of points awarded by each country following telephone voting, has become increasingly politicized.

Žbontar and his friends gather annually and bet on which of the acts will win. But this year he had an edge because he had spent hours analyzing the competition’s past voting patterns. That’s because he was one of the 22 entrants in, and the eventual winner of, an online competition to predict the song contest’s results.

The competition was run by Kaggle, a small Australian start-up company that seeks to exploit the concept of “crowdsourcing” in a novel way. Kaggle’s core idea is to facilitate the analysis of data, whether it belongs to a scientist, a company, or an organization, by allowing outsiders to model it. To do that, the company organizes competitions in which anyone with a passion for data analysis can battle it out. The contests offered so far have ranged widely, encompassing everything from ranking international chess players to evaluating whether a person will respond to HIV treatments to forecasting whether a researcher’s grant application will be approved. Despite often modest prizes (Žbontar won just $1000), the competitions have so far attracted more than 3000 statisticians, computer scientists, econometricians, mathematicians, and physicists from approximately 200 universities in 100 countries, Kaggle founder Anthony Goldbloom boasts.

And the wisdom of the crowds can sometimes outsmart those offering up their data. In the HIV contest, entrants significantly improved on the efforts of the research team that posed the challenge. Citing Žbontar’s success as another example, Goldbloom argues that Kaggle can help bring fresh ideas to data analysis. “This is the beauty of competitions. He won not because he is perhaps the best statistician out there but because his model was the best for that particular problem. … It was a true meritocracy,” he says.

