Patterns of activity on Wikipedia can predict the opening box office takings of blockbuster movies a month before they are released, according to scientists.
Taha Yasseri, a physicist at the Budapest University of Technology and Economics, has created a mathematical model that takes into account data such as the number of readers and editors of the Wikipedia page for an upcoming movie, and has shown that its output correlates with takings on the film's opening weekend.
Mr. Yasseri and his colleagues, Marton Mestyan and Janos Kertesz, built the model using data on 312 movies with Wikipedia pages, out of a total of 535 that were released in the U.S. in 2010. Overall, the predicted box office takings matched reality with an accuracy of around 77 per cent.
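To give a rough sense of how a prediction of this kind can be made and scored, here is a minimal sketch in Python. It is not the researchers' model: it assumes a single predictor (page views) where the article describes several, it fits a straight line on a log scale, and every number in it is invented for illustration. It scores the fit with the coefficient of determination, one common measure of fit; whether the article's accuracy figure refers to that measure is an assumption.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of the variance in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, NOT taken from the study:
# pre-release page views and opening-weekend takings (dollars) for five films.
page_views = [2_000, 15_000, 40_000, 120_000, 600_000]
opening_revenue = [300_000, 1_200_000, 6_000_000, 9_000_000, 70_000_000]

# Work on a log scale because both quantities span several orders of magnitude.
log_views = [math.log10(v) for v in page_views]
log_revenue = [math.log10(r) for r in opening_revenue]

slope, intercept = fit_line(log_views, log_revenue)
r2 = r_squared(log_views, log_revenue, slope, intercept)

print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

A model closer to the one described would add further predictors, such as the number of editors, and would be fitted on real data for the films in the sample.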
For the biggest movies in the sample, such as Iron Man 2, Alice in Wonderland, Toy Story 3 and Inception, the relative accuracy of the model's prediction was more than 90 per cent. Predictions for less successful movies, such as Never Let Me Go, Animal Kingdom and The Killer Inside Me, varied more widely from what actually happened.
The paper, which has not yet been peer-reviewed, was posted this week on the arXiv database.
“We were looking for the fingerprints of popularity of a movie,” said Mr. Yasseri. The Wikipedia entries of movies that were going to be popular were more heavily edited and visited by more readers.
Mr. Yasseri added that the model could be used by studios to help predict the potential success of their movies. But his principal aim was to show how researchers could address sociological questions by using the enormous data sets being collected on social media sites such as Wikipedia, Twitter and Facebook.
“We wanted to show there is a way to trace these things through social media impacts,” said Mr. Yasseri.
Scientists at HP Labs in Palo Alto have shown that the number of times a movie is mentioned on Twitter is a good indicator of its subsequent box office revenue.