Limitations of the pdf

The basic format doesn’t include any requirement that text be selectable or searchable, while data presented as charts and tables is often impossible to export in any useable way.

May 12, 2014 12:15 am | Updated 12:15 am IST

It’s the standard file format for nearly every academic paper, political briefing and research note. But a new report by the World Bank suggests that the venerable pdf is keeping valuable information buried in servers, unread and unloved.

The working paper — released, naturally, as a pdf — examines which reports released by the organisation are widely read, or even read at all. Of the 1,611 reports the study looked at, only 25 were downloaded more than 1,000 times in the five-year period between 2008 and 2012. At the other end of the scale, over 31 per cent of the reports the group looked at — 517 separate research papers — were not downloaded a single time.
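
The proportions behind those counts can be recomputed directly. The following is a minimal sketch in Python using only the three figures quoted above; the variable names and the `share` helper are illustrative and do not come from the World Bank paper.

```python
# Counts as quoted in the article (World Bank reports, 2008 to 2012).
total_reports = 1611
over_1000_downloads = 25
never_downloaded = 517


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


popular_pct = share(over_1000_downloads, total_reports)
unread_pct = share(never_downloaded, total_reports)

print(f"Downloaded more than 1,000 times: {popular_pct}%")  # 1.6%
print(f"Never downloaded: {unread_pct}%")                   # 32.1%
```

On these numbers, roughly one report in sixty passed 1,000 downloads, while just under a third were never downloaded, which is consistent with the article's "over 31 per cent".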

“It is, however, important to keep in mind that many policy reports were not intended to reach a large audience,” note the report’s authors, Doerte Doemeland and James Trevino, “but prepared to assess very specific technical questions or inform the design of lending operations.” As for which reports were actually read, the pair state that “more expensive, complex, multi-sector, core diagnostics reports on middle-income countries with larger populations tend to be downloaded more frequently.”

The portable document format, or “pdf”, was invented by Adobe in 1993 as a way of rendering documents with rich text formatting and inline images in a consistent way across multiple computing platforms and various software packages. A document saved as a pdf should always look the same, no matter where it is being viewed, a fact which has made it popular for the digital release of complex reports.

Blocks data analysis

But owing to the way such documents are rendered, pdfs often give up machine readability in favour of human readability. The basic format doesn’t include any requirement that text be selectable or searchable, while data presented as charts and tables is often impossible to export in any useable way.

That makes it impossible to mine the documents for the data they contain, and so to build new databases that pull together disparate sources. Efforts to create “pdf to html” converters have not solved the problem: their output still needs human oversight to check for errors of interpretation.

Nathaniel Manning, a fellow for the White House’s open data project, argued in The Guardian that it’s understandable that the format is used. “There are often numerous different documents used to make a single project report, including Excel models, GIS shapefiles, and Photoshop charts.

“The ease of taking screenshots and putting it all into a pdf report, and sending it along via e-mail is completely understandable. But this is like funding James Cameron to make Avatar, and then releasing it in a black and white flipbook. We are missing all the good stuff. This has to change.”

© Guardian Newspapers Limited, 2014
