What is the Internet Archive and why is it facing a backlash from book publishers? | Explained

Internet Archive, parent of Wayback Machine, is facing a serious legal hurdle from book publishers. Why is the digital database experiencing backlash?

Updated - July 08, 2024 09:29 am IST

Published - July 06, 2024 01:36 pm IST

FILE PHOTO: Internet Archive is embroiled in a major legal challenge as it faces off against traditional publishers accusing it of copyright violations. | Photo Credit: AP

The story so far: Internet Archive, a non-profit that aims to digitise, preserve, lend, and share multi-media content, is embroiled in a major legal challenge as it faces off against traditional publishers accusing it of copyright violations. The free digital library is currently fighting the forced removal of around half a million books from its platform, which it argues functions like a library.

What is the case against Internet Archive?

While a great number of the books digitised and uploaded by Internet Archive were already in the public domain, such as historical sources and old classics, many traditional publishers have alleged that Internet Archive also violated their copyrights by scanning physical copies of their books and distributing the digital files to the public.

In the case Hachette vs Internet Archive that began in 2020, traditional publishers Hachette, HarperCollins, Wiley, and Penguin Random House sued Internet Archive. On March 24, 2023, District Judge John G. Koeltl issued an order in favour of the publishers.

“IA’s Website includes millions of public domain ebooks that users can download for free and read without restrictions,” noted the order, adding, “Relevant to this action, however, the Website also includes 3.6 million books protected by valid copyrights, including 33,000 of the Publishers’ titles and all of the Works in Suit.”

In particular, traditional publishers were against IA’s temporary ‘National Emergency Library’ (NEL) initiative that it launched during the COVID-19 pandemic. This was to allow more users to access the e-books in its collection while physical libraries were locked down.

“During the NEL, IA lifted the technical controls enforcing its one-to-one owned-to-loaned ratio and allowed up to ten thousand patrons at a time to borrow each ebook on the Website,” stated the 2023 order.

In general, IA uses a system known as “controlled digital lending” to limit the number of people who can access an e-book. It ended its emergency library system after being hit with the lawsuit.
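The difference between the two lending modes can be illustrated with a small model. This is only a sketch under stated assumptions: the `EbookTitle` class, its fields, and its method names are hypothetical and do not describe Internet Archive's actual software. It simply shows a loan cap equal to the number of owned copies (the one-to-one ratio) versus a cap raised to the ten thousand figure quoted in the order.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EbookTitle:
    """Toy model of lending limits for one title (hypothetical, not IA's code)."""

    owned_copies: int
    # None means the cap equals the owned copies (one-to-one owned-to-loaned ratio).
    loan_cap: Optional[int] = None
    on_loan: Set[str] = field(default_factory=set)

    def current_cap(self) -> int:
        return self.owned_copies if self.loan_cap is None else self.loan_cap

    def borrow(self, patron: str) -> bool:
        """Return True if the patron holds a loan after this call."""
        if patron in self.on_loan:
            return True
        if len(self.on_loan) >= self.current_cap():
            return False  # no copy free; the patron would have to wait
        self.on_loan.add(patron)
        return True

    def give_back(self, patron: str) -> None:
        self.on_loan.discard(patron)


# Controlled digital lending: one owned copy, one loan at a time.
cdl_title = EbookTitle(owned_copies=1)

# Emergency-library style: the cap is lifted to 10,000 concurrent loans.
nel_title = EbookTitle(owned_copies=1, loan_cap=10_000)
```

With `cdl_title`, a second patron is refused until the first returns the book; with `nel_title`, both can borrow at once, which is the change the publishers objected to.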

Internet Archive used the doctrine of fair use to defend itself in the case, but this did not hold up. The organisation said it would appeal, but did so after some delay.

The case is ongoing, with the oral argument stage of the appeal taking place on June 28, 2024.

Why are books being removed from the Internet Archive?

As a result of the lawsuit, IA was forced to remove over half a million books from its database, with the Director of Library Services at Internet Archive, Chris Freeland, calling out the “profoundly negative impact” on users. 

According to testimonies collected by IA, the mass removal hurt students who could not access books for academic research. 

While IA identifies itself as a library, it has been compared to a shadow library or a piracy database by traditional publishers, who disagree with its “controlled digital lending” approach.

Despite the removal, however, Internet Archive is still home to a rich collection.

As of late June, the web archive said it contained 835 billion web pages, 44 million books and texts, 15 million audio recordings, 10.6 million videos, 4.8 million images, and 1 million software programs. Live concerts and television programs also make up part of this collection.

What is Wayback Machine?

While Internet Archive buys physical books, digitises them, and lends them to users or makes them available for download, it has also focused on preserving web pages since 1996. The platform claims users can explore over 866 billion saved web pages through its own search service.

“We began in 1996 by archiving the Internet itself, a medium that was just beginning to grow in use. Like newspapers, the content published on the web was ephemeral - but unlike newspapers, no one was saving it. Today we have 28+ years of web history accessible through the Wayback Machine and we work with 1,200+ library and other partners through our Archive-It program to identify important web pages,” noted Internet Archive on its website.

Users can help IA archive parts of the internet at no cost, or they can reach out to the platform to make their own work publicly available.

How can one use Wayback Machine?

Using Wayback Machine is easy and free of cost, though results are not always guaranteed.

To begin, navigate to the Wayback Machine web page, where you will see a bar in which you can enter a URL or keywords relevant to the web page or content you are looking for. Then, hit ‘enter’ and wait for the results to be shown.

If the content was new, rarely viewed, or deleted long ago, before it could be captured for the archive, you may get few results or none at all.

However, you have a good chance of finding content such as old websites that no longer exist today, earlier versions of existing websites, deleted social media posts, archived versions of paywalled articles, and archived versions of content that is blocked or censored in your jurisdiction.

A graphic will show you how many times Internet Archive “crawled” the content in the past months or even years, allowing you to click on the calendar bubbles to pick out “snapshots” of the web content from different periods of time. However, the service can be patchy at times and not all content might have been perfectly saved; broken links, missing media, or pages that won’t load are often the end result. 

A screenshot showing the calendar archive of preserved snapshots for the site barbie.com on Wayback Machine | Photo Credit: Wayback Machine
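For readers who prefer a script to the search bar, the same lookup can be sketched in Python. This example assumes Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and the JSON shape it has returned to date (an `archived_snapshots` object with a `closest` entry); neither is described in this article and both may change. The function names `build_query`, `closest_snapshot`, and `lookup` are illustrative.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: str = "") -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot's URL, or None if nothing was archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup(url: str, timestamp: str = "") -> Optional[str]:
    """Perform a live request; needs network access and may fail or time out."""
    with urlopen(build_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("barbie.com", "20100101")` would ask for the capture nearest to January 1, 2010. A return value of `None` corresponds to the "no results" case described above, where the page was never crawled.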

While Wayback Machine is useful for personal research or to access information sources, users should be cautious about relying on the data obtained through such sources, as the saved information can sometimes be outdated or inaccurate.
