Carl Malamud contends that all the books will be out of copyright at some point

November 25, 2017 05:28 pm | Updated 05:28 pm IST

‘I’m in a bit of a stand-off with the Indian government’

“I’m in a little bit of a stand-off with the Indian government on the Digital Library of India,” Carl Malamud begins. We’re meeting in a hotel lobby to discuss another matter related to his relationship with our government. Malamud explains that aside from that little matter of 19,000 Bureau of Indian Standards documents awaiting a decision in the Delhi High Court (reported in The Hindu, November 12), he also maintains, through his one-person NGO, Public Resource, other digital collections of important Indian information.

The most recent is the collection of India’s official gazettes: 150,000 of them on the Internet Archive, searchable, categorised by date, extraordinary versus ordinary, and so on. He plans to add State gazettes too: “I have four queued up, ready to go. Our goal is to get all 29 States. The idea is you have an integrated repository of all the official notifications.”

Then there is a photo set, images that he found on some effectively inaccessible corner of the Ministry of Information’s site. “I took the time to kind of sort it. There were actually 90,000, but I found 12,000 that were high enough resolution.” That’s all in a Flickr album.

Value added

And there’s the Hind Swaraj collection: “A carefully-curated one: the collected works of Gandhi, all 100 volumes; Nehru (we found most of them on the government site, but they were missing three volumes, so I got those, and we now have the most complete Selected Works of Nehru); Ambedkar, most of which were on the Maharashtra site, but six volumes had come out subsequently, and I got those and scanned them, so we have the most complete Ambedkar collection. There is Bharat Ek Khoj [the 1988 Shyam Benegal television series based on Nehru’s Discovery of India]. It was closed-captioned in English [the original audio was in Hindi], so for five episodes, we added closed-captioning in Urdu, Telugu, and Hindi; we’re trying to add value.” There are also 129 audio recordings from All India Radio of Gandhi speaking at prayer meetings.

The stand-off then?

He explains that the Indian government had, for several years, been scanning books, about half of them in English, French and German, and the other half in Indian languages. There were about 5,55,000 of them in the Digital Library of India (DLI). Malamud discovered them on a trip to India, where he and Sam Pitroda spoke at Gandhi’s ashram.

While waiting for his flight back to the U.S., he took a look at the DLI. “I wrote a little script [to download the books], and when I got home, a few were there. Over the next three months, I was able to get 4,63,000. I loaded them up on the Internet Archive, and they started getting about a million views a month.” He also added value to the collection.

“There were a lot of problems. The metadata, titles were wrong, some of the scans were bad, there were duplicates, but it was still 50 different languages and a lot of unique books. We are search-engine-optimised, so people started going in.”

Since end-November last year, when the collection went online, it has had over 8.4 million views.

Malamud anticipated take-down notices — the DLI said the books were out of copyright, but he noticed that there were some that were not — and “Sure enough, we got 20 or 30 people calling and saying, ‘My god! You got our book!’ ‘Not a problem, we’ll remove it.’ We removed 127.”

Then a certain well-connected Russian began creating a fuss: his father’s books were there, and he wanted them out. “The government panicked and removed the DLI website from the Net!”

The government also got in touch with Malamud, and asked him to take his archive down too. “I said, ‘I’m not removing that many books.’ To give them some time to figure out what was going on, I ramped it down to about 2,00,000 books, 1923 and back. That reduced our traffic to half a million views [a month]. That’s a lot of people to deny access to.”

His contention is that all the books will be out of copyright at some point, and he will have them ready then. “I could serve all of them to the blind today. You’re allowed to serve a copyrighted book to the blind: international treaty.”

He has also continued to add books to the collection, about 120,000 more from the original set he downloaded from the DLI. Most recently, he writes in a later email, “I added 4,440 books from the Archaeological Survey of India and there are another 30,000 or so queued up that I’ve been gathering from around the Net.” The total at archive.org/details/digitallibraryindia as of this writing is 394,359.

Official blessings

With glee lighting up his eyes, he speaks of a windfall gift. Someone in Hollywood had found a box of books that Richard Attenborough had read before making his film Gandhi, and asked Venkatesan Ashok, India’s Consul General in San Francisco, for advice. The consul suggested they call Malamud.

“Waiting for me at home are all the books that Attenborough read to do the film. Are there notations in the margins? It’ll be so much fun!” In an email after he returned to the U.S., he says the trove contained a number of books out of copyright, including some Navajivan Press books he had not seen, so those will go up too, as well as the last nine volumes needed to complete the second series of the Selected Works of Nehru.

That he is thrilled to find more works he hasn’t read isn’t just the data nerd talking. He has read deeply of and on Gandhi, and about the Indian freedom movement, and his activism is, in part, informed by the Mahatma’s methods. He calls his refusal to take down material ‘civil disobedience,’ and his strategy a ‘satyagraha,’ albeit one without risk of danger to life, limb and personal liberty.

That the Indian government is ambivalent about this particular one of Malamud’s exploits is evidenced not just by the consul’s recommendation of him as the recipient of Attenborough’s books, but also by the fact that Mr. Ashok spoke at the official launch of the mirror of the DLI at an event in the Internet Archive’s San Francisco office in June this year.

Still, one senses that Malamud would rather have official blessings for at least this project. He has enough going on with litigation in the U.S. and India. “Because we’ve been screwing around with this ‘take down the books’ thing, we haven’t been able to begin making the metadata better.”

The plan is, “Bring all the books to India, and make them available to an educational institute.”