Internet companies fight to catalog books online

Internet giant Google is being sued for copyright infringement, paving the way for Yahoo! and the Internet Archive to develop a book digitization organization, the Open Content Alliance.

Google’s two digital scanning projects, the Print Publisher Program and the Library Project, both upload excerpts or full texts of previously published books to the Google search engine, prompting the literary union Authors Guild to sue Google. As a result of the lawsuit, the company is delaying further scanning until November 2005.

Nick Taylor, the president of the Authors Guild, said in a statement to the New York Times, “By digitizing mountains of copyrighted books without permission, Google is exercising a renegade notion of eminent domain: Google decides what’s good for us and seizes private property to get it done.”

The basis of copyright is found in Article 1, Section 8 of the U.S. Constitution, which gives Congress the power to “promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”

Google’s Vice President of Product Management Susan Wojcicki said that Google does not intend to steal money from authors but is trying to increase book sales for the authors.

“Just as Google helps you find sites you might not have found any other way by indexing the full text of Web pages, Google Print, like an electronic card catalog, indexes book content to help users find, and perhaps buy, books,” Wojcicki wrote in her Google blog.

Wojcicki also said that Google’s practices are well within copyright laws, citing the literary criticism clause of the Fair Use Doctrine.

The Fair Use Doctrine states that “the fair use of a copyrighted work … for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.”

Like Google, the Open Content Alliance – made up of several technology companies and universities, including the Internet Archive and Yahoo! – will also digitize books, in PDF format. Unlike Google, however, the OCA will avoid copyright disputes by scanning texts only if they are in the public domain or if the copyright owner has given the OCA permission to use the work.

The University of California will donate its public domain books, which Daniel Greenstein, associate vice provost and University Librarian of the California Digital Library, told CNET News make up 15 percent of its 33 million volumes.

Jennifer Colvin, a worker at the California Digital Library, UC’s online library, said, “We digitalize out-of-copyright American fiction from the 1800s to 1920s – books by Mark Twain [and] Jack London.”

Though UC has just begun to scan books, Colvin said she believes the digital material will be available “by the first of the year.”

Michael Beller, a reference librarian at Mills College, is excited about the new digital library. “We have this incredible technology that can deliver content … at a moment’s notice; it astounds me that it’s been so difficult to make it [books available online for the public] happen.”

The scanned works will soon be available at www.opencontentalliance.com and will later be accessible from most search engines.