University of California, Riverside

UCR Newsroom

UC Riverside Wins Google Grant

The $50,000 award will fund the use of a UCR-based online catalog of books published before 1801 to improve records in the massive Google Books repository.

(July 14, 2010)

RIVERSIDE, Calif. – Researchers from the University of California, Riverside and Eastern Connecticut State University have received a $50,000 grant from Google to improve descriptions of books published before 1801 that are part of the Google Books digitized archive.

Google, the world’s most popular search engine, today announced 12 grants totaling $479,000, the first in its Digital Humanities Research Awards program. The company has established a growing digital archive containing more than 12 million books in more than 400 languages, including one of the largest collections of digitized early modern books.

In announcing the awards, Google said the recipients were selected “in part because the resulting techniques, tools and data will be broadly useful: they’ll help entire communities of scholars, not just the applicants.” The grant is eligible to be renewed for an additional year.

Brian Geiger, director of UC Riverside’s Center for Bibliographical Studies and Research (CBSR), said center staff and Benjamin Pauley, associate professor of English at Eastern Connecticut, will compare metadata about books in Google Books that were published before 1801 with descriptions of those books contained in the English Short-Title Catalog at UCR. In addition to the author and book title, metadata includes such details as the printer and year of publication, a physical description of the book, where copies of the originals are located, and general notes. Such information is critical for scholars, Geiger said, and is too often missing from the Google Books database.

Google Books has amassed scans of tens of thousands of books published before 1801, a period known to historians as the “hand-press era,” Geiger said. The archive has tremendous potential for transforming teaching and research in the humanities, especially for students and scholars at institutions that cannot afford access to costly commercial collections.

“This remarkable collection offers great potential for students and scholars of the early modern era (roughly the 17th and 18th centuries). But it suffers from one shortcoming: The metadata at Google Books is too inconsistent and cursory to allow for serious, detailed study of the books that the service holds,” Geiger wrote in the grant application. “In a period in which type was set by hand and print runs were small, there can be considerable variation among seemingly identical titles. These differences, even when small, often have substantial repercussions for understanding a time period or topic. For scholars of this era, whether they are in English, history, or interdisciplinary fields, the task is often not simply to find a text, but to understand which text they have found.”

The English Short-Title Catalog contains the highest quality records available of printed copies of books published before 1801, Geiger said. Begun in the late 1970s, the ESTC endeavors to record all surviving copies of works published anywhere in English or in any language in Great Britain and its dependencies from 1473 to 1800. It is a joint effort of the British Library, the CBSR, and contributing libraries throughout the world that has grown to approximately 500,000 items and 4 million holdings. The ESTC is widely regarded as the single most authoritative source for the identification of early modern editions.

The UC Riverside-Eastern Connecticut proposal calls for the researchers to match the Google Books data against that contained in the ESTC, embedding URLs for Google Books in appropriate ESTC records and sending high-quality metadata with ESTC IDs back to Google Books. Geiger estimates that there are between 150,000 and 200,000 pre-1801 works currently in Google Books and that the project will be able to match at least 80,000 of those to the English Short-Title Catalog using computers and student workers, a process that will take about six months.

For the remaining Google Books items, using Pauley’s “Eighteenth-Century Book Tracker” as a model, the two researchers hope eventually to develop a website that will allow users to match items in Google Books to ESTC records and edit those records based on information they find in the digitized copy, improving the bibliographical reliability of thousands of entries.

Finally, Geiger and Pauley will do research on Google Books metadata throughout the grant cycle, producing a summary report of their findings at the end. “By comparing the English Short-Title Catalog records to items in Google Books,” they write, “we will assess the strengths and weaknesses of metadata currently in the archive and suggest ways that Google can reasonably improve its metadata collection and delivery in the future.”

The University of California, Riverside is a doctoral research university, a living laboratory for groundbreaking exploration of issues critical to Inland Southern California, the state and communities around the world. Reflecting California's diverse culture, UCR's enrollment has exceeded 21,000 students. The campus opened a medical school in 2013 and has reached the heart of the Coachella Valley by way of the UCR Palm Desert Center. The campus has an annual statewide economic impact of more than $1 billion.

A broadcast studio with fiber cable to the AT&T Hollywood hub is available for live or taped interviews. UCR also has ISDN for radio interviews. To learn more, call (951) UCR-NEWS.

More Information 

General Campus Information

University of California, Riverside
900 University Ave.
Riverside, CA 92521
Tel: (951) 827-1012

Department Information

Media Relations
900 University Avenue
1156 Hinderaker Hall
Riverside, CA 92521

Tel: (951) 827-6397 or (951) UCR-NEWS
Fax: (951) 827-5008
