Converting Historical Archives into Cloud-based, Worldwide Libraries

Documenting The History of Science: California Institute of Technology’s Einstein Papers Project

Located on the campus of the California Institute of Technology is a small building that houses the library of Albert Einstein. Caltech, along with prominent libraries in the US and Israel, is engaged in the noteworthy quest to collect, catalog, and organize the many works of Albert Einstein.

The Collected Papers of Albert Einstein is one of the most ambitious publishing ventures ever undertaken in the documentation of science history. It provides the first complete picture of Einstein’s massive written legacy, ranging from his first work on the special and general theories of relativity, to the origins of quantum theory, to his active involvement with international collaboration, cooperation, human rights, education, and disarmament.

The published volumes draw upon more than 40,000 documents contained in the personal papers of Einstein (1879-1955), and more than 30,000 additional Einstein and Einstein-related documents discovered by the editors since the 1980s. The printed series will contain over 14,000 scientific and non-scientific documents, and will fill close to 30 volumes.

The director and general editor of the Einstein Papers Project at Caltech engaged the services of Global Archives to help plan the launch of an online, electronic archive.

To launch a multilingual, international archive of written treasures, Global Archives built an electronic version of the works using the existing hard-copy document index scheme provided by the Caltech team. This ensured that the library’s contents would be universally and seamlessly integrated with other Einstein libraries worldwide (now that’s truly a “global archive”!). The library contains a wide range of document types, including letters, bound documents, magazine articles and other personal and professional pieces. Because the existing document indexing method was retained, those using the archived library online can easily find, search and share documents; searches can be conducted by index, by content, and in a variety of languages.

Global Archives analyzed the Einstein Library’s paper indexing method, evaluated the document types, and built a database structure to incorporate the director’s requirements. Global Archives also created a front-end tool based upon LockBox that opens the library’s contents to other libraries, allowing them to securely place URL links to target documents on their own sites.

Secure, customized search and access of the Einstein Papers Project, in effect a universal library of all things Einstein, was the primary goal of the caretakers at Caltech and elsewhere. Global Archives led a painstaking conversion phase in which source documents were imaged and their key indexing data entered into the records database system. In addition, a full content scrape using optical character recognition (OCR) was performed on most mechanical print, including typewritten and published text. Because many of the personal papers were handwritten, those documents can be accessed only via the conventional index method. Languages scraped by OCR included Hebrew, German, French and English.
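The split between index-only and content-searchable documents can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the actual LockBox data model: the record fields, function names, index identifiers and sample text below are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchiveRecord:
    """One imaged document. All field names here are hypothetical."""
    index_id: str                    # identifier taken from the paper index scheme
    doc_type: str                    # e.g. "letter", "magazine article"
    language: str                    # e.g. "German", "Hebrew"
    handwritten: bool
    ocr_text: Optional[str] = None   # None when no content scrape was possible


def search_by_index(records, index_id):
    """Index lookup reaches every record, handwritten or not."""
    return [r for r in records if r.index_id == index_id]


def search_by_content(records, term):
    """Content search only reaches records that carry OCR text."""
    needle = term.casefold()
    return [r for r in records if r.ocr_text and needle in r.ocr_text.casefold()]


# Made-up sample records, not real archive entries.
records = [
    ArchiveRecord("A-001", "letter", "German", handwritten=True),
    ArchiveRecord("A-002", "magazine article", "English", handwritten=False,
                  ocr_text="Placeholder text of a typewritten article on relativity."),
]

print([r.index_id for r in search_by_index(records, "A-001")])
print([r.index_id for r in search_by_content(records, "relativity")])
```

In this sketch the handwritten letter is found only through its index identifier, while the typewritten article can also be found by a word in its scraped text, which mirrors the two access paths described above.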

Now, the Einstein Papers Project is universally accessible using Global Archives’ LockBox. Users worldwide can review the entire library either from LockBox’s website or via the Einstein Archives Online home page at https://einsteinpapers.press.princeton.edu.