Development Agency Reclaims Toxic Waste Site for New Community
Engineering Management Solutions for Controlling EPA Superfund Data Proliferation
Global Archives’ customers in the engineering services industry include public and private organizations that rely on accurate, up-to-the-minute data sources to run programs efficiently while complying with the EPA Superfund’s formidable regulations.
Global Archives was selected by the City of Irvine Redevelopment Agency to convert, analyze and organize the repository of data into a meaningful user library, so that authenticated users could easily find and access key documents based on content or key metadata. The primary engine for this library is LockBox, equipped with specific stored procedures that enable users at multiple sites to search by key metadata as well as by content.
The Environmental Protection Agency oversees an estimated 2,600 designated Superfund cleanup sites nationwide. Many sites are in highly desirable locations and are ideal for community redevelopment. One such site is the former El Toro Naval Air Station in the city of Irvine, California. NAS El Toro began as a World War II training facility and ended up as a helicopter training and deployment center for both US Marine Corps and Naval Aviation. El Toro’s 4,700 acres are designated by the city of Irvine for planned, phased redevelopment and will ultimately contain several thousand housing units as well as commercial and industrial zones.
Before all this, though, come massive cleanup efforts and remediation plans to recover the land from over 60 years of hard use and environmental compromise. Teams of public and private agencies, environmental consultants, real estate developers, architects, engineers, and building contractors are on the job to reclaim, remediate, rezone and redevelop the El Toro land.
These efforts are yielding large-scale drawings, test results, and policies and procedures to be used during the redevelopment of this prime southern California land. The repository of this data collection alone contains over 500,000 documents, maps, drawings and reports. The challenge is a common problem for any of the Superfund cleanup sites: reining in massive amounts of documents — created at considerable investment to the taxpayer — and organizing this repository into a meaningful, scalable and shareable user library. These publicly funded investments are the roadmaps to remediation and restoration of the site, and its redevelopment for mixed uses.
Ensuring Data Management Cost Savings and Versatile Search Functions
Global Archives recommended a pilot program using LockBox, based upon proven success with comparable field installations. First, a “step-by-step” process was put in place to minimize disruption of daily operations.
Using a custom-built OCR engine to scrape data from the documents, Global Archives built a reference table of key metadata fields, including document type, approximate number of pages, processed dates, EPA category, author, document recipient, subject, site location, and pertinent engineering information.
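One way to picture such a reference table is as a set of records, one per document, keyed by a document identifier. The sketch below is illustrative only: the class name, field names, `doc_id` key, and sample values are assumptions made for this example and do not describe the actual LockBox schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocumentRecord:
    """One row of a hypothetical reference table (field names are assumed)."""
    doc_id: str
    document_type: str
    approx_pages: int
    processed_date: date
    epa_category: str
    author: str
    recipient: str
    subject: str
    site_location: str
    engineering_info: Optional[str] = None  # e.g. a drawing number, if any


def build_reference_table(records):
    """Index records by doc_id, rejecting duplicate identifiers."""
    table = {}
    for rec in records:
        if rec.doc_id in table:
            raise ValueError(f"duplicate doc_id: {rec.doc_id}")
        table[rec.doc_id] = rec
    return table


# Sample values are invented for illustration only.
sample = DocumentRecord(
    doc_id="DOC-0001",
    document_type="report",
    approx_pages=12,
    processed_date=date(2000, 1, 1),
    epa_category="example-category",
    author="J. Smith",
    recipient="Records Office",
    subject="Runway soil sampling",
    site_location="Example parcel",
)
reference_table = build_reference_table([sample])
```

Keying the table on a single identifier keeps each metadata field available as a search or filter value without tying the sketch to any particular database.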
In addition, all of the documents (except engineering drawings) were made OCR-searchable for content. In this way, a user could combine key metadata with content, e.g., a given author and the term “runway,” and retrieve all documents satisfying this Boolean condition. Here is an example:
Document index search.
The search in this example yielded specific documents for the match, but also identified the content relationships among the documents. Because most searches were for specific terms, results were provided in two primary forms: an OCR version for content only, and the complete document in its original format. In the sample below we see results from the OCR version.
Content search by key word.
Above are the results in the “original” version.
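The combined metadata-and-content search described in this section can be sketched as a Boolean AND over an author field and the OCR text. This is a minimal illustration, not the LockBox stored procedures themselves; the `IndexedDocument` class, the `search` function, and the sample library are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexedDocument:
    doc_id: str
    author: str
    ocr_text: str  # text captured by OCR; empty for unsearchable drawings


def search(documents, author=None, content_term=None):
    """Return ids of documents matching every supplied condition (Boolean AND)."""
    hits = []
    for doc in documents:
        # Metadata condition: exact author match, ignoring case.
        if author is not None and doc.author.lower() != author.lower():
            continue
        # Content condition: term appears anywhere in the OCR text.
        if content_term is not None and content_term.lower() not in doc.ocr_text.lower():
            continue
        hits.append(doc.doc_id)
    return hits


# Invented sample data for illustration only.
library = [
    IndexedDocument("DOC-0001", "J. Smith", "Soil samples taken near the north runway."),
    IndexedDocument("DOC-0002", "J. Smith", "Hangar inspection summary."),
    IndexedDocument("DOC-0003", "A. Jones", "Runway resurfacing schedule."),
]

matches = search(library, author="J. Smith", content_term="runway")
print(matches)  # ['DOC-0001']
```

Only the first document satisfies both conditions; the second fails the content test and the third fails the author test.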
Upon locating the desired document, the user could then view, print or share it via e-mail. In addition, Global Archives generated several customized internal reports to give the City of Irvine Redevelopment Agency’s administrator a view of user and file access. All monochrome documents were converted at 300 DPI, and color documents (including engineering drawings) were converted at 600 DPI to provide the desired quality. Because many of the documents were more than 50 years old, Global Archives also ran the images through a software enhancement tool, giving each document a “second” wash after conversion and before the data scrape.
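To make the resolution rule concrete, the short sketch below applies it to a single page and computes the resulting image size in pixels. The function names and the 8.5 × 11 inch page size are assumptions chosen for illustration; the source states only the two DPI values.

```python
def scan_resolution_dpi(is_color: bool) -> int:
    """Resolution rule from the text: 300 DPI monochrome, 600 DPI color."""
    return 600 if is_color else 300


def pixel_dimensions(width_in: float, height_in: float, is_color: bool):
    """Image size in pixels for a page of the given size in inches."""
    dpi = scan_resolution_dpi(is_color)
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US letter, 8.5 x 11 inches.
print(pixel_dimensions(8.5, 11, is_color=False))  # (2550, 3300)
print(pixel_dimensions(8.5, 11, is_color=True))   # (5100, 6600)
```

Doubling the resolution quadruples the pixel count per page, which is one reason to reserve the higher setting for color material and drawings.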
The project was completed on time, and user acceptance has been good; the library has been in active use for more than four years.
“Global Archives took on an impossible task, made sense out of the document storeroom, brought in the necessary equipment and staff and completed the job on time and on budget.” - Jill M. Schoener, Municipal Records Administrator, City of Irvine.
In work environments where strict adherence to cost guidelines and data security is of utmost concern (and subject to industry and governmental regulation), Global Archives offers proven success in data governance for very large-scale engineering data management.
Scraping content data is an Optical Character Recognition (OCR) function that captures content electronically in one of several ways:
- Zones. Data capture can be organized to look at specific areas within a document. These areas are called “zones.” The zones are scraped to capture machine-generated characters, barcodes, text and other generated data. The scraped data is then interpreted and written to a hidden text file attached to the image, where it can be used as a search or index field value. This is a fast and effective method for capturing specific data on a form.
- Full OCR. Scraping the entire document is the generally accepted method for a full OCR search of the document’s content. This was employed effectively for all machine-generated documents, including typewritten and typeset documents. As in the Zone method, a hidden text file is attached to the document image and can be used for content search. Many documents in the library could be processed in this manner.
- Custom Reference Library. The custom reference library is used when a document has been handwritten or created in a non-standard character set such that conventional scraping is ineffective. In this process, sample letters are captured in pixel format and entered into a character table. This table has corresponding matching characters for each handwritten character; several language interpretations are allowed. Although there is a lower probability of success, most Custom Reference Libraries can provide a reasonable definition of the content in electronic format and, most importantly, allow both electronic viewing and content searching.
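The character-table idea behind the Custom Reference Library can be illustrated with a toy matcher: each sample letter is stored as a small pixel grid, and an unknown glyph is assigned the character whose sample differs from it in the fewest pixels. This is a simplified sketch under stated assumptions, not Global Archives' engine; the 3×3 glyphs, the `CHARACTER_TABLE` contents, the distance threshold, and the function names are all hypothetical.

```python
# Each sample glyph is a 3x3 bitmap flattened to 9 characters ('#' = ink).
# A real table would hold far larger samples captured from the documents.
CHARACTER_TABLE = {
    "I": ".#." ".#." ".#.",
    "L": "#.." "#.." "###",
    "T": "###" ".#." ".#.",
}


def hamming(a: str, b: str) -> int:
    """Count the pixel positions at which two equal-sized bitmaps differ."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same size")
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph: str, table=CHARACTER_TABLE, max_distance: int = 2) -> str:
    """Return the closest character in the table, or '?' if nothing is close."""
    best_char, best_dist = "?", None
    for char, sample in table.items():
        dist = hamming(glyph, sample)
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    if best_dist is None or best_dist > max_distance:
        return "?"  # low-confidence match: leave for manual review
    return best_char


def recognize_line(glyphs) -> str:
    """Recognize a sequence of glyphs and join the results into text."""
    return "".join(recognize(g) for g in glyphs)


noisy_t = "###" ".#." "..."  # a 'T' with one pixel of ink missing
print(recognize_line([CHARACTER_TABLE["L"], CHARACTER_TABLE["I"], noisy_t]))  # LIT
```

The threshold reflects the lower probability of success noted above: a glyph that matches nothing closely is marked as unknown rather than forced to a wrong character.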