The integration of the Merritt repository with Amazon’s S3 and Glacier cloud storage services, previously described in an August 16 post on the Data Pub blog, is now mostly complete. The new Amazon storage supplements Merritt’s longstanding reliance on UC private cloud offerings at UCLA and UCSD. Content tagged for public access is now routed to S3 for primary storage, with automatic replication to UCSD and UCLA. Private content is routed first to UCSD, and then replicated to UCLA and Glacier. Content is served for retrieval from the primary storage location; in the unlikely event of a failure, Merritt automatically retries from secondary UCSD or UCLA storage. Glacier, which provides near-line storage with four hour retrieval latency, is not used to respond to user-initiated retrieval requests.
|Content Type||Primary Storage||Secondary Storage||Primary Retrieval||Secondary Retrieval|
In preparation for this integration, all retrospective public content, over 1.1 million objects and 3 TB, was copied from UCSD to S3, a process that took about six days to complete. A similar move from UCSD to Glacier is now underway for the much larger corpus of private content, 1.5 million objects and 71 TB, which is expected to take about five weeks to complete.
The Merritt-Amazon integration enables more optimized internal workflows and increased levels of reliability and preservation assurance. It also holds the promise of lowering overall storage costs, and thus, the recharge price of Merritt for our campus customers. Amazon has, for example, recently announced significant price reductions for S3 and Glacier storage capacity, although their transactional fees remain unchanged. Once the long-term impact of S3 and Glacier pricing on Merritt costs is understood, CDL will be able to revise Merritt pricing appropriately.
CDL is also investigating the possible use of the Oracle archive cloud, as a lower-cost alternative, or supplement, to Glacier for dark archival content hosting. While offering similar function to Glacier, including four hour retrieval latency, Oracle’s price point is about 1/4th of Glacier’s for storage capacity.