The UC Curation Center (UC3) has offered innovative digital content access and preservation services to the UC community for over six years through its Merritt repository. UC3 developed Merritt to meet the UC community’s need for high-quality curation services at scale and at a low price point. Recently, UC3 began evaluating Amazon’s S3 and Glacier cloud storage products as a way to address cost concerns, improve reliability, expand service options, and keep pace with the ever-increasing volume, variety, and velocity of new content contributions.
The current Merritt pricing model, in effect since July 1, 2015, is based on recovering the costs of storage use, currently totaling over 73 TB contributed from all 10 UC campuses. This content is replicated in UC private clouds supported by UCLA and UCSD. Since the closure of the UCOP data center earlier this year, the computational processes underlying Merritt, along with all other CDL services, have moved to virtual machines in the Amazon AWS cloud. Co-locating storage with this computational presence in AWS will increase data transfer throughput during Merritt deposit and retrieval. In addition, integrating online S3 storage with near-line Glacier storage offers an opportunity to lower storage costs by moving to Glacier those archival materials that are not expected to be accessed directly by end users. Glacier storage costs about one quarter as much as S3, which is comparable with UCLA and UCSD pricing. The additional dispersed replication of Merritt-managed data in AWS will also increase overall reliability and long-term preservation assurance.
The integration of S3 and Glacier will supplement Merritt’s existing use of UC storage. Merritt’s storage function acts as a broker that automatically routes submitted content to the appropriate storage location based on its curatorially defined access characteristics. Once Amazon storage has been added to Merritt, content tagged for public access will be routed to S3 for primary storage, from which it will be automatically replicated to a UC cloud. Retrieval requests for this content will be served from the S3 copy; should a request fail (for example, if S3 is temporarily unresponsive), Merritt automatically retries against the secondary copy in the UC cloud.
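The public-access path can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `StorageNode` standing in for both S3 and a UC cloud, and hypothetical `store_public` and `retrieve_public` functions; none of these are Merritt’s actual code or any AWS API. The point is only the shape of the logic: write the primary copy to S3, replicate to the UC cloud, and fall back to the replica when an S3 read fails.

```python
from dataclasses import dataclass, field
from typing import Dict


class StorageUnavailable(Exception):
    """Raised when a storage node cannot serve a request."""


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for a storage service (S3, a UC cloud)."""
    name: str
    online: bool = True
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        # Treat an outage or a missing object as a failed read.
        if not self.online or key not in self.objects:
            raise StorageUnavailable(f"{self.name} cannot serve {key}")
        return self.objects[key]


def store_public(key: str, data: bytes, s3: StorageNode, uc_cloud: StorageNode) -> None:
    """Route public content: primary copy in S3, replica in a UC cloud."""
    s3.put(key, data)
    # Replication is done inline here for simplicity; a real broker would
    # likely replicate asynchronously after the primary write succeeds.
    uc_cloud.put(key, data)


def retrieve_public(key: str, s3: StorageNode, uc_cloud: StorageNode) -> bytes:
    """Serve from S3; on failure, retry against the UC cloud replica."""
    try:
        return s3.get(key)
    except StorageUnavailable:
        return uc_cloud.get(key)


if __name__ == "__main__":
    s3, uc = StorageNode("s3"), StorageNode("uc-cloud")
    store_public("demo-object", b"payload", s3, uc)
    s3.online = False  # simulate S3 being temporarily unresponsive
    print(retrieve_public("demo-object", s3, uc))  # b'payload'
```

Keeping the fallback inside the retrieval function means callers never need to know which copy served the request, which matches the broker role described above.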
The path for content tagged for private access is somewhat different. Such content is initially routed to S3 for temporary storage until its replication to a UC cloud completes. The content is then moved into Glacier for permanent, low-cost primary storage. Retrieval requests will be served from the UC cloud. In the unlikely event that this retrieval does not succeed, there is no automatic retry from Glacier, since Glacier, while inexpensive for static storage, is costly for systematic retrieval. UC3 staff can, however, intervene manually to retrieve from Glacier if it becomes necessary. For both public and private content, the digital content will continue to be managed with at least five copies spread across independent storage infrastructures and data centers.
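The private-access path can be sketched the same way, again assuming a hypothetical in-memory `StorageNode` and hypothetical `store_private`, `retrieve_private`, and `manual_glacier_retrieve` functions rather than Merritt’s real implementation. The sketch stages content in S3, replicates it to the UC cloud, moves the staged copy into Glacier, and serves reads only from the UC cloud, with Glacier reachable solely through a separate, operator-initiated call.

```python
from dataclasses import dataclass, field
from typing import Dict


class StorageUnavailable(Exception):
    """Raised when a storage node cannot serve a request."""


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for S3, Glacier, or a UC cloud."""
    name: str
    online: bool = True
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.online or key not in self.objects:
            raise StorageUnavailable(f"{self.name} cannot serve {key}")
        return self.objects[key]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def store_private(key: str, data: bytes, s3: StorageNode,
                  uc_cloud: StorageNode, glacier: StorageNode) -> None:
    """Stage in S3, replicate to the UC cloud, then move the staged copy to Glacier."""
    s3.put(key, data)                # temporary staging copy
    uc_cloud.put(key, data)          # replication to the UC cloud
    glacier.put(key, s3.get(key))    # move to Glacier only after replication
    s3.delete(key)                   # staging copy is no longer needed


def retrieve_private(key: str, uc_cloud: StorageNode) -> bytes:
    """Serve from the UC cloud only; failures propagate with no Glacier retry."""
    return uc_cloud.get(key)


def manual_glacier_retrieve(key: str, glacier: StorageNode) -> bytes:
    """Operator-initiated fallback. Real Glacier reads require a separate
    restore step, which is why this is not wired into retrieve_private."""
    return glacier.get(key)
```

Leaving `manual_glacier_retrieve` out of the automatic path reflects the cost trade-off described above: a failed UC cloud read surfaces as an error for staff to handle rather than silently triggering an expensive Glacier retrieval.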
The integration of Amazon S3 and Glacier into Merritt’s storage architecture will increase overall reliability and performance, and may lead to future cost reductions. Once the integration is complete, UC3 will monitor AWS storage usage and associated costs through the end of the current Merritt service year on June 30, 2017, to determine the impact on Merritt pricing.