Streamlining Merritt Microservice Configuration

Working with a Centralized Parameter Store and how Merritt Stands to Benefit

– Eric Lopatin, Terry Brady, Marisa Strong –

 

Introduction

There are many paths forward for the technology stack that underpins any digital preservation system. The challenge often lies in choosing an appropriate path, rather than in enumerating possible new implementations. The Merritt team has found itself in this position recently, having assembled a long list of new paths to tread, and knowing that our resources are finite and should be applied with care to solutions that stand to benefit users.

In the past year, we completed migrations that reduced preservation storage costs for campuses to nearly a quarter of what they were. That work also helped secure our vision for a revised approach to preservation, one that includes a third object copy in a separate geographic region of the country in order to mitigate risk to collections.

Having completed this work, the team has decided that among a number of possible initiatives, one in particular provides a solid building block in a larger strategy to improve the repository's resilience: streamlining configuration practices across all of Merritt's microservices. Let's dig into that a bit.

Merritt is a complex system, employing both Java and Ruby web applications across nine microservices responsible for tasks ranging from content ingestion to inventory, replication, and fixity checking. Since its introduction, each microservice application has employed its own method of specifying configuration items such as database credentials, authentication service credentials, cloud storage node characteristics, and information associated with the use of upstream, external services. This strategy demands specialized, tribal knowledge across the members of a development team, and in turn leaves the system vulnerable to the loss of that knowledge. Of equal and arguably greater importance, it scatters configuration information across several facilities, such as private code repositories and even compiled executables (for instance, the .war file that encapsulates a Java web application). It's easy to intuit that the overall security of an application or system may be compromised when configuration parameters are distributed in this manner.

By streamlining our approach to application configuration across Merritt, the team stands to benefit from consistent implementations across microservices while raising the security bar and ultimately making the system as a whole more robust and manageable. There's yet another benefit to this focus, one that plays directly into the larger goal of dynamically scaling services when large influxes of content generate high load: moving application configuration parameters out of compiled executables promotes the use of a single application package version across multiple microservice instances. Given that all Merritt microservices are high-availability, this added benefit fits squarely with long-term goals for scalability.

Our approach

But what does it mean to streamline configuration? For our team, it boils down to removing literally hundreds of configuration properties files from our codebase and shifting to a centralized parameter store provided by AWS Systems Manager (a.k.a. SSM).

SSM provides a parameter store that allows us to make use of a hierarchy of configuration parameters defined in YAML files. More specifically, the keys to parameters are stored in .yml files in application code repositories. Each compiled executable contains these keys, rather than values specific to a particular instance of the application running on a host. The key hierarchy incorporates entries for properties grouped by environment (dev/stage/production), including, for example, paths to endpoints and service parameters such as regions, nodes, and bucket names. Here's a snippet of a YAML file with a key that refers to a database password in a Stage environment (followed by a corresponding section for Production):

stage:
  user: username
  password: {!SSM: app/db-password} 
  debug-level: {!SSM: app/debug-level !DEFAULT: warning}
  hostname: {!ENV: HOSTNAME}

production:
  user: username
  password: {!SSM: app/db-password} 
  debug-level: {!SSM: app/debug-level !DEFAULT: error} 
  hostname: {!ENV: HOSTNAME}

Importantly, an environment-specific path prefix for a property is concatenated with the desired key during a call to the SSM centralized store. For the above Stage environment example, a call to the store would result in the following path being used to obtain a secret:

/system/stage/app/db-password
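
Merritt's actual resolution code lives in shared configuration libraries, but a minimal Java sketch of the mechanics might look like the following (using the AWS SDK for Java v2; the /system root and the class name are illustrative assumptions drawn from the example above):

import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.GetParameterRequest;

public class SsmConfigResolver {
    // Hypothetical root; the real prefix is deployment-specific.
    private static final String ROOT = "/system/";

    private final SsmClient ssm = SsmClient.create();

    // "stage" + "app/db-password" -> /system/stage/app/db-password
    public String resolve(String environment, String key) {
        GetParameterRequest req = GetParameterRequest.builder()
            .name(ROOT + environment + "/" + key)
            .withDecryption(true)   // decrypt SecureString parameters
            .build();
        return ssm.getParameter(req).parameter().value();
    }
}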

When possible, keys are used in conjunction with SSM API endpoints to obtain actual secrets and other configuration information from the centralized store. On application startup, all necessary configuration values are obtained from the store and loaded at runtime. If it is not feasible to use an endpoint, information can be copied from the store into local environment variables on individual microservice EC2 hosts. In our case, environment variables are used for properties that are not sensitive and are expected to remain unchanged at runtime.
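
For the startup pass, one plausible approach (a sketch, not Merritt's actual implementation) is to pull everything beneath an environment's path in a single sweep using the SDK's GetParametersByPath paginator:

import java.util.HashMap;
import java.util.Map;
import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.GetParametersByPathRequest;

public class StartupLoader {
    // Load every parameter under e.g. "/system/stage/" into a map at startup.
    public static Map<String, String> loadAll(SsmClient ssm, String envPath) {
        Map<String, String> config = new HashMap<>();
        GetParametersByPathRequest req = GetParametersByPathRequest.builder()
            .path(envPath)
            .recursive(true)
            .withDecryption(true)
            .build();
        // The paginator follows NextToken across result pages transparently.
        ssm.getParametersByPathPaginator(req).parameters()
           .forEach(p -> config.put(p.name(), p.value()));
        return config;
    }
}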

Furthermore, if the parameter store is offline, an application can fall back to a default value; a sentinel default causes the application to throw an exception that is captured in its logs. The following example illustrates such a default value for a cloud storage provider access key:

production:
  user: username
  password: {!SSM: app/db-password}
  debug-level: {!SSM: app/debug-level !DEFAULT: error}
  hostname: my-prod-hostname
  accessKey: "{!SSM: cloud/nodes/my-accessKey !DEFAULT: SSMFAIL}"
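
Building on the resolver sketched earlier, the fallback logic might look something like this (the helper names are hypothetical; the SSMFAIL sentinel comes from the snippet above):

// Additional methods for the SsmConfigResolver sketched earlier
// (requires: import software.amazon.awssdk.core.exception.SdkException;)

public String resolveWithDefault(String environment, String key, String dflt) {
    try {
        return resolve(environment, key);
    } catch (SdkException e) {
        return dflt;   // store unreachable: fall back to the declared default
    }
}

// At the point of use, a sentinel default surfaces as a logged exception.
public String requireValue(String key, String value) {
    if ("SSMFAIL".equals(value)) {
        throw new IllegalStateException("SSM lookup failed for " + key);
    }
    return value;
}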

Altogether, this process is dynamic in nature and forms the basis of a strategy that enables loading, and potentially re-loading, values when required, without incurring downtime. Although each application must of course implement a mechanism to reload configuration on demand, use of the SSM parameter store enables our stage and production microservices to be configured on the fly. In the future, we see this as a potential way to switch to new storage nodes as they come online, whether to channel incoming content to a particular node during a migration or for other risk mitigation purposes. Ultimately, it's our hope that we'll be able to minimize downtime and make the Merritt system more resilient to changes in its dependencies.
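
How reloading is wired up varies by application; one common pattern (sketched here as an assumption, not the team's actual design) holds the resolved configuration behind an atomic reference and refreshes it on a timer, so readers always see a consistent snapshot:

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import software.amazon.awssdk.services.ssm.SsmClient;

public class ReloadableConfig {
    private final AtomicReference<Map<String, String>> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public ReloadableConfig(SsmClient ssm, String envPath) {
        current.set(StartupLoader.loadAll(ssm, envPath));   // initial load
        // Refresh from the store periodically; the interval here is arbitrary.
        scheduler.scheduleAtFixedRate(
            () -> current.set(StartupLoader.loadAll(ssm, envPath)),
            15, 15, TimeUnit.MINUTES);
    }

    public String get(String name) {
        return current.get().get(name);   // always a consistent snapshot
    }
}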

Working with DevOps

Underlying the implementation of SSM calls in our microservice applications are the associated infrastructure and a partnership with our DevOps team. By working with DevOps, we were able to complete preliminary research into the use of SSM and tackle the learning curves that come with it, all while the team experimented with different approaches to configuring, setting, and retrieving SSM values on Merritt's EC2 instances. Based on this cooperative experimentation, we designed our overall approach.

To begin working with SSM, we needed to configure Identity and Access Management (IAM) roles to perform certain actions. These roles help a DevOps administrator control access to resources provided by the systems manager. Roles are then assigned SSM-specific policies, which allow Systems Manager to interact with an EC2 instance via a related IAM instance profile. Each policy defines access such as Systems Manager core functionality, S3 bucket access, or running a CloudWatch agent.

In addition to the above roles, each of our AWS resources, servers, and Lambda functions is assigned AWS tags. These tags are metadata that define characteristics of each resource. For example, we have a tag that corresponds to a group of EC2 instances that all run Merritt's Ingest microservice. Tags are used for many reasons; one in particular is to restrict access across environments. For example, recall our earlier SSM parameter path:

/system/stage/app/db-password

This path corresponds to the tags for a stage resource. The environment-specific construction of a tag enforces restrictions, such that a stage EC2 instance can only read stage parameters; it would never be able to successfully access, say, a production database password. Tags and policies govern the management of, and access to, SSM parameters, and provide for an inherently more robust configuration strategy.
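
As an illustration only (the account ID, region, and exact actions are placeholders, not Merritt's actual policy), an IAM policy scoping a stage instance profile to stage parameters might look like:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ssm:GetParameter", "ssm:GetParametersByPath"],
    "Resource": "arn:aws:ssm:us-west-2:123456789012:parameter/system/stage/*"
  }]
}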

Of course, while these tag-based restrictions exist for Merritt's microservice hosts, the team has administrative access to an operations server that provides a central hub for managing SSM parameters. Here one can query SSM parameter values for all resources across all environments in UC3 systems. Our operations server makes for an excellent center of cooperation with DevOps and promotes further experimentation with SSM.

One such experiment that's come to fruition is a set of tools created by our DevOps engineer to assist with routine tasks. It consists of aliases for common SSM commands, wrapper scripts, and other utility functions. One wrapper script in particular allows for the retrieval of SSM parameters that store database credentials. It provides well-defined, secure access to databases according to the same roles established for the parameters themselves. In fact, the script allows a user to access a database without ever viewing the required credentials.
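
To give a flavor of what such tooling can look like (these are hypothetical stand-ins, not the actual DevOps scripts; the parameter names and the MySQL client are assumptions):

# Fetch a decrypted parameter value by full path.
ssm_get() {
  aws ssm get-parameter --with-decryption \
      --query 'Parameter.Value' --output text --name "$1"
}

# Open a database session without ever displaying the credentials.
merritt_db() {
  mysql -h "$(ssm_get /system/stage/app/db-host)" \
        -u "$(ssm_get /system/stage/app/db-user)" \
        -p"$(ssm_get /system/stage/app/db-password)" \
        "$(ssm_get /system/stage/app/db-name)"
}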

Current implementation and the road ahead

So where are we on our journey with SSM and its secure parameter store? At this point, the Merritt microservice with the most complex configuration strategy is taking advantage of the parameter store in production. This is Merritt's Storage microservice, which coordinates the shuttling of digital objects and their metadata to multiple cloud storage providers for safekeeping. During this implementation, we did literally move hundreds of configuration parameters and files from a private code repository into the SSM parameter store. We've also migrated Merritt's frontend service, which provides the system's UI and a limited number of API endpoints. These two migrations are significant from a language standpoint as well: the Storage service is a Java application, while the frontend is a Ruby on Rails application, and we've strived for parity in functionality across implementations in both languages.

At present, we're working on similar implementations for two more Java-based microservices. These control the ingest of new content and the execution of inventory tasks for stored objects and their versions. Once those are done, we'll wrap up our configuration strategy by doing the same for our replication and fixity checking applications.

In summary, this project has helped (and is still helping) us on a number of levels. Not only does it provide centralized, secure application configuration management, it also allows us to begin realizing the larger goal of dynamically scaling Merritt's microservices. With scaling and increased resilience, the system will ultimately better serve our users and bolster our mission to provide a secure, cost-effective digital preservation solution.

If you would like to learn more about the specifics of our SSM implementation, please visit the following links.
