Deploying your application secrets: Hashicorp Vault and continuous delivery

Alexandre DuBreuil
Gilles Di Guglielmo

Previously at Open R&Day

Context

Web application secrets

We define a secret as any information that can be used to access sensitive data: in practice, anything we cannot put in a public repository. That includes:

  • Insurer web service credentials (username, password)
  • Encryption keys and key passphrases
  • Database credentials (username, password)
  • Out of scope: customer credentials, PII

Secret in Java file

                
public class ClientPasswordCallback implements CallbackHandler {

  private static final String USERNAME = "lesfurets";
  private static final String PASSWORD = "hunter2";

  @Override
  public void handle(Callback[] callbacks) {
    final WSPasswordCallback pc = (WSPasswordCallback) callbacks[0];
    if (USERNAME.equals(pc.getIdentifier())) {
      pc.setPassword(PASSWORD);
    }
  }

}
                
              

Secret in Tomcat server.xml

                
<?xml version='1.0' encoding='utf-8'?>
<Server port="1234" shutdown="SHUTDOWN">
  <!-- ... -->
  <GlobalNamingResources>
    <Resource name="jdbc/b2b2cDatabase" 
              username="dev"
              password="hunter2"
              url="localhost:2345"
              type="javax.sql.DataSource"
              driverClassName="org.mariadb.jdbc.Driver"
              jdbcInterceptors="..."/>
  </GlobalNamingResources>
  <!-- ... -->
</Server>
                
              

Today's objective

Remove secrets from code and production machines

Overview of password life-cycle

Our objective is to have a life-cycle that works like this:

  • Developer uses a password key in code (ex: insurer_password)
  • Developer puts development value in code (ex: testpass)
  • Security admin adds production secret value in secret system
  • Release manager deploys app without seeing the production value
  • Production application uses the secret

From code to production, different people with different access rights handle the secrets.
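
The life-cycle above can be sketched as a lookup by key that prefers the value held in the secret system and falls back to the development value kept in code. This is only an illustration under stated assumptions, not the LesFurets implementation: `SecretResolver`, `devDefaults` and `productionStore` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: application code refers to secrets by key only.
final class SecretResolver {

  // Development values are harmless and may live in the repository.
  private final Map<String, String> devDefaults;
  // Stand-in for the secret system filled in by the security admin.
  private final Function<String, Optional<String>> productionStore;

  SecretResolver(Map<String, String> devDefaults,
                 Function<String, Optional<String>> productionStore) {
    this.devDefaults = devDefaults;
    this.productionStore = productionStore;
  }

  // Returns the production value when one exists, else the development value.
  String resolve(String key) {
    return productionStore.apply(key).orElseGet(() -> {
      String dev = devDefaults.get(key);
      if (dev == null) {
        throw new IllegalArgumentException("Unknown secret key: " + key);
      }
      return dev;
    });
  }
}
```

With this shape, the developer only ever sees `insurer_password` and `testpass`; the production value never has to appear in the repository or in the release manager's tooling.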

Prerequisite: Infrastructure as code

If you do infrastructure as code, you probably have secrets in your source code. We want to keep infrastructure as code, but remove the secrets.

Prerequisite: Infrastructure automation

Our machine provisioning and deployment are done with Ansible. It makes staging possible by facilitating the creation of new environments, and it enables disposable infrastructure.

Prerequisite: Continuous delivery

At LesFurets we deliver code to production at least daily. Continuous delivery means that it is easy to push a feature to production, and just as easy to push an older version in case of emergency.

Security

Choosing a tool

Many tools are available for secrets management, yet not all will fit your purpose. Making your own custom solution might not be a good idea given how hard it is.

  • Ansible, Chef, etc.:
    do not remove secrets from production machines
  • Square Keywhiz:
    very similar to Vault and could have been a good choice
  • Amazon KMS, Azure Key Vault, Google KMS:
    somewhat similar to Vault but tied to specific ecosystem

Buildtime secrets vs Runtime secrets

You can fetch the secrets at:

Buildtime: the production machine keeps a cleartext copy of the secret.

Runtime: the production machine dynamically gets the secret, uses it, then discards it, which increases security.
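
The runtime approach (get, use, discard) can be sketched as follows. This is a minimal illustration, not code from the talk: `RuntimeSecrets` and `withSecret` are hypothetical names, and the `Supplier` stands in for whatever call fetches the secret from Vault.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: limits how long a secret stays readable in memory.
final class RuntimeSecrets {

  private RuntimeSecrets() {}

  // Fetches the secret, hands it to the action, then overwrites the buffer.
  static <T> T withSecret(Supplier<char[]> fetch, Function<char[], T> action) {
    char[] secret = fetch.get();
    try {
      return action.apply(secret);
    } finally {
      // Best effort only: copies made by the action or the JVM are not wiped.
      Arrays.fill(secret, '\0');
    }
  }
}
```

A `char[]` is used instead of a `String` because a `String` is immutable and cannot be overwritten once it is no longer needed.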

Hashicorp Vault

Lightweight, performant, open-source and battle-hardened.

  • Seal and unseal makes your Vault safe
  • Wrap secrets to distribute them safely
  • Authenticate with different methods
  • One-time token by combining token auth and wrap
  • Audit log out of the box and easy to use

Our Vault usage context

Deploying multiple copies of Vault instead of using it as a central database.

Why use Vault decentralized?

We are looking for very specific advantages:

  • Disposable infrastructure, continuous delivery and version migration
    it's easier to replace than modify.
  • Attack surface and staging
    deploy specific secrets for a specific env
  • SLA and performance
    network issues are hard and one local Vault per JVM is super-fast

Team Password Manager

TPM is a password manager (like Vault) containing our secrets, but it is never used directly by the production servers.

  • Additional failsafe layer: if it fails, it doesn't impact the system
  • Easier to migrate: since the production doesn't depend on it
  • Can be any database system: even another Vault

Storage of key -> values

Uses permissions and audit logs

Threat model

What is a threat model?

A process by which potential threats can be identified, enumerated, and prioritized.

Why a threat model?

To design a system with security in mind.

How to do a threat model?

There are many ways to do a threat model; today we'll use the popular STRIDE method.

(see wikipedia.org/wiki/Stride)

Threat model: Vault startup

  • Denial of service: TPM
  • Information disclosure: unseal key
  • Information disclosure: key in transit

Threat model: Tomcat startup

  • Elevation of privilege: one-time token
  • Denial of service: one-time token
  • Spoofing: session token

Security design implications: decryption key

There is only one decryption key that can unseal the Vault. It should never be written to disk. If the Vault is sealed (manually or not), it cannot be unsealed again.

If that happens, the application needs to be redeployed.

Security design implications: authentication

There is only one single-use wrapped token that can provide the session token. Once the wrapped token is used, there is no other way of connecting to the Vault.

If the connection to the Vault is lost for too long, the lease for the session token expires and the app cannot authenticate anymore.

If that happens, the application needs to be redeployed.
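
The single-use property of the wrapped token can be modelled in a few lines. This is an in-memory illustration of the behaviour described above, not the Vault API: `WrappedTokenStore`, `wrap` and `unwrap` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of a single-use wrapped token; this is NOT the Vault API.
final class WrappedTokenStore {

  private final Map<String, String> wrapped = new ConcurrentHashMap<>();

  // Wraps a session token and returns the one-time wrapping token.
  String wrap(String sessionToken) {
    String wrappingToken = UUID.randomUUID().toString();
    wrapped.put(wrappingToken, sessionToken);
    return wrappingToken;
  }

  // The first unwrap returns the session token; any later call returns empty.
  Optional<String> unwrap(String wrappingToken) {
    return Optional.ofNullable(wrapped.remove(wrappingToken));
  }
}
```

In this model, an application that receives an empty result on its first `unwrap` knows that someone else already consumed the token, which is a signal to redeploy rather than retry.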

In practice

Overview of delivery pipeline

No more secrets in code!

                
public class ClientPasswordCallback implements CallbackHandler {

  private static final VaultService VAULT = CoreServiceFactory.getInstance().getVaultClient();

  @Override
  public void handle(Callback[] callbacks) {
    final WSPasswordCallback pc = (WSPasswordCallback) callbacks[0];
    if (VAULT.getSecret("insurer_username").equals(pc.getIdentifier())) {
      pc.setPassword(VAULT.getSecret("insurer_password"));
    }
  }

}
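
The slides do not show how `VaultService` talks to the local Vault. One possible approach, sketched here under assumptions, is a plain call to Vault's HTTP API with the session token in the `X-Vault-Token` header. The sketch assumes Java 11 or later, a KV version 1 secrets engine mounted at `secret/`, and one secret per key; `VaultRequests` and `readSecret` are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper: only builds the request, it does not send it.
final class VaultRequests {

  private VaultRequests() {}

  // Assumes a KV v1 engine mounted at "secret/" (an assumption, not from the talk).
  static HttpRequest readSecret(String vaultAddress, String sessionToken, String key) {
    return HttpRequest.newBuilder(URI.create(vaultAddress + "/v1/secret/" + key))
        .header("X-Vault-Token", sessionToken) // Vault's token header
        .timeout(Duration.ofSeconds(2))        // fail fast if the local Vault is gone
        .GET()
        .build();
  }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the secret read from the `data` field of the JSON response.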
                
              

Java design implications

Read OWASP Secure Coding Practices and make sure it is known in the development team. A secure system needs a secure codebase.

Java isn't a secure language, but for our use case, keeping secrets short-lived (stack memory, not heap memory) is a good start.

Using a security static code analysis tool like Checkmarx is also recommended.

Operations

Performance / Scalability

Using Vault decentralized makes it easier to manage, and performance is not an issue if each JVM has its own Vault.

We rely heavily on Vault, since each PII encryption needs an encryption key stored in Vault.

It's also easier to scale by adding new Vaults and more resilient to network failures.
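
To illustrate why Vault sits on the hot path, here is a sketch of encrypting one PII field with a key that would come from the local Vault. The talk does not say which algorithm is used; AES-GCM is an assumption made for this example, and `PiiCipher` is a hypothetical name. The key bytes are passed in directly to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: AES-GCM is an assumption, not the algorithm from the talk.
final class PiiCipher {

  private static final int IV_BYTES = 12;  // usual nonce size for GCM
  private static final int TAG_BITS = 128; // authentication tag length
  private static final SecureRandom RANDOM = new SecureRandom();

  private PiiCipher() {}

  // Encrypts with a fresh random IV and returns IV followed by ciphertext and tag.
  static byte[] encrypt(byte[] key, String plaintext) {
    try {
      byte[] iv = new byte[IV_BYTES];
      RANDOM.nextBytes(iv);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
          new GCMParameterSpec(TAG_BITS, iv));
      byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
      byte[] out = Arrays.copyOf(iv, iv.length + ciphertext.length);
      System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
      return out;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Encryption failed", e);
    }
  }

  // Reads the IV from the first bytes of the message, then decrypts the rest.
  static String decrypt(byte[] key, byte[] message) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
          new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
      byte[] plaintext = cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
      return new String(plaintext, StandardCharsets.UTF_8);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Decryption failed", e);
    }
  }
}
```

If the key is fetched on every call, each encryption costs one round trip to Vault, which is where a Vault local to the JVM pays off.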

Conclusion

In retrospect: good specific solution?

Remember our goals, mainly: disposable infrastructure, continuous delivery, version migration, reduced operation, performance (speed and network).

In retrospect: disadvantages

Complex solution: compared to a single Vault, this is more complicated to implement, but easier to automate and maintain.

Requires strong automation: we had to port old Bash deployment to Ansible, but it is a healthy approach that benefits the whole system.

Impossible application restart: with disposable infrastructure, this is not a problem as long as redeployment is fast.

In retrospect: advantages

Continuous delivery and disposable infrastructure: easier to replace than migrate.

DevOps: no additional infrastructure, less work for the operations team, and more freedom for the devs.

We have no network failures, no migration, excellent performance and easy staging for new environments.

Thank you!