As a payment processor, we deal with many secrets: encryption keys, database configurations, application secrets, signing certificates, and so on. Most of these secrets are required by a specific service (say the Razorpay dashboard) to do routine tasks (such as connecting to the database).
Secret Management is how you make sure that the specific service (and only that specific service) gets access to the correct (and latest) secrets.
This is mostly a non-problem for a small startup, but as we’ve grown from managing just a couple of servers to managing large Kubernetes clusters, the way we store and use secrets has changed considerably. Over time, we’ve switched between several approaches to storing and shipping these secrets to our services.
Secret Management is a common orchestration problem with many different solutions. This blog post walks you through Razorpay’s Secret Journey: the solutions we tried at various stages and the benefits each one brought us.
Stage 1: Ansible Vault
We started out with all of our secrets stored in a common Ansible Vault file. Ansible is part of our DevOps tooling and is used to configure servers. This vault file was used in automated Ansible runs, which would run on the live servers using Ansible-ssh.
However, this resulted in our CI pipeline having access to our production servers, which we weren’t comfortable with. Ansible Vault also did not permit any granularity in secret access: everyone with access to the vault key had access to all the secrets.
We built our server images (AMIs) with the following bake process:
- Spin up a new base VM in EC2.
- Run ansible-ssh on the instance against the correct role
- Push the final image to Amazon as an AMI (Amazon Machine Image)
This is a very common infrastructure setup (Ansible+Packer) and works reasonably well.
Stage 2: Credstash
In order to get more granular control over our bakes, we switched to Credstash. Credstash is a well-established project (written in Python) for storing secrets safely in AWS. It does the following:
- Uses Amazon KMS to encrypt/decrypt secrets
- Uses AWS DynamoDB to store the encrypted secrets
- Supports a few nifty extras such as secret versioning
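To make "secret versioning" concrete, here is a minimal in-memory sketch in Python. It is not Credstash's implementation: it skips KMS encryption and DynamoDB entirely, and the class and method names (`VersionedSecretStore`, `put`, `get`) are hypothetical. It only illustrates the idea that every write creates a new version, and that a read returns the newest version unless an older one is pinned.

```python
from typing import Dict, Optional, Tuple


class VersionedSecretStore:
    """Toy store: each put creates a new version, nothing is overwritten."""

    def __init__(self) -> None:
        # (name, version) -> value. A real store would hold ciphertext here.
        self._items: Dict[Tuple[str, int], str] = {}

    def latest_version(self, name: str) -> int:
        """Return the highest stored version for `name`, or 0 if none exists."""
        versions = [v for (n, v) in self._items if n == name]
        return max(versions, default=0)

    def put(self, name: str, value: str) -> int:
        """Store `value` as the next version of `name` and return that version."""
        version = self.latest_version(name) + 1
        self._items[(name, version)] = value
        return version

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return the pinned version if given, otherwise the newest one."""
        if version is None:
            version = self.latest_version(name)
        try:
            return self._items[(name, version)]
        except KeyError:
            raise KeyError(f"no secret {name!r} at version {version}") from None


store = VersionedSecretStore()
store.put("db_password", "hunter2")
store.put("db_password", "correct-horse")
```

Keeping old versions around is what makes a rollback cheap: a consumer can pin the previous version while a bad value is being fixed.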
Since we continued to use Ansible, its Credstash lookup plugin was an easy replacement for Ansible Vault. It allowed us to use:
lookup('credstash', 'super_secret')
inside the Ansible Jinja templates. We managed access using AWS IAM roles granted only to the Packer instance (we called these “baker instances”).
Stage 3: Alohomora
While Credstash served us well, our development velocity suffered because the bake process was slow. Each layer in our Ansible build system took anywhere between 10 and 45 minutes to run, which led us to look for faster alternatives.
Since we were pretty happy with Credstash as our vetted secure storage method, we decided to take a leaf out of Etsy’s book and try out “configuration deployments”. The basic idea is to put configuration updates on the same footing as your regular deployments: fast, easy, and accessible. We’d already been using AWS CodeDeploy to deploy our codebase and decided to merge the two approaches.
Instead of splitting deployments into two categories (which is what Etsy does), we decided to make some changes to our CodeDeploy infrastructure. Because of our earlier use of Ansible Vault and the subsequent switch to Credstash, most of our applications relied on secrets being readable from specific files. We worked around this problem by writing a small wrapper on Credstash called Alohomora. It does the following:
- Fetch secrets from a specific DynamoDB table using Credstash
- Write them to disk using a Jinja template
The Jinja template is shipped alongside our codebase, and lets developers know exactly what secrets are exposed to the application. We run Alohomora as part of our deployment:
alohomora cast --env $CODEDEPLOY_GROUP --app $CODEDEPLOY_APP secrets.j2 license.j2
The extra variables ($CODEDEPLOY_*) are exposed by AWS CodeDeploy and let Alohomora decide which table to read the secrets from (It standardizes a naming scheme of
If a secret is missing from the DynamoDB table, the deployment fails with an error message, since we would rather fail a deployment than let it go through with a missing secret.
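The render-and-fail-fast behaviour described above can be sketched in a few lines of Python. This is not Alohomora's code: it uses the standard library's `string.Template` as a stand-in for Jinja, takes the secrets as an already-fetched dictionary instead of calling Credstash, and the names `render_secrets` and `MissingSecretError` are hypothetical.

```python
from string import Template
from typing import Mapping


class MissingSecretError(Exception):
    """Raised when a template references a secret that was not fetched."""


def render_secrets(template_text: str, secrets: Mapping[str, str]) -> str:
    """Fill every $placeholder in the template, or fail loudly."""
    # Template.substitute raises KeyError for the first placeholder that has
    # no matching key, so a missing secret can never be rendered silently.
    try:
        return Template(template_text).substitute(secrets)
    except KeyError as exc:
        raise MissingSecretError(f"secret {exc.args[0]!r} is missing") from exc


# Hypothetical template and values, standing in for a file such as secrets.j2.
TEMPLATE = "DB_HOST=$db_host\nDB_PASSWORD=$db_password\n"
rendered = render_secrets(
    TEMPLATE, {"db_host": "db.internal", "db_password": "s3cret"}
)
```

In a deployment script, letting `MissingSecretError` propagate and exit non-zero is what would abort the deployment before the application starts with an incomplete configuration.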
We’re open sourcing Alohomora alongside this blog post. Check it out at https://github.com/razorpay/alohomora. It has been a great enabler of faster configuration deploys at Razorpay, and we hope it can help other companies that regularly push secret updates to their applications.
Stage 4: Kubestash
Our DevOps team bet on Kubernetes early on. We were running production code on our in-house Kubernetes cluster by Q3 2017. The Alohomora deployment script was moved to the entry point of our Docker images, and the IAM roles were maintained using kube2iam (we’ve since switched to Kiam).
Alohomora, while working decently in a Kubernetes infra, wasn’t Kubernetes-native. As such, it gave us issues with:
- Resource Quotas: We saw CPU spikes in the application whenever Alohomora ran during a deployment. As a result, we had to set resource quotas on the applications higher than what the service itself was using.
- Python: Alohomora was written with Ubuntu 16.04 based deployments in mind and supported Python 2.7. We started running into Python dependency issues in services that used Python themselves. We’d have faced this issue with our Ubuntu setup as well, but running on Docker exacerbated it.
- Not Kubernetes First: Kubernetes already provides a secret management solution in Kubernetes Secrets. It allows for both file and environment variable based secrets. Running Alohomora and fetching secrets from Credstash felt like an alien solution in the Kubernetes world.
We found a solution in another small Credstash wrapper called Kubestash, a small command line application that syncs your Credstash secrets to Kubernetes. We’ve since contributed patches to Kubestash that support our specific workflow and allow for cluster-level syncs.
This allows us to store our secrets using Credstash and know that they will get pushed automatically to our Kubernetes cluster by Kubestash. The primary command that we use is kubestash daemonall, which syncs a complete DynamoDB table against a Kubernetes cluster. We run this as a single-pod deployment in our cluster.
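As a rough illustration of what such a sync involves, the Python sketch below builds a Kubernetes Secret manifest from plain key/value pairs and works out which secrets would need to be created or updated. It is not Kubestash's code and makes no calls to Credstash or the Kubernetes API. The function names (`build_secret_manifest`, `plan_sync`) and the sample values are hypothetical, and deletion of stale secrets is deliberately left out.

```python
import base64
from typing import Dict, List, Mapping


def build_secret_manifest(
    name: str, namespace: str, values: Mapping[str, str]
) -> dict:
    """Build a Secret manifest; Kubernetes expects `data` values in base64."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


def plan_sync(
    desired: Mapping[str, dict], current: Mapping[str, dict]
) -> Dict[str, List[str]]:
    """Compare desired manifests with the cluster's, keyed by secret name."""
    create = sorted(n for n in desired if n not in current)
    update = sorted(
        n
        for n in desired
        if n in current and desired[n]["data"] != current[n]["data"]
    )
    return {"create": create, "update": update}


# Desired state would come from the secret store, current state from the cluster.
desired = {
    "api": build_secret_manifest("api", "default", {"db_password": "s3cret"}),
    "worker": build_secret_manifest("worker", "default", {"token": "new"}),
}
current = {
    "worker": build_secret_manifest("worker", "default", {"token": "old"}),
}
plan = plan_sync(desired, current)
```

Note that base64 in the `data` field is an encoding, not encryption, which is why the etcd caveat below matters.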
One caveat to keep in mind if using Kubernetes secrets is to make sure that your etcd store is encrypted, otherwise etcd will store all your secrets on disk, unencrypted.
You can find more details in the Kubestash documentation at https://github.com/af-inet/kubestash.
If you’re reading this, there are several other alternatives now available that you might want to consider before picking a solution:
- Confidant by Lyft: We didn’t try this out since it was released after we’d switched over to Credstash, but it is fairly similar in scope (KMS for encryption + DynamoDB for storage). It also features a Web UI where users can update secrets.
- Just Kubernetes Secrets: If you’re running on a managed Kubernetes cluster, this is a very good solution that you should consider. In our case, we wanted something other than etcd to be our primary secret store, which is why we went with Kubestash (it lets us keep DynamoDB as the primary store).
- AWS Parameter Store: The AWS Parameter Store allows you to store arbitrary key/value pairs and grant access using IAM roles. There are some wrappers (similar to Credstash) that use Parameter Store instead of DynamoDB.
- AWS Secrets Manager: Recently announced at this year’s AWS re:Invent, this is a slightly costlier solution that allows for secret versioning and automated secret rotation using Lambda jobs. We might consider this if it supports a native Kubernetes integration (which might show up with AWS EKS, perhaps?).
Interested in automating things and helping us scale the most robust payments platform in India? We’re looking for Infrastructure Engineers at Razorpay! Check out the job postings at https://razorpay.com/jobs