Transforming the Management of Application Configurations & Secrets at 24 Hour Fitness
At 24 Hour Fitness, for many years, operations and development teams have gone through the pain of trying to manage and deploy application configurations with data stored in many files and locations across the ecosystem. The DevOps team was tasked with architecting and implementing a simple, reliable, highly available, testable solution to meet the growing needs of their applications. Through the combined use of Consul and Vault, they have successfully transformed the business.
In this talk, 24 Hour Fitness Senior DevOps Engineer Jason Yoe will describe the challenges faced, the overall design, and the implementation of the solution.
Transcript
It was a Wednesday afternoon before Thanksgiving, and I was finishing a few things up before I was heading out, when the team received an email talking about an intermittent problem that was happening with one of our sales applications.
A user, when selecting an option in our web application, was receiving an error. Not everybody would receive an error when they’d select the option, and as a matter of fact, when this user clicked the option again, they were successful. From the email, it was uncertain when this problem had begun and how many sales had been affected.
Flash forward an hour later, the issue had been escalated. The DevOps team is on a WebEx call with many agitated people. Agitated managers are in our cubicle aisle; they’re looking for answers.
It’s understandable. It’s the day before Thanksgiving, it’s a production issue, and it’s our sales application. So the team’s frantically looking through logs, we’re checking deployments that had happened the previous days, any sign for what could happen. Finally, we stumble upon an email about a configuration change that had happened the previous week.
This change was made directly to an application instance, but unfortunately those instances were not bounced at the time of the change. When a code deploy occurred a few days later, the applications were restarted and the change was applied.
Unfortunately, the configuration change was missed on one of the application instances, hence the intermittent issue. Of course, this change had been applied in other environments and had been tested, but unfortunately the person applying this change in production had made a few small mistakes. So, we had to transform our configuration management.
for full transcript is on https://stackshare.io/hashicorp/transforming-the-management-of-application-configurations-and-secrets-at-24-hour-fitness