Stop Clicking, Start Automating
Don't let a simple problem snowball into a major outage. In 2019, do DevOps the right way!
Everything starts with the software development lifecycle and the belief that cross-functional teams, collaboration, and agile programming lead to a higher grade of software. High-quality software manifests itself in the availability, dependability, throughput, and scale of the software you deploy. To reach this goal, however, teams need to be tightly coupled with automated testing, integration, build pipelines, and deployments. Without any one of these pieces, a simple problem can compound and snowball into a major outage, or worse. That’s why in 2019 you need to click less and automate more: start doing DevOps the right way. Here is how to do it.
Codify All Your Cloud Resources
You’ve heard this referred to as Infrastructure as Code. This is it. All infrastructure should be built in a repeatable and testable manner; do not configure things by making manual changes to your application. At Prominent Edge, all infrastructure is defined in a Git repository using DevOps tools such as CloudFormation, Ansible, Chef, HashiCorp’s Packer, Kubernetes, Terraform, Prometheus, Grafana, kops, and helm.
Your goal is to create an Immutable Infrastructure that is at the core of your system.
Our engineers treat servers as ephemeral resources and load as much persistent configuration as possible into containers and/or machine images. This allows us to use features such as unattended-upgrades to keep our underlying hosts on the latest stable software. Our images are created using an automated process that builds and tests all packages added to the base machine image. Runtime configuration (e.g. network addresses of peer nodes, identifiers for running cloud resources, etc.) is then applied to servers when they are launched into service. These servers use cloud provider services to achieve the highest uptime at the lowest cost, balanced to suit each client. This is the foundation of a highly available cluster of servers in the cloud.
Next, enter containers. Using containers, we decrease deployment time from hours to seconds. Developers push code to repositories, which immediately triggers a container build tagged with the latest commit SHA and/or branch. The container is deployed into its respective environment on a Kubernetes cluster running on top of our infrastructure. Disk usage is minimal because a container holds only the files necessary to run its code. This means the potential for scalability, both horizontal and vertical, is immense. Containers take servers that were once severely underutilized and push their usage toward full capacity. Blue-green / red-black, canary, and similar deployments no longer involve tearing down and standing up servers in the cloud; they are now done with a simple variation of the helm install command. Kubernetes and kops ensure that servers are either being utilized or taken down, which keeps utilization high. If a container needs more room, the cluster is automatically scaled to add resources for the incoming container. All of this is declared in a well-defined markup syntax stored in a GitHub repository.
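To make the tagging step concrete, here is a minimal sketch of how a build job might derive an image reference from a branch name and commit SHA. The function name image_tag, the registry hostname, and the branch-plus-short-SHA layout are assumptions for illustration only; the article does not say which naming scheme is actually used.

```python
import re


def image_tag(repository: str, branch: str, commit_sha: str) -> str:
    """Return an image reference like 'repo:branch-1a2b3c4' (hypothetical scheme)."""
    # Accept abbreviated or full hexadecimal Git commit SHAs.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"not a hex commit SHA: {commit_sha!r}")
    # Image tags may contain only letters, digits, '_', '.', and '-',
    # so characters such as '/' in branch names are replaced.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch).strip("-.") or "detached"
    # Tags are limited to 128 characters; leave room for '-' plus the short SHA.
    safe_branch = safe_branch[:120]
    return f"{repository}:{safe_branch}-{commit_sha[:7]}"


if __name__ == "__main__":
    print(image_tag("registry.example.com/app", "feature/login",
                    "1a2b3c4d5e6f7a8b9c0d1a2b3c4d5e6f7a8b9c0d"))
```

Because the tag embeds the commit, any running container can be traced back to the exact source revision that produced it.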
Automate All the Things
No environment should get created without access management, role-based access control, subnetting, proper firewall rules, and network security being considered first. At Prominent Edge, no application is released to the public without a delivery pipeline that includes continuous integration and testing as well as continuous delivery, proper authentication and encryption, and security features such as setting Web Application Firewall (WAF) rules. But remember here, these are not human functions. Any task we perform relating to infrastructure, we automate. And, once automated, we monitor telemetry - and then we automate that too.
If it cannot be automated, it should not be done and if it isn’t automated, somebody is doing it wrong.
Currently, we use Kubernetes as our container management system. This, plus kops, allows us to stand up a Kubernetes cluster with the latest versions of Docker and Kubernetes in minutes using a simple set of commands. Kops provisions cloud resources itself, but it can also generate Terraform configuration or CloudFormation templates. This means the cluster definition can be exported as code. Further, we recommend using kops with SpotInst. SpotInst is a third-party service we use to get the cheapest spot instances across our availability zones and regions. We still use Packer, Ansible, and Terraform to bake our base images, but these are just the underlying hosts we specify in our configuration code for kops.
Lastly, all infrastructure components are designed to be self-healing (read: no more pager duty!). As a result of these principles, when the service allows, we deploy things in a manner that is infinitely scalable. All this automation might make you nervous, but keep in mind that before we get here, all the source code was tested multiple times and deployed through Continuous Integration and Continuous Delivery (CI/CD) pipelines. By the time we are here, we are super sure of ourselves.
Do it in the Open
So, you codify everything, then you automate everything, then the next big step is to leverage open source as much as possible, if not completely. We are big believers in open source software. Prominent Edge utilizes and contributes to open source software in order to expand capabilities, increase reliability, and reduce development costs. Our company and our employees, both in an official capacity and as individuals, contribute to the open source community. Open source software provides a higher degree of security, is of superior quality, is easily customized, and is far more cost effective than comparable proprietary software. Further, with open source software, our engineers have a deeper understanding of the inner workings of the underlying libraries. Why? Because some of our engineers may have been actual contributors to the software itself. They know exactly what is happening when a process executes or a service returns a result. As a result of the ‘open-ness’ there exists a higher level of trust with the overall system. This trust is built into the paradigm, and one does not get it with closed source software.
We can claim to be experts in many areas, mostly because we wrote that area.
The Prominent Edge website boasts 18 Ninjas, 219 years of combined experience, and 3,490 recent open source contributions. These are real numbers. We practice what we preach. You just need to put the time into understanding the tools with which you are working.
The Philosophy Applied
Recently, we stood up a collection of services in AWS for a customer. We built all machine images using Chef, Packer, and KitchenCI. Further, we dynamically selected machine images using AWS Lambda functions (rather than manually selecting Amazon Machine Image IDs), and all cloud resources were specified in CloudFormation templates. But don’t stop there. We used Jenkins as a Continuous Integration/Delivery hub, but to stay true to our philosophy, we converted Jenkins into an ephemeral resource. As an ephemeral resource, master nodes can come and go with zero critical data loss and minimal impact on developer productivity. In this model, all pipeline configuration is stored in code external to Jenkins. A typical pipeline is triggered when a change is detected in a remote repository (in our case, GitHub). What follows is completely automated, and awesome.
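As an illustration of selecting a machine image dynamically, the sketch below picks the newest available image whose name starts with a given prefix. The selection rule, the function name select_latest_image, and the name prefix are assumptions; the article does not describe how its Lambda functions chose an image. The input mirrors the "Images" list returned by the EC2 DescribeImages API, so no AWS call is made here.

```python
from datetime import datetime
from typing import Dict, List, Optional


def _created(image: Dict[str, str]) -> datetime:
    # EC2 reports CreationDate as e.g. '2019-01-15T12:34:56.000Z'.
    return datetime.strptime(image["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")


def select_latest_image(images: List[Dict[str, str]], name_prefix: str) -> Optional[str]:
    """Return the ImageId of the newest usable image, or None if none match."""
    candidates = [
        img for img in images
        # Skip images that are still pending or have failed.
        if img.get("State") == "available"
        # name_prefix is a hypothetical naming convention for baked images.
        and img.get("Name", "").startswith(name_prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=_created)["ImageId"]


if __name__ == "__main__":
    sample = [
        {"ImageId": "ami-old", "Name": "base-web-1", "State": "available",
         "CreationDate": "2019-01-01T00:00:00.000Z"},
        {"ImageId": "ami-new", "Name": "base-web-2", "State": "available",
         "CreationDate": "2019-02-01T00:00:00.000Z"},
    ]
    print(select_latest_image(sample, "base-web-"))
```

Keeping the selection logic a pure function means it can be unit tested without cloud credentials; a Lambda handler would only need to fetch the image list and pass it in.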
The first step is usually some kind of static code analysis followed by a suite of internal tests (e.g. unit tests). Next, a discrete artifact is built (a jarfile, binfile, tarball, etc.) and pushed to an external store (e.g. Nexus or Amazon S3). Once the artifact is in the external store, it is staged in the cloud. If staging succeeds, a series of black-box tests is run against the service (e.g. curl requests to HTTP endpoints). Lastly, the staged service is brought into production via one of two methods: either a blue-green deploy via DNS updates, or a blue-violet deploy in which new instances are attached to an existing load balancer. And that’s it: a new service is in production.
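The stages described above can be sketched as an ordered list that stops at the first failure. This is a minimal illustration, not the Jenkins pipeline from the engagement: the run_pipeline helper and the stage names are hypothetical, and each stage is stubbed as a function that returns True on success.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True when the stage succeeds.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; return (success, names of completed stages)."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # Fail fast: later stages (e.g. the production deploy) never run.
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Stubbed stages; real ones would shell out to linters, test runners, etc.
    stages: List[Stage] = [
        ("static-analysis", lambda: True),
        ("unit-tests", lambda: True),
        ("build-artifact", lambda: True),
        ("push-to-store", lambda: True),
        ("stage-in-cloud", lambda: True),
        ("black-box-tests", lambda: True),
        ("promote-to-production", lambda: True),
    ]
    print(run_pipeline(stages))
```

The important property is ordering with early exit: promotion to production is reachable only when every earlier stage has passed.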
Because everything starts and ends with the software development lifecycle, it stands to reason that the sharper you keep your tools, the cleaner your work. If you or your organization can’t explicitly lay out your DevOps position, you’ve got a problem. Let 2019 be the year you achieve scale with your application, reach five nines of availability, and realize efficiency unlike anything you’ve done before. If that is what you seek, then tightly coupling testing, integration, build pipelines, and deployments to your core software development process is critical. So in 2019 remember to stop using web UIs for configuration, codify all your resources, use open source, and automate everything.