About this DevOps Guide
DevOps is an umbrella term encompassing a variety of interrelated and quickly evolving technologies, as well as a profound culture shift. The culture and the technology can be difficult to understand, even for people working directly in the field.
We created this guide to help non-technical people make better sense of the DevOps landscape, what the tools within it do, and how they are related to one another. We hope this guide will enable both non-technical and technical readers to engage in more fruitful conversations and make informed technology and training decisions around their DevOps tools, stacks, and processes.
About Us: DevelopIntelligence is an award-winning provider of managed technical learning solutions, including software development, open source technologies, and technology leadership development.
Producer: Kyle Pennell (Technical Content Manager for DevelopIntelligence)
Editor: Dave Wade-Stein
Contributors: Angela Karl, Chloe Talbot, Eric Griego, Fern Pombeiro, Mayank Bhardaj, and Kyle Pennell
DevOps (a clipped compound of DEVelopment and OPerationS) is one of the fastest-growing areas in the programming and development world. This section will explore what DevOps is, the various types of tools involved in DevOps, and how DevOps might be evolving.
The Origins of DevOps
In 2007, Patrick Debois accepted a position with a Belgian ministry wherein his task involved migrating a data center. Debois was determined to understand every part of the IT infrastructure, and his role in QA (Quality Assurance) forced him to move between the development and operations worlds with regularity. Some days, Debois would be working with the dev team–planning, participating in agile development, working with developer tools, and so on. Other days, he found himself embedded within the operations group–fighting fires, keeping production running effectively and ensuring that code was effectively deployed. Switching back and forth illustrated the stark contrast between the development and operations cultures and Debois came to the realization that there must be a better way for them to work together.
Development and operations were generally separate “siloed” functions within an organization before the advent of DevOps.
Now that we’ve briefly explored the history, we can focus on what DevOps is–as well as what it isn’t. First and foremost, DevOps is a human problem–specifically, that of a historical lack of communication and collaboration between developers, IT professionals, and QA engineers (and, more recently, information security professionals). Therefore, embracing DevOps translates into a profound culture shift, wherein developers, IT, and QA communicate and collaborate on a daily basis, breaking down the silos that formerly existed between these groups. Without this culture shift, DevOps cannot succeed.
Let us be clear–a culture shift of this magnitude is difficult, and it will not happen overnight. Understanding the steps that are required is easy, whereas implementing them is quite another story. Furthermore, for successful adoption of DevOps, 100% management buy-in is required. Should management continue to expect its employees to remain entrenched in the old ways (where “old” means circa 2008!), an attempted DevOps adoption will fail spectacularly.
Given that we haven’t yet truly defined DevOps, let us begin with the notion that DevOps embodies a set of principles that espouse increased communication and collaboration (among other cultural changes). These principles are often described via the CALMS model–Culture, Automation, Lean, Measurement, and Sharing. As before, understanding these principles is easy, whereas changing behavior to embrace them is anything but.
At this point, we are almost there, and a practical, technical definition of DevOps will certainly be helpful–this one was taken from the Agile Admin blog:
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
Here is another definition, this time from Wikipedia:
…a culture, movement, or practice emphasizing collaboration and communication of software developers, QA, and other IT (operations) professionals, while automating the process of software delivery and infrastructure changes.
It should be clear that there is no single definition, and indeed, DevOps can mean different things to different people. But at its core, DevOps is certainly about increased collaboration and communication, culminating in the breaking down of the “silos” that formerly existed around Dev, Ops, and QA.
Before we examine some of the various workflows and technologies of DevOps, we want to make clear what DevOps isn’t. DevOps isn’t simply a mixture of Dev and Ops, nor is it a department in your organization. There are no DevOps certifications, so it’s not compliance. It’s not a product–you can’t buy or download it, nor is it a tool or even a collection of tools, although as we’ll see shortly, there are plenty of tools which we may leverage in order to enhance our DevOps journey.
Striving for Continuous Integration and Continuous Deployment
A company that runs a DevOps environment effectively is rewarded with continuous integration (CI)–a development lifecycle in which code from a shared code base is integrated, tested, and deployed to production in a continuous stream. The metric most commonly used to determine the relative success of CI is “deployments per day,” stemming from the seminal 2009 presentation “10+ Deploys per Day: Dev & Ops Cooperation at Flickr” by John Allspaw and Paul Hammond.
Instead of the relatively slow and cumbersome 48-hour deployments of yesteryear, continuous integration allows developers to pivot quickly to address issues, make changes, and constantly experiment.
CI is the beating heart of agile, lean, and many other management philosophies. CI makes for better software, happier users, and healthier companies; but, in order for effective CI to occur, we need to decentralize much of the traditional “dev” and “ops” activities such that every member of the team is working together. This is the management challenge of DevOps.
The Tension Between Dev and Ops
Operations teams have historically concerned themselves with such things as user environments, server states, load balancing, and memory management. They need to keep things running in a fixed state within a constantly changing environment. Developers, on the other hand, are all about constant deployment and constant change. Getting these two teams to work together can be a gargantuan effort. As we will see, many new technologies have been developed that can help us overcome this challenge.
Version Control Technologies
The early days of DevOps saw the reinvention of version control technologies such as Git and SVN, along with hosting services built around them such as GitHub and Bitbucket. These tools existed in the pre-DevOps world, but they’ve taken on new importance under DevOps. Instead of devs simply being concerned with getting the correct version of the code, the ops team is now deploying code that is checked in and built daily. Everything that is deployed must go through rigorous integration testing before it will be allowed onto a production machine. Version control creates a virtual connection between the developers and the operations team, making it a trivial process to “roll back” undesirable code in order to return the production machines to their previous states.
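The rollback workflow described above can be sketched with git. This is a minimal, self-contained example–the repository and file names are hypothetical placeholders:

```shell
# Create a throwaway repository with a known-good commit and a bad change.
mkdir demo-repo && cd demo-repo
git init -q
git -c user.name=ci -c user.email=ci@example.com commit --allow-empty -q -m "known-good release"
echo "broken setting" > app.cfg
git add app.cfg
git -c user.name=ci -c user.email=ci@example.com commit -q -m "bad change"
# Revert creates a new commit that undoes the bad one,
# returning the tree to its previous known-good state.
git -c user.name=ci -c user.email=ci@example.com revert --no-edit HEAD
git log --oneline   # the revert now sits on top of the history
```

Because the revert is itself a commit, the full history of what was deployed (and un-deployed) remains visible to both dev and ops.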
Automating Deployments and Continuous Integration
Continuous integration tools such as Jenkins, TeamCity, and Travis CI enable code to be built and tested as soon as it is checked in, effectively automating the deployment and QA processes. Given that automation is a key principle of DevOps, these tools allow for faster integrations which no longer rely on human intervention, and in concert with version control systems, they allow for easy rollback in the event any errors are detected.
Cloud Services and Configuration Management
Another obvious issue arises when one considers the prevalence of cloud technologies in modern development lifecycles. For the first time in history, production servers may be created or destroyed at will using platforms such as AWS, Azure, and Google Cloud Platform, enabling elasticity in our load balancing. Before cloud servers were the norm, companies purchased physical servers in order to handle the maximum computing loads that they anticipated. This is the metaphorical equivalent of owning enough warehouse space for “Christmas level” throughput whilst needing the vast majority of that space but a few times a year.
As these cloud-based servers are brought online, we must ensure they share a common configuration with current production server(s). The technologies used to manage these servers are configuration management tools such as Puppet, Chef, and Ansible. These tools were created to manage the configurations of large numbers of servers through easy-to-use, script-like languages. They work by creating machine descriptions (“infrastructure as code”) that can be stored in and retrieved from version control, and can be swiftly applied to tens, hundreds, or even thousands of machines. Should the desired configuration change, it is typically a trivial process to push out a new configuration to all of the machines in our infrastructure.
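As a hedged illustration of how little ceremony pushing out a configuration requires, here is how it might look with Ansible. This assumes Ansible is installed, and `inventory` and `site.yml` are placeholder names for your host list and playbook:

```shell
# Verify connectivity to the machines in the "webservers" group.
ansible webservers -i inventory -m ping
# Apply the desired configuration (the playbook) to every machine at once.
ansible-playbook -i inventory site.yml
# Preview what a configuration change would do before pushing it out.
ansible-playbook -i inventory site.yml --check --diff
```

Because the playbook lives in version control alongside the application code, a configuration change is reviewed, tested, and rolled back exactly like any other code change.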
Microservices and Containers
Continuous integration and deployment are built on the philosophical concept of modularization. This is the idea that a thousand small changes are better than one large one and that developers should seek to isolate their code to enable these small changes. The days of re-deploying an entire code base over several days are long past.
On a practical level, this means that instead of building the monolithic applications of yore, developers are currently building applications based on microservices–small, independent, easy-to-replace applications. Architectures based on microservices allow for easier continuous deployment.
Container technologies such as Docker and LXC are based on a simple idea–namely, they enable developers to encapsulate (or “containerize”) an application (or a microservice which is part of an application), along with any dependencies that are required to run it.
Instead of depending on “golden image” virtual machines, developers can now simply encapsulate their work in containers which can then be deployed into the production environment as completely independent microservice applications.
Containers isolate the code they contain from the underlying host machines. As a result, the dependencies of the code live inside the container and therefore cannot conflict with versions of those dependencies which may be installed on the host machine. Concerns regarding the state of the production server at any given time become irrelevant–the containers will run regardless of where they are deployed. Another advantage is that the containers themselves are disposable. They are environments that launch, run an application, and then disappear. Instead of building an application in an environment, developers are able to build an environment around an application.
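A minimal sketch of this workflow with the Docker CLI, assuming Docker is installed and the current directory contains a Dockerfile describing the app and its dependencies (the image name is a placeholder):

```shell
# Bake the code and all of its dependencies into a single image.
docker build -t my-service:1.0 .
# Run it anywhere Docker runs; the host's own library versions are irrelevant.
docker run -d -p 8080:8080 my-service:1.0
# The running container is isolated from the host machine.
docker ps
# Containers are disposable: remove one and start a fresh copy at will.
docker rm -f <container-id>
```

The `<container-id>` placeholder above stands in for the ID printed by `docker ps`; in practice the whole lifecycle is driven by orchestration tooling rather than by hand.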
Managing Microservices and Containers
Once we are deploying large numbers of containerized microservices rapidly, we need some way of managing them. An abstraction level is necessary to allow an operations team to effectively deploy and manage microservices that are all part of a larger production ecosystem. Cluster manager tools (also called orchestration tools) such as Docker Swarm, Kubernetes, and Mesos are designed to help with this. These tools allow the scheduling and rapid deployment of microservices to multiple nodes in a cluster, enabling operators to manage the rapid integration and deployment of large numbers of containers in a multi-node environment.
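As a hedged sketch, scheduling and scaling a containerized service across a cluster with Kubernetes’ kubectl might look like this (the deployment and image names are placeholders, and the commands assume access to a running cluster):

```shell
# Ask the cluster to schedule containers running our image onto its nodes.
kubectl create deployment my-service --image=my-service:1.0
# Run five copies; the scheduler spreads them across the cluster's nodes.
kubectl scale deployment my-service --replicas=5
# See which node each container landed on.
kubectl get pods -o wide
```

The operator declares *what* should run; deciding *where* it runs is the cluster manager’s job.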
Where is DevOps going?
As you can see, our current DevOps ecosystem is not really a single item; it is an environment made up of several technologies combined together to make for a smooth code transition. This is partially by design. A large part of any decent engineering framework is the ability to unplug a widget from the machine and plug in a new one at will. An example would be replacing Puppet with Ansible or replacing Bitbucket with GitLab at a moment’s notice. The ability to make these rapid changes affords engineers the flexibility to adopt constantly evolving technologies and plug them in with as little disruption to the overall flow of the process as possible–but there is a cost to doing this.
First, there’s the human element. At this point, to be an effective DevOps engineer, one needs to have competency in many different technologies. A DevOps engineer could be spinning up AWS machines today, writing bash scripts tomorrow, and rolling back changes in version control the day after (or all three of these tasks in the same day). Being able to do all this, and do it well, is a daunting task for any engineer. We find ourselves in a situation where developers and operations folks need to know how to code and have a deep understanding of all of the different aspects of a deployment. How do we solve this conundrum?
Management technologies have recently been created to help merge these disparate tools into a seamless system for controlling them. This is the concept of “abstraction.”
Abstraction lets us take these complex smaller processes and abstract them away so we’ll need fewer people to run them effectively.
Developers might still occasionally have to go into production machines and write bash code or tweak a Docker container which isn’t working properly, but generally speaking, abstraction will give us a single tool that will do the majority of the work around scheduling, prioritizing, and ensuring the smooth flow of most of the tasks in this system. It’s much easier to understand and operate the tool than it is to understand the unique aspects of every process the tool is controlling.
We are starting to see the rise of integrated environments such as Cloud Foundry, Spring, and Sonatype. These environments enable DevOps teams to manage a smooth, integrated DevOps chain whilst simultaneously ensuring the same “plug and play” flexibility that comes with the current setup. More companies will adopt integrated environments to smooth out their DevOps workflow.
Continuous integration (CI) is a fundamental part of any DevOps team’s toolchain. While often mentioned in the context of the modern DevOps movement, the idea of continuous integration began as one of the core tenets of Extreme Programming, a software development methodology that became popular in the 1990s. The basic idea is that code is integrated continuously, throughout a software project’s development, rather than at the end of the development cycle.
Prior to the adoption of continuous integration, it was not uncommon for software engineers to work in relative isolation, developing code independently.
At the end of a project would then come a tedious, painful, and occasionally disastrous integration phase in which the developers would work to combine, or integrate, their independent code into a single functioning program. This type of workflow has an obvious drawback–miscommunications, errors, and incompatibilities are discovered and resolved in an extremely costly manner, after coding has been completed.
Jenkins
Perhaps the most widely-known CI tool in the DevOps community, Jenkins is estimated to have over one million users worldwide. As with any other CI tool, Jenkins offers the benefits of eliminating post-development integration steps and the ability to uncover errors directly after the code is committed.
As is often the case with market-leading products, one of the shining highlights of Jenkins is the large amount of plugins and integration points that have been developed for external systems.
Jenkins supports a wide variety of version control systems, including Git, SVN (Subversion), CVS, Mercurial, Perforce, and even ClearCase and AccuRev. Many users appreciate the simplicity of the Jenkins server. Additionally, builds are represented visually in the testing pipeline, which can be of great help in visualizing progress and assisting in troubleshooting.
With this wide range of supported capabilities and ecosystems comes some drawbacks. Configuring Jenkins can be quite daunting. Configuration screens and menus lack an intuitive feel, and the myriad options are nested in pages that might not be immediately apparent.
For such an advanced tool, the visual look and feel is geared towards being utilitarian, at the expense of having a modern-feeling interface.
Travis CI
Travis CI stands out for its simplicity of use and maintenance, as it is a hosted application with a unique and straightforward configuration process. The beauty of Travis CI is that the tool is configured entirely in a configuration file which resides in the root of the project directory. Using a configuration file is an idea that has been copied by several other CI tools, such as the GitLab CI runner and CircleCI. This means no GUI configuration is required–simply add the configuration file to your repository with a few sequential commands and Travis uses the file to generate a full CI pipeline. GitHub users drive much of Travis’ userbase, and as a result it has near-flawless integration with the site.
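A minimal, hypothetical sketch of that workflow–the project name is a placeholder and the build steps stand in for your project’s own install and test commands:

```shell
# Create a project and write the Travis configuration into its root.
mkdir my-project && cd my-project
git init -q
cat > .travis.yml <<'EOF'
language: python
python:
  - "3.6"
install:
  - pip install -r requirements.txt
script:
  - pytest
EOF
# Commit the file; pushing to GitHub (with Travis enabled for the
# repository) is then enough to trigger a full build.
git add .travis.yml
git -c user.name=dev -c user.email=dev@example.com commit -q -m "Add Travis CI config"
```

Because the configuration is just a versioned file, changes to the build pipeline are reviewed exactly like changes to the code.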
Cost is a major consideration when implementing continuous integration, and as a result, many avoid Travis CI, which is free for open-source projects but $129 per month for commercial use.
Another issue to consider is that while Travis CI’s cloud-based model is attractive to some companies and industries, it can be a dealbreaker for others, such as those in the government or healthcare sectors, which typically have hefty security concerns.
Service outages and slow builds are possible with the Travis CI cloud-based service. It is, however, possible to install Travis CI on your own servers if you purchase an Enterprise Travis CI license.
TeamCity
Developers familiar with JetBrains, the company behind TeamCity, can expect the same polished and intuitive interface as their wildly popular IDEs such as IntelliJ IDEA and PyCharm. Growing in popularity in complex environments, TeamCity is another commercial CI tool that offers a limited free tier as well as a licensed plan for larger companies and environments. TeamCity stands out in the field as having an excellent array of standard, default features that other CI tools provide only through plugins and extensions. One feature that is highly beneficial to development teams, DevOps admins, and agile scrum masters alike is the rich data TeamCity provides in the form of statistics, test history reports, and detailed change logs. TeamCity also shines in integration with other JetBrains products, as one would expect, as well as with many other DevOps tools.
One issue to consider with TeamCity is its versatility: it allows you to create your own processes, flows, and controls. Of course, this versatility and vast configurability on a per-project basis can be a stumbling block for those not familiar with CI or those looking for a simpler, out-of-the-box solution. You need a clear understanding of what you wish to accomplish with TeamCity, and an idea of what processes you want to follow in your continuous integration workflow. It’s important to consider TeamCity’s complexity as both a strength and, potentially, a weakness, given your team’s unique scenario.
Bamboo
Bamboo is a widely used CI tool, particularly when a team uses one or more other Atlassian products. Arguably the easiest to set up and integrate amongst all of the tools we’ve mentioned so far, Bamboo holds real appeal for teams who may be less experienced in establishing continuous integration and who want an easy, well-supported, out-of-the-box solution for CI.
Most commonly paired with Atlassian’s issue tracking system, Jira, Bamboo offers more than just a continuous integration tool. Bamboo instead offers a view into commits, test results, and deployments tied to issues. This results in a highly visible view of what is happening in your development environment and offers a ‘big picture’ into the health and status of your project, even for non-technical users. Atlassian also offers a variety of enterprise-focused DevOps tools, such as the Confluence wiki system, communication tools such as Trello and HipChat, and the source code hosting service Bitbucket, all of which are extremely well-integrated with each other, if desired.
Using Bamboo is not without a number of trade-offs, however. The ecosystem of Bamboo plugins is much smaller than that of Jenkins, meaning that it offers less flexibility than other CI tools.
Due to the lack of plugins and extensions, Bamboo has what appears to be a more limited feature set compared to the other continuous integration tools we’ve covered. Additionally, although many of Bamboo’s strengths are in part due to excellent integration with other Atlassian tools, there is always some concern in relying on a single vendor’s vision of a DevOps pipeline. To do so introduces rigidity and lock-in, making it difficult to experiment with other toolchain components, even if they might suit your team better than what Atlassian offers.
The art of continuous integration comprises both developers continuously integrating their code and the tools that provide immediate feedback on the quality of their commits. This ‘cheap’ but hugely rewarding practice is a crucial component of delivering quality software in modern development. You cannot fully reap the benefits of one component without the other, so planning out your team’s approach and process is crucial in determining your success with continuous integration.
Container technologies such as Docker, rkt, and LXC are an important part of the DevOps ecosystem. Containers are isolated environments that allow teams to run multiple applications on one server. Containers are an easier and faster alternative to traditional virtual machines and have been quickly adopted by many DevOps teams. The following helps to define what containers are and why DevOps teams use them.
In order to understand containers, it’s worth quickly comparing them to virtual machines. Virtual machines (VMs) use virtualization software on a server to emulate hardware. VMs allow companies to consolidate an array of diverse applications on a single server. The downside is that VMs can be costly in space and processing. VMs take minutes to start up and can consume several gigabytes of memory.
Containers, on the other hand, sit on top of a server and operating system. Containers encapsulate an application and all of its dependencies but still share the same underlying operating system with other containers.
Because containers are so much smaller than VMs, they can be moved around faster and many more of them can be put on the same server together.
Containers decouple applications from operating systems which creates new flexibility in the management of application infrastructure.
Docker
Docker is an open source container software solution and the goliath of the container world. Docker’s adoption has been nearly exponential over the last couple of years. Docker was originally built on top of LXC, but many new features and functionalities have been added during its evolution.
The Docker Container Engine gives developers all the tools they need to build, migrate, track, secure, deploy, and test containers. Thousands of companies use Docker to containerize their applications and the various services that make up their applications.
LXC
LXC (short for Linux Containers) is one of the tools in the suite of Linux container tools. It is credited with starting what’s been called the ‘container revolution.’ LXC and Docker have some similarities and some major differences. Docker is more of a platform and suite of tools for working with containers (and was originally built on LXC). LXC is more of a method for creating separate operating-system-level Linux containers. LXD is a newer project that builds on LXC with more functionality. LXC/LXD are meant to run on Linux only, while Docker can run on Linux, Windows, or Mac.
rkt
Rocket (rkt) is a newer container technology, and an alternative to Docker, that is focused on architectural security and simplicity. Rkt was created by CoreOS, a company that supports the development of container tools. It’s difficult to explain the core differences between rkt and Docker without resorting to heavy jargon. Suffice it to say, rkt’s creators believe that Docker does not follow Unix best practices (especially in regard to security) in its architecture.
Unlike most container solutions, rkt does not involve an external daemon (background process). Rkt, like LXC, is meant to run on Linux, and works quite easily with the popular orchestration tool Kubernetes.
The simple takeaway with Rkt is that it is a more securely and simply architected Linux-based alternative to Docker. With Docker becoming the dominant container tool, it is good to have some healthy competition in the marketplace.
Docker is the dominant container technology as of 2017. LXC/LXD and rkt offer viable alternatives for companies looking to avoid the potential security/architecture concerns surrounding Docker.
DevelopIntelligence offers a variety of expert-led, hands-on DevOps training courses for DevOps teams. If your team needs training on any of these topics, contact us about having a course custom-built for your team.
Cluster manager tools are essential for controlling large numbers of containers, and a core technology in any DevOps stack in 2017. Without cluster management, it would be virtually impossible to manage an entire server cluster, quickly resolve failures, and pair containers with resources.
We will discuss five main cluster manager services, why they are used, and the advantages and disadvantages of each.
What do cluster managers do in DevOps?
To understand the function of a cluster manager, you first have to understand what a cluster is. Essentially, a server cluster is multiple servers grouped together to communicate with each other. These clusters work together to allow for more availability, reliability, and scalability.
With clusters, users can achieve more than what is possible with a single server and help protect against multiple types of failures. This includes guarding against application, hardware, or site failures.
A quick note for those who are unsure: load balancing is the act of optimizing resources by distributing the workloads across different resources, while high availability means that the system has the lowest possible level of downtime so it is always accessible.
Kubernetes
Kubernetes is “an open-source system for automating deployment, scaling, and management of containerized applications.”
This is the most popular container orchestration system right now. It’s rather opinionated, which means that its users are restricted to performing certain actions in the way that this container tool deems best.
However, Kubernetes still offers user choice; it does not limit the types of applications it supports, dictate application frameworks, or restrict supported language runtimes.
This cluster manager is important because it includes a range of features, from scheduling and running applications containers to providing the tools necessary to build a container-centric development environment.
Kubernetes has large scaling potential. A user can choose to manually make the application smaller or larger or to have the cluster manager scale it automatically depending on CPU usage.
Containers are also placed automatically by Kubernetes, depending on some guidelines and restrictions. A few more important features include:
- If a problem is detected with your application, Kubernetes will automatically roll back the change
- This system also updates, or “rolls out changes,” to your application automatically while simultaneously monitoring the application’s health
- Any containers that fail will be restarted; any nodes that die will be replaced and rescheduled, and any containers that don’t respond to your health check will be killed
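The features above map onto a few kubectl commands. This is a hedged sketch assuming a Deployment named my-service already exists in the cluster (the container name `app` and the image tags are placeholders):

```shell
# Trigger a rolling update to a new version of the image.
kubectl set image deployment/my-service app=my-service:1.1
# Watch the application's health while the rollout proceeds.
kubectl rollout status deployment/my-service
# Roll back if the new version misbehaves.
kubectl rollout undo deployment/my-service
# Scale automatically between 2 and 10 replicas based on CPU usage.
kubectl autoscale deployment my-service --min=2 --max=10 --cpu-percent=80
```

In each case the operator declares a desired state and Kubernetes works continuously to converge the cluster toward it.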
This cluster manager is one of the systems that comes with the most features, partially because it is open-source and accepts many third-party resources. It is able to function on private, public, multi-cloud, and hybrid cloud environments.
Kubernetes is highly cluster-focused and offers more features than many other services. Before choosing this product, however, make sure that you agree with its opinions, as it’s one of the options with less user choice.
Mesos
Mesos is a bit different from the other popular cluster managers on this list because its essential function is to provide abstraction.
It allows you to abstract “CPU, memory, storage, and other computing resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.”
In simpler terms, Mesos can function as an open-source cluster manager, but what sets it apart is its aim to isolate resources to make it easier to share them across your applications, frameworks, or networks. With this management service, it’s possible to treat your data center as though it is one single pool of resources.
Mesos is also different because of its two-level scheduler architecture, separating the way it handles the lower and upper levels of its infrastructure. This structure is what makes Mesos’ scalability almost in a league of its own.
It is not opinionated at all, which allows users to fulfill almost any need as they deem fit; however, keep in mind that load-balancing and other advanced features will all need to be managed by the administrator.
It does simplify the admin’s job by having one master machine that controls the other virtual or physical machines. The master looks for available resources and distributes tasks to the available agents.
Using Mesos for a cluster management service is a particularly great choice if you need to scale up, because it has been tested to scale easily to tens of thousands of nodes.
Before choosing this product, though, make sure your team can handle its unopinionated design and manage the advanced features yourselves.
Mesosphere DC/OS
Mesosphere DC/OS is essentially built on top of the Mesos kernel to add more functionality and features. Mesosphere is used by large corporations, such as Time Warner Cable and Verizon, because of the big data and massive scale support from Mesos combined with the added features of Mesosphere.
Mesosphere works as an operating system for every machine in your datacenter and is a secure environment that helps to automate and manage operations. With it, you can run containerized apps and data services in production on any infrastructure more simply.
Essentially, Mesosphere is about freeing your IT team from labor-intensive operations. Mesosphere’s out-of-the-box container orchestration offers “automatic workload recovery, security, networking, service discovery, storage, and more.”
Users are able to automate scripts used to configure and maintain their infrastructure. Additionally, in the case of node failure, there is built-in redundancy and automatic resource allocation.
Its resource manager is powered by Mesos, so the data is abstracted to allow for more efficient resource distribution. Using Mesosphere, you’re able to use DC/OS on bare-metal, virtual, and cloud as long as there is a modern Linux distribution.
Essentially, if you’re comfortable with the way Mesos abstracts resources to better manage and allocate them, but you’d prefer more features and automation, Mesosphere is a good choice.
Docker Swarm
Docker is a popular platform for developers to build, ship, and run distributed applications. It’s important to note that if you’re using Docker Swarm, you’ll be locked into Docker.
If you’re familiar with Docker, Swarm will be simple to understand as it functions using the Docker API. The tools used with Docker can also be used with Swarm.
This manager is a clustering and scheduling tool for Docker containers. In Docker’s terms, “a swarm is a cluster of Docker engines, or nodes, where you deploy services.” It is possible to run both swarm services and standalone containers on the same Docker instances.
With this service, an entire swarm can be built from a single disk image. This is because Docker handles any specialization of the nodes at runtime rather than during deployment.
Swarm is quite opinionated (although less so than Kubernetes) and is limited by the Docker API. Thus, if something is not supported in Docker, it won’t be supported in Swarm.
Docker Swarm has relatively high scalability, automatically adapting to your desired number of tasks. It also has built-in load balancing and rolling updates.
These rolling updates help in two ways. First, if there is a problem, it’s easy to roll back a task to a previous version. Second, Swarm ensures higher availability by preventing service outages caused by failed rollouts.
If you’re comfortable with the Docker API and are looking for quick deployments and more simplicity, Docker Swarm is a good choice. Yet, remember that it is more limited and doesn’t offer the same level of scalability as many other cluster managers.
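To give a feel for how approachable Swarm is if you already know Docker, the rough sketch below uses standard Docker CLI commands (the service and image names are placeholders):

```shell
# Turn this Docker engine into a swarm manager
docker swarm init

# Deploy a service with three replicas behind Swarm's built-in load balancer
docker service create --name web --replicas 3 -p 80:80 nginx

# Perform a rolling update to a different image
docker service update --image nginx:alpine web

# Roll back if the update misbehaves
docker service update --rollback web
```

The same `docker` tooling and API you already use for single containers drives the whole cluster.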
Nomad is a newer and less-used cluster manager. It’s important to realize that Nomad only provides cluster management and scheduling, while the other tools mentioned are more full service.
One way this simplification has paid off is that Nomad does not require any external services for coordination or storage. Also, Nomad ships as a single binary for both clients and servers, and a single workflow is provided to deploy applications.
It was created to be distributed, highly available, and simple to operate. Although this manager is lightweight, Nomad “scheduled one million containers on 5,000 hosts in under five minutes,” an impressive feat.
In Nomad, users submit the jobs, then Nomad takes over scheduling, deploying, and upgrading the applications. Applications are able to run on any public or private cloud, and it comes with Multi-Datacenter and Multi-Region support.
Nomad also has the ability to run containerized, virtualized, and standalone applications with its support for task drivers. This means that you can not only run Docker containers, but also virtual machines and more. While it is only focused on cluster management and scheduling, this includes automatically handling machine failure.
Because of its newer and smaller status, this product doesn’t have the tried-and-true guarantee that its more popular competitors do; however, if you are looking for a simple cluster management and scheduling service with large scalability and the ability to express large or complex applications, Nomad could be a good choice for you.
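For illustration, a minimal Nomad job file might look like the following sketch; the job, group, and task names, image, and resource figures are all hypothetical:

```hcl
job "web" {
  datacenters = ["dc1"]

  group "frontend" {
    # Run three copies of the task across the cluster
    count = 3

    task "nginx" {
      # Task drivers let Nomad run containers, VMs, or standalone binaries
      driver = "docker"

      config {
        image = "nginx:alpine"
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB
      }
    }
  }
}
```

A user submits a file like this with `nomad run`, and Nomad takes over scheduling, deployment, and upgrades from there.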
As you can see, cluster managers offer many diverse features and tools that are necessary for DevOps stacks today. Without a cluster management service, it would be impractical, and oftentimes simply unattainable, to schedule each cluster, regulate resources in a balanced way, quickly scale up or down, and more.
Each of these services offers its own advantages and disadvantages, which are outlined in their sections. Ultimately, the choice of which cluster manager to utilize depends on your particular needs, size, and team.
Monitoring and reporting tools are an essential part of the modern DevOps stack. Monitoring and reporting is all about finding the signal in the data (and sometimes noise) that a tech stack is generating.
What do monitoring and reporting tools do in DevOps?
Now that teams can track everything and gather large amounts of data, the challenge becomes defining the signal that is most relevant to the success of their people and business. Below are some of the common metrics that most of these tools focus on tracking:
Resources
Disk space, CPU, bandwidth, and memory. These can have implications across your stack.
Uptime
Most products and services in today’s world work best when they are always on.
Speed
Query times, page loads, response times, download speeds. The efficiency of users is often tied to customer experience externally and productivity internally.
Throughput
Cache, database, network, app stack. Think of this as the amount of delivery at every layer.
SLAs (Service Level Agreements)
You may have contractual obligations with financial repercussions tied to how your software performs on reliability, security, and availability metrics. Robust monitoring of these metrics is key when it comes time to audit them.
Key Performance Indicators (KPIs)
The success of your product will often rely on relaying information about its KPIs. These might include things such as the number of concurrent users or the average order value.
User Activity
Signups, page views, downloads, installs, bounce rates, click rates, registrations.
Security
Permissions, access, cost containment, intrusion detection.
Log Trends
How are your log files behaving over time? Understanding the trends in the noise of your technology is key to identifying anomalies symptomatic of issues.
Across the different domains of system information, there are many dimensions of scale and complexity which no tool definitively masters. There are many tools which do specific things very well and many that, over time, have grown to be quite comprehensive. Here are the monitoring and reporting tools which deserve attention in 2017.
Splunk turns your system’s log files into a searchable set of data. An early tagline referred to Splunk as the “Google” of log files. Advanced systems can generate mountains of logs, and Splunk provides an easy-to-use GUI and basic visualizations for mining and monitoring log activity in a simple, visual way.
One of the most valuable aspects of Splunk is its queries. While these queries can be complicated, they give developers access to production data without having to touch production machinery. This makes finding and fixing real-time issues easy, without the risk of disrupting your product’s production environment.
Splunk is not going to generate reports that you would want to give to an auditor, but it can turn a heap of log files into simple data that is quick to navigate. If you want to empower your developers to troubleshoot, it does not get better than Splunk.
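As a rough sketch of what those queries look like, the example below uses Splunk’s search language to count server errors per host over time. The index, sourcetype, and field names are hypothetical and would depend on how your logs are ingested:

```
index=web sourcetype=access_combined status>=500
| timechart span=1h count by host
```

A query like this turns raw access logs into an hourly error chart with no access to the production machines themselves.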
New Relic has grown to be a full stack application performance management tool. You can easily plug and play New Relic’s tools to measure everything from performance in CPU to front-end web performance, and even back-end response times between interfaces.
New Relic has some of the best visualization and dashboarding in the market which, for non-technical stakeholders, creates a very straightforward monitoring environment. New Relic has announced partnerships with AWS, as well as a new pricing scheme which lets users choose their solution based on the size of the instance on which their app is run.
New Relic isn’t the robust on-premise installation solution that will cover your compliance and security issues, but it has greatly simplified the barriers to getting production monitoring off the ground. If your company is growing quickly and security and compliance are not the top priority, New Relic will be invaluable.
Nagios is an established tool for network monitoring. It has been around for some time and with that comes a massive community of customized plugins which extend its functionality. For example, Pnp4Nagios generates great visualizations.
Nagios runs on a framework of text configuration files which can be difficult to automate; but if you have Linux experts who can handle complex installations, you can get started with Nagios core for free. Nagios is a solid tool and, for the fundamental use case, “Why is a service up/down or running at capacity,” it would be hard to find a better solution for no cost.
One of the more technical limitations of Nagios is that it cannot illustrate the impact of network latency and dropped packets on the performance of the application. This can leave you a little blind as to how issues in your technology are truly affecting the end user’s experience. Still, Nagios is a free tool that goes quite far with its capabilities, and it is probably the best starting point for any organization beginning to worry about network performance.
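To show what those text configuration files look like, here is a hedged sketch of a Nagios Core host and service definition; the host name, address, and templates are placeholders:

```
define host {
    use        linux-server        ; inherit defaults from a template
    host_name  web01
    address    192.168.1.10
}

define service {
    use                  generic-service
    host_name            web01
    service_description  HTTP
    check_command        check_http   ; standard plugin: is the web server up?
}
```

A handful of definitions like this answers the fundamental question of whether a service is up, down, or running at capacity.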
PagerDuty is not a traditional monitoring and reporting solution. It is better described as a very good alerting platform. It will accomplish such things as monitoring customer-facing web issues and connecting the user with your support team. Or it can alert your engineers by SMS when your main server goes down. PagerDuty has a great system for helping organizations integrate their people and processes into the DevOps strategy. On top of this, PagerDuty has a vast number of integrations that will work with any system that has e-mail capability, Slack, JIRA, or other common tools.
PagerDuty integrates with Splunk and Nagios. If your organization needs people to act on some of the important measures being monitored, PagerDuty does this beautifully.
- Elasticsearch is a search and analytics engine. It has a unique query language which combines speed and sophistication. If you need to sift through unstructured time series data of any size, Elasticsearch is incredible.
- Logstash is a data collection pipeline. It is built to efficiently process a growing list of log files. While it works with many output destinations, Elasticsearch is the most common.
- Kibana is a data visualization platform that allows you to create maps, histograms, and to essentially bring your data to life with custom dashboards.
Together, as a package, these three tools give you an entire end-to-end solution that can generate visualizations of big data for free in an easy-to-implement package. Depending on how log management fits into your core business, if you have SLAs or need commercial-grade support, ELK might not be the scalable solution you need. But if you want a solid way to get all of your logs aggregated in one place, see the process flow, run queries, and visualize the results, look no further.
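To illustrate the kind of query Elasticsearch makes easy, the hedged sketch below asks a local cluster for the five most recent error-level log entries over HTTP. The index and field names are hypothetical and depend on how Logstash is configured:

```shell
curl -X GET "localhost:9200/logstash-2017.06.01/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "level": "error" } },
  "sort": [ { "@timestamp": "desc" } ],
  "size": 5
}'
```

The same query, pointed at an index pattern in Kibana, becomes an interactive dashboard.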
As you can see, there are many monitoring and reporting tools to choose from in 2017. Each has different advantages, disadvantages, ease of use, and price points.
Infrastructure as a service (IaaS) and cloud platforms are an important part of the DevOps ecosystem. These cloud platforms allow companies to buy memory, data storage, server time and computing power as they need it. This lets companies and teams focus on building their product/service (vs. building infrastructure from scratch). The following section compares and contrasts the major cloud services (both private and open source) in 2017.
OpenStack is an open source cloud computing system originally developed by NASA and Rackspace Hosting. It is managed by the non-profit OpenStack Foundation, and more than 500 companies have joined the project. OpenStack’s modularity allows for easy plug and play of various components for processing, storage, and networking.
Nova is the primary computing resource controller. Nova is the workhorse of OpenStack and is in charge of provisioning and managing virtual machines. Neutron lets OpenStack manage networks across the cloud. Cinder, similar to Amazon’s EBS, provides block storage for compute instances and storage of snapshots so that they can be called upon whenever they are required.
Swift is the OpenStack equivalent to AWS’ S3 service and provides a way to store redundant data in the form of a backup.
AWS is the goliath of the cloud computing world. In 2015, AWS was larger than its next 14 competitors, combined. AWS provides a comprehensive set of tools and services to suit everyone from a single developer to large enterprises.
AWS EC2 is the primary computing service provided by AWS. It provides users with multiple configurations of processing power, memory, and storage options to choose from based on workload. Users are charged by the hour based on the resources they have chosen. Snapshots of EC2 instances can be saved for easy retrieval with the help of the Elastic Block Store functionality. Extra storage is handled by S3, used in conjunction with EC2.
Similar to OpenStack’s Swift API, AWS S3 provides redundant cloud storage to store images and user data. It is highly scalable and supports open interfaces such as BitTorrent for data distribution. AWS Lambda is an event-driven serverless computing platform. Lambda gives developers computing power without the hassle of setting up their own servers/services. AWS RDS is a tool used to store and manage relational databases across multiple database engines. It makes it easy to scale resources according to the size of the database and can automate database management tasks.
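As a hedged example of how these services are driven in practice, the AWS CLI sketch below launches a small EC2 instance and copies a backup to S3. The AMI ID and bucket name are placeholders, not real resources:

```shell
# Launch one small instance from a (placeholder) machine image
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --count 1

# Store a backup object in an S3 bucket
aws s3 cp backup.tar.gz s3://example-bucket/backups/backup.tar.gz
```

The same operations are available through the web console and SDKs; the CLI form is simply what DevOps teams tend to script.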
Google Cloud Platform is comparable to Amazon’s AWS and is written in Python, Java, Go, and Ruby. Unlike AWS, it makes use of open source interfaces for added functionality.
Google Compute Engine (GCE) is the computing platform that lets users create virtual machines. Capacity is measured in Google Compute Engine Units (GCEUs), where 2.75 GCEUs roughly equal one core of Intel’s Sandy Bridge processor. GCE uses OAuth for user authentication to launch the VMs and a RESTful API to provide access across clusters.
BigQuery is Google’s tool for dataset analysis and uses the RESTful API service to interact with Google Storage. BigQuery is used for creation and deletion of tables encoded in JSON or SQL. Being a Google service, it works well with other services such as Google Spreadsheets and is compatible with all RESTful API-based services.
BigTable is used for database storage and forms the basis of the Google Cloud Datastore in Google Cloud. It is a highly scalable service, with storage up to multiple petabytes. It uses row keys, column keys, and timestamps to sort and keep track of the data stored in the database. BigTable spreads that data across multiple servers, ensuring high scalability.
Azure is Microsoft’s proprietary cloud platform that competes fairly directly with AWS in the cloud computing space. It claims to be the largest cloud provider with support in over 38 regions, more than AWS’ 14 and Google Cloud’s 21. AWS offers multiple availability zones within each region to make up for its lower number of regions. Azure consists of Compute, Data Management, and Storage services.
Cloud Foundry is an open source platform that is available in commercial as well as non-commercial form. Cloud Foundry acts as a Platform as a Service (PaaS) layered on top of other IaaS offerings and is used to manage application hosting across various cloud platforms. The architecture supports all languages, and it can deploy application containers across AWS, Azure, Google Cloud, OpenStack, and many others.
An application image is run separately in a container called Warden. These containers run on virtual machines activated by Cloud Foundry’s BOSH cloud management tool.
RackSpace is a cloud management service that helps enterprise customers manage their cloud applications across services. It also provides a complete cloud service under the RackSpace Cloud banner. RackSpace Cloud includes:
- Cloud Servers, similar to AWS EC2, provides computing power along with memory for applications to run.
- Cloud Files, an equivalent to AWS’ S3, acts as a content delivery network. The maximum allowed file size is 5GB. Files do not feature a native OS, which makes mounting of virtual drives impossible without third-party software.
- Sites allows for hosting web pages by leasing compute cycles, with charges levied on the number of compute cycles an application uses. It features support for PHP 5, Perl, Python, .NET 2.0+, and Microsoft SQL Server.
Cloud computing is an intensely competitive industry and this competition provides many attractive services for companies as they build their stacks, services, and products.
Any discussion of DevOps in 2017 is sure to include a lengthy segment on virtualization technologies. Virtualization tools help teams ship products faster by supporting robust testing, rapid delivery, and infrastructure automation. There are more virtualization tools in use now than ever before, and in this section we will take a look at some of the most popular tools (VMware, KVM, Xen, VirtualBox, and Vagrant) and the strengths and weaknesses of each platform.
What does virtualization do in DevOps?
Virtualization, in terms of computing, is simply software that separates computing environments from the actual physical components of the infrastructure. Physical components of a workstation or server can be broken down into different layers such as CPU, network, file system, operating system, and storage. Using virtualization, each of these layers can be ‘virtualized’ or created with software to accomplish these functions without being tied to a single physical machine.
This means that with virtualization tools, it is possible to use a single piece of hardware to host several ‘virtual machines’.
These virtual machines have their hardware emulated and operating systems configured so that they are self-contained compute instances, much like an actual physical server or desktop computer. At a technical level, virtualization is made possible by a hypervisor (also known as a virtual machine monitor, or VMM), which is installed on the ‘host’ server to provide resources to the ‘guest’ virtual machines.
Because of the nature of virtualization technology, these systems allow for efficiencies not offered by a traditional physical infrastructure. A common theme in DevOps use of virtualization is the ease of simulating different test and production environments quickly and efficiently. However, deploying and managing virtualization requires careful evaluation and skillful technical implementation, which makes choosing the right tool all the more important.
One of the oldest virtualization platforms on the market, VMware was founded in 1998. To talk about VMware is to talk about an entire portfolio of virtualization products, with backup management, cloud, desktop, virtualized networking, and server applications, among others. VMware’s desktop virtualization products include VMware Fusion and VMware Player, and its server virtualization products include VMware vSphere and VMware Server.
VMware offers the sort of vast documentation and wide range of integrations that come with being first among widely adopted virtualization platforms. VMware Workstation offers many of the desktop-centric integrations that VirtualBox does, and the two programs are used nearly interchangeably for the same portable-development-environment use case. With a particular nod toward DevOps, VMware has Chef plugins available for VM and cloud orchestration, and its ‘Software-Defined Data Center’ promises a natively integrated, wholly virtualized stack.
One downside to such a comprehensive portfolio of programs, hooks, services, and applications, is that it’s quite easy to fall headfirst into full stack adoption of VMware tools.
While they provide an excellent toolset, over-reliance on a single vendor can introduce fragility into any system if vendor lock-in occurs.
As opposed to a full virtualization platform or suite of virtualization tools, KVM, or Kernel-based Virtual Machine, refers only to the specific hypervisor itself. As its name implies, the KVM hypervisor leverages the Linux kernel, and it only runs on x86 hardware that has hardware virtualization extensions (Intel VT or AMD-V).
KVM is a hypervisor without a platform attached to it. Instead, it provides the low-level hypervisor support that numerous DevOps tools, including Vagrant, Cobbler, and Ansible, build upon. It is not an ‘out of the box’ virtualization solution.
Xen is another open source hypervisor, this one with a unique microkernel design. Through a process called ‘paravirtualization’, guests on hardware architectures without virtualization support run a modified version of their operating system. For hardware with virtualization assistance, Xen offers native guest OS types, calling this a hardware virtual machine (HVM).
One of the most common implementations of the Xen hypervisor is the XenServer product. XenServer is an enterprise-class server virtualization platform with excellent performance and a full-featured management interface. As they would with a physical server, DevOps toolchains typically harness Xen for provisioning and system automation with tools such as Puppet, Chef, or Ansible, in conjunction with the XAPI toolstack.
With the performance gain of using Xen, however, come a few drawbacks. x86 guest OSes have a maximum limit of 16GB of RAM per guest, and high availability is only an option with the enterprise product.
Developed in 2007 and acquired by Oracle in 2010, VirtualBox is a free and open source hypervisor for x86 computers. Designed and tailored for desktop usage, VirtualBox uses software-based virtualization, with hardware virtualization available only on certain CPU architectures.
VirtualBox stands out for its excellent GUI and ease of use. Users can install multiple machines of various OS types using a wizard, and the guest VM can access host machine facilities such as a shared clipboard, shared folders and even USB devices, if compatible. Additional “guest additions” are available for installation inside certain guest OS types, improving virtual machine performance. Virtual machines created with VirtualBox can be transported in several formats, which can then be shared and installed on other VirtualBox instances. All these features make VirtualBox an incredibly popular virtual machine platform and go-to tool in the DevOps engineer’s toolset.
Limitations of VirtualBox include the overall performance of guest virtual machines, which will never approach that of a hardware-based implementation. Additionally, while the manner in which VirtualBox accomplishes its device emulation is impressively easy to set up, the intricacies of this emulation can throw a wrench into troubleshooting virtual machine errors and configurations.
As opposed to the previous virtualization platforms we’ve discussed, Vagrant is not a hypervisor or full virtualization platform. Instead, like the name implies, Vagrant seeks to make virtual machines, like those created with VirtualBox or VMware, easier to migrate from place to place.
Vagrant was created by Hashicorp, which also produces many popular DevOps tools such as Terraform, Packer, and Vault. Vagrant’s primary use is as a utility to configure and create highly portable environments for development. Virtual machines managed by Vagrant are initially configured with a list of details which provide the same easy workflow for creating, moving, running, and destroying virtual machines regardless of their underlying details. Using the .box format, virtual machines managed by Vagrant address the procedural overhead of working with several virtual machines frequently by providing excellent portability. Additionally, the side effect of this portability is that it’s far easier to maintain consistent VM configurations across teams by using a centralized .box repository.
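A minimal Vagrantfile sketch shows how little is needed to describe one of these portable environments. The box name and provisioning script below are illustrative, not prescriptive:

```ruby
Vagrant.configure("2") do |config|
  # Base box to build the VM from (a published example image name)
  config.vm.box = "ubuntu/xenial64"

  # Forward the guest's web port to the host for easy testing
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Simple shell provisioning; teams often use Puppet, Chef, or Ansible here instead
  config.vm.provision "shell", inline: "apt-get update && apt-get install -y nginx"
end
```

With this file checked into version control, every team member gets the same environment by running `vagrant up`.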
A common criticism of Vagrant, which has some merit, is that it is difficult to set up. The ‘VM within a VM’ layers of Vagrant can also be bewildering to novice users. The gamble is that although this initial stage is costly in terms of setup time, the time savings Vagrant provides will be worth it. This is a far safer gamble with users who are already accustomed to the ins and outs of virtual machine usage.
With the definition of ‘device’ changing in the face of cloud computing, IoT, and mobile platforms, the importance of harnessing the power of virtualization in the testing toolchain will only continue to increase. Consistent and portable development and operational environments are core to reaching the goals of the DevOps movement: that by enabling improved collaboration and stability brought to the table by virtualization tools, DevOps teams can ship better products faster than ever.
In today’s world, configuration management is necessary for any DevOps stack. Whether you have 10 servers or 10,000, this tool greatly assists in managing a number of different machines, assisting with things like provisioning environments and deploying applications.
This section will cover five of the most popular configuration managers, which all fulfill common needs while having their own particular advantages and disadvantages.
What does configuration management do in DevOps?
Configuration management tools assist administrators by maintaining and updating the hardware and software as necessary in all servers. It keeps the information, such as installed software, hardware configurations, and the network addresses, easily accessible.
Having this information readily available helps your team know which upgrades are necessary and when, if the upgrades are compatible with the given operating system, and enables them to deploy applications more simply. For example, the configuration manager will help your team confirm that the system’s state fits with the state that was described by the provisioning scripts you created.
With a manual approach, it could take weeks for your team to deploy critical applications. With a configuration manager, tasks such as these become more automated. This not only helps save time, but it also reduces the chance that your team could make a mistake.
You can increase your ease and speed of provisioning new servers by automating the process with a configuration manager. Because of this, you are also more protected from failures because you have the option to simply and quickly deploy a new server if one fails.
There are a number of benefits that configuration managers offer, and each tool’s specific assets are described below.
Puppet is a leader in configuration management, offering the largest variety of features, and it is more all-encompassing than most other configuration managers.
Puppet works by installing a master server on your main machine, and then installing client agents on each system you plan to manage. It’s a pretty simple set-up; but, the learning curve can depend on your team’s knowledge.
Every module and configuration uses Puppet’s own declarative language, which is based on Ruby. This means that if your team already has experience with Ruby, this tool doesn’t take much time to get the hang of; and even if they don’t, system administrators who are comfortable working at the command line can pick up this domain-specific language more easily than learning Ruby itself.
Puppet provides the foundation for necessary DevOps practices by managing infrastructure as code. This means your team can apply software tools and processes, such as version control, continuous integration, and automated testing, to managing your infrastructure.
With infrastructure defined as code, your team can automate testing and deployment across different environments, lowering downtime and mistakes. Automated tests can be created once and run repeatedly, saving time while still being able to find any issues within your infrastructure.
Put simply, Puppet provides a multitude of different tools to allow you to “define the desired state of your infrastructure and what you want it to do.”
Getting rid of these manual steps helps prevent mistakes, provides faster and more reliable testing, allows you to see and control every system more simply, and lets you deploy faster. Because of the large amount of work that Puppet takes on, it can run a bit slower than other options.
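To show what “defining the desired state” looks like in Puppet’s declarative language, here is a hedged sketch that keeps a hypothetical web server package installed and its service running:

```puppet
package { 'nginx':
  ensure => installed,
}

service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],  # don't try to start the service before the package exists
}
```

Note that this describes an end state, not a sequence of commands; Puppet works out what needs to change on each machine to reach it.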
Of the popular configuration managers, Chef is also one of the most customizable. There are two prerequisites to using Chef: knowledge of the programming language Ruby, which this tool is written in, and understanding how Git works.
If developers know Ruby well, they can easily customize Chef’s “ingredients,” such as creating different modules. More than 800 different Chef modules can be used for free.
Chef’s architecture is similar to that of Puppet, as both have a master server and agents installed on the other servers; however, the difference lies in that a workstation must also be installed with Chef to control the master. Agents can then be installed from that workstation. These agents cannot be changed as quickly as those with Puppet because they must be configured to check in with the master regularly.
Chef also helps you turn infrastructure into code in order for it to be more easily tested and changed. The Chef DK provides tools such as Test Kitchen and InSpec to let you easily write infrastructure tests and run them in an isolated environment.
For those who are well-versed in Ruby, Chef offers a lot of modules and configuration recipes. With support for multiple environments, a large database command line interface, and testing mode, Chef is a great choice for those who need a large database and have development-centric infrastructures.
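For comparison with Puppet, the same desired state can be sketched as a Chef recipe, written in Ruby. The package and service names here are illustrative:

```ruby
# Ensure the web server package is installed
package 'nginx'

# Ensure the service starts at boot and is running now
service 'nginx' do
  action [:enable, :start]
end
```

Because recipes are plain Ruby, developers who know the language can mix in loops, variables, and custom logic freely.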
Ansible claims that its ultimate goal is to do nothing more than make your team’s life easier. It requires no node agent installation, is built in the sys-admin and developer-friendly language of Python, and has human-readable automation.
The Ansible architecture is agentless, so functions are performed over SSH. It also accepts sudo credentials to run commands as root on the systems. Because of this agentless architecture, your nodes are more secure and tend to require less maintenance.
Your team can write their own Ansible modules in almost any language; the only requirement is that the output is valid JSON. This manager also offers many pre-built modules.
Ansible is your choice if you want something that is simple and powerful. It is self-documenting and in an easy-to-read automation language. As Ansible puts it, “Automation shouldn’t be more complex than the tasks it’s replacing.”
This tool helps simplify app deployment, workflow orchestration, the app lifecycle, and of course, configuration management. Ansible claims to be the only automation engine that can automate the entire application lifecycle and continuous delivery pipeline.
Be sure to keep in mind, though, that this configuration manager doesn’t offer the same number of features as Puppet or Chef. Instead, it is much more simplified and focused on automation.
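Ansible’s human-readable automation is easiest to appreciate in a playbook. The hedged sketch below, which assumes a Debian-based target and uses hypothetical group and package names, installs and starts a web server over SSH with no agents involved:

```yaml
---
- hosts: webservers
  become: yes        # run tasks as root via sudo
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      service:
        name: nginx
        state: started
        enabled: yes
```

Even someone who has never used Ansible can read this file and say what it does, which is precisely the point.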
CFEngine is one of the oldest configuration management systems still in use today. It is written in C and designed to have a very small memory footprint, fewer dependencies, and faster execution. In fact, its execution time is less than one second, and a model-based configuration change can be deployed across 50,000 servers in five minutes.
A good understanding of C is necessary to use this tool; otherwise, it has a rather steep learning curve. CFEngine also uses its own declarative language for configuration information.
To create your desired configurations, CFEngine’s library assists in building your state. First, you model the state, test the changes, then confirm the configuration. You can also set this to have automatic repairs when necessary.
You define your infrastructure’s desired state and configuration, then CFEngine ensures that it complies. CFEngine uses autonomous agents on each node to implement and report back regarding your desired state. Updates are automatically rolled out to every node in your infrastructure.
With this tool’s responsiveness, critical fixes can be deployed quickly and infrastructure changes can be made within five minutes. Its self-healing capabilities, built upon Promise Theory, automatically fix undesired drift.
If your team is well-versed in C or you can spare the time and expenses to train them, CFEngine is a great choice. Along with more complexity comes smaller agent footprints, fewer dependencies, increased speed, simple scalability, and greater control.
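As a rough sketch of CFEngine’s own declarative language (the bundle and package names below are hypothetical), a “promise” that keeps a package present looks something like this:

```
bundle agent web_server
{
  packages:
    # Promise that the nginx package is installed; the agent repairs drift automatically
    "nginx"
      policy => "present";
}
```

Each node’s autonomous agent evaluates promises like this on a schedule, repairs any deviation, and reports back.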
SaltStack was also developed in Python, so it’s a good choice for system administrators who don’t have as much programming knowledge.
Salt's core is open source, although the SaltStack Enterprise product layers proprietary features on top. It operates faster than Puppet and Chef, even though it likewise uses a master server and deployed agents (called minions in SaltStack) to control and communicate with the other servers in your architecture.
Salt gives its users many modules for particular software, operating systems, and cloud services. It manages heterogeneous computing environments, orchestrates well on any cloud service, and can automate the deployment of almost any infrastructure and software stack.
Salt minions authenticate to the master server, which must accept each minion's key; once accepted, minions simply execute commands from the master and report back with the results.
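The desired state the master pushes to its minions is written in YAML. Here is a minimal, hypothetical Salt state file (e.g. `/srv/salt/nginx.sls`); the package and service names are illustrative.

```yaml
# Hypothetical Salt state: each minion that receives this state
# installs nginx and keeps the service running.
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx    # start the service only after the package exists
```

Applying the state (e.g. `salt '*' state.apply nginx`) makes every targeted minion converge to it and report success or failure back to the master.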
SaltStack's website describes it as the industry's first intelligent orchestration for the software-defined data center. It offers full-stack application orchestration, continuous code integration and deployment, and automated detection and remediation of configuration drift. With these capabilities, SaltStack is one of the best choices for scalability.
SaltStack is typically a good choice only if you have a complex, large-scale enterprise data center environment. It has plenty of documentation to assist with its intelligent orchestration, automation, and security. If you need a large-scale solution and have a team more adept at Python than Ruby, SaltStack is likely the right choice.
Puppet, Chef, and CFEngine are typically more geared towards developers, whereas Ansible and SaltStack fit in a bit better with system administrators.
Among the developer-oriented tools, if you have experts in C and prefer a lighter footprint, fewer dependencies, and greater speed, CFEngine is a great choice; otherwise, look into Puppet or Chef.
Puppet is simple to learn, particularly if your team knows Ruby well, and offers the most robust feature set. It is slightly slower but well suited to heterogeneous environments. Chef offers fewer features out of the box but is more customizable for those who know Ruby; it has a steeper learning curve but provides more flexibility and works well with large databases.
If your system administrators are looking for simplicity, higher security, and automation, Ansible is a great selection. Given its ease of use, it is a popular tool for less experienced teams. If, however, you need something more complex and highly scalable, your likely choice is SaltStack.
Each of these tools has its own benefits and drawbacks. It's important to choose the best one for your situation, as a configuration manager is essential to any successful DevOps stack today.
Hopefully, this guide helped you understand how a variety of technologies fit together to become a whole called DevOps.
Should you need more help understanding DevOps or any of the various technologies within it, DevelopIntelligence offers a variety of learning solutions, consulting, and expert-led, hands-on DevOps training courses for teams and companies such as yours.