Adam on DevOps

Log retention in ELK stack

Adam Brodziak — Sun, 07 May 2023 15:32:41 GMT

Developers kept complaining that they couldn't find recent logs in Kibana. It had happened before for many reasons (worthy of another post), but this time was different. There was no longer any evident problem with the log structure or the FluentD log shipper.

We noticed that one app had gone haywire and started sending logs like crazy. Because of that, disks on the Elasticsearch nodes filled up and ES started to reject new logs. When a disk gets full, Elasticsearch switches all indexes into read-only mode.

Curator for Logstash

Curator is a solution that allows you to set how many days you want to keep your log indexes in Elasticsearch. That is the most popular configuration.

Our setup used Curator to keep the last 2 weeks of logs. Normally it worked just fine. With a predictable log influx, retention can be managed with Curator: you can calculate how many days of logs to keep so that the storage on ES nodes does not overflow.
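The retention calculation mentioned above can be sketched as simple integer arithmetic. This is only an illustration: the function name retention_days, its arguments and the sample numbers are assumptions, not values from our cluster.

```shell
# Hypothetical helper: how many whole days of logs fit under a usage ceiling.
#   $1 - total disk capacity across the ES data nodes, in GB (assumed)
#   $2 - average daily log volume in GB, replicas included (assumed)
#   $3 - highest disk usage you are willing to reach, in percent (assumed)
retention_days() {
    local capacity_gb="$1"
    local daily_gb="$2"
    local max_percent="$3"
    # Integer arithmetic rounds down, which errs on the safe side.
    echo $(( capacity_gb * max_percent / 100 / daily_gb ))
}

# 1000 GB of disk, 50 GB of logs per day, stay below 80% usage.
retention_days 1000 50 80   # prints 16
```

The result is only as good as the daily volume estimate, which is exactly what breaks down when an app starts flooding the cluster with logs.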

The problem starts when more events arrive than usual, for example due to a bug in an app or some kind of DoS attack. In such a case fresh logs are the most valuable, because they show how the attack progresses or how an outage spreads across the system.

The other option is to remove an index (or apply any other supported action) once it grows to a certain size in gigabytes. That gets closer to an ideal scenario where we maximize disk space utilization. It only works, though, if you know the disk size for the whole cluster upfront and are willing to manage those values across clusters (dev, test, prod).

What I wanted was simple: keep as many logs as the available space allows, but do not drop log events when disks are full (more on that later).

Curator does not have such a mode, unfortunately. I've been looking around for alternatives but found nothing.

The solution is a Bash script

Fortunately, checking disk usage on nodes is fairly easy with the Elasticsearch API. So with a few curl calls and a sprinkle of Bash scripting, here's the solution to avoid losing data because of a full disk.

    #!/bin/bash
    # Newline\tab as only separator, required for for loop
    IFS=$'\n\t'
    # Fail on first error
    set -euo pipefail

    ELASTIC_URL=${ELASTIC_URL:=localhost:9200}
    # At 90% usage ES will try to move shards to other nodes. See `disk.watermark.high` in docs.
    DISK_WATERMARK=88

    NODES_UTILIZATION=$(curl --fail-with-body -s -X GET "$ELASTIC_URL/_cat/allocation?h=disk.percent&pretty")
    for DISK_USAGE in $NODES_UTILIZATION; do
        if [ "$DISK_USAGE" -gt "$DISK_WATERMARK" ]; then
            OLDEST_INDEX="$(curl --fail-with-body -s -X GET "$ELASTIC_URL/_cat/indices/logstash-*?h=index&s=index" | head -n 1)"
            curl --fail-with-body -s -X DELETE "$ELASTIC_URL/$OLDEST_INDEX"
            exit 0
        fi
    done

As you can see, it is pretty straightforward. One caveat: it uses the --fail-with-body option, added in curl 7.76.0, so it might not be available in older Linux distributions. It is there just to show the error response from the ES server for debugging.

Run script periodically

Logstash indexes are created daily. By default the index name follows the logstash-YYYY.MM.DD format. The script above relies on this in its _cat/indices/logstash-* GET query.
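Date-stamped index names are what make picking the oldest index so cheap: with a zero-padded, year-first stamp, alphabetical order is also chronological order. A minimal sketch of that idea follows. The function name oldest_index and the sample index names are made up for illustration, and the sort happens locally here instead of through the s=index parameter the script uses.

```shell
# Hypothetical helper: reads index names from stdin, one per line,
# and prints the one that sorts first, which is the oldest day.
oldest_index() {
    # LC_ALL=C forces plain byte-order sorting, independent of locale.
    LC_ALL=C sort | head -n 1
}

printf '%s\n' logstash-2023.05.07 logstash-2023.04.30 logstash-2023.05.01 | oldest_index
# prints logstash-2023.04.30
```

The same holds for any separator between the date parts, as long as the year comes first and months and days are zero-padded.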

However, for the script to be effective it should run more often than once a day. The reason is that some app could go haywire with logging and fill up the storage in the evening. In that case we would lose data on what happened around that failure.

The solution is simple: have cron run the script every hour. It has worked flawlessly for us.
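An hourly schedule could look like the crontab line printed below. This is only a sketch: the function name hourly_cron_line and the script path are hypothetical, so adjust them to wherever the script lives on your host.

```shell
# Hypothetical helper: prints a crontab entry that runs the given
# script at minute 0 of every hour, every day.
hourly_cron_line() {
    local script_path="$1"
    # Fields: minute hour day-of-month month day-of-week command
    printf '0 * * * * %s\n' "$script_path"
}

hourly_cron_line /usr/local/bin/es-disk-cleanup.sh
# prints 0 * * * * /usr/local/bin/es-disk-cleanup.sh
```

Paste the printed line into the crontab of a user that can reach Elasticsearch, for example via crontab -e.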

Why such disk usage values?

Why delete an index when circa 90% disk usage is reached? It is related to how Elasticsearch behaves when very little storage space is left.

The official Elasticsearch docs are not very clear, so let me briefly explain what happens when usage reaches a given level for default values.

Assuming the disk on any given node is filling up:

at 85% - ES will stop allocating shards to that node, see disk.watermark.low setting.
at 90% - ES will try to re-allocate shards to other nodes, see disk.watermark.high setting.
at 95% - ES enforces read-only index block, see disk.watermark.flood_stage setting.
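The three default thresholds above can be expressed as a small lookup. This is just a sketch for reasoning about the levels: the function name watermark_stage is made up, the percentages are the defaults listed above, and hitting a threshold exactly is treated as reached, which may differ from the exact comparison Elasticsearch makes.

```shell
# Hypothetical helper: maps a disk usage percentage (integer) to the
# watermark stage a node would be in with the default settings.
watermark_stage() {
    local usage="$1"
    if [ "$usage" -ge 95 ]; then
        echo "flood_stage"   # read-only index block
    elif [ "$usage" -ge 90 ]; then
        echo "high"          # ES tries to move shards away
    elif [ "$usage" -ge 85 ]; then
        echo "low"           # no new shards allocated to the node
    else
        echo "ok"
    fi
}

watermark_stage 88   # prints low
```

With the script's threshold of 88 the cleanup kicks in while a node is in the "low" stage, before any shard relocation starts.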

The goal here is to prevent reaching 90%, but even that might not help. Imagine the disk on one node is over 90%, so ES tries to move shards away, but fails. Most likely the other nodes are over 85% already, so allocation to them is blocked. That assumes equal disk sizes and evenly spread shards, which is something to strive for anyway.

Let's assume Elasticsearch could move a shard from a node that is filling up to another one. Now think of the load caused by moving a giant slab of data (shards with logs are pretty big) from one ES node to another. Such an operation can grind the cluster to a halt. We don't want that.

To be on the safe side, we target 85% disk usage. When that level is reached nothing active happens, the node is just cordoned (to use Kubernetes lingo). Elasticsearch will not try to shuffle shards around, and we have some room to spare before it does.

Is it that simple?

Well yes, but actually no ;) The idea is so brilliantly simple that I was sure somebody had already implemented it. However, I found nothing, not even a post on some obscure blog ;)

On the other hand, log aggregation in the ELK stack is not an easy job. You have to define the log event structure, decide what is indexed and what is not, and create a template for indexes. And that is if you have control over the logging clients in apps. If not, it gets much worse.

On top of that shard replicas, hot and cold indexes, and archived indexes are probably on your mind too. That's a lot and something that deserves another blog post. Let me know if you're interested.

Make your logs like Pokémon: Gotta Catch 'Em All

IT maturity levels

Adam Brodziak — Mon, 03 Oct 2022 18:09:47 GMT

This post was written for my colleagues who have only ever worked at a software house. It was supposed to be short and sweet, but also to give a nudge towards self-reflection.

Why does a company need IT?

Today we're going to look at IT as a whole, a bit more broadly than just software development. As it happens, modern IT is a business unit that is supposed to (at the very least) support the organization in executing its strategy. Software development itself is not a business unit, unless we're talking about a company that sells software. But that's not what this post is about...

So what else counts as IT:

systems maintenance
support (helpdesk)
proxy (purchasing, licensing)
hardware

There may be more, depending on the size of the company.

IT maturity levels

But let's focus on this: what does IT (and therefore software) give to an organization to fulfill its mission? That depends on the level of development of that IT. Let's start from the beginning.

Level 0: No IT

It's 2022 and some businesses work without any IT involvement. Reality check.

Level 1: IT is a cost

A typical situation: a company needs a server to run a website that someone built. The company pays and that's it. The sad reality.

More investment in IT = more cost.

Level 2: IT cuts costs

Newly purchased invoicing software means that accounting has less to do and more time to drink coffee. Since coffee is expensive, this leads to a reduction in FTEs. The result is a reduction in costs for the company.

IT expenditures only make sense up to the amount of expected savings.

Level 3: IT makes a profit

The implementation of an online e-commerce store has been a success and customers are buying vacuum cleaners like crazy. Salespeople have less work to do because the customer chooses the model and color himself. No one gets fired because salespeople work on commission.

Returns on IT expenditures diminish, following the law of diminishing marginal utility.

Level 4: IT creates a new market

Our mobile app lets passenger and driver figure out where they are and where they are going. Customers no longer want to wait for a questionably fresh cab that is who-knows-where. Salespeople stroll around Facebook making viral videos, accountants transfer profits to tax havens.

More expenses = more revenue (to some point, of course).

Well, where's the software development?

I don't know if you've noticed, but it's only at Level 4 that there is software that is owned by the company. For an organization like Uber, software is not only a competitive advantage, but is even essential to the company's existence. In other words: Uber would not be possible without their proprietary, unique system.

The other levels have been simplified. For each of the problems in Levels 1-3, it is possible to find a better or worse existing solution that can be bought. Sure, the cash may be in the millions, but the product is already in place and possibly needs to be implemented. I'm talking about all those SAS, SAP and similar businesses.

What does this mean for a software house?

That's a very good question :) I'm curious about your opinions on what level a software house (SH) operates at. Specifically, at what level are the projects you work on? A separate question: at what level does the SH want to be? Feel free to comment!

Cloud-Native Platforms

Adam Brodziak — Fri, 19 Aug 2022 15:19:23 GMT

When talking about prominent technology trends it's good to ask ourselves who is going to benefit from them. In the case of Cloud-Native Platforms those are billion-dollar businesses (AWS, GCP, Azure), owned by trillion-dollar organisations (Amazon, Alphabet, Microsoft). Promoting the cloud-native meme is in the best interest of their shareholders.

But the cloud-native platform, in practice, is a modern way to build complex distributed systems. Truth be told, it has more to do with advanced system architecture and practical software engineering than with the cloud. It just so happens that those systems are being deployed to the cloud nowadays. Will it be the same in the future?

The cloud revolution was an important breakthrough. The timeline is unfortunate though: cloud offerings started circa 2005, while containerization exploded a decade later. In fact, it's containers that offered the scalability, portability and cost-efficiency that VMs or cloud instances promised but could not deliver.

Experienced corporations are starting to realize the cost of the cloud and the risks related to vendor lock-in. As a result, companies have started to revisit their IT strategies regarding the cloud. Some decided to invest in their own data centers, using software-based networking and container orchestrators as the main building blocks. That is one of the trends.

The other trend started with the multi-cloud approach. Currently, implementing such a solution is complicated because cloud providers have incompatible APIs. There's hope of developing a standardized layer on top of those, called sky computing. I'd like to see that happen, but I dare to ask a question: is it in the best interest of the cloud behemoths?

Why bother investing in hardware sitting in some data centre, or building a portability layer between cloud offerings? The so-called vendor lock-in is not only about the cloud outages that we've experienced in the last few months. It's also about decisions where cloud providers refuse to host your business for political reasons. Politics change, so who knows what the future line of thought will be?

That brings us to what stands behind the cloud-native moniker: distributed architecture and solid software engineering. When choosing your next software solution vendor, ask yourself whether they want to sell you a specific cloud solution, or whether they understand how to build a resilient distributed system that is independent and gives you freedom.

DevOps skills for Medium, Senior and Architect levels

Adam Brodziak — Sun, 13 Feb 2022 19:21:42 GMT

I've been asked to prepare an outline of DevOps skills for different levels of experience (and salary). To be honest, I have no idea what those should be, as DevOps is such a broad area that it's almost impossible to define. Also, DevOps is a little bit different in every organisation, so the skills described below are biased towards what we do at my company, Future Processing. In this list I've focused on technical skills, because for "soft skills" we've got a separate matrix applicable to all positions.

The levels in question could be defined as follows (just to give you some context):

Medium has some experience and is able to deliver simple, well defined tasks. Usually requires mentoring from more experienced colleagues.
Senior is someone who can work on their own, being able to deliver projects rather than just tasks.
Architect can design and deliver complex projects and also advise, train and mentor colleagues and clients.

As you may notice, the Junior level is missing, on purpose. I strongly believe there's no Junior DevOps, but rather a Junior Ops or Junior Dev who gains experience. After some time they can transition to Medium DevOps to earn their stripes. That is the path I went through myself and the one I see in the wild.

Most of the skills listed for a particular level apply to the levels above too. So an Architect should know everything a Senior does. In some cases I've used the same skill to describe the expected change of attitude or understanding when evaluating for promotion.

Next to each skill there's a grade stating how important that skill is:

(5) means critical, must have on this level.
(3) means important, but can live without if other skills are strong.
(1) means optional, nice to have but not required.

Development

Dictionary for the terms used:

Programming language is any general purpose language like Python, Java or Go.
Software development life cycle (SDLC) is all the steps needed to deliver a software product including: design, development, build, test, deploy.
Git actually means any (distributed) version control system, but it's shorter just to write git ;)

Medium

(3) Knows at least basics of one programming language.
(3) Knows what SDLC is and can explain steps.
(1) Is aware of Agile and Waterfall approach to SDLC.
(3) Can use the basic git commands (commit, pull, push).

Senior

(3) Is fluent in at least one programming language.
(1) Can point out problems in existing software development life cycle.
(1) Is able to judge if project is Agile or Waterfall and explain the consequences.
(3) Knows various branching models in git and how to use them.

Architect

(5) Took active part in writing a complex, enterprise-grade system.
(5) Can read and provide a hotfix in more than one scripting language.
(3) Can propose and design optimizations in software development life cycle.
(1) Can design SDLC in the Agile or Waterfall approach.
(5) Knows git-flow is a lie and trunk-based development is the way to achieve proper CI setup.

CI/CD

Dictionary:

CI/CD stands for Continuous Integration / Continuous Delivery (or Deployment) - an approach to test, build and deliver software.
CI system (Continuous Integration) is software (or a service) that runs tests, builds and other automation around SDLC. Examples are Jenkins, GitLab, GitHub Actions, AWS CodeBuild.
Package manager is a system to manage dependencies (i.e. libraries, modules) during software development, used to build a software artifact. Examples are Maven or Gradle in Java/JVM, NPM or Yarn for JavaScript or TypeScript.

Medium

(5) Knows the CI/CD terms.
(5) Can set up a simple CI pipeline (a few steps) based on an existing setup.
(3) Can use at least one package manager.

Senior

(5) Can tell the difference between Continuous Integration and Delivery.
(5) Can design and deliver CI pipeline (with many steps) based on requirements.
(3) Can use many package managers and is fluent in at least one.

Architect

(5) Can differentiate between Continuous Delivery and Deployment and advise which one is better.
(3) Can propose and build CI setup (many pipelines) based on developer team needs.
(3) Can point out common pitfalls in package manager usage and optimize it.

Observability

Log aggregation is the process of gathering logs from many nodes (typically in a cluster) into a single place for processing. Examples are ELK stack (Elasticsearch, Logstash, Kibana), Grafana Loki, Splunk, AWS CloudWatch.
Metrics visualization is a way to use graphs to display application or system metrics over time. Example tools are Grafana, Kibana.
Alert is a notification of some issue (incident) in the system.
Runbook is a tutorial that describes how to react to an alert to troubleshoot or mitigate an incident.
Trace is detailed information about what caused a software issue (i.e. exception stack trace). Examples of distributed tracing software are OpenTelemetry, Jaeger, Grafana Tempo.

Medium

(5) Knows the benefits of log aggregation and can use it.
(5) Can use metrics visualization to troubleshoot some problems.
(3) Knows how to react to an alert using existing runbooks and can update / create runbooks.
(1) Knows what a trace is and why it is useful.

Senior

(5) Can build the log aggregation system using self-hosted solution based on provided design.
(5) Can build metrics visualization in some popular tool.
(3) Can tell the difference between a log event and a metric and is able to advise the dev team on that.
(3) Can advise on actions that will prevent an alert from firing on known issues.
(1) Uses traces to discover problems with the software to help dev teams fix them.

Architect

(5) Can provide many designs of log aggregation and explain their pros and cons.
(5) Can design and advise on system or application metrics, where tracking those will prevent some issues.
(3) Can advise the dev team on alerts that will help them react before an issue happens.
(1) Can set up a trace aggregation system.

Containers

Definitions:

Container is a way to run application in a cloud-native way. Example container runtimes are Docker and containerd.
Orchestrator is a system that manages containers in a cluster. Examples are Kubernetes, ECS (Elastic Container Service).
VM stands for Virtual Machine, a way to isolate apps before containers came along.

Medium

(5) Understands containers and how they work. Can run an app in a container.
(3) Knows there is an orchestrator and what it does.
(3) Understands the difference between Docker container and image.
(1) Understands the difference between container and VM.

Senior

(5) Knows one container runtime in-depth (i.e. attaching volumes, config options, health-check).
(5) Can set up and manage an orchestrator in production.
(5) Understands the difference between Docker container and image.
(1) Understands the difference between container and VM.

Architect

(5) Can advise the dev team on how containers should be built and run (i.e. 12factor app).
(3) Can design a platform based on orchestrator that is the best suited based on requirements.
(3) Understands that container is just a Linux process with some degree of isolation.

Operations

Dictionary:

Shell script is a Bash or PowerShell script. Used to automate actions.
System signals are a way to manage processes by the kernel. Examples are SIGTERM, SIGKILL, SIGHUP.

Medium

(5) Can write a simple shell script.
(5) Can check state of processes on the system.
(3) Knows system signals and how to use them.

Senior

(5) Can write significantly complex scripts (i.e. deployment pipeline), using conditionals and loops.
(3) Is able to debug process (i.e. logs, strace).
(3) Can advise dev team how to handle system signals in the app.

Architect

(3) Knows when shell script is not enough and proper programming language should be used.
(3) Can advise if distributed tracing solution would help.
(3) Is able to propose a fix in code based on signals not being handled.

Networks and cloud

DNS stands for Domain Name System, which is how name resolution works.
RBAC stands for Role-Based Access Control. Examples are Kubernetes RBAC or IAM Roles.
Proxy is software that passes network traffic. HTTP reverse proxy is a layer 7 proxy, it can act as a load balancer too. Layer 7 refers to ISO/OSI network model.
IaaS, PaaS and SaaS are Infrastructure / Platform / Software as a Service.

Medium

(5) Knows how DNS works, for the basic name resolution.
(5) Understands the concept of RBAC and can explain rules evaluation.
(3) Knows the concept of proxy (i.e. HTTP reverse proxy) and load balancer.
(3) Knows the difference between IaaS, PaaS and SaaS.

Senior

(5) Knows all the bits that are involved in DNS name resolution (i.e. in Linux).
(5) Can set up RBAC rules according to specification and good practices (i.e. the least possible permissions).
(5) Can set up a proxy server using some popular tools (i.e. Nginx).
(3) Knows the difference between layer 4 and layer 7 load balancer (or proxy).
(3) Knows various classes of cloud offerings (i.e. object storage, load balancer) and how to use them.

Architect

(3) Understands the performance implications of DNS in a large cluster.
(5) Able to design RBAC setup (roles, policies, groups) for the whole system (i.e. cluster).
(3) Can design a cluster using both layer 4 and layer 7 proxies or load balancers where appropriate.
(1) Can leverage PaaS / SaaS offerings (i.e. load balancers) to achieve business goals (i.e. time-to-market, cost reduction).

Infrastructure as Code

Dictionary:

Infrastructure as Code (IaC) is a concept to keep setup and configuration of infrastructure as code in git, so it can be developed just as applications are.
Provisioning is a way to reach the desired state of software and configuration on the server. Example tools are Ansible, Puppet, cloud-init. Those tools are not used in the era of container orchestration, but the concept is still relevant.
Configuration management is about providing config for app (in many ways), but also a system to deliver and distribute the configuration.
12factor is a set of good practices for modern apps https://12factor.net/

Medium

(5) Knows IaC concept and its benefits.
(5) Understands the server should be managed (spin up, provision) automatically, by some tool.
(3) Can use various ways to configure app (i.e. env vars, files).

Senior

(5) Can prevent common pitfalls of IaC setup using tools and processes (i.e. git branches, code review).
(5) Is able to manage server (spin up, provision) automatically, by some tool.
(5) Understands configuration is separate from app deliverable (12factor app).
(3) Can set up configuration delivery using an existing tool (i.e. orchestrator or some open-source or SaaS).

Architect

(5) Can advise on GitOps drawbacks and put guardrails into place.
(3) Understands the provisioning tools do not fit the cloud-native era. Can advise on how to migrate out of them.
(3) Can design configuration management system for the cluster.
(1) Can coach dev teams to adhere to config best practices (12factor app).

Others

DORA metrics (DevOps Research & Assessment) are 4 basic metrics of software delivery performance, proven to work using scientific approach.

Medium

(1) Understands DevOps role in SDLC.

Senior

(3) Understands how DevOps practices affect SDLC.

Architect

(1) Can use metrics (i.e. DORA) to lead and track SDLC optimization.

Tell me what you think

All of the skills and the grades are highly subjective, I realise that. Also, as I mentioned, they reflect the environment my company operates in, so it (DevOps, skills, grades) might be completely different for you. Even so, I'd like to know your opinion and I'm open to feedback on what I'm missing, what is wrong or unclear. Feel free to leave a comment or hit me on Twitter :)

Wasted 10 years with Bash

Adam Brodziak — Wed, 06 Oct 2021 15:09:42 GMT

Originally I published this 3 years ago on Medium, but I'm keeping a copy here too.

When you're thinking of a shell on any Linux system you probably think of Bash (Bourne Again Shell). On Windows too, actually, as Git Bash is quite popular (despite its ugly terminal emulator) and WSL (Windows Subsystem for Linux) uses Bash by default too. At least those have fairly modern 4.x versions of Bash, while macOS ships an outdated 3.2 version, and there are tutorials on how to upgrade Bash on Mac to 5.0, which was released days ago.

The Bash 5.0 upgrade on Mac looks quite complicated and can yield unexpected errors, so that begs the question: why not use a better shell instead?

My history with Bash

My story with Linux started in 2007 and for 10 straight years I've been using Bash by default, as it came with Ubuntu. It's not that bad, as Ubuntu at least ships autocomplete configuration by default, so you can use Tab to complete a command or its params. Just try to type cd /ho to see it in action. I learned that very quickly and blessed Ubuntu for making it work for me.

Over the years I've learned some useful things, like Ctrl+R to search history or Ctrl+K to clear line. I did not know any better (unconscious incompetence), because for me shell==Bash. Even after finding out that there are other shells and that relying on Bash quirks in scripts is bad practice, I haven't looked for greener pastures. Somehow it did not even occur to me to alter Bash's stupid defaults (case sensitive file completion - why?!), besides maybe increasing the history buffer (huge productivity boost BTW). There is a sensible Bash configuration that you should definitely check out!

Fish - The user-friendly command line shell

It was late 2017 when I stumbled upon Fish Shell. The tipping point was the aptly named post The fish shell is awesome by Julia Evans. Then I watched a few videos and was totally hooked. Fish comes with all the bells and whistles out of the box, set up for you. No need to configure anything, install plugins, etc.

As you can see Fish (on the right) has got the best tab-completion compared to ZSH and Bash.

Of course Fish community created plugins for more advanced features - I can recommend the following:

rafaelrinaldi/pure - Pure-fish port of sindresorhus/pure prompt
franciscolourenco/done - Automatically receive notifications when a long process finish
jethrokuan/z - Pure-fish rupa/z-like directory jumping

Notice those plugins are Pure-fish implementations of 3rd party scripts. The reason is: Fish offers sane scripting (unlike other shells, some say), but it's not POSIX compliant. This is probably the biggest drawback of using Fish. Over the years I've accumulated oneliners, scripts and habits from Bash (i.e. using && to join commands), and most were POSIX-compliant. Which led me to ZSH...

Note: Fish 3.0.0, released a month ago, added && support. It is that important :)

ZSH (Z shell) - designed for interactive use

ZSH has been recommended to me by my colleagues at work as a better Bash alternative, because it is POSIX-compliant. That means my Bash habits still work: && to join commands, the way of exporting ENV variables, storing command output in a var using backticks, etc.
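For reference, here is a small sketch of the three habits listed above. It uses only POSIX-style syntax, so it should behave the same in Bash and ZSH. The function and variable names are made up for illustration.

```shell
# Hypothetical demo of Bash habits that carry over to ZSH unchanged.
greet() {
    # Export an environment variable so child processes can see it.
    export GREETING="hello"
    # Store command output in a variable using backticks.
    name=`printf '%s' "world"`
    # Join commands with &&: echo runs only if the test succeeds.
    [ -n "$name" ] && echo "$GREETING $name"
}

greet   # prints hello world
```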

Since I had been using Fish before, I started to look for plugins that replicate features that Fish provides out of the box. To get some of the Fish coolness install the following:

zsh-autosuggestions - Fish-like fast/unobtrusive autosuggestions for zsh.
zsh-syntax-highlighting - Fish shell-like syntax highlighting for Zsh.
zsh-history-substring-search - This is a clean-room implementation of the Fish shell's history search feature

There are plenty of awesome ZSH plugins, but I tend to use only a few, not to overload my .zshrc file. The reason: it is on me to make sure everything works well. I'm using the Antigen plugin manager that is supposedly solving plugin installation issues (see motivation section), but problems still happen. To be honest I have never tried installing oh-my-zsh directly, because it does not ship the Fish-like plugins listed above and configuring custom plugins in OMZ is awful.

Bash as a scripts runtime

OK, so we've established that there are better interactive shells than Bash, but Bash is still useful for scripts. It's because of its ubiquity, obviously: it's available on (almost) every Linux/Unix system these days. However, Bash has its gotchas as a scripting language, so beware. Here are a few links to make your life easier with Bash scripts:

http://redsymbol.net/articles/unofficial-bash-strict-mode/
https://zwischenzugs.com/2018/01/06/ten-things-i-wish-id-known-about-bash/
https://github.com/dylanaraps/pure-bash-bible
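To give a taste of what the "unofficial strict mode" from the first link buys you, here is a minimal sketch of one of its options, set -u (nounset). The function name expand_unset and the variable DEMO_VAR are made up for the demo. The check runs in a subshell so that it cannot abort your current shell.

```shell
# Hypothetical demo: what happens when a script expands an unset variable.
#   $1 - "strict" to enable nounset (set -u), anything else to disable it
expand_unset() {
    if (
        if [ "$1" = "strict" ]; then set -u; else set +u; fi
        unset DEMO_VAR
        # Expanding DEMO_VAR is fatal under set -u, silent otherwise.
        : "$DEMO_VAR"
    ) 2>/dev/null; then
        echo "continued"
    else
        echo "aborted"
    fi
}

expand_unset lax      # prints continued
expand_unset strict   # prints aborted
```

Without set -u a typo in a variable name silently expands to an empty string, which is how paths like "$PREFIX/bin" quietly turn into "/bin".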

Parting words

If you're working with a shell, do yourself a favour and try something more modern than Bash. Do not waste a decade as I did. My suggestions are as follows:

Try Fish as it's awesome out of the box. If you know something even cooler - let me know!
If you like to tinker or need POSIX shell give ZSH a go. No idea which plugin manager to recommend though ;)
Use Bash sensible config if you really need Bash, i.e. on remote host you do not control.

Let me know if I missed anything or there's an even better shell that I'm not aware of. Basically, raising awareness of Bash alternatives is my point here, so I'm open to learn.

How DevOps tools affect culture

Adam Brodziak — Thu, 08 Jul 2021 15:25:18 GMT

First let me tell you a story.

Imagine you're an account manager trying to help a customer solve their problem. It seems to be a bug in the software system. They use a version that is over a year old (a couple of releases ago) with some custom features and client-specific configuration. So far the attempts to replicate the bug have been futile.

The idea is to re-create the system setup in the same way the customer has it, so we could reproduce the problem, debug the system and provide a fix. However, the person who installed it no longer works at the company, the 100-page operating manual is out of date and there's no record of what exactly has been customized.

Does it sound familiar?

Basically that was the situation our customer was dealing with. People tried to cope with it by producing those 100+ page operating manuals, writing down how they configured the system (if they still remembered what worked after trying for the 8th time), gathering fact sheets and diagrams that might even reflect how the system is set up.

Documentation is great, but who likes to write it? Not to mention keeping it up to date and verifying that what is written there works?

I'm going to explain how using DevOps tools changed people's approach to work and how they felt about it. Then I'm going to unveil how we did it.

Empowerment

In his previous life our account manager would need to ask the people responsible for various components to assemble them in the way this particular customer's setup looked. We assume he was lucky enough to actually find out the configuration and feature customization, of course. If someone was on holiday then our poor AM was out of luck and needed to wait.

By using DevOps tools our AM could do all of that himself. All versions of software components were stored in an artifact repository. Even if deleted, they could be re-created from source code exactly as they were, thanks to reproducible builds on the CI server. Configuration was stored alongside code, so the full setup was in the source code repository.

Assembling the customer-specific setup was as easy as checking out a branch with configuration code for this customer and running the release pipeline for this branch on one of the testing environments. Scripts would deploy the correct component versions and apply customer-specific configuration automatically.

Better yet such configuration branch can be turned into pull request and provided to someone for review. There's no need to check if the code and configuration is valid - automatic tools on CI server already did that. It's more about asking others if those exact changes should have been applied for this customer use case in first place.

At first it was quite overwhelming to use source control and coding tools for people like account managers. However we've learned fast that sometimes it's easier to read YAML configuration than to process 100+ pages manual to find one specific setting. Also asking someone for review was great learning and collaboration experience.

Confidence

Hunting esoteric bug for a customer makes a good story, but it's not an everyday work. Normally we'd have much smaller features, fixes and improvements releases on day-to-day basis. The configuration of the system also evolves at steady pace. Since the scope of those was small the release was no longer a scary process.

Also the fact that it was done on a daily basis contributed to increased confidence. The typical flow became routine: create a pull request with changes, let CI validate it automatically, get a review from peers, then approve, merge and deploy.

By incorporating review process we not only gain an additional verification before deploy. This is also a learning and collaboration event - both parties are able to contribute the improvements. I saw improvements and simplifications introduced as a result of change review process.

The great benefit of the DevOps approach is that if something is wrong you learn that early in the process. There's no longer a need to wait weeks to see whether a change yields the expected result on the production setup. Personally, I used to forget why a change had been made by the time I got feedback, months later, on whether it worked or not. If it did not, the work had to be started over.

Feedback

A fast feedback loop is the essence of the DevOps approach. A typo in configuration is caught by automatic validation in minutes. Misapplication of some setting is pointed out during review within a few hours. The automatic deploy routine tells you whether the change worked the same day.

Change review process saved me many times from breaking the system or doing something stupid, because I've missed something. Code review comments are invaluable way of learning about the system and about your craft too. The necessary bit is to hide your ego and take feedback as it is. Simple, but not easy.

In the DevOps flow, you (the owner of the change) are responsible for deploying it. This is how you learn how the system actually works or in which ways it breaks. Observing deployment progress and the behaviour of the system (i.e. how metrics and logs change right after that) becomes second nature after a while.

Observability is the new hot trend in the DevOps world. Observability without the observer is just an empty slogan.

Lead time reduced from months to days

Such dramatic shift of delivery pace was the observed benefit of the approach we've taken.

Cultural change was enabled by DevOps practices and tools we've used.

This is what tools we've used and how.

Infrastructure stack

Our rule of thumb was that the whole infrastructure setup has to be automated, no exceptions. It started with virtual machines and networks in the AWS cloud that were spun up by Terraform, a cloud-agnostic infrastructure as code tool. We chose Terraform because customer infrastructure could be on various providers: in some cases it was cloud, in others on-premise dedicated or virtual machines.

Once the basic virtual machine was running, Ansible applied operating system configuration and installed the necessary tooling. We kept that layer thin, with only a few small roles in Ansible. This decision improved maintainability and security by keeping the attack surface narrow.

The heavy lifting has been done by Docker Swarm orchestrator. Every application had a dedicated Docker image with all the runtime dependencies and Swarm managed a workload over a fleet of VM nodes.

Why Docker Swarm? Back then it was under heavy development at Docker Inc. Swarm was much simpler than Kubernetes, and back then K8s was not that fully featured yet. However, in 2021 I'd discourage using Docker Swarm for a greenfield project.

Infrastructure as Code

Having all the infrastructure as code in one big repository was a big enabler for collaboration and shared responsibility. Gone were the days of "this is my special machine, so don't touch it" approach. Transparency was a key value to fight knowledge silos that would otherwise happen.

Additional benefit was the ability to view and compare all the development and testing environments all at once. It helped our teams to wrap their heads around what feature is being deployed or tested at which stage. Our QA engineers developed their own tools to make it easier.

Even though the infrastructure was pretty big and complex, there was only one dedicated person to manage all of that. Well, that's not entirely true - the whole team was responsible for the software and the environment it ran on. What I'm trying to say is: thanks to efficient DevOps tools one high-class specialist was enough to manage it.

As DevOps states, there's no distinction between dev and ops, so everyone was encouraged to perform configuration changes, deploys and contribute to infrastructure setup. And this is what happened. I've personally added the ability to verify artifact version before it gets deployed, because I missed that feature in the deploy.sh script.

Executable documentation

People with various skills installed, configured and operated our system. That's why we focused on using tools that are easy to reason about. The majority of those were YAML configuration files for Docker Swarm. They have exactly the same format as files for Docker Compose - a tool that makes it very easy to run the setup on any machine running Docker.

Similar story with deploy.sh Bash script. It basically codes the steps an operator could re-type on the machine themselves. With additional comments that made it an executable documentation that was run every day - so we made sure it works. Gone are the days of re-typing commands from operating manual only to find out they do not work in this version of the system.

Clear separation of the various layers (VM via Terraform, OS via Ansible, apps via Docker Swarm) made it easy for customers to pick and choose how much they wanted to use it. That was extremely important for some closed setups where public cloud was out of question.

The bonus point I wanted to mention was a script to generate release notes from code and source control metadata. That was yet another benefit of sticking to the approach that every commit message should refer to a Jira ticket, so the changelog in release notes could be generated from that data. Also, installation instructions were a copy-paste of the scripts that we'd prepared. Great documentation with minimal effort.
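The changelog idea can be sketched in a few lines. This is a Python illustration under stated assumptions, not the original script: the Jira key format, the NO-TICKET bucket and the function name are all made up for the example.

```python
import re
from collections import OrderedDict

# Assumed convention: a Jira key such as PROJ-123 appears in the commit subject.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def build_changelog(commit_subjects):
    """Group commit subjects by the first Jira key found in each of them."""
    grouped = OrderedDict()
    for subject in commit_subjects:
        match = JIRA_KEY.search(subject)
        # Commits without a ticket land in a hypothetical NO-TICKET bucket.
        key = match.group(0) if match else "NO-TICKET"
        grouped.setdefault(key, []).append(subject.strip())
    lines = []
    for key, subjects in grouped.items():
        lines.append(key)
        lines.extend("  - " + s for s in subjects)
    return "\n".join(lines)
```

The input could be the lines printed by git log --pretty=%s for the range of commits between two releases.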

Is DevOps approach worth it?

Well, it depends. Such extensive automation pays off for any medium project (with dozens of people involved). For large projects I'd dare to say it is a requirement: otherwise we lose so much time trying to do basic things that are repeated every day (doing manual deploys, reading operational docs, finding where things are, etc). On the other hand, we've applied the same principles (without extensive tooling) on a small team (5-7 devs) and got most of the benefits too.

Despite technical and process advantages of DevOps I'd say: do it for the people! This way you gain more engaged team that feels empowered and responsible for the product. In turn that leads to many learning experiences fed by honest feedback and increased confidence. I must admit it's a pleasure working in a team in such environment.

Terraform is terrible

Adam Brodziak — Thu, 27 May 2021 14:56:30 GMT

Here is my experience from running and upgrading a small Terraform project. As you might have guessed it was not great, but I'll try to focus on facts rather than opinions (even though some might sneak in). It will be mainly about the CLI client and its versioning scheme, but also some complaints about state management. I'm a big proponent of CI/CD and Infrastructure as Code and I will try to explain how Terraform does not fit the picture.

The project is small, but manages 8 clusters. Contrary to typical case it's a SaaS: Atlas service that offers managed MongoDB on AWS in our case. Every project is using some of 7 modules that represent Atlas resources with necessary AWS bindings (i.e. secrets). When we started 0.12 was the newest version, so upgrade to 0.13 is part of the story.

Since the issues are mostly about the Terraform client, the IaaS or SaaS used is not that relevant. However, the Atlas plugin had to change its internal configuration structure, which only added insult to injury.

State

Let's start with state, which is a common pain point when working with Terraform. To some extent I understand the decision to use state, but it is inherently difficult to manage.

I dare to say that remote state has all the disadvantages of a cache, but not many of the advantages. Sure, for a team working on a project remote state is a must. Actually I'd like a solution that would enforce using remote state, but there's none - we have to rely on the state config being copied over from an existing project.

State config on AWS using S3 and DynamoDB

State configuration is another weak point. Typically the S3 backend is used to store state, that's fine. But if you want to keep many people from overwriting each other's changes by running terraform apply at the same time, you need additional configuration for locks or a mutex. You should definitely use that!

https://twitter.com/AdamBrodziak/status/1387764929286004745

In the AWS realm DynamoDB is needed for locks. My guess is that S3 does not have an atomic operation to obtain a lock, so a key-value database is needed. What I don't understand is: why do I need both? Why not store state in DynamoDB directly? The state content is just JSON, right? If you happen to work on Terraform let me know, please. For me that's a big oversight.

Anyway, below is sample state config using S3 and DynamoDB - both are needed to make it safe. Feel free to copy :)

terraform {
  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "mongo-atlas/resources/DEVELOPMENT-resources.state"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-prod"
  }
}

Remember to use remote state (with locks!) if you're working in a team or using Terraform in a CI/CD pipeline. This is the lesser of two evils (the other being everyone having their own state!).

Discrepancies of state contents

Using locks avoids one way state can be corrupted: running more than one terraform apply at the same time. Other issues arise from discrepancies between HCL code, state and the system. Let's look at those.

The first issue is the difference between Terraform state and the actual state of the system. Sometimes it's called configuration drift. One case is when someone adds something in the admin console, which is then invisible to Terraform. We had that for the IP access list, but since all the entries are independent there was no conflict. The only drawback was that we no longer had a single source of truth, so better avoid such practice and manage things either manually or via Terraform code.

A bigger problem is when someone changes something on the server manually, but the change is not reflected in the Terraform code. In such a case it will be visible during terraform plan as a change. You have to decide whether to override it via terraform apply or to reconcile the HCL code with it. In the latter case you'll be playing detective, investigating who made the config change and why. My recommendation: avoid those cases at all cost and manage everything via Terraform!
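One way to notice such discrepancies early is a scheduled job that runs terraform plan with the -detailed-exitcode flag and inspects the exit code. Below is a minimal Python sketch; the function names and outcome labels are made up for illustration, and the job itself is an assumption rather than part of the setup described here.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {
    0: "in-sync",  # plan succeeded, diff is empty
    1: "error",    # plan itself failed
    2: "changes",  # plan succeeded, diff is not empty
}


def classify_plan_exit_code(code):
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    return PLAN_OUTCOMES.get(code, "unknown")


def check_for_changes(workdir, terraform="terraform"):
    """Run plan inside `workdir` and report whether code, state and system agree."""
    result = subprocess.run(
        [terraform, "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,  # same effect as `cd cluster/DEVELOPMENT` before the call
    )
    return classify_plan_exit_code(result.returncode)
```

Note that exit code 2 does not tell manual changes apart from HCL changes that simply have not been applied yet, so a human still has to read the plan output.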

Default values are stored in state

Another interesting case is when state has items that are not in the Terraform code. I guess that happens, because Terraform stores in state the evaluated state from the system (for us: Atlas service), with the created resource IDs and values for default parameters. We were quite surprised that plan says bi_connector will be removed. What puzzled me even more: bi_connector was not in the code!

What happened was: bi_connector is optional, so its default values were stored in state. When the Atlas plugin was upgraded, they removed the bi_connector attribute and replaced it with bi_connector_config to adhere to the new HCL parser. That's an example of how syntax-breaking changes affect you in a surprising way, but more on that later.

Other cases of state corruption

Terraform stores the client version in the state too. That has interesting implications: if you change the state with a newer client (by running terraform apply) it will force others (i.e. the CI worker) to use the same version too. We've mitigated that by wrapping the Terraform CLI in a script that manages the version, so everyone uses exactly the same one.

On top of all that state can get into messed up state when terraform apply process terminates (i.e. by hitting Ctrl+C). To be honest I haven't experienced that myself, just heard that on Terraform training. However I can guess why: there's no transactional support in applying changes by Terraform. I understand why, distributed transactions are hard, but still I don't like it.

Client

My biggest surprise was how bad the Terraform client is. Don't get me wrong: I love that it's a standalone Go binary, so it was easy to manage specific version. The Makefile script downloads Terraform binary, then runs terraform init and terraform plan for specific cluster. On the other hand I don't want to create such wrappers, just use whatever version is compatible. That is not possible due to versioning policy and instability of the tool, but more on that later.

Managing many workspaces

As I mentioned, we have 8 clusters, each having its resource configuration in its own directory. I needed to use the downloaded binary and run it in the cluster dir, easy. After looking at the help there's a terraform apply [options] DIR param, so I tried to use that. Unfortunately I got an error about missing param values, as if it would not read the .tfvars files from the same directory. Apparently the DIR at the end does not work as you'd expect; instead Hashicorp added the -chdir global param to handle such a use case. The -chdir param landed in version 0.14, but I was upgrading to 0.13, so the clever workaround of cd cluster/DEVELOPMENT && ../../terraform apply was necessary.

Since Makefile is excellent at managing a dependency graph I thought: let's run terraform init only when necessary. That was quite easy, unless init fails for some reason and leaves the local workspace partially initialized. Then the only way to handle that in the Makefile was to remove the partially initialized workspace and start over. That is how the all too familiar Makefile cleanup target was born, which nukes the .terraform dir for every workspace and removes the binary too. I wish I better understood how the local Terraform workspace is created, because I sense there's room for improvement :)

Running apply in CI/CD pipeline

The Terraform client has been made for interactive use from the start. That's why it will force you to type yes to confirm terraform apply changes and will prompt for missing parameter values (instead of just exiting with an error). To overcome that you have to resort to CLI params like -auto-approve or -var 'foo=bar' for example. Those are necessary for any CI/CD pipeline, but such UX design looks like an afterthought. The UX with -auto-approve on CI is broken in another way: terraform apply -auto-approve does not present the changes that are going to be made. Why!? How do you verify what kind of changes were made by looking at the pipeline console then?

The solution is to run terraform plan just before apply, so we'll have a record of changes to the infra. There's another problem with that: there might be changes in the infra system state between plan and apply actions. Let's imagine a pull request workflow, where plan is run for a branch to verify that code diff has the desired effect on the infra and apply is run after merge to main branch. There could be hours, even days, between change plan and the actual application of them.

One solution to that problem would be to run the plan command again just before apply on the main branch (after merge). Ideally there should be an option to prevent apply in case the change plan takes an undesired direction. A second solution could be the plan -out parameter, which saves the change plan to a file, so apply can pick up the exact change plan that was generated before. To be honest I haven't tried that yet, but I keep wondering why -out is not the default setting and why apply does not require a change plan as input. Such design decisions keep me puzzled.
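For reference, the saved-plan variant could look roughly like the Python sketch below. It is untested against the setup described here; the tfplan file name and the helper names are assumptions, while -out, -input=false and passing a plan file to apply are regular Terraform CLI features.

```python
import subprocess


def plan_command(plan_file):
    """Command that writes the change plan to `plan_file`."""
    return ["terraform", "plan", "-input=false", "-out=" + plan_file]


def apply_command(plan_file):
    """Command that applies exactly the saved plan.

    No -auto-approve is needed: apply does not ask for confirmation
    when it is given a saved plan file.
    """
    return ["terraform", "apply", "-input=false", plan_file]


def plan_then_apply(workdir, plan_file="tfplan"):
    """Run both steps inside the cluster directory, stopping at the first failure."""
    for command in (plan_command(plan_file), apply_command(plan_file)):
        subprocess.run(command, cwd=workdir, check=True)
```

In a pull request workflow the two commands would run in separate pipeline jobs, with the plan file passed between them as a build artifact.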

Versioning

Our project started around August 2020, so Terraform 0.12 was the most recent version. I was happy about that, because 0.12 introduced syntax changes to how params should be quoted. In reality many of the existing solutions used the old syntax, which still worked in version 0.12. As a result the code became a mix of new and old syntax, which was not a problem until the upgrade to 0.13 started throwing deprecation warnings. Of course I soon learned about the terraform 0.12upgrade command, but it kept throwing syntax errors on projects that had a mix of old and new syntax. Our options were either to downgrade everything to 0.11 (which sounds silly) or to upgrade the syntax manually. I went with the latter, which involved quite a lot of going back and forth, because deprecation warnings do not show you all the occurrences of the problem, just the first one and a "there are 55 more" message. Not useful.

Of course terraform 0.12upgrade is only available in version 0.12, so even though I wanted 0.13, I first had to download the older one to try to upgrade the code. That was the reason that led me to create a Makefile wrapper for the Terraform binary. Why bother with the 0.13 upgrade after all? Well, we needed the for_each syntax feature, which was only available for modules in version 0.13. A side note: is it only me, or does adding new syntax for some cases in version 0.12.6 feel weird? The for_each feature was required to manage many Atlas user roles in a dynamic attribute.

The biggest problem is that even patch versions (x.y.Z) can introduce new features, so it might not be enough to have any 0.13.x version; rather, you have to bind to the specific 0.13.7 for all clients. In addition, the client version is stored in state, so using a newer one by accident can force an upgrade on everyone. Managing Terraform versions is really cumbersome, to the extent that tools like terraform-switcher exist. I did not want yet another interactive CLI tool for our CI server, which is why I went with a Makefile that can be used on CI and locally. In my opinion Makefile is a vastly misunderstood (and hence underused) tool, but that's a story for another time.
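A wrapper can also refuse to run when the binary on the PATH is not the pinned one. Here is a minimal Python sketch of such a guard; it is an illustration, not the Makefile described above, and the helper names are hypothetical. It relies only on the first line printed by terraform version, which has the form "Terraform vX.Y.Z".

```python
import re

# Assumed pin: the exact release every client should use.
PINNED_VERSION = "0.13.7"


def parse_terraform_version(version_output):
    """Extract 'X.Y.Z' from the output of `terraform version`."""
    match = re.search(r"Terraform v(\d+\.\d+\.\d+)", version_output)
    return match.group(1) if match else None


def is_pinned(version_output, pinned=PINNED_VERSION):
    """True only for an exact match; any other 0.13.x release is rejected."""
    return parse_terraform_version(version_output) == pinned
```

Terraform's own required_version setting in the terraform block can express the same exact-version constraint directly in HCL.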

Since the start of our project (August 2020) Hashicorp released 3 new major versions. The newest one is 0.15.4 and we've started on 0.12.5 back then. It means that over the last 6 months 3 breaking change releases have been made, one every second month! That is rapid change rate for something that should be stable and boring like infrastructure. The other surprising fact is that initial Terraform release was in 2014, so the project is almost 7 years old.

Parting thoughts

So far it was mostly about facts and my experiences around using and upgrading Terraform. Now it's time for a little opinion and some thoughts about the project. My experience with managing Terraform spans just the last few months; previously I'd mostly used Terraform setups made by someone else.

Based on the rate of breaking changes in the last 6 months I'm worried about the stability of the product. In my opinion it should have a big BETA badge to warn about that, even despite being 7 years in development. Sure the 0.x.y versioning scheme might indicate that it's not ready for prime time and API breaking changes will happen. I understand that, even SemVer allows breaking changes for minor version bump (0.x.0) if it's not 1.0.0 yet. For me that looks like a lazy policy on the Hashicorp side that after years of development they still have a policy open to breaking changes. I thought even Facebook dropped "move fast and break things" attitude by now...

Even though development seems to be dynamic, I see that legacy is creeping in already. Just look at the DIR command line parameter that is going to be replaced by -chdir, which is clearly stated in the commit message. Even for the Terraform documentation alone it is going to be quite a lot of work. What about all the solutions and workarounds existing in the wild, e.g. on Stack Overflow or in blog posts? This is going to have a similar impact as using old syntax (0.11 and older) in a new project, because someone found a solution somewhere. Without a clearly communicated versioning policy it will never get in order.

The biggest surprise to me is that Terraform has ADOPT status on TechRadar April 2019 from ThoughtWorks. Maybe the timing plays a role here, since it was way before the major change in syntax in 0.12 version shipped mid-2020? I wonder if Terraform is still state of the art, or there are better Infrastructure as Code solutions recommended by ThoughtWorks or others?

DNS performance issues in Kubernetes cluster

Adam Brodziak — Tue, 27 Apr 2021 15:27:07 GMT

One day we noticed a lot of ERROR getaddrinfo EAI_AGAIN log events in our Kubernetes cluster. All NodeJS apps had been having this problem from time to time, because the NodeJS runtime does not cache getaddrinfo() function results. The JVM, on the other hand, does cache them, so Java apps were fairly silent.

That gave a clear indication that the problem was on the DNS server. Soon after I noticed that 1 out of 3 kube-dns pods was failing, so we were running at 2/3 capacity. Restarting would be enough of a fix, but being an "SRE wannabe" I wanted to make sure we improve the situation for the future.

Googling for the problem

Soon I found a post listing potential causes for the issue, among others:

NodeJS performance issues with dns.lookup() internal implementation (yeah, but I can't change that).
CPU throttling in K8s (unlikely, but very hard to pin down).
Linux networking race conditions in DNAT, fixed in 5.x kernel (we run older version, so it was probable cause).

DNS cache in app? Not for apps in Kubernetes

The above post gave two solutions. One was to install an NPM package in the Node app that would cache the DNS entries. Not a solution I'm particularly fond of, as I prefer such a simple thing as domain name resolution to be available in the cluster. Also, DNS serves as the service discovery mechanism in a Kubernetes cluster, which makes it even more important to keep records up to date.

NodeLocal DNSCache in Kubernetes cluster

Better solution was to use NodeLocal DNSCache in Kubernetes cluster. Essentially that runs DNS on every cluster node as a DaemonSet. Definitely the way to go for most cases, because it improves both performance and resilience for very little cost. Unfortunately it requires K8s 1.18 version, which we did not have :(

I don't know how domain name resolution works!

Something about this issue kept bugging me though, I thought I was missing something. Our Kube apps work in microservices fashion, so they communicate with many other services a lot. One of the main page components connects to 13 other services, but that is not unusual. All of those links are full URLs, domains setup to public ELB servers. Still you'd expect that kube-dns caches those names, so resolution is fast. Well yes, but actually no.

Enlightenment came with a post about the options ndots setting in the /etc/resolv.conf file. In there Marco Pracucci explains how DNS resolving works for non-qualified domain names and how options ndots:5 affects this. I encourage you to read it through (with comments!), but here's the gist and some corrections.

DNS for Kubernetes Pod and Service

Kubernetes creates internal domain names for Pod and Service objects for the purpose of service discovery pattern. On top of that the namespace is added to the domain as well, so you can have data service in the prod namespace. If pod in test namespace tries to connect to data host, DNS will not resolve it, but data.prod would be fine. However that allows adding data service to test namespace, so data would have different IP depending whether DNS query is fired from test or prod namespace.

My guess is that this dynamic nature and flexibility is the reason why Kubernetes injects the following into /etc/resolv.conf for every pod:

nameserver 10.32.0.10
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Why `ndots:5` affects name resolution performance?

With the ndots:5 setting, according to the docs, every name that has fewer than 5 dots will not be looked up as-is first; instead, each item from the search list is appended to it and tried in turn. So in most cases 3 cluster-local lookups will be attempted (and fail) before the name you actually asked for is queried! For more on why it happens in this particular order, read an excellent post on glibc getaddrinfo() function internals.
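To make the order of lookups concrete, here is a small Python model of the search-list logic. It is a simplified sketch of the resolv.conf rules as I understand them, not glibc code, and the function name is made up. The real resolver stops at the first candidate that returns an answer.

```python
def candidate_queries(name, search_domains, ndots=5):
    """Return the names a resolver would try, in order, for a given resolv.conf."""
    # A trailing dot marks the name as fully qualified: the search list is skipped.
    if name.endswith("."):
        return [name]
    with_search = [name + "." + domain + "." for domain in search_domains]
    as_is = [name + "."]
    # Enough dots: try the name as-is first, then fall back to the search list.
    if name.count(".") >= ndots:
        return as_is + with_search
    # Too few dots: walk the search list first, the name as-is comes last.
    return with_search + as_is
```

For a pod in the prod namespace, candidate_queries("api.example.com", ["prod.svc.cluster.local", "svc.cluster.local", "cluster.local"]) returns four names, and only the last one is the public domain. With a trailing dot ("api.example.com.") the list shrinks to a single entry.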

Solutions proposed and my comment

First: Switching to a Fully Qualified Domain Name (FQDN) for public domains is generally good advice. It will not only make name resolution faster, but also prevent the security issue explained in RFC1535 (quite short for an RFC!). I can't see any drawback, even though it looks like a quick and dirty solution.

Second: Customize ndots with the dnsConfig setting. That might make sense for specific pods that mostly connect to public domains. You'd have to be careful to pick an ndots value that speeds things up, but does not mess with the Kube DNS setup for service discovery. In other words: there might be dragons.

What is the ultimate solution then?

As I've tried to explain, domain name resolution is a very nuanced problem, much more awkward than I initially anticipated. Keeping in mind that DNS should be managed on the cluster, I'd approach solutions in this particular order:

Setup NodeLocal DNSCache on the cluster.
Use Fully Qualified Domain Name (FQDN) for specific apps.
Set ndots to lower value for specific pods.
Try DNS cache in language runtime (JVM, NodeJS) or in code.

In the case of the failure I described at the beginning, bringing all 3 kube-dns pods back up was enough. We probably still suffer from a lot of local resolutions due to the ndots:5 setting. It would be nice to know if switching to FQDN made applications faster, but that requires much more granular metrics. Maybe next time ;)

Ad-hoc documentation

Adam Brodziak — Tue, 20 Apr 2021 20:57:46 GMT

The promise: With just little bit more effort you can create an ad-hoc documentation that is searchable and useful.

Writing documentation is a tedious process, that's why in Agile we don't write documentation, right? Wrong! The truth is that as software developers we're already typing quite a lot of docs, be it code comments, git messages, pull request comments, chats, mails, Jira comments, etc. By doing those in a more thoughtful way we can make them useful for our future selves and for our colleagues.

Work on the message

Essentially it comes down to switching to low-context (vs high-context) communication style. Practical tips below.

Provide more details

When you're in the middle of doing something the context is fresh in your mind, so it makes sense to use shortcuts like:

Restarted and it's working.

That's completely valid message. But what it is about, do you know? Will you know in 6 months from now? How about this example:

Deleted the agent pod, so it got restarted and it's responding to HTTP requests again.

Now that makes sense even to you, dear reader, because it includes some context:

What exactly has been done: Deleted the pod.
Specific subject is mentioned: agent instead of it pronoun.
The effect is described: Restarted, responding.
How it was verified: Responding to HTTP instead of working.

Little effort, great effect.

Summarize often

Chats are usually in-the-moment, high-context conversations like:

Al: kubectl not working
Bob: is it in /usr/bin?
Al: yeah
Bob: what is the error?
Al: `permission denied: kubectl`
Bob: is it executable?
Al: dunno
Bob: try `chmod +x /usr/bin/kubectl`
Al: another error
Bob: use the `sudo` Luke!
Al: worked <3

This conversation has the information to fix the problem, but will you be able to find it in 6 months? Even if you find it, how much time are you going to spend trying to read it all and get the gist? Maybe write a quick summary:

After downloading kubectl put it in /usr/bin dir and make it executable using chmod +x /usr/bin/kubectl. Both require root privileges (use sudo). Otherwise you'd see permission denied error.

Some effort and you're becoming the owner of the solution.

Rephrase what other people said

In addition to the summary you can also rephrase some of the points. It serves two purposes:

You make sure both sides understand the same.
You add additional keywords to match future search terms.

This effort will show you as a good communicator.

Make it searchable

Information is useful only if it can be found. This is what made Google a giant. It's a similar story with Slack, whose name is an acronym for Searchable Log of All Conversation and Knowledge.

Funnel comments into Slack

Slack has pretty good search capabilities and a lot of integrations with other services. We've got Jira comments syndicated to Slack automatically. The same can be done with other sources of docs like code review comments, git messages and other temporal texts.

I guess notes, mails and documents can be copy-pasted to Slack as well. It might be good to keep a reference to the original document, but having a snapshot copy in Slack might be useful too.

Code comments and git messages

GitHub has got pretty good search across many projects. GitLab can do search only within single project, unfortunately. BitBucket has limited cross-project search capabilities. Any of that can be syndicated to Slack.

Sourcegraph has semantic code search, because it actually understands the code being indexed. It has some powers of IDE for search. The limitation is it can't search code and git at the same time.

Use task numbers and tags in the message

Quite obvious, but sometimes forgotten, is to put task number (i.e. Jira identifier) in the git commit, code comment, Slack message, etc. The same way regular tags would work, especially if we have some shared vocabulary of such tags, i.e. #versionBump indicating it's just a version increment.
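A check like this can run in a git commit-msg hook or in a chat bot. The Python sketch below is only an illustration: the Jira key format and the hashtag pattern are assumptions, and the function names are made up.

```python
import re

JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. PROJ-123 (assumed key format)
HASHTAG = re.compile(r"#([A-Za-z][A-Za-z0-9]*)")  # e.g. #versionBump


def extract_markers(message):
    """Return the task numbers and tags found in a message."""
    return {
        "tasks": JIRA_KEY.findall(message),
        "tags": HASHTAG.findall(message),
    }


def is_searchable(message):
    """A message counts as searchable if it carries a task number or a tag."""
    markers = extract_markers(message)
    return bool(markers["tasks"] or markers["tags"])
```

A hook could reject (or just warn about) any message for which is_searchable returns False.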

Summary

With little additional effort you can start building institutional knowledge. Such documentation is contextual and temporal, so it made sense at that time and in the given circumstances. It won't replace permanent documentation (i.e. specifications, reference manuals), but it is a light and agile addition that is almost free.

Every time when you write a comment stop for a second and think about your future self while writing. You'll thank me later.

Dockerfile good practices for Node and NPM

Adam Brodziak — Wed, 27 Jan 2021 16:24:32 GMT

The goal is to produce minimal image to keep the size low and reduce attack surface. Also we want to make the docker build process fast by removing unnecessary steps and using practices outlined below to leverage internal build cache.

Besides pure Docker I'll present docker-compose tool, which is a tool to start many Docker containers that are required to run the application, i.e. frontend server, backend server, database.

NodeJS and NPM examples

Here I'll be using NodeJS and NPM in examples, but most of those patterns can be applied to other runtimes as well.

Leverage non-root user

Default NodeJS images have node user, but it has to be enabled. The best option is to use it before any NPM dependencies or code are added.

# Copy files as a non-root user. The `node` user is built in the Node image.
WORKDIR /usr/src/app
RUN chown node:node ./
USER node

Node process no longer runs with root privileges. By such simple change you've increased security of the image a lot.

Set NODE_ENV=production by default

This is the most important one, as it affects NPM as described below. In short, NODE_ENV=production switches middlewares and dependencies to an efficient code path and NPM installs only packages listed in dependencies. Packages in devDependencies and peerDependencies are ignored.

# Defaults to production, docker-compose overrides this to development on build and run.
ARG NODE_ENV=production
ENV NODE_ENV $NODE_ENV

For local development we can override its value. Here's an example docker-compose.yml file that builds and runs our Docker image in development mode:

version: '3'
services:
  myapp:
    build:
      args:
        - NODE_ENV=development
      context: ./
    environment:
      - NODE_ENV=development

To start the application just type docker-compose up and it will build an image on first start and then run the container(s) defined in YAML.

Install NPM dependencies before adding code

The reason is simple: dependencies change way less often than code, so we can leverage build cache. The biggest difference can be seen if you have any C++ modules that require compiling during install.

# Install dependencies first, as they change less often than code.
COPY package.json package-lock.json* ./
RUN npm ci && npm cache clean --force
COPY ./src ./src

The npm ci command installs exactly the packages from the lock file, which gives reproducible builds on a CI server. I recommend using it by default. Have a read of how it differs from npm install in the official docs.

The magic happens in &&, which executes two commands in a single RUN instruction, producing one Docker image layer. This layer is then cached, so a subsequent run of the same instruction (with the same package*.json files) will use the cache.

Since the build uses the Docker image cache, the NPM cache is not needed, so we can clean the cache of downloaded packages. This way the resulting image is smaller.

$ docker build .
Sending build context to Docker daemon
Step 2/5 : COPY package.json package-lock.json* ./
 ---> Using cache
 ---> 6fb28308975d
Step 3/5 : RUN npm ci && npm cache clean --force
 ---> Using cache
 ---> 0a6bd71d2c2d
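One way to picture the "Using cache" lines above is as a fingerprint over the copied files: as long as package.json and package-lock.json are byte-for-byte the same, the COPY layer and the npm ci layer after it can be reused. Here is a rough sketch in Node. The dependencyFingerprint helper is hypothetical and only illustrates the idea; Docker computes its real cache key inside the build engine.

```javascript
const crypto = require('crypto');

// Illustration only: Docker's real cache key is internal to the build engine.
// The idea is the same, though: same file contents, same fingerprint, cache hit.
function dependencyFingerprint(files) {
  const hash = crypto.createHash('sha256');
  // Sort by name so the result does not depend on object key order.
  for (const name of Object.keys(files).sort()) {
    hash.update(name);
    hash.update('\0');
    hash.update(files[name]);
    hash.update('\0');
  }
  return hash.digest('hex');
}

const before = dependencyFingerprint({
  'package.json': '{"dependencies":{"express":"^4.17.1"}}',
  'package-lock.json': '{"lockfileVersion":1}',
});
// Editing application code does not touch these two files...
const afterCodeChange = dependencyFingerprint({
  'package.json': '{"dependencies":{"express":"^4.17.1"}}',
  'package-lock.json': '{"lockfileVersion":1}',
});
// ...but adding a dependency does.
const afterNewDependency = dependencyFingerprint({
  'package.json': '{"dependencies":{"express":"^4.17.1","pg":"^8.5.1"}}',
  'package-lock.json': '{"lockfileVersion":1}',
});

console.log(before === afterCodeChange);    // true: the npm ci layer can be reused
console.log(before === afterNewDependency); // false: npm ci has to run again
```

This is also why the source code is copied in a later instruction: a change under ./src leaves the fingerprint of the dependency files untouched.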

While we're at it, I recommend adding a node_modules line to the .dockerignore file in order to avoid adding the local version of modules to the resulting image. While npm ci would remove any existing node_modules directory, there's no point in increasing the size of the image layer.

Use node (not NPM) to start the server

Last, but not least: avoid npm start as the command to start the application in a container. Using NPM seems reasonable, because this is how you are used to running the application locally. However, with Docker and Kubernetes it's a bit more complicated.

The main problem with npm start is that NPM does not pass the SIGTERM OS signal to the Node process. Because of that Node is not able to do cleanup before exit. Docker and Kubernetes send SIGTERM to the container process when they want to stop it.

This can lead to many issues, from hanging database connections to open file descriptors. Notice that it's not only your application code that might react to SIGTERM; the framework or some libraries might do so too.

The good practice is to simply call Node directly.

# Execute NodeJS (not NPM script) to handle SIGTERM and SIGINT signals.
CMD ["node", "./src/index.js"]

Notice that we've used square brackets to denote the exec form of the CMD instruction. If a plain string had been used instead, the container would start /bin/sh -c as the main process and OS signals would be lost again.

Having node as the main PID 1 process is also not ideal, but at least SIGTERM and other signals can be handled in application code. You can test it yourself using the simplest NodeJS server code:

const http = require('http');
const port = process.env.PORT || 8000;
http.createServer(function (req, res) {
    res.end(req.url);
}).listen(port);
console.log(`Server running at http://localhost:${port}/ ...`);
// Signal handling
process.on('SIGTERM', function() {
    console.log('SIGTERM: shutting down...');
    // Exit explicitly: once a SIGTERM handler is registered, Node no longer exits on its own.
    process.exit(0);
});

Now try to execute docker container stop against the newly created container. Then change the CMD line to use NPM and see that SIGTERM is not caught.

Such an event handler is the place where you want to clean up all the resources created or opened by the application.
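To make the cleanup step more concrete, here is a minimal sketch of a shutdown handler that first stops the HTTP server and then releases other resources. The createShutdownHandler helper and the db.end() call in the usage comment are hypothetical names used for illustration, not an API of Node or any framework.

```javascript
// Hypothetical helper (not part of Node or any framework): builds a signal
// handler that stops the HTTP server, runs cleanup, and then exits.
function createShutdownHandler({ server, cleanup = async () => {}, exit = process.exit }) {
  let shuttingDown = false;
  return async function onSignal(signal) {
    if (shuttingDown) return; // a second signal must not restart the sequence
    shuttingDown = true;
    console.log(`${signal}: shutting down...`);
    // Stop accepting new connections; the callback fires once open ones end.
    await new Promise((resolve) => server.close(resolve));
    await cleanup(); // e.g. close database pools, flush buffers
    exit(0);
  };
}

// Possible wiring, assuming `server` is the value returned by
// http.createServer(...).listen(port) and `db` is your (hypothetical) database client:
// const onSignal = createShutdownHandler({ server, cleanup: () => db.end() });
// process.on('SIGTERM', onSignal);
// process.on('SIGINT', onSignal);
```

Passing exit as a parameter is a design choice that keeps the handler testable: a test can hand in a fake server and a fake exit function and check the order of the calls. Keep in mind that docker stop sends SIGKILL after a grace period (10 seconds by default), so the cleanup has to finish within that window.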

In NestJS, for example, add an app.enableShutdownHooks() call in bootstrap, as described in the Nest docs.

Builder pattern

Let's say your use case is to turn SASS/SCSS into plain CSS using the Ruby Compass compiler. It has a different stack than the rest of the Node app, so we will need a separate Docker image. Here's how to use such a separate temporary image for the compilation step.

Modern Docker versions support multi-stage builds. Essentially this allows many FROM instructions in a Dockerfile, but only the last FROM will be used as the base for our image. It means that all the layers of the other stages will be discarded, so the resulting image is going to be small.

FROM rubygem/compass AS builder
COPY ./src/public /dist
WORKDIR /dist
RUN compass compile
# Output: css/app.css

The Docker build engine will save the resulting files in a temporary image that can be used in a COPY instruction for our final image:

# Copy compiled CSS styles from builder image.
COPY --from=builder /dist/css ./dist/css

This instruction copies files from the builder's /dist/css folder, in our case css/app.css only. All the other layers of the builder stage are discarded and do not end up in the final image.

The same pattern can be used for any other compilation or transpilation tool, like Babel, Webpack, TypeScript, etc. In fact it makes sense whenever we have to install a development tool that should not be part of the production build. The same applies to installing git, a C++ compiler, or development versions of packages (packages with the -dev suffix).

For some JavaScript projects you might notice that npm install or npm ci is done twice: in the builder and in the final image. It could mean that you mix frontend (e.g. React.js) and backend (e.g. Express.js) libraries in a single package.json file. My advice is to separate those frontend and backend dependencies, but going through the exact strategies deserves another blog post. Let me know if you're interested.

Putting it all together

Here's an example Dockerfile for easy copy&paste into your project. It covers all the good practices we've discussed earlier.

# Separate builder stage to compile SASS, so we can copy just the resulting CSS files.
FROM rubygem/compass AS builder
COPY ./src/public /dist
WORKDIR /dist
RUN compass compile
# Output: css/app.css
# Use NodeJS server for the app.
FROM node:12
# Copy files as a non-root user. The `node` user is built in the Node image.
WORKDIR /usr/src/app
RUN chown node:node ./
USER node
# Defaults to production, docker-compose overrides this to development on build and run.
ARG NODE_ENV=production
ENV NODE_ENV $NODE_ENV
# Install dependencies first, as they change less often than code.
COPY package.json package-lock.json* ./
RUN npm ci && npm cache clean --force
COPY ./src ./src
# Copy compiled CSS styles from builder image.
COPY --from=builder /dist/css ./dist/css
# Execute NodeJS (not NPM script) to handle SIGTERM and SIGINT signals.
CMD ["node", "./src/index.js"]

The Dockerfile above contains all the essential good practices for a JavaScript project (either a NodeJS server or some frontend). In case you're interested in more advanced optimizations, check out the repository documenting more good defaults for Node on Docker: https://github.com/BretFisher/node-docker-good-defaults

Spread the knowledge about good practices in Dockerfile creation.

Does Docker make sense in 2021?

Adam Brodziak — Wed, 06 Jan 2021 13:02:03 GMT

In early December 2020 the news broke that Kubernetes 1.20 "deprecates Docker". For now it only means that Kubernetes will display a warning. Strictly speaking, "deprecates Docker" refers to dockershim, which I explain in more detail below.

Docker support will be removed only in version 1.22, which is planned for the second half of 2021. That is exactly why I think 2021 is the beginning of the end for Docker.

What are Docker and Kubernetes?

Docker lets us package our application (e.g. a JAR file with compiled Java code) together with its runtime environment (e.g. the OpenJRE JVM) into a single image, from which containers are created. In fact, all operating system dependencies are added to the Docker image. This makes it possible to use the same image on a developer's laptop, in the test environment and in production. In theory.

Kubernetes is an orchestrator, which means it manages many containers and allocates them resources (CPU, RAM, storage) from many machines in a cluster. It is also responsible for the container lifecycle and for joining several containers into one unit (a Pod). So it works one level above Docker, managing many containers across many machines.

If a Docker container is the equivalent of yesterday's virtual machine, then Kubernetes is the container-world equivalent of yesterday's hosting or cloud providers. Docker (or rather Docker Compose) lets us run various processes, connect them into a network and allocate storage within a single computer. Kubernetes allows the same within a cluster made up of many computers.

Kubernetes reduced Docker to a component whose job is to run containers. Thanks to the CRI (Container Runtime Interface) standard, these components are interchangeable. Currently only containerd and cri-o are CRI-compliant. Docker requires the dockershim adapter, which is exactly what the Kubernetes maintainers want to get rid of.

Why is Docker important?

Docker is a milestone when it comes to popularizing containerization. When I first heard about it in 2013, on the Coder Radio podcast from the founders of dotCloud (later Docker Inc), I noticed the potential. Barely a year later Docker let me run a complicated legacy system on my own computer, and that was when I knew a breakthrough was coming.

Over the last few years Docker grew from a side project at dotCloud into a business worth billions of dollars. Despite USD 280 million of venture capital funding, Docker Inc did not do well commercially, and its enterprise business (Docker Enterprise) was bought by Mirantis. The acquisition price was not made public, which is interesting. My guess is that it was a bargain ;)

Mirantis' main product is Kubernetes-as-a-service, where they compete with VMWare and, of course, the cloud providers. Kubernetes matters to them so much that they wanted to maintain Docker Swarm for only 2 years, but they quickly backed out of that declaration, probably under pressure from existing customers. I personally know a company with a large Docker Swarm installation, and migrating to another solution is no easy matter.

What is Docker Swarm?

Docker Swarm is an orchestrator built into the Docker distribution. You could say it is a sort of Kubernetes that is as simple to operate as plain Docker. Of course there is the added management of nodes, replicas and networks, but it is still a much simplified view of a cluster compared to Kubernetes.

So can we say that Mirantis bought a competitor to its flagship product? Sort of. By now Docker Swarm has outgrown its teething problems (e.g. the bug with assigning duplicated IP addresses), so it looks like a stable product for small teams. The problem is that small teams and small clusters bring in little $$$.

Besides, Docker Swarm is simply too simple. In our team one person was able to create and operate a Docker Swarm cluster. Apart from the bugs mentioned above there is not much work involved. Major updates arrive together with Docker and not very often, so that is one more worry gone.

What is in it for Mirantis (the owner of Docker Enterprise)?

You are probably wondering: if Mirantis makes money on Kubernetes-as-a-service for big players, and Kubernetes is removing Docker support, where is the sense in that? Exactly. From my perspective, a company that makes money on Kubernetes has no reason to invest in Docker once Kubernetes stops supporting it.

Let's think about what options Mirantis has regarding Docker. I see several possible directions, but all of them bode poorly for Docker:

Bet on containerd, but then they demote themselves to a supplier of a component close to the Linux kernel. It will be hard to make money on that, especially if current contracts cover support on operating systems other than Linux.
Develop Docker Swarm. The problem is that Swarm would have to become as complex as Kubernetes, and what would its advantage be then? For now Swarm is suitable for small projects, but that is small money.
Turn Docker Engine into the best tool for developing applications for Kubernetes. Something like https://skaffold.dev maybe? But then the Docker name, as well as Docker's technical debt (more on that later), would be a burden.

Maybe sell Docker as a tool for developers?

The last option is interesting and could save Docker as we know it: a great tool for developers to quickly spin up a complicated system in a controlled, local environment that closely resembles production. Unfortunately, selling tools to developers is a tough business and usually not a very lucrative one.

The aforementioned VMWare learned that lesson with its acquisition of Spring Source. In short, Spring Source tried to sell Spring Framework to J2EE (Java) developers as a better application development framework. It turned out to be a very hard business.

The turning point came when Spring started to be sold to support and IT departments as a J2EE-compliant platform. That is where the real money is in the world of enterprise software ;) I recommend watching what Rod Johnson (until recently Spring Source CEO) has to say on the subject.

On the other hand, moving towards building developer tools would make Mirantis a competitor to Docker Inc, or rather to what is left of that company. I assume the agreements signed during the acquisition forbid the companies from entering each other's markets, so Docker Inc will stay with developer support and developer tools, while Mirantis will work with enterprise customers, selling them deployment and support services.

Mirantis takes care of its existing enterprise customers

A few days later Mirantis announced that it will maintain dockershim (the adapter between Docker and the CRI interface) together with Docker Inc. As the reason they cite their existing customers, who have more complex Kubernetes installations that depend on things specific to Docker Engine. What does it change? The situation looks very similar to the one with Docker Swarm. Mirantis will have even more technical debt to maintain (more on that below).

I must stress that the above is only my speculation. I have no insight into the agreements between Docker Inc and Mirantis, nor into their strategy. I rely solely on official press releases and on observing the market. Trying to put yourself in the shoes of a big company and guess what it might do, based on your own experience, is an interesting mental exercise. It lets you look from a distance at the companies behind the technology we use. I recommend it.

If anyone is interested in Mirantis, it looks like it has an office in Poznań and is hiring for technical and sales departments: https://www.mirantis.com/careers/

Technical debt in Docker

The market situation alone is reason enough not to invest more in Docker as a tool for solving business problems. Unfortunately there is also the technical debt that Docker has accumulated over the years, despite several strategic refactorings (including extracting runc and containerd) along the way.

Problems with the union file system

Docker already has over 7 years of history and that legacy is starting to weigh it down. In the beginning Docker was an interface to Linux kernel features such as namespaces, union file systems (union FS) and control groups (cgroups). Over time the ready-made union file system solutions, like AUFS, were no longer enough. Docker Inc decided to add support for the kernel's overlay file system. That first overlay driver turned out so unsuccessful that overlay2 was created very quickly and was soon marked as the recommended one.

Despite Docker's recommendation, for years I avoided overlay and overlay2 like the plague because of frequent frustration with bugs and data loss. Of course the bugs were fixed over time, but in the days of Ubuntu 14.04 or 16.04 LTS kernel updates were not that frequent. Adding support for btrfs (which has union FS functionality) did not improve the situation either: I still remember a failure on the only machine in the cluster that used btrfs as its file system.

Recently I heard someone say that the whole construction of the file system in a container and the use of union FS is an "elegant hack", and honestly that is a very good summary.

Problems with the root account and dependencies

Another unfortunate decision was running the Docker daemon as the root user, i.e. the administrator who can do anything on a given machine. This makes a "container breakout" attack much more dangerous than if the daemon ran with fewer privileges. Besides, the very use of a daemon is legacy too, because Podman (Red Hat's alternative to Docker) does not need any daemon to run containers.

By default the process in a container also runs with root privileges. That makes a "privilege escalation" attack easier, taking control not only of the application but of the whole machine the container runs on. Nowadays this is considered bad practice and creating a user with limited privileges is recommended, but not all images use such a configuration.

The next problem follows from the above and from treating a container as a "lightweight virtual machine". The point is that every container has the full userspace of a given Linux distribution. Even though the host server runs CentOS, one container layers all the Ubuntu directories on top of it, and another the Debian ones. This construction significantly increases the attack surface of a given container. What is worse, it is often a different attack surface than that of the host system (CentOS vs Ubuntu).

In other words: even if our container is a simple, statically compiled microservice written in Go, it "drags" the whole of Ubuntu along (for example). Using small Alpine Linux images is not the solution, unless we also use Alpine on the host machine. A better solution is distroless images, which get rid of most of the unnecessary libraries from Debian.

What does it look like on other operating systems?

So far I have been talking about Docker on Linux, because that is Docker's native operating system. It is worth remembering that in the beginning Docker was a simple wrapper around mechanisms provided by the Linux kernel, such as namespaces, cgroups (control groups) and union file systems. Today runc takes care of that layer, but it is an integral part of Docker Engine.

Personally I have no experience with Docker on systems other than Linux. As far as I know, it relies, one way or another, on virtualizing the Linux kernel on those systems. In the early years I used to run TinyCore Linux (it takes up barely a few MB) on VirtualBox myself to test features that were not yet in the stable kernel. Years have passed, but the principle remains the same.

As for Windows 10, Microsoft invests a lot in "developer experience". In practice WSL and WSL2 are about pulling in almost entire Linux distributions, which become an integral part of the system like any other application. As a result Docker should work well on Windows, because Microsoft has an interest in attracting developers.

Let's also remember that most systems in the Microsoft Azure cloud run on Linux. So it makes sense for the same tools to work in the cloud and on a developer's laptop. Does it also mean that Microsoft will invest in native containers on Windows so that they run efficiently in Azure? Honestly I have no idea, but I would be glad to hear about production use of Docker on Windows.

As for Apple, I do not see them being interested in the developer market, even though a few years ago a MacBook was a common sight at programming conferences. So far I have heard that Docker on macOS sucks, mainly because of file synchronization latency. In the era of ARM-based M1 processors there is also the problem of cross-compiling for x86 and ARM. I am very curious how this will develop.

What are the alternatives to Docker?

As I already mentioned, Docker is a milestone because it popularized the notion of containerization. It was neither the first such technology (jails on FreeBSD and chroot on Linux already existed) nor the only one. Quite a few competitors appeared; some of those technologies have already been forgotten, others were taken over and adopted as part of a larger solution.

In fact, the competing containerization technologies are entire ecosystems managed by technology companies or foundations:

rkt from CoreOS (deprecated), taken over by Red Hat
cri-o from Red Hat, which together with Podman + buildah forms a new generation of tools
containerd, "pulled out" of the Docker codebase and managed by the CNCF (Cloud Native Computing Foundation)

It is worth mentioning kaniko from Google, which lets you build Docker images without root (similarly to buildah from Red Hat). Since Kubernetes also comes from Google, I would not be surprised at all if Google released a CRI-compliant alternative to containerd.

These really are separate ecosystems. You can see that big players like Red Hat or Google have a lot to gain from the Kubernetes deployment pie. Mirantis, in turn, only has Docker. Why use Docker if you can swap it for newer, lighter components?

What does it mean for me?

IMHO it depends on your role and on how deeply you are embedded in Docker as such. I mean, above all, pushing Docker to its limits, often against good practices, because those were only being formed while the technology was spreading on the one hand and maturing on the other.

Over the years Docker Engine grew and evolved into a modular architecture. Different implementations of components such as logging appeared. This allowed standardization and a simpler application architecture. It was enough for every application to log to the Linux stdout and stderr streams, and Docker took care of collecting the logs and storing them locally. It also provided an interface for reading those logs in the form of the docker logs command.

It is very convenient for developers to have a single tool for browsing logs from applications written in Java, Node, PHP or other languages. For people who maintain systems (IT Ops), on the other hand, other things matter too: a guarantee that logs will not be lost, their retention, and how quickly logs will fill up the disk. That is an entirely different set of problems, which Docker does not solve.

One machine versus a cluster

What works great on a single machine does not necessarily work in a cluster. An example is docker service logs, the equivalent of the log viewer for Docker Swarm (the cluster orchestrator built into Docker). Unfortunately, in this case we often see logs out of order, which is probably caused by time differences between the individual machines in the cluster.

Using NTP mitigates the clock difference problem to some extent, but it is not a cure. For logs and their ordering it is better to use a central aggregator that can assign a timestamp at the moment a log is received. However, that is a solution of an entirely different caliber, though usually a necessary one in a distributed system such as a cluster.
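The idea of stamping logs on receipt can be sketched in a few lines of Node. This is only an in-memory illustration with hypothetical names (LogAggregator, receive, ordered); a real setup would use a log shipper and a central store instead.

```javascript
// Hypothetical in-memory aggregator; it stamps each entry on receipt
// with its own clock instead of trusting the sender's clock.
class LogAggregator {
  constructor(now = Date.now) {
    this.now = now;
    this.sequence = 0;
    this.entries = [];
  }

  receive(host, message, sentAt) {
    this.entries.push({
      host,
      message,
      sentAt,                    // sender's clock, may be skewed
      receivedAt: this.now(),    // aggregator's clock, one source of truth
      sequence: this.sequence++, // tie-breaker within the same millisecond
    });
  }

  ordered() {
    return [...this.entries].sort(
      (a, b) => a.receivedAt - b.receivedAt || a.sequence - b.sequence
    );
  }
}

const aggregator = new LogAggregator();
// node-2's clock runs 5 seconds behind, so its sentAt looks older.
aggregator.receive('node-1', 'request started', 1000000);
aggregator.receive('node-2', 'request finished', 995000);
console.log(aggregator.ordered().map((e) => `${e.host}: ${e.message}`));
```

Sorting by sentAt would put the node-2 entry first, even though it arrived second. Ordering by the receipt stamp keeps the arrival order, at the cost of ignoring network delay between the node and the aggregator.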

To sum up: it is great that Docker simplifies and standardizes many things. Unfortunately those simplifications only work if we operate on a single machine (as with logs). Once we move to the cluster level things get very complicated and the very same simplifications start to chafe.

To explain the mismatch of Docker's solutions I will use the nomenclature of the Cynefin framework. We have a distributed system problem, which is complex by nature (Complex), and we try to apply solutions meant for systems that are complicated by nature (Complicated).

In other words: the solutions chosen by Docker that work on a single machine are not necessarily good when we operate in the context of a cluster with many machines.

Developer

From a developer's perspective, little will change technically. We will still build Docker images, because they comply with the OCI (Open Container Initiative) standard. As a result any CRI-compliant runtime will be able to run those images, whether locally or in a cluster.

The most important changes, in my opinion, are in the way we think about containers. It is time to move away from the analogy of a container as a "lightweight virtual machine" and take the principles of Cloud-Native applications into account. The most important ones are: one container is one process, and resources such as files are ephemeral.

It is time to start thinking of a container as an instance of an application running somewhere out there in the cloud. Take into account that there will be many copies of the application and you never know on which machine such a container will land. The consequence is that you cannot rely on local files, because a new container instance will not have access to the ones written by the previous one.

The second matter is one process per container. Managing the container's lifetime and load balancing between instances must be left to the orchestrator (such as Kubernetes). I witnessed a problem during a database migration where some containers did not get the new database address, because PM2 (a process manager for Node) was running in the container instead of Node directly, and restarting the container did not have the desired effect.

If the target deployment environment is Kubernetes, I also recommend looking into solutions that let you conveniently run applications on a local Kubernetes cluster. I mean tools such as Skaffold (from Google), Draft (Microsoft), Tilt or KubeVela.

Docker Compose would be fine if the target environment is Docker Swarm, because they use the same YAML file format. Kubernetes, in turn, supposedly uses YAML too, but it is a completely different story. This is a very dynamically developing market, in which I will be looking for something for myself.

SRE / IT Ops

For SRE, IT Ops (or whatever the people who maintain infrastructure are called) the matter is more complicated if Docker Engine is used in Kubernetes. Perhaps it is enough to use containerd as the CRI (Container Runtime Interface) implementation and be done with it. In that case a lot depends on how many dependencies on Docker Engine have "leaked" into the infrastructure.

Take Docker-in-Docker (DinD) used for building images on a Continuous Integration (CI) server as an example. As early as 2015 DinD on CI was recognized as a bad practice, but before that knowledge had a chance to be absorbed, practice showed that DinD was a so-called quick win in a clustered CI environment.

Of course Mirantis will be very happy to listen to our worries about a complicated CI system being dependent on Docker-in-Docker or some other legacy. After all, it has committed to maintaining dockershim and that is exactly what it earns money on. I am only curious how much it costs to have such a problem taken off your hands.

I happen to be personally interested in this, because I work on a complex CI that uses DinD to build Docker images. I realize that 2021 is the time to prepare the transition, either to another CRI or to dockershim. I am afraid the latter option will mean being at the mercy of Mirantis, which may be a risk at the strategic level. We will see.

Consultant

For consultants such a transition is great news. After years of carefree Docker use in development teams, the time has come for cleaning up, teaching and applying good practices. All of that to ease the move to more restrictive environments such as Kubernetes.

Personally, after 6 years of using Docker (including 3 years of Swarm), I am learning new runtimes, orchestrators and cluster management. These are trends that are only just starting to appear on the radar of corporations, not counting the tech giants.

On the other hand, companies such as Mirantis or VMWare have a vital interest in deploying and maintaining clusters, charging a hefty fee for it. The same goes for all the cloud providers: AWS, Azure and GCP, which offer hosted Kubernetes. Suffice it to say that the independent provider Linode has offered Linode Kubernetes Engine (LKE) since 2019.

Summary

So is there anything to worry about in Kubernetes moving away from Docker? Yes and no. It seems to me that the cloud business will keep growing and more and more companies will migrate to the cloud. In such an environment containerizing applications and scaling horizontally (across many machines) is the natural direction. Perhaps we will have to live with the complications that come with using clusters.

If horizontal scaling is necessary, Kubernetes simplifies many things, even though it seems complicated in itself. It is indeed extensive, because the problem it solves (allocating resources in a cluster) is complex by nature (so-called essential complexity). In this case Kubernetes gives us the basic tools and nomenclature to deal with that complexity.

It is only a matter of time before solutions that simplify Kubernetes appear. All cloud providers already offer a managed Kubernetes cluster service, which takes a lot of operational tasks off the shoulders of IT Ops. Because Kubernetes configuration takes the form of declarative YAML files, it will be possible to build tools that let you "click together" a cluster. I will venture to say that Kubernetes YAML will be for clusters what HTML was for the World Wide Web.