Uncharted Waters Ahead. Moving Legacy Software Infrastructure to Kubernetes

Adrian Sturm · Stefan Hynek · Michelle Weidling

Typical problems of legacy hosting I

  • fragmented landscape of outdated technologies and deployment solutions

Typical problems of legacy hosting II

  • security risks
  • technical debt
  • time consuming maintenance and onboarding

General strategies to tackle these problems I

  1. Rehosting (“Lift and Shift”)
  2. Replatforming (“Lift, Tinker and Shift”)
  3. Refactoring

General strategies to tackle these problems II

[Figure: migration time and cost increase from Rehosting over Replatforming to Refactoring; based on a diagram by Red Hat]

How to get there I: Preparation

  • untangle multi-process applications
  • make applications configurable via environment variables or config files
  • log to stdout
  • avoid writing to the filesystem or at least be aware of where you write

How to get there II: Containerization

Container image declaration and build

  • packages software with all dependencies and environmental requirements
  • provides a layer of abstraction for unified handling of heterogeneous applications
  • reproducible builds and deployments may now be possible (if desirable)

How to get there III: Orchestration

Kubernetes

  • de-facto industry standard
  • optimal resource usage
  • self-healing
  • secret and configuration management
  • independent of application runtime environments
  • declarative configuration and state of deployments

How to get there IV: Continuous Deployment

  • Kubernetes deployments consist of declarative resources in .yaml files
  • all Kubernetes resources for a single application can be packaged as a Helm Chart
  • Helm Charts can then be used by continuous deployment tools like ArgoCD
  • ArgoCD synchronizes the desired and the current state of the application

ArgoCD GitOps

[Figure: ArgoCD GitOps workflow; © CNCF]

You've got there

... What now?

  • make use of the possibilities

Start the migration yourself

  • never again develop for an outdated platform
  • Step-by-step (how to get there)
    • prepare
    • containerize
    • package
    • deploy continuously
  • define development policies

Essential further reading

Thank you

A typical problem of legacy applications is a fragmented landscape of outdated technologies and deployment solutions, caused by a lack of shared responsibilities and a missing common understanding of how to deploy software. There are multiple package formats (deb, war, rpm ...), multiple target platforms (virtual machine, application server, servlet container) and multiple deployment approaches (manual install, package manager, declarative/imperative), all with custom configurations and system requirements that are hard to comprehend and often poorly documented. Our general recommendation for tackling these problems is the combination of containers, Kubernetes, and continuous deployment.

As a result, the legacy software is prone to security risks because updates are difficult to apply. Each setup is so individual and specialized that it is hard to reproduce its state and test a viable build before updating the live systems. Additionally, improvements to the deployment processes are not sustainable because they are not transferable, and, overall, no synergies between the different deployment systems can be exploited. Staff turnover easily leads to a loss of knowledge because not every research software engineer can be an expert in every deployment method in use.

To tackle these problems, three general strategies have emerged in recent years:

  1. Rehosting means moving applications to the cloud as-is.
  2. Replatforming is the modification of an application to better support the cloud environment.
  3. Refactoring, in terms of the migration to the cloud, means re-architecting applications to become cloud-native.

As you can see, migration time and cost increase considerably from Rehosting to Refactoring. We are going to consider Replatforming as the migration strategy, because it is most likely the strategy that both you and we will apply most frequently.

In order to start the transformation towards orchestrated container deployments, the application needs to be properly prepared. You have to untangle multi-process applications: a container should only ever run a single process, because the container runtime only watches the first process started in the container, i.e. the process with PID 1. The application needs to provide a simple, well-documented configuration interface, for example via environment variables or config files. Log to stdout, because this is where your container runtime collects log output. Keep your applications stateless if you can, and avoid writing to the filesystem. If you have to write to the filesystem, think about what kind of data has to be persisted. The overall goal is to make the software more portable and independent of a specific host system.
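
As a minimal sketch of what this preparation can look like in application code, here is a small Python program that reads its configuration from environment variables and logs to stdout; the variable names (APP_DB_URL, APP_LOG_LEVEL) are made up for illustration:

    import logging
    import os
    import sys

    # Configuration comes from the environment (with defaults) instead of
    # host-specific files scattered across the filesystem.
    DB_URL = os.environ.get("APP_DB_URL", "postgresql://localhost/app")
    LOG_LEVEL = os.environ.get("APP_LOG_LEVEL", "INFO")

    # Logs go to stdout, where the container runtime collects them.
    logging.basicConfig(stream=sys.stdout, level=LOG_LEVEL)
    logging.getLogger(__name__).info("starting up, database at %s", DB_URL)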

With the application prepared, it can now be wrapped in a container image. This means that the manual preparation of the runtime environment for the application is replaced by a declarative description of all required environment variables and dependencies in, for example, a Dockerfile. After the build, the container image can already be used to deploy the application in a way that is more standardized, reproducible, and easier to maintain than before.
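
A hedged sketch of such a Dockerfile, assuming the legacy application is a Java servlet application shipped as a war file (the base image tag and file names are placeholders):

    # The base image provides the servlet container; pinning a tag keeps builds reproducible.
    FROM tomcat:9.0-jre11

    # Configuration defaults that can be overridden at deploy time.
    ENV APP_LOG_LEVEL=INFO

    # The application is declared as part of the image instead of installed by hand.
    COPY target/legacy-app.war /usr/local/tomcat/webapps/ROOT.war

    EXPOSE 8080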

The next step in the transformation of the deployment is orchestrating the containerized applications with Kubernetes. Why Kubernetes? Because it is the industry standard. Kubernetes makes it easy to use and distribute the available computing resources efficiently and to scale your applications as needed. It provides mechanisms to self-heal corrupted container deployments. It has basic but built-in capabilities for secret and configuration management. It works independently of your application runtime environments and uses the host kernel, network and filesystem. All Kubernetes resources are created declaratively and report their status, so the desired and the actual state of a deployment can easily be compared. Kubernetes itself provides some means to move the current state of a resource towards its desired state: for example, if it finds a container in a "crashed" state, it will try to move it into a "running" state again.
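
As a hedged sketch of such a declarative resource, here is a Kubernetes Deployment for the container built above (the image name, labels, and port are assumptions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: legacy-app
    spec:
      replicas: 2                      # desired state: two running instances
      selector:
        matchLabels:
          app: legacy-app
      template:
        metadata:
          labels:
            app: legacy-app
        spec:
          containers:
            - name: legacy-app
              image: registry.example.org/legacy-app:1.0.0
              ports:
                - containerPort: 8080
              env:
                - name: APP_LOG_LEVEL
                  value: "INFO"
              livenessProbe:           # lets Kubernetes self-heal a hung instance
                httpGet:
                  path: /
                  port: 8080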

Now the question arises of how to manage the resources that make up our deployments. Our suggested option is to use Helm Charts together with GitOps to achieve a high level of self-documentation and to have a single source of truth for the states of the applications in the cluster. Helm Charts describe a collection of Kubernetes resources in the form of .yaml files and support templating. These Helm Charts can be referenced in a control repository that is monitored by a deployment operator: ArgoCD.
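
To illustrate the templating, here is a hedged excerpt of how the Deployment above could expose its replica count and image through the chart's values.yaml (the value names are illustrative):

    # values.yaml – per-environment configuration with defaults
    replicaCount: 2
    image:
      repository: registry.example.org/legacy-app
      tag: "1.0.0"
    ---
    # templates/deployment.yaml (excerpt) – placeholders rendered by Helm
    spec:
      replicas: {{ .Values.replicaCount }}
      template:
        spec:
          containers:
            - name: legacy-app
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"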

In the control repository you describe your application's deployment and configuration as Kubernetes resource files. The ArgoCD operator compares the desired state described in the control repository with the current state in the Kubernetes cluster. It monitors changes to the control repository and applies them to the current state. It also detects divergences of the current state – for example, the manual deletion of a resource from the cluster – and automatically attempts to restore the desired state.
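
A hedged sketch of the ArgoCD Application resource that ties this together (the repository URL, path, and namespaces are placeholders):

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: legacy-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://git.example.org/ops/control-repo.git
        targetRevision: main
        path: charts/legacy-app        # the Helm Chart inside the control repo
      destination:
        server: https://kubernetes.default.svc
        namespace: legacy-app
      syncPolicy:
        automated:
          prune: true                  # remove resources that were deleted from Git
          selfHeal: true               # revert manual changes in the cluster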

With all this at hand, there are plenty of possibilities to improve your cluster and make your life as a DevOps developer easier:

  • Vault provides a secure way to inject secrets into Kubernetes resources and container deployments.
  • cert-manager takes care of issuing and timely renewing certificates.
  • The FLOOD (FluentBit, FluentD, Opensearch, OpensearchDashboards) logging stack provides a centralized way of aggregating and inspecting log messages (blog post on lab.sub upcoming).
  • Sentry is a tool for aggregating and inspecting errors and exceptions, with SDKs for the most widely used frameworks and languages.
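
As one hedged example from this list, a cert-manager Certificate resource requesting a TLS certificate for the application's hostname (the issuer name and hostname are assumptions):

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: legacy-app-tls
      namespace: legacy-app
    spec:
      secretName: legacy-app-tls       # Secret in which the certificate is stored
      dnsNames:
        - legacy-app.example.org
      issuerRef:
        name: letsencrypt-prod         # assumes a ClusterIssuer with this name
        kind: ClusterIssuer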

So how can you, as a team of research software engineers, start achieving all this? You go step by step:

  1. Implement the configurability requirements mentioned in "Preparation". This makes it easier to run your application in different deployment environments, e.g. local, dev, production.
  2. Containerize your application and deploy the container on your current platform. This enables your application to run on another platform.
  3. Wrap your container in a Helm Chart and deploy it in a Kubernetes cluster to benefit from the advantages of the cloud.
  4. Manage the deployment of your Helm Charts with a CD tool such as ArgoCD, then relax.
  5. Work out common policies for new applications from your experience in practice.