Uncharted Waters Ahead. Moving Legacy Software Infrastructure to Kubernetes

Adrian Sturm · Stefan Hynek · Michelle Weidling

Typical problems of legacy hosting I

  • fragmented landscape of outdated technologies and deployment solutions

Typical problems of legacy hosting II

  • security risks
  • technical debt
  • time consuming maintenance and onboarding

General strategies to tackle these problems I

  1. Rehosting (“Lift and Shift”)
  2. Replatforming (“Lift, Tinker and Shift”)
  3. Refactoring

General strategies to tackle these problems II

[Figure: migration time and cost increase from Rehosting over Replatforming to Refactoring; based on a diagram by Red Hat]

How to get there I: Preparation

  • untangle multi-process applications
  • make applications configurable via environment variables or config files
  • log to stdout
  • avoid writing to the filesystem or at least be aware of where you write

How to get there II: Containerization

Container image declaration and build

  • packages software with all dependencies and environmental requirements
  • provides a layer of abstraction for unified handling of heterogeneous applications
  • reproducible builds and deployments may now be possible (if desirable)

How to get there III: Orchestration

Kubernetes

  • de-facto industry standard
  • optimal resource usage
  • self-healing
  • secret and configuration management
  • independent of application runtime environments
  • declarative configuration and state of deployments

How to get there IV: Continuous Deployment

  • Kubernetes deployments consist of declarative resources in .yaml files
  • all Kubernetes resources for a single application can be packaged as a Helm Chart
  • Helm Charts can then be used by continuous deployment tools like ArgoCD
  • ArgoCD synchronizes the desired and the current state of the application

ArgoCD GitOps

[Figure: ArgoCD GitOps workflow; © CNCF]

You've got there

... What now?

  • make use of the possibilities

Start the migration yourself

  • never again develop for an outdated platform
  • Step-by-step (how to get there)
    • prepare
    • containerize
    • package
    • deploy continuously
  • define development policies

Essential further reading

Thank you

A typical problem of legacy applications is a fragmented landscape of outdated technologies and deployment solutions, caused by a lack of shared responsibilities and a missing common understanding of how to deploy software. There are multiple package formats (deb, war, rpm ...), multiple target platforms (virtual machine, application server, servlet container) and multiple deployment approaches (manual install, package manager, declarative/imperative), all with custom configurations and system requirements that are hard to comprehend and often poorly documented. Our general recommendation for tackling these problems is the combination of containers, Kubernetes, and continuous deployment.

As a result, the legacy software is prone to security risks because updates are difficult to apply. Each setup is so individual and specialized that it is hard to reproduce its state and test a viable build before updating the live systems. Additionally, improvements to the deployment processes are not sustainable because they are not transferable, and, overall, no synergies between the different deployment systems can be exploited. Staff turnover easily leads to a loss of knowledge because not every research software engineer can be an expert in every deployment method in use.

To tackle these problems, three general strategies have emerged in recent years:

  1. Rehosting means moving applications to the cloud as-is.
  2. Replatforming is the modification of an application to better support the cloud environment.
  3. Refactoring, in terms of the migration to the cloud, means re-architecting applications to become cloud-native.

As you can see, migration time and cost increase considerably from Rehosting to Refactoring. We are going to consider Replatforming as the migration strategy, because it is most likely the strategy that both you and we will apply most frequently.

In order to start the transformation towards orchestrated container deployments, the application needs to be properly prepared. You have to untangle multi-process applications: a container should only ever run a single process, because the container runtime only watches the first process started in the container, i.e. the process with PID 1. The application needs to provide a simple, well-documented configuration interface, for example via environment variables or config files. Log to stdout, because this is where your container runtime collects log output. Keep your applications stateless if you can, and avoid writing to the filesystem. If you have to write to the filesystem, think about what kind of data has to be persisted. The overall goal is to make the software more portable and independent of a specific host system.
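
As a minimal sketch of what this preparation can look like in application code, here is a small Python program that reads its configuration from environment variables and logs to stdout; the variable names (APP_DB_URL, APP_LOG_LEVEL) are made up for illustration:

    import logging
    import os
    import sys

    # Configuration comes from the environment (with defaults) instead of
    # host-specific files scattered across the filesystem.
    DB_URL = os.environ.get("APP_DB_URL", "postgresql://localhost/app")
    LOG_LEVEL = os.environ.get("APP_LOG_LEVEL", "INFO")

    # Logs go to stdout, where the container runtime collects them.
    logging.basicConfig(stream=sys.stdout, level=LOG_LEVEL)
    logging.getLogger(__name__).info("starting up, database at %s", DB_URL)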

With the application prepared, it can now be wrapped in a container image. This means that the manual preparation of the runtime environment for the application is replaced by a declarative description of all required environment variables and dependencies in, for example, a Dockerfile. After the build, the container image can already be used to deploy the application in a way that is more standardized, reproducible, and easier to maintain than before.
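
A hedged sketch of such a Dockerfile, assuming the legacy application is a Java servlet application shipped as a war file (the base image tag and file names are placeholders):

    # The base image provides the servlet container; pinning a tag keeps builds reproducible.
    FROM tomcat:9.0-jre11

    # Configuration defaults that can be overridden at deploy time.
    ENV APP_LOG_LEVEL=INFO

    # The application is declared as part of the image instead of installed by hand.
    COPY target/legacy-app.war /usr/local/tomcat/webapps/ROOT.war

    EXPOSE 8080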

The next step in the transformation of the deployment is orchestrating the containerized applications with Kubernetes. Why Kubernetes? Because it is the industry standard. Kubernetes makes it easy to use and distribute the available computing resources efficiently and to scale your applications as needed. It provides mechanisms to self-heal corrupted container deployments. It has basic but built-in capabilities for secret and configuration management. It works independently of your application runtime environments and uses the host kernel, network and filesystem. All Kubernetes resources are created declaratively and report their status, so the desired and the actual state of a deployment can easily be compared. Kubernetes itself provides some means to move the current state of a resource towards its desired state: for example, if it finds a container in a "crashed" state, it will try to move it into a "running" state again.
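
As a hedged sketch of such a declarative resource, here is a Kubernetes Deployment for the container built above (the image name, labels, and port are assumptions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: legacy-app
    spec:
      replicas: 2                      # desired state: two running instances
      selector:
        matchLabels:
          app: legacy-app
      template:
        metadata:
          labels:
            app: legacy-app
        spec:
          containers:
            - name: legacy-app
              image: registry.example.org/legacy-app:1.0.0
              ports:
                - containerPort: 8080
              env:
                - name: APP_LOG_LEVEL
                  value: "INFO"
              livenessProbe:           # lets Kubernetes self-heal a hung instance
                httpGet:
                  path: /
                  port: 8080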

Now the question arises of how to manage the resources that make up our deployments. Our suggested option is to use Helm Charts together with GitOps to achieve a high level of self-documentation and to have a single source of truth for the states of the applications in the cluster. Helm Charts describe a collection of Kubernetes resources in the form of .yaml files and support templating. These Helm Charts can be referenced in a control repository that is monitored by a deployment operator: ArgoCD.
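
To illustrate the templating, here is a hedged excerpt of how the Deployment above could expose its replica count and image through the chart's values.yaml (the value names are illustrative):

    # values.yaml – per-environment configuration with defaults
    replicaCount: 2
    image:
      repository: registry.example.org/legacy-app
      tag: "1.0.0"
    ---
    # templates/deployment.yaml (excerpt) – placeholders rendered by Helm
    spec:
      replicas: {{ .Values.replicaCount }}
      template:
        spec:
          containers:
            - name: legacy-app
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"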

In the control repository you describe your application's deployment and configuration as Kubernetes resource files. The ArgoCD operator compares the desired state described in the control repository with the current state in the Kubernetes cluster. It monitors changes to the control repository and applies them to the current state. It also detects divergences of the current state – for example, the manual deletion of a resource from the cluster – and automatically attempts to restore the desired state.
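
A hedged sketch of the ArgoCD Application resource that ties this together (the repository URL, path, and namespaces are placeholders):

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: legacy-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://git.example.org/ops/control-repo.git
        targetRevision: main
        path: charts/legacy-app        # the Helm Chart inside the control repo
      destination:
        server: https://kubernetes.default.svc
        namespace: legacy-app
      syncPolicy:
        automated:
          prune: true                  # remove resources that were deleted from Git
          selfHeal: true               # revert manual changes in the cluster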

With all this at hand, there are plenty of possibilities to improve your cluster and make your life as a DevOps developer easier:

  • Vault provides a secure way to inject secrets into Kubernetes resources and container deployments.
  • cert-manager takes care of issuing and timely renewing certificates.
  • The FLOOD (FluentBit, FluentD, Opensearch, OpensearchDashboards) logging stack provides a centralized way of aggregating and inspecting log messages (blog post on lab.sub upcoming).
  • Sentry is a tool for aggregating and inspecting errors and exceptions, with SDKs for the most widely used frameworks and languages.
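
As one hedged example from this list, a cert-manager Certificate resource requesting a TLS certificate for the application's hostname (the issuer name and hostname are assumptions):

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: legacy-app-tls
      namespace: legacy-app
    spec:
      secretName: legacy-app-tls       # Secret in which the certificate is stored
      dnsNames:
        - legacy-app.example.org
      issuerRef:
        name: letsencrypt-prod         # assumes a ClusterIssuer with this name
        kind: ClusterIssuer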

So how can you, as a team of research software engineers, start achieving all this? You go step by step:

  1. Implement the configurability requirements mentioned in "Preparation". This makes it easier to run your application in different deployment environments, e.g. local, dev, production.
  2. Containerize your application and deploy the container on your current platform. This enables your application to run on another platform.
  3. Wrap your container in a Helm Chart and deploy it in a Kubernetes cluster to benefit from the advantages of the cloud.
  4. Manage the deployment of your Helm Charts with a CD tool such as ArgoCD, then relax.
  5. Work out common policies for new applications from your experience in practice.