
Are You Compromising Your Python Applications Due To Poor Container Images?

A common misconception is that containers are lightweight virtual machines, leading many to use traditional Linux distributions as base images.

This post aims to clear up this misconception and provide best practices for creating Python container images, focusing on security, optimization, and troubleshooting.

Understanding Containers

Unlike virtual machines, containers do not need an entire operating system to run—they already operate on top of an OS. Therefore, using traditional Linux distributions like Debian, Ubuntu, or CentOS as base images results in larger, less secure images with unnecessary components.

The Power of Multi-stage Builds

Multi-stage builds are a powerful feature of Docker that help reduce the size of your images by separating the building stage from the final image. This results in a significantly smaller final image.

Let's illustrate this with a real Python application using Flask, a popular web framework. Our application will be a simple "Hello, World!" web server.

Here's the application code (app.py):

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

And here's the requirements.txt file, which was generated with:

pip freeze > requirements.txt

Content:

blinker==1.6.2
click==8.1.5
Flask==2.3.2
itsdangerous==2.1.2
Jinja2==3.1.2
MarkupSafe==2.1.3
Werkzeug==2.3.6
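
Pinning every dependency to an exact version, as pip freeze does, keeps image builds reproducible. As a rough sketch (the PINNED pattern and the unpinned_lines helper are illustrative names, and the pattern deliberately ignores extras, environment markers, and URLs), a few lines of Python can flag any requirement that is not pinned:

```python
import re

# Pinned requirement lines look like "name==version". This pattern is a
# simplification and ignores extras, environment markers, and URLs.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9._!+-]+$")

REQUIREMENTS = """\
blinker==1.6.2
click==8.1.5
Flask==2.3.2
itsdangerous==2.1.2
Jinja2==3.1.2
MarkupSafe==2.1.3
Werkzeug==2.3.6
"""


def unpinned_lines(text):
    """Return the non-empty, non-comment lines that are not pinned with '=='."""
    lines = (line.strip() for line in text.splitlines())
    return [
        line
        for line in lines
        if line and not line.startswith("#") and not PINNED.match(line)
    ]


if __name__ == "__main__":
    # An empty list means every requirement is pinned to an exact version.
    print(unpinned_lines(REQUIREMENTS))
```

A check like this could run before docker build, so that an unpinned dependency is caught before it ends up in an image.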

Now, let's create a Dockerfile for our application using multi-stage builds:

# Stage 1: Build
FROM python:3.9 AS builder
WORKDIR /app
COPY app.py requirements.txt ./
RUN pip install --user -r requirements.txt

# Stage 2: Run
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "./app.py"]

In the first stage, we use a full Python image to install our dependencies.

In the second stage, we use a slim Python image and copy only the necessary files from the builder stage.
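
To make the two-stage structure explicit, here is a small sketch that lists the stages of a Dockerfile and reports which base image the final image is built on. The list_stages helper is a hypothetical name, the parsing is intentionally naive (it only looks at FROM lines), and the embedded Dockerfile is an abbreviated copy of the one above:

```python
DOCKERFILE = """\
# Stage 1: Build
FROM python:3.9 AS builder
WORKDIR /app
COPY app.py requirements.txt ./
RUN pip install --user -r requirements.txt

# Stage 2: Run
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app .
CMD ["python", "./app.py"]
"""


def list_stages(dockerfile_text):
    """Return (base_image, stage_name_or_None) for each FROM line, in order."""
    stages = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Skip optional flags such as --platform=... before the image name.
        args = [p for p in parts[1:] if not p.startswith("--")]
        name = args[2] if len(args) >= 3 and args[1].upper() == "AS" else None
        stages.append((args[0], name))
    return stages


stages = list_stages(DOCKERFILE)
final_image, _ = stages[-1]
print(stages)
print("final image is based on:", final_image)
```

Only the last stage ends up in the image you ship; everything in the builder stage is discarded unless it is explicitly copied over with COPY --from.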

Scratch Images Are Not Suitable for Python

You may have heard about scratch images: https://hub.docker.com/_/scratch

Scratch is an empty image, especially useful for building base images or for packaging static binaries. It doesn't contain any folders or files. In other words, its file system is completely empty.

The perfect use case for 'scratch' images is when you're working with statically compiled languages like Go or Rust. These languages don't rely on an external runtime or dependencies to execute, and they compile down to a single binary. This means you can compile your application, copy the resulting binary into a 'scratch' image, and have a Docker image that contains nothing but your application.

Python, unlike languages like Go, does not compile down to a single binary. Python is an interpreted language, and it requires an interpreter to run Python scripts. The Python interpreter relies on many shared libraries, which are dynamically linked at runtime. These libraries are not included in the Python application itself, but are expected to be present in the system where the application runs.

Implications of Using a Scratch Image with Python

When you build a Python application, you're not creating a standalone executable. You're creating a .py file (or a .pyc file if it's byte-compiled) that needs the Python interpreter to run. This interpreter, in turn, needs the standard library and many other dependencies that are part of a typical Python installation.

If you try to use a 'scratch' image for a Python application, the application won't run because the 'scratch' image doesn't contain a Python interpreter or any other dependencies required by Python. You would have to manually add not just your Python script, but also a Python runtime and all necessary dependencies to the 'scratch' image. This process is complex and error-prone, and it negates the simplicity and convenience that 'scratch' images are supposed to provide.
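
To get a feel for how much a Python program relies on beyond its own source file, the following sketch prints where the running interpreter and its libraries live. It uses only the standard library; runtime_locations is an illustrative helper name, and the exact paths depend on the image or machine you run it on:

```python
import os
import sys
import sysconfig


def runtime_locations():
    """Collect the paths a Python program depends on besides its own .py files."""
    return {
        "interpreter": sys.executable,
        "standard_library": sysconfig.get_path("stdlib"),
        "compiled_stdlib_modules": sysconfig.get_path("platstdlib"),
        "third_party_packages": sysconfig.get_path("purelib"),
    }


for name, path in runtime_locations().items():
    print(f"{name}: {path}")

# None of these exist in a 'scratch' image. Every one would have to be
# copied in by hand, along with the shared libraries the interpreter links to.
stdlib_files = sum(
    len(files) for _, _, files in os.walk(sysconfig.get_path("stdlib"))
)
print("files under the standard library directory:", stdlib_files)
```

Running it inside a container started from one of the official Python images shows how many files a 'scratch'-based image would have to reproduce manually.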

Therefore, for Python and other interpreted languages, it's more practical to use base images that already include a Python runtime and common dependencies, such as the official Python images provided by Docker, or minimal images like Alpine if size is a concern. These images provide a balance between convenience and efficiency, allowing you to leverage the power of Docker without getting bogged down in the details of Python runtime management.

Security and Optimization for Python: Alpine as a Viable Option

As explained, the leanest possible base image is 'scratch', which is empty. Images based on 'scratch' contain only what is needed to run the application, saving space, memory, and CPU, and minimizing security vulnerabilities.

However, for Python and other languages that cannot be compiled to static binaries, using 'scratch' is usually not feasible. In such cases, Alpine or Wolfi are excellent alternatives. These images have a minimal footprint and are a good choice for base images.

Let's modify our Dockerfile to use the Alpine-based Python image:

# Stage 1: Build
FROM python:3.9-alpine AS builder
WORKDIR /app
COPY app.py requirements.txt ./
RUN pip install --user -r requirements.txt

# Stage 2: Run
FROM python:3.9-alpine
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "./app.py"]

We have now defined a multi-stage Dockerfile based on Alpine. There are also slim variants of the official Python images.

Python slim images are a variant of the official Python images that contain a minimal, lightweight installation of Python. These images are designed to be more efficient in terms of size and resource usage. We will use one of those images as we go further in this discussion.

Why Multi-Stage with Slim or Alpine Images?

Even when using slim or Alpine images, multi-stage builds offer additional benefits.

They allow for the separation of build and run environments, reducing the final image size by excluding build-time dependencies.

This not only results in quicker downloads and less storage space usage, but also improves security by minimizing the attack surface.

Furthermore, multi-stage builds simplify the Dockerfile and the build process, making it easier to understand and manage, especially for larger applications with complex build processes. Thus, multi-stage builds provide an extra layer of optimization, enhancing the advantages of slim or Alpine images.

Comparisons

In this section, we will compare our two images in terms of size and security.

While this testing is not exhaustive, it aims to illustrate and validate some of the points we have discussed thus far. Keep in mind there are different versions of slim images based on different Linux distributions such as Debian and Alpine. 
The following comparisons are not meant to dismiss the alternatives, but to show how you can make your own choice when building your application images.

Image Size Comparison

In this short section, we will build two final application images: one based on the Debian-based Python 3.9 images (full and slim) and one based on the Alpine image. We'll use the same multi-stage build approach for both.

The first image will be based on the Dockerfile that uses the standard Python images.

As the builder image, we will use the python:3.9 image, and as the runtime image, we will use the slim version. That is the beauty of multi-stage builds.

docker build -t flask-app-python:1.0 .

Dockerfile:

# Stage 1: Build
FROM python:3.9 AS builder
WORKDIR /app
COPY app.py requirements.txt ./
RUN pip install --user -r requirements.txt

# Stage 2: Run
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "./app.py"]

We will do the same with the Alpine images:

docker build -f Dockerfile_alpine -t flask-app-alpine:1.0 .

Dockerfile_alpine:

# Stage 1: Build
FROM python:3.9-alpine AS builder
WORKDIR /app
COPY app.py requirements.txt ./
RUN pip install --user -r requirements.txt

# Stage 2: Run
FROM python:3.9-alpine
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "./app.py"]

We can now compare the image sizes with the following commands:

docker image inspect --format='{{.Size}} {{len .RootFS.Layers}}' flask-app-alpine:1.0

docker image inspect --format='{{.Size}} {{len .RootFS.Layers}}' flask-app-python:1.0

Outputs:

flask-app-alpine:1.0:

Size: 57,238,129 bytes
Number of Layers: 8


flask-app-python:1.0:

Size: 172,344,679 bytes
Number of Layers: 8

Comparing the two, it's clear that flask-app-alpine:1.0 is considerably smaller than flask-app-python:1.0. This means it occupies less disk space and can be downloaded faster. Both images have the same number of layers (8), so the difference comes from the contents of the layers rather than from their structure.
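
To put the difference into numbers, here is a short sketch that parses the raw "size layers" lines that the --format template prints and computes the ratio. The byte counts are the ones reported above; parse_inspect_line is just an illustrative helper name:

```python
def parse_inspect_line(line):
    """Parse '<size_in_bytes> <layer_count>' as printed by the --format template."""
    size, layers = line.split()
    return int(size), int(layers)


# Values taken from the outputs reported in this post.
alpine_size, alpine_layers = parse_inspect_line("57238129 8")
python_size, python_layers = parse_inspect_line("172344679 8")

ratio = python_size / alpine_size
saved = 1 - alpine_size / python_size

print(f"alpine: {alpine_size / 1e6:.1f} MB in {alpine_layers} layers")
print(f"slim:   {python_size / 1e6:.1f} MB in {python_layers} layers")
print(f"the Debian-based image is {ratio:.1f}x larger; Alpine saves {saved:.0%}")
```

With these numbers, the Debian-based slim image is roughly three times the size of the Alpine one.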

Vulnerability Scans

In the following test, we'll be using an image scanner called Trivy from Aqua Security.

I'm not sponsored by Aqua Security, and the scanner itself is open source, but I do have an Aqua T-shirt 😊

You can find the scanner here: https://github.com/aquasecurity/trivy

Alpine image result:

trivy image flask-app-alpine:1.0

2023-07-18T11:17:57.220+0200	INFO	Vulnerability scanning is enabled
2023-07-18T11:17:57.220+0200	INFO	Secret scanning is enabled
2023-07-18T11:17:57.220+0200	INFO	If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2023-07-18T11:17:57.220+0200	INFO	Please see also https://aquasecurity.github.io/trivy/v0.43/docs/scanner/secret/#recommendation for faster secret detection
2023-07-18T11:17:57.242+0200	INFO	Detected OS: alpine
2023-07-18T11:17:57.242+0200	INFO	Detecting Alpine vulnerabilities...
2023-07-18T11:17:57.245+0200	INFO	Number of language-specific files: 1
2023-07-18T11:17:57.245+0200	INFO	Detecting python-pkg vulnerabilities...

flask-app-alpine:1.0 (alpine 3.18.2)

Total: 2 (UNKNOWN: 0, LOW: 2, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
...

Python-based image result:

In this part we'll scan the Debian-based Python image that we produced earlier:

trivy image flask-app-python:1.0

2023-07-18T11:20:01.511+0200	INFO	Vulnerability scanning is enabled
2023-07-18T11:20:01.511+0200	INFO	Secret scanning is enabled
2023-07-18T11:20:01.511+0200	INFO	If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2023-07-18T11:20:01.511+0200	INFO	Please see also https://aquasecurity.github.io/trivy/v0.43/docs/scanner/secret/#recommendation for faster secret detection
2023-07-18T11:20:03.651+0200	INFO	Detected OS: debian
2023-07-18T11:20:03.652+0200	INFO	Detecting Debian vulnerabilities...
2023-07-18T11:20:03.668+0200	INFO	Number of language-specific files: 1
2023-07-18T11:20:03.668+0200	INFO	Detecting python-pkg vulnerabilities...

flask-app-python:1.0 (debian 12.0)

Total: 195 (UNKNOWN: 1, LOW: 146, MEDIUM: 21, HIGH: 25, CRITICAL: 2)
...

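Without going into the details of the findings, we can easily state that the Debian-based Python image has many more reported vulnerabilities: 195 in total, including 25 high and 2 critical, compared to 2 low-severity findings for the Alpine-based image.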
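
If you want to compare scans side by side, the summary lines are easy to parse. The sketch below works on the two "Total:" lines shown above; parse_trivy_total is a hypothetical helper, not part of Trivy, and for real automation Trivy's JSON output (--format json) is probably a better starting point than scraping text:

```python
import re


def parse_trivy_total(line):
    """Turn a Trivy 'Total: N (SEVERITY: n, ...)' summary line into a dict."""
    match = re.match(r"Total:\s*(\d+)\s*\((.*)\)", line.strip())
    if match is None:
        raise ValueError(f"not a Trivy summary line: {line!r}")
    counts = {"TOTAL": int(match.group(1))}
    for item in match.group(2).split(","):
        severity, number = item.split(":")
        counts[severity.strip()] = int(number)
    return counts


# Summary lines copied from the two scans in this post.
alpine = parse_trivy_total(
    "Total: 2 (UNKNOWN: 0, LOW: 2, MEDIUM: 0, HIGH: 0, CRITICAL: 0)"
)
debian = parse_trivy_total(
    "Total: 195 (UNKNOWN: 1, LOW: 146, MEDIUM: 21, HIGH: 25, CRITICAL: 2)"
)

for severity in ("CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN"):
    print(f"{severity:<8} alpine={alpine[severity]:>3}  debian={debian[severity]:>3}")
```

Keep in mind that a raw count says nothing about exploitability; the high and critical findings are the ones worth reading in detail first.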

Potential Drawbacks of Using Alpine with Python

While Alpine images are beneficial due to their small size and security features, there are potential drawbacks when using them with Python.

Alpine uses musl libc instead of glibc, which can lead to compatibility issues with some Python packages that have C extensions.

The absence of precompiled binaries for all Python packages can increase build times.

Alpine's minimal design means it lacks many standard debugging and troubleshooting tools, and installing these can negate some of the size benefits. Additionally, if you're accustomed to Debian-based or Red Hat-based Linux distributions, there can be a learning curve when switching to Alpine.

Therefore, it's crucial to thoroughly test your applications when changing base images and choose the one that best suits your needs.
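
When testing across base images, it can help to confirm which C library the interpreter is actually running against. The sketch below uses platform.libc_ver() from the standard library. Note the assumptions: this function is a best-effort heuristic, it typically reports glibc on Debian-based images, and on musl-based images such as Alpine it may return an empty result, which the sketch treats as "not detected" rather than as proof of musl. describe_libc is an illustrative helper name:

```python
import platform


def describe_libc():
    """Report what platform.libc_ver() detects for the running interpreter."""
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # An empty result means the detection heuristic found nothing it recognizes.
    return "unknown (not detected as glibc)"


print("libc:", describe_libc())
print("machine:", platform.machine())
```

Running this in both images, for example with docker run --rm -v "$PWD":/src on each image and python /src/check_libc.py as the command (the file name is hypothetical), gives a quick hint about whether prebuilt glibc wheels are likely to apply.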

Troubleshooting with Ephemeral Containers in Kubernetes 

To troubleshoot problems in a running application, Kubernetes offers a feature called ephemeral containers. These temporary containers run alongside the other containers within a pod and can access its volumes, PID namespace, and network.

Let's imagine that we have a Flask application running in a pod, and we're trying to figure out why it's not responding to requests. We suspect that there might be an issue with how the application interacts with the file system, but we're not certain.

One potential approach is to examine the application logs; however, they may not provide enough information about the problem. This is where ephemeral containers prove beneficial.

To begin troubleshooting, you would start an ephemeral container within the pod:

kubectl debug pod/my-pod -it --image=busybox

This command creates an ephemeral container within the my-pod pod using the busybox image and starts an interactive shell. This gives you the ability to execute troubleshooting commands against your Flask application.

Once you're inside the container, you can examine the file system, verify environment variables, or execute any other commands that help diagnose the issue. For instance, you can list the contents of the directory where your application is expected to write files:

ls -l /path/to/directory

Or you could make requests to your application from within the same network namespace, helping you diagnose networking issues. The busybox image ships with wget but not curl, so with this debug image the request looks like this:

wget -qO- http://localhost:5000

Remember, ephemeral containers are temporary: they are never restarted once they exit, and they are gone when the pod is deleted.

Summary

When creating Docker images for your Python applications, remember that containers are not virtual machines: they do not need an entire operating system to run. Use multi-stage builds to reduce the size of your images, and consider a minimal base image such as Alpine or Wolfi, since 'scratch' is only practical for statically compiled binaries. If you need to debug or troubleshoot, use ephemeral containers instead of bloating your images with unnecessary tools.

By following these best practices, you can create optimized, secure Docker images for your Python applications. This not only improves your application's performance but also enhances its security, providing peace of mind in today's complex and threat-filled digital landscape.

About the Author

Aleksandro Matejic, a Cloud Architect, began working in the IT industry over 21 years ago as a technical specialist, right after his studies. Since then, he has worked in various companies and industries in various system engineer and IT architect roles. He currently works on designing Cloud solutions, Kubernetes, and other DevOps technologies.

In his spare time, Aleksandro works on different development projects such as developing devoriales.com, a blog and learning platform launching in 2022/2023. In addition, he likes to read and write technical articles about software development and DevOps methods and tools.

You can contact Aleksandro by visiting his LinkedIn Profile

 
