Understanding Container Security: A Guide to Docker and Pod Security

Kubernetes docker security

Written By Aleksandro Matejic
Published 2025-06-06
Reading Time 11 min



Container security has become one of the most critical concerns for DevOps engineers as containerized workloads increasingly power mission-critical applications.
In 2024-2025, the container security market reached $2.3 billion with a projected 22.3% CAGR through 2033 (ref), reflecting massive industry investment in securing containerized environments.

However, vulnerabilities like the recent "Leaky Vessels" series and sophisticated supply chain attacks demonstrate that security cannot be an afterthought in container deployments. This guide gives DevOps engineers practical, actionable strategies for implementing robust security across both Docker containers and Kubernetes pods.

Modern container security requires a multi-layered approach that extends beyond basic configurations to encompass advanced security contexts, admission controllers, network policies, and runtime protection.

The challenge lies not just in understanding these security mechanisms, but in implementing them effectively across development, staging, and production environments while maintaining operational efficiency.

Docker security architecture and fundamentals

Docker's security model relies on four fundamental Linux kernel technologies (namespaces, cgroups, capabilities and seccomp) that create isolated execution environments. Understanding these mechanisms is essential for implementing effective container security strategies.

Linux namespaces provide process-level isolation by creating separate instances of global system resources. Each container receives isolated views of PIDs, network stacks, filesystem mounts, hostnames, and IPC resources. This means processes in containers cannot see or affect processes in other containers or the host system.

Control groups (cgroups) complement namespaces by limiting resource consumption and preventing denial-of-service attacks where single containers exhaust system resources.

However, containers share the host kernel, creating potential attack vectors. Unlike virtual machines with separate kernels, container breakouts through kernel vulnerabilities can affect all containers on the host. This shared kernel architecture necessitates additional security layers beyond basic isolation.

Docker Desktop's Enhanced Container Isolation (ECI), available with a Docker Business subscription, provides significant security improvements. ECI automatically runs all containers in dedicated Linux user namespaces, maps root users in containers to unprivileged users in the Docker Desktop VM, and intercepts sensitive system calls for validation. Even privileged containers are confined to their user namespace, which greatly reduces the container-to-host attack surface.

The distinction between root inside a container and root on the host often confuses DevOps engineers. Root in a container (UID 0) runs inside namespace isolation with a reduced, allowlisted capability set that includes CHOWN, DAC_OVERRIDE, and NET_BIND_SERVICE. It cannot load kernel modules, mount filesystems, or see host processes directly. It can, however, open raw sockets by default, because NET_RAW is part of Docker's default capability set. Host root, by contrast, has complete system access and the full capability set, and can bypass standard permission checks.

User namespace remapping adds another layer by mapping container UIDs to a different range of host UIDs. Without user namespaces, root in a container is the same UID as root on the host (UID 0 = UID 0), held back only by the capability, seccomp, and LSM restrictions described above. With remapping enabled, container UID 0 maps to an unprivileged host UID (UID 231072 with the subordinate ID range configured below), so a process that breaks out of the container holds only that unprivileged user's privileges on the host.

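Example - Enable user namespace remapping in /etc/docker/daemon.json:

{
  "userns-remap": "default"
}

# With "default", Docker creates a dockremap user and group. Some distributions
# do not add subordinate ID ranges for it automatically; if the entries are
# missing, add them as root (check first so they are not duplicated):
echo "dockremap:231072:65536" >> /etc/subuid
echo "dockremap:231072:65536" >> /etc/subgid

# Restart the daemon so the setting takes effect
systemctl restart docker

Each entry grants the dockremap user 65,536 subordinate IDs starting at 231072, so container UID 0 maps to host UID 231072, container UID 1 to host UID 231073, and so on.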
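The mapping arithmetic is simple enough to sketch. The snippet below is a minimal illustration, assuming one contiguous subordinate range per user as in the entries above; `parse_subid_line` and `container_to_host_id` are hypothetical helper names, not part of Docker.

```python
from typing import NamedTuple


class SubIdRange(NamedTuple):
    """One entry of /etc/subuid or /etc/subgid in name:start:count form."""
    name: str
    start: int
    count: int


def parse_subid_line(line: str) -> SubIdRange:
    """Parse a single name:start:count line."""
    name, start, count = line.strip().split(":")
    return SubIdRange(name, int(start), int(count))


def container_to_host_id(container_id: int, rng: SubIdRange) -> int:
    """Return the host ID that a container UID/GID maps to under remapping.

    IDs 0..count-1 inside the container map linearly onto
    start..start+count-1 on the host; anything outside has no mapping.
    """
    if not 0 <= container_id < rng.count:
        raise ValueError(f"ID {container_id} is outside the mapped range")
    return rng.start + container_id


if __name__ == "__main__":
    dockremap = parse_subid_line("dockremap:231072:65536")
    print(container_to_host_id(0, dockremap))     # container root -> 231072
    print(container_to_host_id(1001, dockremap))  # app user -> 232073
```

The linear offset is why a container UID 1001 (as used in the hardened `docker run` example later) lands on an ordinary, unprivileged host UID.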

Extending Docker security capabilities

Linux capabilities break down root privileges into granular permissions, allowing fine-grained access control instead of binary root/non-root decisions. Docker drops most capabilities by default, providing only essential ones like CHOWN, DAC_OVERRIDE, FSETID, NET_RAW, and NET_BIND_SERVICE.

Production deployments should follow the principle of least privilege by dropping all capabilities and adding only necessary ones.

Secure container deployment example:

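# The stock nginx image writes its cache and PID file at startup, so with
# --read-only those paths need tmpfs mounts in addition to /tmp
docker run -d \
  --name secure-app \
  --user 1001:1001 \
  --read-only \
  --tmpfs /tmp:rw,size=100m \
  --tmpfs /var/cache/nginx:rw,size=50m \
  --tmpfs /var/run:rw,size=1m \
  --security-opt=no-new-privileges:true \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --memory=512m \
  --cpus="0.5" \
  nginx:1.24-alpine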
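How `--cap-drop` and `--cap-add` combine can be sketched as a small model. This is a simplification written for illustration, not Docker's code: it starts from the default capability set listed in Docker's documentation, applies the drops (where `ALL` empties the set), then applies the adds; the `--cap-add=ALL` case is deliberately left out.

```python
# Capabilities Docker grants containers by default, per its documentation
DOCKER_DEFAULT_CAPS = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
    "MKNOD", "NET_BIND_SERVICE", "NET_RAW", "SETFCAP", "SETGID",
    "SETPCAP", "SETUID", "SYS_CHROOT",
})


def _normalize(cap: str) -> str:
    """Accept both NET_RAW and CAP_NET_RAW spellings, in any case."""
    return cap.upper().removeprefix("CAP_")


def effective_caps(cap_drop: list[str], cap_add: list[str]) -> frozenset[str]:
    """Simplified model of the capability set after --cap-drop/--cap-add."""
    drop = {_normalize(c) for c in cap_drop}
    add = {_normalize(c) for c in cap_add}
    # Dropping ALL starts from an empty set instead of the defaults
    base = frozenset() if "ALL" in drop else DOCKER_DEFAULT_CAPS - drop
    return base | add


if __name__ == "__main__":
    # The flags from the docker run example above
    print(sorted(effective_caps(["ALL"], ["NET_BIND_SERVICE"])))
    # Default set minus raw sockets
    print(len(effective_caps(["NET_RAW"], [])))
```

For the example's flags this prints `['NET_BIND_SERVICE']`: every other capability the default set would have granted is gone, which is the least-privilege outcome the section argues for.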

Rootless Docker mode eliminates the privileged Docker daemon attack surface by running the Docker daemon as a non-root user. This architecture uses user namespaces for container isolation and requires no SETUID binaries except newuidmap/newgidmap. While rootless mode has limitations including storage driver restrictions and network performance considerations, it provides excellent security for multi-tenant environments.

Install and configure rootless Docker:

dockerd-rootless-setuptool.sh install
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
systemctl --user start docker

The dockerd-rootless-setuptool.sh script ships with Docker's official packages (in docker-ce-rootless-extras) and configures a per-user daemon that runs without root privileges. (ref)

Advanced security profiles provide additional protection layers. Seccomp profiles restrict system calls available to containers, blocking potentially dangerous syscalls while allowing necessary ones.
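A deny-by-default seccomp profile can be generated programmatically. The sketch below assumes Docker's JSON seccomp profile format (`defaultAction`, `architectures`, and `syscalls` entries with `names` and `action`); the short allowlist is hypothetical and far too small for a real workload, which typically needs a list derived by tracing the application or by starting from Docker's default profile.

```python
import json

# Hypothetical allowlist for illustration only; real applications need
# considerably more syscalls than this.
ALLOWED_SYSCALLS = [
    "read", "write", "openat", "close", "fstat", "mmap", "munmap",
    "brk", "rt_sigaction", "rt_sigreturn", "exit", "exit_group",
]


def build_seccomp_profile(allowed: list[str]) -> dict:
    """Return a deny-by-default profile in Docker's seccomp JSON format."""
    return {
        # Syscalls not matched below fail with an error instead of running
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2))
```

Redirect the output to a file (seccomp-profile.json is a hypothetical name) and pass it with `docker run --security-opt seccomp=seccomp-profile.json`. Because the default action denies, a syscall missing from the list shows up as an error returned to the application rather than as a silently allowed call.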

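AppArmor and SELinux add mandatory access control on top of these: AppArmor confines containers with path-based profiles (Docker applies its docker-default profile automatically on AppArmor-enabled hosts), while SELinux enforces policy through labels attached to processes and files.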

Minimizing Docker security risks in production

Production Docker deployments require comprehensive security hardening across multiple dimensions. Image security forms the foundation of container security, requiring vulnerability scanning, minimal base images, and secure software supply chains.

Modern vulnerability scanners like Docker Scout, Trivy, and Snyk provide comprehensive image analysis.

Trivy comprehensive scanning:

trivy image --format json --output results.json nginx:latest
trivy image --severity HIGH,CRITICAL nginx:latest
trivy fs --scanners vuln,misconfig /path/to/project

Output:
trivy image --format json --output results.json nginx:latest

2025-06-06T17:37:11.026+0200	INFO	Need to update DB
2025-06-06T17:37:11.026+0200	INFO	DB Repository: ghcr.io/aquasecurity/trivy-db
2025-06-06T17:37:11.026+0200	INFO	Downloading DB...
65.17 MiB / 65.17 MiB [-----------------------------] 100.00% 22.27 MiB p/s 3.1s
2025-06-06T17:37:14.808+0200	INFO	Vulnerability scanning is enabled

...
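The JSON report written by the first command (results.json) is easy to post-process, for example to summarize findings or fail a pipeline. The sketch below assumes Trivy's report layout of `Results[].Vulnerabilities[]` entries carrying a `Severity` field; `severity_counts` and `gate` are hypothetical helpers.

```python
import json
from collections import Counter

# Severities that should fail a pipeline; adjust to your own policy
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def severity_counts(report: dict) -> Counter:
    """Count findings per severity in a Trivy JSON report.

    Targets without findings may omit "Vulnerabilities" or set it to
    null, hence the `or []` guards.
    """
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def gate(path: str) -> int:
    """Print a summary and return an exit code: 1 if any blocking finding."""
    with open(path, encoding="utf-8") as fh:
        counts = severity_counts(json.load(fh))
    print(dict(counts))
    return 1 if any(counts[s] for s in BLOCKING_SEVERITIES) else 0
```

In CI, calling `sys.exit(gate("results.json"))` after the scan fails the job on HIGH or CRITICAL findings. For a plain pass/fail gate, Trivy's own `--exit-code 1` combined with `--severity HIGH,CRITICAL` does the same without extra code; a script like this mainly helps with custom reporting.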

Dockerfile security best practices significantly reduce attack surfaces. Always use specific image versions rather than latest tags, create non-root users, and implement multi-stage builds for compiled languages.

Secure Dockerfile example:

FROM node:18-alpine3.18

# Create non-root user early
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

WORKDIR /app

# Copy dependency files first (better caching)
COPY package*.json ./

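# Install production dependencies as root, then clean up
RUN npm ci --omit=dev && \
    npm cache clean --force && \
    rm -rf /tmp/* /var/cache/apk/*

# Copy application code (exclude .git, .env and similar via .dockerignore)
COPY . .

# Root owns the tree and the nodejs group gets read-only access, so the
# runtime user can read the code and node_modules but cannot modify them
RUN chown -R root:nodejs /app && \
    chmod -R o-rwx /app && \
    chmod -R g-w /app

EXPOSE 8080

# node:alpine images include BusyBox wget but not curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget -q -O /dev/null http://localhost:8080/health || exit 1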

USER nodejs:nodejs
ENTRYPOINT ["node", "server.js"]
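Two of the practices above, pinned base images and a non-root final `USER`, are easy to check automatically. The function below is a rough, hypothetical sketch rather than a real Dockerfile parser: it ignores line continuations, ARG substitution, and any `USER` inherited from a base image. Dedicated linters such as hadolint cover these rules and many more.

```python
from typing import List, Optional, Set


def lint_dockerfile(text: str) -> List[str]:
    """Flag unpinned base images and a final stage without a non-root USER."""
    findings: List[str] = []
    stages: Set[str] = set()     # aliases from "FROM ... AS <name>"
    user: Optional[str] = None   # last USER seen in the current stage
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instr, _, args = line.partition(" ")
        instr = instr.upper()
        if instr == "FROM":
            # Skip flags such as --platform=linux/amd64
            tokens = [t for t in args.split() if not t.startswith("--")]
            image = tokens[0]
            user = None  # reset per stage; inherited USER is not tracked
            name = image.rsplit("/", 1)[-1]  # drop registry and namespace
            pinned = (image == "scratch" or image in stages or "@" in image
                      or (":" in name and not name.endswith(":latest")))
            if not pinned:
                findings.append(f"unpinned base image: {image}")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stages.add(tokens[2])
        elif instr == "USER":
            user = args.strip()
    if user is None or user.split(":")[0] in {"root", "0"}:
        findings.append("final stage does not switch to a non-root USER")
    return findings
```

Run against the Dockerfile above, the sketch reports nothing: the base image carries an explicit tag and the final instruction before ENTRYPOINT switches to `nodejs:nodejs`.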

Network security requires careful consideration of container communication patterns. Custom networks with disabled inter-container communication prevent lateral movement.

Create isolated networks:

docker network create --driver bridge \
  --subnet=172.20.0.0/16 \
  --opt com.docker.network.bridge.enable_icc=false \
  secure-network

Runtime security monitoring with tools like Falco provides real-time threat detection by monitoring system calls and, through its plugins, Kubernetes audit events. Falco's rule-based engine detects anomalous behavior including privilege escalations, shell spawning in containers, and suspicious file system access.

Kubernetes Pod Security Standards fundamentals

Kubernetes Pod Security Standards define three policies (Privileged, Baseline, and Restricted) that range from fully permissive to heavily restricted. Understanding these levels is crucial for applying the right policy to each environment and use case.

The Privileged level is purposely open and entirely unrestricted, allowing known privilege escalations. Because workloads at this level can bypass typical container isolation mechanisms, it should be reserved for trusted users and for system- and infrastructure-level components.

Baseline policies prevent known privilege escalations while remaining compatible with most common containerized applications. Key restrictions include prohibiting privileged containers, blocking host namespace sharing (hostNetwork, hostPID, hostIPC), forbidding HostPath volumes, and allowing only a limited set of added capabilities.

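The Restricted level implements heavily restricted policies following current Pod hardening best practices, at some cost to compatibility. Beyond the Baseline restrictions, Restricted requires running as non-root, disallows privilege escalation, requires dropping ALL capabilities (only NET_BIND_SERVICE may be added back), requires a RuntimeDefault or Localhost seccomp profile, and limits volumes to safe types such as configMap, emptyDir, persistentVolumeClaim, projected, and secret. It does not require a read-only root filesystem; readOnlyRootFilesystem is a recommended hardening step on top of it.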

Restricted-compliant pod configuration:

apiVersion: v1
kind: Pod
metadata:
  name: restricted-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginxinc/nginx-unprivileged:1.25-alpine  # listens on 8080; PID and temp files go to /tmp
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]
      seccompProfile:
        type: RuntimeDefault
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
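A handful of the Restricted rules can be checked offline before a manifest reaches the cluster. The function below is a hypothetical sketch covering only runAsNonRoot, allowPrivilegeEscalation, capabilities, and the seccomp profile for a Pod already parsed into a dict (for example with a YAML loader). It omits the volume-type, runAsUser, and Baseline checks, so Pod Security Admission itself remains the authoritative check.

```python
from typing import Any, Dict, List

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod: Dict[str, Any]) -> List[str]:
    """Check a subset of the Restricted profile against a Pod manifest."""
    spec = pod.get("spec") or {}
    pod_sc = spec.get("securityContext") or {}
    problems: List[str] = []
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext") or {}
        # Container-level values override pod-level ones when both are set
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        extra = set(caps.get("add") or []) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(
                f"{name}: only NET_BIND_SERVICE may be added, not {sorted(extra)}")
        profile = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if profile.get("type") not in ALLOWED_SECCOMP:
            problems.append(
                f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```

The pod above passes: runAsNonRoot and the seccomp profile come from the pod level, while the privilege and capability settings come from the container level.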

Security context configuration

Security contexts provide granular control over pod and container security settings. Pod-level security contexts apply to all containers within a pod, while container-level contexts override pod-level settings for specific containers.

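Critical security context fields include user and group controls (runAsUser, runAsGroup, runAsNonRoot), privilege controls (privileged, allowPrivilegeEscalation, capabilities, readOnlyRootFilesystem), and security profiles (seccompProfile, appArmorProfile, seLinuxOptions). Not every field exists at both levels: the privilege controls are container-only, while fsGroup, fsGroupChangePolicy, and supplementalGroups are pod-only.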

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    runAsNonRoot: true
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"
    supplementalGroups: [4000, 5000]
    seccompProfile:
      type: RuntimeDefault
    seLinuxOptions:
      level: "s0:c123,c456"
  containers:
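  - name: app
    image: nginxinc/nginx-unprivileged:1.25-alpine  # listens on 8080; PID and temp files go to /tmp
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]
      # The appArmorProfile field requires Kubernetes v1.30 or later
      appArmorProfile:
        type: RuntimeDefault
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}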
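The pod-versus-container override rule described above can be expressed as a small merge. This hypothetical helper is a sketch, not kubelet code. It assumes the fields in `OVERRIDABLE` (a subset of those defined at both levels) are inherited from the pod unless the container sets them. Pod-only fields such as fsGroup are left out because they apply to the pod's volumes rather than to one container's process.

```python
from typing import Any, Dict, Optional

# Fields that exist at both pod and container level (a subset is listed here)
OVERRIDABLE = (
    "runAsUser", "runAsGroup", "runAsNonRoot",
    "seccompProfile", "seLinuxOptions", "appArmorProfile",
)


def effective_security_context(
    pod_sc: Optional[Dict[str, Any]],
    container_sc: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge pod- and container-level settings for one container."""
    pod_sc = pod_sc or {}
    merged = {k: pod_sc[k] for k in OVERRIDABLE if k in pod_sc}
    # Container-level values replace inherited ones; container-only fields
    # (capabilities, readOnlyRootFilesystem, ...) are simply added
    merged.update(container_sc or {})
    return merged
```

Applied to the secure-pod example, the container inherits runAsUser 1000 and the RuntimeDefault seccomp profile from the pod, and adds its own capability and filesystem settings.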

Linux capabilities management requires careful consideration of application requirements. The principle of least privilege dictates dropping all capabilities and adding only necessary ones. Common patterns include web servers requiring NET_BIND_SERVICE for port binding, file management services needing CHOWN and DAC_OVERRIDE, and network utilities requiring NET_ADMIN and NET_RAW.

Pod Security Admission and advanced policies

Pod Security Admission (PSA) is Kubernetes' built-in admission controller that enforces Pod Security Standards. PSA operates in three modes: enforce (reject violating pods), audit (allow pods but log violations), and warn (allow pods but display warnings).

apiVersion: v1
kind: Namespace
metadata:
  name: production-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.25
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.25
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.25
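How the three modes interact can be modeled in a few lines. This is a hypothetical simplification, not the admission controller's code: it reduces each pod to the strictest profile it satisfies and ignores version labels and exemptions. In the namespace above all three modes use restricted, so a pod that only meets baseline would be rejected, warned about, and audited at once. A common rollout instead enforces baseline while warning and auditing on restricted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Profiles from least to most restrictive
LEVELS = ["privileged", "baseline", "restricted"]
PREFIX = "pod-security.kubernetes.io/"


@dataclass
class AdmissionResult:
    allowed: bool = True
    warnings: List[str] = field(default_factory=list)
    audit_annotations: List[str] = field(default_factory=list)


def evaluate(ns_labels: Dict[str, str], pod_level: str) -> AdmissionResult:
    """Model PSA's enforce, audit and warn modes for one pod.

    pod_level is the strictest profile the pod satisfies; a pod meeting
    "restricted" also meets "baseline" and "privileged". A missing mode
    label is treated as "privileged", the default when no cluster-wide
    defaults are configured.
    """
    result = AdmissionResult()
    for mode in ("enforce", "audit", "warn"):
        required = ns_labels.get(PREFIX + mode, "privileged")
        if LEVELS.index(pod_level) >= LEVELS.index(required):
            continue  # the pod satisfies this mode's profile
        msg = f"violates PodSecurity {required!r}"
        if mode == "enforce":
            result.allowed = False
        elif mode == "audit":
            result.audit_annotations.append(msg)
        else:
            result.warnings.append(msg)
    return result
```

One real-world nuance the model leaves out: enforce applies only to Pod objects, whereas warn and audit are also evaluated against workload resources such as Deployments. Warnings therefore appear when a Deployment is applied, while rejection happens later, when its Pods are created.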

Advanced policy engines like OPA Gatekeeper and Kyverno enforce policies beyond what PSA covers, such as required labels or automatically generated default resources. OPA Gatekeeper expresses validation rules in the Rego language, while Kyverno uses Kubernetes-native YAML policies that many teams find easier to adopt.

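Gatekeeper constraint template for required labels (the template defines the rule; a separate K8sRequiredLabels constraint selects the resources and lists the labels to require):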

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the parameters of K8sRequiredLabels constraints
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          # Set comprehensions yield an empty set when metadata.labels is
          # absent, so objects with no labels at all still violate
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing required labels: %v", [missing])
        }
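The core of a required-labels check is a set difference, shown here in Python for readers less familiar with Rego; `missing_labels` is a hypothetical helper. The detail worth copying is treating an absent `metadata.labels` as an empty set. In Rego, assigning from an undefined reference makes the whole rule body undefined, so a naive lookup of `metadata.labels` lets an object with no labels at all pass without a violation.

```python
from typing import Any, Dict, Iterable, Set


def missing_labels(required: Iterable[str], obj: Dict[str, Any]) -> Set[str]:
    """Return required label keys absent from obj's metadata.labels."""
    # Absent metadata or labels count as "no labels", not as "skip the check"
    provided = set((obj.get("metadata") or {}).get("labels") or {})
    return set(required) - provided
```

For example, `missing_labels(["team", "env"], {"metadata": {"labels": {"team": "payments"}}})` returns `{"env"}`, and an object with no metadata at all is reported as missing every required label.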

Kyverno provides a more intuitive alternative for Kubernetes-native policy management. Kyverno policies use familiar YAML syntax without requiring specialized language knowledge, making them easier to work with.

Example - Kyverno ClusterPolicy

# Kyverno policy for Pod Security Standards compliance
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: pod-security-standards
  annotations:
    policies.kyverno.io/title: Pod Security Standards
    policies.kyverno.io/category: Pod Security Standards (Restricted)
    policies.kyverno.io/severity: high
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-security-context
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Pods must run as non-root user with read-only filesystem"
      pattern:
        spec:
          securityContext:
            runAsNonRoot: true
          containers:
          - securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              capabilities:
                drop: ["ALL"]
  - name: generate-network-policy
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: "default-deny-all"
      namespace: "{{request.object.metadata.name}}"
      data:
        spec:
          podSelector: {}
          policyTypes:
          - Ingress
          - Egress

Network security and micro-segmentation

Network policies provide essential micro-segmentation for Kubernetes environments. Default deny-all policies establish secure baselines, while specific ingress and egress rules enable necessary communication patterns.

Default deny-all network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Database access policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
      tier: database
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
          tier: api
    ports:
    - protocol: TCP
      port: 5432
  egress:
  - to: []
    ports:
    - protocol: TCP
      port: 53  # DNS
    - protocol: UDP
      port: 53  # DNS
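One consequence of the default deny is easy to miss: database-policy admits traffic from backend pods, but the backend pods are themselves isolated for egress by default-deny-all, so they need their own egress rule to reach port 5432. The model below is a deliberately simplified, hypothetical sketch (one namespace, podSelector peers with matchLabels only, protocols ignored), not a substitute for testing against a real CNI.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selects(match_labels: Labels, labels: Labels) -> bool:
    """matchLabels semantics: every pair must be present; {} matches all."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def _side_allows(pod: Labels, peer: Labels, port: int,
                 policies: List[dict], direction: str) -> bool:
    """Does `pod` permit traffic with `peer` in `direction` (ingress/egress)?"""
    kind = direction.capitalize()  # "Ingress" or "Egress"
    relevant = []
    for p in policies:
        types = p.get("policyTypes") or (
            ["Ingress"] + (["Egress"] if "egress" in p else []))
        if kind in types and selects(p["podSelector"].get("matchLabels", {}), pod):
            relevant.append(p)
    if not relevant:
        return True  # no policy isolates this pod in this direction
    peer_key = "from" if direction == "ingress" else "to"
    for p in relevant:
        for rule in p.get(direction) or []:
            peers = rule.get(peer_key)
            peer_ok = not peers or any(
                "podSelector" in x
                and selects(x["podSelector"].get("matchLabels", {}), peer)
                for x in peers)
            ports = rule.get("ports")
            port_ok = not ports or any(pp.get("port") == port for pp in ports)
            if peer_ok and port_ok:
                return True
    return False


def allowed(src: Labels, dst: Labels, port: int, policies: List[dict]) -> bool:
    """Traffic flows only if the source may send AND the destination may receive."""
    return (_side_allows(src, dst, port, policies, "egress")
            and _side_allows(dst, src, port, policies, "ingress"))
```

With only default-deny-all and database-policy loaded, backend-to-postgres traffic on 5432 is denied at the backend's egress side; adding a (hypothetical) backend egress policy that targets `app: postgres` on TCP 5432 makes it flow.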

Advanced network security implementations using Cilium provide Layer 7 policy enforcement with HTTP method and path filtering. Service mesh integration adds encryption, authentication, and advanced traffic management capabilities.

Advanced secrets management and service accounts

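External secrets management systems provide stronger protection than native Kubernetes Secrets, which are only base64-encoded and are stored unencrypted in etcd unless encryption at rest is configured. Integrating HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault through the External Secrets Operator centralizes secret management: the operator syncs values into the cluster on a refresh interval, so rotations performed in the provider propagate automatically.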

Example - External Secrets Operator configuration:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: production
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "production-role"
          serviceAccountRef:
            name: external-secrets-sa

Service account security requires careful RBAC configuration with minimal permissions. Disable automountServiceAccountToken when Kubernetes API access is unnecessary; when API access is required, mount the token through a projected serviceAccountToken volume, which issues a short-lived token bound to a specific audience and expiration.

Current threats and future considerations

The 2024-2025 threat landscape includes sophisticated supply chain attacks, AI-powered social engineering, and cryptojacking targeting containerized workloads. Recent vulnerabilities like "Leaky Vessels" (CVE-2024-21626 in runc, disclosed alongside three BuildKit flaws) demonstrate continued risks in container runtimes and build systems. (ref)

Emerging security technologies include AI-driven threat detection, zero-trust architecture implementation, and enhanced runtime security monitoring. Organizations must balance security with operational efficiency through automated policy enforcement, continuous compliance monitoring, and integrated security tooling.

Implementation roadmap and best practices

Successful container security implementation requires a phased approach. Immediate actions include patching to latest Docker and Kubernetes versions, implementing vulnerability scanning in CI/CD pipelines, and deploying basic security contexts. Short-term initiatives encompass zero-trust architecture, compliance frameworks, and runtime security monitoring. Long-term objectives include platform engineering with security-by-design, behavioral analytics, and automated incident response.

Container security in 2024-2025 demands comprehensive, multi-layered approaches combining traditional security practices with emerging technologies. Success requires integrating security throughout the container lifecycle, from development to runtime, while fostering collaboration between development, security, and operations teams through effective DevSecOps practices. The rapid evolution of both threats and defensive technologies makes continuous learning and adaptation essential for maintaining strong security postures in increasingly complex cloud-native environments.