Docker - Image Layering and Caching
Docker image layers are fundamental components of the Docker architecture and serve as the building blocks of Docker images. Each layer is read-only and records the filesystem changes made by a single Dockerfile instruction (in practice, instructions such as RUN, COPY, and ADD). Stacked together, the layers make up the final image.
The first layer is the base layer, typically an operating system such as Ubuntu. Further layers are stacked on top of it, for example software installations, configuration, and application code.
Docker uses a union file system so that the layers can stay isolated and immutable while still appearing as a single file system. Layering brings substantial efficiency and reuse benefits. Through layer caching, Docker reuses the common layers that different images share instead of rebuilding or storing them again. This reduces both build time and storage requirements.
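The union file system described above resolves every path by looking through the layers from the top down. The following is a minimal Python sketch of that lookup rule only. It assumes each layer is a dict of path to contents and uses a hypothetical `WHITEOUT` marker for deletions, which loosely mirrors the whiteout files that overlay-based storage drivers use. Real drivers such as overlay2 do this in the kernel, not like this.

```python
# Toy model of a union file system (not how overlay2 is implemented).
# Each layer maps a path to its contents; WHITEOUT marks a path that a
# later layer deleted.
WHITEOUT = object()


def union_view(layers):
    """Merge read-only layers, listed bottom to top, into one file tree."""
    merged = {}
    for layer in layers:  # upper layers override lower ones
        for path, content in layer.items():
            if content is WHITEOUT:
                merged.pop(path, None)  # hide the file from lower layers
            else:
                merged[path] = content
    return merged


base = {"/etc/os-release": "Ubuntu 20.04", "/tmp/setup.log": "..."}
deps = {"/usr/bin/curl": "<binary>", "/tmp/setup.log": WHITEOUT}
app = {"/app/app.py": "print('hi')"}

print(sorted(union_view([base, deps, app])))
# ['/app/app.py', '/etc/os-release', '/usr/bin/curl']
print("/tmp/setup.log" in base)  # True: the lower layer itself is untouched
```

Note that the merged view changes while `base` does not. This is the isolation and immutability property the union file system provides.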
Because layers are shared and reused, image distribution is also more efficient. When an image is updated, only the layers that the destination does not already have need to be transferred. Furthermore, a layer never changes once it has been created. This simplifies versioning and keeps an image consistent across different environments.
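Registries identify layers by a sha256 content digest, and a client can skip any layer the destination already holds. The following Python sketch illustrates why only changed layers travel. It assumes plain byte strings stand in for layer tarballs and uses hypothetical helper names; it is not the registry protocol itself.

```python
import hashlib


def digest(layer_bytes: bytes) -> str:
    """Content address of a layer: identical bytes always give the same digest."""
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()


def layers_to_push(local_layers, remote_digests):
    """Return the digests of local layers the registry does not already hold."""
    return [d for d in map(digest, local_layers) if d not in remote_digests]


# Byte strings stand in for the layer tarballs of two image versions.
v1 = [b"ubuntu base", b"apt packages", b"app code v1"]
v2 = [b"ubuntu base", b"apt packages", b"app code v2"]

registry = {digest(layer) for layer in v1}  # v1 was pushed earlier
print(len(layers_to_push(v2, registry)))  # 1: only the changed top layer
```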
Components of Docker Image Layers
Every layer in a Docker image corresponds to an instruction in the Dockerfile. For the purpose of discussion, these layers can be divided into three groups: base, intermediate, and top layers. Each group plays a specific role in building an image.
Base Layer
The base layer forms the foundation of a Docker image. It usually contains a minimal operating system or the runtime environment that the application needs. Most of the time it comes from an existing image such as node, alpine, or ubuntu. This layer is essential because it establishes the environment in which all later layers operate.
To provide a standardized starting point, the base layer often contains libraries and dependencies that many applications need. Building on a dependable, consistent base image lets developers simplify development and deployment across environments.
Intermediate Layer
The layers added on top of the base layer are called intermediate layers. Each intermediate layer corresponds to a single Dockerfile instruction, such as RUN, COPY, or ADD. These layers contain application dependencies, configuration files, and other elements that supplement the base layer.
Typical tasks performed in intermediate layers include installing software packages and copying source code into the image. Instructions such as ENV, which sets environment variables, change only the image's metadata and do not add a filesystem layer.
Intermediate layers let the application environment be built up step by step. Because every layer is immutable, adding or modifying an instruction produces a new layer rather than changing an existing one. This immutability improves efficiency and reduces redundancy. An unchanged layer stays identical, so it can be reused across builds and images.
Top Layer
The last layer of a Docker image is the top layer, also called the application layer. It contains the application's code and any final configuration the application needs to run. The top layer completes a runnable application, building on the base environment and the changes made by the intermediate layers.
The top layer is what distinguishes one application's image from another, and its application code is what a running container most directly works with. When a container is started from the image, Docker adds a thin writable container layer on top of the read-only image layers. Any file the container creates or modifies is written to that container layer, while the image layers themselves stay unchanged.
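When a container starts, Docker gives it a writable container layer above the image's read-only layers, so changes made by the container never touch the image. The following Python sketch models that copy-on-write behaviour. It assumes a hypothetical `Container` class, dict-based layers, and no file deletions; it is a simplification, not how Docker's storage drivers are implemented.

```python
from types import MappingProxyType


class Container:
    """Toy model: read-only image layers plus one writable container layer."""

    def __init__(self, image_layers):
        # Read-only views: assigning to a mappingproxy raises TypeError.
        self.image_layers = [MappingProxyType(dict(layer)) for layer in image_layers]
        self.writable = {}  # per-container changes live only here

    def read(self, path):
        # The writable layer wins; otherwise search image layers top-down.
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Copy-on-write: the change lands in the container layer, never the image.
        self.writable[path] = content


image = [{"/etc/app.conf": "debug=false"}, {"/app/app.py": "v1"}]
c1, c2 = Container(image), Container(image)
c1.write("/etc/app.conf", "debug=true")
print(c1.read("/etc/app.conf"), c2.read("/etc/app.conf"))  # debug=true debug=false
```

Two containers started from the same image share its layers but see only their own writes.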
What are Cache Layers in Docker Images?
Cache layers are an essential part of Docker's image build process. They let Docker reuse previously built layers whenever possible, which speeds up builds and reduces the time and computing power needed to rebuild images regularly.
When you build a Docker image, Docker processes the Dockerfile instructions one after another. For each instruction, it checks whether its cache already holds a layer produced by the same instruction, on top of the same parent layer, with the same inputs. If so, Docker reuses that layer instead of creating a new one. This is called "layer caching". Because unchanged steps are skipped, builds can be considerably faster.
How do Cache Layers Work?
Instruction Matching − For each instruction in the Dockerfile, Docker looks for a cached layer that matches it. A match requires the same instruction on the same parent layer. For COPY and ADD, Docker also compares checksums of the files being copied. For RUN, only the command string is compared.
Layer Reuse − If Docker finds a match in its cache, it reuses the existing layer instead of building a new one. The instruction is not executed again, which saves both time and resources.
Cache invalidation − When an instruction or its inputs change, no cached layer matches, and the cache is invalidated from that step onward. For example, if a file used in a COPY instruction changes, Docker must rebuild that layer and every layer after it, because each later layer was built on top of the old one.
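The three steps above (matching, reuse, and cascading invalidation) can be modelled with a chained cache key. The following Python sketch assumes that a step's key is a hash of its parent's key, the instruction text, and a checksum of any copied files. It is a simplified model of the behaviour, not Docker's actual key format; `step_key` and `build` are hypothetical names.

```python
import hashlib


def step_key(parent_key: str, instruction: str, inputs: bytes = b"") -> str:
    """Toy cache key: parent layer key + instruction text + checksum of copied files."""
    files_sum = hashlib.sha256(inputs).hexdigest()
    return hashlib.sha256("\0".join([parent_key, instruction, files_sum]).encode()).hexdigest()


def build(steps, cache):
    """Process (instruction, inputs) pairs in order, reusing cached layers where keys match."""
    key, log = "scratch", []
    for instruction, inputs in steps:
        key = step_key(key, instruction, inputs)  # depends on every earlier step
        if key in cache:
            log.append(("CACHED", instruction))  # match found: reuse the layer
        else:
            cache.add(key)  # miss: "run" the step and store the new layer
            log.append(("BUILT", instruction))
    return log


v1 = [
    ("FROM ubuntu:20.04", b""),
    ("COPY app.py /app/", b"print('v1')"),
    ("RUN apt-get update && apt-get install -y curl", b""),
]
cache = set()
build(v1, cache)

# Only the contents of app.py change between the two builds.
v2 = [v1[0], ("COPY app.py /app/", b"print('v2')"), v1[2]]
for status, instruction in build(v2, cache):
    print(status, instruction)
# CACHED FROM ubuntu:20.04
# BUILT COPY app.py /app/
# BUILT RUN apt-get update && apt-get install -y curl
```

The RUN step's text did not change, yet it is rebuilt because its parent key changed. This cascade is why the position of frequently changing instructions matters.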
Benefits of Cache Layers
Build Speed − Shorter build time is usually the main advantage. By reusing existing layers, Docker can speed up builds considerably, particularly for large images with many layers.
Resource Efficiency − Reusing layers minimizes the amount of data that needs to be processed and stored and conserves computational resources.
Consistency − By reusing layers that have already been tested and validated, caching helps keep builds consistent and lowers the risk of introducing new errors during rebuilds.
Cache Layers: Limitations and Considerations
While cache layers provide many benefits, they also have some limitations −
Cache Size − The cache can take up a lot of disk space, and managing it efficiently can be difficult. Unused build cache can be removed with docker builder prune.
Cache invalidation − Changes to the Dockerfile or the build context can invalidate the cache, forcing layers to be rebuilt from the changed step onward.
Security − Relying too heavily on cached layers without verification can cause outdated layers with known vulnerabilities to be reused, putting users' information at risk. Periodically rebuilding with docker build --no-cache --pull avoids silently reusing stale layers and base images.
Tips to Maximize Layer Caching in Dockerfiles
The key to maximizing layer caching in Dockerfiles is to group rarely changing commands together and to keep changes to early layers to a minimum. This lets Docker reuse as many layers as possible in future builds. The following practices help structure a Dockerfile for optimal layer caching −
Start with a Stable Base Image
Choose a stable, well-maintained base image for your Dockerfile and refer to it by a specific tag. This keeps the base layer consistent between builds.
FROM ubuntu:20.04
Group and Order Instructions by Volatility
Order instructions by how often they change, starting with the ones that change least. That way, an edit invalidates only the layers from the changed instruction onward, and Docker can keep reusing all the layers before it.
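The effect of ordering can be estimated by counting rebuilt steps. Once one step misses the cache, every later step is rebuilt too. The following Python sketch compares two orderings of the same hypothetical steps when only the application source changes. The step labels are illustrative, not real Dockerfile syntax.

```python
def steps_rebuilt(order, changed):
    """Count rebuilt steps: everything from the first changed step to the end."""
    for i, step in enumerate(order):
        if step in changed:
            return len(order) - i
    return 0  # nothing changed: every step is served from the cache


volatile_last = ["FROM", "RUN install system packages", "COPY requirements.txt",
                 "RUN pip install", "COPY source"]
volatile_first = ["FROM", "COPY source", "RUN install system packages",
                  "COPY requirements.txt", "RUN pip install"]

# Typical edit: only application source code changed.
print(steps_rebuilt(volatile_last, {"COPY source"}))   # 1
print(steps_rebuilt(volatile_first, {"COPY source"}))  # 4
```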
Install Dependencies Together
Combine package installation commands into a single RUN instruction. This reduces the number of layers and ensures that the commands are cached together as one layer.
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    git \
    && apt-get clean
Separate Application Code and Dependencies
Add the dependencies and the application code in separate instructions, with the dependencies first. That way, changing the code does not invalidate the cached dependency layer.
# Install application dependencies
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt
# Copy application code
COPY . /app
Use Multi-Stage Builds
Use multi-stage builds to keep the final image lean and free of extra layers. Intermediate stages can produce build artifacts and cache dependencies. Only what you copy into the final stage ends up in the shipped image.
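# Build stage
FROM golang:1.16 AS builder
WORKDIR /app
# Copy module files first so dependency downloads stay cached when only source changes
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Build a static binary so it runs on the musl-based Alpine image
RUN CGO_ENABLED=0 go build -o myapp
# Final stage
FROM alpine:3.13
COPY --from=builder /app/myapp /usr/local/bin/myapp
CMD ["myapp"]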
Minimize the Number of Layers
Where it makes sense, combine related commands into a single instruction to reduce the number of layers.
RUN apt-get update && \
    apt-get install -y curl vim git && \
    apt-get clean
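Combining commands also matters for image size. Layers are immutable, so a file deleted by a later RUN instruction is only hidden; it still occupies space in the earlier layer. The following Python sketch makes that arithmetic explicit. It assumes hypothetical sizes in MB and uses `None` to mark a deletion recorded in a later layer.

```python
def image_size(layers):
    """Sum what each layer stores; a deletion (None) only hides a file, adding 0 MB."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


# Hypothetical sizes in MB, for illustration only.
separate_runs = [
    # RUN apt-get update && apt-get install -y curl  (downloaded .deb files stay behind)
    {"/var/cache/apt/archives": 40, "/usr/bin/curl": 5},
    # RUN apt-get clean  (a later layer can only mark the cache as deleted)
    {"/var/cache/apt/archives": None},
]
single_run = [
    # RUN apt-get update && apt-get install -y curl && apt-get clean
    {"/usr/bin/curl": 5},
]

print(image_size(separate_runs), image_size(single_run))  # 45 5
```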
Use .dockerignore File
Exclude files and directories that the image does not need. Otherwise, a change to one of them (for example, a new Git commit updating .git) would invalidate the cache for instructions such as COPY . /app.
# .dockerignore
.git
node_modules
dist
Dockerfile
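A COPY of the whole context can be cached only if the files it sees are unchanged, so excluded paths cannot invalidate it. The following Python sketch computes a checksum of a hypothetical build context after applying ignore patterns. It uses `fnmatch` as a simplified stand-in for Docker's own .dockerignore matching rules, which support more syntax than this.

```python
import fnmatch
import hashlib


def context_checksum(files, ignore_patterns):
    """Checksum of the build context after dropping ignored paths (toy model)."""
    h = hashlib.sha256()
    for path in sorted(files):
        top = path.split("/")[0]  # ignoring a directory also ignores its contents
        if any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(top, p) for p in ignore_patterns):
            continue  # excluded paths never reach the builder
        h.update(path.encode() + b"\0" + files[path] + b"\0")
    return h.hexdigest()


ignore = [".git", "node_modules", "dist", "Dockerfile"]
before = {"app.py": b"print('hi')", ".git/HEAD": b"ref: refs/heads/main"}
after = {"app.py": b"print('hi')", ".git/HEAD": b"ref: refs/heads/feature"}

# Only .git changed, so the context seen by `COPY . /app` is identical.
print(context_checksum(before, ignore) == context_checksum(after, ignore))  # True
print(context_checksum(before, []) == context_checksum(after, []))          # False
```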
Explicit Versioning
Pin specific package versions when installing them. For a RUN instruction, Docker's cache check compares only the command string, not the state of the package repository. An unpinned install therefore keeps reusing whatever version was cached, even after a newer release appears. Pinning makes the installed version explicit and reproducible, and changing the pinned version deliberately invalidates the cache for that step.
RUN apt-get update && apt-get install -y nodejs=14.16.0-1nodesource1
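The interaction between RUN caching and version pinning can be illustrated with a small model. The following Python sketch assumes the cache key is the command text alone and uses a hypothetical in-memory "repository". The newer version string is invented for the example.

```python
def run_install(package_spec, cache, repo):
    """Toy RUN step: the cache key is only the command text, never the repository state."""
    command = f"apt-get install -y {package_spec}"
    if command in cache:
        return cache[command], "CACHED"
    name, _, version = package_spec.partition("=")
    installed = version or repo[name]  # unpinned: whatever is newest right now
    cache[command] = installed
    return installed, "BUILT"


cache = {}
repo = {"nodejs": "14.16.0-1nodesource1"}
run_install("nodejs", cache, repo)

repo["nodejs"] = "14.17.0-1nodesource1"  # hypothetical newer upstream release
print(run_install("nodejs", cache, repo))  # ('14.16.0-1nodesource1', 'CACHED'): stale
print(run_install("nodejs=14.17.0-1nodesource1", cache, repo))  # ('14.17.0-1nodesource1', 'BUILT')
```

The unpinned command silently keeps the old version. Bumping the pinned version changes the command text, so the cache miss, and the upgrade, happen exactly when you intend.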
Example Dockerfile
Here is an example Dockerfile that incorporates these practices −
# Base image
FROM python:3.9-slim
# Install dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev \
    python3-dev \
    && apt-get clean
# Copy and install Python dependencies
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt
# Copy application code
COPY . /app
# Set the working directory
WORKDIR /app
# Set the entry point
CMD ["python", "app.py"]
You can optimize Docker’s layer caching for faster builds and more economical resource usage by adhering to these guidelines.
Conclusion
To sum up, careful layering and caching of Docker images is essential for getting the most out of containerization: quicker builds, more effective use of resources, and reliable application deployments.
By taking advantage of the layered structure of Docker images and carefully arranging their Dockerfiles, developers can optimize layer caching, shorten build times, and make cached layers more reusable.
The most effective techniques for optimizing layer caching include using multi-stage builds, starting from stable base images, ordering instructions by volatility, and separating application code from dependencies.
By applying these methods thoughtfully, Docker users can make their workflows more productive, streamline their development processes, and produce more dependable and scalable containerized applications.
FAQs
Q1. How can I optimize the Dockerfile for better layer caching?
To optimize a Dockerfile for layer caching, organize the instructions so that early layers change as little as possible and rarely changing commands are grouped together. Start from a stable base image, then order the remaining instructions from least to most frequently changed.
Keep application code and dependencies in separate instructions so that code changes do not invalidate the dependency cache. Use multi-stage builds to avoid superfluous layers and keep the final image lean. Finally, pin explicit package versions so that the installed versions, and the points at which the cache is invalidated, are predictable.
Q2. What are the limitations of Docker layer caching?
Although Docker layer caching has many advantages, it also has drawbacks. Changes to the build context or Dockerfile instructions invalidate the cache. Build times then increase, because Docker must rebuild the affected layer and every layer after it. Keeping the cache size under control can also be difficult: cached layers use disk space and may need to be pruned regularly to free up storage.
Furthermore, relying too heavily on cached layers without adequate verification can lead to outdated or vulnerable layers being reused, which poses a security risk.
Q3. How can I troubleshoot Docker build issues related to layer caching?
If you run into build problems related to layer caching, start by examining the build output to see which steps were served from the cache and where the first cache miss occurred. Then look for changes in the build context or Dockerfile instructions that could have invalidated the cache at that step.
Check whether the Dockerfile follows the recommended practices for layer caching. Experiment with alternative structures, such as reordering instructions or combining commands, to see whether they improve cache reuse.
Lastly, refer to the Docker documentation and community forums for further troubleshooting guidance and recommendations.