Java Concise Tutorial
Java - Garbage Collection
The lifecycle of a Java object is managed by the JVM. Once an object is created by the programmer, we need not worry about the rest of its lifecycle. The JVM will automatically find those objects that are not in use anymore and reclaim their memory from the heap.
What is Java Garbage Collection?
Garbage collection is one of the major operations the JVM performs, and tuning it for our needs can give our application a substantial performance boost. Modern JVMs provide a variety of garbage collection algorithms, and we need to understand our application's needs to decide which one to use.
You cannot deallocate an object programmatically in Java as you can in non-GC languages such as C and C++. Therefore, you cannot have dangling references in Java. However, you can have null references, that is, references that do not point to any object. Whenever a null reference is dereferenced, the JVM throws a NullPointerException.
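As a minimal sketch of the last point, the snippet below dereferences a null reference and catches the resulting exception. The class and method names are illustrative only.

```java
final class NullReferenceDemo {
    /** Returns true if using a null reference raised a NullPointerException. */
    static boolean dereferencingNullThrows() {
        String text = null;          // a reference that points to no object
        try {
            text.length();           // dereferencing the null reference
            return false;            // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE thrown: " + dereferencingNullThrows()); // prints "NPE thrown: true"
    }
}
```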
Note that although memory leaks are rare in Java programs thanks to the GC, they do happen. We will create a memory leak at the end of this chapter.
Types of Garbage Collectors
The following GCs are used in modern JVMs:
- Serial collector
- Throughput collector
- CMS collector
- G1 collector
Each of the above collectors performs the same task: finding objects that are no longer in use and reclaiming the memory they occupy in the heap. A naïve approach would be to count the references to each object and free the object as soon as that count drops to 0 (this is known as reference counting). Why is this naïve? Consider a circular linked list. Each of its nodes is referenced by another node, so no count ever reaches 0. Yet if nothing outside the list refers to it, the whole list is unreachable and should ideally be freed.
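The circular-list problem can be illustrated with a toy model. The sketch below is not how any JVM collector is implemented; it uses a hypothetical Node class that tracks its own reference count, and a simple reachability walk from a set of roots. Every node in the ring keeps a count of 1, while the walk from an empty root set finds nothing alive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class CycleDemo {
    /** A toy heap object that records how many references point at it. */
    static final class Node {
        final String name;
        final List<Node> outgoing = new ArrayList<>();
        int refCount = 0;

        Node(String name) { this.name = name; }

        void pointTo(Node target) {
            outgoing.add(target);
            target.refCount++;
        }
    }

    /** Builds a circular list a -> b -> c -> a with no outside references. */
    static Node[] buildRing() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.pointTo(b);
        b.pointTo(c);
        c.pointTo(a);
        return new Node[] {a, b, c};
    }

    /** Collects every node reachable from the given roots (tracing approach). */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.outgoing);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node[] ring = buildRing();
        for (Node node : ring) {
            // Reference counting never sees 0 here: prints 1 for each node.
            System.out.println(node.name + " refCount = " + node.refCount);
        }
        // With no roots, tracing finds no live nodes: prints 0.
        System.out.println("reachable nodes = " + reachableFrom(Collections.<Node>emptyList()).size());
    }
}
```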
Memory Coalescing
The JVM not only frees memory but also coalesces small memory chunks into bigger ones. This is done to prevent memory fragmentation.
Put simply, a typical GC algorithm performs the following activities:
- Finding unused objects
- Freeing the memory that they occupy in the heap
- Coalescing the fragments
The GC has to stop application threads while it is running. This is because it moves objects around when it runs, and those objects cannot be used while they are being moved. Such stops are called "stop-the-world" pauses, and minimizing their frequency and duration is what we aim for when tuning the GC.
A simple demonstration of memory coalescing is shown below:
The shaded portions are objects that need to be freed. Even after all of that space is reclaimed, the largest object we can allocate is 75 KB, even though 200 KB of space is free in total, as shown below:
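The arithmetic behind this can be sketched with a toy model. The block layout used here (free blocks of 75 KB, 50 KB, and 75 KB separated by live objects) is an assumption chosen only to match the 75 KB and 200 KB figures above; the class and method names are illustrative.

```java
final class FragmentationDemo {
    /** Largest single free block: the biggest object that fits without coalescing. */
    static int largestFreeBlockKb(int[] freeBlocksKb) {
        int largest = 0;
        for (int size : freeBlocksKb) {
            largest = Math.max(largest, size);
        }
        return largest;
    }

    /** Total free space: the size of the single block left once fragments are coalesced. */
    static int totalFreeKb(int[] freeBlocksKb) {
        int total = 0;
        for (int size : freeBlocksKb) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] freeBlocksKb = {75, 50, 75};   // assumed layout; live objects sit between the blocks
        System.out.println("Largest allocation before coalescing: "
                + largestFreeBlockKb(freeBlocksKb) + " KB");   // prints 75 KB
        System.out.println("Largest allocation after coalescing:  "
                + totalFreeKb(freeBlocksKb) + " KB");          // prints 200 KB
    }
}
```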
Generations in Garbage Collection
Most JVMs divide the heap into three generations: the young generation (YG), the old generation (OG, also called the tenured generation), and the permanent generation.
Let us look at a simple example. The String class in Java is immutable. This means that every time you need to change the contents of a String object, you have to create a new object altogether. Suppose you change a string 1000 times in a loop, as shown in the code below:
String str = "G11 GC";
for (int i = 0; i < 1000; i++) {
    str = str + String.valueOf(i);
}
In each iteration, we create a new string object, and the string created during the previous iteration becomes useless (that is, nothing refers to it any longer). The lifetime of each such object is just one iteration, so it will be collected by the GC in no time. Such short-lived objects are kept in the young generation area of the heap. The process of collecting objects from the young generation is called minor garbage collection, and it always causes a "stop-the-world" pause.
Minor Garbage Collection
As the young generation fills up, the GC performs a minor garbage collection. Dead objects are discarded, and surviving objects are moved, eventually into the old generation. The application threads are stopped during this process.
Here, we can see the advantage of such a generational design. The young generation is only a small part of the heap and fills up quickly, but processing it takes far less time than processing the entire heap. So the "stop-the-world" pauses in this case are much shorter, although more frequent. We should always aim for shorter pauses over longer ones, even if they are more frequent.
Full Garbage Collection
The young generation is divided into two kinds of spaces: eden and the survivor spaces. Objects that survive a collection of eden are moved to a survivor space, and those that survive the survivor space are moved to the old generation. The young generation is compacted while it is collected.
As objects are moved to the old generation, it eventually fills up and has to be collected and compacted. This process is called a full GC. Different algorithms take different approaches to it. Some stop the application threads, which leads to a long "stop-the-world" pause since the old generation is quite big compared to the young generation. Others do the work concurrently while the application threads keep running; CMS and G1 are two such collectors.
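One way to observe collections in a running JVM is the standard java.lang.management API. The sketch below lists each collector the JVM registers, with its collection count and accumulated collection time. The class name and the allocation loop are illustrative; the collector names reported depend on the JVM and on which collector is selected, and a short run may show no collections at all.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcActivityReport {
    /** Returns one descriptive line per collector registered with the running JVM. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create many short-lived strings to give the young generation some work.
        String str = "";
        for (int i = 0; i < 5_000; i++) {
            str = str + i;
        }
        System.out.println("Final string length: " + str.length());

        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```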
Tuning Garbage Collectors
We can also tune garbage collectors to suit our needs. The following areas can be configured depending on the situation:
- Heap Size Allocation
- Generation Sizes Allocation
- PermGen and Metaspace Configurations
Let's look at each of these in detail, along with its impact. We'll also discuss recommendations based on available memory, CPU configuration, and other relevant factors.
Heap Size Allocation
The heap size is an important factor in the performance of our Java applications. If it is too small, it fills up frequently and, as a result, has to be collected frequently by the GC. On the other hand, if we simply increase the size of the heap, it needs to be collected less frequently, but the length of the pauses increases.
Further, increasing the heap size can carry a severe penalty on the underlying OS. Using paging, the OS lets application programs see much more memory than is actually available. The OS manages this by using swap space on disk, copying inactive portions of programs into it. When those portions are needed, the OS copies them back from disk into memory.
Suppose a machine has 8G of memory and the JVM sees 16G of virtual memory. The JVM does not know that only 8G is physically available on the system. It simply requests 16G from the OS, and once it gets that memory, it keeps using it. The OS then has to swap a lot of data in and out, which is a huge performance penalty on the system.
Then come the pauses that occur during a full GC of such virtual memory. Since the GC acts on the entire heap for collection and compaction, it has to wait a long time for swapped-out memory to be read back in from disk. With a concurrent collector, the background threads likewise have to wait a long time for data to be copied from swap space into memory.
So the question is how to decide on the optimal heap size. The first rule is never to request more memory from the OS than is physically present. This prevents the problem of frequent swapping altogether. If the machine has multiple JVMs installed and running, then the total memory requested by all of them combined should be less than the actual RAM present in the system.
You can control the size of the memory requested by the JVM using two flags:
- -XmsN: Controls the initial memory requested.
- -XmxN: Controls the maximum memory that can be requested.
The default values of both these flags depend on the underlying OS. For example, for 64-bit JVMs running on macOS, -XmsN defaults to 64M and -XmxN defaults to the smaller of 1G and 1/4 of the total physical memory.
Note that the JVM can adjust the heap size between these two values automatically. For example, if it notices that too much GC is happening, it keeps increasing the heap size, as long as it stays under -XmxN, until the desired performance goals are met.
If you know exactly how much memory your application needs, you can set -XmsN equal to -XmxN. In this case, the JVM does not need to figure out an "optimal" heap size, and hence the GC process becomes a little more efficient.
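To check what heap limits a JVM actually ended up with, the standard Runtime API can be queried from inside the application. The following is a minimal sketch; the class name HeapSettings is illustrative, and the value reported by maxMemory() is only roughly the -Xmx setting, since some collectors exclude part of the heap from it.

```java
final class HeapSettings {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Summarizes the heap limits of the running JVM as reported by Runtime. */
    static String describeHeap() {
        Runtime runtime = Runtime.getRuntime();
        return "max=" + toMiB(runtime.maxMemory()) + " MiB"            // upper bound, related to -Xmx
                + ", committed=" + toMiB(runtime.totalMemory()) + " MiB" // currently reserved heap
                + ", free=" + toMiB(runtime.freeMemory()) + " MiB";      // unused part of the committed heap
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

Running it as `java -Xms512m -Xmx512m HeapSettings` fixes the initial and maximum sizes to the same value, so the committed figure should start out close to the maximum.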
Generation Sizes Allocation
You can decide how much of the heap to allocate to the YG and how much to allocate to the OG. Both of these values affect the performance of our applications in the following way.
If the YG is very large, it is collected less frequently, which results in fewer objects being promoted to the OG. On the other hand, if you increase the OG's size too much, collecting and compacting it takes too much time, which leads to long stop-the-world (STW) pauses. Thus, the user has to find a balance between these two values.
Below are the flags that you can use to set these values:
- -XX:NewRatio=N: Ratio of the OG to the YG (default value = 2)
- -XX:NewSize=N: YG's initial size
- -XX:MaxNewSize=N: YG's max size
- -XmnN: Sets NewSize and MaxNewSize to the same value
The initial size of the YG is determined by the value of NewRatio using the following formula:
(total heap size) / (NewRatio + 1)
Since the default value of NewRatio is 2, the above formula gives an initial YG size of 1/3 of the total heap size. You can always override this by explicitly specifying the size of the YG using the NewSize flag. This flag does not have any default value, and if it is not set explicitly, the size of the YG continues to be calculated using the above formula.
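As a worked example of the formula, the sketch below computes the initial YG size for a given heap size and NewRatio. The class and method names are illustrative, and the 3 GiB heap is an arbitrary example value.

```java
final class YoungGenSizing {
    /** Applies (total heap size) / (NewRatio + 1) to get the initial YG size in bytes. */
    static long initialYoungGenBytes(long totalHeapBytes, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("NewRatio must be at least 1");
        }
        return totalHeapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long heap = 3L * 1024L * mib;                    // example: a 3 GiB heap
        long young = initialYoungGenBytes(heap, 2);      // default NewRatio of 2
        long old = heap - young;
        // prints "young=1024 MiB, old=2048 MiB"
        System.out.println("young=" + young / mib + " MiB, old=" + old / mib + " MiB");
    }
}
```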
PermGen and Metaspace Configurations
The PermGen and the metaspace are the areas where the JVM keeps class metadata. Up to Java 7, this space is called the "PermGen" (permanent generation); from Java 8 onward, it is replaced by the "metaspace", which lives in native memory rather than in the Java heap. This information is used by the compiler and the runtime. You can control the PermGen's size using the following flags: -XX:PermSize=N and -XX:MaxPermSize=N. The metaspace's size can be controlled using -XX:MetaspaceSize=N and -XX:MaxMetaspaceSize=N.
The PermGen and the metaspace are managed differently when the flag values are not set. Both have a default initial size. But while the metaspace can grow to occupy as much native memory as it needs, the PermGen cannot grow beyond its default maximum size. For example, on a 64-bit JVM the default maximum PermGen size is 82M.
Note that since the metaspace can occupy an unlimited amount of memory unless told otherwise, it can cause an out-of-memory error. A full GC takes place whenever these regions are resized. Hence, during startup, if a lot of classes are being loaded, the metaspace can keep resizing, resulting in a full GC every time. Thus, large applications take a long time to start up if the initial metaspace size is too low. It is a good idea to increase the initial size, as this reduces the startup time.
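To see how much metaspace an application is using, and whether a maximum is in effect, the memory pools exposed through java.lang.management can be inspected. The sketch below assumes the JVM reports a pool whose name contains "Metaspace", as HotSpot does from Java 8 onward; other JVMs may name it differently or not report it at all. The class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceReport {
    /** Describes the metaspace pool, or says so if this JVM does not report one. */
    static String describeMetaspace() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: the pool name contains "Metaspace" (HotSpot, Java 8+).
            if (!pool.getName().contains("Metaspace")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;                         // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined.
            String max = usage.getMax() < 0 ? "unlimited" : (usage.getMax() / 1024) + " KiB";
            return pool.getName() + ": used=" + (usage.getUsed() / 1024) + " KiB, max=" + max;
        }
        return "no Metaspace pool reported by this JVM";
    }

    public static void main(String[] args) {
        System.out.println(describeMetaspace());
    }
}
```

Running it with and without `-XX:MaxMetaspaceSize=N` should show the reported maximum change from unlimited to a fixed value.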
Although the PermGen and the metaspace hold class metadata, that metadata is not permanent: the space is reclaimed by the GC, just as it is for objects. This typically matters for server applications. Whenever you make a new deployment to the server, the old metadata has to be cleaned up, as the new class loaders now need space. This space is freed by the GC.