Java Concise Tutorial

Java - Just-In-Time (JIT) Compiler

The just-in-time (JIT) compiler is a compiler that the JVM uses internally to translate the hot spots in the bytecode into native machine code. Its main purpose is to perform heavy optimizations that improve performance.

Java-compiled code is targeted at the JVM. The Java compiler, javac, compiles Java source code into bytecode. The JVM then interprets this bytecode and executes it on the underlying hardware. When some code is executed again and again, the JVM identifies it as a hotspot, compiles it further to native machine code using the JIT compiler, and reuses the compiled code whenever it is needed.

Let’s first understand the difference between compiled and interpreted languages and how Java benefits from both approaches.

Compiled Vs. Interpreted Languages

Languages such as C, C++, and FORTRAN are compiled languages. Their code is delivered as binary code targeted at the underlying machine. This means that the high-level code is compiled into binary code all at once by a static compiler written specifically for the underlying architecture. The resulting binary will not run on any other architecture.

On the other hand, interpreted languages like Python and Perl can run on any machine that has a valid interpreter. The interpreter goes over the high-level code line by line, converting it into binary code.

Interpreted code is typically slower than compiled code. For example, consider a loop. An interpreter converts the corresponding code on every iteration of the loop, whereas compiled code is translated only once. Further, since interpreters see only one line at a time, they cannot perform significant optimizations, such as changing the order of execution of statements, the way compilers can.

Example

We shall look into an example of such an optimization below −

Adding two numbers stored in memory: Since accessing memory can consume multiple CPU cycles, a good compiler will issue instructions to fetch the data from memory and execute the addition only when the data is available. Rather than waiting idly, it schedules other instructions to execute in the meantime. On the other hand, no such optimization is possible during interpretation, since the interpreter is not aware of the entire code at any given time.

In return, interpreted languages can run on any machine that has a valid interpreter for that language.

Is Java Compiled or Interpreted?

Java tries to find a middle ground. The JVM sits between the javac compiler and the underlying hardware: javac (or any other compiler that targets the JVM) compiles Java code into bytecode, which a platform-specific JVM understands. The JVM then compiles the bytecode into native binary code using JIT (just-in-time) compilation as the code executes.
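
As a rough illustration of the two stages, consider the tiny class below. The class name Adder and its method are made up for this sketch and do not come from the tutorial. Compiling it with javac produces Adder.class, and the standard javap -c tool disassembles that file; the comment shows the bytecode one would expect for add, which the JVM interprets at first and may later hand to the JIT compiler.

```java
// Hypothetical example class; the name Adder is an assumption of this sketch.
class Adder {
    // javac translates this method into platform-independent bytecode.
    // "javap -c Adder" is expected to list roughly:
    //   iload_0   (push a)
    //   iload_1   (push b)
    //   iadd      (add the two ints)
    //   ireturn   (return the result)
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

The same Adder.class file can be copied to any platform that has a JVM; only the JVM itself is platform-specific.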

HotSpots

In a typical program, only a small section of the code is executed frequently, and it is often this code that significantly affects the performance of the whole application. Such sections of code are called HotSpots.

If a section of code is executed only once, compiling it would be a waste of effort, and it is faster to interpret the bytecode instead. But if the section is hot and is executed many times, the JVM compiles it. For example, if a method is called many times, the extra cycles needed to compile it are offset by the faster binary that is generated.
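
A minimal sketch of a hot method follows. The class HotMethodDemo and its methods are hypothetical names chosen for illustration. On a HotSpot JVM, running it with the diagnostic flag -XX:+PrintCompilation should list work and runHot among the compiled methods once they have been invoked often enough; the exact output and thresholds depend on the JVM version, so none is shown here.

```java
// Hypothetical demo; run with: java -XX:+PrintCompilation HotMethodDemo.java
class HotMethodDemo {
    // A small method that becomes "hot" because it is called very often.
    static int work(int x) {
        return x % 7;
    }

    // Calls work() once per iteration and accumulates the results.
    static long runHot(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += work(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // One million calls is normally far more than enough to trigger JIT compilation.
        System.out.println(runHot(1_000_000));
    }
}
```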

Further, the more often the JVM runs a particular method or loop, the more information it gathers, which lets it apply various optimizations and generate a faster binary.

Working of JIT Compiler

The JIT compiler improves the execution time of Java programs by compiling hotspot code into native machine code.

The JVM monitors the code as it runs, identifies the hotspots that are worth optimizing, and then invokes the JIT compiler on them at runtime, which makes the program more efficient and faster.

Since JIT compilation is a processor- and memory-intensive activity, it has to be planned accordingly.

Compilation Levels

The HotSpot JVM supports five compilation levels, numbered 0 to 4 −

  0. Interpreter

  1. C1 with full optimization (no profiling)

  2. C1 with invocation and back-edge counters (light profiling)

  3. C1 with full profiling

  4. C2 (uses profiling data from the previous steps)

Use -Xint if you want to disable all JIT compilers and use only the interpreter.
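
To check which mode a JVM is actually running in, something like the following sketch can help. The class name JitModeReport is hypothetical. It reads the java.vm.info system property (which HotSpot typically sets to a value such as "mixed mode" or "interpreted mode", though this property is not guaranteed on every JVM) and queries the standard CompilationMXBean, which is documented to be null when the JVM has no compilation system.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; compare "java JitModeReport.java" with "java -Xint JitModeReport.java".
class JitModeReport {
    static String describe() {
        // Not every JVM defines java.vm.info, so fall back to "unknown".
        String vmInfo = System.getProperty("java.vm.info", "unknown");

        // The bean is null if the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        String jitName = (jit == null) ? "none" : jit.getName();

        return "VM mode: " + vmInfo + ", JIT compiler: " + jitName;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```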

Client Vs. Server JIT (Just-In-Time) Compiler

Use -client and -server to activate the respective modes. The client compiler (C1) starts compiling code sooner than the server compiler (C2), so by the time C2 starts compiling, C1 has already compiled sections of the code. While it waits, however, C2 profiles the code and learns more about it than C1 does. The time it spends waiting is therefore offset by the better optimizations, which produce a much faster binary.

From the perspective of a user, the trade-off is between the startup time of the program and the time the program takes to run. If startup time is at a premium, C1 should be used. If the application is expected to run for a long time (typical of applications deployed on servers), it is better to use C2, as it generates much faster code, which more than offsets any extra startup time.

For programs such as IDEs (NetBeans, Eclipse) and other GUI programs, the startup time is critical. NetBeans might take a minute or longer to start, and hundreds of classes are compiled when such a program starts. In such cases, the C1 compiler is the best choice.

Note that C1 comes in two versions, 32-bit and 64-bit, whereas C2 comes only in a 64-bit version.

Examples of JIT Compiler Optimizations

The following examples showcase JIT compiler optimizations:

Example of JIT optimization in case of objects

Let us consider the following code −

for(int i = 0 ; i <= 100; i++) {
   System.out.println(obj1.equals(obj2)); //two objects
}

If this code is interpreted, the interpreter has to determine the class of obj1 on every iteration. This is because every class in Java has an .equals() method, which is inherited from the Object class and can be overridden. So even if obj1 is a String on every iteration, the lookup is still performed.

What actually happens, on the other hand, is that the JVM notices that obj1 is of class String on every iteration and therefore generates code that calls the .equals() method of the String class directly. No lookups are required, and the compiled code executes faster.

This kind of optimization is only possible when the JVM knows how the code behaves. That is why it waits before compiling certain sections of the code.
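
A self-contained version of the loop might look like the sketch below. The class EqualsLoopDemo and the method countEqual are hypothetical names. Because obj1 is declared as Object, the call to equals is a virtual call; when only String instances are ever passed in, a JIT compiler can, in principle, specialize the call site for String.equals. Whether a given JVM actually does so is not something this code can show.

```java
// Hypothetical runnable version of the loop discussed above.
class EqualsLoopDemo {
    // obj1 is typed as Object, so equals() is dispatched on its runtime class.
    static int countEqual(Object obj1, Object obj2) {
        int matches = 0;
        for (int i = 0; i <= 100; i++) {   // 101 iterations, as in the snippet above
            if (obj1.equals(obj2)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only String instances reach the call site here.
        System.out.println(countEqual("jit", "jit"));
    }
}
```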

Example of JIT optimization in case of primitive values

Below is another example −

int sum = 7;
for(int i = 0 ; i <= 100; i++) {
   sum += i;
}

On every iteration of the loop, an interpreter fetches the value of 'sum' from memory, adds 'i' to it, and stores it back into memory. Memory access is an expensive operation and typically takes multiple CPU cycles. Since this code runs many times, it is a HotSpot. The JIT compiles this code and makes the following optimization.

A local copy of 'sum' is kept in a register that is specific to a particular thread. All the operations are performed on the value in the register, and when the loop completes, the value is written back to memory.
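
For reference, the loop can be wrapped into a runnable class as follows. The class and method names are hypothetical. The code only shows the computation; the register allocation described above happens inside the JIT compiler and is not visible at the source level.

```java
// Hypothetical runnable wrapper around the loop discussed above.
class SumLoopDemo {
    static int sumLoop() {
        int sum = 7;
        // Adds 0 + 1 + ... + 100 (= 5050) to the initial value 7.
        for (int i = 0; i <= 100; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLoop()); // prints 5057
    }
}
```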

What if other threads are accessing the variable as well? Since the updates are being made to a thread's local copy of the variable, the other threads would see a stale value. Thread synchronization is needed in such cases. A very basic synchronization primitive is to declare 'sum' as volatile (which in Java is possible only for a field, not for a local variable). A thread then reads the value from memory before each access instead of relying on its register copy, and every update is written back to memory immediately.
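
The effect of volatile can be sketched with a shared flag rather than a sum. All names below are hypothetical. One thread spins until another thread sets the flag; because the field is volatile, the spinning thread must re-read it on every check. Without volatile, a JIT compiler would be allowed to read the flag once and keep the stale value, so the loop might never end.

```java
// Hypothetical demo of a volatile field shared between two threads.
class VolatileFlagDemo {
    // volatile: a write by one thread is visible to later reads by other threads.
    private static volatile boolean done;

    static boolean waitForSignal() {
        done = false;
        Thread signaller = new Thread(() -> {
            try {
                Thread.sleep(50); // give the caller time to start spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done = true; // volatile write, visible to the spinning thread
        });
        signaller.start();

        while (!done) {
            // Busy-wait: each check is a fresh volatile read of the field.
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(waitForSignal());
    }
}
```

Note that volatile only guarantees visibility; a compound update such as sum += i on a shared field would still need a lock or an atomic class to be safe.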

Optimizations Done by Just-In-Time (JIT) Compiler

Below are some general optimizations that JIT compilers perform −

  1. Method inlining

  2. Dead code elimination

  3. Heuristics for optimizing call sites

  4. Constant folding
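
The sketch below marks places where a JIT compiler could apply some of the optimizations listed above. The class and method names are hypothetical, and the comments describe what a compiler is allowed to do, not what any particular JVM is guaranteed to do.

```java
// Hypothetical code with candidates for inlining, dead code elimination and constant folding.
class JitOptimizationCandidates {
    // Small and frequently called: a typical candidate for method inlining.
    static int square(int x) {
        return x * x;
    }

    static int area(int width, int height) {
        return width * height;
    }

    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            int unused = i * 31;   // never read: candidate for dead code elimination
            // Once area() is inlined, area(2, 3) can be folded to the constant 6.
            total += square(i) + area(2, 3);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 445
    }
}
```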