C Programming: A Concise Tutorial

Compilation Process in C

C is a compiled language. Compiled languages generally execute faster than interpreted languages. Several compilers can be used to compile a C program, such as GCC, Clang and MSVC. In this chapter, we will explain what goes on in the background when you compile a C program with the GCC compiler.

Compiling a C Program

A sequence of binary instructions consisting of 1 and 0 bits is called machine code. High-level programming languages such as C, C++ and Java use keywords that are closer to human languages such as English. Hence, a program written in C (or any other high-level language) must be converted into equivalent machine code. This process is called compilation.

Note that a compiled program is specific to the hardware architecture (the processor's instruction set) and the operating system. For example, an executable compiled on a computer running Windows will not run on a computer running Linux. Hence, we must use a compiler that targets the intended OS.

C Compilation Process Steps

In this tutorial, we will use gcc, the GNU Compiler Collection. GCC is part of the GNU Project, a free-software project launched by Richard Stallman that gives developers free access to powerful tools.

The gcc compiler supports several programming languages, including C. To use it, we need to install a version that is compatible with our computer.

The compilation process has four different steps −

Preprocessing
Compiling
Assembling
Linking

The following diagram illustrates the compilation process.

Example

To understand this process, let us consider the following source code in the C language (main.c) −

#include <stdio.h>

int main(){

   /* my first program in C */

   printf("Hello World! \n");

   return 0;
}

Run the code and check its output −

Hello World!

The ".c" file extension usually indicates that a file contains C source code. The first line is the preprocessor directive #include, which tells the compiler to include the stdio.h header file. The text between /* and */ is a comment; comments are useful for documentation.

The entry point of the program is the main() function, so execution starts with the statements inside its block. This program contains only two statements. The first prints the sentence "Hello World!" to the terminal. The second, return 0, ends main() and reports successful termination to the operating system. So, once the program is compiled and run, the only thing we see is the phrase "Hello World!".

What Goes Inside the C Compilation Process?

To turn main.c into an executable, we enter the command "gcc main.c", and GCC runs all four steps of the compilation process in sequence.

Step 1: Preprocessing

The preprocessor performs the following actions −

It removes all the comments from the source file(s).
It inserts the contents of the included header file(s), that is, files with the .h extension that contain C function declarations and macro definitions.
It replaces every macro (a fragment of code that has been given a name) with its definition.

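The output of this step is still C source code, but with the comments removed, the headers inserted and the macros expanded. By convention, such a file has a ".i" extension, so here it would be "main.i".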

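To stop the compilation right after this step, use the "-E" option with the gcc command. By default, gcc -E prints the preprocessed code to the terminal (standard output); adding "-o main.i" saves it to a file instead.

gcc -E main.c
gcc -E main.c -o main.i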

Step 2: Compiling

The compiler translates the preprocessed file into assembly language for the target processor, producing a ".s" file. Internally, GCC works on its own intermediate representations (IR) before emitting this assembly code.

We can stop after this step by adding the "-S" option to the gcc command; the assembly code is written to main.s.

gcc -S main.c

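This is roughly what the main.s file looks like. The exact contents depend on the GCC version and the target platform; the listing below was produced by MinGW-w64 GCC 8.1.0 on 64-bit Windows, as its .ident line shows −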

   .file	"main.c"
   .text
   .def	__main;	.scl	2;	.type	32;	.endef
   .section .rdata,"dr"
.LC0:
   .ascii "Hello World! \0"
   .text
   .globl	main
   .def	main;	.scl	2;	.type	32;	.endef
   .seh_proc	main
main:
   pushq	%rbp
   .seh_pushreg	%rbp
   movq	%rsp, %rbp
   .seh_setframe	%rbp, 0
   subq	$32, %rsp
   .seh_stackalloc	32
   .seh_endprologue
   call	__main
   leaq	.LC0(%rip), %rcx
   call	puts
   movl	$0, %eax
   addq	$32, %rsp
   popq	%rbp
   ret
   .seh_endproc
   .ident	"GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
   .def	puts;	.scl	2;	.type	32;	.endef

Step 3: Assembling

The assembler takes the assembly code and translates it into object code, that is, code in machine language (binary). This produces a file ending in ".o".

We can stop the compilation process after this step by using the "-c" option with the gcc command, which produces main.o.

gcc -c main.c

Note that the "main.o" file is not a text file, hence its contents won’t be readable when you open this file with a text editor.

Step 4: Linking

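The linker creates the final executable binary. It links together the object code of all the source files, along with the library code they use. Some functions, such as printf(), are declared but not defined in the object files. For these, the linker knows where to look for the definitions in static libraries or dynamic (shared) libraries.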

With a static library, the linker copies the code of every library function the program uses into the executable file. With a dynamic library, the code is not copied; only a reference to the library (its name) is placed in the binary, and the library is loaded when the program runs.

By default, after this fourth and final step (that is, when you type the whole "gcc main.c" command without any options), the compiler creates an executable program named a.out (a.exe on Windows) that we can run from the command line.

We can also choose the name of the executable by adding the "-o" option, followed by the desired name, to the gcc command, typically after the name of the file or files being compiled.

gcc main.c -o hello.out

Now, to execute the compiled code, type "./a.out" if you did not use the "-o" option, or "./hello.out" if you compiled with the command above. The output will be "Hello World!", after which the shell prompt appears again.