C Programming: A Concise Tutorial

Tokens in C

A token is the smallest unit in the source code of a computer language such as C. The term is borrowed from linguistics. A piece of text in a human language such as English is made up of words (sequences of letters), digits, and punctuation symbols. In the same way, a C program is made up of tokens. The compiler breaks a C program into tokens and then moves on to the later stages of the compilation process.

The first stage of the compilation process is tokenization (lexical analysis), performed by a tokenizer, also called a lexer. The tokenizer divides the source code into individual tokens, identifies the type of each token, and passes the tokens one at a time to the next stage of the compiler.

The parser is the next stage of compilation. It understands the language's grammar: it checks that the tokens appear in a valid order, reports syntax errors, and passes an error-free program on to the later stages, which eventually translate it into machine code.

C source code is likewise made up of tokens of different types. The topics below cover the character set from which tokens are built, followed by the types of tokens in C −

  1. Character set

  2. Keyword tokens

  3. Literal tokens

  4. Identifier tokens

  5. Operator tokens

  6. Special symbol tokens

Let us discuss each of these in turn.

C Character set

The C language defines a character set that comprises the English letters in both uppercase and lowercase (A to Z and a to z), the digits 0 to 9, and certain other symbols that have a special meaning. In C, certain combinations of characters also have a special meaning. For example, \n denotes the newline character. Such combinations are called escape sequences.

Here is the character set of the C language −

  1. Uppercase: A to Z

  2. Lowercase: a to z

  3. Digits: 0 to 9

  4. Special characters: ! " # $ % & ' ( ) * + - . : , ; ` ~ = < > ? { } [ ] ^ _ | \ /

  5. Whitespace characters: space, horizontal tab, vertical tab, form feed, and newline

A sequence of these characters enclosed in a pair of double quotes (" ") represents a string literal. Digits are used to write numeric literals. Square brackets are used to declare arrays and to access their elements. Curly braces mark code blocks. The backslash \ is the escape character. Most of the remaining symbols are used as operators or as punctuation.

C Keywords

In C, a keyword is a predefined word with a fixed meaning that is reserved by the language. Compared with the vocabulary of a human language, a programming language has very few keywords. The ANSI C standard (C89) defined 32 keywords, and a few more were added in later revisions of the standard. The original 32 keywords are all lowercase; several later additions, such as _Bool in C99, begin with an underscore followed by an uppercase letter. Each keyword has rules of usage attached to it, which in programming are called syntax.

The C compiler checks that each keyword is used according to its syntax, and translates the source code into object code.

C Literals

In computer programming terminology, a literal is the textual representation of a fixed value written directly (hard-coded) in the source code, for example a value assigned to a variable. The C standard calls most of these constants; text in double quotes is called a string literal.

A numeric literal contains digits and, optionally, a decimal point and/or an exponent introduced by the character E or e.

A string literal is any sequence of characters enclosed in a pair of double quotes. A character literal is a single character enclosed in single quotes.

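An array can be initialized with a comma-separated list of literals between curly braces, as in int a[] = {1, 2, 3};. Square brackets appear only in the declaration (a[]), not around the values. Since C99, a compound literal such as (int[]){1, 2, 3} can also create an unnamed array in place.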

In C, escape sequences appear inside character and string literals. An escape sequence consists of a backslash \ followed by one or more characters, for example the character literal '\n'. Each escape sequence has a predefined meaning attached to it.

C Identifiers

In contrast to keywords, identifiers are user-defined names in a program. You create them by giving an appropriate name to program elements such as variables, constants, labels, user-defined types, and functions.

C prescribes certain rules for forming an identifier: it must begin with a letter or an underscore, it may contain only letters, digits, and underscores, and it is case-sensitive. One important restriction is that a reserved keyword cannot be used as an identifier. For example, for is a keyword in C, so it cannot be used as the name of a variable, function, and so on.

C Operators

C is a computational language, so a C program contains many expressions that perform arithmetic and comparison operations. Most of the special symbols in C's character set are defined as operators. For example, the familiar symbols +, -, * and / are the arithmetic operators in C. Similarly, < and > are used as comparison operators.

C Special symbols

Apart from the symbols defined as operators, the remaining symbols include punctuation marks such as the comma, semicolon, and colon. In C, they are used differently in different contexts.

Similarly, the parentheses ( and ) are used both in arithmetic expressions and in function definitions. Curly braces mark the body of a function, the code blocks of conditional and looping statements, and so on.