Java Concise Tutorial
Java - Unicode System
Unicode is an international character set that encompasses a vast range of characters, symbols, and scripts from many languages across the globe.
Unicode System in Java
Java is a platform-independent programming language with built-in support for Unicode characters, which lets developers create applications that work with diverse languages and scripts.
Before Unicode, there were multiple character encoding standards −
- ASCII − for the United States.
- ISO 8859-1 − for Western European languages.
- KOI-8 − for Russian.
- GB18030 and BIG-5 − for Chinese.
So, to support applications used across several countries, some characters were encoded in a single byte and others in two. Worse, the same code could represent one character under one standard and a different character under another.
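The ambiguity described above can be reproduced in a short sketch. It decodes the same byte value under two of the legacy standards and prints the resulting Unicode code points. The class name LegacyEncodingDemo and the helper decode are illustrative names, not part of the original tutorial. KOI8-R is assumed to be installed on the running JVM, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LegacyEncodingDemo {

    // Decode a single byte under the given charset (illustrative helper).
    static String decode(byte value, Charset charset) {
        return new String(new byte[] { value }, charset);
    }

    public static void main(String[] args) {
        byte ambiguous = (byte) 0xE9;

        // ISO 8859-1 is available on every Java platform.
        String western = decode(ambiguous, StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1: U+%04X%n", (int) western.charAt(0));

        // KOI8-R is an optional charset, so check before using it.
        if (Charset.isSupported("KOI8-R")) {
            String russian = decode(ambiguous, Charset.forName("KOI8-R"));
            System.out.printf("KOI8-R:     U+%04X%n", (int) russian.charAt(0));
        }
    }
}
```

Under ISO 8859-1 the byte decodes to an accented Latin letter, while under KOI8-R it decodes to a Cyrillic letter. Printing code points rather than the characters themselves keeps the output independent of the console encoding.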
To overcome these shortcomings, the Unicode system was developed, giving every character a single unique code point. Because Java was designed for multilingual use, it adopted Unicode: a Java char is a 16-bit (2-byte) value, with the lowest value represented by \u0000 and the highest by \uFFFF. Unicode has since grown beyond 16 bits (code points now run up to U+10FFFF), so Java represents a character above \uFFFF as a pair of char values, known as a surrogate pair.
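A single char covers only the range \u0000 to \uFFFF, so a character outside that range takes two char values in a Java String. The following sketch shows the difference between counting char values and counting code points. The class name CodePointDemo and its two helpers are illustrative, and the code point U+1F600 was chosen only because it lies above U+FFFF.

```java
public class CodePointDemo {

    // Number of 16-bit char values (UTF-16 code units) used by the text.
    static int codeUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String letterA = "A";
        // U+1F600 lies above U+FFFF; Character.toChars returns a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(letterA) + " code unit(s), " + codePoints(letterA) + " code point(s)");
        System.out.println(codeUnits(emoji) + " code unit(s), " + codePoints(emoji) + " code point(s)");
        System.out.printf("Code point: U+%X%n", emoji.codePointAt(0));
        System.out.println("Starts with a high surrogate: " + Character.isHighSurrogate(emoji.charAt(0)));
    }
}
```

When text may contain such characters, methods that work on code points (codePointAt, codePointCount, codePoints) are usually a safer choice than indexing individual char values.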
Approaches: Working with Unicode Characters & Values
There are two approaches for working with Unicode characters in Java: Using Unicode Escape Sequences and Directly Storing Unicode Characters.
The first approach involves representing Unicode characters using escape sequences and is useful when the characters cannot be directly typed or displayed in the Java code. The second approach involves directly storing Unicode characters in variables and is more convenient when the characters can be directly typed or displayed.
The choice of approach depends on the specific requirements of the program. However, in general, Approach 2 is simpler and more convenient when the characters can be directly typed or displayed, while Approach 1 is necessary when they cannot.
1. Using Unicode Escape Sequences
One way to store Unicode characters in Java is by using Unicode escape sequences. An escape sequence is a series of characters that stands for a single character. In Java, a Unicode escape sequence starts with \u followed by exactly four hexadecimal digits, which give the 16-bit value of the desired character. For characters up to \uFFFF, this value is the same as the Unicode code point.
package com.tutorialspoint;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Unicode escape sequence: the code point for 'A'
        char unicodeChar = '\u0041';
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Compile and run the above program. It produces the following result −
Output
Stored Unicode Character: A
In the above code snippet, the Unicode escape sequence '\u0041' represents the character 'A'. The escape sequence is assigned to the char variable unicodeChar, and the stored character is then printed to the console.
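Escape sequences are not limited to char literals. As a minimal sketch, the program below uses them inside a string literal, confirms that the escape and the typed character are the same value, and converts a character back into escape notation. The class name EscapeSequenceDemo and the helper toEscape are illustrative names only.

```java
public class EscapeSequenceDemo {

    // Format a char in escape notation: backslash, 'u', then four hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        // Escapes also work inside string literals; this is the same as "Hi".
        String greeting = "\u0048\u0069";
        System.out.println(greeting.equals("Hi"));   // true

        // The escape and the typed character are the same char value.
        System.out.println('\u0041' == 'A');         // true
        System.out.println((int) '\u0041');          // 65

        // Going the other way: show the escape notation for a character.
        System.out.println(toEscape('A'));
    }
}
```

The compiler translates Unicode escapes before it does anything else with the source, including inside comments, so a backslash followed by u and invalid digits fails to compile even in a comment.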
2. Storing Unicode Values Directly
Alternatively, you can directly store a Unicode character in a char variable by enclosing the character in single quotes. However, this approach may not be feasible for characters that cannot be typed directly using a keyboard or are not visible, such as control characters.
package com.tutorialspoint;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing the Unicode character 'A' directly
        char unicodeChar = 'A';
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Compile and run the above program. It produces the following result −
Output
Stored Unicode Character: A
In this example, the character 'A' is directly enclosed in single quotes and assigned to the char variable unicodeChar. The stored character is then printed to the console.
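For characters that cannot be typed as a visible glyph, such as control characters, the escape form is the practical option. The sketch below stores two control characters through escapes and checks them with Character.isISOControl. The class name ControlCharacterDemo and the helper isControl are illustrative.

```java
public class ControlCharacterDemo {

    // True for ISO control characters (U+0000 to U+001F and U+007F to U+009F).
    static boolean isControl(char c) {
        return Character.isISOControl(c);
    }

    public static void main(String[] args) {
        char bell = '\u0007';    // BEL: has no visible glyph
        char escape = '\u001B';  // ESC: has no visible glyph
        char letter = 'A';       // ordinary printable character

        System.out.println("U+0007 is a control character: " + isControl(bell));
        System.out.println("U+001B is a control character: " + isControl(escape));
        System.out.println("U+0041 is a control character: " + isControl(letter));
    }
}
```

One caveat: line feed and carriage return cannot be written as \u000A or \u000D inside a char literal, because the compiler turns the escape into a real line break before parsing the literal. Use '\n' and '\r' for those two.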
The following program uses both approaches side by side.

package com.tutorialspoint;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing Unicode characters using escape sequences
        char letterA = '\u0041';
        char letterSigma = '\u03A3';
        char copyrightSymbol = '\u00A9';

        // Storing Unicode characters directly
        char letterZ = 'Z';
        char letterOmega = 'Ω';
        char registeredSymbol = '®';

        // Printing the stored Unicode characters
        System.out.println("Stored Unicode Characters using Escape Sequences:");
        System.out.println("Letter A: " + letterA);
        System.out.println("Greek Capital Letter Sigma: " + letterSigma);
        System.out.println("Copyright Symbol: " + copyrightSymbol);
        System.out.println("\nStored Unicode Characters Directly:");
        System.out.println("Letter Z: " + letterZ);
        System.out.println("Greek Capital Letter Omega: " + letterOmega);
        System.out.println("Registered Symbol: " + registeredSymbol);
    }
}
Compile and run the above program. It produces the following result −
Output
Stored Unicode Characters using Escape Sequences:
Letter A: A
Greek Capital Letter Sigma: Σ
Copyright Symbol: ©
Stored Unicode Characters Directly:
Letter Z: Z
Greek Capital Letter Omega: Ω
Registered Symbol: ®
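Whether characters such as Σ or ® appear correctly depends on the encoding the console uses. If they show up as question marks, one possible remedy is to wrap System.out in a PrintStream that always encodes as UTF-8. This is a sketch under two assumptions: the terminal itself displays UTF-8, and the JDK is version 10 or later, which the PrintStream constructor taking a Charset requires. The class name Utf8OutputDemo and the helper printAsUtf8 are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8OutputDemo {

    // Print text through a UTF-8 PrintStream and return the raw bytes written.
    static byte[] printAsUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Wrap System.out so that output is always encoded as UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println("Greek Capital Letter Sigma: \u03A3");

        // The single char U+03A3 becomes two bytes in UTF-8.
        System.out.println("UTF-8 byte count: " + printAsUtf8("\u03A3").length);
    }
}
```

The source file encoding matters as well when characters are typed directly. Passing -encoding UTF-8 to javac removes that ambiguity, and escape sequences avoid it altogether.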
Example 3: Assigning Unicode Characters and Values to Variables
This example demonstrates how to manipulate stored Unicode characters with arithmetic on their code values. It calculates the difference between the small letter 'a' and the capital letter 'A', which is 32. It then derives the capital letter 'C' by adding 1 to the code value of 'B', and the small letter 'c' by adding the difference to the code value of 'C'. The manipulated Unicode characters are printed to the console.

package com.tutorialspoint;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing Unicode characters using escape sequences
        char letterA = '\u0041';
        char letterSmallA = '\u0061';

        // Storing Unicode characters directly
        char letterB = 'B';

        // Manipulating the stored Unicode characters
        int difference = letterSmallA - letterA;            // 97 - 65 = 32
        char letterC = (char) (letterB + 1);                // 'B' + 1 = 'C'
        char letterSmallC = (char) (letterC + difference);  // 'C' + 32 = 'c'

        // Printing the manipulated Unicode characters
        System.out.println("Manipulated Unicode Characters:");
        System.out.println("Difference between a and A: " + difference);
        System.out.println("Calculated Letter C: " + letterC);
        System.out.println("Calculated Letter c: " + letterSmallC);
    }
}

Compile and run the above program. It produces the following result −
Output
Manipulated Unicode Characters:
Difference between a and A: 32
Calculated Letter C: C
Calculated Letter c: c
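The fixed offset of 32 between upper and lower case holds for the ASCII letters but not for Unicode letters in general. As a hedged alternative, the sketch below compares offset arithmetic with the standard Character.toLowerCase method. The class name CaseConversionDemo and the helper asciiToLower are illustrative, and U+0100 (Latin capital A with macron) was chosen only as an example whose lowercase form sits one code point higher.

```java
public class CaseConversionDemo {

    // Lowercase by fixed offset; only valid for the ASCII letters A to Z.
    static char asciiToLower(char upper) {
        return (char) (upper + ('a' - 'A'));
    }

    public static void main(String[] args) {
        // For ASCII letters, both techniques agree.
        System.out.println(asciiToLower('C') == Character.toLowerCase('C'));

        // U+0100 has its lowercase form at U+0101, an offset of 1, not 32.
        char aMacron = '\u0100';
        System.out.println(Character.toLowerCase(aMacron) == '\u0101');
        System.out.println(asciiToLower(aMacron) == '\u0101');
    }
}
```

The first two lines print true and the last prints false, which is why Character.toLowerCase and Character.toUpperCase are generally preferable once text goes beyond ASCII.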
Conclusion
In Java, you can store Unicode characters using character literals, either as Unicode escape sequences or by directly enclosing the characters in single quotes. Both approaches have their advantages and limitations. Escape sequences provide a consistent way to write any char value in source code, regardless of keyboard or file encoding, while directly storing characters is more convenient when the characters can be easily typed or displayed.
This article has discussed two different approaches for storing Unicode characters in Java and demonstrated working examples of each. Understanding these techniques will help developers create applications that work with diverse languages and scripts.