Mysql 简明教程

MySQL - Collation

MySQL Collation 是用于确定如何比较和排序字符集的各种字符的一组规则。MySQL 支持多种字符集,包括 ASCII、Unicode 规范、二进制等。

MySQL Collation is a set of rules used to decide how to compare and sort various characters of a character set. MySQL supports multiple character sets including ASCII, Unicode System, Binary, etc.

这些字符集中的每个字符都受到权重的限制。MySQL 校对根据其各自的权重对字符进行排序。例如,在比较字符集中两个字符时,如果一个字符的权重大于另一个字符,则该字符较大;反之,如果两个字符具有相等的权重,则它们相等。

Every character of these character sets is subjected to a weight. A MySQL collation orders the characters based on their respective weights. For instance, when comparing two characters in a character set, if one character holds heavier weight than the other, it is greater; and vice-versa. If both characters have equal weights, they are equal.

每个字符集都必须至少有一个校对(或更多),并且没有两个字符集可以具有相同的校对。

Each character set must have at least one collation (or more) and no two character sets can have the same collation.

Implementing MySQL Collations

MySQL 实施了各种类型的校对,用于比较字符串 −

MySQL implements various types of collations in order to compare character strings −

  1. Simple Collations for 8-bit Character Sets

  2. Complex Collations for 8-bit Character Sets

  3. Collations for Non-Unicode Multibyte Character Sets

  4. Collations for Unicode Multibyte Character Sets

  5. Miscellaneous Collations

每个字符集都具有内置的二进制校对,因此不需要重新定义它们。在任何情况下都不应修改这些内置校对,以免产生意外的服务器行为。

Every character set has a built-in binary collation, so they need not be redefined. Built-in collations like these must not be modified in any case to avoid unexpected server behaviour.

Simple Collations for 8-bit Character Sets

  1. As the 8-bit character sets can only hold up to 256 characters, this type of collation is implemented by using a weights array of length 256.

  2. Each character in the character set is one-to-one mapped to the weights.

  3. It is a case-insensitive collation, so the uppercase and lowercase of same character hold the same weight.

Complex Collations for 8-bit Character Sets

  1. For complex 8-bit character sets, collations are implemented by defining the order of characters using functions.

  2. Here, we create a C source file that specifies the character set properties and defines the necessary support routines to perform operations on that character set properly.

Collations for Non-Unicode Multibyte Character Sets

  1. Unlike single-byte (8-bit) characters, there are two types of relationships between codes and weights of multi-byte characters.

  2. Weight of a character is equal to its code.

  3. Character codes are mapped one-to-one with weights, where weights are not necessarily equal to codes.

Collations for Unicode Multibyte Character Sets

一些整理规则基于 Unicode 整理算法 (UCA)。它们具有以下属性 -

Some collations are based on the Unicode Collation Algorithm (UCA). They hold the following properties −

  1. If a character has weight, each weight uses 2 bytes.

  2. If a character has no weight, then the character is ignorable.

  3. A single character can have many weights. This is called Expansion. For example, the German letter (SHARP S) has a weight of 0x0FEA0FEA.

  4. Multiple characters together can have only one weight. This is called Contraction. For example, 'ch' is a single letter in Czech and has a weight of 0x0EE2.

Miscellaneous Collations

  1. Collations that do not fall into any previous categories are termed as Miscellaneous Collations.

Set Character Set and Collation

MySQL 允许我们在三个不同的级别设置字符集和整理规则。以下内容对此进行了描述:

MySQL allows us to set the character sets and collations at three different levels. The same is described below:

  1. At Server level

  2. At Database level

  3. At Table level

At Server Level

在 MySQL 中,字符集 latin1 将用作默认字符集。因此,默认整理规则将为 latin1_swedish_ci 。MySQL 允许我们在服务器启动级别更改这些默认设置。

In MySQL, the character set latin1 will be used as the default character set. So, the default collation will be latin1_swedish_ci. MySQL allows us to change these default settings at the server startup level.

在启动 MySQL 服务器时,如果我们指定一个字符集,则它将使用该集的默认整理规则。但是,如果我们明确地指定一个字符集和整理规则,则 MySQL 将在为进一步创建的所有数据库中使用该组合。

When starting up a MySQL server, if we specify a character set, it will use the default collation of that set. But if we explicitly specify both a character set and collation, MySQL will use that combination for all databases created further.

Example

在以下查询中,我们将服务器的字符集设置为 utf8,整理规则设置为 utf8_unicode_cs。

In the following query, we will set the character set as utf8 and the collation as utf8_unicode_cs for the sever.

mysqld --character-set-server=utf8 --collation-server=utf8_unicode_cs

如果 --collation-server 设置为用户定义的整理规则名称,则会发出警告。

A warning is issued if --collation-server is set to a user-defined collation name.

At Database Level

当我们创建一个数据库,并且如果没有提供任何字符集和整理规则时,该数据库将使用服务器的默认字符集和整理规则。

When we create a database and if we do not provide any character set and collation, the database will use the default character set and collation of the server.

我们可以使用 CREATE DATABASE 语句在数据库级别覆盖默认字符集和整理规则。

We can override the default character set and collation at the database level using the CREATE DATABASE statement.

如果我们希望对现有数据库覆盖默认设置,则可以使用 ALTER DATABASE 语句。

If we want to override default settings for existing database, we can use the ALTER DATABASE statement.

Syntax

以下是数据库级别覆盖默认设置的基本语法:

Following is the basic syntax to override the default settings at database level −

[CREATE | ALTER] DATABASE database_name
CHARACTER SET character_set_name
COLLATE collation_name;

Example

这里,我们正在使用以下查询创建数据库,并将字符集指定为 utf8,并将校对指定为 utf8_unicode_ci:

Here, we are creating a database and specifying the character set as utf8 and collation as utf8_unicode_ci using the following query −

CREATE DATABASE testdb
CHARACTER SET utf8
COLLATE utf8_unicode_ci;

At Table Level

在 MySQL 中,数据库可能包含与数据库的字符集和校对不同的字符集和校对的表。

In MySQL, a database may contain tables with different characters sets and collations than the database’s character set and collation.

我们可以使用 CREATE TABLE 语句在创建表时指定默认字符集和校对。

We can specify the default character set and collation at the while creating the table using the CREATE TABLE statement.

如果要覆盖现有表的默认设置,可以使用 ALTER TABLE 语句。

If we want to override default settings for existing table, we can use the ALTER TABLE statement.

Syntax

以下是使用 CREATE TABLE 语句为表指定默认字符集和校对的语法:

Following is the syntax for specifying default character set and collation for a table using the CREATE TABLE statement −

[CREATE | ALTER] TABLE table_name
column_name datatype (length)
CHARACTER SET character_set_name
COLLATE collation_name

Example

在以下查询中,我们创建了一个不带任何字符集和校对的表。因此,它使用了数据库的字符集和校对。

In the following query, we are creating a table without any character set and collation. So, it uses the database’s character set and collation.

CREATE TABLE CUSTOMERS(
   ID VARCHAR(45),
   NAME VARCHAR(45),
   AGE INT
);

现在,我们使用 ALTER TABLE 语句将字符集修改为“latin1”,并将校对修改为“latin_german_ci”。

Now, we are using the ALTER TABLE statement to modify the character set as 'latin1' and collation as 'latin_german_ci'.

ALTER TABLE CUSTOMERS
CHARACTER SET latin1
COLLATE latin1_german1_ci;

Displaying Default Collations

我们可以使用 SHOW CHARACTER SET 查询显示 MySQL 数据库服务器中所有字符集的默认校对。

We can display all the default collations of character sets in MySQL database server using the SHOW CHARACTER SET query.

SHOW CHARACTER SET;

每个字符集的校对字符串都以字符集名称开头,以 _ci (不区分大小写)、_cs (区分大小写)或 _bin (二进制)结尾。

A collation string for every character set starts with the character set name and ends with _ci (case insensitive), _cs(case sensitive) or _bin(binary).

The MySQL LIKE Clause

在 MySQL 中,使用 LIKE 子句与 SHOW COLLATION 语句,我们可以指定模式来获取与给定模式匹配的校对名称和其他信息。

In MySQL, using the LIKE clause with the SHOW COLLATION statement, we can specify a pattern to fetch the names and other information of the collations that match the given pattern.

SHOW COLLATION LIKE 'greek%';

Output

上面的查询返回名称中包含 greek 的所有校对。

The above query returns all the collations with the name greek in it.

The MySQL WHERE Clause

我们可以将 WHERE 子句与 SHOW COLLATION 语句一起使用,以检索与指定条件匹配的校对名称。

We can use the WHERE clause with the SHOW COLLATION statement to retrieve collation names that match the specified condition.

SHOW COLLATION WHERE Charset = 'cp1251';

Output

上面的查询返回字符集 ID 等于“cp1251”的所有校对。

The above query returns all the collations where the charset id equal to 'cp1251'.