Mysqli 简明教程

MySQLi - Handling Duplicates

表或结果集有时会包含重复记录。有时允许这样做,但有时需要停止重复记录。有时,需要识别重复记录并将其从表中删除。本章将介绍如何防止重复记录出现在表中,以及如何删除已存在的重复记录。

Tables or result sets sometimes contain duplicate records. Sometimes, it is allowed but sometimes it is required to stop duplicate records. Sometimes, it is required to identify duplicate records and remove them from the table. This chapter will describe how to prevent duplicate records occurring in a table and how to remove already existing duplicate records.

Preventing Duplicates from Occurring in a Table

你可以在具有适当字段的表上使用 PRIMARY KEYUNIQUE 索引来停止重复记录。我们举一个例子:下表没有这样的索引或主键,因此它将允许 first_name 和 last_name 出现重复记录。

You can use a PRIMARY KEY or UNIQUE Index on a table with appropriate fields to stop duplicate records. Let’s take one example: The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.

CREATE TABLE person_tbl (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

若要防止在此表中创建具有相同 first 和 last 名称值的多个记录,请在其定义中添加 PRIMARY KEY。在你这样做的时候,同时有必要声明索引列为 NOT NULL,因为 PRIMARY KEY 不允许 NULL 值 −

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it’s also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values −

CREATE TABLE person_tbl (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

表中存在唯一索引通常会导致出现错误,如果你向表中插入一条记录,该记录会使索引所定义的列或列中的现有记录重复。

The presence of a unique index in a table normally causes an error to occur if you insert a record into the table that duplicates an existing record in the column or columns that define the index.

使用 INSERT IGNORE 而不是 INSERT 。如果记录不重复现有记录,MySQL 将照常插入它。如果记录是重复的,IGNORE 关键字告诉 MySQL 在不生成错误的情况下默默地丢弃它。

Use INSERT IGNORE rather than INSERT. If a record doesn’t duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error.

下面的例子不会出错,同时也不会插入重复的记录。

Following example does not error out and same time it will not insert duplicate records.

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
   → VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
   → VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

使用 REPLACE 而不是 INSERT。如果记录是新的,则会像 INSERT 一样插入它。如果它是重复的,则新记录将替换旧记录 −

Use REPLACE rather than INSERT. If the record is new, it’s inserted just as with INSERT. If it’s a duplicate, the new record replaces the old one −

mysql> REPLACE INTO person_tbl (last_name, first_name)
   → VALUES( 'Ajay', 'Kumar');
Query OK, 1 row affected (0.00 sec)

mysql> REPLACE INTO person_tbl (last_name, first_name)
   → VALUES( 'Ajay', 'Kumar');
Query OK, 2 rows affected (0.00 sec)

应根据你希望施加的重复处理行为选择 INSERT IGNORE 和 REPLACE。INSERT IGNORE 会保留一组重复记录中的第一个,并丢弃其余记录。REPLACE 会保留一组重复记录中的最后一个,并删除任何先前的记录。

INSERT IGNORE and REPLACE should be chosen according to the duplicate-handling behavior you want to effect. INSERT IGNORE keeps the first of a set of duplicated records and discards the rest. REPLACE keeps the last of a set of duplicates and erase out any earlier ones.

强制唯一性的另一种方法是向表中添加 UNIQUE 索引而不是 PRIMARY KEY。

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table.

CREATE TABLE person_tbl (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10)
   UNIQUE (last_name, first_name)
);

Counting and Identifying Duplicates

以下是计算表中带有 first_name 和 last_name 的重复记录的查询。

Following is the query to count duplicate records with first_name and last_name in a table.

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
   → FROM person_tbl
   → GROUP BY last_name, first_name
   → HAVING repetitions > 1;

此查询将返回 person_tbl 表中所有重复记录的列表。一般来说,要识别重复值集,请执行以下操作 −

This query will return a list of all the duplicate records in person_tbl table. In general, to identify sets of values that are duplicated, do the following −

  1. Determine which columns contain the values that may be duplicated.

  2. List those columns in the column selection list, along with COUNT(*).

  3. List the columns in the GROUP BY clause as well.

  4. Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.

Eliminating Duplicates from a Query Result:

你可以将 DISTINCT 与 SELECT 语句一起使用,以查找表中可用的唯一记录。

You can use DISTINCT along with SELECT statement to find out unique records available in a table.

mysql> SELECT DISTINCT last_name, first_name
   → FROM person_tbl
   → ORDER BY last_name;

与 DISTINCT 对应的替代方法是添加一个 GROUP BY 子句,该子句命名你正在选择的列。这将起到删除重复项并仅选择指定列中值唯一组合的效果 −

An alternative to DISTINCT is to add a GROUP BY clause that names the columns you’re selecting. This has the effect of removing duplicates and selecting only the unique combinations of values in the specified columns −

mysql> SELECT last_name, first_name
   → FROM person_tbl
   → GROUP BY (last_name, first_name);

Removing Duplicates Using Table Replacement

如果你在表中复制记录并且希望从该表中删除所有重复记录,那么这里有一个过程 -

If you have duplicate records in a table and you want to remove all the duplicate records from that table, then here is the procedure −

mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
   → FROM person_tbl;
   → GROUP BY (last_name, first_name);
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;

从表中删除重复记录的一种简单方法是向该表添加 INDEX 或 PRIMAY KEY。即使此表已经可用,你也可以使用此技术来删除重复记录,并且将来你也将安全。

An easy way of removing duplicate records from a table is to add an INDEX or PRIMAY KEY to that table. Even if this table is already available, you can use this technique to remove duplicate records and you will be safe in future as well.

mysql> ALTER IGNORE TABLE person_tbl
   → ADD PRIMARY KEY (last_name, first_name);