Mysql 简明教程

MySQL - Handling Duplicates

数据库中的表或结果集通常包含重复记录。虽然通常允许重复,但在某些情况下有必要防止重复。在这种情况下,识别并从数据库表中删除重复记录至关重要。

Tables or result sets in a database usually contain duplicate records. While duplicates are generally allowed, there are situations where it is necessary to prevent them. In such cases, it becomes essential to identify and remove duplicate records from a database table.

Importance of Handling MySQL Duplicates

处理数据库中的重复数据变得必要的原因有多种。主要原因之一是组织数据库中存在重复数据会导致逻辑错误。此外,我们需要处理冗余数据以防止以下后果 -

There are various reasons why handling duplicates in a database becomes necessary. One of the main reasons is that the existence of duplicates in an organizational database will lead to logical errors. In addition to it, we need to handle redundant data to prevent the following consequences −

  1. Duplicate data occupies storage space, reducing the efficiency of database usage and increasing storage costs.

  2. Dealing with duplicate records consumes additional resources, driving up the overall cost of maintaining the database.

  3. Duplicates in a database can lead to logical errors in data, affecting the integrity and reliability of the information stored.

Preventing Duplicate Entries

您可以在表上使用 PRIMARY KEYUNIQUE 索引(带有适当的字段)以防止重复记录条目进入表中。

You can use a PRIMARY KEY or a UNIQUE Index on a table with the appropriate fields to prevent duplicate record entries into a table.

Example

下表不包含此类索引或主键,因此它允许 first_namelast_name 的重复记录。

The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.

CREATE TABLE CUSTOMERS (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

为了防止创建带有相同名字和姓氏值的多个记录,请向其定义中添加一个 PRIMARY KEY 。执行此操作时,还需要声明索引列为 NOT NULL ,因为 PRIMARY KEY 不允许 NULL 值 -

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it is also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values −

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Using INSERT IGNORE Query −

Using INSERT IGNORE Query −

表中存在唯一索引通常会在尝试插入重复索引列中现有记录的记录时导致错误。

The existence of a unique index in a table normally causes an error when attempting to insert a record that duplicates an existing record in the indexed column(s).

为了在不生成错误的情况下处理这种情况,您可以使用“INSERT IGNORE”命令。当记录不是重复记录时,MySQL 会像往常一样插入记录。但是,如果记录是重复的,“IGNORE”关键字会指示 MySQL 在不产生错误的情况下丢弃它。

To handle this situation without generating an error, you can use the "INSERT IGNORE" command. When a record is not a duplicate, MySQL inserts it as usual. However, if the record is duplicate, the "IGNORE" keyword instructs MySQL to discard it without producing an error.

提供的示例不会导致错误,它还确保不插入重复记录 -

The provided example does not result in an error, and it also ensures that duplicate records are not inserted −

INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME)
VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');

我们获得以下输出 −

We get the following output −

Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 2  Duplicates: 1  Warnings: 1

Using REPLACE Query −

Using REPLACE Query −

不要使用 INSERT 命令,而应使用 REPLACE 命令。在处理新记录时,像使用 INSERT 命令一样插入。但是,如果它是重复的,则此新记录替换旧记录。

Instead of using the INSERT command, consider using the REPLACE command. When dealing with a new record, it is inserted just as with INSERT. However, if it is a duplicate, the new record replaces the old one.

REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME)
VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');

以下是上面代码的输出: -

Following is the output of the above code −

Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

您在 INSERT IGNORE 和 REPLACE 命令之间做出的选择应取决于您希望实现的特定重复处理行为。INSERT IGNORE 命令保留第一组重复的记录,并丢弃剩下的记录。另一方面,REPLACE 命令会保留最后一组重复的记录,并删除所有之前的实例。

Your choice between the INSERT IGNORE and REPLACE commands should depend on the specific duplicate-handling behaviour you wish to achieve. The INSERT IGNORE command retains the first set of duplicated records and discards the remaining. On the other hand, the REPLACE command keeps the last set of duplicates and removes any earlier instances.

Using UNIQUE Constraint −

Using UNIQUE Constraint −

实施唯一性的另一种方法是向表中添加 UNIQUE 索引,而不是 PRIMARY KEY −

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table −

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);

Counting and Identifying Duplicates

您可以使用 COUNT 函数和 GROUP BY 子句按特定列统计和识别重复的记录。

You can use the COUNT function and GROUP BY clause to count and identify duplicate records based on specific columns.

Example

以下是对表中带有 first_name 和 last_name 的重复记录进行统计的查询 −

Following is the query to count duplicate records with first_name and last_name in a table −

SELECT COUNT(*) as repetitions, last_name, first_name
FROM CUSTOMERS
GROUP BY last_name, first_name
HAVING repetitions > 1;

该查询将返回 CUSTOMERS 表中所有重复记录的列表。一般来说,要识别重复的值集,请按照以下步骤操作。

This query will return a list of all the duplicate records in the CUSTOMERS table. In general, to identify sets of values that are duplicated, follow the steps given below.

  1. Determine which columns may contain duplicated values.

  2. Include those columns in the column selection list, along with COUNT(*).

  3. List the columns in the GROUP BY clause as well.

  4. Apply a HAVING clause to filter unique values by requiring the group counts to be greater than one.

Eliminating Duplicates from a Query Result

您可以将 DISTINCT 命令与 SELECT 语句结合使用,以找出表中可用的唯一记录。

You can use the DISTINCT command along with the SELECT statement to find out unique records available in a table.

SELECT DISTINCT last_name, first_name
FROM CUSTOMERS
ORDER BY last_name;

DISTINCT 命令的替代方法是添加一个 GROUP BY 子句,指定您正在选择的列。这种方法消除了重复项,并且仅从指定列中检索唯一值组合。

An alternative to the DISTINCT command is to add a GROUP BY clause that specifies the columns you are selecting. This approach eliminates duplicates and retrieves only the unique combinations of values from the specified columns.

SELECT last_name, first_name
FROM CUSTOMERS
GROUP BY (last_name, first_name);

Removing Duplicates Using Table Replacement

如果表中存在重复的记录,并且您想从表中删除所有重复的记录,请按照以下步骤进行操作 −

If you have duplicate records in a table and you want to remove all the duplicate records from that table, then follow the procedure given below −

CREATE TABLE tmp AS
SELECT DISTINCT last_name, first_name, sex
FROM CUSTOMERS;

DROP TABLE CUSTOMERS;
ALTER TABLE tmp RENAME TO CUSTOMERS;

Handling Duplicates Using a Client Program

我们还可以使用客户端程序处理重复项。

We can also handle duplicate using the client program.

Syntax

Example

以下是这些程序 −

Following are the programs −