Mysql 简明教程

MySQL - Delete Duplicate Records

The MySQL Delete Duplicate Records

MySQL 数据库(包括 MySQL)中的重复记录非常常见。MySQL 数据库以表格的形式存储数据,表格由行和列组成。如果数据库表中的两行或多行具有相同的值,则称记录出现重复。

Duplicate records in a database, including MySQL, is a very common occurrence. A MySQL database stores data in the form of tables consisting of rows and columns. Now, a record is said to be duplicated when two or more rows in a database table have same values.

此冗余可能出于以下原因:

This redundancy might occur due to various reasons −

  1. The row might be inserted twice.

  2. When raw data is imported from external sources.

  3. There might be a bug in the database application.

无论出于什么原因,删除这样的冗余变得很重要,以减少错误提高数据准确性,或提高数据库的性能效率。

Whatever might be reason, deleting such redundancy becomes important to increase the data accuracy with less errors, or to increase the efficiency of database performance.

Find Duplicate Values

在删除重复记录之前,我们必须先找出它们是否存在于表中。这可以使用如下方法完成:

Before removing duplicate records, we must find whether they exist in a table or not. This is possible using the following ways −

  1. GROUP BY Clause

  2. COUNT() Method

Example

让我们先创建一个名为 “CUSTOMERS” 的表,其中包含重复值:

Let us first create table named "CUSTOMERS" containing duplicate values −

CREATE TABLE CUSTOMERS(
   ID int,
   NAME varchar(100)
);

使用以下 INSERT 查询,向 “CUSTOMERS” 表中插入几条记录。在这里,我们已经将 “John” 作为重复记录添加了 3 次:

Using the following INSERT query, insert few records into the "CUSTOMERS" table. Here, we have added "John" as duplicate record 3 times −

INSERT INTO CUSTOMERS
VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John');

获得的 CUSTOMERS 表如下:−

The CUSTOMERS table obtained is as follows −

现在,我们正在使用 COUNT() 方法和 GROUP BY 子句中所示的以下查询检索表中重复的记录:

Now, we are retrieving the record that is duplicated in the table using the COUNT() method and GROUP BY clause as shown in the following query −

SELECT NAME, COUNT(NAME) FROM CUSTOMERS
GROUP BY NAME HAVING COUNT(NAME) > 1;

Output

以下是所获得的输出 −

Following is the output obtained −

Delete Duplicate Records

要从数据库表中删除重复记录,我们可以使用 DELETE 命令。但是,这个 DELETE 命令可以用两种方式来删除表中的重复内容:

To delete duplicate records from a database table, we can use the DELETE command. However, this DELETE command can be used in two ways to remove duplicates from a table −

  1. Using DELETE…​ JOIN

  2. Using ROW_NUMBER() Function

Using DELETE…​ JOIN

为了使用 DELETE…​ JOIN 命令从表中删除重复记录,我们在自身上执行内连接。这适用于并非完全相同的情况。

To use DELETE…​ JOIN command in order to remove duplicate records from a table, we perform inner join on itself. This is applicable for cases that are not completely identical.

例如,假设客户详细信息在客户记录中重复,但序列号不断增加。在这种情况下,即使 ID 不相同,也会复制该记录。

For instance, suppose there is a repetition of customer details in customer records, but the serial number keeps incrementing. Here, the record is duplicated even if the ID is not same.

Example

在以下查询中,我们使用先前创建的 CUSTOMERS 表使用 DELETE…​ JOIN 命令删除重复记录:

In the following query, we are using the CUSTOMERS table created previously to remove duplicate records using DELETE…​ JOIN command −

DELETE t1 FROM CUSTOMERS t1
INNER JOIN CUSTOMERS t2
WHERE t1.id < t2.id AND t1.name = t2.name;

Output

以下是所获得的输出 −

Following is the output obtained −

Query OK, 2 rows affected (0.01 sec)

Verification

Verification

我们可以使用以下 SELECT 语句来验证是否已删除重复记录:

We can verify whether the duplicate records have been removed or not using the following SELECT statement −

SELECT * FROM CUSTOMERS;

我们可以看到在所得表中,该查询删除了重复项,并在表中留下了不同的记录:

We can see in the table obtained that the query removed duplicates and leave distinct records in the table −

Using ROW_NUMBER() Function

在 MySQL 中,ROW_NUMBER() 函数用于给从查询所得结果集中的每一行一个顺序号,从 1 开始。

The ROW_NUMBER() Function in MySQL is used to assign a sequential number, starting from 1, to each row in a result-set obtained from a query.

使用此函数,MySQL 允许你检测重复的行,可以用 DELETE 语句删除这些行。

Using this function, MySQL allows you to detect the duplicate rows, which can be removed with the DELETE statement.

Example

在这里,我们对 'NAME' 列中有重复值的 CUSTOMERS 表应用 ROW_NUMBER() 函数。我们将使用以下查询根据 'NAME' 列在分区内分配行号:

Here, we are applying the ROW_NUMBER() function to the CUSTOMERS table having duplicate values in the 'NAME' column. We will assign row numbers within a partition based on the 'NAME' column using the following query −

SELECT id, ROW_NUMBER()
OVER (PARTITION BY name ORDER BY name) AS row_num
FROM CUSTOMERS;

以下是所获得的输出 −

Following is the output obtained −

现在,使用以下语句,删除重复行(行号大于 1 的行):

Now, with the following statement, delete the duplicate rows (rows with a row number greater than 1) −

DELETE FROM CUSTOMERS WHERE id IN(
   SELECT id FROM (SELECT id, ROW_NUMBER()
   OVER (PARTITION BY name ORDER BY name) AS row_num
   FROM CUSTOMERS) AS temp_table WHERE row_num>1
);

我们得到了如下输出 −

We get the output as shown below −

Query OK, 2 rows affected (0.00 sec)

为了验证是否已删除重复记录,请使用以下 SELECT 查询:

To verify whether the duplicate records have been removed or not, use the following SELECT query −

SELECT * FROM CUSTOMERS;

产生的结果如下 −

The result produced is as follows −

Delete Duplicate Records Using Client Program

我们还可以使用客户端程序删除重复记录。

We can also delete duplicate records using client program.

Syntax

Example

以下是这些程序 −

Following are the programs −