Mysql 简明教程

MySQL - Find Duplicate Records

Duplicate records in a table decrease the efficiency of a MySQL database (by increasing the execution time, using unnecessary space, etc.). Thus, locating duplicates becomes necessary to efficiently use the database.

We can, however, also prevent users from entering duplicate values into a table, by adding constraints on the desired column(s), such as PRIMARY KEY and UNIQUE constraints.

But, due to various reasons like, human error, an application bug or data extracted from external resources, if duplicates are still entered into the database, there are various ways to find the records. Using SQL GROUP BY and HAVING clauses is one of the common ways to filter records containing duplicates.

Finding Duplicate Records

Before finding the duplicate records in a table we need to define the criteria for which we need the duplicate records for. You can do this in two steps −

  1. First of all, we need to group all the rows by the columns on which you want to check the duplicity on, using the GROUPBY clause.

  2. Then Using the Having clause and the count function then, we need to verify whether any of the above formed groups have more than 1 entity.

Example

首先,让我们使用以下查询创建一个名为 CUSTOMERS 的表 -

CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

Now, let us insert some duplicate records into the above-created table using the INSERT IGNORE INTO statement as shown below −

INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00),
(2, 'Khilan', 25, 'Delhi', 1500.00),
(3, 'Kaushik', 23, 'Kota', 2000.00),
(4, 'Chaitali', 25, 'Mumbai', 6500.00),
(5, 'Hardik', 27, 'Bhopal', 8500.00),
(6, 'Komal', 22, 'Hyderabad', 4500.00),
(7, 'Muffy', 24, 'Indore', 10000.00);

表创建如下 −

On the following query, we are trying to return the count of duplicate records using the MySQL COUNT() function −

SELECT SALARY, COUNT(SALARY)
AS "COUNT" FROM CUSTOMERS
GROUP BY SALARY
ORDER BY SALARY;

Output

以上查询的输出如下所示:

With Having Clause

The HAVING clause in MySQL can be used to filter conditions for a group of rows in a table. Here, we are going to use the HAVING clause with the COUNT() function to find the duplicate values in one or more columns of a table.

Duplicates values in single column

Following are the steps to find the duplicate values in a single column of a table:

Step-1: Firstly, we need to use the GROUP BY clause to group all rows in the column that we want to check the duplicates.

Step-2: Then , to find duplicate groups, use COUNT() function in the HAVING clause to check if any group has more than one element.

Example

Using the following query, we can find all rows that have duplicate DOG_NAMES in the PETS table −

SELECT SALARY, COUNT(SALARY)
FROM CUSTOMERS
GROUP BY SALARY
HAVING COUNT(SALARY) > 1;

Output

输出如下 −

Duplicate Values in Multiple Columns

We can use the AND operator in the HAVING clause to find the duplicate rows in multiple columns. The rows are considered duplicate only when the combination of columns are duplicate.

Example

In the following query, we are finding rows in the PETS table with duplicate records in DOG_NAME, AGE, OWNER_NAME columns −

SELECT SALARY, COUNT(SALARY),
AGE, COUNT(AGE)
FROM CUSTOMERS
GROUP BY SALARY, AGE
HAVING  COUNT(SALARY) > 1
AND COUNT(AGE) > 1;

Output

输出如下 −

The ROW_NUMBER() function with PARTITION BY

In MySQL, the ROW_NUMBER() function and PARTITION BY clause can be used to find duplicate records in a table. The partition clause divides the table based on a specific column or multiple columns, then the ROW_NUMBER() function assigns a unique row number to each row within each partition. Rows with the same partition and row number are considered duplicates rows.

Example

In the following query, we are assigning a

SELECT *, ROW_NUMBER() OVER (
   PARTITION BY SALARY, AGE
   ORDER BY SALARY, AGE
) AS row_numbers
FROM CUSTOMERS;

Output

The output for the query above as follows −

Find Duplicate Records Using Client Program

我们还可以使用客户程序查找重复记录。

Syntax

Example

以下是这些程序 −