MySQL Concise Tutorial
MySQL - Find Duplicate Records
Duplicate records in a table reduce the efficiency of a MySQL database: they increase query execution time, take up unnecessary space, and so on. Locating duplicates is therefore necessary for using the database efficiently.
We can prevent users from entering duplicate values into a table by adding constraints, such as PRIMARY KEY and UNIQUE, on the desired column(s).
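As a minimal sketch of how such a constraint behaves, the statements below use a hypothetical SUBSCRIBERS table with an EMAIL column (neither name comes from this tutorial) and assume a default MySQL configuration:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE SUBSCRIBERS (
    ID INT NOT NULL,
    EMAIL VARCHAR(100) NOT NULL,
    PRIMARY KEY (ID),
    UNIQUE (EMAIL)          -- no two rows may share the same EMAIL
);

INSERT INTO SUBSCRIBERS VALUES (1, 'a@example.com');

-- This statement is rejected with a duplicate-entry error (error 1062),
-- because the EMAIL value already exists in the table.
INSERT INTO SUBSCRIBERS VALUES (2, 'a@example.com');
```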
However, duplicates can still enter the database for various reasons, such as human error, an application bug, or data imported from external sources. When that happens, there are several ways to find the affected records. Using the SQL GROUP BY and HAVING clauses is one of the most common.
Finding Duplicate Records
Before looking for duplicate records in a table, we need to define the criteria that make two records duplicates, that is, the column or columns whose values must match. Finding the duplicates then takes two steps:
- First, group all the rows by the columns you want to check for duplicates, using the GROUP BY clause.
- Then, using the HAVING clause and the COUNT() function, check whether any of the resulting groups contains more than one row.
Example
First of all, let us create a table named CUSTOMERS using the following query:
CREATE TABLE CUSTOMERS (
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
Now, let us insert some records into the table created above using the INSERT INTO statement. Some column values repeat: for example, two customers have a salary of 2000.00 and two customers are 25 years old.
INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00),
(2, 'Khilan', 25, 'Delhi', 1500.00),
(3, 'Kaushik', 23, 'Kota', 2000.00),
(4, 'Chaitali', 25, 'Mumbai', 6500.00),
(5, 'Hardik', 27, 'Bhopal', 8500.00),
(6, 'Komal', 22, 'Hyderabad', 4500.00),
(7, 'Muffy', 24, 'Indore', 10000.00);
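The table now contains the following rows:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | Kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | Hyderabad |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+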
The following query uses the MySQL COUNT() function to return how many rows share each SALARY value:
SELECT SALARY, COUNT(SALARY) AS "COUNT"
FROM CUSTOMERS
GROUP BY SALARY
ORDER BY SALARY;
With Having Clause
The HAVING clause in MySQL filters groups of rows by a condition. Here, we use the HAVING clause together with the COUNT() function to find duplicate values in one or more columns of a table.
Duplicate Values in a Single Column
Follow these steps to find the duplicate values in a single column of a table:
Step-1: First, use the GROUP BY clause to group all rows by the column that you want to check for duplicates.
Step-2: Then, to find the duplicate groups, use the COUNT() function in the HAVING clause to check whether any group has more than one row.
Example
Using the following query, we can find all SALARY values that appear more than once in the CUSTOMERS table:
SELECT SALARY, COUNT(SALARY)
FROM CUSTOMERS
GROUP BY SALARY
HAVING COUNT(SALARY) > 1;
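The query above returns only the repeated values, not the rows that contain them. One possible way to list the full rows, sketched here as an assumption rather than as part of the original tutorial, is to feed the grouped query into an IN subquery:

```sql
-- List every customer whose SALARY also appears on at least one other row.
SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY IN (
    -- Inner query: the SALARY values that occur more than once.
    SELECT SALARY
    FROM CUSTOMERS
    GROUP BY SALARY
    HAVING COUNT(SALARY) > 1
)
ORDER BY SALARY, ID;
```

With the sample data inserted earlier, this returns the rows for Ramesh (ID 1) and Kaushik (ID 3), who both have a salary of 2000.00.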
Duplicate Values in Multiple Columns
To find rows that are duplicated across multiple columns, we group by all of those columns and can use the AND operator in the HAVING clause to combine the count conditions. Rows are considered duplicates only when the combination of column values is repeated.
Example
The following query looks for rows in the CUSTOMERS table that share the same combination of SALARY and AGE values. With the sample data above, no two customers share both values, so the query returns an empty set.
SELECT SALARY, COUNT(SALARY),
AGE, COUNT(AGE)
FROM CUSTOMERS
GROUP BY SALARY, AGE
HAVING COUNT(SALARY) > 1
AND COUNT(AGE) > 1;
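Because the rows are already grouped by both columns, a single COUNT(*) condition is enough to detect a repeated combination. The sketch below is an alternative form, not the tutorial's own query, and it differs in one respect: COUNT(*) also counts groups whose SALARY is NULL, which COUNT(SALARY) ignores.

```sql
-- One row per (SALARY, AGE) combination that occurs more than once.
SELECT SALARY, AGE, COUNT(*) AS OCCURRENCES
FROM CUSTOMERS
GROUP BY SALARY, AGE
HAVING COUNT(*) > 1;
```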
The ROW_NUMBER() function with PARTITION BY
In MySQL 8.0 and later, the ROW_NUMBER() function with the PARTITION BY clause can be used to find duplicate records in a table. The PARTITION BY clause divides the rows into partitions based on one or more columns, and ROW_NUMBER() then assigns a sequential number, starting at 1, to each row within its partition. Any row whose row number is greater than 1 repeats the partition values of an earlier row and is therefore a duplicate.
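A minimal sketch of this approach on the CUSTOMERS table follows. It assumes MySQL 8.0 or later (where window functions are available) and treats rows with the same SALARY as duplicates. The alias names ROW_NUM and NUMBERED are chosen here for illustration.

```sql
-- Number the rows inside each SALARY partition, ordered by ID,
-- then keep only the rows that are not the first of their partition.
SELECT ID, NAME, SALARY, ROW_NUM
FROM (
    SELECT ID, NAME, SALARY,
           ROW_NUMBER() OVER (PARTITION BY SALARY ORDER BY ID) AS ROW_NUM
    FROM CUSTOMERS
) AS NUMBERED
WHERE ROW_NUM > 1;
```

With the sample data inserted earlier, only Kaushik (ID 3) is returned: it is the second row in the 2000.00 partition, after Ramesh (ID 1).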