Sql 简明教程

SQL - Handling Duplicates

有时,表或结果集包含重复记录。虽然在大多数情况下允许重复,但在某些情况下,有必要防止重复记录并从数据库表中删除它们。

Sometimes, tables or result sets contain duplicate records. While in most cases, duplicates are allowed, there are situations where it is necessary to prevent duplicate records and remove them from a database table.

Why is Handling Duplicates in SQL Necessary?

在 SQL 数据库中处理重复记录对于防止以下后果变得十分必要 -

Handling duplicates in an SQL database becomes necessary to prevent the following consequences −

  1. The existence of duplicates in an organizational database will lead to logical errors.

  2. Duplicate data occupies space in the storage, which leads to decrease in usage efficiency of a database.

  3. Due to the increased use of resources, the overall cost of the handling resources rises.

  4. With increase in logical errors due to the presence of duplicates, the conclusions derived from data analysis in a database will also be erroneous.

本章将介绍如何防止表中出现重复记录,以及如何删除已有的重复记录。

This chapter will describe how to prevent the occurrence of duplicate records in a table and how to remove the already existing duplicate records.

Preventing Duplicate Entries

为了防止重复记录进入表,我们可以在相关字段上定义一个主键或唯一索引。这些数据库约束确保指定列或列集中的每个条目是唯一的。

To prevent the entry of duplicate records into a table, we can define a PRIMARY KEY or a UNIQUE Index on the relevant fields. These database constraints ensure that each entry in the specified column or set of columns is unique.

Example

让我们使用以下查询创建一个 CUSTOMERS 表 -

Let us create a CUSTOMERS table using the following query −

CREATE TABLE CUSTOMERS (
   FIRST_NAME CHAR(20),
   LAST_NAME CHAR(20),
   SEX CHAR(10)
);

由于我们没有在表上定义任何约束,因此可以向其中插入重复记录。为防止这种情况,在相关字段(比如 LAST_NAME 和 FIRST_NAME)上添加主键约束 -

As we have not defined any constraints on the table, duplicate records can be inserted into it. To prevent such cases, add a PRIMARY KEY constraint on relevant fields (say LAST_NAME and FIRST_NAME together) −

ALTER TABLE CUSTOMERS
ADD PRIMARY KEY (LAST_NAME, FIRST_NAME);

Using INSERT IGNORE Query:

Using INSERT IGNORE Query:

另外,我们可以使用 INSERT IGNORE 语句插入记录,而不为重复项生成错误,如下所示 -

Alternatively, we can use the INSERT IGNORE statement to insert records without generating an error for duplicates as shown below −

INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES
( 'Jay', 'Thomas'),
( 'Jay', 'Thomas');

正如您在下面看到的,该表将只包含一个记录(忽略重复值)。

As you can see below, the table will only consist of a single record (ignoring the duplicate value).

FIRST_NAME

LAST_NAME

SEX

Thomas

Jay

NULL

Using REPLACE Query:

Using REPLACE Query:

或者,使用 REPLACE 语句替换重复值,如下面的查询所示 -

Or, use the REPLACE statement to replace duplicates as shown in the following query −

REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES
( 'Ajay', 'Kumar'),
( 'Ajay', 'Kumar');

该表将包含以下记录 -

The table will contain the following records −

FIRST_NAME

LAST_NAME

SEX

Kumar

Ajay

NULL

Thomas

Jay

NULL

应该根据所需的重复处理行为来选择 INSERT IGNORE 和 REPLACE 语句。INSERT IGNORE 语句保留第一组重复记录,并丢弃任何随后的重复项。相反,REPLACE 语句保留最后一组重复项,并擦除任何早期的重复项。

The choice between the INSERT IGNORE and REPLACE statements should be made based on the desired duplicate-handling behaviour. The INSERT IGNORE statement retains the first set of duplicate records and discards any subsequent duplicates. Conversely, the REPLACE statement preserves the last set of duplicates and erases any earlier ones.

Using UNIQUE Constraint:

Using UNIQUE Constraint:

在表中强制体现唯一性的其他方法是添加 UNIQUE 约束,而不是 PRIMARY KEY 约束 −

Another way to enforce uniqueness in a table is by adding a UNIQUE constraint rather than a PRIMARY KEY constraint −

CREATE TABLE BUYERS (
   FIRST_NAME CHAR(20) NOT NULL,
   LAST_NAME CHAR(20) NOT NULL,
   SEX CHAR(10),
   UNIQUE (LAST_NAME, FIRST_NAME)
);

Counting and Identifying Duplicates

要基于特定列计数和识别重复记录,我们可以使用 COUNT 函数和 GROUP BY 子句。

To count and identify duplicate records based on specific columns, we can use the COUNT function and GROUP BY clause.

Example

以下是用于计算 BUYERS 中 FirstName 和 LastName 中重复记录的查询 −

Following is the query to count duplicate records with FIRST_NAME and LAST_NAME in the BUYERS −

SELECT COUNT(*) as repetitions, LAST_NAME, FIRST_NAME
FROM BUYERS
GROUP BY LAST_NAME, FIRST_NAME
HAVING repetitions > 1;

该查询将返回 PERSON_TABLE 表中所有重复记录的列表。要识别重复的值集,请按照以下步骤操作 −

This query will return a list of all the duplicate records in the PERSON_TABLE table. To identify sets of values that are duplicated, follow the steps given below −

  1. Determine which columns contain the values that may be duplicated.

  2. List those columns in the column selection list, along with the COUNT(*).

  3. List the columns in the GROUP BY clause as well.

  4. Add a HAVING clause that eliminates the unique values by requiring the group counts to be greater than one.

Eliminating Duplicates from a Table

我们可以在 SELECT 语句中使用 DISTINCT 关键字,从表中检索唯一记录。

We can use the DISTINCT keyword along with the SELECT statement to retrieve unique records from a table.

SELECT DISTINCT LAST_NAME, FIRST_NAME
FROM BUYERS
ORDER BY LAST_NAME;

或者,您可以包括一个 GROUP BY 子句,指定要选择以消除重复项的列 −

Alternatively, you can include a GROUP BY clause specifying the columns you are selecting to eliminate duplicates −

SELECT LAST_NAME, FIRST_NAME
FROM BUYERS
GROUP BY LAST_NAME, FIRST_NAME;