SQL - Database Tuning
Database tuning in SQL is a set of activities performed to optimize a database and prevent it from becoming a bottleneck.
There are various techniques for configuring a particular database for optimal performance. Database tuning overlaps with query tuning, so good indexing and avoiding improper queries help increase database efficiency. In addition, increasing storage, updating to the latest database version, and investing in a more powerful CPU (if needed) are also common general techniques.
Database Tuning Techniques
We can implement the following techniques to optimize the performance of a database −
Database Normalization
Normalization is the process of removing duplicate (redundant) data from a database. We can normalize a database by breaking larger tables down into smaller, related tables. This keeps each fact in one place, so tables are smaller and an update touches fewer rows. Queries that read a single small table become faster, although queries that need data from several tables have to join them again.
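The idea can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for MySQL. The ORDERS_FLAT table, the ORDERS table, and all column names and values are hypothetical and do not come from this tutorial; the sketch splits a table that repeats customer details on every row into two related tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical unnormalized table: customer details repeat on every order.
CREATE TABLE ORDERS_FLAT (
    ORDER_ID INT PRIMARY KEY,
    CUSTOMER_NAME VARCHAR(20),
    CUSTOMER_CITY CHAR(25),
    AMOUNT DECIMAL(18, 2)
);
INSERT INTO ORDERS_FLAT VALUES
    (101, 'Ramesh', 'Ahmedabad', 250.00),
    (102, 'Ramesh', 'Ahmedabad', 120.00),
    (103, 'Khilan', 'Delhi', 75.00);

-- Normalized form: each customer is stored once; orders refer to it by ID.
CREATE TABLE CUSTOMERS (
    ID INTEGER PRIMARY KEY,
    NAME VARCHAR(20) NOT NULL,
    CITY CHAR(25)
);
CREATE TABLE ORDERS (
    ORDER_ID INT PRIMARY KEY,
    CUSTOMER_ID INT NOT NULL REFERENCES CUSTOMERS (ID),
    AMOUNT DECIMAL(18, 2)
);
INSERT INTO CUSTOMERS (NAME, CITY)
    SELECT DISTINCT CUSTOMER_NAME, CUSTOMER_CITY FROM ORDERS_FLAT;
INSERT INTO ORDERS
    SELECT F.ORDER_ID, C.ID, F.AMOUNT
    FROM ORDERS_FLAT F
    JOIN CUSTOMERS C
      ON C.NAME = F.CUSTOMER_NAME AND C.CITY = F.CUSTOMER_CITY;
""")

customer_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM ORDERS").fetchone()[0]

# Joining the two small tables reproduces the original flat rows.
rebuilt = conn.execute(
    "SELECT O.ORDER_ID, C.NAME, C.CITY, O.AMOUNT "
    "FROM ORDERS O JOIN CUSTOMERS C ON C.ID = O.CUSTOMER_ID "
    "ORDER BY O.ORDER_ID"
).fetchall()
original = conn.execute("SELECT * FROM ORDERS_FLAT ORDER BY ORDER_ID").fetchall()
print(customer_count, order_count, rebuilt == original)
```

Each customer's name and city are now stored once, and the join shows that no information was lost in the split.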
Proper Indexes
In SQL, an index is a separate data structure that maps the values of one or more columns to the locations of the matching rows. We use indexes to reduce query time, as the database engine can jump to the location of a specific record using the index instead of scanning the entire table. Because every index must also be maintained on each write, indexes are best created on columns that are searched or joined frequently.
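A minimal sketch of the effect, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL. The generated customer rows and the index name IDX_CUSTOMERS_AGE are made up for illustration. SQLite's EXPLAIN QUERY PLAN reports whether a query scans the whole table or searches through an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "AGE INT NOT NULL, PRIMARY KEY (ID))"
)
# Hypothetical sample data: 1000 generated customers.
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(i, f"customer{i}", 20 + i % 40) for i in range(1, 1001)],
)

query = "SELECT ID, NAME FROM CUSTOMERS WHERE AGE = 30"


def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN holds the readable plan step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)
rows_before = sorted(conn.execute(query).fetchall())

conn.execute("CREATE INDEX IDX_CUSTOMERS_AGE ON CUSTOMERS (AGE)")

plan_after = plan(query)
rows_after = sorted(conn.execute(query).fetchall())

print(plan_before)
print(plan_after)
```

The query returns the same rows before and after; only the access path changes, from a full table scan to a search through the new index. The exact wording of the plan text varies between SQLite versions.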
Avoid Improper Queries
Choosing the correct query to retrieve data efficiently also improves the performance of a database. For example, retrieving an entire table when we only need the data in a single column unnecessarily increases query time. So, query the database wisely.
Let us discuss some common improper queries and how to rectify them to optimize database performance.
1. Use SELECT fields instead of SELECT *
In large databases, we should retrieve only the columns we need rather than every column in the table. We can easily do this by specifying the column names in the SELECT statement instead of using SELECT *.
Example
Assume we have created a table named CUSTOMERS in a MySQL database using the CREATE TABLE statement as shown below −
CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);
The following query inserts values into this table using the INSERT statement −
INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00 ),
(2, 'Khilan', 25, 'Delhi', 1500.00 ),
(3, 'Kaushik', 23, 'Kota', 2000.00 ),
(4, 'Chaitali', 25, 'Mumbai', 6500.00 ),
(5, 'Hardik', 27, 'Bhopal', 8500.00 ),
(6, 'Komal', 22, 'Hyderabad', 4500.00 ),
(7, 'Muffy', 24, 'Indore', 10000.00 );
Let us say we only want the data in the ID, NAME and SALARY columns of the CUSTOMERS table. So, we should specify only those three columns in our SELECT statement as shown below −
SELECT ID, NAME, SALARY FROM CUSTOMERS;
Output
The output obtained is as shown below −
ID | NAME | SALARY
1 | Ramesh | 2000.00
2 | Khilan | 1500.00
3 | Kaushik | 2000.00
4 | Chaitali | 6500.00
5 | Hardik | 8500.00
6 | Komal | 4500.00
7 | Muffy | 10000.00
2. Use Wildcards
Wildcards such as % are characters that we use to search for data based on patterns. A pattern improves performance only when it can be paired with an index: with a trailing wildcard such as 'K%', the database can use an index on the column to quickly find the matching rows, whereas a leading wildcard such as '%K' generally forces it to examine every row.
Example
If we want to retrieve the names of all the customers starting with K from the CUSTOMERS table, the following query, which uses only a trailing wildcard, is typically the fastest option −
SELECT ID, NAME FROM CUSTOMERS WHERE NAME LIKE 'K%';
Output
Following is the output of the above query −
ID | NAME
2 | Khilan
3 | Kaushik
6 | Komal
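To see how the position of the wildcard affects the access path, the following sketch compares a trailing-wildcard pattern with a leading-wildcard one. It assumes SQLite (through Python's sqlite3 module) rather than MySQL, and the index name IDX_CUSTOMERS_NAME is hypothetical. The NOCASE collation on the index is a SQLite-specific assumption: SQLite's LIKE is case-insensitive by default and is only expected to use an index whose collation matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "PRIMARY KEY (ID))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)",
    [(1, "Ramesh"), (2, "Khilan"), (3, "Kaushik"), (4, "Chaitali"),
     (5, "Hardik"), (6, "Komal"), (7, "Muffy")],
)
# Hypothetical index; NOCASE matches SQLite's case-insensitive LIKE.
conn.execute("CREATE INDEX IDX_CUSTOMERS_NAME ON CUSTOMERS (NAME COLLATE NOCASE)")

prefix_query = "SELECT ID, NAME FROM CUSTOMERS WHERE NAME LIKE 'K%'"
suffix_query = "SELECT ID, NAME FROM CUSTOMERS WHERE NAME LIKE '%n'"


def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN holds the readable plan step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


prefix_rows = sorted(conn.execute(prefix_query).fetchall())
suffix_rows = sorted(conn.execute(suffix_query).fetchall())
prefix_plan = plan(prefix_query)
suffix_plan = plan(suffix_query)

print(prefix_rows, prefix_plan)
print(suffix_rows, suffix_plan)
```

The leading-wildcard query is reported as a scan, because the index order gives no help when the start of the value is unknown. Whether the trailing-wildcard query is reported as an index search depends on the engine and its settings, so the printed plan is worth checking on the target database.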
3. Use Explicit Join
SQL JOINs are used to combine two tables based on a common column. There are two ways of writing a join: implicit and explicit. The explicit join notation uses the JOIN keyword with the ON clause to join two tables, while the implicit join notation does not use the JOIN keyword; it lists the tables in the FROM clause and states the join condition in the WHERE clause.
Performance-wise, both are on the same level. However, in more complicated cases, the implicit join notation might produce completely different results than intended; for example, leaving the join condition out of the WHERE clause silently produces a Cartesian product of the two tables. Therefore, explicit joins are preferred.
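The two notations can be compared side by side in a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the ORDERS table with its columns and values is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, PRIMARY KEY (ID));
-- Hypothetical ORDERS table, introduced only for this illustration.
CREATE TABLE ORDERS (
    OID INT NOT NULL,
    CUSTOMER_ID INT NOT NULL,
    AMOUNT DECIMAL(18, 2),
    PRIMARY KEY (OID)
);
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh'), (2, 'Khilan'), (3, 'Kaushik');
INSERT INTO ORDERS VALUES (100, 3, 1500.00), (101, 3, 1560.00), (102, 2, 3000.00);
""")

# Explicit notation: the join condition lives in the ON clause.
explicit_rows = conn.execute(
    "SELECT C.NAME, O.AMOUNT FROM CUSTOMERS C "
    "JOIN ORDERS O ON C.ID = O.CUSTOMER_ID ORDER BY O.OID"
).fetchall()

# Implicit notation: tables listed in FROM, condition in WHERE.
implicit_rows = conn.execute(
    "SELECT C.NAME, O.AMOUNT FROM CUSTOMERS C, ORDERS O "
    "WHERE C.ID = O.CUSTOMER_ID ORDER BY O.OID"
).fetchall()

# Implicit notation with the condition forgotten: every customer is
# paired with every order (a Cartesian product), and no error is raised.
missing_condition_rows = conn.execute(
    "SELECT C.NAME, O.AMOUNT FROM CUSTOMERS C, ORDERS O"
).fetchall()

print(len(explicit_rows), len(implicit_rows), len(missing_condition_rows))
```

With the explicit notation, a JOIN without its ON clause stands out when reading the query, whereas a missing WHERE condition in the implicit form is easy to overlook.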
4. Avoid using SELECT DISTINCT
The DISTINCT operator in SQL is used to retrieve unique records from the database. On a properly designed database table with unique indexes, we rarely need it.
However, if we still have to use it on a table, using the GROUP BY clause instead of the DISTINCT keyword may show better query performance (at least in some databases).
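As a small illustration, the sketch below runs both forms against the CUSTOMERS data from this tutorial, using Python's sqlite3 module as a stand-in for MySQL. It only shows that the two queries return the same rows; whether GROUP BY is actually faster depends on the database and should be measured there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "AGE INT NOT NULL, ADDRESS CHAR(25), SALARY DECIMAL(18, 2), PRIMARY KEY (ID))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "Kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "Hyderabad", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Unique ages using DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT AGE FROM CUSTOMERS ORDER BY AGE"
).fetchall()

# The same result expressed with GROUP BY.
grouped_rows = conn.execute(
    "SELECT AGE FROM CUSTOMERS GROUP BY AGE ORDER BY AGE"
).fetchall()

print(distinct_rows == grouped_rows)
```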
5. Avoid using Multiple OR
The OR operator is used to combine multiple conditions when filtering data. When the conditions joined by OR refer to different columns, the database often cannot use a single index to satisfy the whole filter and may fall back to scanning the entire table, which degrades performance.
Instead, we can use a more optimized solution: break the different OR conditions into separate queries, each of which can use its own index (and which some databases can process in parallel). Then, the results of these queries can be combined using UNION, which also removes the duplicate rows that match more than one condition.
Example
For example, let us say we need the details of all the customers whose age is greater than 25 or whose salary is greater than 2,000. The optimized query would be as shown below −
SELECT ID, NAME FROM CUSTOMERS WHERE AGE > 25
UNION
SELECT ID, NAME FROM CUSTOMERS WHERE SALARY > 2000;
Output
After executing the above code, we get the following output −
ID | NAME
1 | Ramesh
5 | Hardik
4 | Chaitali
6 | Komal
7 | Muffy
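The rewrite is only safe if both forms return the same rows. The sketch below checks this on the CUSTOMERS data, using Python's sqlite3 module as a stand-in for MySQL. It also shows why UNION rather than UNION ALL is used here: a customer who satisfies both conditions would otherwise appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "AGE INT NOT NULL, ADDRESS CHAR(25), SALARY DECIMAL(18, 2), PRIMARY KEY (ID))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "Kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "Hyderabad", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Original form: one query with OR.
or_rows = conn.execute(
    "SELECT ID, NAME FROM CUSTOMERS WHERE AGE > 25 OR SALARY > 2000 ORDER BY ID"
).fetchall()

# Rewritten form: one query per condition, combined with UNION.
union_rows = conn.execute(
    "SELECT ID, NAME FROM CUSTOMERS WHERE AGE > 25 "
    "UNION "
    "SELECT ID, NAME FROM CUSTOMERS WHERE SALARY > 2000 "
    "ORDER BY ID"
).fetchall()

# UNION ALL keeps duplicates: Hardik matches both conditions.
union_all_rows = conn.execute(
    "SELECT ID, NAME FROM CUSTOMERS WHERE AGE > 25 "
    "UNION ALL "
    "SELECT ID, NAME FROM CUSTOMERS WHERE SALARY > 2000"
).fetchall()

print(or_rows == union_rows, len(union_rows), len(union_all_rows))
```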
6. Use WHERE instead of HAVING
The WHERE and HAVING clauses are both used to filter data in SQL. However, the WHERE clause is usually more efficient than HAVING. WHERE filters individual rows before they are grouped, so only the records that match the condition are retrieved and aggregated. HAVING is applied after grouping and aggregation, so all the records are first retrieved and grouped, and only then filtered. Therefore, the WHERE clause is preferable for any condition that does not depend on an aggregate value.
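A brief sketch of the difference, run on the CUSTOMERS data with Python's sqlite3 module as a stand-in for MySQL. The first two queries apply the same row-level condition in WHERE and in HAVING and return identical results; the third uses a condition on an aggregate, which is the case HAVING is meant for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "AGE INT NOT NULL, ADDRESS CHAR(25), SALARY DECIMAL(18, 2), PRIMARY KEY (ID))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "Kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "Hyderabad", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Preferred: rows are filtered before they are grouped.
where_rows = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS WHERE AGE > 23 "
    "GROUP BY AGE ORDER BY AGE"
).fetchall()

# Same result, but every row is grouped first and groups are filtered after.
having_rows = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS GROUP BY AGE "
    "HAVING AGE > 23 ORDER BY AGE"
).fetchall()

# A condition on an aggregate cannot go in WHERE; this is what HAVING is for.
aggregate_rows = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS GROUP BY AGE HAVING COUNT(*) > 1"
).fetchall()

print(where_rows == having_rows, aggregate_rows)
```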
Database Defragmentation
When data is first stored in a database, it is placed in contiguous physical locations. In this case, the logical order and the physical order of the data are the same.
However, when the database tables are altered by deleting or updating records, the indexes are also altered to reflect the changes. This causes the indexes to become scattered across the storage, and the physical locations lose their contiguous allocation. As a result, database performance drops.
Defragmentation is the solution to this problem. It reorganizes or rebuilds the indexes so that their physical order once again matches their logical order. The process first analyses the indexes and then chooses whether they only need to be reorganized or must be rebuilt completely.
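The commands for defragmentation differ between database systems; MySQL, for instance, provides the OPTIMIZE TABLE statement. The sketch below is only an analogy that uses SQLite through Python's sqlite3 module: it deletes most rows of a table with generated, hypothetical data, counts the unused pages left behind in the database file, and then runs SQLite's REINDEX and VACUUM commands to rebuild the indexes and compact the file.

```python
import os
import sqlite3
import tempfile

# A file-backed database in a temporary directory (hypothetical file name).
db_path = os.path.join(tempfile.mkdtemp(), "tuning_demo.db")

# isolation_level=None selects autocommit mode; VACUUM cannot run
# inside an open transaction.
conn = sqlite3.connect(db_path, isolation_level=None)
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, "
    "PRIMARY KEY (ID))"
)

# Load generated rows in one explicit transaction.
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)",
    [(i, f"customer{i}") for i in range(1, 20001)],
)
conn.execute("COMMIT")

# Deleting a large range leaves empty pages inside the file.
conn.execute("DELETE FROM CUSTOMERS WHERE ID > 2000")


def free_pages():
    # Number of unused pages currently held in the database file.
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


free_before = free_pages()
conn.execute("REINDEX")  # rebuild all indexes from the table data
conn.execute("VACUUM")   # rewrite the file compactly, dropping free pages
free_after = free_pages()
remaining_rows = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
conn.close()

print(free_before, free_after, remaining_rows)
```

The data itself is unchanged by the maintenance step; only its physical layout is rewritten.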
Built-In Tuning Tools
Some databases provide built-in tuning tools to monitor database performance. For instance, the Oracle database provides the following tuning tools −
- EXPLAIN − In SQL, the EXPLAIN command gives us the order in which a query is executed along with the estimated cost of each step. We can use this to compare alternative queries and choose the one with the least cost.
- tkprof − tkprof is a command that gives us various statistics, such as the CPU and I/O usage of a query. By using these statistics, we can tune our queries to reduce CPU and I/O utilization and so increase the efficiency of our database.