DWH Concise Tutorial
Data Warehousing - Overview
The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
An operational database changes frequently every day as transactions take place. Suppose a business executive wants to analyze earlier feedback on a product, a supplier, or a consumer. The executive will have no data to analyze, because the earlier data has been overwritten by later transactions.
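The loss of history described above can be illustrated with a minimal sketch. The `operational` dictionary, the `warehouse` list, the `record_price` and `price_history` helpers, and the product and price values are all hypothetical, chosen only for illustration; a real system would use database tables rather than in-memory structures.

```python
from datetime import date

# Hypothetical operational store: one row per product, updated in place.
operational = {"P-100": {"price": 20.0}}

# Hypothetical warehouse: an append-only list of dated snapshots.
warehouse = []

def record_price(product_id, price, as_of):
    """Apply a price change to both stores."""
    # The operational row is overwritten, so the old price is lost there.
    operational[product_id] = {"price": price}
    # The warehouse keeps every version, tagged with its effective date.
    warehouse.append({"product_id": product_id, "price": price, "as_of": as_of})

def price_history(product_id):
    """Return the dated prices for a product, oldest first."""
    rows = [r for r in warehouse if r["product_id"] == product_id]
    return sorted(rows, key=lambda r: r["as_of"])

record_price("P-100", 20.0, date(2024, 1, 1))
record_price("P-100", 25.0, date(2024, 2, 1))
```

After both calls, `operational` holds only the latest price, while `price_history("P-100")` still returns both dated versions.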
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this view of the data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools support interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance the interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for data analysis and online analytical processing.
Understanding a Data Warehouse
- A data warehouse is a database that is kept separate from the organization's operational database.
- A data warehouse is not updated frequently.
- It holds consolidated historical data, which helps the organization analyze its business.
- A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
- Data warehouse systems help integrate diverse application systems.
- A data warehouse system supports analysis of consolidated historical data.
Why a Data Warehouse is Separated from Operational Databases
A data warehouse is kept separate from operational databases for the following reasons:
- An operational database is built for well-known tasks and workloads such as searching for particular records, indexing, etc. In contrast, data warehouse queries are often complex and present a general form of the data.
- Operational databases support concurrent processing of multiple transactions. They require concurrency control and recovery mechanisms to ensure the robustness and consistency of the database.
- An operational database query allows both read and modify operations, while an OLAP query needs only read-only access to stored data.
- An operational database maintains current data. A data warehouse, on the other hand, maintains historical data.
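The contrast between the two query styles can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical, and a single in-memory database stands in for both systems here purely for illustration; as the reasons above explain, the two workloads would normally run on separate stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, year, amount) VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2023, 100.0),
        (2, "North", 2024, 150.0),
        (3, "South", 2023, 80.0),
        (4, "South", 2024, 120.0),
    ],
)

# OLTP-style access: find one record by key and modify it.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (110.0, 1))
single = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style access: a read-only query that scans and summarizes many rows.
totals = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

The first pair of statements touches a single row and changes it; the last statement only reads, but it must visit every row to produce the summary.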
Data Warehouse Features
The key features of a data warehouse are discussed below:
- Subject Oriented: A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; instead, it focuses on modelling and analysis of data for decision making.
- Integrated: A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
- Time Variant: The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from a historical point of view.
- Non-volatile: Non-volatile means that previous data is not erased when new data is added. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.
Note: A data warehouse does not require transaction processing, recovery, and concurrency controls, because it is physically stored separately from the operational database.
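The integrated, time-variant, and non-volatile features can be sketched together in a few lines. Everything named here is an assumption made for illustration: the flat-file contents, the `sales` table, the `load` helper, and the record layout. The sketch reads two differently shaped sources, converts them to one common format, stamps each row with a load date, and only ever appends.

```python
import csv
import io
import sqlite3
from datetime import date

# Source 1 (hypothetical): a flat file with its own column names.
flat_file = io.StringIO("cust,total\nalice,30.5\nbob,12.0\n")

# Source 2 (hypothetical): a relational table that stores amounts in cents.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer_name TEXT, amount_cents INTEGER)")
db.execute("INSERT INTO sales VALUES ('carol', 4500)")

def load(warehouse, load_date):
    """Append rows from both sources in one common format."""
    flat_file.seek(0)
    for row in csv.DictReader(flat_file):
        warehouse.append({
            "customer": row["cust"],
            "amount": float(row["total"]),
            "source": "flat_file",
            "load_date": load_date,  # time variant: every row is dated
        })
    for name, cents in db.execute("SELECT customer_name, amount_cents FROM sales"):
        warehouse.append({
            "customer": name,
            "amount": cents / 100,   # integrated: one unit for all sources
            "source": "relational",
            "load_date": load_date,
        })

warehouse = []
load(warehouse, date(2024, 1, 31))
load(warehouse, date(2024, 2, 29))   # non-volatile: earlier rows are kept
```

The second load adds new dated rows without touching the first load, so both periods remain available for analysis.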
Data Warehouse Applications
As discussed earlier, a data warehouse helps business executives organize, analyze, and use their data for decision making. A data warehouse serves as part of a plan-execute-assess "closed-loop" feedback system for enterprise management. Data warehouses are widely used in the following fields:
- Financial services
- Banking services
- Consumer goods
- Retail sectors
- Controlled manufacturing
Types of Data Warehouse
Information processing, analytical processing, and data mining are the three types of data warehouse applications discussed below:
- Information Processing: A data warehouse allows the data stored in it to be processed. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
- Analytical Processing: A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
- Data Mining: Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
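The basic OLAP operations named above can be sketched in plain Python. The fact rows, the dimension names in `DIMS`, and the helpers `aggregate` and `dice` are hypothetical; real OLAP tools run these operations on a cube engine or a database rather than on in-memory tuples.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales).
facts = [
    (2023, "Q1", "North", "Laptop", 10),
    (2023, "Q1", "South", "Laptop", 7),
    (2023, "Q2", "North", "Phone", 5),
    (2024, "Q1", "North", "Laptop", 12),
    (2024, "Q1", "South", "Phone", 9),
]
DIMS = ("year", "quarter", "region", "product")

def aggregate(rows, dims):
    """Sum sales grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def dice(rows, **conditions):
    """Keep rows whose dimension values are in the allowed sets.

    A slice is the special case of one dimension fixed to one value.
    """
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in allowed for d, allowed in conditions.items())
    ]

drill_down = aggregate(facts, ("year", "quarter"))  # finer level of detail
roll_up = aggregate(facts, ("year",))               # drill up to yearly totals
sliced = dice(facts, year={2023})                   # slice: a single year
diced = dice(facts, year={2023, 2024}, region={"North"})

# Pivot: regions as rows, years as columns.
pivot = defaultdict(dict)
for (region, year), total in aggregate(facts, ("region", "year")).items():
    pivot[region][year] = total
```

Drill down and drill up differ only in how many dimensions are kept in the grouping, which is why one `aggregate` helper covers both.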
| Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP) |
|---|---|---|
| 1 | It involves historical processing of information. | It involves day-to-day processing. |
| 2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals. |
| 3 | It is used to analyze the business. | It is used to run the business. |
| 4 | It focuses on information out. | It focuses on data in. |
| 5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model. |
| 6 | It is subject oriented. | It is application oriented. |
| 7 | It contains historical data. | It contains current data. |
| 8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data. |
| 9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data. |
| 10 | The number of users is in the hundreds. | The number of users is in the thousands. |
| 11 | The number of records accessed is in the millions. | The number of records accessed is in the tens. |
| 12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB. |
| 13 | It is highly flexible. | It provides high performance. |
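
Row 5 of the comparison mentions the star schema. A minimal sketch of one, using Python's built-in `sqlite3` module, is given here; the table names `dim_product`, `dim_time`, and `fact_sales`, their columns, and the sample rows are assumptions made for illustration. A central fact table holds the measures and points to small dimension tables that describe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- The fact table holds the measure and one key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    units INTEGER
);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2024, 'Q1');
INSERT INTO fact_sales VALUES (1, 1, 10), (2, 1, 5), (1, 2, 12), (2, 2, 9);
""")

# A typical warehouse query joins the fact table to its dimensions
# and summarizes the measure.
summary = conn.execute("""
    SELECT p.name, t.year, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.name, t.year
    ORDER BY p.name, t.year
""").fetchall()
```

Each dimension joins directly to the fact table, which gives the schema its star shape; a snowflake schema would further split the dimensions into sub-tables.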