DWH Concise Tutorial
Data Warehousing - Concepts
What is Data Warehousing?
Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.
Using Data Warehouse Information
Decision support technologies help utilize the data available in a data warehouse. These technologies help executives use the warehouse quickly and effectively. They can gather data, analyze it, and make decisions based on the information present in the warehouse. The information gathered in a warehouse can be used in any of the following domains −
- Tuning Production Strategies − Product strategies can be fine-tuned by repositioning products and managing product portfolios, based on quarterly or yearly sales comparisons.
- Customer Analysis − Customer analysis is done by analyzing the customer's buying preferences, buying time, budget cycles, etc.
- Operations Analysis − Data warehousing also helps in customer relationship management and in making environmental corrections. The information also allows us to analyze business operations.
Integrating Heterogeneous Databases
To integrate heterogeneous databases, we have two approaches −
- Query-driven Approach
- Update-driven Approach
Query-Driven Approach
This is the traditional approach to integrating heterogeneous databases. It builds wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.
Process of Query-Driven Approach
- When a query is issued at a client site, a metadata dictionary translates the query into an appropriate form for each of the heterogeneous sites involved.
- These translated queries are then mapped and sent to the local query processor at each site.
- The results from the heterogeneous sites are integrated into a global answer set.
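The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not a real mediator system: the two in-memory sources, the `METADATA` dictionary, and the function names (`translate`, `run_local`, `to_global`, `mediator`) are all assumptions made up for this example.

```python
# Hypothetical sketch of the query-driven approach; every name is illustrative.

# Two heterogeneous sources with different field names and units.
SOURCES = {
    "A": [{"cust": "Ann", "amount_usd": 120.0}, {"cust": "Bob", "amount_usd": 80.0}],
    "B": [{"customer_name": "Ann", "amount_cents": 4550}],
}

# Metadata dictionary: global field name -> local field name, per site.
METADATA = {
    "A": {"customer": "cust", "amount": "amount_usd"},
    "B": {"customer": "customer_name", "amount": "amount_cents"},
}

# Divisor that converts each site's amount into dollars.
AMOUNT_DIVISOR = {"A": 1, "B": 100}


def translate(global_query, site):
    """Step 1: rewrite a global query using the site's local field names."""
    mapping = METADATA[site]
    return {mapping[field]: value for field, value in global_query.items()}


def run_local(site, local_query):
    """Step 2: the site's local query processor filters its own rows."""
    return [
        row for row in SOURCES[site]
        if all(row[field] == value for field, value in local_query.items())
    ]


def to_global(site, row):
    """Wrapper: convert a local row back into the global schema."""
    out = {g: row[local] for g, local in METADATA[site].items()}
    out["amount"] = out["amount"] / AMOUNT_DIVISOR[site]
    return out


def mediator(global_query):
    """Step 3: integrate the per-site results into one global answer set."""
    answer = []
    for site in SOURCES:
        local_query = translate(global_query, site)
        answer.extend(to_global(site, row) for row in run_local(site, local_query))
    return answer


result = mediator({"customer": "Ann"})
print(result)
```

Note that every call to `mediator` goes back to the sources and repeats the translation and integration work, which is the cost the update-driven approach tries to avoid.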
Update-Driven Approach
This is an alternative to the traditional approach. Today's data warehouse systems follow the update-driven approach rather than the traditional approach discussed earlier. In the update-driven approach, information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is available for direct querying and analysis.
Advantages
This approach has the following advantages −
- This approach provides high performance.
- The data is copied, processed, integrated, annotated, summarized, and restructured in a semantic data store in advance.
- Query processing does not require an interface to process data at local sources.
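As a rough illustration of the update-driven approach, the sketch below integrates two sources into a single store ahead of time and then answers a query from that store alone. It uses an in-memory SQLite database from Python's standard library as a stand-in for a warehouse; the source data, table names, and `build_warehouse` function are hypothetical.

```python
import sqlite3

# Hypothetical source extracts with different units.
SOURCE_A = [("Ann", 120.0), ("Bob", 80.0)]   # (customer, amount in dollars)
SOURCE_B = [("Ann", 4550)]                   # (customer, amount in cents)


def build_warehouse(conn):
    """Integrate and summarize the sources in advance, before any query arrives."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")
    rows = [(customer, amount, "A") for customer, amount in SOURCE_A]
    rows += [(customer, cents / 100, "B") for customer, cents in SOURCE_B]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Precomputed summary table, so queries do not need to re-aggregate.
    conn.execute(
        "CREATE TABLE sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)

# Direct querying: only the warehouse is touched, not the local sources.
totals = conn.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
print(totals)
```

The trade-off in this sketch is that the warehouse only reflects the sources as of the last load, so it has to be refreshed when the sources change.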
Functions of Data Warehouse Tools and Utilities
The following are the functions of data warehouse tools and utilities −
- Data Extraction − Involves gathering data from multiple heterogeneous sources.
- Data Cleaning − Involves finding and correcting errors in the data.
- Data Transformation − Involves converting the data from legacy format to warehouse format.
- Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
- Refreshing − Involves propagating updates from the data sources to the warehouse.
Note − Data cleaning and data transformation are important steps in improving the quality of data and data mining results.
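The five functions listed above can be sketched as a small pipeline. This is a simplified illustration in plain Python, not the behavior of any particular warehouse tool: the sample rows, the dictionary used as a "warehouse", and the function names are assumptions, and the loading step here only sorts and indexes by id.

```python
from datetime import date

# Hypothetical legacy extract: everything is a string, with some bad rows.
LEGACY_ROWS = [
    {"id": "2", "name": "BOB", "joined": "2020-11-30"},
    {"id": "1", "name": " ann ", "joined": "2021-03-05"},
    {"id": "2", "name": "BOB", "joined": "2020-11-30"},    # duplicate
    {"id": "", "name": "no id", "joined": "2022-01-01"},   # missing key
]


def extract(sources):
    """Data extraction: gather rows from multiple sources."""
    return [row for source in sources for row in source]


def clean(rows):
    """Data cleaning: drop rows with no id, drop duplicates, tidy names."""
    seen, out = set(), []
    for row in rows:
        if not row["id"] or row["id"] in seen:
            continue
        seen.add(row["id"])
        out.append({**row, "name": row["name"].strip().title()})
    return out


def transform(rows):
    """Data transformation: convert legacy strings to warehouse types."""
    return [
        {"id": int(r["id"]), "name": r["name"], "joined": date.fromisoformat(r["joined"])}
        for r in rows
    ]


def load(warehouse, rows):
    """Data loading: sort the rows and index them by id."""
    for row in sorted(rows, key=lambda r: r["id"]):
        warehouse[row["id"]] = row
    return warehouse


def refresh(warehouse, sources):
    """Refreshing: rerun the pipeline so source updates reach the warehouse."""
    return load(warehouse, transform(clean(extract(sources))))


warehouse = refresh({}, [LEGACY_ROWS])

# A new row appears at the source; refreshing propagates it to the warehouse.
LEGACY_ROWS.append({"id": "3", "name": "cy", "joined": "2023-02-01"})
refresh(warehouse, [LEGACY_ROWS])
print(sorted(warehouse))
```

Cleaning runs before transformation here so that the type conversions never see rows with a missing id.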