Obiee 简明教程
OBIEE – Data Warehouse
在当今竞争激烈的市场中,大多数成功的公司都迅速对市场变化和机遇做出反应。快速响应的要求是有效和高效地使用数据和信息。 “Data Warehouse” 是按类别组织的一个数据中心存储库,以支持组织决策者。一旦数据存储在数据仓库中,就可以对其进行分析。
In today’s competitive market, most successful companies respond quickly to market changes and opportunities. The requirement to respond quickly is by effective and efficient use of data and information. “Data Warehouse” is a central repository of data that is organized by category to support the organization’s decision makers. Once data is stored in a data warehouse, it can be accessed for analysis.
术语“数据仓库”一词最早是由比尔·因蒙于 1990 年发明。据他说,“数据仓库是一个面向主题、集成、随时间变化和非易失的数据集合,以支持管理决策制定过程”。
The term "Data Warehouse" was first invented by Bill Inmon in 1990. According to him, “Data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”
拉尔夫·金博尔根据其功能提供了对数据仓库的定义。他说,“数据仓库是交易数据的一个副本,专门用于查询和分析”。
Ralph Kimball provided a definition of data warehouse based on its functionality. He said, “Data warehouse is a copy of transaction data specifically structured for query and analysis.”
数据仓库(DW 或 DWH)是一个用于分析数据和报告目的的系统。它们是存储库,可保存来自一个或多个异构数据源的数据。它们存储当前数据和历史数据,并用于创建分析报告。DW 可用于为高级管理人员创建交互式仪表盘。
Data Warehouse (DW or DWH) is a system used for analysis of data and reporting purposes. They are repositories that saves data from one or more heterogeneous data sources. They store both current and historical data and are used for creating analytical reports. DW can be used to create interactive dashboards for the senior management.
例如,分析报告可以包含按季度或按年比较公司的销售报告的数据。
For example, analytic reports can contain data for quarterly comparisons or for annual comparison of sales report for a company.
数据仓库中的数据来自多个运营系统,如销售、人力资源、营销、仓库管理等。它包含了来自不同交易系统中的历史数据,但也可以包含来自其他来源的数据。数据仓库用于将数据处理和分析工作负载从交易工作负载中分离出来,并能够整合来自多个数据源的数据。
Data in DW comes from multiple operational systems like sales, human resource, marketing, warehouse management, etc. It contains historical data from different transaction systems but it can also include data from other sources. DW is used to separate data processing and analysis workload from transaction workload and enables to consolidate the data from several data sources.
The Need for Data Warehouse
例如,您有一个住房贷款机构,其中数据来自多个 SAP/非 SAP 应用程序,如营销、销售、ERP、人力资源管理等。这些数据被提取、转换并加载到数据仓库中。如果您必须对产品进行季度/年度销售比较,则不能使用运营数据库,因为这会使交易系统挂起。这就是对使用数据仓库有需求的地方。
For example − You have a home loan agency, where data comes from multiple SAP/non-SAP applications such as marketing, sales, ERP, HRM, etc. This data is extracted, transformed and loaded into DW. If you have to do quarterly/annual sales comparison of a product, you cannot use an operational database as this will hang the transaction system. This is where the need for using DW arises.
Characteristics of a Data Warehouse
数据仓库的一些主要特征包括:
Some of the key characteristics of DW are −
-
It is used for reporting and data analysis.
-
It provides a central repository with data integrated from one or more sources.
-
It stores current and historical data.
Data Warehouse vs. Transactional System
以下是数据仓库和运营数据库(交易系统)之间的几个区别:
Following are few differences between Data Warehouse and Operational Database (Transaction System) −
-
Transactional system is designed for known workloads and transactions like updating a user record, searching a record, etc. However, DW transactions are more complex and present a general form of data.
-
Transactional system contains the current data of an organization whereas DW normally contains historical data.
-
Transactional system supports parallel processing of multiple transactions. Concurrency control and recovery mechanisms are required to maintain consistency of the database.
-
Operational database query allows to read and modify operations (delete and update), while an OLAP query needs only read-only access of stored data (select statement).
-
DW involves data cleaning, data integration, and data consolidations.
数据仓库具有三层架构:数据源层、集成层和表示层。下图显示了数据仓库系统的通用架构。
DW has a three-layer architecture − Data Source Layer, Integration Layer, and Presentation Layer. The following diagram shows the common architecture of a Data Warehouse system.
Types of Data Warehouse System
以下是数据仓库系统类型:
Following are the types of DW system −
-
Data Mart
-
Online Analytical Processing (OLAP)
-
Online Transaction Processing (OLTP)
-
Predictive Analysis
Data Mart
数据市集是数据仓库最简单的形式,它通常着重于一个功能领域,如销售、财务或营销。因此,数据市集通常仅从少数数据源获取数据。
Data Mart is the simplest form of DW and it normally focuses on a single functional area, such as sales, finance or marketing. Hence, data mart usually gets data only from few data sources.
在此系统中,源头可以是内部事务系统、中央数据仓库或外部数据源应用程序。非规范化是此系统中数据建模技术的法则。
Sources could be an internal transaction system, a central data warehouse, or an external data source application. De-normalization is the norm for data modeling techniques in this system.
Online Analytical Processing (OLAP)
OLAP 系统包含较少的交易,但是涉及使用聚合之类的复杂计算,如 Sum、Count、Average 等。
An OLAP system contains less number of transactions but involves complex calculations like use of Aggregations − Sum, Count, Average, etc.
What is Aggregation?
我们保存带有聚合数据的表格,如每年(1 行)、每季度(4 行)、每月(12 行),并且现在我们想比较数据,例如每年只处理 1 行。但是,在未聚合的数据中,将处理所有行。
We save tables with aggregated data like yearly (1 row), quarterly (4 rows), monthly (12 rows) and now we want to compare data, like Yearly only 1 row will be processed. However, in an un-aggregated data, all the rows will be processed.
OLAP 系统通常将数据储存在多维度架构中,例如星形架构、星系架构(通过逻辑方式合并事实表和维度表)。
OLAP system normally stores data in multidimensional schemas like Star Schema, Galaxy schemas (with Fact and Dimensional tables are joined in logical manner).
在 OLAP 系统中,执行查询的响应时间是一种有效性衡量指标。OLAP 应用程序被数据挖掘技术广泛用于从 OLAP 系统获取数据。OLAP 数据库以多维度架构储存汇总的历史数据。与通常延迟数天的数据市场相比,OLAP 系统的数据延迟只有几小时。
In an OLAP system, response time to execute a query is an effectiveness measure. OLAP applications are widely used by Data Mining techniques to get data from OLAP systems. OLAP databases store aggregated historical data in multi-dimensional schemas. OLAP systems have data latency of a few hours as compared to Data Marts where latency is normally closer to few days.
Online Transaction Processing (OLTP)
OLTP 系统以大量简短的在线事务而闻名,如插入、更新、删除等。OLTP 系统提供快速的查询处理,还负责在多访问环境中提供数据完整性。
An OLTP system is known for large number of short online transactions like insert, update, delete, etc. OLTP systems provide fast query processing and also responsible to provide data integrity in multi-access environment.
对于 OLTP 系统而言,其有效性衡量标准是每秒处理的事务数。OLTP 系统通常仅包含当前数据。用于存储事务数据库的架构是实体模型。OLTP 系统中采用范式化为数据建模技术。
For an OLTP systems, effectiveness is measured by the number of transactions processed per second. OLTP systems normally contain only current data. The schema used to store transactional databases is the entity model. Normalization is used for data modeling techniques in OLTP system.
OLTP vs OLAP
以下图示显示了 OLTP 和 OLAP 系统之间的主要区别。
The following illustration shows the key differences between an OLTP and OLAP system.
Indexes − 在 OLTP 系统中,只有少数索引,而在 OLAP 系统中,有许多索引用于性能优化。
Indexes − In an OLTP system, there are only few indexes while in an OLAP system there are many indexes for performance optimization.
Joins − 在 OLTP 系统中,大量联接和数据是规范化的;然而,在 OLAP 系统中,联接较少且非规范化的。
Joins − In an OLTP system, large number of joins and data is normalized; however, in an OLAP system there are less joins and de-normalized.
Aggregation − 在 OLTP 系统中,数据未聚合,而在 OLAP 数据库中使用了更多聚合。
Aggregation − In an OLTP system, data is not aggregated while in an OLAP database more aggregations are used.