Cognos: A Concise Tutorial

Data Warehouse - Overview

A Data Warehouse consolidates data from multiple heterogeneous data sources and is used for analytical reporting and decision making. It is a central place where data from different data sources and applications is stored.

The term Data Warehouse was coined by Bill Inmon in 1990. A Data Warehouse is always kept separate from an Operational Database.

The data in a DW system is loaded from operational transaction systems such as −

  1. Sales

  2. Marketing

  3. HR

  4. SCM, etc.

The data may pass through an operational data store or other transformations before it is loaded into the DW system for information processing.

A Data Warehouse is used for reporting and analysis, and it stores both historical and current data. The data in a DW system feeds analytical reports, which Business Analysts, Sales Managers, or Knowledge Workers then use for decision making.

[Figure: data warehouse]

In the above image, you can see data flowing from multiple heterogeneous data sources into a Data Warehouse. Common data sources for a Data Warehouse include −

  1. Operational databases

  2. SAP and non-SAP Applications

  3. Flat Files (xls, csv, txt files)

BI (Business Intelligence) users access the data in a Data Warehouse for analytical reporting, data mining, and analysis. Business Users, Sales Managers, and Analysts use the results for decision making and to define future strategy.

Features of a Data Warehouse

It is a central data repository where data is stored from one or more heterogeneous data sources. A DW system stores both current and historical data. Normally a DW system stores 5-10 years of historical data. A DW system is always kept separate from an operational transaction system.

The data in a DW system is used for different types of analytical reporting, ranging from quarterly to annual comparisons.

Data Warehouse Vs Operational Database

The differences between a Data Warehouse and Operational Database are as follows −

  1. An Operational System is designed for known workloads and transactions such as updating a user record or searching for a record. Data Warehouse queries, by contrast, are more complex and present data in a more general form.

  2. An Operational System contains the current data of an organization, whereas a Data Warehouse normally also contains historical data.

  3. An Operational Database supports parallel processing of multiple transactions. Concurrency control and recovery mechanisms are required to maintain the consistency of the database.

  4. An Operational Database query allows both read and modify operations (INSERT, DELETE, and UPDATE), while an OLAP query needs only read-only access to stored data (SELECT statement).

Architecture of Data Warehouse

Data Warehousing involves data cleaning, data integration, and data consolidation. A Data Warehouse has a 3-layer architecture −

Data Source Layer

This layer defines how data enters the Data Warehouse. It comprises the various data sources: operational transaction systems, flat files, applications, etc.

Integration Layer

This layer consists of the Operational Data Store and the staging area. The staging area is used to perform data cleansing and data transformation, and to load data from different sources into the Data Warehouse. Because the data sources are available for extraction in different time zones, the staging area stores the data first so that transformations can be applied later.
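The staging flow described above can be sketched roughly as follows. This is a minimal illustration, not a Cognos feature: it uses Python's standard `sqlite3` module as a stand-in database, and the table names (`staging`, `dw_sales`), the column names, and the two sample sources are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes: a flat file and an
# operational system that exposes records as dictionaries.
flat_file = io.StringIO("order_id,region,amount\n1, north ,100.50\n2,south,\n")
operational_rows = [{"id": 3, "area": "EAST", "total": "75"}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (order_id INTEGER, region TEXT, amount TEXT)")
conn.execute("CREATE TABLE dw_sales (order_id INTEGER, region TEXT, amount REAL)")

# Extract: land the raw data in the staging area without changing it.
for row in csv.DictReader(flat_file):
    conn.execute("INSERT INTO staging VALUES (?, ?, ?)",
                 (row["order_id"], row["region"], row["amount"]))
for row in operational_rows:
    conn.execute("INSERT INTO staging VALUES (?, ?, ?)",
                 (row["id"], row["area"], row["total"]))

# Cleanse and transform, then load into the warehouse table:
# trim and upper-case the region, and drop rows that have no amount.
conn.execute("""
    INSERT INTO dw_sales (order_id, region, amount)
    SELECT order_id, UPPER(TRIM(region)), CAST(amount AS REAL)
    FROM staging
    WHERE TRIM(amount) <> ''
""")

loaded = conn.execute(
    "SELECT order_id, region, amount FROM dw_sales ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the raw rows in `staging` and transforming them in a separate step means each source can be extracted whenever it becomes available, while the cleansing rules run once over everything that has landed.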

Presentation Layer

End users perform BI reporting through this layer. BI users access the data in the DW system and use it for reporting and analysis.

The following illustration shows the common architecture of a Data Warehouse System.

[Figure: data warehouse architecture]

Characteristics of a Data Warehouse

The following are the key characteristics of a Data Warehouse −

  1. Subject Oriented − In a DW system, data is categorized and stored by business subject, such as equity plans, shares, or loans, rather than by application.

  2. Integrated − Data from multiple data sources is integrated in a Data Warehouse.

  3. Non Volatile − Data in a Data Warehouse is non-volatile: once data is loaded into the DW system, it is not altered.

  4. Time Variant − A DW system contains historical data, whereas a transactional system contains only current data. In a Data Warehouse you can view data for 3 months, 6 months, 1 year, 5 years, etc.

OLTP vs OLAP

OLTP stands for Online Transaction Processing, while OLAP stands for Online Analytical Processing.

In an OLTP system, there are a large number of short online transactions such as INSERT, UPDATE, and DELETE.

In an OLTP system, effectiveness is measured by the processing time of short transactions, which is very low, and is usually expressed as the number of transactions per second. The system maintains data integrity in multi-access environments. An OLTP system contains current and detailed data and is maintained in schemas based on the entity model (3NF).

For Example

A day-to-day transaction system in a retail store, where customer records are inserted, updated, and deleted on a daily basis. It provides fast query processing. OLTP databases contain detailed and current data. The schema used to store an OLTP database is the entity model.
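The retail example can be sketched as a handful of short transactions. This is only an illustration using Python's standard `sqlite3` module; the `customer` table layout and the sample rows (including the extra customer `Lee`) are assumptions for the example, not part of any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)

# Each 'with conn' block is one short transaction: it is committed on
# success and rolled back if an exception is raised inside it.
with conn:
    conn.execute("INSERT INTO customer VALUES (1110, 'Sally', '1113334444')")
    conn.execute("INSERT INTO customer VALUES (1210, 'Adam', '2225556666')")

with conn:
    conn.execute("UPDATE customer SET phone = '9998887777' WHERE cust_id = 1110")

with conn:
    conn.execute("DELETE FROM customer WHERE cust_id = 1210")

# A failing transaction is rolled back as a whole, so the first insert
# below does not survive the duplicate-key error raised by the second.
try:
    with conn:
        conn.execute("INSERT INTO customer VALUES (1310, 'Lee', '3334445555')")
        conn.execute("INSERT INTO customer VALUES (1110, 'Dup', '0000000000')")
except sqlite3.IntegrityError:
    pass

customers = conn.execute(
    "SELECT cust_id, name, phone FROM customer ORDER BY cust_id"
).fetchall()
print(customers)
```

The last block shows the integrity guarantee in miniature: either every statement in a transaction takes effect or none does.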

In an OLAP system, there are fewer transactions than in a transactional system. The queries executed are complex and involve data aggregation.

What is an Aggregation?

Aggregation means storing data in pre-summarized tables, for example yearly (1 row), quarterly (4 rows), or monthly (12 rows). If someone needs a year-to-year comparison, only one row per year has to be processed, whereas an un-aggregated table would require comparing all the detail rows.
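A rough sketch of this idea, using Python's standard `sqlite3` module: a detail table is summarized once into a yearly table, and the year-to-year comparison then reads only the summary rows. The table names (`sales_detail`, `sales_yearly`) and the sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?)",
    [("2022-01-15", 100.0), ("2022-07-03", 150.0),
     ("2023-02-20", 200.0), ("2023-11-11", 250.0)],
)

# Pre-compute one row per year so comparisons need not rescan the detail.
conn.execute("""
    CREATE TABLE sales_yearly AS
    SELECT substr(sale_date, 1, 4) AS year, SUM(amount) AS total
    FROM sales_detail
    GROUP BY substr(sale_date, 1, 4)
""")

# The year-to-year comparison touches only the two summary rows.
yearly = dict(conn.execute("SELECT year, total FROM sales_yearly").fetchall())
growth = yearly["2023"] - yearly["2022"]
print(yearly, growth)
```

With four detail rows the saving is invisible, but the same pattern applied to millions of rows is what makes the summary table worthwhile.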

Various aggregation functions can be used in an OLAP system, such as Sum, Avg, Max, and Min.

For Example

SELECT Avg(salary)
FROM employee
WHERE title = 'Programmer';

Key Differences

These are the major differences between an OLAP and an OLTP system.

  1. Indexes − An OLTP system has only a few indexes, while an OLAP system has many indexes for performance optimization.

  2. Joins − In an OLTP system, data is normalized, so queries involve a large number of joins. In an OLAP system, data is de-normalized, so fewer joins are needed.

  3. Aggregation − In an OLTP system, data is not aggregated, while an OLAP database makes heavy use of aggregations.

  4. Normalization − An OLTP system contains normalized data, whereas data in an OLAP system is not normalized.
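The joins and normalization points can be illustrated with a small sketch in Python's standard `sqlite3` module. The tables (`customer`, `orders`, `orders_wide`) and their contents are hypothetical; the point is only that the same question needs a join against the normalized tables and none against the de-normalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (OLTP-style): each fact is stored once, queries need joins.
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Pune'), (2, 'Oslo');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 30.0);

    -- De-normalized (OLAP-style): city is repeated on every row, no join needed.
    CREATE TABLE orders_wide AS
    SELECT o.order_id, c.city, o.amount
    FROM orders o JOIN customer c ON c.cust_id = o.cust_id;
""")

# Total order amount per city, asked of the normalized tables (one join).
with_join = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customer c ON c.cust_id = o.cust_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# The same question asked of the de-normalized table (no join).
without_join = conn.execute("""
    SELECT city, SUM(amount) FROM orders_wide GROUP BY city ORDER BY city
""").fetchall()

print(with_join, without_join)
```

The trade-off is redundancy: `orders_wide` repeats the city on every row, which is acceptable for read-mostly analysis but would complicate updates in a transactional system.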

[Figure: OLTP]

Data Mart Vs Data Warehouse

A data mart focuses on a single functional area and represents the simplest form of a Data Warehouse. For example, where a Data Warehouse contains data for Sales, Marketing, HR, and Finance, a data mart covers just one of these areas, such as Sales or Marketing.
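One simple way to picture this relationship is a view over the warehouse that exposes a single functional area. The sketch below uses Python's standard `sqlite3` module; the `warehouse` table, the `sales_mart` view, and the sample metrics are assumptions for illustration, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (area TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [("Sales", "revenue", 1200.0), ("Marketing", "campaign_cost", 300.0),
     ("HR", "headcount", 42.0), ("Finance", "budget", 5000.0)],
)

# A data mart modeled as a view restricted to one functional area.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT metric, value FROM warehouse WHERE area = 'Sales'
""")

mart_rows = conn.execute("SELECT metric, value FROM sales_mart").fetchall()
print(mart_rows)
```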

[Figure: data mart vs data warehouse]

In the above image, you can see the difference between a Data Warehouse and a data mart.

Fact vs Dimension Table

A fact table holds the measures on which analysis is performed. It also contains foreign keys that reference the dimension tables.

For example − Every sale is a fact.

Cust Id   Prod Id   Time Id   Qty Sold
1110      25        2         125
1210      28        4         252

The Dimension table represents the characteristics of a dimension. A Customer dimension can have Customer_Name, Phone_No, Sex, etc.

Cust Id   Cust_Name   Phone        Sex
1110      Sally       1113334444   F
1210      Adam        2225556666   M
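To show how the two tables work together, the sketch below loads the sample rows from the tables above and joins the fact table to the customer dimension to total the quantity sold per customer. It uses Python's standard `sqlite3` module; the table names `fact_sales` and `dim_customer` and the snake_case column names are naming assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes of each customer.
    CREATE TABLE dim_customer (
        cust_id INTEGER PRIMARY KEY, cust_name TEXT, phone TEXT, sex TEXT
    );
    -- Fact table: the measure (qty_sold) plus keys pointing at dimensions.
    CREATE TABLE fact_sales (
        cust_id INTEGER REFERENCES dim_customer(cust_id),
        prod_id INTEGER, time_id INTEGER, qty_sold INTEGER
    );
    INSERT INTO dim_customer VALUES (1110, 'Sally', '1113334444', 'F');
    INSERT INTO dim_customer VALUES (1210, 'Adam', '2225556666', 'M');
    INSERT INTO fact_sales VALUES (1110, 25, 2, 125);
    INSERT INTO fact_sales VALUES (1210, 28, 4, 252);
""")

# Analyze the measure, labeled with an attribute from the dimension.
qty_by_customer = conn.execute("""
    SELECT d.cust_name, SUM(f.qty_sold)
    FROM fact_sales f JOIN dim_customer d ON d.cust_id = f.cust_id
    GROUP BY d.cust_name ORDER BY d.cust_name
""").fetchall()
print(qty_by_customer)
```

Product and time dimensions would be joined the same way through `prod_id` and `time_id`; they are omitted here because the source does not show their contents.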