Dwh 简明教程

Data Warehousing - OLAP

在线分析处理服务器 (OLAP) 基于多维数据模型。它允许管理人员和分析人员通过快速、一致和交互方式获取信息的深刻见解。本章介绍 OLAP 的类型、OLAP 的操作、OLAP 与统计数据库和 OLTP 之间的差异。

Online Analytical Processing Server (OLAP) is based on the multidimensional data model. It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information. This chapter cover the types of OLAP, operations on OLAP, difference between OLAP, and statistical databases and OLTP.

Types of OLAP Servers

我们有四种类型的 OLAP 服务器 −

We have four types of OLAP servers −

  1. Relational OLAP (ROLAP)

  2. Multidimensional OLAP (MOLAP)

  3. Hybrid OLAP (HOLAP)

  4. Specialized SQL Servers

Relational OLAP

ROLAP 服务器放置在关系型后端服务器和客户端前端工具之间。ROLAP 使用关系型或扩展关系型 DBMS 来存储和管理仓库数据。

ROLAP servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data, ROLAP uses relational or extended-relational DBMS.

ROLAP 包括以下内容 -

ROLAP includes the following −

  1. Implementation of aggregation navigation logic.

  2. Optimization for each DBMS back end.

  3. Additional tools and services.

Multidimensional OLAP

MOLAP 使用基于数组的多维存储引擎来实现数据的多维视图。使用多维数据存储时,如果数据集稀疏,存储利用率可能会低。因此,许多 MOLAP 服务器使用两级数据存储表示来处理稠密和稀疏数据集。

MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse. Therefore, many MOLAP server use two levels of data storage representation to handle dense and sparse data sets.

Hybrid OLAP

混合 OLAP 是 ROLAP 和 MOLAP 的组合。它提供了 ROLAP 的较高可扩展性和 MOLAP 的更快计算。HOLAP 服务器允许存储详细信息的庞大数据量。汇总会单独存储在 MOLAP 存储中。

Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers higher scalability of ROLAP and faster computation of MOLAP. HOLAP servers allows to store the large data volumes of detailed information. The aggregations are stored separately in MOLAP store.

Specialized SQL Servers

专业 SQL 服务器在只读环境中为星型和雪花型架构上的 SQL 查询提供高级查询语言和查询处理支持。

Specialized SQL servers provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.

OLAP Operations

由于 OLAP 服务器基于多维数据视图,因此我们将在多维数据中讨论 OLAP 操作。

Since OLAP servers are based on multidimensional view of data, we will discuss OLAP operations in multidimensional data.

以下是 OLAP 操作列表:

Here is the list of OLAP operations −

  1. Roll-up

  2. Drill-down

  3. Slice and dice

  4. Pivot (rotate)

Roll-up

汇总对数据立方体执行聚合,方式如下:

Roll-up performs aggregation on a data cube in any of the following ways −

  1. By climbing up a concept hierarchy for a dimension

  2. By dimension reduction

下图展示了汇总如何工作。

The following diagram illustrates how roll-up works.

rollup
  1. Roll-up is performed by climbing up a concept hierarchy for the dimension location.

  2. Initially the concept hierarchy was "street < city < province < country".

  3. On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.

  4. The data is grouped into cities rather than countries.

  5. When roll-up is performed, one or more dimensions from the data cube are removed.

Drill-down

向下钻取是汇总的反向操作。按以下任一方式执行:

Drill-down is the reverse operation of roll-up. It is performed by either of the following ways −

  1. By stepping down a concept hierarchy for a dimension

  2. By introducing a new dimension.

下图展示了向下钻取如何工作:

The following diagram illustrates how drill-down works −

drill down
  1. Drill-down is performed by stepping down a concept hierarchy for the dimension time.

  2. Initially the concept hierarchy was "day < month < quarter < year."

  3. On drilling down, the time dimension is descended from the level of quarter to the level of month.

  4. When drill-down is performed, one or more dimensions from the data cube are added.

  5. It navigates the data from less detailed data to highly detailed data.

Slice

分片操作从给定的立方体中选择一个特定的维度,并提供一个新的子立方体。请考虑以下显示分片工作原理的图表。

The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.

slice
  1. Here Slice is performed for the dimension "time" using the criterion time = "Q1".

  2. It will form a new sub-cube by selecting one or more dimensions.

Dice

切块从给定的立方体中选择两个或多个维度,并提供一个新的子立方体。请考虑以下显示切块操作的图表。

Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.

dice

基于以下选择条件的切块操作涉及三个维度。

The dice operation on the cube based on the following selection criteria involves three dimensions.

  1. (location = "Toronto" or "Vancouver")

  2. (time = "Q1" or "Q2")

  3. (item =" Mobile" or "Modem")

Pivot

枢轴操作也称为旋转。它围绕视图中的数据轴旋转,以便提供数据的替代表示。请考虑以下显示枢轴操作的图表。

The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data. Consider the following diagram that shows the pivot operation.

pivot

OLAP vs OLTP

Sr.No.

Data Warehouse (OLAP)

Operational Database (OLTP)

1

Involves historical processing of information.

Involves day-to-day processing.

2

OLAP systems are used by knowledge workers such as executives, managers and analysts.

OLTP systems are used by clerks, DBAs, or database professionals.

3

Useful in analyzing the business.

Useful in running the business.

4

It focuses on Information out.

It focuses on Data in.

5

Based on Star Schema, Snowflake, Schema and Fact Constellation Schema.

Based on Entity Relationship Model.

6

Contains historical data.

Contains current data.

7

Provides summarized and consolidated data.

Provides primitive and highly detailed data.

8

Provides summarized and multidimensional view of data.

Provides detailed and flat relational view of data.

9

Number or users is in hundreds.

Number of users is in thousands.

10

Number of records accessed is in millions.

Number of records accessed is in tens.

11

Database size is from 100 GB to 1 TB

Database size is from 100 MB to 1 GB.

12

Highly flexible.

Provides high performance.