Obiee 简明教程
OBIEE – Schema
架构是整个数据库的逻辑描述。它包括所有类型记录的名称和描述,包括所有关联数据项和聚合。与数据库非常相似,DW 也需要维护一个模式。数据库使用关系模型,而 DW 使用星型、雪花和事实星座模式(银河模式)。
Schema is a logical description of the entire database. It includes the name and description of records of all types including all associated data-items and aggregates. Much like a database, DW also requires to maintain a schema. Database uses relational model, while DW uses Star, Snowflake, and Fact Constellation schema (Galaxy schema).
Star Schema
在星型模式中,有多个非规范化形式的维度表,它们仅连接到一个事实表。这些表以逻辑方式连接,以满足分析目的的一些业务要求。这些架构是使用 BI 报告工具创建报告的多维结构。
In a Star Schema, there are multiple dimension tables in de-normalized form that are joined to only one fact table. These tables are joined in a logical manner to meet some business requirement for analysis purpose. These schemas are multidimensional structures which are used to create reports using BI reporting tools.
星型架构中的维度包含一组属性,而事实表包含所有维度和度量值的外键。
Dimensions in Star schemas contain a set of attributes and Fact tables contain foreign keys for all dimensions and measurement values.

在以上星型模式中,有一个位于中心的“销售事实”事实表,并使用主键连接到 4 个维度表。维度表没有进一步规范化,并且这些表的连接在 DW 中被称为星型模式。
In the above Star Schema, there is a fact table “Sales Fact” at the center and is joined to 4 dimension tables using primary keys. Dimension tables are not further normalized and this joining of tables is known as Star Schema in DW.
事实表还包含度量值 - dollar_sold 和 units_sold。
Fact table also contains measure values − dollar_sold and units_sold.
Snowflakes Schema
在雪花模式中,有多个规范化形式的维度表,它们仅连接到一个事实表。这些表以逻辑方式连接,以满足分析目的的一些业务要求。
In a Snowflakes Schema, there are multiple dimension tables in normalized form that are joined to only one fact table. These tables are joined in a logical manner to meet some business requirement for analysis purpose.
星型架构和雪花架构之间的唯一区别在于维度表进一步规范化。规范化将数据拆分为其他表。由于雪花架构中的规范化,数据冗余会减少而不会丢失任何信息,因此易于维护并节省存储空间。
Only difference between a Star and Snowflakes schema is that dimension tables are further normalized. The normalization splits up the data into additional tables. Due to normalization in the Snowflake schema, the data redundancy is reduced without losing any information and therefore it becomes easy to maintain and saves storage space.

在上面的雪花架构示例中,产品和客户表被进一步规范化以节省存储空间。有时,在您执行直接在规范化表中处理行的查询时,它还可以提供性能优化,因此它不会在主维度表中处理行,而是直接进入架构中的规范化表。
In above Snowflakes Schema example, Product and Customer table are further normalized to save storage space. Sometimes, it also provides performance optimization when you execute a query that requires processing of rows directly in normalized table so it doesn’t process rows in primary Dimension table and comes directly to Normalized table in Schema.
Granularity
表中的粒度表示表中存储的信息级别。数据的高粒度意味着数据处于事务级别或接近事务级别,这更加详细。低粒度意味着数据的信息级别较低。
Granularity in a table represents the level of information stored in the table. High granularity of data means that data is at or near the transaction level, which has more detail. Low granularity means that data has low level of information.
事实表通常设计为低粒度。这意味着我们需要找到可以存储在事实表中的最低信息级别。在日期维度中,粒度级别可以是年、月、季度、期间、周和日。
A fact table is usually designed at a low level of granularity. This means that we need to find the lowest level of information that can be stored in a fact table. In date dimension, the granularity level could be year, month, quarter, period, week, and day.
定义粒度这一过程包含两步 −
The process of defining granularity consists of two steps −
-
Determining the dimensions that are to be included.
-
Determining the location to place the hierarchy of each dimension of information.
Slowly Changing Dimensions
缓慢变化维度指的是某个属性值随着时间的变化。这是数据仓库中的一个常见概念。
Slowly changing dimensions refer to changing value of an attribute over time. It is one of the common concepts in DW.
Example
Andy是XYZ公司的一名员工。他最早于2015年7月在纽约市工作。员工查找表中关于他的原始条目包含以下记录 −
Andy is an employee of XYZ Inc. He was first located in New York City in July 2015. Original entry in the employee lookup table has the following record −
Employee ID |
10001 |
Name |
Andy |
Location |
New York |
在某个时间,他已搬迁至加州洛杉矶。XYZ公司现在该如何修改其员工表以反映这种改变?
At a later date, he has relocated to LA, California. How should XYZ Inc. now modify its employee table to reflect this change?
这就是“缓慢变化维度”概念。
This is known as "Slowly Changing Dimension" concept.
有三种解决方法可以解决此类问题 −
There are three ways to solve this type of problem −
Solution 1
新记录替换原始记录。旧记录不可追溯。
The new record replaces the original record. No trace of the old record exists.
缓慢变化维度,新信息直接覆盖原始信息。换言之,不会保留历史。
Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
Employee ID |
10001 |
Name |
Andy |
Location |
LA, California |
-
Benefit − This is the easiest way to handle the Slowly Changing Dimension problem as there is no need to keep track of the old information.
-
Disadvantage − All historical information is lost.
-
Use − Solution 1 should be used when it is not required for DW to keep track of historical information.
Solution 2
在新员工维度表中录入一条新记录。在这种方法中,员工Andy将被视为两人。
A new record is entered into the Employee dimension table. So the employee, Andy, is treated as two people.
向表中添加一条新记录以表示新信息,原始记录和新记录将同时存在。新记录将获得自己的主键,如下所示 −
A new record is added to the table to represent the new information and both the original and new record will be present. The new record gets its own primary key as follows −
Employee ID |
10001 |
10002 |
Name |
Andy |
Andy |
Location |
New York |
LA, California |
-
Benefit − This method allows us to store all the historical information.
-
Disadvantage − Size of the table grows faster. When the number of rows for the table is very high, space and performance of table can be a concern.
-
Use − Solution 2 should be used when it is necessary for DW to keep historical data.
Solution 3
修改员工维度表中的原始记录以反映变化。
The original record in Employee dimension is modified to reflect the change.
将有两列表示特定属性,一列表示原始值,另一列表示新值。此外还将有一列指出当前值何时生效。
There will be two columns to indicate the particular attribute, one indicates original value and other indicates the new value. There will also be a column that indicates when the current value becomes active.
Employee ID |
Name |
Original Location |
New Location |
Date Moved |
10001 |
Andy |
New York |
LA, California |
July 2015 |
-
Benefits − This does not increase the size of the table, since new information is updated. This allows us to keep historical information.
-
Disadvantage − This method doesn’t keep all history when an attribute value is changed more than once.
-
Use − Solution 3 should only be used when it is required for DW to keep information of historical changes.
Normalization
规范化是将表分解成冗余度较小的更小表的过程,而不会丢失任何信息。所以数据库规范化是组织数据库的属性和表的过程,以最大程度地减少数据冗余(重复数据)。
Normalization is the process of decomposing a table into less redundant smaller tables without losing any information. So Database normalization is the process of organizing the attributes and tables of a database to minimize data redundancy (duplicate data).
Purpose of Normalization
-
It is used to eliminate certain types of data (redundancy/ replication) to improve consistency.
-
It provides maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.
-
It produces a clearer and readable data model.
Advantages
-
Data integrity.
-
Enhances data consistency.
-
Reduces data redundancy and space required.
-
Reduces update cost.
-
Maximum flexibility in responding to ad-hoc queries.
-
Reduces the total number of rows per block.
Disadvantages
数据库中查询性能缓慢,因为必须执行联接才能从几个规范化表中检索相关数据。
Slow performance of queries in database because joins have to be performed to retrieve relevant data from several normalized tables.
必须了解数据模型才能在几个表之间执行适当的联接。
You have to understand the data model in order to perform proper joins among several tables.