DWH Concise Tutorial
Data Warehousing - Terminologies
In this chapter, we will discuss some of the most commonly used terms in data warehousing.
Metadata
Metadata is simply defined as data about data: data that is used to describe other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, metadata is the summarized data that leads us to the detailed data.
In terms of a data warehouse, we can define metadata as follows −
- Metadata is a road map to the data warehouse.
- Metadata in a data warehouse defines the warehouse objects.
- Metadata acts as a directory. This directory helps the decision support system locate the contents of a data warehouse.
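The "directory" role in the last point can be pictured with a small lookup structure. The following is only a minimal sketch: the object names, storage locations, and the `locate` helper are hypothetical and do not come from any particular warehouse product.

```python
# Hypothetical metadata directory: every name and location here is illustrative.
METADATA_DIRECTORY = {
    "sales": {
        "kind": "fact table",
        "location": "warehouse.sales",
        "description": "One row per item sold",
    },
    "item": {
        "kind": "dimension table",
        "location": "warehouse.item",
        "description": "Descriptive attributes of each item",
    },
}


def locate(object_name):
    """Return where a warehouse object is stored, using only the metadata."""
    entry = METADATA_DIRECTORY.get(object_name)
    if entry is None:
        raise KeyError(f"no metadata recorded for {object_name!r}")
    return entry["location"]


print(locate("sales"))  # warehouse.sales
```

A decision support tool would consult such a directory first and only then read the detailed data it points to.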
Metadata Repository
The metadata repository is an integral part of a data warehouse system. It contains the following metadata −
- Business metadata − It contains data ownership information, business definitions, and changing policies.
- Operational metadata − It includes the currency of data and data lineage. Currency of data refers to whether the data is active, archived, or purged. Lineage of data means the history of the data's migration and the transformations applied to it.
- Data for mapping from the operational environment to the data warehouse − This metadata includes source databases and their contents, data extraction, data partitioning, cleaning, transformation rules, and data refresh and purging rules.
- The algorithms for summarization − It includes dimension algorithms, data on granularity, aggregation, summarizing, etc.
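One way to picture the four categories above is as a single repository entry per warehouse object. This is a rough sketch under stated assumptions: the class names, field names, and sample values are invented for illustration and are not a standard schema.

```python
from dataclasses import dataclass, field

VALID_CURRENCY = ("active", "archived", "purged")


@dataclass
class BusinessMetadata:
    owner: str
    business_definition: str
    change_policy: str


@dataclass
class OperationalMetadata:
    currency: str  # one of VALID_CURRENCY
    lineage: list = field(default_factory=list)  # migration and transformation history

    def __post_init__(self):
        if self.currency not in VALID_CURRENCY:
            raise ValueError(f"currency must be one of {VALID_CURRENCY}")


@dataclass
class MappingMetadata:
    source_database: str
    transformation_rules: list
    refresh_rule: str


@dataclass
class SummarizationMetadata:
    granularity: str
    aggregation: str


@dataclass
class RepositoryEntry:
    business: BusinessMetadata
    operational: OperationalMetadata
    mapping: MappingMetadata
    summarization: SummarizationMetadata


# Hypothetical entry describing a sales table.
entry = RepositoryEntry(
    business=BusinessMetadata("sales department", "units sold per item", "reviewed yearly"),
    operational=OperationalMetadata("active", ["extracted from orders_db", "duplicates removed"]),
    mapping=MappingMetadata("orders_db", ["drop test orders"], "nightly"),
    summarization=SummarizationMetadata("daily", "sum"),
)
print(entry.operational.currency)  # active
```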
Data Cube
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and facts. The dimensions are the entities with respect to which an enterprise keeps its records.
Illustration of Data Cube
Suppose a company wants to keep track of sales records with the help of a sales data warehouse with respect to time, item, branch, and location. These dimensions make it possible to track monthly sales and the branch at which each item was sold. There is a table associated with each dimension, known as a dimension table. For example, the "item" dimension table may have attributes such as item_name, item_type, and item_brand.
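The relationship between a dimension table and the sales records can be sketched as follows. The attribute names item_name, item_type, and item_brand come from the text; the `item_key` column, the `units_sold` measure, and all sample values are assumptions made up for this illustration.

```python
# Hypothetical "item" dimension table, keyed by an assumed surrogate item_key.
item_dimension = {
    1: {"item_name": "Laptop A", "item_type": "computer", "item_brand": "BrandX"},
    2: {"item_name": "Phone B", "item_type": "phone", "item_brand": "BrandY"},
}

# Hypothetical fact rows: one key per dimension plus one measure (units_sold).
sales_facts = [
    {"time": "Q1", "item_key": 1, "branch": "B1", "location": "New Delhi", "units_sold": 10},
    {"time": "Q1", "item_key": 2, "branch": "B1", "location": "New Delhi", "units_sold": 4},
    {"time": "Q2", "item_key": 1, "branch": "B2", "location": "New Delhi", "units_sold": 7},
]


def units_by_item_type(facts, dimension):
    """Total units sold per item_type, looked up through the dimension table."""
    totals = {}
    for row in facts:
        item_type = dimension[row["item_key"]]["item_type"]
        totals[item_type] = totals.get(item_type, 0) + row["units_sold"]
    return totals


print(units_by_item_type(sales_facts, item_dimension))
# {'computer': 17, 'phone': 4}
```

The fact rows stay compact because descriptive attributes live only in the dimension table and are joined in when needed.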
The following table represents the 2-D view of sales data for a company with respect to the time and item dimensions, for a single location.
In this 2-D table, we have records with respect to time and item only. The sales for New Delhi are shown with respect to the time and item dimensions, according to the type of items sold. If we want to view the sales data with one more dimension, say, the location dimension, then a 3-D view is useful. The 3-D view of the sales data with respect to time, item, and location is shown in the table below −
The above 3-D table can be represented as a 3-D data cube, as shown in the following figure −
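Separately from the figure, the difference between the 2-D and 3-D views described in this section can be sketched in code. The sales figures, the quarter labels, and the second city are hypothetical values chosen for illustration; they are not the data from the original tables.

```python
from collections import defaultdict

# Hypothetical sales facts: (time, item_type, location, units_sold).
facts = [
    ("Q1", "phone", "New Delhi", 5),
    ("Q1", "computer", "New Delhi", 3),
    ("Q2", "phone", "New Delhi", 6),
    ("Q1", "phone", "Mumbai", 2),
    ("Q2", "computer", "Mumbai", 4),
]


def view_2d(facts, location):
    """Totals by (time, item) for one fixed location: the 2-D view."""
    table = defaultdict(int)
    for time, item, loc, units in facts:
        if loc == location:
            table[(time, item)] += units
    return dict(table)


def view_3d(facts):
    """Totals by (time, item, location): one cell per cube coordinate."""
    cube = defaultdict(int)
    for time, item, loc, units in facts:
        cube[(time, item, loc)] += units
    return dict(cube)


print(view_2d(facts, "New Delhi"))
# {('Q1', 'phone'): 5, ('Q1', 'computer'): 3, ('Q2', 'phone'): 6}
print(view_3d(facts)[("Q2", "computer", "Mumbai")])  # 4
```

Fixing one location in the 3-D cube yields exactly the 2-D table for that location, which is why the 2-D view can be read as a slice of the cube.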
Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of people in an organization. In other words, a data mart contains only the data that is specific to a particular group. For example, the marketing data mart may contain only data related to items, customers, and sales. Data marts are confined to subjects.
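The idea of a subject-confined subset can be sketched as a simple selection over warehouse tables. This is a minimal illustration only: the table names beyond items, customers, and sales, the sample rows, and the `build_data_mart` helper are all hypothetical.

```python
# Hypothetical organization-wide warehouse: table name -> rows.
warehouse = {
    "items": [{"item_name": "Laptop A"}],
    "customers": [{"customer_name": "Asha"}],
    "sales": [{"item_name": "Laptop A", "units_sold": 10}],
    "payroll": [{"employee": "E-1", "salary_band": "B"}],
    "inventory": [{"item_name": "Laptop A", "in_stock": 40}],
}

# Subjects assumed relevant to marketing, following the example in the text.
MARKETING_SUBJECTS = ("items", "customers", "sales")


def build_data_mart(warehouse, subjects):
    """Copy only the tables for the given subjects into a separate mart."""
    missing = [name for name in subjects if name not in warehouse]
    if missing:
        raise KeyError(f"subjects not found in warehouse: {missing}")
    return {name: list(warehouse[name]) for name in subjects}


marketing_mart = build_data_mart(warehouse, MARKETING_SUBJECTS)
print(sorted(marketing_mart))  # ['customers', 'items', 'sales']
```

The mart copies the selected tables rather than sharing them, so a department can reshape its own subset without affecting the warehouse.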
Points to Remember About Data Marts
- Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.
- The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
- The life cycle of data marts may become complex in the long run if their planning and design are not organization-wide.
- Data marts are small in size.
- Data marts are customized by department.
- The source of a data mart is a departmentally structured data warehouse.
- Data marts are flexible.
The following figure shows a graphical representation of data marts.