DWH Concise Tutorial

Data Warehousing - Metadata Concepts

What is Metadata?

Metadata is simply defined as data about data. Data that is used to represent other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, metadata is summarized data that leads us to the detailed data. In the context of a data warehouse, metadata can be defined as follows.

  1. Metadata is the road-map to a data warehouse.

  2. Metadata in a data warehouse defines the warehouse objects.

  3. Metadata acts as a directory. This directory helps the decision support system to locate the contents of a data warehouse.

Note − In a data warehouse, we create metadata for the data names and definitions of the given warehouse. Along with this metadata, additional metadata is created to record the timestamp and the source of any extracted data.
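As a minimal sketch of the note above, the two kinds of metadata could be modelled as small records. The class names, field names, and the `orders_db` source below are illustrative assumptions, not part of any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ColumnMetadata:
    """Name and definition of one warehouse data element (hypothetical structure)."""
    name: str
    definition: str


@dataclass(frozen=True)
class ExtractMetadata:
    """Additional metadata kept for each extract: its source and its timestamp."""
    source: str
    extracted_at: datetime


def describe_extract(source: str) -> ExtractMetadata:
    # Stamp the extract with its source system and the current UTC time.
    return ExtractMetadata(source=source, extracted_at=datetime.now(timezone.utc))


customer_id = ColumnMetadata("customer_id", "Unique identifier of a customer")
extract = describe_extract("orders_db")  # "orders_db" is a made-up source name

print(customer_id.name, "-", customer_id.definition)
print(extract.source, extract.extracted_at.isoformat())
```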

Categories of Metadata

Metadata can be broadly divided into three categories −

  1. Business Metadata − It contains data ownership information, business definitions, and change policies.

  2. Technical Metadata − It includes database system names, table and column names and sizes, data types and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices.

  3. Operational Metadata − It includes currency of data and data lineage. Currency of data indicates whether the data is active, archived, or purged. Lineage of data is the history of how the data was migrated and which transformations were applied to it.

[Figure: Metadata categories]
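The three categories can be sketched as one record per warehouse table. This is only an illustration under assumed names: the classes, fields, and sample values (`sales_dw`, `fact_sales`, and so on) are hypothetical and simply mirror the items listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Currency(Enum):
    """Currency of data: whether it is active, archived, or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class BusinessMetadata:
    owner: str
    business_definition: str
    change_policy: str


@dataclass
class TechnicalMetadata:
    database: str
    table: str
    columns: dict[str, str]      # column name -> data type
    primary_key: list[str]


@dataclass
class OperationalMetadata:
    currency: Currency
    lineage: list[str] = field(default_factory=list)  # ordered migration/transformation steps


@dataclass
class TableMetadata:
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


# All values below are made up for illustration.
sales = TableMetadata(
    business=BusinessMetadata("Finance", "One row per sold item", "Changes need owner approval"),
    technical=TechnicalMetadata(
        "sales_dw", "fact_sales",
        {"sale_id": "INTEGER", "amount": "DECIMAL(10,2)"},
        ["sale_id"],
    ),
    operational=OperationalMetadata(
        Currency.ACTIVE,
        ["extracted from orders_db", "currency converted to USD"],
    ),
)

print(sales.technical.table, sales.operational.currency.value)
```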

Role of Metadata

Metadata plays a very important role in a data warehouse. Its role differs from that of the warehouse data itself, yet it is equally important. The various roles of metadata are explained below.

  1. Metadata acts as a directory.

  2. This directory helps the decision support system to locate the contents of the data warehouse.

  3. Metadata helps the decision support system map data when it is transformed from the operational environment to the data warehouse environment.

  4. Metadata helps in summarization between current detailed data and highly summarized data.

  5. Metadata also helps in summarization between lightly summarized data and highly summarized data.

  6. Metadata is used for query tools.

  7. Metadata is used in extraction and cleansing tools.

  8. Metadata is used in reporting tools.

  9. Metadata is used in transformation tools.

  10. Metadata plays an important role in loading functions.

The following diagram shows the roles of metadata.

[Figure: Roles of metadata]
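The directory role can be illustrated with a small lookup table that a query or reporting tool might consult before touching the warehouse. This is a sketch under assumptions: the catalog contents, table names, and the `locate` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Where a business term lives in the warehouse (hypothetical layout)."""
    table: str
    column: str
    level: str  # level of summarization of the stored data


# A tiny metadata directory: business term -> physical location.
CATALOG = {
    "daily sales": CatalogEntry("fact_sales", "amount", "current detail"),
    "monthly sales": CatalogEntry("agg_sales_month", "total_amount", "lightly summarized"),
    "yearly sales": CatalogEntry("agg_sales_year", "total_amount", "highly summarized"),
}


def locate(term: str) -> CatalogEntry:
    """Look up a business term, ignoring case and surrounding whitespace."""
    key = term.strip().lower()
    if key not in CATALOG:
        raise KeyError(f"no metadata entry for term: {term!r}")
    return CATALOG[key]


entry = locate("Monthly Sales")
query = f"SELECT {entry.column} FROM {entry.table}"
print(query)
```

The point of the design is that the tool never hard-codes table names; it asks the directory, so moving or re-aggregating data only requires updating the metadata.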

Metadata Repository

The metadata repository is an integral part of a data warehouse system. It contains the following metadata −

  1. Definition of data warehouse − It describes the structure of the data warehouse. The description is defined by schemas, views, hierarchies, derived data definitions, and data mart locations and contents.

  2. Business Metadata − It contains data ownership information, business definitions, and change policies.

  3. Operational Metadata − It includes currency of data and data lineage. Currency of data indicates whether the data is active, archived, or purged. Lineage of data is the history of how the data was migrated and which transformations were applied to it.

  4. Data for mapping from operational environment to data warehouse − It includes the source databases and their contents, data extraction, data partitioning, data cleaning, transformation rules, and data refresh and purging rules.

  5. Algorithms for summarization − They include dimension algorithms and data on granularity, aggregation, summarization, etc.
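Items 4 and 5 can be sketched together: mapping metadata drives the transformation of operational rows into warehouse rows, and a summarization step aggregates them at a chosen granularity. The field names, transformation rules, and sample rows are assumptions made up for this example.

```python
from collections import defaultdict

# Mapping metadata: warehouse column -> (operational field, transformation rule).
MAPPING = {
    "customer_name": ("cust_nm", str.strip),
    "amount_usd": ("amt_cents", lambda cents: cents / 100),
    "order_month": ("order_date", lambda d: d[:7]),  # "YYYY-MM-DD" -> "YYYY-MM"
}


def transform(source_row: dict) -> dict:
    """Build one warehouse row by applying each mapping rule to its source field."""
    return {
        target: rule(source_row[source_field])
        for target, (source_field, rule) in MAPPING.items()
    }


def summarize(rows: list, group_by: str, measure: str) -> dict:
    """Sum a measure at the granularity given by the group_by column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


operational_rows = [
    {"cust_nm": " Ada ", "amt_cents": 1250, "order_date": "2024-01-15"},
    {"cust_nm": "Bob", "amt_cents": 750, "order_date": "2024-01-20"},
    {"cust_nm": "Ada", "amt_cents": 500, "order_date": "2024-02-02"},
]

warehouse_rows = [transform(row) for row in operational_rows]
monthly = summarize(warehouse_rows, group_by="order_month", measure="amount_usd")
print(monthly)
```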

Challenges for Metadata Management

The importance of metadata cannot be overstated. Metadata helps drive the accuracy of reports, validates data transformations, and ensures the accuracy of calculations. It also enforces consistent definitions of business terms for business end users. Despite all these uses, metadata management has its challenges. Some of them are discussed below.

  1. In a large organization, metadata is scattered across the organization. It is spread across spreadsheets, databases, and applications.

  2. Metadata could be present in text files or multimedia files. To use this data for information management solutions, it has to be correctly defined.

  3. There are no industry-wide accepted standards. Data management solution vendors have a narrow focus.

  4. There are no easy, widely accepted methods of passing metadata.
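The first challenge, scattered metadata, can be made concrete with a small sketch that gathers term definitions from several places and flags the terms whose definitions disagree. The source names and definitions below are invented for illustration; a real consolidation would read from actual spreadsheets, databases, and applications.

```python
from collections import defaultdict

# Term definitions as they might be found in different places (all made up).
SOURCES = {
    "finance_spreadsheet": {"revenue": "Invoiced amount net of tax"},
    "crm_database": {
        "revenue": "Booked deal value",
        "customer": "Account with a signed contract",
    },
    "billing_app": {"customer": "Account with a signed contract"},
}


def consolidate(sources: dict) -> dict:
    """Collect, for every term, the definition given by each source."""
    merged = defaultdict(dict)
    for source, terms in sources.items():
        for term, definition in terms.items():
            merged[term][source] = definition
    return dict(merged)


def conflicting_terms(merged: dict) -> list:
    """Return the terms that have more than one distinct definition."""
    return sorted(
        term for term, definitions in merged.items()
        if len(set(definitions.values())) > 1
    )


merged = consolidate(SOURCES)
conflicts = conflicting_terms(merged)
print(conflicts)
```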