Dwh 简明教程
Data Warehousing - Process Managers
进程管理器负责维护进出数据仓库的数据流。有三种不同类型的进程管理器:
Process managers are responsible for maintaining the flow of data both into and out of the data warehouse. There are three different types of process managers −
-
Load manager
-
Warehouse manager
-
Query manager
Data Warehouse Load Manager
加载管理器执行从数据库中提取和加载数据的所需操作。加载管理器的规模和复杂性因从一个数据仓库到另一个数据仓库的特定解决方案而异。
Load manager performs the operations required to extract and load the data into the database. The size and complexity of a load manager varies between specific solutions from one data warehouse to another.
Load Manager Architecture
负载管理器执行以下函数 -
The load manager does performs the following functions −
-
Extract data from the source system.
-
Fast load the extracted data into temporary data store.
-
Perform simple transformations into structure similar to the one in the data warehouse.
Extract Data from Source
数据是从操作数据库或外部信息提供程序中提取的。网关是指用于提取数据应用程序程序。它受底层 DBMS 支持,并允许客户端程序生成要在服务器上执行的 SQL。开放数据库连接 (ODBC) 和 Java 数据库连接 (JDBC) 都是网关的示例。
The data is extracted from the operational databases or the external information providers. Gateways are the application programs that are used to extract data. It is supported by underlying DBMS and allows the client program to generate SQL to be executed at a server. Open Database Connection (ODBC) and Java Database Connection (JDBC) are examples of gateway.
Fast Load
-
In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
-
Transformations affect the speed of data processing.
-
It is more effective to load the data into a relational database prior to applying transformations and checks.
-
Gateway technology is not suitable, since they are inefficient when large data volumes are involved.
Simple Transformations
在加载时,可能需要执行简单转换。在完成简单转换后,我们可以做复杂检查。假设我们正在加载 EPOS 销售交易,我们需要执行以下检查 -
While loading, it may be required to perform simple transformations. After completing simple transformations, we can do complex checks. Suppose we are loading the EPOS sales transaction, we need to perform the following checks −
-
Strip out all the columns that are not required within the warehouse.
-
Convert all the values to required data types.
Warehouse Manager
仓库管理员负责仓库管理流程,它由第三方系统软件、C 程序与 Shell 脚本组成。仓库管理员的大小和复杂性因具体解决方案而异。
The warehouse manager is responsible for the warehouse management process. It consists of a third-party system software, C programs, and shell scripts. The size and complexity of a warehouse manager varies between specific solutions.
Warehouse Manager Architecture
仓库管理员包括以下内容 -
A warehouse manager includes the following −
-
The controlling process
-
Stored procedures or C with SQL
-
Backup/Recovery tool
-
SQL scripts
Functions of Warehouse Manager
仓库管理员执行以下函数 -
A warehouse manager performs the following functions −
-
Analyzes the data to perform consistency and referential integrity checks.
-
Creates indexes, business views, partition views against the base data.
-
Generates new aggregations and updates the existing aggregations.
-
Generates normalizations.
-
Transforms and merges the source data of the temporary store into the published data warehouse.
-
Backs up the data in the data warehouse.
-
Archives the data that has reached the end of its captured life.
Note - 仓库管理员分析查询配置文件以确定索引和聚合是否合适。
Note − A warehouse Manager analyzes query profiles to determine whether the index and aggregations are appropriate.
Query Manager
查询管理器负责将查询指向合适的表。通过将查询指向合适的表,它可以加快查询请求和响应的过程。此外,查询管理器还负责调度用户发布的查询的执行。
The query manager is responsible for directing the queries to suitable tables. By directing the queries to appropriate tables, it speeds up the query request and response process. In addition, the query manager is responsible for scheduling the execution of the queries posted by the user.