Data Warehousing - Security
The objective of a data warehouse is to make large amounts of data easily accessible to users, allowing them to extract information about the business as a whole. However, security restrictions may be applied to the data, and these can become an obstacle to accessing the information. If an analyst has only a restricted view of the data, it is impossible to capture a complete picture of the trends within the business.
The data from each analyst can be summarized and passed on to management, where the different summaries can be aggregated. Because an aggregation of summaries is not necessarily the same as an aggregation over the data as a whole, some trends in the data may be missed unless someone analyzes the data as a whole.
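A small numeric illustration of why aggregating summaries can differ from aggregating the whole data set. The order values are hypothetical, and the mean is used only as one example of an aggregate that does not compose: an average of per-analyst averages ignores how many rows each analyst saw.

```python
from statistics import mean

# Hypothetical order values; each analyst sees only part of the data.
analyst_a_orders = [100, 100, 100, 100]  # four small orders
analyst_b_orders = [400]                 # one large order

# Each analyst passes a summary (their average order value) to management.
summaries = [mean(analyst_a_orders), mean(analyst_b_orders)]

# Management aggregates the summaries: (100 + 400) / 2
average_of_summaries = mean(summaries)

# Analysing the data as a whole: (4 * 100 + 400) / 5
overall_average = mean(analyst_a_orders + analyst_b_orders)

print(average_of_summaries, overall_average)  # 250 160
```

The two figures disagree because the summaries hide the fact that analyst A saw four times as many orders as analyst B.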
Security Requirements
Adding security features affects the performance of the data warehouse, so it is important to determine the security requirements as early as possible. Security features are difficult to add after the data warehouse has gone live.
During the design phase of the data warehouse, we should keep in mind which data sources may be added later and what the impact of adding them would be. We should consider the following possibilities during the design phase −
- Will the new data sources require new security and/or audit restrictions to be implemented?
- Will new users be added who must have restricted access to data that is already generally available?
Such situations arise when future users and data sources are not well known. In that case, we need to use our knowledge of the business and the objectives of the data warehouse to anticipate the likely requirements.
The following activities are affected by security measures −
- User access
- Data load
- Data movement
- Query generation
User Access
We first need to classify the data, and then classify the users on the basis of the data they can access. In other words, users are grouped according to the data they are allowed to see.
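A minimal sketch of classifying the data first and then deriving user classes from the data each user can access. The table names, data classes, and grants are hypothetical; in practice this information would come from the warehouse's security metadata.

```python
from collections import defaultdict

# Step 1: classify the data (hypothetical table -> data class).
DATA_CLASS = {
    "sales_fact": "sales",
    "campaign_fact": "marketing",
    "account_txn": "restricted_finance",
}

# Hypothetical grants: the data classes each user may access.
USER_GRANTS = {
    "alice": {"sales"},
    "bob": {"sales", "marketing"},
    "carol": {"sales"},
    "dave": {"restricted_finance"},
}

def tables_for(user):
    """Return the tables a user may query, based on the data classification."""
    grants = USER_GRANTS.get(user, set())
    return sorted(table for table, cls in DATA_CLASS.items() if cls in grants)

def classify_users(user_grants):
    """Step 2: users with exactly the same accessible data form one user class."""
    classes = defaultdict(list)
    for user, grants in sorted(user_grants.items()):
        classes[frozenset(grants)].append(user)
    return dict(classes)

print(tables_for("bob"))                                  # ['campaign_fact', 'sales_fact']
print(classify_users(USER_GRANTS)[frozenset({"sales"})])  # ['alice', 'carol']
```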
Data Classification
The following two approaches can be used to classify the data −
- Data can be classified according to its sensitivity. Highly sensitive data is classified as highly restricted, and less sensitive data is classified as less restricted.
- Data can also be classified according to job function. This restriction allows only specific users to view particular data. Here, users are restricted to viewing only the part of the data they are interested in and responsible for.
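The two data classification approaches can be sketched as two access checks: a clearance comparison on an ordered sensitivity scale, and an overlap test between the user's job functions and the functions the data belongs to. The level names and function names are hypothetical.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Hypothetical sensitivity scale: higher values are more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2
    HIGHLY_RESTRICTED = 3

def allowed_by_sensitivity(clearance, level):
    """Approach 1: a user may see data at or below their clearance level."""
    return clearance >= level

def allowed_by_function(user_functions, data_functions):
    """Approach 2: a user may see data only if it belongs to one of their job functions."""
    return bool(set(user_functions) & set(data_functions))

print(allowed_by_sensitivity(Sensitivity.RESTRICTED, Sensitivity.INTERNAL))  # True
print(allowed_by_function({"sales"}, {"payroll"}))                           # False
```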
The second approach raises some issues. To understand them, consider an example. Suppose you are building a data warehouse for a bank, and the data stored in it is the transaction data for all the accounts. The question here is: who is allowed to see the transaction data? The solution lies in classifying the data according to job function.
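One way to picture classifying the bank's transaction data by function is a row filter keyed on the user's job function. The functions and rules below (a branch manager sees only their own branch, a fraud investigator sees every account) are hypothetical examples, not rules taken from any real bank.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    account_id: str
    branch: str
    amount: float

def visible_transactions(function, branch, transactions):
    """Return the transactions a user with the given job function may see.

    The rules are hypothetical, for illustration only.
    """
    if function == "fraud_investigation":
        # Assumed to need a bank-wide view of all accounts.
        return list(transactions)
    if function == "branch_manager":
        # Sees only transactions for accounts held at their own branch.
        return [t for t in transactions if t.branch == branch]
    # Any other function is denied access to transaction rows.
    return []

TXNS = [
    Transaction("A-1", "london", 250.0),
    Transaction("A-2", "paris", -40.0),
    Transaction("A-3", "london", 1200.0),
]

print(len(visible_transactions("branch_manager", "london", TXNS)))       # 2
print(len(visible_transactions("fraud_investigation", "london", TXNS)))  # 3
print(len(visible_transactions("marketing", "london", TXNS)))            # 0
```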
User Classification
The following approaches can be used to classify the users −
- Users can be classified according to the hierarchy of users in an organization, i.e., by departments, sections, groups, and so on.
- Users can also be classified according to their role, so that people with the same role are grouped together across departments.
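The two user classification schemes can be contrasted by grouping the same users by two different keys. The users, departments, sections, and roles below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    department: str
    section: str
    role: str

# Hypothetical users.
USERS = [
    User("alice", "sales", "north", "analyst"),
    User("bob", "sales", "south", "manager"),
    User("carol", "marketing", "campaigns", "analyst"),
]

def group_users(users, key):
    """Group user names by the classification returned by `key`."""
    groups = {}
    for user in users:
        groups.setdefault(key(user), []).append(user.name)
    return groups

# Hierarchical classification: department, then section.
by_hierarchy = group_users(USERS, lambda u: (u.department, u.section))
# Role-based classification: people with the same role, across departments.
by_role = group_users(USERS, lambda u: u.role)

print(by_role["analyst"])  # ['alice', 'carol']
```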
Classification on the Basis of Department
Consider a data warehouse whose users come from the sales and marketing departments. We can implement security through a top-down company view, with access centered on the different departments. However, there could be some restrictions on users at different levels. This structure is shown in the following diagram.
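A top-down company view can be modelled as a tree of organizational units: a user scoped to a unit may see data for that unit and everything beneath it, while users lower down see less. This is a minimal sketch assuming scopes are written as hypothetical slash-separated paths such as `company/sales/north`.

```python
def can_access(user_scope: str, data_scope: str) -> bool:
    """Return True if data_scope is at or below user_scope in the company hierarchy."""
    user_parts = user_scope.strip("/").split("/")
    data_parts = data_scope.strip("/").split("/")
    # The user's path must be a prefix of the data's path, compared component by component.
    return data_parts[: len(user_parts)] == user_parts

print(can_access("company/sales", "company/sales/north"))  # True
print(can_access("company/sales/north", "company/sales"))  # False
```

Comparing path components rather than raw string prefixes avoids treating `company/salesforce` as being inside `company/sales`.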
However, if each department accesses different data, we should design the security access for each department separately. This can be achieved with departmental data marts. Because these data marts are separate from the data warehouse, we can enforce separate security restrictions on each one. This approach is shown in the following figure.
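A minimal sketch of departmental data marts using Python's built-in `sqlite3` module: each department's rows are copied into its own in-memory database, and each mart has its own access list. SQLite has no user accounts, so the hypothetical `open_mart` check stands in for the grants a real DBMS would enforce; table, department, and user names are also hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE transactions (department TEXT, region TEXT, amount REAL)"

def build_mart(warehouse, department):
    """Copy one department's rows from the warehouse into its own mart database."""
    mart = sqlite3.connect(":memory:")
    mart.execute(SCHEMA)
    rows = warehouse.execute(
        "SELECT department, region, amount FROM transactions WHERE department = ?",
        (department,),
    ).fetchall()
    mart.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    mart.commit()
    return mart

# Hypothetical access list per mart, enforced separately from the warehouse.
MART_USERS = {"sales": {"alice"}, "marketing": {"carol"}}

def open_mart(user, department, marts):
    """Hand out a mart connection only to users on that mart's access list."""
    if user not in MART_USERS.get(department, set()):
        raise PermissionError(f"{user} may not open the {department} mart")
    return marts[department]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(SCHEMA)
warehouse.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("sales", "north", 100.0), ("sales", "south", 50.0), ("marketing", "north", 30.0)],
)
marts = {dept: build_mart(warehouse, dept) for dept in MART_USERS}

sales_mart = open_mart("alice", "sales", marts)
print(sales_mart.execute("SELECT COUNT(*) FROM transactions").fetchone()[0])  # 2
```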
Classification Based on Role
If the data is generally available to all departments, it is useful to follow a role access hierarchy. In other words, if the data is accessed by all departments, apply security restrictions according to each user's role. The role access hierarchy is shown in the following figure.
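A role access hierarchy can be sketched as roles that inherit the permissions of more junior roles, so permissions are granted once per role and apply to every user holding that role, whatever their department. Role and permission names are hypothetical, and the recursion assumes the hierarchy has no cycles.

```python
# Hypothetical role hierarchy: each role inherits the permissions of the roles it lists.
INHERITS_FROM = {
    "analyst": [],
    "senior_analyst": ["analyst"],
    "manager": ["senior_analyst"],
    "auditor": ["analyst"],
}

# Permissions granted directly to each role (hypothetical names).
DIRECT_PERMISSIONS = {
    "analyst": {"read_summaries"},
    "senior_analyst": {"read_detail"},
    "manager": {"read_salaries"},
    "auditor": {"read_audit_log"},
}

def effective_permissions(role):
    """Collect a role's own permissions plus everything it inherits (assumes no cycles)."""
    perms = set(DIRECT_PERMISSIONS.get(role, set()))
    for parent in INHERITS_FROM.get(role, []):
        perms |= effective_permissions(parent)
    return perms

print(sorted(effective_permissions("manager")))
# ['read_detail', 'read_salaries', 'read_summaries']
```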
Audit Requirements
Auditing is a subset of security and a costly activity. It can place heavy overheads on the system, and completing audits in time requires more hardware. It is therefore recommended that auditing be switched off wherever possible. Audit requirements can be categorized as follows −
- Connections
- Disconnections
- Data access
- Data change
Note − For each of the above categories, it is necessary to audit successes, failures, or both. From a security perspective, auditing failures is very important, because failures can highlight unauthorized or fraudulent access.
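The audit categories and the success/failure choice in the note can be expressed as a small per-category policy, which also makes it easy to switch auditing off where it is not needed. This is a hypothetical in-memory sketch; a production system would more likely write to a protected audit store.

```python
from enum import Enum

class Category(Enum):
    CONNECTION = "connection"
    DISCONNECTION = "disconnection"
    DATA_ACCESS = "data_access"
    DATA_CHANGE = "data_change"

# Hypothetical audit policy: the outcomes recorded for each category.
# An empty set means auditing is switched off for that category.
AUDIT_POLICY = {
    Category.CONNECTION: {"failure"},
    Category.DISCONNECTION: set(),
    Category.DATA_ACCESS: {"success", "failure"},
    Category.DATA_CHANGE: {"success", "failure"},
}

audit_trail = []

def audit(category, user, succeeded, detail=""):
    """Record the event if the policy audits this outcome; return whether it was recorded."""
    outcome = "success" if succeeded else "failure"
    if outcome not in AUDIT_POLICY.get(category, set()):
        return False
    audit_trail.append((category.value, user, outcome, detail))
    return True

audit(Category.CONNECTION, "alice", succeeded=True)     # not recorded
audit(Category.CONNECTION, "mallory", succeeded=False)  # recorded: failed connection
audit(Category.DATA_ACCESS, "bob", succeeded=True, detail="sales_v")
print(len(audit_trail))  # 2
```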
Network Requirements
Network security is as important as any other kind of security, and its requirements cannot be ignored. We need to consider the following issues −
- Is it necessary to encrypt data before transferring it to the data warehouse?
- Are there restrictions on which network routes the data can take?
These restrictions need to be considered carefully. The following points should be kept in mind −
- Encryption and decryption increase overheads: they require more processing power and processing time.
- The cost of encryption can be high if the source system is already heavily loaded, because the encryption is borne by the source system.
Data Movement
Moving data has potential security implications. Suppose we need to transfer some restricted data as a flat file to be loaded. When the data is loaded into the data warehouse, the following questions arise −
- Where is the flat file stored?
- Who has access to that disk space?
If we consider backups of these flat files, the following questions arise −
- Do you back up the encrypted or the decrypted versions?
- Do these backups need to be made to special tapes that are stored separately?
- Who has access to these tapes?
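Two of the questions above, where the flat file is stored and who has access to that disk space, can be partly addressed on a POSIX system by writing the file into a private directory with owner-only permissions. This is a minimal sketch using only the Python standard library; the file name and rows are hypothetical, and it does not address encryption or backups.

```python
import csv
import os
import tempfile

def write_restricted_flat_file(directory, name, rows):
    """Write rows as CSV to a new file readable and writable only by its owner."""
    path = os.path.join(directory, name)
    # O_EXCL refuses to overwrite an existing file; mode 0o600 gives owner read/write only
    # on POSIX systems. The process umask can only remove permission bits, never add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

# mkdtemp creates a directory accessible only by the creating user.
staging_dir = tempfile.mkdtemp(prefix="dw_staging_")
path = write_restricted_flat_file(staging_dir, "restricted_extract.csv", [["A-1", "250.00"]])
print(oct(os.stat(path).st_mode & 0o777))  # 0o600 on POSIX systems
```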
Other forms of data movement, such as query result sets, also need to be considered. When temporary tables are created, the following questions arise −
- Where is the temporary table to be held?
- How is the visibility of such a table controlled?
We should avoid accidentally flouting security restrictions. If a user with access to restricted data can create temporary tables that others can access, the data can become visible to unauthorized users. We can overcome this problem by providing a separate temporary area for users with access to restricted data.
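A minimal sketch of the separate temporary area, using two attached in-memory SQLite databases as stand-ins for a general and a restricted temporary area. SQLite itself has no user accounts, so the visibility rule lives in the hypothetical `can_read_area` function; in a real warehouse the DBMS's own permissions would enforce it. User, table, and area names are hypothetical.

```python
import sqlite3

# Hypothetical set of users cleared to see restricted data.
RESTRICTED_USERS = {"dave"}

conn = sqlite3.connect(":memory:")
# Two separate temporary areas, modelled as attached in-memory databases.
conn.execute("ATTACH DATABASE ':memory:' AS tmp_general")
conn.execute("ATTACH DATABASE ':memory:' AS tmp_restricted")

def temp_area_for(user):
    """Users with restricted access always write temporary tables to the restricted area."""
    return "tmp_restricted" if user in RESTRICTED_USERS else "tmp_general"

def can_read_area(reader, area):
    """Only restricted-access users may read the restricted temporary area."""
    return area == "tmp_general" or reader in RESTRICTED_USERS

def save_result(user, table, rows):
    area = temp_area_for(user)
    # Table names are trusted identifiers in this sketch; never build SQL from user input.
    conn.execute(f"CREATE TABLE {area}.{table} (account_id TEXT, balance REAL)")
    conn.executemany(f"INSERT INTO {area}.{table} VALUES (?, ?)", rows)
    return area

area = save_result("dave", "big_balances", [("A-3", 1200.0)])
print(area, can_read_area("alice", area))  # tmp_restricted False
```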
Documentation
The audit and security requirements need to be properly documented, and this documentation will be treated as part of the justification. The document can contain all the information gathered from −
- Data classification
- User classification
- Network requirements
- Data movement and storage requirements
- All auditable actions
Impact of Security on Design
Security affects the application code and the development timescales. It affects the following areas −
- Application development
- Database design
- Testing
Application Development
Security affects overall application development, and it also affects the design of important data warehouse components such as the load manager, warehouse manager, and query manager. The load manager may require checking code to filter records and place them in different locations. More transformation rules may also be required to hide certain data, and extra metadata may be needed to handle any extra objects.
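The load manager changes described above can be sketched as a routing step plus an extra transformation rule. The routing flag, the location names, and the masking rule (keep only the last four characters of an account number) are hypothetical choices for illustration, not part of any specific load manager.

```python
def route_record(record):
    """Hypothetical filter: records flagged as restricted go to a separate location."""
    return "restricted_area" if record.get("restricted") else "general_area"

def mask_account(account_id):
    """Extra transformation rule: hide all but the last four characters."""
    return "*" * max(len(account_id) - 4, 0) + account_id[-4:]

def load(records):
    locations = {"general_area": [], "restricted_area": []}
    for record in records:
        target = route_record(record)
        if target == "general_area":
            # Data landing in the general area has the account number masked.
            record = {**record, "account_id": mask_account(record["account_id"])}
        locations[target].append(record)
    return locations

result = load([
    {"account_id": "12345678", "amount": 10.0},
    {"account_id": "87654321", "amount": 99.0, "restricted": True},
])
print(result["general_area"][0]["account_id"])  # ****5678
```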
To create and maintain extra views, the warehouse manager may require extra code to enforce security. Extra checks may have to be coded into the data warehouse to prevent it from being fooled into moving data to a location where it should not be available. The query manager requires changes to handle any access restrictions, and it will need to be aware of all extra views and aggregations.
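A minimal sketch of the query manager's extra awareness: it keeps metadata mapping each requested base table and role to the view or aggregation that role must use, and refuses anything unmapped. All table, view, and role names are hypothetical.

```python
# Hypothetical metadata kept by the query manager: the object each role must query
# in place of a base table.
SECURE_OBJECTS = {
    ("account_txn", "branch_manager"): "account_txn_branch_v",
    ("account_txn", "auditor"): "account_txn",
    ("sales_fact", "analyst"): "sales_summary_agg",
}

def resolve(table, role):
    """Map a requested table to the view or aggregation this role is allowed to use."""
    try:
        return SECURE_OBJECTS[(table, role)]
    except KeyError:
        raise PermissionError(f"role {role!r} has no access to {table!r}") from None

print(resolve("account_txn", "branch_manager"))  # account_txn_branch_v
```

Denying by default when no mapping exists means a newly added view or role has no access until its metadata is written.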
Database Design
The database layout is also affected, because implementing security measures increases the number of views and tables. Adding security increases the size of the database and hence the complexity of database design and management. It also adds complexity to backup management and the recovery plan.
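To make the growth in database objects concrete, the sketch below uses Python's built-in `sqlite3` module to add one restricted view per department and counts the tables and views in the schema before and after. The table, column, and department names are hypothetical; the point is only that each security partition adds an object to design, manage, and back up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (department TEXT, region TEXT, amount REAL)")

def object_count(conn):
    """Count tables and views in the schema."""
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchone()[0]

before = object_count(conn)

# One restricted view per department (hypothetical, trusted department names).
for dept in ["sales", "marketing", "finance"]:
    conn.execute(
        f"CREATE VIEW transactions_{dept}_v AS "
        f"SELECT region, amount FROM transactions WHERE department = '{dept}'"
    )

print(before, object_count(conn))  # 1 4
```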
Testing
Testing a data warehouse is a complex and lengthy process. Adding security to the data warehouse also increases testing time and complexity. It affects testing in the following two ways −
- It increases the time required for integration and system testing.
- There is added functionality to be tested, which increases the size of the test suite.
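One way to see why the test suite grows: if every role must be checked against every secured object, the number of access cases is the product of the two counts. This sketch enumerates those cases with `itertools.product`. The roles, objects, rule, and expected results are hypothetical, and in a real suite `can_query` (also hypothetical) would issue a query against the warehouse rather than apply a rule in Python.

```python
import itertools

# Hypothetical rule under test: managers see everything except the audit log,
# analysts see only sales_v, and auditors see only audit_log.
def can_query(role, obj):
    if role == "manager":
        return obj != "audit_log"
    if role == "analyst":
        return obj == "sales_v"
    if role == "auditor":
        return obj == "audit_log"
    return False

ROLES = ["analyst", "manager", "auditor"]
OBJECTS = ["sales_v", "salary_v", "audit_log"]

# Hypothetical expectations, e.g. taken from the security documentation.
EXPECTED_ALLOWED = {
    ("analyst", "sales_v"),
    ("manager", "sales_v"),
    ("manager", "salary_v"),
    ("auditor", "audit_log"),
}

def security_test_cases(roles, objects):
    """Every role/object pair is a case, so the suite grows as roles x objects."""
    return list(itertools.product(roles, objects))

cases = security_test_cases(ROLES, OBJECTS)
failures = [case for case in cases if can_query(*case) != (case in EXPECTED_ALLOWED)]
print(len(cases), failures)  # 9 []
```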