DWH Concise Tutorial

Data Warehousing - Backup

A data warehouse is a complex system that contains a huge volume of data. It is therefore important to back up all the data so that it can be recovered in the future when required. In this chapter, we discuss the issues involved in designing a backup strategy.

Backup Terminologies

Before proceeding further, you should know the backup terminology discussed below.

  1. Complete backup − It backs up the entire database at the same time. This backup includes all the database files, control files, and journal files.

  2. Partial backup − As the name suggests, it does not create a complete backup of the database. Partial backups are very useful for large databases because they allow a strategy in which various parts of the database are backed up in round-robin fashion from day to day, so that the whole database is effectively backed up once a week.

  3. Cold backup − A cold backup is taken while the database is completely shut down. In a multi-instance environment, all the instances should be shut down.

  4. Hot backup − A hot backup is taken while the database engine is up and running. The requirements for hot backup vary from RDBMS to RDBMS.

  5. Online backup − It is quite similar to hot backup.
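The round-robin strategy described under partial backup can be sketched as follows. This is a minimal illustration, not a prescribed method: the tablespace names and the seven-day cycle are assumptions, and a real schedule would also balance the parts by size.

```python
from collections import defaultdict

def round_robin_schedule(parts, days):
    """Assign each database part to one backup day, cycling through the days."""
    schedule = defaultdict(list)
    for index, part in enumerate(parts):
        schedule[days[index % len(days)]].append(part)
    return dict(schedule)

# Hypothetical tablespace names and a seven-day cycle (both are assumptions).
parts = [f"tablespace_{n:02d}" for n in range(1, 11)]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

schedule = round_robin_schedule(parts, days)
for day in days:
    print(day, schedule.get(day, []))
```

Every part appears on exactly one day, so after one full cycle the whole database has been backed up once.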

Hardware Backup

It is important to decide which hardware to use for the backup. The speed of backup and restore processing depends on the hardware being used, how the hardware is connected, the bandwidth of the network, the backup software, and the speed of the server’s I/O system. Here we discuss some of the available hardware choices and their pros and cons. These choices are as follows −

  1. Tape Technology

  2. Disk Backups

Tape Technology

Tape choices can be categorized as follows −

  1. Tape media

  2. Standalone tape drives

  3. Tape stackers

  4. Tape silos

Tape Media

There exist several varieties of tape media. Some tape media standards are listed in the table below −

| Tape Media | Capacity | I/O Rate |
|------------|----------|----------|
| DLT        | 40 GB    | 3 MB/s   |
| 3490e      | 1.6 GB   | 3 MB/s   |
| 8 mm       | 14 GB    | 1 MB/s   |
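To see what these figures imply, the sketch below estimates how many tapes and how many hours a single backup would need for each medium. It uses the capacities and I/O rates from the table; the 500 GB data volume, the `drives` parameter, and the convention 1 GB = 1000 MB are assumptions, and tape changes and compression are ignored.

```python
import math

# Capacity (GB) and I/O rate (MB/s) as listed in the table.
TAPE_MEDIA = {
    "DLT": (40, 3),
    "3490e": (1.6, 3),
    "8 mm": (14, 1),
}

def estimate(volume_gb, media, drives=1):
    """Return (tapes needed, hours to write) for one backup of volume_gb."""
    capacity_gb, rate_mb_s = TAPE_MEDIA[media]
    tapes = math.ceil(volume_gb / capacity_gb)
    # Assumes 1 GB = 1000 MB and that parallel drives scale throughput linearly.
    seconds = volume_gb * 1000 / (rate_mb_s * drives)
    return tapes, seconds / 3600

for media in TAPE_MEDIA:
    tapes, hours = estimate(500, media)  # 500 GB is an illustrative volume
    print(f"{media}: {tapes} tapes, {hours:.1f} hours on one drive")
```

Under these assumptions a single drive takes far longer than a typical overnight window, which is why the number of drives and how they are connected matter so much.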

Other factors that need to be considered are as follows −

  1. Reliability of the tape medium

  2. Cost of tape medium per unit

  3. Scalability

  4. Cost of upgrades to tape system

  5. Shelf life of tape medium

Standalone Tape Drives

The tape drives can be connected in the following ways −

  1. Direct to the server

  2. As network-available devices

  3. Remotely to another machine

Connecting tape drives to a data warehouse can raise the following issues.

  1. Suppose the server is a 48-node MPP machine. It is not obvious which nodes the tape drives should be connected to, or how to spread them across the server nodes to get optimal performance with the least disruption to the server and the least internal I/O latency.

  2. Connecting a tape drive as a network-available device requires a network that can sustain the high data transfer rates involved. Make sure that sufficient bandwidth is available at the time you need it.

  3. Connecting the tape drives remotely also requires high bandwidth.
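A rough way to check whether a network link is up to the job is to compare the rate the backup needs with what the link can realistically deliver. The sketch below is illustrative only: the data volume, backup window, link speeds, and the `usable_fraction` allowance for protocol overhead and competing traffic are all assumptions.

```python
def required_rate_mb_s(volume_gb, window_hours):
    """Sustained rate (MB/s) needed to move volume_gb within the window."""
    return volume_gb * 1000 / (window_hours * 3600)  # assumes 1 GB = 1000 MB

def link_is_sufficient(volume_gb, window_hours, link_mbit_s, usable_fraction=0.7):
    """True if the usable share of the link covers the required rate."""
    usable_mb_s = link_mbit_s / 8 * usable_fraction  # megabits/s to megabytes/s
    return usable_mb_s >= required_rate_mb_s(volume_gb, window_hours)

# Illustrative figures: 500 GB to move in an 8-hour window.
print(f"needed: {required_rate_mb_s(500, 8):.2f} MB/s")
print("100 Mbit/s link sufficient:", link_is_sufficient(500, 8, 100))
print("1000 Mbit/s link sufficient:", link_is_sufficient(500, 8, 1000))
```

The same check applies to remotely connected drives; only the link speed and the usable fraction change.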

Tape Stackers

A tape stacker is a device that loads multiple tapes into a single tape drive. The stacker dismounts the current tape when it has finished with it and loads the next one, so only one tape is available for access at a time. Prices and capabilities vary, but what all stackers have in common is the ability to perform unattended backups.
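The one-tape-at-a-time behaviour can be pictured with a small toy model. The `TapeStacker` class, the tape labels, and the 40 GB capacity are hypothetical and exist only for illustration; a real stacker is driven by the backup software rather than by code like this.

```python
from collections import deque

class TapeStacker:
    """Toy model: a queue of tapes feeding one drive, one tape mounted at a time."""

    def __init__(self, tape_labels, capacity_gb):
        self.pending = deque(tape_labels)
        self.capacity_gb = capacity_gb
        self.mounted = None
        self.free_gb = 0
        self.log = []  # (tape label, GB written) for each chunk written

    def _load_next(self):
        # Dismount the finished tape (if any) and mount the next one in the stack.
        if not self.pending:
            raise RuntimeError("stacker is out of tapes")
        self.mounted = self.pending.popleft()
        self.free_gb = self.capacity_gb

    def write(self, size_gb):
        """Write size_gb, moving on to the next tape whenever the current one is full."""
        remaining = size_gb
        while remaining > 0:
            if self.mounted is None or self.free_gb == 0:
                self._load_next()
            chunk = min(remaining, self.free_gb)
            self.log.append((self.mounted, chunk))
            self.free_gb -= chunk
            remaining -= chunk

stacker = TapeStacker(["T1", "T2", "T3"], capacity_gb=40)
stacker.write(100)  # spans three tapes without operator intervention
print(stacker.log)
```

The backup continues unattended until the stack runs out of tapes, at which point the model raises an error.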

Tape Silos

Tape silos provide large storage capacities. They can store and manage thousands of tapes and can integrate multiple tape drives. They have the software and hardware to label and manage the tapes they hold. It is very common for a silo to be connected remotely over a network or a dedicated link. We should ensure that the bandwidth of the connection is up to the job.

Disk Backups

The methods of disk backup are −

  1. Disk-to-disk backups

  2. Mirror breaking

These methods are used in OLTP systems. They minimize database downtime and maximize availability.

Disk-to-Disk Backups

Here the backup is taken on disk rather than on tape. Disk-to-disk backups are done for the following reasons −

  1. Speed of initial backups

  2. Speed of restore

Backing up data from disk to disk is much faster than backing it up to tape. However, it is an intermediate step; the data is later backed up to tape. Another advantage of disk-to-disk backups is that they give you an online copy of the latest backup.

Mirror Breaking

The idea is to keep the disks mirrored for resilience during the working day. When a backup is required, one of the mirror sets can be broken out. This technique is a variant of disk-to-disk backup.

Note − The database may need to be shut down to guarantee the consistency of the backup.
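The order of operations in a mirror-break backup can be sketched as below. The step names and the `actions` mapping are hypothetical placeholders, and the final resynchronisation step is an assumption about how the mirror is restored afterwards; real systems perform these steps through volume manager and RDBMS tooling.

```python
def mirror_break_backup(actions, shut_down_first=True):
    """Run a mirror-break backup; `actions` maps step names to callables."""
    steps = []
    if shut_down_first:
        steps.append("shut_down_database")   # guarantees a consistent copy
    steps.append("break_mirror")             # detach one mirror set
    if shut_down_first:
        steps.append("restart_database")     # service resumes on the remaining mirror
    steps.append("back_up_broken_mirror")    # copy the detached set to backup media
    steps.append("resynchronise_mirror")     # re-attach the set (assumed final step)
    for name in steps:
        actions[name]()
    return steps

# Stand-in actions that only record what would happen.
performed = []
step_names = ["shut_down_database", "break_mirror", "restart_database",
              "back_up_broken_mirror", "resynchronise_mirror"]
actions = {name: (lambda n=name: performed.append(n)) for name in step_names}

order = mirror_break_backup(actions)
print(" -> ".join(order))
```

The database is down only for the short time it takes to break the mirror, not for the duration of the copy to backup media.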

Optical Jukeboxes

Optical jukeboxes allow data to be stored near-line. This technique allows a large number of optical disks to be managed in the same way as a tape stacker or a tape silo. Its drawback is a slower write speed than disks. However, optical media offer a long life and reliability, which makes them a good choice of medium for archiving.

Software Backups

Software tools are available to help with the backup process. They come as packages that not only take backups but also manage and control the backup strategy. Many such packages are available on the market. Some of them are listed in the following table −

| Package Name | Vendor        |
|--------------|---------------|
| Networker    | Legato        |
| ADSM         | IBM           |
| Epoch        | Epoch Systems |
| Omniback II  | HP            |
| Alexandria   | Sequent       |

Criteria for Choosing Software Packages

The criteria for choosing the best software package are listed below −

  1. How scalable is the product as tape drives are added?

  2. Does the package have a client-server option, or must it run on the database server itself?

  3. Will it work in cluster and MPP environments?

  4. What degree of parallelism is required?

  5. What platforms are supported by the package?

  6. Does the package support easy access to information about tape contents?

  7. Is the package database-aware?

  8. Which tape drives and tape media are supported by the package?
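One way to compare candidate packages against criteria like these is a simple weighted score. The sketch below is only an illustration: the criterion keys, the weights, the 1-to-5 ratings, and the package names are all invented and do not describe any real product.

```python
def weighted_score(ratings, weights):
    """Weighted average of per-criterion ratings; unrated criteria count as 0."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings.get(c, 0) for c in weights) / total_weight

# Hypothetical weights reflecting how much each criterion matters to the site.
weights = {"scalability": 3, "client_server": 2, "cluster_mpp": 3, "database_aware": 2}

# Hypothetical ratings on a 1-5 scale for two made-up packages.
candidates = {
    "package_a": {"scalability": 4, "client_server": 5, "cluster_mpp": 2, "database_aware": 3},
    "package_b": {"scalability": 3, "client_server": 3, "cluster_mpp": 5, "database_aware": 4},
}

ranked = sorted(candidates,
                key=lambda name: weighted_score(candidates[name], weights),
                reverse=True)
for name in ranked:
    print(name, round(weighted_score(candidates[name], weights), 2))
```

A score like this only orders the candidates; hard requirements, such as support for the tape drives already in place, are better treated as pass/fail filters before scoring.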