Data Warehousing - Backup

A data warehouse is a complex system that holds a very large volume of data. It is therefore important to back up all the data so that it can be recovered later if required. In this chapter, we discuss the issues involved in designing a backup strategy.

Backup Terminologies

Before proceeding further, you should know some of the backup terminologies discussed below.

  • Complete backup − It backs up the entire database at the same time. This backup includes all the database files, control files, and journal files.

  • Partial backup − As the name suggests, it does not create a complete backup of the database. Partial backups are very useful in large databases because they allow a strategy in which different parts of the database are backed up on a day-to-day round-robin basis, so that the whole database is effectively backed up once a week.

  • Cold backup − A cold backup is taken while the database is completely shut down. In a multi-instance environment, all the instances must be shut down.

  • Hot backup − A hot backup is taken while the database engine is up and running. The requirements of a hot backup vary from RDBMS to RDBMS.

  • Online backup − It is quite similar to hot backup.
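The round-robin idea behind partial backups can be sketched in a few lines of Python. This is only an illustration: the function `weekly_rotation` and the partition names `tablespace_01` … `tablespace_10` are hypothetical, and a real schedule would also weigh partition sizes and the available backup window.

```python
from itertools import cycle

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def weekly_rotation(partitions, days=DAYS):
    """Assign each partition to one day, round-robin, so that every
    partition is backed up exactly once per week."""
    schedule = {day: [] for day in days}
    # cycle() repeats the days; zip() stops when the partitions run out.
    for partition, day in zip(partitions, cycle(days)):
        schedule[day].append(partition)
    return schedule

# Hypothetical partition names, used only for illustration.
partitions = [f"tablespace_{i:02d}" for i in range(1, 11)]
schedule = weekly_rotation(partitions)

for day, parts in schedule.items():
    print(day, parts)
```

Each day's partial backup covers only the partitions listed for that day, while the seven partial backups together cover the whole database.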

Hardware Backup

It is important to decide which hardware to use for the backup. The speed of backup and restore depends on the hardware being used, how the hardware is connected, the bandwidth of the network, the backup software, and the speed of the server's I/O system. Here we discuss some of the available hardware choices and their pros and cons. These choices are as follows −

  • Tape Technology
  • Disk Backups

Tape Technology

The tape choice can be categorized as follows −

  • Tape media
  • Standalone tape drives
  • Tape stackers
  • Tape silos

Tape Media

Several varieties of tape media exist. Some tape media standards are listed in the table below −

Tape Media    Capacity    I/O Rate
DLT           40 GB       3 MB/s
3490e         1.6 GB      3 MB/s
8 mm          14 GB       1 MB/s

Other factors that need to be considered are as follows −

  • Reliability of the tape medium
  • Cost of tape medium per unit
  • Scalability
  • Cost of upgrades to tape system
  • Shelf life of tape medium
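To get a feel for what the capacities and I/O rates in the table above mean in practice, the following sketch estimates how long it takes to fill one tape and how long a backup of a given volume would run. It is a back-of-the-envelope calculation under stated assumptions: 1 GB is taken as 1000 MB, the drive sustains its rated speed throughout, and drives working in parallel scale linearly. The function names and the 1000 GB example volume are hypothetical.

```python
# Capacity (GB) and I/O rate (MB/s) taken from the tape media table.
TAPE_MEDIA = {
    "DLT": (40.0, 3.0),
    "3490e": (1.6, 3.0),
    "8 mm": (14.0, 1.0),
}

MB_PER_GB = 1000  # assumption: decimal units

def hours_to_fill(capacity_gb, rate_mb_s):
    """Time to write one full tape at the rated speed."""
    return capacity_gb * MB_PER_GB / rate_mb_s / 3600

def hours_to_back_up(volume_gb, rate_mb_s, drives=1):
    """Time to write volume_gb using `drives` drives in parallel,
    assuming perfect linear scaling (optimistic)."""
    return volume_gb * MB_PER_GB / (rate_mb_s * drives) / 3600

for name, (capacity_gb, rate_mb_s) in TAPE_MEDIA.items():
    print(f"{name}: {hours_to_fill(capacity_gb, rate_mb_s):.2f} h per tape")

# Hypothetical example: 1000 GB written to four DLT drives.
dlt_rate = TAPE_MEDIA["DLT"][1]
print(f"1000 GB on 4 DLT drives: {hours_to_back_up(1000, dlt_rate, drives=4):.1f} h")
```

Estimates like these help compare media against the backup window before the other factors in the list are weighed.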

Standalone Tape Drives

The tape drives can be connected in the following ways −

  • Direct to the server
  • As network available devices
  • Remotely to another machine

There could be issues in connecting the tape drives to a data warehouse.

  • Suppose the server is a 48-node MPP machine. It is not obvious which node the tape drives should be connected to, or how to spread them over the server nodes to get optimal performance with the least disruption to the server and the least internal I/O latency.

  • Connecting the tape drive as a network-available device requires a network that can sustain the very high data transfer rates involved. Make sure that sufficient bandwidth is available during the time you require it.

  • Connecting the tape drives remotely also requires high bandwidth.
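The bandwidth question raised above can be checked with simple arithmetic before any hardware is connected. The sketch below is illustrative only: the function names, the 70% usable-bandwidth factor, and the example figures (2000 GB in an 8-hour window) are assumptions, not measurements.

```python
MB_PER_GB = 1000  # assumption: decimal units

def required_mb_per_s(volume_gb, window_hours):
    """Sustained transfer rate needed to move volume_gb within the window."""
    return volume_gb * MB_PER_GB / (window_hours * 3600)

def usable_mb_per_s(link_mbit_s, efficiency=0.7):
    """Approximate usable throughput of a network link.
    efficiency is an assumed allowance for protocol overhead and
    competing traffic; measure your own network for a real figure."""
    return link_mbit_s / 8 * efficiency

def link_is_sufficient(volume_gb, window_hours, link_mbit_s, efficiency=0.7):
    """True if the link can carry the backup within the window."""
    needed = required_mb_per_s(volume_gb, window_hours)
    return usable_mb_per_s(link_mbit_s, efficiency) >= needed

# Hypothetical example: 2000 GB must be backed up in an 8-hour window.
needed = required_mb_per_s(2000, 8)
print(f"Required rate: {needed:.1f} MB/s")
for link in (100, 1000):
    ok = link_is_sufficient(2000, 8, link)
    print(f"{link} Mbit/s link sufficient: {ok}")
```

If the link turns out to be insufficient, the options are a longer window, a faster or dedicated link, or attaching the drives directly to the server.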

Tape Stackers

A tape stacker is a device that loads multiple tapes into a single tape drive. The stacker dismounts the current tape when the drive has finished with it and loads the next one, so only one tape can be accessed at a time. Prices and capabilities vary, but what all stackers have in common is the ability to perform unattended backups.

Tape Silos

Tape silos provide large storage capacities. They can store and manage thousands of tapes and can integrate multiple tape drives. They have the software and hardware to label and catalog the tapes they hold. It is very common for the silo to be connected remotely over a network or a dedicated link, so we should ensure that the bandwidth of the connection is up to the job.

Disk Backups

Methods of disk backups are −

  • Disk-to-disk backups
  • Mirror breaking

These methods are commonly used in OLTP systems. They minimize database downtime and maximize availability.

Disk-to-Disk Backups

Here the backup is taken on disk rather than on tape. Disk-to-disk backups are done for the following reasons −

  • Speed of initial backups
  • Speed of restore

Backing up data from disk to disk is much faster than backing it up to tape. However, it is usually only an intermediate step: the data is later backed up to tape. The other advantage of disk-to-disk backups is that they give you an online copy of the latest backup.
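The two-stage flow described above (a fast copy to disk first, then a later transfer to tape) can be sketched with the Python standard library. This is a minimal illustration, not a production procedure: the function names, the `datafiles` directory, and `sales.dbf` are hypothetical, a plain tar file stands in for the tape device, and the sketch assumes the database is shut down or otherwise quiesced so that the copied files are consistent.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

def stage_to_disk(source_dir: Path, staging_dir: Path) -> Path:
    """Stage 1: copy the data files onto the backup disk.
    The copy stays online and can be used for a fast restore."""
    target = staging_dir / source_dir.name
    shutil.copytree(source_dir, target)
    return target

def archive_to_tape(staged_dir: Path, tape_path: Path) -> None:
    """Stage 2: write the staged copy to tape.
    tape_path is a plain file here, standing in for a tape device."""
    with tarfile.open(tape_path, "w") as tar:
        tar.add(staged_dir, arcname=staged_dir.name)

# Demonstration in a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "datafiles"
    source.mkdir()
    (source / "sales.dbf").write_bytes(b"example data")

    staging = root / "staging"
    staging.mkdir()

    staged = stage_to_disk(source, staging)      # fast, done in the backup window
    tape = root / "tape.tar"
    archive_to_tape(staged, tape)                # slower, can run afterwards

    with tarfile.open(tape) as tar:
        archived_names = sorted(tar.getnames())
    staged_exists = (staged / "sales.dbf").exists()

print(archived_names, staged_exists)
```

Because the second stage reads from the staged copy rather than from the live data files, the slower tape transfer does not extend the time the database is unavailable.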

Mirror Breaking

The idea is to have disks mirrored for resilience during the working day. When backup is required, one of the mirror sets can be broken out. This technique is a variant of disk-to-disk backups.

Note − The database may need to be shut down to guarantee the consistency of the backup.

Optical Jukeboxes

Optical jukeboxes allow data to be stored near-line. This technique allows a large number of optical disks to be managed in the same way as a tape stacker or a tape silo. Its drawback is that write speeds are slower than those of disks. However, optical media offer a long life and high reliability, which makes them a good choice of medium for archiving.

Software Backups

Software tools are available to help with the backup process, usually supplied as packages. These tools not only take backups but can also manage and control backup strategies. Many software packages are available on the market. Some of them are listed in the following table −

Package Name    Vendor
Networker       Legato
ADSM            IBM
Epoch           Epoch Systems
Omniback II     HP
Alexandria      Sequent

Criteria for Choosing Software Packages

The criteria for choosing the best software package are listed below −

  • How scalable is the product as tape drives are added?
  • Does the package have a client-server option, or must it run on the database server itself?
  • Will it work in cluster and MPP environments?
  • What degree of parallelism is required?
  • What platforms are supported by the package?
  • Does the package support easy access to information about tape contents?
  • Is the package database-aware?
  • What tape drives and tape media are supported by the package?