A very interesting exercise for system administrators with spare time (ha ha) is to use OS utilities to generate an aging report for all of the user files in the home directories under their watch. Search for files that have not been accessed in 2 weeks, 1 month, 3 months, 6 months, and 1 year. Go back further if your systems have been around that long.
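As an illustration of such an aging report, here is a minimal Python sketch. The function name `aging_report`, the bucket labels, and the day counts chosen for each bucket are assumptions made for this example. It relies on the access time recorded by the filesystem, which may be stale or frozen on filesystems mounted with options such as `noatime` or `relatime`, so treat the numbers as approximate.

```python
import os
import stat
import time

# Age thresholds in days; the day counts are approximations chosen for this sketch.
BUCKETS = [("2 weeks", 14), ("1 month", 30), ("3 months", 91),
           ("6 months", 182), ("1 year", 365)]

def aging_report(root, now=None):
    """Return {label: (file_count, total_bytes)} for regular files under root
    whose last access is at least that old. Buckets are cumulative."""
    now = time.time() if now is None else now
    report = {label: [0, 0] for label, _ in BUCKETS}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            age_days = (now - st.st_atime) / 86400
            for label, days in BUCKETS:
                if age_days >= days:
                    report[label][0] += 1
                    report[label][1] += st.st_size
    return {label: tuple(counts) for label, counts in report.items()}
```

Calling `aging_report("/home")` (the path is only an example) returns, for each threshold, how many files and how many bytes have gone untouched for at least that long.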
Consider this: Files that haven’t been touched in 6 months or more still take up valuable time, space, and bandwidth during your backups, and again during restores. Apart from simply deleting them, what can you do to recapture the resources that these files consume? Welcome to the world of Hierarchical Storage Management (HSM).
Hierarchical Storage Management is a grossly underused facility that provides a sort of automated archival system. An HSM process examines the most recent access date of the files in a filesystem and, based on rules set up and maintained by the system administrator, automatically migrates qualifying files to a less expensive, more permanent, and slower medium. This medium may be specially dedicated tapes, writable CDs, magneto-optical disks, or some other not-quite-online storage medium. Left behind in the filesystem is a stub file, a special file that tells the HSM process where to find the real file. Stub files are usually a couple of kilobytes in size.
Once a file is migrated, a user need only access the old file in the usual manner, and the system will either locate and mount the appropriate tape or place an operator request for a particular tape to be mounted. Once accessed, the file is returned to the local disk, and the clock on it starts again. While a file is migrated, it is no longer backed up; only its stub is.
The obvious trade-off is that the first time a user needs to access a migrated file, it may take several minutes, or longer, to retrieve it from the offline medium. But once the file has been retrieved from the remote storage, it is stored locally, and future accesses take place at the normal rate.
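The migrate-and-recall cycle described above can be imitated with a toy Python sketch. This is not how any particular HSM product works: real HSM recalls files transparently inside the filesystem, while here recall is an explicit call. The names `migrate_if_stale`, `recall`, and the `.hsmstub` suffix are hypothetical, and a plain directory stands in for the slower medium.

```python
import json
import os
import shutil
import time

STUB_SUFFIX = ".hsmstub"  # hypothetical marker; real HSM stubs keep the original name

def migrate_if_stale(path, archive_dir, max_idle_days, now=None):
    """Move a file idle for more than max_idle_days into archive_dir and
    leave a small stub behind that records where the real file went."""
    now = time.time() if now is None else now
    idle_days = (now - os.stat(path).st_atime) / 86400
    if idle_days <= max_idle_days:
        return False  # accessed recently enough; leave it on local disk
    os.makedirs(archive_dir, exist_ok=True)
    # Assumption: base names are unique within archive_dir.
    archived = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, archived)
    with open(path + STUB_SUFFIX, "w") as stub:
        json.dump({"archived_at": archived}, stub)
    return True

def recall(path):
    """Return a migrated file to local disk and restart its idle clock."""
    stub_path = path + STUB_SUFFIX
    with open(stub_path) as stub:
        archived = json.load(stub)["archived_at"]
    shutil.move(archived, path)
    os.remove(stub_path)
    mtime = os.stat(path).st_mtime
    os.utime(path, (time.time(), mtime))  # new access time, original modification time
    return path
```

With a rule such as `migrate_if_stale(path, "/archive", 180)` (path and threshold are examples), only the small stub remains on local disk for the backup to copy, which is the source of the savings described above.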
There is a significant additional benefit to implementing HSM: Since migrated files are no longer stored on active disks, the disks will be used more efficiently. If 20 percent of the space on a filesystem can be migrated with HSM, then it’s as if the disks have grown by 20 percent, with no need to buy new disks.
If properly implemented, apart from the delay in retrieving a file, HSM should be totally transparent to your users. A directory listing does not indicate that anything is unusual. The files appear to be present. Only when a user actually tries to access a file can any difference be detected. If, like so many other systems, your user directories are littered with large files that have been untouched for months, and you need to shrink your backup windows and loads, HSM may be worth a look.
Archives
Archives are similar to, though somewhat less sophisticated than, HSM. To archive a file, the file is written to tape (or some other offline medium) and deleted from the filesystem completely. Some external mechanism must be employed to maintain the location of archived files. Otherwise, there is a very real risk that the file could get lost. In some environments, users are allowed and even encouraged to maintain their own archive tapes. In others, the administrators maintain archives.
Synthetic Fulls
Incremental backups, as we discussed, are a very efficient way to back up filesystems, since they require copying far less data to tape than fulls. Data that have not changed since the last full need not be backed up. Unfortunately, cumulative incremental backups can grow in size over time, until their size approaches that of a full. The number of tapes required to restore a set of differential incremental backups can also grow over time until the sheer number becomes overwhelming. The remedy to these problems is the same: Take full backups from time to time.
What if you never had to take full backups of your production servers again? With synthetic full backups, you don’t. In a synthetic full backup model, you keep a full copy of your filesystem on another node. You take a full backup of your system (curly) once, and restore it on a separate system (shemp). Then, you take nightly differential incremental backups of curly and restore them onto shemp. Then, you take a full backup of the filesystem from shemp. If something happens to curly, you can replace him with shemp. Shemp’s data should be identical to what was on curly, and since all the data is on a single tape, the time it takes to complete the restore will be greatly reduced, as compared to restoring a full and a bunch of incrementals.
For additional protection, you may choose to keep shemp at a different site from curly, although that’s not necessary if you send shemp’s backup tapes offsite. (No need to keep an on-site copy, since shemp itself fills that role quite admirably.) This model is certainly not for everyone, since it requires a great deal of extra hardware (system, disks, and tape drives), but it is a very effective way to reduce backup windows and restore times, a very elusive combination.
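The curly and shemp model can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a filesystem is modeled as a dictionary mapping paths to contents, and each nightly differential incremental is a dictionary with hypothetical `changed` and `deleted` entries. Real backup software tracks far more (permissions, timestamps, open files).

```python
def apply_incremental(copy, incremental):
    """Restore one differential incremental onto the standby copy, in place."""
    # Remove files deleted on the source, then lay down new and changed files.
    for path in incremental.get("deleted", []):
        copy.pop(path, None)
    copy.update(incremental.get("changed", {}))
    return copy

def synthetic_full(initial_full, incrementals):
    """Build shemp's filesystem from curly's one-time full plus the nightly
    incrementals applied in order, and return it as the synthetic full."""
    shemp = dict(initial_full)  # restore curly's full backup onto shemp
    for incremental in incrementals:
        apply_incremental(shemp, incremental)
    return dict(shemp)  # a full backup taken from shemp, not from curly
```

The point of the design is that the full is read from the standby copy, so the production server only ever pays for the incrementals.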
Use More Hardware
Vendors are developing new techniques and technologies, and exploiting old ones, to speed up backups through the use of additional hardware or clever software tricks. These techniques are of varying usefulness, depending on your environment. Some will surely work, while others could actually slow you down.
Host-Free Backups
The marketing hype for host-free backups (sometimes called backdoor or direct backups) says that they allow the backing up of data directly from disk to tape, without requiring an intervening CPU. The reality of host-free backups is that there must be a CPU someplace. There is no technology today that we have found that allows data to simply copy itself to tape. There must be a CPU to initiate and perform the copying work. Some hardware arrays have embedded CPUs that perform the work. Other solutions use dedicated data movers from companies like Chaparral, ADIC, or Crossroads, which contain CPUs that take care of the data copying. Other solutions use the host’s CPU to copy the data.
While there may be advantages to using off-host CPUs to copy the data, it is not an absolute given, as many people assume, that the use of host-based CPU is bad. Some CPU must be used in order to move the data. The external CPU that is included in a disk array or data mover may be quite expensive, and it may be completely unnecessary if there are enough idle CPU cycles in the host CPU. And most systems have some extra CPU cycles to spare.
What’s more, no matter which CPU does the work, the storage will still be called upon to perform I/Os, and there will likely be some overhead incurred from that work. Even if the copying (backing up) is done from disks that have been split off of the original copies, I/O overhead may still be evident in the SCSI or Fibre Channel connections and/or host bus adapters (HBAs), or over the networks between the clients, hosts, and tape drives.
Today’s technology simply does not support true host-free backups. In the future, enough intelligence will be placed inside the disk arrays to allow them to push their data directly out to a tape drive, but even then, there is still a CPU involved. The advantage that any host-free backup offers is that the backup can be performed without impacting the CPU or memory performance of the host holding the data. However, that CPU may be the cheapest in your system. In many cases it will cost less to get some additional CPU for your host than it would to get additional CPU for your disk array or data mover. More and more backup software vendors are adding this capability to their software.
Third-Mirror Breakoff
Some hardware and software vendors have implemented a backup model that involves keeping three active disk mirrors of a filesystem or database. When it’s time for a backup, the third mirror gets split off from the first two. New transactions continue to be written to the first two mirrors, while the backup is started using the data on the third mirror. At the end of the backup, the third mirror must be resynchronized with the other two copies, so that it’s ready for the next backup.
The good news about third-mirror breakoffs is that the backup can be taken without taking the database or filesystem out of service for more than a few seconds (and that is only to ensure that the third-mirror copy is clean and consistent). The downside is the additional disk space required. Instead of the 100 percent disk space overhead that is required for regular mirrors, third-mirror breakoff requires a 200 percent disk space overhead. To mirror 100GB of disk space, you need 200GB of total storage space. To perform third-mirror breakoff, you need 300GB of disk. In addition, the act of resynchronization is very I/O-intensive, and it can be CPU-intensive too. Depending on the implementation, there may be a very noticeable CPU performance impact caused by the resynchronization process.
The other variation of third-mirror breakoff is to create the third mirror before the backup begins, instead of resynchronizing it after the backup is completed. The I/O and CPU overhead is greater in this case, since all of the work must be done up front, rather than gradually over time. The potential is there to save disk overhead, as one set of disks could be reused as the third mirror for more than one filesystem.
EMC Symmetrix arrays use a utility called TimeFinder that allows this resynchronization to be performed off-host, eliminating the impact to your server’s CPU and allowing much faster performance than when the resynchronization is performed on a server. There will still be some I/O impact on the disks being copied.
Third-mirror breakoff is a method that is especially appetizing to companies who make their livings selling expensive disk drives. It allows them to sell a lot more disk to perform functions that do not necessarily need extra disk space.
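The disk space figures above are easy to check with a short Python sketch. The function names are made up for this example. The shared-mirror figure rests on two assumptions that the text does not state: backups run one at a time, and the reusable set only has to be as large as the biggest filesystem it serves.

```python
def dedicated_third_mirror_gb(filesystem_sizes_gb):
    """Total disk when every filesystem keeps three full mirrors."""
    return 3 * sum(filesystem_sizes_gb)

def shared_third_mirror_gb(filesystem_sizes_gb):
    """Total disk when each filesystem keeps two mirrors and one reusable
    set of disks serves as the third mirror for each backup in turn."""
    if not filesystem_sizes_gb:
        return 0
    return 2 * sum(filesystem_sizes_gb) + max(filesystem_sizes_gb)

def overhead_percent(total_gb, usable_gb):
    """Extra disk beyond the usable space, as a percentage of usable space."""
    return 100 * (total_gb - usable_gb) / usable_gb
```

For a single 100GB filesystem, `dedicated_third_mirror_gb([100])` gives 300, and `overhead_percent(300, 100)` gives the 200 percent overhead quoted above. For filesystems of 100GB, 50GB, and 50GB, the shared variation needs 500GB instead of 600GB.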
