Consolidate Your Servers
The trend over the last few years in many computing circles has been to consolidate servers that run similar services. Instead of having many small single-purpose machines or lots of machines running a single instance of a database, companies are rolling them together and putting all the relevant applications onto one or more larger servers with a capacity greater than that of all the replaced servers combined. This setup can significantly reduce the complexity of your computing environment. It leads to fewer machines that require backups; fewer machines that require reboots; and overall, fewer things that can fail. As a result, the labor and cost associated with administering systems is reduced. It is cheaper and easier to manage fewer larger systems that are configured similarly, using today’s technologies, than it is to manage many smaller systems, especially ones that run disparate operating systems.
Consolidation is a powerful force for improving the simplicity and manageability of an environment. It comes, however, with the cost of having to invest even more engineering effort in making the larger, consolidated server more reliable and robust, because a failure of that one server now takes down every service it hosts. The moral of the story is “Go ahead and put all your eggs in one basket; just make sure the basket is built out of titanium-reinforced concrete.”
A new trend is emerging, though, that may render this particular principle moot in a few years. Vendors are linking together hundreds or even thousands of very small systems called blades, and developing pseudo-backplanes and shared networks to connect them, enabling these blade computers to act as a massive cluster. The biggest obstacle so far is administration; few mainstream tools allow easy administration of hundreds of systems, even ones that share applications and operating systems. Those technologies will likely appear in the next few years.
#15: Watch Your Speed
When people speak of the end-to-end nature of availability, they usually think of all of the components between the user and the logically most distant server. While those components are, of course, all links in the chain of availability, they are still not the whole story. After all, either a piece of equipment is functioning, or it is not. If we maintain our user perspective on availability, another element needs to be considered: performance. Since our user cares about getting his job done on time, the performance of the system that he is attempting to use must also be considered. If the system is running but is so overloaded that its performance suffers or it grinds to a halt, then the user who attempts to work will become frustrated, possibly more frustrated than he would be if the system were down. So, in addition to monitoring system components to make sure they continue to function, you must monitor system performance from a user’s perspective.
Many system tools can be used to emulate the user experience, as well as to put artificial loads on systems so that high-water marks for acceptable levels of resource consumption can be determined. We believe, however, that benchmarks are not nearly as useful or practical as vendors would have you believe. In general, benchmarks can be tweaked and modified to get almost any desired results. At least one disk array vendor has a sales agreement that specifically prohibits its customers from running or publishing any benchmarks related to the vendor’s equipment, and the vendor has enforced this prohibition in court on more than one occasion. Benchmarks rarely measure what people actually do on their systems. They seldom mimic multiuser systems on which users are working the way users actually work. And, unfortunately, benchmarks become less and less valid the more the testing environment differs from reality, and it doesn’t take a lot of differences to make the benchmark totally worthless.
To a user who is trying to use a computer system, very little difference can be detected between a down system and a very slow one, and as the system slows down more and more, no difference is discernible at all. Back in the heyday of the Internet, an Internet consulting company, Zona Research, estimated that $4 billion a year was lost due to users attempting to buy things on slow web sites, then giving up and canceling their purchases. Although that number has surely come down in the time since the Internet boom, even a single canceled transaction should be enough to give a company reason to stop and think.
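One way to picture monitoring from the user’s perspective is a probe that times a representative user action and treats a very slow response the same as a failure. The following is only a minimal sketch: the function name `probe` and the thresholds of 2 and 10 seconds are illustrative assumptions, not figures from this article, and a real probe would run an actual user transaction rather than the placeholder shown in the usage note.

```python
import time
from typing import Callable, Tuple


def probe(action: Callable[[], object],
          slow_after: float = 2.0,    # assumed threshold, in seconds
          down_after: float = 10.0,   # assumed threshold, in seconds
          clock: Callable[[], float] = time.monotonic) -> Tuple[str, float]:
    """Run one user-style action and classify the experience.

    Returns a (status, elapsed_seconds) pair, where status is
    "ok", "slow", or "down".
    """
    start = clock()
    try:
        action()
    except Exception:
        # The action failed outright: the user sees a down system.
        return "down", clock() - start
    elapsed = clock() - start
    if elapsed >= down_after:
        # Too slow to be usable: indistinguishable from an outage.
        return "down", elapsed
    if elapsed >= slow_after:
        return "slow", elapsed
    return "ok", elapsed
```

A caller might pass in a function that logs in and submits a test order, for example `status, seconds = probe(place_test_order)`, where `place_test_order` is a hypothetical stand-in for whatever your users actually do. Run on a schedule, the results show how the system feels to a user, not merely whether each component is up.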
#14: Enforce Change Control
When you call system support or your help desk to complain about something being wrong with your computer, the first question the helpful person on the other end of the phone asks is always the same: “What was the last thing you changed?” The most common answer to this question is, of course, “Nothing.” But most of the time, something has changed, and it’s usually something fairly serious.
Mature production environments should have change committees that include representatives from every relevant organization, including the users, networking, system administration for each operating system, database administration, management, and every internally developed application. Other groups may be required as well, depending on the environment. When someone requests a change, it is brought before the committee for approval and implementation scheduling. With a solid change control system in place, everybody’s needs can be taken into account, including schedules and deadlines. Conflicting or contradictory changes can be identified early in the process.
The change control process should ensure that any rejection of a change is justified and explained so that the requester has a chance to resubmit it at a later date. But no change, regardless of how small, can be made unless it has been approved in writing by all members of the committee. Any change request should include, at a minimum, the following items:
- Executive summary of the change
- Detailed description of exactly what will be changed
- The source of the code to be changed (if the changes are in software)
- Why the change is required
- What the risks are, if the change goes badly
- A back-out plan, in case the changes go badly
- The time that will be required to implement the plan
- Requested schedule for the implementation
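The minimum items above, together with the rule that every committee member must approve, could be captured in a simple record like the one sketched here. This is only an illustration under stated assumptions: the class name `ChangeRequest`, its field names, and the `may_proceed` check are hypothetical, and a set of member names stands in for the written approvals the process requires.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ChangeRequest:
    executive_summary: str
    detailed_description: str
    reason: str
    risks: str
    back_out_plan: str
    time_required: str
    requested_schedule: str
    is_software_change: bool = False
    code_source: str = ""  # required only when software is being changed
    approvals: Set[str] = field(default_factory=set)

    def missing_items(self) -> List[str]:
        """Return the names of required items that were left blank."""
        required = ["executive_summary", "detailed_description", "reason",
                    "risks", "back_out_plan", "time_required",
                    "requested_schedule"]
        if self.is_software_change:
            required.append("code_source")
        return [name for name in required if not getattr(self, name).strip()]

    def approve(self, member: str) -> None:
        """Record one committee member's approval."""
        self.approvals.add(member)

    def may_proceed(self, committee: Set[str]) -> bool:
        """True only if the request is complete and every member approved."""
        if not committee or self.missing_items():
            return False
        return committee <= self.approvals
```

Because `may_proceed` requires the whole committee to be a subset of the approvers, a single missing approval blocks the change, however small it is.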
Some enterprises impose additional restrictions on changes to production, including requiring that changes be made in a test environment first and allowed to run there for a week or more before they can be introduced to production.
Another aspect of change control is keeping track of changes to files, particularly system files. Fortunately, there are utilities that can help out in this regard. One excellent utility for monitoring changes to system files is Tripwire (www.tripwire.org), which is available on many different operating systems. Tripwire checks to see what has changed on your system. It monitors key attributes of files that should not change, including binary signature and size, and reports any changes it detects.
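The general idea of comparing files against a recorded baseline can be sketched in a few lines. To be clear, this is not Tripwire’s implementation or its interface; it is a simplified illustration that assumes a SHA-256 digest as the “binary signature,” and the function names `fingerprint`, `take_baseline`, and `changed_files` are hypothetical.

```python
import hashlib
import os
from typing import Dict, Iterable, List, Tuple

# (size in bytes, SHA-256 hex digest); the digest choice is an assumption.
Fingerprint = Tuple[int, str]


def fingerprint(path: str) -> Fingerprint:
    """Record the size and content digest of one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def take_baseline(paths: Iterable[str]) -> Dict[str, Fingerprint]:
    """Fingerprint every file that is not supposed to change."""
    return {path: fingerprint(path) for path in paths}


def changed_files(baseline: Dict[str, Fingerprint]) -> List[str]:
    """Return the files that are missing or no longer match the baseline."""
    changed = []
    for path, recorded in sorted(baseline.items()):
        if not os.path.exists(path) or fingerprint(path) != recorded:
            changed.append(path)
    return changed
```

In practice the baseline would be taken right after an approved change and stored where an intruder or a careless administrator cannot rewrite it; any file later reported by `changed_files` without a matching change request deserves a closer look.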
