User documentation is often a good starting point

An article added by Ben Smeider on 11/27/2007



#3: Exploit External Resources

 

Most likely, whatever problem you are trying to solve, or whatever product you are trying to implement, someone has done it before you. The vendor probably has a consulting or professional services organization that, for a fee, will visit your site and implement your critical solutions for you, or at least offer advice on how to architect and implement your plans. Arrange for on-site consultation from vendor resources or independent contractors, and be sure a transfer-of-information or technical exchange is part of the planned work. If vendors offer training classes on a product, make sure the right people attend them so that you learn the pitfalls and issues before you start implementation. Read articles (we know an excellent one on high availability) and magazines, both on paper and online. Scan the Web; the TechTarget collection of web sites (www.techtarget.com) is an excellent set of resources for the system and database administrator on every major platform. Attend conferences; for example, the USENIX and LISA conferences (www.usenix.org) offer consistently excellent training and support services for Unix, Linux, and Windows administrators, on a variety of general and very specific topics. If you are a significant user of a particular vendor’s products, see whether they have user conferences; they represent an outstanding opportunity to meet other people who use similar products and who may have run into similar problems. Auburn University in Auburn, Alabama, maintains a collection of mailing lists, many of which are relevant to system administrators. For more information, visit http://mailman.eng.auburn.edu/mailman/listinfo. Countless other examples of mailing lists and valuable web sites can be found all over the Internet.

User documentation is often a good starting point for information, although some product documentation is less useful than others. Many vendors have local user groups in major cities or wherever their users are concentrated. Salespeople are usually happy to point interested users to these groups.

An often overlooked yet very valuable resource for all sorts of technical information is something we think of as the grandparent of the World Wide Web: Usenet. Often simply called newsgroups, Usenet is an immense collection of bulletin boards on thousands of different topics. A recent check showed that there are over 35,000 different newsgroups on practically every topic imaginable, serious or recreational, from medical issues to current events, from macramé to professional wrestling. A huge percentage of Usenet is devoted to technical topics of all sorts (although there does not appear to be a newsgroup dedicated to high availability). Any of the major web browsers can be used to access Usenet. At http://groups.google.com/groups, you can search the Usenet archives dating back to its very beginnings in the 1970s. If you have never visited Usenet, you should do so at your first opportunity. It is a tremendous storehouse of knowledge. Unfortunately, like the rest of the Internet, it has been badly contaminated in recent years by junk postings and spam. But if you can filter out the junk, there is a lot of value in Usenet.

Another way to gain knowledge in your organization is to hire experienced people. If someone else has sent them to training, and they learned most of what they need before they joined your organization, then you can save money on training. It will likely cost you more to hire well-trained and experienced people, but it is almost always worth the expense.

Vendor salespeople represent another useful resource. Although their product-specific information may be somewhat slanted, they can often bring in experts in their specific field of interest who can come and give non-sales-specific talks. Salespeople can also provide reference sites, where you can verify that their solutions work as advertised and learn about the vendor and the product at a level of detail that the sales team may be reluctant to provide.

#2: One Problem, One Solution

Someone once said that a good tool is one that has applications that even its inventor did not foresee. That may be true, but most tools are designed for a single purpose. A butter knife can turn a screw, but you wouldn’t use one as a screwdriver and expect the same results. The same holds true for software; you should not try to shoehorn software into performing a function it was not designed to perform.

Don’t try to make a solution fit if the designers did not intend it to be used in the way you propose. Complex problems have many aspects (subproblems) to them and may require more than one solution. Examine one subproblem at a time, and solve it. If your solution happens to help solve another subproblem, that’s serendipity. But don’t expect it to happen every time. In fact, one solution may create other unforeseen problems, such as incompatibility with other products.

If you have faith in your vendor and its sales force, then when the salesperson recommends against your using his product in a certain way, it is best to listen to his advice. Consider the salesperson’s motivation (assuming that he is a good and honest salesperson). He ultimately wants happy customers. Happy customers buy more, and when they buy more, he makes more money. If he tells you something that discourages you from buying a product, he is turning down short-term gain for a long-term gain. That is the mark of a good salesperson; he is attempting to build a trusting relationship with his customer. When a salesperson or a sales engineer does this, consider the advice carefully; it is probably valid.

#1: K.I.S.S. (Keep It Simple . . .)

We live in an immensely complex world. We have technologies and tools about which our parents and certainly our grandparents never dreamed. And our children will likely see similar technological advances in their lifetimes.

In order for technology to become generally accepted into society, it has to be simple. If a complex technology is made sufficiently simple to use, it will be adopted. When automobiles first came out, you needed to be a mechanic to keep them operating. Nowadays, you do not need to know anything about the workings of a car to drive one. As a result, in much of the world, cars are totally pervasive. They remain complex, and have gotten much more complex over the years. But they are easy to use. The same cannot be said of computers.

Beyond adoption, however, simplicity allows technology to work. The fewer parts that something has, everything else being equal, the more likely it is to work. By removing unnecessary components, you reduce the number of components that can fail.

Computer systems today are immensely complex, and they are likely to become more complex over time. When you sit down to work on your networked computer, you are using your computer and all of its software components, including the operating system, the applications, and administrative applications like antivirus software; the computer’s hardware components (CPU, memory, network card, and so on); your storage (whether direct-attached, network-attached, or SAN-attached); your LAN and all of its components (routers, hubs, switches, and so on); and whatever servers your applications require, along with all of their software and hardware components. All this complexity means that in any computer network, there are a large number of individually complex components that can fail, often causing larger failures. Unless special precautions have been taken (and we will spend a significant percentage of the remainder of this article discussing what we mean by “special precautions”), if any one of those components fails, work on the systems will be interrupted.
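The "fewer parts" argument can be illustrated with a little arithmetic. The sketch below is not from the original text; it assumes that every component must be up for work to continue and that components fail independently, which real systems only approximate. The 99.9 percent figure is a hypothetical value chosen for illustration.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures (an idealization): the chain is up
    only when all components are up, so the availabilities multiply.
    """
    return prod(component_availabilities)


# Hypothetical figure: each component is up 99.9% of the time.
PER_COMPONENT = 0.999

for count in (5, 10, 20):
    chain = [PER_COMPONENT] * count
    print(f"{count:2d} components -> {series_availability(chain):.4f}")
```

Under these assumptions, a chain of 20 such components is available only about 98 percent of the time, even though each part on its own looks very reliable. Removing components that do not need to be in the chain raises that figure directly.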
What’s more, even if protective measures have been put into place, the failure of one component may hasten the failure of another component that has become overworked due to the first failure.

Simplicity means many things. To help introduce simplicity into computer systems, do the following relatively easy tasks:

  • Eliminate extraneous hardware on critical systems. Get the scanners off the production systems—unless your production work involves scanning, of course. If your servers don’t need graphical screens and mice, then remove them; they add nothing and are just two more things that can break.
  • Slim down servers so that they run only critical applications. Stop playing Doom on production systems, even if the game does run faster there. Don’t run screen savers on production systems; modern monitors don’t get burn-in (which is what screen savers were originally employed to defend against). Today’s screen savers are nothing but CPU suckers.
  • Disconnect servers from networks where they don’t need to be. There’s no reason for development or QA networks to be connected to production servers. A network storm or other problem on those networks can have an adverse effect on production.
  • Select hostnames that are easy to remember and easy to communicate on the telephone. Admittedly, hostnames like ha4pv56a may communicate a lot of information about a host, but they are hard to remember, especially for new people. Imagine a situation where the boss says, “Quick, run up to the data center and reboot the ha4pv56a file server! It’s hung, and everybody is locked up!” You run to the elevator, ride it up, and then wait for the second set of elevators in order to get to the data center. When you finally get there, you are confronted by four servers: ha4pb56a, ha4pd56a, ha4pt56a, and ha4pv56a. If you reboot the wrong one, you’ll affect 100 otherwise-unaffected users, who’ll be pretty angry. If instead of those hard-to-remember names you choose pronounceable and memorable names from a theme, you are far more likely to get it right. Rule of thumb: If you have to read out more than three characters in the name of a system, it’s probably a bad name.
  • Automate routine tasks. Human error is one of the leading causes of system downtime. Computers are really good at doing mundane and boring tasks. By automating them once, you significantly reduce the chance of error when the task must be repeated.
  • Remove ambiguity from the environment. If it’s not clear whom you should call when something breaks, or who has authority to take a network down, the wrong thing will happen.

The bottom line is that you want to minimize the points of control and contention, and the introduction of variables. The fewer things there are, the fewer things can break.

Since human error is a leading cause of downtime, one important way to improve availability is to reduce the number of mistakes that humans (administrators) make on critical systems. The best way to do so is to give them less opportunity to make mistakes (even honest ones). By making systems simpler, you do just that. Simpler systems require less administrative attention, so there is less reason for administrators to interact with them, and so less chance for them to make a mistake that brings the system down.
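To make the hostname advice above concrete, here is a minimal sketch of a check that flags names that are easy to confuse. It is an illustration only, not part of the original text: it treats two names as confusable when they have the same length and differ in exactly one character, and the themed names are hypothetical.

```python
from itertools import combinations


def differ_by_one_char(a, b):
    """True when two equal-length names differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def confusable_pairs(hostnames):
    """Return every pair of hostnames that is one character apart."""
    return [
        (a, b)
        for a, b in combinations(sorted(hostnames), 2)
        if differ_by_one_char(a, b)
    ]


# The four servers from the data center story.
cryptic = ["ha4pb56a", "ha4pd56a", "ha4pt56a", "ha4pv56a"]
# A hypothetical themed alternative.
themed = ["mercury", "venus", "earth", "mars"]

print(confusable_pairs(cryptic))  # every one of the six pairs is flagged
print(confusable_pairs(themed))   # no pairs are flagged
```

A check like this could run whenever a new host is named. It catches only one kind of confusion, so it complements the rule of thumb about pronounceable names and does not replace it.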
