I've mentioned mainframes and minicomputers previously -- but I've avoided talking much about them directly. Although personal computers and LANs can do many new things, many people assume that the only computer to use for really big applications is, of course, a mainframe. Conversely, a standard assumption about LANs is that there is a strict upper limit on the size and complexity of the work they can be trusted to do. This chapter examines these two assumptions directly. I'll tackle that long-time companion, the mainframe, to see what the future might suggest, at least from a technical perspective.
Historically, building business systems was easy: start with a mainframe, plug in terminals, and grow the mainframe to keep up with the demand. With the introduction of the minicomputer, people started talking about a new breed of distributed systems: plug in computers at remote locations where the work was being done and distribute the workload across many smaller computers instead of funneling it all through a single monster computer. Attractive as this concept is, and in spite of many attempts over the past 20 years to make it work, most large business systems today are still highly centralized. However, new high-powered personal computers up the ante. With PCs, servers, WANs, and LANs, distributed systems become even more attractive than in the past. The question is: attractive as they are, can people build distributed systems? If so, how and when?
In this chapter, I duck the question of how to build distributed systems. Instead, I will show what a distributed system would look like if it were built. Then in the next part of the book, I consider the business pressures that are forcing companies to develop distributed computer systems and the somewhat surprising shifts in design approaches. These new design approaches provide a clear road map for making these distributed systems practical.
Is a mainframe intrinsically different from a personal computer? Is there special magic built into it that provides huge speed or capacity advantages? Is this special magic rare enough that it can't be reproduced in smaller and less expensive computers?
From a hardware perspective, a mainframe is not all that different from a personal computer! Heretical? Yes. But also true. In fact, what a mainframe most closely resembles is a LAN.
Any mainframe has several major components. First is the computer itself. Historically, the computer inside a mainframe was much faster than a personal computer. This speed manifested itself in two ways:
- The raw speed, expressed in instructions per second, or cycles
- The amount of memory that could be addressed directly by a program
In the following sections, I discuss these two aspects of a mainframe in sequence.
In the chapter on LANs, I examined some of the scary implications of buying and upgrading expensive computers. One question I didn't answer is, how big can a computer get? And how expensive is it to make a computer big?
It is surprisingly hard to build really big computers. Mainframes cost millions of dollars, and supercomputers can cost many millions. However, neither of these big computers is thousands of times faster than personal computers. They cost literally thousands of times more than PCs, but they sure don't do thousands of times more work. The problem is that the price of a computer increases far faster than the power that can be built into it for the money. Another problem is worse: even with an infinite budget, there is a very real upper limit to the power of the biggest possible computer.
Airlines run into this problem every day. The central reservation system for an airline processes all of its work through a single computer. The rationale for this approach is that a passenger at any airport in the world can book flights that will take him to any other airport -- all within a 24-hour period. So it makes a lot of sense to just keep all of the information about flights, seat availability, and schedules in a single place. How does the central computer keep up with the resulting volume of requests streaming in
from all over the world? With great difficulty.
Many years ago, the largest airlines realized that capacity would be a huge problem for their central reservation systems. To face this challenge, they decided to write their applications in special ways that would squeeze every last ounce of computing power out of the poor, overloaded, central mainframes. So big airlines use:
- A custom operating system called Airline Control Program (ACP)
- Applications written in assembler, the lowest-level programming language
- Data stored not in commercial databases but in special storage systems (private databases written by the airlines for themselves) optimized for fast update and retrieval
All of this proprietary, custom software has a huge cost. And all of this is still not enough.
As a result, the major airlines tend to have standing orders with the major mainframe manufacturers. When a newer, larger, faster mainframe is ready for production, the mainframe company merely provides appropriate advance notice and then just ships the computer to the airline with no questions asked. Price is not a consideration; anything that will help the airline manage capacity crunch is used as soon as it is available.
Airlines are far from unique. Many important computational problems cannot be solved simply because there is no computer fast enough to do the job. For example, a weather simulation is only as accurate as the amount of data that can be contained in the model at any one time. Keeping track of wind patterns, temperature shifts, and pressure cells over the surface of any large part of the earth quickly exceeds the capacity of even the largest supercomputer.
As another example, modeling the structure of a car, truck, or airplane quickly brings even the biggest computer to its knees:

- Tracking how it will react to various stresses
- Taking into account all the interactions between the parts of the structure
Additionally, designing computer chips strains computers, too. Every advance in technology makes engineers depend even more heavily on massive computers to help them interconnect millions of elements in creative new ways.
Perhaps the most compelling examples of "computer crunchers" -- applications that eat big computers up and then spit them out exhausted -- are the animation and visualization programs used to create videos, movies, and advertisements. Building one dinosaur motion sequence in Jurassic Park -- lasting only seconds on the screen --
takes hours of computer time. Computers run smack into the brick wall imposed by the massive computational power and huge amounts of memory required to create very simple "virtually real" worlds. In the movies, the virtual worlds suddenly must be very large and quite real -- at least realistic enough to fool a critical audience. Although present animation techniques produce amazing cartoons and articulated robots, those techniques are still very far from modeling the real world.
Perhaps the class of applications that best characterizes mainframes is the one called batch processing. Every large organization runs periodic processes that operate on large amounts of data on a daily, weekly, or monthly basis, such as the following:
- Aging accounts receivable
- Updating credit ratings
- Computing optimum routes for fleets of delivery trucks
All of these are large, complex, heavily computational processes with two common characteristics. First, these processes manipulate large amounts of data. Second, to do the best possible job, the application often works with the entire database at one time. For example, the optimum schedule for a factory could involve juggling all the orders, sequencing delivery of parts in particular ways, and juggling the use of expensive equipment to keep that equipment constantly busy. The only way to develop that schedule is to look at the entire day's, week's, or month's production all at one time. These processes are exactly opposite to terminal-oriented applications. Instead of dealing with a relatively small amount of data, interacting constantly with a person, and doing work in small chunks, these applications process data in large batches. For that reason, this type of processing has come to be known as batch processing.
Originally computers did only batch processing. In the days of punched cards and printed reports (before terminals were invented), computers could only do one thing at a time. Each application ran straight through before the next application (or job, as it was called at the time) began. Batch processing was all there was.
People commonly assume that batch processing is a hangover from the past and that, over time, batch applications will disappear. Perhaps, but more likely not. The types of applications and processes just described are intrinsically batch-oriented in nature.
Some of today's batch processes will undoubtedly be redesigned as tomorrow's on-the-fly processes. But other processes will remain batch, particularly those that involve large-scale optimization across big parts of an organization.
Batch processing is important for two reasons. First, virtually every large organization has large amounts of batch processing. Second, batch processing is handled best by the mainframe.
In many companies, batch runs define the size of the mainframe needed just as much as the on-line demands placed on the central database. Typically, batch processes are scheduled on a calendar. Billing happens on the first and the fifteenth of the month; receivables age on the tenth and the twentieth; bills are printed and mailed every Thursday; and so on. Each batch process is carefully timed, and the average run time is determined. These run times are all overlaid on the available batch windows. When the run time exceeds the available window, you've got a problem. So what is this mysterious batch window?
The batch window is the period of time when the mainframe can process the intense computations of a batch job. Running batch applications during the working day would create two serious problems. First, user response time is likely to suffer. Second, many batch processes require exclusive access to the entire database. For example, scheduling a factory might assume that certain parts are in stock. When the scheduling program makes this assumption for any given part, the inventory of that part has to be locked down until the schedule is finalized. However, if the entire plant were being scheduled, then pretty soon all the inventory would get locked up and any other applications that require access to parts wouldn't run. For these two reasons, many batch jobs require a batch window to provide exclusive access to the computer and its data. And because that access must be exclusive, the batch window must be limited in size to minimize interruptions to users. In fact, the smaller the window, the better. Over time, users tend to rely more and more on constant access to their data and resent long periods of unavailability.
Unfortunately, batch processes take a long time. Furthermore, when a batch process fails, generally it must be restarted from the beginning. So, allowing for the possibility of failure, individual batch jobs must occupy less than half the available window. That way, if they must be restarted, there's enough time to run them again. And there's the twist.
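The restart rule above can be expressed as a quick feasibility check. This is a hypothetical sketch; the function name and the sample run times are mine, not from any real scheduling tool:

```python
# Hypothetical check: a batch job must occupy less than half the batch
# window so that one failure still leaves enough time for a full rerun.
def fits_batch_window(run_hours, window_hours):
    """Return True if the job can fail once and still finish in the window."""
    return 2 * run_hours < window_hours

# An 8-hour overnight window:
print(fits_batch_window(3.5, 8))  # True: 3.5h run + 3.5h rerun = 7h < 8h
print(fits_batch_window(4.5, 8))  # False: a single failure would overrun
```

The same check, run over every job on the batch calendar, shows immediately when a growing process is about to outgrow its window.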
As companies grow and their processing requirements get more sophisticated, batch processes tend to grow, too. Pretty soon, fitting the batch processes into the available windows becomes a real problem. Although many companies feel safe because they don't run global reservation systems or construct complex virtual animated worlds,
they still have batch processes. Consequently, the growth of batch processing has been one of the reasons for the continuing need for mainframes.
Many companies constantly struggle with a fixed batch window, a computer that is not fast enough, and continual pressure for new applications that could add value if only there were enough batch cycles to run them.
On an industrial scale, companies need very big computers -- bigger than can be built -- to carry out a variety of industrial-strength applications. Central database servers must be able to

- Track passengers and customers worldwide
- Constantly present up-to-date information to anybody who asks
- Instantly allocate scarce resources such as airplane seats and credit dollars
Complex modeling processes such as weather prediction and movie animation will strain even the biggest supercomputer. And even for companies that avoid these two classes of problems, batch processing -- that mainstay of classical computing -- will constantly cause people to ask for more, bigger, and faster computers.
Although big is important, it's not the whole story. Big computers are needed, but small computers are, too. To make the picture complete, all the sizes in between are needed. Here are some critical questions for '90s computing:
- Mainframes are great for big jobs, and personal computers work well for little tasks, but what's in between?
- Even if you could answer that question, isn't it complicated having to deal with two completely different kinds of computers when building applications?
- Last but not least, aren't mainframes really expensive, and isn't there some way to do the same thing less expensively?
As I'm about to show, the answers to these questions revolve around the concept of scalability. Once systems are fully scalable, all three questions have complete and satisfying answers.
What is scalability? The word is not in the dictionary; once again, the computer industry has invented a term to meet a perceived need. However, the root of scalability is scale, which The American Heritage Dictionary defines in terms of a progression of steps or degrees. Therefore, scalability is the ability to gracefully adapt to changes in size over a series of steps.
In the United States, there are over 11 million business locations or establishments: physical buildings or offices in which people work on a full-time basis. Many of these properties are owned by small businesses with only a few employees. However, many properties are owned by big companies. The mention of a big company, such as Coca-Cola, conjures images of big buildings such as skyscrapers, monster factories, and warehouses bigger than football fields. Servicing the needs of these large facilities calls for big computers. However, even the biggest companies have many locations with only a few employees: remote sales offices with two salespeople, small warehouses with a dozen staff members, regional repair centers housed in two rooms, and so on. This combination of big, medium, and small operations creates the scalability challenge. The challenge calls for not only big computers and small ones -- but also all the sizes in between. The computers, the disks that attach to them, the applications that run on those computers, and the operating system software required to make it all fit together -- all of these must be scalable to meet the needs of all sizes of companies, offices, and locations.
Imagine a large manufacturing company with many warehouses. In that company, the largest product distribution center may be several stories tall, cover as much land as several football fields, and operate on a completely automated basis. This warehouse may be blacked out; robot forklifts, automated conveyer belts, and computer-controlled picking systems handle the goods. No lights are required because no humans normally work in the facility. Obviously, such a facility would require massive computer power. Each movement in the warehouse would be tracked by the computer. In fact, were the computer to slow down, so would the warehouse. And if the computer were to break or stop, the warehouse would follow right behind. Such an environment seems made-to-measure for a huge computer such as a mainframe. However, it would be nice if the operation of the entire plant weren't tied so intimately to the health of a single machine. In a big distribution center, the obvious solution is to have two or more mainframes backing each other up. Fine. But what about smaller warehouses?
Suppose that the same company had regional warehouses that are still heavily automated but each contains a medium amount of inventory. Such a facility just can't justify the expense of a multimillion-dollar mainframe. Yet, if a smaller computer were used, would this mean that programmers would have to rewrite the software from the large warehouse to be usable in the medium-sized warehouse?
Finally, suppose that you are willing to write that software twice, once for a big center and once for a medium-size center with its medium-size computer. What about the small warehouse? The company may also have a network of small local centers that hold spare parts, products in heavy demand in local markets, and goods for walk-in customers. These little centers occupy a few thousand square feet and are largely not automated. Yet they still require computer systems to track inventory, orders, shipments, and so on.
Suppose that the computer systems at various locations must be integrated with one another so that products may be transferred back and forth without onerous and redundant paperwork. Does this mean you must maintain three completely incompatible computer systems and keep all three systems up to date with each other?
The same company could also have scalability problems in its sales organization. In major cities, the branch sales office may have hundreds of staff, handle complex orders, and require a large computer system to keep it all going. A similar office in a secondary city could have 10 to 20 salespeople and some support staff and need a moderate-size computer system. In outlying areas and new territories, the company would need a way to support solitary salespeople and offices with a staff of 5 to 10 people.
Many other parts of the organization have the same problem: accommodating extreme variations in the scale of computer power required within the same department. Recently, the desire to support mobile workers has aggravated the scalability problem.
Until recently, a company's computers were always located in its own offices and plants. However, a great deal of business activity takes place outside the company's offices and plants. Deliveries, sales calls, support activity, and site inspections all take place at the customer's offices. Why shouldn't the computer be there to support all the activity taking place where the customer is? Doesn't this make sense particularly in a customer-focused world? Of course, with laptop and notebook computers, the computer should be there. It makes tremendous sense for both efficiency and customer focus. With a notebook computer at a customer site, a sales representative can
- Book the order on the spot
- Confirm the price down to the last penny
- Eliminate any re-entry of the information

With a computer on every delivery truck, a company could

- Track the delivery route
- Pick up new bills of lading electronically
Mobile computing brings the problem of scalability into acute focus. Small companies need small computers; that's not surprising. Big companies need small computers just as much, which is more surprising. Big companies need big computers, too. The problem is that big companies need both big computers and small ones and all the sizes in between. To make things worse, all those various sizes of computers must work with each other seamlessly and compatibly. Only by having all sizes of computers work compatibly does it become possible to write applications -- for distribution, sales support, manufacturing, service, and so on -- so that just the right size of computer can be picked for each location, knowing that after the computer is picked, the application will be there to run on it.
Isn't there some way to build a computer that can be either big or small, depending on how it's put together? I'll approach that question in two steps. First, I'll dive down a level and see what makes a big computer big. What makes mainframes so fast? What makes them so expensive? Why do these factors imply strict upper limits on how big mainframes can get, even if cost is no object? Then I'll consider various alternative approaches to the problem of building big computers. These other approaches will in fact lead directly to a solution to the scalability problem that builds, as you'd expect, on the technology you've come to know (and love?) in the previous chapters.
Computerworld was the first major newspaper of the computer world; it's been in print for over 20 years. Herb Grosch was its original editor. For much of Computerworld's life, IBM not only dominated but also virtually owned the computer marketplace -- with over 80 percent market share. Herb noticed that IBM's pricing model made it very attractive for customers to buy ever bigger computers. In retrospect it's a little hard to tell whether the pricing model was driven by the cost of the underlying hardware or whether it was established by IBM as a mechanism to encourage customers to upgrade.
In any case, in the '60s and '70s, the raw speed of a computer was directly proportional to the square of its cost. In other words, spend twice as much on a computer and get four times the speed. Spend three times as much; get nine times the speed, and so on. That's Grosch's Law. Commercially, IBM's pricing model resulted in several strange artifacts. For example, companies that could not afford a really big computer got together with other companies and shared a single large computer. Sometimes one company would buy a big computer and sell a fraction of it to other companies. Other times, entrepreneurs would start up a service bureau and sell computer time to big companies at better rates than those companies could achieve on their own. And all of this was based on Grosch's Law.
At any given point in time, there are a variety of technologies available for building computers. These technologies can involve radically different transistor technologies, better circuit design, techniques for making chips run faster (perhaps at the cost of higher power consumption), and so on.
To apply Grosch's Law, IBM had to accomplish two things: invest in multiple techniques to build a series of successively faster computers and design a family of compatible computers so that customers could move from one computer to another without rewriting applications.
Before 1964, each new type of computer was different from all the types before it, irrespective of who built the computer. IBM and other companies constantly pushed the state of the art to develop faster and bigger computers. But moving to one of the new machines was always a traumatic experience because it required customers to rewrite all their applications. However, in 1964 IBM announced the product line that changed the computing world: the 360 family. The idea behind families is simple: design a line of computers, all built differently, but all with the same instruction set.
There are many ways of building a computer. Yet in the end, the instruction set defines how that computer works. An application ultimately consists of computer instructions that tell the computer what to do. So two vastly different computers will run a certain application identically as long as they both have the same instruction set. Thus, when IBM designed the 360 family of computers to all have the same instruction set, they were offering customers, for the first time, the opportunity to move applications from one computer to another without any conversion or rewriting. As time passed, IBM continued extending the 360 series, adding both bigger and smaller systems, but always maintaining instruction set compatibility. In fact, today's large IBM mainframes still have the same core instruction set as the original 360s; that 360 family continues to live on 30 years later.
In the '70s, DEC used the same concept in the VAX family. Today VAX models range from desktop microcomputers all the way to million-dollar mainframe model systems, all with the same instruction set.
In the '80s, Intel took a page out of history with the introduction of the X86 family of microprocessors, the central processing units (CPUs) of DOS-based personal computers. Examples of the X86 processor include the 8088, 8086, 80286, 80386, 80486, and now the Pentium chips. This family, stretching from the original 8088 all the way to today's Pentium, has continued to add speed and functionality while still guaranteeing that every application written for the earliest processors in the family continues to run. In fact, if it weren't for Intel's strict adherence to the family concept, the IBM PC, DOS, and the whole personal computer software world would not be what it is today. By giving the X86 family a capable, powerful instruction set, Intel could plan a family of systems stretching through the '90s. Again, the family concept allows Intel to take advantage of new technology and build computers with substantially enhanced power every two to three years while promising to maintain the instruction set compatibility so crucial to stable business systems.
Suppose that a company faced a choice between buying several small or medium-size 360 mainframes versus buying one much larger 360, located centrally and shared among the various locations. You guessed it: in the past, Grosch's Law guaranteed that the single larger computer would always be the better choice. The economics of Grosch's Law made the case for centralization compelling:
- Even at the expense of extra complexity in operating the larger computer
- Even at the loss of local control and flexibility
- Even at the expense of being dependent on the single central site staying available (never breaking)
Also, the argument involved more than just dollars and cents. Suppose spending three times your current computing budget yields nine times as much power. You would not only save a lot of money compared to buying three computers, but also have more computer power to work with. Nine times as much power in one box is considerably more than three times as much in three boxes. Even four times in one box is a knockout compared to twice as much in two boxes. So the combination of the 360 family concept, IBM's ability to build very big boxes, and the economics inherent in Grosch's Law created the economic incentive for centralization of computer facilities in large companies.
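The arithmetic behind this argument can be sketched in a few lines. This is a hypothetical illustration; the function name, the unit-free "power" figures, and the proportionality constant are mine, chosen only to mirror the text's examples:

```python
# Under Grosch's Law, computing power grows with the square of cost.
# Compare one big computer against several small ones at the same total spend.
def grosch_power(cost, k=1.0):
    """Relative power for a given spend (k is an arbitrary constant)."""
    return k * cost ** 2

budget = 3.0  # three times a baseline computing budget

one_big = grosch_power(budget)              # all the money in one box
three_small = 3 * grosch_power(budget / 3)  # the same money in three boxes

print(one_big, three_small)  # 9.0 3.0: one big box triples the power
```

The same comparison holds at any budget: splitting a fixed spend across N machines divides the total power by N, which is exactly why Grosch's Law made centralization so compelling.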
What is all this central power that mainframes provide so well? What does it mean to say that one computer is 2 or 10 (or 100) times faster than another? Let's see what computer power is really about.
The first measure of a computer's power is simply speed: the number of instructions per second that the computer can complete. When measuring an engine or motor, people talk about the number of times the machine rotates per second: the number of cycles completed each second. Similarly, in a computer, the CPU performs a number of steps to process each instruction; each such step takes one tick of the CPU's clock, called a cycle. Computer professionals often discuss computer speeds in cycles per second (one cycle per second is one hertz). However, because computers operate so quickly, computer professionals talk about millions of hertz, or megahertz (MHz). Thus a 66 MHz computer completes 66 million cycles per second.
So the first measure of a computer's power is processing speed in MHz. All other things being equal, a 66 MHz computer generally finishes tasks about twice as fast as a 33 MHz computer. According to Grosch's Law, spending twice as much on a computer means you'll either be able to do four times as much work or complete most critical batch and database tasks in one-quarter of the time. As long as all other factors are the same, the faster computer will really do proportionately more work than the slower one. The problem is that all other factors don't stay the same.
For example, if pure speed on a single task were the only measure of performance, then personal computers would have left mainframes in the dust several years ago. Yes, in MHz, most big mainframes crank through more instructions each second than most personal computers do. But not that many more. In fact, today's personal computer grinds through as many instructions per second as the leading mainframe of only two to four years ago. If it's a race, then mainframes -- at a cost thousands of times as high -- have at best a four-year lead on personal computers in processing speed. Today's fast mainframe will be beaten by tomorrow's PC, and that PC will take just two to four years to do the catching up. So there must be more to power than just speed.
The second measure of power is address space. Just like people, computers must keep the information they work with nearby. Some of this information is kept on disk, in permanent form. But the data used most frequently in the course of running an application is kept in main memory. The term main distinguishes this memory, also called random-access memory (RAM), from the much slower disk memory. As the computer uses main memory, the application must refer to the various bits and pieces of information stored there. For this reason, memory is organized into fixed-size units called bytes. Each byte has its own address. These addresses start at 1 and range up to the total size of the memory installed on the computer. When you buy a personal computer with 5MB of memory, you have 5 million bytes of memory, and the computer reads and writes those bytes using addresses from 1 to 5 million.
A computer's potential throughput is limited mainly by two factors:
- The amount of physical memory (RAM chips)
- Its address space
The computer's physical memory is determined by the number of RAM chips that can be plugged into the computer's motherboard.
However, the address space of a computer is defined by its instruction set. Each computer instruction consists of two parts: the instruction itself (add, multiply, and so on) and the address of the data the instruction will be applied to. In building the instruction set, the designer allocates a certain number of bits to the address. This number of bits puts a strict upper limit on the amount of memory the computer can use efficiently. Therefore, address space is the amount of memory (in bytes) that a computer's instruction set can ever use.
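The relationship between address bits and addressable memory is simple arithmetic: n address bits can name 2^n distinct byte addresses. A minimal sketch (the function name is mine, for illustration):

```python
# n address bits can distinguish 2**n byte addresses, which puts a hard
# ceiling on how much memory the instruction set can ever use.
def address_space_bytes(bits):
    """Maximum addressable memory, in bytes, for a given address width."""
    return 2 ** bits

for bits in (16, 24, 32):
    print(bits, address_space_bytes(bits))
# 16 ->        65,536 bytes (64K)
# 24 ->    16,777,216 bytes (16MB)
# 32 -> 4,294,967,296 bytes (about 4 billion bytes)
```

These three widths correspond to the generations discussed in this chapter: the 16-bit early personal computers, the 24-bit 80286, and the 32-bit mainframes and 386-class machines.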
Early personal computers had a 16-bit address space. With a 16-bit address space, the largest amount of memory a computer can use is 65,536 bytes (64K). So programs written for the Apple II, early TRS-80s, and other machines of that time just couldn't deal with more than 64K of memory. Sixty-five thousand sounds like a big number until you start to build a reasonable-size spreadsheet. Even with early PC spreadsheet programs such as VisiCalc, many users found their spreadsheets outgrowing the capacities of their machine.
Memory limitations are frustrating because often the CPU still has enough speed to keep going when it hits the address space wall. I recall overhearing several conversations in computer stores in which customers mistakenly thought that their spreadsheet could grow if they plugged in more memory chips. Yet the issue was not the physical memory the computer had; rather, it was the fundamental capabilities of the computer's instruction set itself. On a 16-bit computer, the customer's spreadsheet couldn't be made bigger, no matter what the customer did -- short of moving to a completely new computer with a larger address space and a new spreadsheet that supported it.
Mainframes ran into the address space problem many years ago. Therefore, for over a decade, every serious mainframe instruction set has been based on at least a 32-bit address space: enough to reach out to over 4 billion bytes of memory. Eventually, this too may become a limitation. But at least through the early 1990s, the 32-bit address space defines the state of the art.
Is this what makes mainframes different from smaller computers? Originally, yes. Until recently, mainframes indeed had 32-bit address spaces, and most personal computers did not. The Intel 8088 and 8086 CPUs inside all the early IBM personal computers featured a somewhat extended 16-bit address space, capable of addressing just under 1MB of memory. This was (and still is) the famous 1MB DOS limitation.
The Apple II was even more limited. But Apple quickly leapfrogged Intel by selecting the Motorola 68000 as the basis for its Macintosh (the Mac). As a result, the Mac is essentially a 32-bit computer, just like mainframes. And, thanks to Apple and Motorola, Mac users have never had to face the same "RAM cram" problems as DOS users.
In 1984, Intel and IBM first broke the 1MB barrier with the introduction of the 80286 and the PC AT. The 286 offered a 24-bit address space, allowing programs to reach out to 16MB of memory at a time. Finally, in 1986, with the 386, the IBM-compatible world had a full 32-bit machine.
So originally one major difference between mainframes and personal computers was the address space. No matter how fast the CPU in a personal computer was, it simply couldn't address enough memory to handle the really big problems being run on mainframes. That era has come and gone.
Although personal computers now have address spaces as big as their mainframe siblings, PC applications still don't take as much advantage of those address spaces. Several factors determine how much address space an application can use. First, the computer has to have enough memory. Until recently this was a function of cost. In the 1960s, memory was so physically large and expensive that no computer had enough memory slots for even 1MB. By 1970, mainframes with over 1MB could be found, but that much memory literally cost over $1 million for the memory alone. Even in 1981, when the IBM PC was announced, memory was still so expensive that 16K machines were common, and 256K was considered a huge amount of memory. So until recently, mainframes had large amounts of memory and personal computers didn't because of cost. That's not true anymore.
[Ed. Note: Remember, these are 1995 costs.]
Today, memory costs $30 to $50 per megabyte. Personal computers routinely arrive from the manufacturer with 4MB or even 8MB preinstalled. Bumping this figure up to 16MB or 24MB is an everyday affair even in the home. Servers with 20 to 30MB are the rule. In high throughput applications, getting up to 50MB of RAM is a $2,000 decision. So memory cost no longer creates an incentive to go with mainframes on which users can share memory. And when it comes to the technical architecture required to support truly huge amounts of memory, Digital Equipment Corporation (with its Alpha) and Silicon Graphics (with its MIPS chip) now ship personal computers with 64-bit address spaces capable of handling not just billions, but trillions of bytes of memory.
Last in the list of memory considerations is operating system support. Here's an area where mainframes do have an edge, although only a very slim one. Until 1990, personal computers really didn't make as effective use of memory as mainframes. Until that time, memory was one of the significant reasons to run big applications on mainframes. Yet even after memory became affordable and hardware instruction sets supported it, software companies still had to rewrite PC operating systems to take advantage of the newly found 32-bit address space. And that process is still underway.
Even today, one reason to favor mainframes over smaller computers for big jobs is that mainframe hardware and operating systems support large amounts of memory and large address spaces. However, in most ways that really count, both PCs and UNIX workstations have supported the memory and the address space needed by big applications for some time now. Yes, some database servers ran out of gas with only 16MB of memory, but patches allowed them to add memory up to 50 or 75MB -- more than enough to meet the needs of the database. What's more, both 32-bit OS/2 and 32-bit Windows NT have been around for some time now.
So until recently, mainframes were more powerful than PCs in terms of memory, address space, and operating system support for 32-bit address space. No more; now personal computers have caught up, and as they progress over time to 64 bits, personal computers may even pass by the mainframes. In the next section, I compare permanent storage in mainframes and PCs.
Mainframes have very large amounts of permanent storage. In fact, much of the cost of a mainframe installation goes to the disks, tape drives, and other forms of permanent storage. Where personal computers typically have single hard disks measured in tens or hundreds of megabytes, mainframes generally have dozens of storage devices and the measurement runs from hundreds of gigabytes to many terabytes. (A gigabyte is 1 billion bytes. A terabyte is 1,000 gigabytes.)
Mainframe storage systems are not only large but very sophisticated as well. Rather than just use a single type of disk drive, selected for either speed or capacity, a range of devices is chosen, and information is migrated automatically. Frequently accessed information is kept on special disk drives that, although expensive and relatively small, are very fast. Less frequently used data is kept on slower disk drives with more capacity. Information that is only used occasionally is kept on special, robot-driven, mass storage systems, or MSSs. An MSS looks like a large honeycombed wall with hundreds of small hexagonal cavities each holding a small roll of magnetic tape. In fact, the MSS consists of two such walls with several robot arms moving back and forth between them. As the computer asks for particular bits of information, the robot arms select the correct roll of tape, pull it out of its cave, mount it in a tape reader, and arrange to transfer the information to a disk.
The most amazing aspect of mainframe storage systems is not the individual bits of technology, but the extent to which the whole thing works automatically. Frequently used information appears on fast disk drives as if by magic. Later, as the same information is used less frequently, the system transparently transfers the data first to slower disk drives, then later to the MSS, and perhaps eventually even to a magnetic tape stored in a cabinet -- all without ever losing track of any data. While all of this is going on, the system also automatically backs up all data when it's created, whenever it changes, and whenever it moves. Therefore, in case of a catastrophe, the entire database can be re-created quickly from tapes stored at other physical locations called backup sites.
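The migration logic can be sketched in a few lines. The following toy Python model (entirely hypothetical -- real mainframe storage managers are vastly more elaborate) captures the basic idea: new and recently touched data lives on the fast tier, and untouched data sinks one tier per aging sweep:

```python
# Toy model of automatic storage migration across a hierarchy of tiers.
TIERS = ["fast disk", "slow disk", "mass storage"]

class StorageManager:
    def __init__(self):
        self.location = {}   # dataset name -> tier index (0 = fastest)
        self.accesses = {}   # dataset name -> accesses since last sweep

    def store(self, name):
        self.location[name] = 0      # new data starts on the fast tier
        self.accesses[name] = 0

    def read(self, name):
        self.accesses[name] += 1
        if self.location[name] > 0:  # hot data is promoted back to fast disk
            self.location[name] = 0
        return TIERS[self.location[name]]

    def age(self):
        # Periodic sweep: demote datasets untouched since the last sweep.
        for name in self.location:
            if self.accesses[name] == 0:
                self.location[name] = min(self.location[name] + 1,
                                          len(TIERS) - 1)
            self.accesses[name] = 0
```

After two sweeps with no reads, a dataset ends up in mass storage; the next read pulls it back to fast disk, just as the mainframe's "magic" does.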
The benefits of mainframe permanent storage systems -- the scale of storage, the range of storage devices, the transparent migration, and the automatic backup -- are critical if a large organization is to depend on its databases as its primary information and transaction storage medium.
Until very recently, personal computers and even UNIX workstations could not come close to the size and sophistication of the mainframe permanent storage systems. Mainframe users routinely talked about storing terabytes of data. However, until five years ago, the largest disk that could be attached to a PC was under 1 gigabyte (1GB) -- 1,000 times smaller than a one-terabyte file. Server disks were slow, and backup generally meant either of the following:
|Using floppy disks ("Is this a joke?" the mainframers would ask)|
|Transferring data very slowly to cartridge tapes capable of holding 250MB of data|
Storage management and backup technology was inadequate by mainframe standards.
In the last five years, PC storage technology has changed and improved at a very rapid pace. One driving force has been disk technology. Once a 10MB disk was standard on the IBM PC XT. Today drives from 540MB to 1GB are routine. Disks not only have gotten bigger in capacity, they've gotten smaller in size, considerably faster, and very inexpensive. Notebook computers costing under $4,000 and fitting easily into a briefcase contain 500MB disk drives. Ironically, as disk drives get smaller, they get faster, too. Making disk drives fast is a mechanical issue -- a struggle with the physics of moving disk heads rapidly from one track to another. However, the smaller the disk drive, the less the head weighs and the shorter the distances it travels. Presto! The same pressures that lead to more desirable smaller weights and dimensions also lead to better retrieval times.
Ultimately, advances in disk drive technology itself have made it possible for personal computer storage systems to catch up with mainframe storage systems. Furthermore, in a development that parallels the LAN revolution, database developers created a new approach to handling large amounts of data.
Historically, bigger, faster databases required bigger, faster disks. The biggest, fastest disks were devices costing tens and hundreds of thousands of dollars. Until recently, these expensive disks were considered just part of the high cost of having fast mainframe databases.
In the late 1980s, RAID technology changed that picture permanently by providing mass permanent storage for PCs. RAID stands for Redundant Arrays of Inexpensive Disks. In a RAID system, a single large, expensive disk drive is replaced by a bunch of smaller, inexpensive disk drives. A dedicated computer built right into the RAID system box translates requests from the user's computer so that the application appears to be communicating with a single big disk. In fact, one or more of the smaller drives performs the retrieving ("read") functions and saving ("write") functions. This is good for three reasons:
First, a group of small drives that collectively provide a certain capacity are still substantially less expensive than a single large drive of the same capacity. So RAID saves money. In a mainframe environment, a well-built RAID system might cost one-quarter to one-half as much as an equivalent big disk. Two to four times as much storage for the same money is a pretty compelling argument all by itself.
Perhaps most surprising, RAID is often much faster than an equivalent large disk drive. The biggest performance bottleneck in a large disk drive is the delay waiting for the disk head to get to the desired track. Mainframes typically use very sophisticated software to minimize the effect of this delay. For example, by collecting incoming requests from many users, the mainframe can rearrange the requests so that the disk head moves back and forth as few times as possible. That means less work for the disk drive -- but user requests are kept waiting while the rearrangement takes place.
The RAID system replaces the single big disk -- with its single disk head -- with a group of smaller disk drives, each with its own head. Even though each of these disk heads may be slower than the single bigger head, there are many of them. So the RAID system can retrieve information from many disks in parallel. RAID systems can be much faster than a single large drive.
RAID systems can be more reliable, too. In fact, they can even exhibit some degree of fail-safe operation. The R in RAID stands for redundant. Because small disks are so much less expensive than large disks, it becomes practical to build extra disks into every RAID system. System designers can make these extra disks redundant so that they store information that is also being stored on other disks in the same RAID system. Therefore, a RAID system can run without interruption even if a whole disk drive breaks down. No information is lost either. Because the information was stored in two places, the system is smart enough to simply get it from the disk that's still functioning when needed.
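The redundancy idea can be shown with a toy model. The sketch below (illustrative only; real RAID controllers work in hardware and support several redundancy schemes) mirrors every block onto two disks, so a read survives the failure of either one:

```python
# Toy sketch of RAID-style mirroring: every block is written to two disks.
class MirroredArray:
    def __init__(self):
        self.disks = [{}, {}]          # two small, inexpensive "disks"
        self.failed = [False, False]

    def write(self, block, data):
        for disk in self.disks:        # redundant write to both disks
            disk[block] = data

    def read(self, block):
        for i, disk in enumerate(self.disks):
            if not self.failed[i] and block in disk:
                return disk[block]     # any surviving copy will do
        raise IOError("all copies lost")

array = MirroredArray()
array.write(7, b"customer record")
array.failed[0] = True                 # a whole drive breaks down...
assert array.read(7) == b"customer record"   # ...and no data is lost
```

The price of this safety is visible in the `write` method: every save costs two physical writes, which is exactly the write overhead discussed below.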
RAID has its drawbacks, too. Although RAID systems retrieve information faster in many cases, writing information back to disk in some cases can slow down appreciably because of the extra overhead associated with redundancy. And although a RAID system as a whole is very reliable, repair frequency for individual components may go up because the systems have so many more components. RAID is not perfect. Nonetheless, it represents a huge step forward. In fact, it's even safe to say that in many ways, RAID is revolutionary.
System designers can use RAID to build arbitrarily large disk storage systems by simply including enough small disks. The resulting systems will be cheaper and faster than today's mainframe storage and will contribute fail-safe operation to boot. Of course, there's no rule that says RAID can't be used in conjunction with mainframes just as easily as with PCs. That's not the point. The point is that large, high-throughput, highly reliable storage systems are now no longer limited to use only in mainframe computer rooms due to extreme costs or size.
Particularly with the introduction of RAID systems, LAN networks of personal computers began to provide users with highly tiered, hierarchical forms of storage.
Tiered storage on a LAN is pretty sophisticated. It may seem very ad hoc and unreliable compared to the carefully controlled mainframe environment; a personally driven system can look happenstance and uncoordinated. However, nothing about LANs and PCs says that tiered storage on a LAN has to be that way.
Lotus Notes is one example of a system that provides automatic tiered storage by performing a function called replication, first described in Chapter 7. Replication is not unique to Notes and was not even invented by Lotus. However, Lotus employed replication techniques thoughtfully in designing Notes. As a result, Notes can provide a great deal of organizational information to people who can't connect frequently to central computers. Potential Notes users could be salespeople on the road, support staff in a remote office, or the entire staff of a branch located in an area with poor communications facilities.
In a replicated environment, many copies of the information must be distributed all over the network. For example, each salesperson carries around the database describing his customers, their orders, and shipments all on the notebook computer under his arm. Of course, while the salesperson may have a copy of that data, many other copies of the same data exist in the central computer, at the regional office, and on the personal computers of many other sales and support people around the company. What the Notes Replicator does is keep all those copies up-to-date. Perhaps this process of keeping those copies up-to-date sounds simple. In fact, it puts most mainframe storage management systems to shame in terms of sophistication.
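One common way to keep copies up-to-date -- a simplification of what real replicators do -- is to stamp each record with a version number and, when two copies synchronize, keep the newer version of every record. A rough Python sketch (hypothetical; real systems such as Notes also handle deletions and conflicting updates):

```python
# Each copy of the database maps a record key to a (version, value) pair.
# Synchronizing two copies keeps whichever version of each record is newer.
def replicate(copy_a, copy_b):
    for key in set(copy_a) | set(copy_b):
        a = copy_a.get(key, (0, None))
        b = copy_b.get(key, (0, None))
        newest = max(a, b)             # higher version number wins
        copy_a[key] = copy_b[key] = newest

# The salesperson's notebook and the regional office have drifted apart:
laptop = {"acme-corp": (2, "ordered 40 units")}
office = {"acme-corp": (1, "ordered 10 units"), "bigco": (1, "new customer")}
replicate(laptop, office)
assert laptop == office                # both copies now agree
assert office["acme-corp"] == (2, "ordered 40 units")
```

Run pairwise across a whole network of copies, updates "ripple out" from wherever they were made, exactly the behavior described above.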
At one time, Notes replication stood alone in the PC environment. Today, however, both replication and hierarchical storage management have become hot topics; several vendors offer products that provide these facilities. Virtually all the major database vendors are now shipping replication facilities. Most vendors, including Sybase, Informix, and Computer Associates (CA), limit updates to occurring at a single site, making the service nowhere near as interesting as Notes. Oracle, however, on the server side allows updates to occur on any database and percolates the changes automatically. On the desktop side, Microsoft's Access replicates changes across both desktops and servers in a fashion very similar to Notes.
The kind of replicated database architecture I've just talked about performs very sophisticated self-adaptive storage migration. Records are changed at individual locations, and those changes ripple out gradually to reach the rest of the network. In addition, several vendors also offer hierarchical storage management products, so the user has a complete set of choices.
The replicated database architecture represents intelligent networking at its best:
|Managing a storage hierarchy turns out to be meaningful in a personal and group setting, not just in a big mainframe data center. Personal computers are leapfrogging the mainframes and providing benefits never dreamed of in the days of big computers. Users can choose between hierarchical storage management and replication, using them in whatever combination best meets the needs of the business.|
|Mass storage was previously the preserve of the mainframe. But today, mass storage has moved to PC environments as well.|
Not very long ago, mainframes had much more storage, faster storage, and better-managed storage than PCs. And if you require a large, centralized database management facility, mainframes still offer benefits over PC-based storage. However, the mainframe's advantages are rapidly disappearing with the introduction of large PC servers, RAID storage, and 32-bit operating systems for PCs. What's more, if you start to consider distributed databases, PC-based storage has leaped ahead of the mainframe. Replicated databases are the state of the art for this class of application -- and available only in the PC environment.
The next dimension of power that comes to mind is throughput. More than just doing things fast, throughput deals with processing as many tasks as possible in any given unit of time. However, to maintain adequate throughput, a mainframe faces one class of problems for realtime transactions and a somewhat different class of problems for batch programs. The following sections explain further.
Historically, throughput has clearly distinguished mainframes from other classes of computers. Granted, supercomputers have always had bigger main memories and faster processors, and specialized database machines have boasted large arrays of disks. But when it comes to processing vast quantities of work, the mainframe stands alone even today. What is it about throughput that requires the specialized capabilities of a mainframe?
In business processes, realtime transactions are relatively small tasks that process business requests precisely when they occur. A transaction can move inventory from one location in a warehouse to another, transfer funds between bank accounts, reserve a seat on an airplane, or cause a box to be loaded onto a truck. Each of these tasks by itself involves very little computer processing. However, in a typical big organization, dozens or even hundreds of these small transactions can occur every second. Transactions are like the synapses of a business system.
Likewise, the database is the organization's central memory. In a database management system, a transaction is defined as a series of operations that, if completed, will always leave the database in a consistent state. That is, if you start with a consistent database and then execute a transaction, the database will again be consistent after the entry is completed. One of the primary benefits of databases is that application programmers, by structuring applications around transactions, can be assured of the database always being in a consistent state. The only caveat is that the mainframe and its software must then guarantee that every transaction will run to completion to make this all come true.
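The all-or-nothing guarantee is easy to illustrate. In this simplified Python sketch (not how any real database is implemented), a funds transfer either completes both of its steps or restores the snapshot taken before it started:

```python
# A transaction must leave the accounts consistent: debit one account and
# credit another -- both steps or neither, never just one.
def transfer(accounts, source, target, amount):
    snapshot = dict(accounts)          # remember the consistent state
    try:
        if accounts[source] < amount:
            raise ValueError("insufficient funds")
        accounts[source] -= amount
        accounts[target] += amount     # both steps done: transaction commits
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # failure: roll back to the snapshot
        raise

accounts = {"checking": 100, "savings": 0}
transfer(accounts, "checking", "savings", 60)
assert accounts == {"checking": 40, "savings": 60}
```

If the second transfer fails partway (say, for insufficient funds), the rollback leaves the accounts exactly as they were, which is the consistency guarantee the text describes.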
Why is it so hard to guarantee that all transactions will run to completion? Normally, computers and programs run without interruption. However, in the real world, the power can fail, software errors can occur, the computer hardware can break down, or an operator error can bring the system to a halt. Nonetheless, the mainframe operating system (with the help of the database) must still guarantee the successful operation of all transactions. How can it do this?
The mainframe performs its special magic by remembering:
|What tasks are running|
|How far each task has progressed|
|How each task got there|
Just as a juggler can keep track of three, five, or even ten balls, the mainframe keeps track of every task being processed.
What happens after the mainframe experiences a serious problem? The mainframe knows enough about each transaction in progress to put things back the way they were before the problem occurred.
Suppose that the mainframe loses all power. Later, when the power comes back up, the mainframe will perform two general functions to recover from the system failure:
|Undo (roll back) every transaction that was still in progress when the failure occurred, restoring the database to a consistent state|
|Redo (roll forward) every completed transaction whose results had not yet been permanently written to disk|
If this sounds like a huge amount of work, that's because it is! And the thing that really makes the whole sleight of hand amazing is that the mainframe is juggling thousands of items in the air at one time. Unbelievable.
To understand the real magnitude of the mainframe's juggling act, consider just a few numbers. Big computer systems measure their workload in terms of transactions per second. For example, a really big mainframe handling credit card authorizations can receive hundreds of transactions every second. Yet there are 3,600 seconds in an hour. Therefore, 3,600 times per hour, the mainframe could receive a batch of 100 or more requests for spending approvals. Assume that some of these requests take over a second to make their way through the computer. In that case, the computer would have to juggle hundreds or even thousands of transactions at any given time. Remember, the mainframe has to not only keep each task moving along, but also remember everything the program has done along the way. Then, in case of a problem, the mainframe must restore the database to the pristine state that existed before the affected tasks were started.
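The arithmetic of the juggling act follows a simple rule of thumb (a form of Little's law): the average number of transactions in progress equals the arrival rate times the time each one takes. A quick sketch, using assumed numbers for illustration:

```python
# On average, transactions in progress = arrival rate x time per transaction.
def in_flight(transactions_per_second, seconds_per_transaction):
    return transactions_per_second * seconds_per_transaction

# Say 300 authorizations arrive per second, each taking 1.5 seconds end
# to end: the mainframe is juggling hundreds of transactions at once.
assert in_flight(300, 1.5) == 450

# And over an hour (3,600 seconds), that's more than a million requests.
assert 300 * 3600 == 1080000
```

Every one of those in-flight transactions needs its own bookkeeping, which is why the recovery machinery described above is such a feat.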
In addition to realtime transactions, a mainframe also must maintain good throughput for batch programs that are running at any given moment. In a batch program, the mainframe does not have the realtime constraints of transaction processing. During batch processing, the mainframe does not process hundreds of requests every second and ensure that all of them are completed in a few seconds. Instead, a batch program often takes many hours to complete. However, a batch program does share two functions with the transaction-processing program:
|Processing multiple tasks at one time|
|Tracking all work so that in case of a failure, all work in progress can be undone where needed to leave the database in a consistent state|
Big batch programs often work with large parts of a company's database at one time. For example, something as simple as producing a list of customers could require reading the entire customer database -- perhaps many times -- to convert it into sorted order. Sorting is one of those computational tasks that is easy to explain and understand, easy to request -- but amazingly hard to do. In such a sorting program, the mainframe must work with the entire customer database at one time.
Ultimately, when you consider mainframe throughput in both transaction- and batch-processing environments, anything that forces the computer to retrieve large parts of its database could be a recipe for slowness. However, the miracle is that mainframes can allow large numbers of simultaneous transactions and enormous batch jobs to access major parts of a company's database -- but still maintain adequate response time and database integrity (even if a system crash occurs). Quite a feat.
All of this legerdemain (sleight of hand) raises an interesting question. How did the computer industry come to develop such sophisticated mainframe software? And if mainframes are that great, why are so many people talking about doing away with these remarkable machines?
As explained earlier in this chapter, Herb Grosch observed that the pricing of computers in the late '60s and early '70s made it very attractive for companies to purchase relatively few large computers and centralize the company's processing on those machines. Grosch's observation turned out to be a law in two ways:
|It represented a kind of scientific model of real-world pricing.|
|The results of that numerical law had such a powerful effect on both computer culture and computer technology that centralized computing became the standard paradigm for the next few decades.|
Think about it: there are many attractive features to a decentralized computing model revolving around small computers located where the work is done. Yet, virtually every large organization in the world runs most of its operations with large central mainframes. Grosch's Law.
There were several technical effects of Grosch's Law on large organizations. Three of the most important effects are:
|Centralization of computing to gain economies of scale|
|Centralization of computing to simplify disaster protection|
|Development of multiprocessor systems and networks|
The following section briefly discusses the first two effects; multiprocessor systems and networks are discussed in depth later in the chapter.
During the '60s and '70s, centralizing a company's computing resources led to tremendous economies of scale. Suppose that it's 1970 and you're responsible for planning the computer department of a large company. You have a choice between several different models of the IBM 360 mainframes (which commanded over 90 percent market share at the time). You can buy a small 360 with one unit of throughput, a medium one costing twice as much as the small 360 -- but having four units of throughput -- or a big 360 costing three times the price of the small 360 but having nine times the throughput.
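The pricing in this example is Grosch's Law in action: throughput grows as roughly the square of the cost. In a line of Python:

```python
# Grosch's Law, sketched: computing power grows as the square of the cost.
def throughput_units(relative_cost):
    return relative_cost ** 2

assert throughput_units(1) == 1   # small 360: one unit of throughput
assert throughput_units(2) == 4   # twice the price, four units
assert throughput_units(3) == 9   # three times the price, nine units
```

Under that curve, one big machine always delivers more work per dollar than several small ones, which is why the numbers alone pushed buyers toward centralization.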
Your 360 system includes not only a big central computer, but also lots of disk drives, tape drives for backup and mass storage, and several large, expensive printers capable of printing hundreds of lines per minute. I'm talking about boxes costing hundreds of thousands of dollars each. For example, at that time, a disk drive with a few hundred megabytes of storage could cost tens of thousands of dollars. However, a disk drive cannot be connected directly to a mainframe. Instead, the disk drive must work through an intermediate box called a controller. Who cares? Well, if you're doing this planning and those controllers cost lots of money, you do! Controllers can be shared between quite a few disk and tape drives -- but only if those drives are located in one place to do the sharing. A pattern! Centralizing things leads to economies of scale. Is there more? You bet.
Even though IBM was committed to convincing its customers to centralize their computing, IBM was only a part of the centralization story.
Although IBM was certainly the dominant player in the '60s and '70s, the computer marketplace was still fiercely competitive, and every major player, every practical solution, involved technologies and cost curves that greatly favored centralization. The following section, titled "Cultural effects of Grosch's Law," talks about deeper underlying forces at work that made Grosch's observation much more of a law than was understood at the time. In addition, the factors arguing for centralization involved more than just the pure hardware.
Until recently, computers were very difficult to install and run. Even today's computers are far from simple, but at least they don't require special power, heavy-duty air conditioning, and their own special rooms. Many people are familiar with the picture of the glass-enclosed computer room with dozens of huge boxes and special white-robed staff members tending to their temple. What may not be as readily appreciated is just how special that computer room environment really was.
For example, a typical mainframe involved not only dozens of boxes, but also hundreds of thick, hard-to-maneuver, and expensive cables to connect all the parts. Running those cables around on the floor creates a mess, a safety hazard, and a huge maintenance problem. As a result, most computer rooms are built with special false floors raised about a foot off the ground. The second, higher floor is built of heavy-duty metal, supported by a special steel framework, ensuring that the weight of the computers and the people can be easily supported. After the framework is in place, the upper floor is assembled from metal tiles, coated with special nonstatic laminates, and put in place one by one. The point of having tiles is that the tiles can be lifted individually to work with the cables running between the two layers of flooring (upper and lower). And yes, building a room this way is just as expensive as it sounds.
The floor is only the beginning in a high-tech computer facility. Special fire-extinguishing equipment ensures that smoke and heat are detected quickly and the associated fire snuffed out, without using water, so that the computer and its precious data are not damaged. Sophisticated security systems ensure that only authorized people can enter the room. And last but not least, keeping all those computers, printers, tape drives, and networks running requires full-time staff, often on duty around the clock.
All of these technical factors represented a compelling argument for developing large, central computer sites. If you were a customer in the late '60s, you most likely would have been asking IBM and other vendors to develop products for bigger and bigger central computer facilities. And if you were one of those vendors, building ever larger systems satisfied both your customer's needs and the logic of the time.
By the end of the '60s, it had become clear to most individuals involved with planning and building large computer systems that centralized systems represented both the state of the art and the way to build sophisticated applications. For technical people, such a vision is exciting; by definition one of the primary job-satisfaction factors for them is imagining a future and building it. Mainframe-oriented computing in the late '60s was fully as exciting to think about and work on as client/server is today. So this vision and excitement created the beginning of a major cultural movement affecting thousands of computer and business people. And the key thing to understand in looking at the second phase of the cultural movement was that those thousands of people greatly succeeded in creating centralized systems that work.
Successful cultures breed tremendous inertia, and that inertia defines the mainframe culture we live in today. Part of the inertia is simply resistance to change. In the '50s, no one even knew what a computer was. In the '60s, computers were for the most part a young person's game. Careers in the computer business were all new in those days. Today, though, entire organizations have grown up developing professional skills that all revolve around a particular style of building and running computer systems. Plans were laid, battles fought, systems built, and after 20 or 30 years, it all works. Now a major change is afoot, and people feel threatened. Part of the cultural inertia is based on resistance to change, but a larger part is based on valid reservations and concerns. The experience and expertise that those reservations and concerns represent need to be honored. To understand why that expertise is needed, look a little closer at the technical impact of Grosch's Law.
In the mid-'60s, when the mainframe as it is known today was being introduced, the systems were absolutely incapable of achieving the kind of "many balls in the air" throughput I described a few pages ago. In fact, the lack of software to enable even simple sharing of computer facilities was a major embarrassment for IBM over a period of four years. These four years marked the amount of time required to develop OS/360, the operating system released in 1968 that first took full advantage of the capabilities introduced when the 360 family started shipping in 1964. Even OS/360 offered only very limited support for running many complex jobs at the same time. As the '60s drew to a close, though, it became clear to all major mainframe vendors, and IBM in particular, that throughput and ball juggling were the order of the day.
Over the next ten years a vast array of software systems were developed to allow mainframes to handle hundreds of transactions per second, run huge batch jobs with tremendous throughput, and keep massive databases running around the clock without ever losing data or processing transactions inconsistently. In many ways, the resulting products, such as MVS, CICS, IMS, HASP, and VTAM, not to mention the host of third-party software products, such as IDMS, ADABAS, and Total, truly represent some of the wonders of the Western world. To understand these products well is to be amazed and horrified at the same time.
The amazement stems from both the sophistication of the products and what they make possible. For example, it is widely assumed that most large companies use relational databases such as DB2 and Rdb to handle most of their data. That's wrong. Most large companies do not store most of their data in relational databases. Older databases, like IMS and IDMS and the underlying operating system's file manager (VSAM), are hard for normal people to work with, but they are fast. In comparison, relational databases are so slow that even today, after 15 years of ever faster computers, the performance advantages of the older databases are still so compelling that over 80 percent of the production data of large organizations remains in the data storage systems developed in the '70s. It is true that every large company has relational databases installed to help users get answers to questions. But when it comes to running the business, nothing comes close to the performance of the older systems -- even if they are hard to work with.
Power and sophistication are only part of the picture. Overhead is a major factor as well, and as it turns out, complexity has a particularly ugly face that is often overlooked, too. These two factors -- overhead and complexity -- lead to the feeling of horror when contemplating these amazing systems. A mainframe in many ways mirrors the large organization it serves. Today's big companies process large workloads with assurance, but the cost is a massive bureaucratic structure that is both expensive to run (overhead) and difficult to understand or change (complexity). Mainframes have exactly the same characteristics.
The nervous system of a mainframe can be monitored directly by attaching electronic probes at key points on the machine. These probes trace signals and send the information back to a separate computer that keeps track of what it sees. Analyzing the results produced by such a monitoring system can tell where a mainframe spends its time:
|Part of its effort goes into processing application logic: deciding whether credit is to be granted, moving inventory from one place to another, computing totals for reports. Generally speaking, this kind of directly productive work accounts for about a quarter of the capacity available in a mainframe.|
|Another quarter goes into database-related activity: storing and retrieving information, tracking transactions so that the computer can be restored to a consistent state in the event of a disaster, and keeping the database organized on an ongoing basis so that users always see fast responses to requests. This quarter of the computer's time is, of course, directly productive, too.|
Adding these two pieces together, only about half of the computer's available power is accounted for.
The other half goes into the equivalent of bureaucratic overhead: operating system and utility processing. The amount eaten up by these management activities may vary from one third to one half, but in all cases it's pretty significant. By itself this figure is interesting, but not necessarily scary. It is the implication that really creates a problem. The implication is that there's a lot to manage. Pretty trite sounding, right?
Having so much to manage creates two problems in turn:
|First (and less important), the organization is giving up a great deal of very expensive computer power it really can't spare. Recall that this chapter began by lamenting that even the biggest computer just isn't big enough for many problems. Now you find out those big computers are spending a major fraction of their time just keeping things sorted out -- just tracking balls in the air -- instead of doing the work they were purchased for. In a centralized environment, there's no choice; the direct result of funneling all the work through a single computer is a huge amount of administrative overhead to keep it all running. That's the rub.|
|Second, most organizations find they can't keep it all running. The computer does an amazing job optimizing all the tasks, logging all the transactions, and maximizing all the throughput. However, as more and more gets piled on, the implications of a single error become magnified. If a batch run fails, is the batch window big enough to allow a rerun? As the batch job gets more complex, how many places are there in which single errors can occur? If the invoices don't get aged on schedule, are the credit limits still correct? Does that mean that if the batch runs fail, transactions can't be processed the next day?|
Ultimately, the era of the mainframe has tested the practicality of central planning and control. In theory, by centralizing an organization's processing on either a single mainframe or a small number of mainframes, tremendous economies of scale become possible. However, just as centrally planned economies have trouble delivering on their promised economies of scale, companies have the same problems with mainframes today.
Personal computers became so popular so quickly because they promised freedom for users. Until computers became so inexpensive, nobody thought much about how restricted access to computer power was. Everybody took it for granted that a computer was a big, expensive box at the beck and call of management, but certainly not in any way a personal servant. Even if the spreadsheet had been invented first on a mainframe, it would never have taken off simply because the cost of using it would have been astronomical. Consequently, the centralized computing model automatically excludes personal computing in the way it is known today.
At root, many large organizations today have exceeded the practical limits of what their central mainframes can do. Airlines are an extreme example. An airline's largest application taxes even the largest computer. Even organizations with more modest requirements run into this limitation.
As the mainframe's limits are reached, the computer staff must scramble to improve response time for transactions, worry about how long the batch run will take, and think hard about whether a new application will require an upgrade that is too expensive.
As a company reaches the limits of its mainframes, the firm may begin to do things that enhance mainframe (machine) efficiency -- but also undercut user efficiency.
As I've already shown, most large organizations keep their "live" information (for example, order status, shipment histories, and so on) on their central mainframes. Although part of this information may be periodically copied to servers and personal computers for marketing and management analysis, the complete database resides on the mainframe. Even though it's there on the mainframe, it's hardly accessible -- because the data is not stored in relational databases. (Generally, relational databases store data in tables, allowing users to answer a wide variety of questions easily. Database models are covered extensively in Chapters 9 and 10.)
Why not keep all that data directly in a relational database? Because relational databases are too slow. However, another way to look at that problem is that mainframe power is spread too thin. Because mainframes are so expensive, you have to get a lot of work out of them, and then there isn't enough power to go around. So most companies keep most of their data in non-relational databases to which users can't pose questions flexibly.
Even the data that is in relational databases isn't really available for user questions. In many big companies, the main reason for installing relational databases is to make the programmer's life easier; the database becomes a tool to simplify application development. Fine, but what about those users?
Users also want access to data. The problem is that the questions formulated by these users often require access to major portions of the database. Any query involving a sum or an average could well cause the computer to consider every record in an entire file (a product file or a customer file, for example). This is exactly the kind of request that can bring a mainframe to its knees. Yes, the mainframe is optimized to service such requests. But even with all its vaunted throughput capability, the mainframe can service only a few such queries at one time.
In effect, queries more closely resemble batch jobs than transactions. Transactions work with only a small part of the database at one time. That's one reason thousands of them can run at one time. However, batch jobs may access the entire database. Therefore, the mainframe typically runs only a few batch jobs at once. Yet all of a sudden, users with modern query tools are asking the mainframe to process the equivalent of hundreds of batch jobs per day. A mainframe that routinely processes hundreds of transactions every second might take half an hour or more to generate the answer to a single query. And worst of all, because these queries are being formulated on the fly by users, the computer staff can't plan for or optimize around these queries. So what do they do? What any central control organization does when threatened: refuse to allow it to happen. Consequently, in most companies, even the relational databases are not really available to end users -- whether or not they have the tools to ask the questions they really need to ask.
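A toy sketch can make the transaction-versus-query distinction concrete. The record counts and field names below are invented for illustration; the point is only that a keyed transaction touches one record, while an ad hoc aggregate must visit every record in the file.

```python
# A toy illustration (not mainframe code) of why an ad hoc aggregate
# query behaves like a batch job. Record counts and fields are invented.
customers = {i: {"balance": i % 500} for i in range(100_000)}

def transaction(customer_id):
    """Keyed access, as in transaction processing: one record examined."""
    return customers[customer_id]["balance"]

def query_average_balance():
    """Ad hoc aggregate, as in a user query: every record examined."""
    return sum(c["balance"] for c in customers.values()) / len(customers)

print(transaction(42))          # touches 1 record out of 100,000
print(query_average_balance())  # touches all 100,000 records
```

Thousands of keyed lookups like `transaction` can run side by side; a handful of scans like `query_average_balance` can monopolize the machine.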
Ultimately, the mainframe continues to be a centralized beast because of its cost. And because companies insist on funneling the total workload of an entire organization through a single machine, they can't afford to allow that machine to serve the needs of the individual.
Additionally, central systems all suffer from being so complex that the central plans just don't work. For example, a large bank with hundreds of branches suffered a computer outage in the early '90s. The entire branch banking system was shut down for over four days, with all branches operating totally manually. The situation was serious enough to make the national news.
Later, senior management commissioned a task force (what else is new?) to determine how this outage could have occurred and what could be done to ensure that it would never happen again. After six months, the task force concluded that the complete cause of the shutdown could never be determined. The overall system, the interconnections between the thousands of parts and the complex sequences of activities and events happening every day, were beyond the capabilities of any single individual or even group of humans to understand.
This type of problem is neither unique nor new. When systems get too big, they get to a point where humans can no longer understand them. Yes, the systems may still operate, but not because people are able to trace the workings and understand what makes that system succeed sometimes but fail at other times.
The standard solution to this problem is to divide such large systems up into smaller parts or components. As long as the parts have well-defined ways of communicating with each other, then a larger system may result, but no single person or group of people must "operate" that larger system. Rather the larger system results from the cooperative interaction between the smaller systems. In computer terms, this type of system is a distributed system, and it is exactly cooperating components that make these distributed systems possible to build. Not only are cooperating components the key to distributed systems in general, but cooperating components are the key to making large systems manageable.
But what about cost and throughput? Don't mainframes exhibit economies of scale that make it prohibitive to do the same work with smaller computers? Aren't large computers with their sophisticated operating systems uniquely designed to process large volumes of work efficiently? Finally, isn't a centralized database stored on a single computer a fundamental requirement for controlling access to inventory, bank accounts, airplane seats, and so on? The following section examines these questions.
Even today, many large companies are continuing to consolidate data centers. Just in the last two years, one global manufacturer combined 15 medium-size data centers into four giant locations -- each with a proportionately larger mainframe -- and saved a great deal of money in the process. If these size-related advantages are so compelling, how can client/server systems compete? The answer lies in the successor to Grosch's Law -- the experience curve.
Bruce Doolin Henderson developed his most important ideas about the experience curve around the same time that Herb Grosch was formulating Grosch's Law. Before founding the Boston Consulting Group, Henderson had an illustrious career as an executive, government advisor, and consultant. In the early years of the Boston Consulting Group, Henderson discovered an interesting effect associated with products that were built in huge quantities. This effect has come to be known as the experience curve. Essentially, the experience curve refers to the benefits a company experiences from building a product in extremely large quantities. This section briefly describes the key benefits of the experience curve.
One benefit of the experience curve is that unit cost falls as production volume increases. Have you ever wondered how televisions could be so very inexpensive? Compared to most household appliances, all computers are far too expensive. Refrigerators, microwave ovens, stereo sound systems, VCRs, answering machines, and a host of other products -- all built in the millions -- cost well under $1,000.
When new technology is introduced to the consumer market, it generally starts out costing a great deal. For example, VCRs originally cost thousands of dollars. Most people refused to buy them at that price. Yet enough people did buy them for a next generation to be introduced costing less. The volume went up; more VCRs were produced; prices went down. Before long, a VCR that was easier to use and more powerful cost $250 instead of $2,500. Compact disc players went down the same curve. By now, everybody is familiar with this process, but back in the early '60s, technology was still being introduced at a relatively slow rate, and the process I've just described was not yet understood.
Another benefit of the experience curve is that experience with a product builds expertise. After working with a variety of manufacturing companies in the '50s and '60s, Bruce Henderson noticed that if a company gained an early lead in mass-producing a product, the firm would hold a major advantage over its competitors. If a company could build enough units, it learned how to build the same product, perhaps even improved in quality, at a substantially lower cost than its competitors. The more these companies built, the cheaper they learned to build the product; and thus was born the experience curve.
Building products in sufficiently high volume allows engineers to learn a lot about how the product is used, what parts of it are unreliable, which parts of the production process can be optimized, and so on. Building a product first confers an automatic advantage on the builder. On the one hand, if a new product fails, all the R&D will be a financial loss. On the other hand, if the product succeeds, the company will gain the opportunity to improve both the product and the production process -- and have those improvements funded by the customers buying the product. By playing this game well, the company that mass-produces a successful product first will usually win, other things being equal.
The experience curve comes into play mainly when a product is produced in very large volumes. A volume in the millions guarantees a ride on the experience curve. However, selling just a few thousand a year, no matter what the price, really doesn't count. Volume is critical because it justifies and pays for capital investments.
To illustrate, VCRs didn't become affordable just because a few people bought them. Not at all. Instead, VCRs became inexpensive because companies sold hundreds of thousands of units -- which justified the construction of large, automated factories. In turn, those factories could build VCRs in the millions, which in turn drove the price under $500. Generalized microprocessors costing $25 to $50 made VCRs unbelievably smart -- without thousands of dollars of custom electronics. Those microprocessors are inexpensive because they are stamped out in huge quantities. (As with VCRs, the huge sales volume for the microprocessors justified both the engineering cost of the processor's design and the construction cost of the factories that produced them.)
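Henderson's effect is often stated quantitatively: each doubling of cumulative production volume cuts unit cost by a fixed fraction. Here is a minimal sketch of that arithmetic, assuming an illustrative 80 percent curve; the learning rate and dollar figures are assumptions for illustration, not data from any particular product.

```python
# Minimal sketch of the experience curve: with an 80 percent curve,
# every doubling of cumulative volume cuts unit cost by 20 percent.
# The 80 percent rate and $2,500 starting cost are illustrative only.
import math

def unit_cost(first_unit_cost, cumulative_volume, learning_rate=0.80):
    """Cost of the nth unit: cost(n) = cost(1) * n ** log2(learning_rate)."""
    return first_unit_cost * cumulative_volume ** math.log2(learning_rate)

print(unit_cost(2500, 1))          # the expensive first unit
print(unit_cost(2500, 1_000_000))  # far cheaper after a million units
```

On this assumed curve, the millionth unit costs a small fraction of the first one, which is the shape of the VCR story above: high volume, not clever pricing, is what drives the cost down.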
However, in the computer industry, manufacturers developed many generations of mainframes, each genuinely better than the generation before. But mainframes still became more expensive. Those computer companies never got to ride the experience curve -- even though they gained valuable experience building mainframes over the years. A worldwide installed base of 30,000 big computers just doesn't constitute a mass market.
Actually, the ability to build products in large volume affects much more than just manufacturing costs. Developing each new generation of computers, whether mainframes or PCs, costs tens (or hundreds) of millions of dollars. In the case of the personal computer, this cost is amortized across millions of machines. Intel, for example, ships over 40 million processors every year. The cost of R&D per processor is just a few dollars -- hardly noticeable. IBM, on the other hand, ships only a few thousand mainframes each year. R&D therefore becomes a significant, sometimes huge part of the cost of those machines.
The cost of a mainframe is justified on the basis of economies of scale. Bigger computers deliver more throughput per dollar -- if you can afford to buy the mainframe in the first place. However, microprocessors in small computers are based on economies of scale, too. Small computers can be much less expensive to build than mainframes -- if you can just build a big enough factory to produce them in the first place. Of course, the factory has to be kept busy. Otherwise, the prices will go up again. But if production volume is high, the economies of scale favor small computers over mainframes.
This is the supreme irony. If you ignore how computers are built, mainframes are very cost-effective. Yet if you focus on computer production costs instead of computer throughput capacity, small computers become the efficiency champions -- because they can be built in huge volumes inexpensively.
For example, every year a few thousand mainframes are sold, and they cost anywhere from $100,000 to millions of dollars. That same year, customers will buy approximately 300,000 engineering workstations, made by companies such as Hewlett-Packard (HP), Silicon Graphics, and Sun. The workstations are priced between $5,000 and $50,000. In absolute terms, the mainframe may be faster, but the workstation on an engineer's desk will be almost as fast -- and will cost less than 10 percent of the price. In terms of how engineers use computers, these workstations are essentially mainframes packaged for individuals but built in moderately larger volumes. Workstations drive the design of the airplanes, cars, and the microprocessors discussed all through this book.
In that same year, Intel will produce about 40 million microprocessors that will go into personal computers in offices and homes. Yes, 40 million, and the number is still growing. Measured in unit volume, the personal computer industry is about 100 times bigger than the workstation industry -- which, in turn, is about 100 times bigger than the mainframe industry. In other words, the PC market is 10,000 times larger than the mainframe market in unit volume. No wonder the experience curve triggers in one industrial segment but not the other.
Take a trip to any computer store and consider the product lines offered by any of the larger computer manufacturers: IBM, HP, Apple, Compaq, Epson, AST, and so on.
Every day, in retail locations, these vendors slug it out for the consumer's dollar. Volume sales drive competition, and competition drives prices down. This is the sharp edge at the cutting point of the experience curve. Here's why mainframes essentially can't compete on a power-for-the-dollar basis with small computers. In effect, personal computers have become the buy of the century due to the price competition caused by the consumer market, the level of capital investment that this competition and volume fosters, and the resulting experience curve in this segment of the computer market.
The forces that made PCs powerful, affordable, and available have virtually repealed (though not reversed) Grosch's Law. This repeal is still limited in some respects: if you spend more on a computer, you may get more speed -- but the cost will be out of proportion to the increase in power. At one time, spending twice as much to get four times as much power represented a bargain. Today, though, spending 100 times as much merely to get ten times as much power is a bad bargain. But is there really a choice?
Aren't mainframes really all about throughput? Even if a workstation or fast personal computer can crunch numbers as fast as a mainframe, what about processing thousands of transactions or running big batch jobs? Don't you need a single big computer to do all that in a coordinated fashion? The next section examines these questions more closely.
How does a mainframe achieve its vaunted throughput? By doing many things at one time. And how is a mainframe able to perform so many tasks so quickly? Because a mainframe is essentially a cluster of several smaller computers packaged in one box and connected together on an internal, high-speed network called a bus.
Mainframes consist of many pieces: a computer, memory, disks, tape drives, printers, controllers, and so on. One of the things the mainframe does best is keep those pieces busy as much of the time as possible.
For example, if a transaction requires information from a disk, the mainframe doesn't simply stop working until the information comes back. Instead, the mainframe asks the disk for the data, remembers that it made that request, and performs other tasks in the meantime. A mainframe's working life consists of millions of interwoven tasks, such as the following:
|Starting the disk|
|Starting the printer|
|Starting the tape drive|
|Getting back information from a disk|
|Sending data to a terminal|
|Starting another disk|
|Starting a big batch process|
Tracking all the threads keeps the computer very busy. How can one computer keep up with it all? Because the mainframe is not a single computer.
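The juggling act in the list above can be sketched in miniature. The device names and delays below are invented, and a real mainframe does this in channel hardware rather than an operating-system event loop, but the principle -- start an I/O, remember it, do other work, pick up the result when it arrives -- is the same.

```python
# A toy sketch of "many balls in the air": start several slow I/O
# operations at once, then service each one as it finishes instead of
# waiting idly. Device names and delays are invented for illustration.
import asyncio

async def device(name, seconds):
    await asyncio.sleep(seconds)   # stands in for disk/tape/printer latency
    return f"{name} done"

async def main():
    # Start the disk, the printer, and the tape drive "at the same time"...
    tasks = [asyncio.create_task(device(n, s))
             for n, s in [("disk", 0.03), ("printer", 0.02), ("tape", 0.01)]]
    # ...then handle each completion in the order it arrives.
    for finished in asyncio.as_completed(tasks):
        print(await finished)

asyncio.run(main())
```

The fastest device reports first, regardless of the order in which the work was started -- exactly the interleaving that keeps a mainframe's pieces busy.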
To process its formidable workload, a typical mainframe consists of several smaller computers, such as the following:
|Central computer: Exercises control over all other computer components in the mainframe system. Quite often, even the "central computer" consists of still more individual computers packaged as one unit.|
|Channel: A dedicated computer for talking to hard disks, tapes, and printers. Each channel has its own little programs, its own little memory, and the capability to complete tasks on its own. Most mainframes possess several channels.|
|Disk controllers: Computers that communicate with hard disks in the mainframe system.|
|Network controllers: Computers that manage the data traffic among other computers and terminals in the mainframe system.|
If the mainframe is a bunch of computers acting like a single big one, why couldn't you take other kinds of smaller computers and make them act like big computers, too? In fact, why not make a bunch of personal computers do the job? (In a few paragraphs, I'll address that question.)
In the '70s, some computer manufacturers noticed that mainframes were really just specialized clusters of smaller computers integrated by a custom, high-speed, internal network called a bus. Those vendors wondered if they could build better mainframes by really exploiting this "cluster of computers" idea.
Tandem was the first company to really succeed with the "cluster of computers" concept. Founded by James Treybig in 1974, Tandem set out to build fault-tolerant computers. Treybig observed mainframes being used increasingly in mission-critical applications: applications so important that a company could go under if the computer was down long enough. Why not, he wondered, build computers that would never fail?
Treybig used redundant components to build fault tolerance into Tandem computers. In the tradition of wearing a belt and suspenders, a Tandem computer had at least two of every component: power supplies, processors, memories, disks, disk controllers, and so on. All components were built so that they could be removed and replaced without shutting down the system or affecting other components. There were two paths between any two components in the system.
For example, every processor could talk to every other processor in two different ways, and each disk could be reached through two different processors. As a result, if any part of the system were to malfunction, the system would keep running and notify the operator about the component failure. Then a technician could replace the failed part while the system continued to run. The users would not even be able to tell precisely when the repair had been made.
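The dual-path idea can be sketched as a simple failover loop. The component names below are hypothetical stand-ins, not Tandem's actual interfaces:

```python
# A toy sketch of Tandem-style dual paths: every disk is reachable
# through two processors, so a single failure never stops the work.
# The path functions are hypothetical stand-ins, not real interfaces.
def via_processor_a(disk):
    raise IOError("processor A offline")   # simulate a failed component

def via_processor_b(disk):
    return f"read {disk} via processor B"  # the redundant path still works

def fault_tolerant_read(disk, paths):
    """Try each redundant path in turn; fail only if every path fails."""
    for path in paths:
        try:
            return path(disk)
        except IOError:
            continue   # note the failure for the operator, keep running
    raise IOError(f"all paths to {disk} failed")

print(fault_tolerant_read("disk-1", [via_processor_a, via_processor_b]))
```

The caller never sees the failure; a technician can replace the dead component while the loop keeps routing around it.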
Tandem called its system NonStop Computers and the concept really worked. In record time, Tandem grew from nothing to sales of several hundred million dollars a year. Oddly, while nonstop operation was a major drawing card, many customers ended up buying the system for a more compelling reason than fault tolerance.
A Tandem system can easily consist of more than just two processors; while two make it work nonstop, you could expand the system up to 16 processors. Each Tandem processor was about equivalent to a decent-sized minicomputer of the time, far less powerful than a mainframe. However, 16 of the little processors side-by-side in the same computer was another story. And that's what drew customers.
Large corporations discovered that a fully expanded Tandem computer was an absolute powerhouse at processing transactions. In fact, a fully loaded Tandem with 16 processors and additional components could easily process three to four times the number of transactions per second that even the largest mainframe could handle. Yet even fully equipped, the Tandem was less than half the price of a mainframe. It's easy to see why hundreds of customers bought these machines:
|Huge additional processing capacity (many companies were exceeding the capacity of their existing mainframes)|
|A lower price than new mainframes commanded|
|The capacity to be expanded in easy stages without ever replacing the entire unit (no more expensive replacement upgrades to justify)|
|The capability to run some applications in fault-tolerant mode|
Later on, certain technical limitations in the system caused Tandem to stumble at a time in its history when it might have been able to eclipse other mainframe makers. As a result, Tandem today is a billion-dollar company -- very successful, but still a niche player in the industry. Yet Tandem proved that a system of relatively small computers could perform as well as a mainframe at transaction processing -- one of the most common mainframe tasks.
The Tandem computers represented a partial repeal of Grosch's Law. In the world of Tandems and mainframes, big computers still handled the serious work of big organizations. With the Tandems, however, the cost of additional computer power became directly proportional to the power gained. You still had to buy a big computer, but spending twice as much might yield twice as much computer power.
Mainframes, Tandems, and minicomputers are simply not produced in large enough quantities to ride the experience curve. After you realize that even big computers are just clusters of smaller computers, it is only natural to keep coming back to the question of using small, high-volume computers to build those clusters. Why can't you build the equivalent of a mainframe out of the same microprocessors used to build personal computers? Why not have a machine modeled after the Tandem -- but use 16 or 32 or 64 processors that cost $100 instead of $50,000 each? Why not, indeed? The change is starting to happen.
Drawing on the experience of Tandem and other firms, computer companies in the '90s created a new generation of microprocessor-based servers. Often called superservers, these computers are based on a multiprocessor design. Multiprocessor servers offer a particularly appealing way to achieve scalability. In a multiprocessor environment, a server can literally be scaled up on the spot by plugging in extra processors. Multiprocessor servers yield several benefits, which are explained in the following paragraphs.
Of course, multiprocessor computers aren't new. Tandem's already done it, so has IBM, and so have many other companies. What is new is the idea that these multiprocessor boxes are built around commodity-priced parts that are riding the experience curve and becoming more cost effective.
Even if the overall server box is relatively expensive, all of the server's component parts won't be. Suddenly, the processor you're plugging in can be a $100 component. Even if you insist that each new processor must come with its own memory, and even if you insist on 50MB of memory per processor, you're still talking about $10,000 per processor. And what a processor! With a 100 MHz Pentium microprocessor and 50MB of memory, you'd have more than the equivalent of a mainframe in a box that can be easily parked in a cupboard.
Multiprocessor servers make PC networks incredibly scalable, as shown in the following example:
|Small offices: In small offices, you could start by giving every employee a single workstation, costing perhaps $2,000. Later, you could have one of the workstations double as a server while still acting as a workstation. Finally, you could upgrade that server with more memory and disk space at a cost of perhaps $3,500.|
|Larger offices: In larger offices, you could install dedicated servers with yet bigger disks, tape backup devices, and more memory at $5,000 a pop.|
|Warehouses: In warehouses and regional offices, you could spend $10,000 to $40,000 to install small multiprocessor servers (with up to four processors) providing the throughput of a medium-size mainframe.|
|Major locations: At major locations, you could install big multiprocessor servers (with 20 or 30 processors) at a cost of $50,000 to $250,000 (depending mostly on the amount of disk storage).|
If a Tandem can process transactions so well, it's not hard to imagine these multiprocessor client/server systems competing with mainframes by doing it even better, although some new software may still be required to make it possible. (More on that later.) By definition, transactions are small tasks. So if there are lots of transactions, they could be easily distributed among many small computers. But what about the batch jobs? Consider a real-world experiment that answers this question.
Remember the invoice aging process? In that job, the mainframe processed unpaid bills one by one and decided whether to:
|Send a letter demanding payment|
|Reduce the customer's credit limit|
One organization runs its aging application on a large mainframe, the largest available today. Processing over 500,000 invoices each month, the aging run takes 6.5 hours. This company had written the application in a way that made it easy to convert it to run on smaller computers.
In a rather casual experiment, the company set up a network with 8 database servers and 20 personal computers functioning as computational servers. The computational servers were 66 MHz, 486-class machines. Everybody involved was convinced that the client/server application would either not run at all or take a huge amount of time to complete. After all, the total network configuration cost less than 10 percent of the mainframe's cost, so it was rather farfetched to expect a lot from the client/server system.
Elapsed run time? Thirty minutes. Although the aging process took 6.5 hours on the mainframe, the same process took only half an hour on the network of little computers. The client/server application ran more than 12 times faster than the mainframe. What can we conclude from this?
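The arithmetic behind those numbers is worth making explicit, because the price difference compounds the raw speedup. Using only the figures given in the text (6.5 hours versus 30 minutes, and a network costing under 10 percent of the mainframe):

```python
mainframe_hours = 6.5
network_hours = 0.5

speedup = mainframe_hours / network_hours
print(f"speedup: {speedup:.0f}x")  # 13x -- "more than 12 times faster"

# The network cost less than 10 percent of the mainframe, so the
# price/performance advantage is the speedup divided by the cost ratio.
cost_ratio = 0.10  # network cost as a fraction of mainframe cost (upper bound)
price_performance_gain = speedup / cost_ratio
print(f"price/performance: at least {price_performance_gain:.0f}x better")
```

In other words, the network delivered at least 130 times more work per dollar than the mainframe on this job.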
Although the results of the invoice aging experiment are promising, you can't jump to conclusions about client/server system performance. Most batch applications can't be converted to client/server systems as they stand; most mainframe batch jobs would have to be rewritten to run in a client/server environment. The invoice aging application converted so easily, and ran so quickly, because it was architected from the start to run in both environments. Remember also that some batch applications can't be split across multiple computers at all. These caveats aside, client/server systems still offer remarkable potential. Let's explore what that potential is all about.
Building networks based on small, multiprocessor servers produces startling price/performance opportunities. Grosch's Law is not only repealed, it's reversed: the less you spend per computer, the more power you get. Historically, the best way to get more power was to buy a bigger computer. Now the best way to get more power is to buy smaller computers. True, you have to buy many of them, but in the aggregate, many smaller computers will still produce more computing power for less money. But what about throughput (doing lots of things at one time)?
Multiprocessor servers represent a way of "building a mainframe out of personal computers." Such servers are much less expensive than mainframes and potentially far more expandable and fault-tolerant. What's more, multiprocessor servers share certain traits with mainframes.
Multiprocessor servers are still expensive because they concentrate a lot of processing in a single box -- just as mainframes do. They also require complex operating systems to coordinate all the server's work. Consequently, multiprocessor servers could suffer from the same kind of operating system overhead found in mainframes.
Worse, because large servers are cost effective, they may soon end up concentrating work from many parts of a company at a single location. That trend could lead to the kind of manageability problems that plagued mainframes. So you may get less expensive computing and more scalability -- but you don't necessarily get qualitatively different computing with client/server systems based around multiprocessor servers.
The invoice aging experiment points in a direction for distributed computing even more radical than multiprocessor servers. If the client/server network collectively functions as one large computer anyway, why have multiprocessor servers? Why not let the network of personal computers itself function as the servers? If invoice aging runs fastest on 20 computational servers talking to 8 database servers, why should you have any big computers at all? This is the true appeal of the client/server approach.
You should be clear about the pluses and minuses of this approach. Using a network of small computers to replace a single large machine may turn out to be cheaper and, in terms of throughput, faster. But is it simpler? This is a hard question to answer. On one hand, managing a single large machine is likely to be more straightforward and therefore simpler. On the other hand, the network of small machines allows the computing power to parallel the organization's structure. Putting the computers where the work is has an appealing simplicity of its own as well. So the invoice aging example raises some disturbing questions about the real need for big computers. At last, companies can consider not having mainframes after all; technically, the possibility exists. Following through, however, requires some careful thought.
Thinking back over the progression of this chapter, we have arrived at a very interesting place. The entire focus has been on the mainframe, generally considered the epitome of monolithic, centralized computing. The first discovery was that the mainframe itself is in fact built out of -- what else? -- cooperating components. The only problem is that, packaged in the form of a mainframe, those components are expensive and not easy to work with, replace, or rearrange. Next, the chapter considered other big, mainframe-like machines and found that the more literally a system was built in the form of cooperating components, like the Tandem, the better the performance. Finally, the chapter considered a benchmark in which the large machine disappeared altogether, and the throughput was delivered entirely by commodity, off-the-shelf boxes. So the surprising conclusion of this exploration of big systems is that by building them out of lots of small systems, operating together as cooperating components, you get more, not less, performance than anyone imagined possible.
Why have big computers at all? The invoice aging experiment shows that even for batch processing, you get huge savings by going from mainframes to networks of personal computers. If batch processing, not to mention transaction processing, is handled more effectively by networked personal computers, why have big computers at all?
Yes, groups of computers may still be collected at central sites. And yes, those central sites may provide shared operational facilities. But it's no longer necessary to use big computers to combine work from lots of different places. This insight fully reverses Grosch's Law. The result is simpler computing, lower costs, and greater throughput. Of course, getting to this future will take time. It won't be appropriate in all cases. And even where appropriate, the transition won't happen in one step. But the concept fits in well with experience in many other areas.
The experience of the 20th century provides a compelling argument for decentralized operations of all kinds. Although mainframes have some remarkable capabilities, networks of personal computers -- cooperating components -- can replace the big machines. Not by personal computers alone, but definitely by networks.
Are we there today? No. Originally, personal computers were missing all of the key features of mainframes: the speed, the address space, the storage management, the hardware, and the software to support high throughput transaction and batch processing.
However, today brings a different picture. Speed is no longer a problem in large PC network applications. Additionally, PC address space is in the process of being expanded rapidly.
Granted, PC storage management is a mixed picture. In some areas, PC networks are still missing critical capabilities. Yet in other areas, PC-based products with sophisticated replication facilities are actually significantly ahead of mainframes.
Ultimately, PC client/server systems are a compelling model of the network as computer. In effect, the LAN can be a better "mainframe" than the mainframe was. The LAN can also eliminate problems with complexity and scalability that plagued mainframes. Distributed systems have complexities of their own, of course, but at least the possibility exists of being able to choose one complexity over the other. The question is, how do you sort through the design and infrastructure issues to take advantage of this new computing architecture so that choice becomes possible? That's what the rest of the book is about.