Infrastructure architecture essentials, Part 3: System design methods for scaling

Sam Siewert (siewerts@colorado.edu), Principal Software Architect/Adjunct Professor, University of Colorado

Summary: In an ideal world, all systems would have linear scaling of all resources with linear cost, but this is rarely the case. Cost may include not only capital expenditures but also operational costs for increased cooling, power, rack space, and management requirements. System designers and solution architects who plan ahead for scaling can at least control cost, make initial trade-offs for the long term, and provide mostly linear scaling with similar increases in capital and operating costs. Choosing the right scaling strategy up front, ranging from simple client/server to clusters to grid, cloud, or general Internet services, is critical. This article arms systems designers and solution architects with methods for success.

Date: 14 Oct 2008
Level: Introductory

Scaling is most often thought of as the ability to expand services, increase access to data, or add client load. The ability to handle more clients by providing more services and data access is most often achieved by scaling server-side processor, input/output (I/O), memory, and storage. But this article, the third installment in this series on infrastructure architecture, looks at alternative architectures and considers how scaling fits into new paradigms such as grid and cloud computing. Too often, organizations overlook the costly operational expenditures associated with scaling, including power, cooling, and rack space. Furthermore, good preparation for scaling can help eliminate I/O, processor, memory, or storage bottlenecks, the topic of the second article in this series. Resources such as power are not discussed in this article, but the Resources section provides links to more information on that topic.

Scaling beyond a server with clients requires strategies that include clustering (both processor and file systems), grid computing, and cloud computing, and it is generally bounded at the upper end by ubiquitous Internet services that can meet rapidly changing public demand. This article's first focus is on planning ahead to determine scaling bounds and on strategies for scaling that come as close as possible to linear and unbounded. Second, it looks at each major service resource (processor, I/O, memory, and storage) for server-level, cluster, grid, and cloud computing software as a service (SaaS) and hardware as a service (HaaS) scaling, along with strategies to scale and balance each. The potential breadth and depth of generalized scaling strategies for servers, clusters, grids, and cloud computing is immense, but this article provides concrete examples as well as pointers to greater detail so that you can tackle difficult problems on large systems at the infrastructure level.

Principles and goals for linear scaling systems

A linear scaling system requires a capable compute node, such as an IBM System x server or BladeCenter system, that provides symmetric multiprocessing (SMP) scaling for processor and memory coupled with sufficient I/O scaling for clustering, storage networks, management, and client-access networking.
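It helps to pin down what "linear" means in measurable terms before committing to hardware. The short Python sketch below is one minimal way to do that, assuming you have (or will collect) service throughput measurements at several node counts; the function name and the throughput figures are hypothetical and serve only to illustrate the calculation. Scaling efficiency near 100 percent indicates close-to-linear scaling, while falling efficiency flags a bottleneck or growing coordination overhead.

Listing 1. Estimating scaling efficiency from measured throughput (illustrative sketch)

# Rough scaling-efficiency check: how close to linear is a scale-out plan?
# All throughput numbers below are illustrative assumptions, not measurements.

def scaling_efficiency(throughput_by_nodes):
    """Return (nodes, speedup, efficiency) tuples relative to the 1-node baseline.

    Expects a dict mapping node count -> measured throughput, including an
    entry for a single node to serve as the baseline.
    """
    baseline = throughput_by_nodes[1]
    results = []
    for nodes, throughput in sorted(throughput_by_nodes.items()):
        speedup = throughput / baseline
        results.append((nodes, speedup, speedup / nodes))
    return results

if __name__ == "__main__":
    # Hypothetical measured service throughput (requests/sec) at each cluster size
    measured = {1: 950, 2: 1840, 4: 3500, 8: 6300}
    for nodes, speedup, eff in scaling_efficiency(measured):
        print(f"{nodes} node(s): speedup {speedup:.2f}, efficiency {eff:.0%}")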
Looking forward to larger scale systems with broader scope and client access, it is most important that the basic compute and storage subsystems be carefully designed for expansion:

- Hierarchical scaling: Cloud computing centers potentially composed of grids, high performance computing (HPC) clusters, or SMP servers, with client access at all levels
- Service-Oriented Architecture (SOA): Careful consideration of the coupling of computations and data access for applications, along with client access networks
- High speed interconnect (HSI) cluster networks: Networks such as InfiniBand, Myrinet, and 10GE
- Scalable storage access through a storage area network (SAN): 8G Fibre Channel, Internet Small Computer System Interface (iSCSI) over InfiniBand or 10GE, or Fibre Channel over Ethernet/converged network adapters (FCoE/CNA), with protocol-offloading host bus interfaces
- SMP compute nodes: Nodes with sufficient processor, memory, and I/O channel expansion for storage, cluster, management, and client networks
- Scalable file systems: Network-attached storage (NAS) head or gateway designs with parallel file systems that scale with storage and the number of clients, along with network file system (NFS) protocol acceleration technologies like remote direct memory access (RDMA)
- Green factor: The scalable power, cooling, and rack density of each subsystem selected
- Geo-scaling: Using switch uplinks to dark fiber with dense wavelength division multiplexing (DWDM) add/drop multiplexers

Terascale, petascale, and exascale challenges

Along with operational costs, management of resources is perhaps the biggest problem in scaling. Future systems will have to provide more autonomic features, including self-configuring, self-healing, self-optimizing, and self-protecting (self-CHOP) behavior, to reduce IT costs for systems radically scaled out in compute capability and number of clients. Likewise, the green aspect of the basic building blocks, including power, cooling, and rack density, will become increasingly important. To date, the focus has often been on acquisition cost rather than on the total cost of ownership (TCO) and the cost of providing services.

Skills and competencies: planning for growth up front

As shown in Figure 1, systems can scale from simple client computers to larger SMP servers. Client computers connect to clusters of SMP servers that can divide and conquer algorithms, given their computing power. Clusters of SMP servers can likewise provide concurrent services to grid systems. Built on cluster and grid computing, centralized cloud computing services on the Internet are growing rapidly. Cloud computing might be as simple as a shared calendar and not require HPC clusters, but access to HPC over the Internet is a growing area of interest in both the academic and business computing worlds (see Resources). The clusters and SMP servers in a grid should themselves scale in terms of processor, I/O, memory, and storage access so that they can accommodate application hosting goals for services in the grid or cloud computing center.

Perhaps the best place to start is by asking yourself a series of scaling questions, such as:

- To what extent can processor, memory, and I/O channels be expanded on each SMP compute node?
- How will storage access, client network access, and cluster network synchronization and data sharing be balanced for applications in a given cluster?
- How will client networks and management scale for multiple servers and clusters?
- Will services be made available to a larger number of clients, necessitating grid management?
- Will there be value in opening up services to the public?

Figure 1. Examples of scaling and application coupling

Note that Figure 1 shows IBM System x3650 nodes and Fibre Channel SAN-attached DS4800 storage as an example of a building block for going from an SMP server to a cluster of System x3650 servers with DS4800 SAN storage and a parallel file system such as the General Parallel File System (GPFS). Clusters designed this way can be placed in a grid; for higher density, a BladeCenter system might be considered, either in addition to the rack servers or to rehost the clustered services. A grid provides coordinated management, security, and client/user management tools to simplify the IT work associated with a large number of clients in an SOA.

Tools and techniques: estimating workload and scaling

A detailed overview of I/O, processor, and memory performance tools was provided in the second article in this series. One of the best ways to estimate workload is simply to run applications and, in a simple SOA, to plan on scaling based on the number of clients expected to run each type of service. Good scaling benchmark tools emulate client service requests using threaded or asynchronous workloads and can be scaled and run directly on compute nodes or clusters or from a high performance client-emulator node. For example, a cluster might have a bonded 2 x 10 Gigabit Ethernet interface on a NAS head serving 24 gigabit Ethernet NAS clients.

Figure 2 shows a basic I/O, processing, and memory scaling model. One of the most significant drivers that may not be immediately apparent is the extent to which host bus adapters (HBAs) for storage, host channel adapters (HCAs) for cluster interconnect, and network adapters for client/management networks offload protocol processing; or, put another way, how much host-node loading does each of these I/O interfaces and its stack place on the compute node?

Figure 2. SMP node I/O scaling and workload considerations

Server scaling

Server scaling requires detailed knowledge of the processor complex, the processor-I/O-memory bus architecture, and the host channel design. For example, in Figure 2, if the System x3650 server is used as a compute node, this system has two gen1 x8 and two gen1 x4 PCI-e I/O channels along with on-board dual gigabit Ethernet interfaces, a redundant array of independent disks (RAID) controller, and several high availability features. So, with the System x3650, one possible configuration would be a two-port 10GE network adapter in one x8 slot for the HSI, a two-port 8G Fibre Channel HBA in the other x8 slot, and a one-port 10GE network adapter in each of the two x4 PCI-e slots for the client network interface. This configuration provides 20Gbps full duplex to the client uplink, 16Gbps to SAN storage, and 20Gbps to the HSI for clustering, and it uses the two built-in gigabit Ethernet interfaces as redundant management interfaces. It leaves very little of the 60Gbps available across the 24 gen1 2.5Gbps PCI-e lanes unused: the slot adapters account for 56Gbps (20Gbps client, 16Gbps SAN, and 20Gbps cluster HSI), with the on-board gigabit Ethernet (2Gbps) and the internal RAID controller (roughly 12Gbps) adding further I/O load.
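The accounting above can be captured in a few lines of code. The Python sketch below checks an adapter plan against per-slot PCI-e bandwidth and reports how much headroom remains; the slot widths and adapter figures mirror the System x3650 example in this section, but treat them as illustrative inputs rather than an authoritative specification for any particular server.

Listing 2. Back-of-envelope PCI-e slot bandwidth accounting (illustrative sketch)

# Check an I/O adapter plan against per-slot PCI-e bandwidth, per direction.
# Slot widths and adapter demands are illustrative, drawn from the example above.

GEN1_LANE_GBPS = 2.5  # raw gen1 PCI-e line rate per lane, per direction

slots = {  # slot name -> lane count
    "x8-A": 8, "x8-B": 8, "x4-A": 4, "x4-B": 4,
}
adapters = {  # slot name -> (adapter description, required Gbps per direction)
    "x8-A": ("dual-port 10GE HSI (cluster)", 20.0),
    "x8-B": ("dual-port 8G Fibre Channel HBA (SAN)", 16.0),
    "x4-A": ("single-port 10GE (client)", 10.0),
    "x4-B": ("single-port 10GE (client)", 10.0),
}

total_capacity = 0.0
total_used = 0.0
for slot, lanes in slots.items():
    capacity = lanes * GEN1_LANE_GBPS
    name, needed = adapters[slot]
    status = "OK" if needed <= capacity else "OVERSUBSCRIBED"
    print(f"{slot}: {name:<40} {needed:5.1f} / {capacity:5.1f} Gbps  {status}")
    total_capacity += capacity
    total_used += needed

print(f"Slot bandwidth used per direction: {total_used:.1f} of {total_capacity:.1f} Gbps "
      f"({total_used / total_capacity:.0%})")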
Making sure there is sufficient bandwidth from the I/O interfaces into the processor complex is a great start, and it is nothing more than accounting. Note carefully that PCI-e, gigE/10GE, and Fibre Channel are all full duplex transports, so they are capable of simultaneous transmit and receive data transfers. As such, the raw PCI-e bandwidth in this system approaches 120Gbps in full duplex operation, which significantly exceeds the memory bandwidth, as you'll see.

Skills and competencies: planning for server scaling

Digging deeper than I/O channels and the configuration of HBAs, HCAs, and network adapters to best use that bandwidth, you must also look at memory bandwidth, latency, and processor scaling. Memory bandwidth is a critical scaling parameter and can become a bottleneck, because messages and I/O buffers are typically stored, processed, and forwarded through main memory. In the System x3650 example, the User's Guide states that the server supports 12 fully buffered PC2-5300 DIMMs; PC2-5300 is DDR2-667, with a 6 ns cycle time and a 333MHz I/O bus, capable of 667 million data transfers per second, or 5.333GB/sec. That is roughly 43Gbps of data bandwidth, which is on the same order as the node's half-duplex I/O capability (roughly 48Gbps of payload once gen1 8b/10b encoding is subtracted from the 60Gbps raw line rate). Clearly, with this system and careful planning, you're likely to use all the memory capability. Ideally, it will be the bottleneck, assuming you keep the I/O channels near saturation at half duplex, keep code mostly running out of cache, and DMA mostly directly into and out of memory-mapped kernel buffers.

Tools and techniques: don't leave bandwidth, processor, or memory on the table

IBM offers several tools and documents for sizing its BladeCenter systems and System x servers (see Resources). Measuring actual memory bandwidth as well as consulting the specification is useful, and tools for this, along with catalogues of measurements, can be found on Dr. Bandwidth's Web page (see Resources). Processor scaling is best measured by benchmarking the most complex algorithms that will operate on data entering and leaving memory at line rate, sized by the number of threads or asynchronous I/O and processing contexts that the services must provide for clients. There really is no substitute for running at least the core algorithms on the proposed SMP node to estimate processor requirements. Many tools exist for analyzing the results, including profiling tools like VTune and basic processor-load monitoring tools (see Resources).

Cluster scaling

As shown in Figure 2, sufficient cluster bandwidth to and from each SMP node is required for message passing and synchronization in parallel computations as well as for parallel file systems like the IBM General Parallel File System (see Resources). One of the key decisions is the cluster file system that you'll use. The IBM white paper "An Introduction to GPFS Version 3.2" provides an excellent overview of both SAN clusters and NAS head/gateway client clustered system configurations.
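To make the message-passing style of compute scaling concrete, the Python sketch below splits a simple numerical integration across cluster ranks and merges the partial sums with a reduction. It assumes the mpi4py package and an MPI runtime are available and would be launched with something like mpirun -n 4 python pi_estimate.py; the integrand and the file name are hypothetical choices for illustration only.

Listing 3. Dividing a computation across cluster nodes with message passing (illustrative sketch)

# Minimal divide-and-conquer sketch using MPI message passing (mpi4py).
# Each rank integrates a slice of f(x) = 4 / (1 + x^2) over [0, 1]; the
# partial sums are merged with a reduction, approximating pi.
from mpi4py import MPI

def f(x):
    return 4.0 / (1.0 + x * x)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 1_000_000            # total subintervals shared across all ranks
h = 1.0 / n
# Each rank handles every size-th subinterval (a simple cyclic decomposition)
local_sum = sum(f((i + 0.5) * h) for i in range(rank, n, size)) * h

# Merge the intermediate results on rank 0
pi_estimate = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi ~= {pi_estimate:.10f} using {size} rank(s)")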
Clusters are built for numerous reasons, including:

- Compute scaling: Breaking algorithms into subparts, computing intermediate results, and merging results from numerous SMP nodes through message passing and distributed synchronization mechanisms
- High availability: Replication of NAS file services for clients to ensure access to stored data despite potential server downtime
- I/O scaling: Increasing I/O bandwidth for I/O-intensive applications that interface with SAN RAID
- Client service scaling: Simply handling more concurrent client service requests from one cluster

The skills, competencies, tools, and techniques for cluster scaling go beyond the scope of this article, as do grid scaling and cloud computing center scaling. However, good practices at the SMP compute node level provide a good staging point for these higher-order scaling architectures. Once you've identified which of the cluster scaling goals enumerated previously apply, you can find numerous resources to assist with cluster scaling in Resources.

Grid and cloud scaling

As shown in Figure 1, clusters provide I/O, processor, and storage scaling but generally don't prescribe management scaling methods, client-side management methods, or security. Scaling the full infrastructure, including the many services that may be hosted on multiple clusters or SMP servers and the clients that use those services, is the domain of grid computing. Grid computing is concerned with:

- Resource virtualization: For storage, networks, and processors through virtual disk arrays, network interface multipath management, and virtual machines
- User interface portals: Including definitions for secure Web access such as Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP)
- System management: Including provisioning and autonomic features for the management of IT assets

Complete coverage of grid scaling is not possible in this article, but the IBM Research Journal provides in-depth studies, and many grid tools are available from IBM as well (see Resources). Likewise, cloud computing, a relatively new but rapidly growing architecture, is beyond the scope of this article. However, the basic concept of cloud computing is to provide HaaS and SaaS, enabled by building well-designed SMP and cluster servers in grid architectures that make generally useful applications available to users over the Web. For example, everything from shared calendars, code version control and management, and e-mail to environments for social networking has come out of Web-enabled applications. This trend is growing and is starting to include even HPC applications.

Green factors

The cost of scaling is not only the capital expenditure to add new processing capability, I/O channels, memory, storage, or networking, but also the cost to power, cool, host, and manage these new resources. Although grid architecture and autonomic computing can help with management scaling, the green factor of subsystems and components is critical for keeping operational expenditures lower. Several trends are helping, including lower-power storage using solid state disk (SSD) flash drives and small form factor disk arrays. Likewise, the relentless pursuit of higher clock rate processors has given way to more cores with better SMP scaling designs and methods for clustering nodes. Both are helping keep costs down.
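As a rough illustration of why operating costs deserve as much attention as acquisition cost, the Python sketch below compares two hypothetical node choices using a simple capital-plus-electricity model, folding cooling overhead into a power usage effectiveness (PUE) factor. Every price, wattage, and the PUE value are assumptions chosen for illustration; substitute your own figures and planning horizon.

Listing 4. A rough capex-plus-power TCO comparison (illustrative sketch)

# Rough total-cost-of-ownership comparison: acquisition cost versus operating
# cost (power and cooling) over the life of a system. All prices, wattages,
# and the PUE figure below are illustrative assumptions.

def tco(capex_usd, node_watts, nodes, years, usd_per_kwh=0.10, pue=1.8):
    """Capex plus electricity for compute and cooling over the planning horizon."""
    hours = years * 365 * 24
    energy_kwh = node_watts * nodes / 1000.0 * hours * pue  # PUE folds in cooling overhead
    return capex_usd + energy_kwh * usd_per_kwh

if __name__ == "__main__":
    # Hypothetical choice: cheaper, hotter nodes versus pricier, cooler ones
    option_a = tco(capex_usd=80_000, node_watts=650, nodes=16, years=4)
    option_b = tco(capex_usd=95_000, node_watts=450, nodes=16, years=4)
    print(f"Option A 4-year TCO: ${option_a:,.0f}")
    print(f"Option B 4-year TCO: ${option_b:,.0f}")

With these particular assumptions, the more expensive but lower-power option comes out ahead on four-year TCO, which is exactly the kind of result that a purely initial-expense view misses.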
Most often, customers design based on initial expense, such as cost per Gb, rather than on power and performance cost measures, which become more significant as systems are scaled up and operated over the long term. In addition, management and TCO are important considerations, along with green factors.

Conclusion

Scaling is a planning exercise that requires estimation of future needs, budgets, and the trade-offs between initial cost and long-term operational costs. Most systems don't have HPC requirements that are difficult to meet; rather, they have simple client and service scaling needs. With careful analysis and selection of the base SMP nodes and cluster design, you can scale these systems effectively and with minimal waste, provided resources are balanced initially and kept in balance. Don't overlook the scaling of management, either; grid computing provides some great resources for keeping the IT burden from scaling up along with the systems.

Resources

Learn

- See the IBM High Performance On Demand Solutions site for the latest on cloud computing and SOA scaling.
- The IBM Research Journal on grid computing provides an excellent, in-depth review of what grid can do for scaling management and user access to large-scale computing resources.
- See the IBM HPC Cluster solutions site for guidance on cluster solutions.
- See the IBM white paper An Introduction to GPFS Version 3.2 for an overview of SAN clusters and NAS head/gateway client-clustered system configurations.
- For tools and information on compute node sizing for SMP servers or clusters, see the Configuration tools page at IBM.
- See the IBM System x3650 User's Guide.
- The IBM General Parallel File System (GPFS) runs on both the Linux and IBM AIX SMP operating systems for cluster and grid scaling.
- See Part 2 of this series for a more detailed overview of tools you can use to benchmark systems to verify specifications and to find bottlenecks.
- Check out the fourth installment in the "Big Iron" series, "Power, cooling, and performance: Find the right balance" (developerWorks, Sam Siewert, 17 May 2005), for more information about power consumption and related topics.
- Read the developerWorks series Cloud computing with Amazon Web services by Prabhakar Chaganti for more information on cloud computing.
- Performance Tuning for Linux Servers by Sandra K. Johnson, Gerrit Huizenga, and Badari Pulavarty (IBM Press, 2005) provides a great in-depth look at performance analysis, tuning methods, and workload generators for Linux.
- Browse the technology bookstore for books on these and other technical topics.
- Get the RSS feed for this series.

Get products and technologies

- Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2, Lotus, Rational, Tivoli, and WebSphere.
- Check out System x and BladeCenter grid computing solutions from IBM.
- For memory bandwidth measurements and tools, see Dr. Bandwidth's Web page.
- Check out Raptor 10GE/gigE switches, which employ a customized 10GE layer-2 protocol to replace trunking and provide simple solutions for using dark fiber to expand clusters over large distances.

Discuss

- Check out developerWorks blogs and get involved in the developerWorks community.

About the author
Dr. Sam Siewert is a systems and software architect who has worked in the aerospace, telecommunications, digital cable, and storage industries. He also teaches at the University of Colorado at Boulder in the Embedded Systems Certification Program, which he co-founded in 2000. His research interests include high-performance computing, broadband networks, real-time media, distance learning environments, and embedded real-time systems.