System Models For Distributed and Cloud Computing
Cluster Architecture
A typical server cluster is built around a low-latency, high-bandwidth
interconnection network. This network can be as simple as a SAN (system area network, e.g.,
Myrinet) or a LAN (e.g., Ethernet). To build a larger cluster with more nodes, the
interconnection network can be built with multiple levels of Gigabit Ethernet, Myrinet,
or InfiniBand switches.
Through hierarchical construction using a SAN, LAN, or WAN, one can build scalable
clusters with an increasing number of nodes. The cluster is connected to the Internet
via a virtual private network (VPN) gateway. The gateway IP address locates the cluster.
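To get a feel for how hierarchical switch construction scales the node count, the sketch below models a simple two-level hierarchy; the one-core-switch topology and the port accounting are illustrative assumptions, not details from the text.

```python
def two_level_nodes(ports_per_switch):
    """Estimate node capacity of an assumed two-level switch hierarchy.

    Illustrative topology: one core switch whose p ports each connect to a
    leaf switch; every leaf switch spends 1 port on its uplink and the
    remaining p - 1 ports on compute nodes.
    """
    p = ports_per_switch
    leaf_switches = p          # one leaf switch per core-switch port
    nodes_per_leaf = p - 1     # one port reserved for the uplink
    return leaf_switches * nodes_per_leaf

# A single 24-port switch connects 24 nodes; two levels of the same
# switches reach 24 * 23 = 552 nodes.
print(two_level_nodes(24))  # -> 552
```

Adding a third switching level multiplies the capacity again, which is why multi-level Gigabit Ethernet, Myrinet, or InfiniBand fabrics can scale to thousands of nodes.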
The system image of a computer is determined by the way the OS manages the shared
cluster resources. Most clusters have loosely coupled node computers, in which all
resources of a server node are managed by its own OS. Thus, most clusters have
multiple system images as a result of having many autonomous nodes under different
OS control.
Clusters exploiting massive parallelism are commonly known as MPPs. Almost all HPC
clusters in the Top 500 list are also MPPs. The building blocks are computer nodes (PCs,
workstations, servers, or SMPs), special communication software such as PVM or MPI,
and a network interface card in each computer node.
Most clusters run under the Linux OS. The computer nodes are interconnected by a
high-bandwidth network (such as Gigabit Ethernet, Myrinet, InfiniBand, etc.).
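MPI programs coordinate nodes through explicit sends and receives. The style can be sketched with standard-library threads and queues; `parallel_sum`, the chunking scheme, and the worker count below are my own illustrative choices, not MPI's API.

```python
import threading
import queue

def worker(rank, inbox, outbox):
    """One 'node': receive a chunk of work, send back a partial result."""
    chunk = inbox.get()              # receive, in the spirit of MPI_Recv
    outbox.put((rank, sum(chunk)))   # send, in the spirit of MPI_Send

def parallel_sum(data, n_workers=4):
    """Scatter data across workers, then reduce their partial sums."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    outbox = queue.Queue()
    threads = []
    for rank, chunk in enumerate(chunks):
        inbox = queue.Queue()
        inbox.put(chunk)             # scatter one chunk to each worker
        t = threading.Thread(target=worker, args=(rank, inbox, outbox))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    partials = [outbox.get() for _ in range(n_workers)]
    return sum(s for _, s in partials)  # reduce the partial results

print(parallel_sum(list(range(100))))  # -> 4950
```

In a real cluster the "queues" are messages crossing the interconnection network, which is why its latency and bandwidth dominate cluster performance.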
Major Cluster Design Issues
Cluster middleware glues the autonomous nodes together; without it, cluster nodes
cannot work together effectively to achieve cooperative computing.
The software environments and applications must rely on the middleware to achieve
high performance. The cluster benefits come from scalable performance, efficient
message passing, high system availability, seamless fault tolerance, and cluster-wide
job management.
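Cluster-wide job management is one of the middleware services listed above. A minimal greedy sketch of it might look like the following, where the job costs and the least-loaded placement rule are assumptions of mine, not a description of any particular middleware.

```python
def place_jobs(job_costs, n_nodes):
    """Greedy load balancing: send each job to the least-loaded node.

    job_costs: estimated cost (e.g., run time) of each job in arrival order.
    Returns (assignment, loads): the node chosen for each job, and the
    final accumulated load on each node.
    """
    loads = [0] * n_nodes
    assignment = []
    for cost in job_costs:
        node = loads.index(min(loads))  # least-loaded node wins the job
        loads[node] += cost
        assignment.append(node)
    return assignment, loads

# Four jobs on a two-node cluster:
print(place_jobs([5, 3, 2, 7], 2))  # -> ([0, 1, 1, 0], [12, 5])
```

Real cluster schedulers add priorities, fault tolerance, and migration on top of this basic placement decision.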
Energy Efficiency in Distributed Computing
The primary performance goals in conventional parallel and distributed computing
systems are high performance and high throughput, along with some form of performance
reliability (e.g., fault tolerance and security). Recently, however, these systems have
encountered new challenges, including energy efficiency and workload and resource
outsourcing.
This section reviews energy consumption issues in servers and HPC systems, an area
known as distributed power management (DPM).
To run a server farm (data center), a company has to spend a huge amount of money
every year on hardware, software, operational support, and energy. Companies should
therefore carefully assess whether their installed server farm (more specifically, the
volume of provisioned resources) is at an appropriate level, particularly in terms of
utilization. Past estimates suggest that, on average, roughly one-sixth (15 percent) of a
company's full-time servers are left powered on without being actively used (i.e., they
are idling) on a daily basis.
With 44 million servers in the world, this indicates that around 4.7 million servers
are doing no useful work.
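The scale of that waste is easy to put in energy terms; the 200 W idle draw per server in the sketch below is an assumed figure for illustration, not a number from the text.

```python
def idle_energy_twh(idle_servers, idle_watts=200, hours_per_year=8760):
    """Annual energy consumed by idling servers, in terawatt-hours.

    idle_watts is an assumed average idle power draw per server.
    """
    return idle_servers * idle_watts * hours_per_year / 1e12

# 4.7 million idle servers at an assumed 200 W each:
print(idle_energy_twh(4.7e6))  # -> 8.2344 (TWh per year)
```

Powering down or consolidating even a fraction of these idle machines is therefore a first-order target for distributed power management.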
Reducing Energy in Active Servers