Cluster Load Balancing and Failover
To move databases within the domain, both the LocalDomainServers group and the Change
Administrator who committed the plan must have Create Replica and Create Database rights.
1. From the Domino Administrator, click the Configuration tab, and open the Server view.
4. Under server access, add LocalDomainServers and any users with the Change Admin role
to these fields:
Activity Trends
Domino server resource utilization can be separated into two types, system activity and user
activity. System activity, which includes the level of processor, disk, memory, and network
consumption that Domino generates to keep the server running, is a fixed amount of activity,
as long as systems are healthy and performing smoothly. Domino servers typically use a
modest percentage of their resources to run. The remaining server capacity is used to support
user activity, which varies with the usefulness of the data on the server.
Using activity logging, servers account for their time precisely, recording user activity by
person, database, and access protocol. When summarized and averaged, or trended over time,
activity logging statistics provide a way to measure and compare workloads across
servers. You can use this information to identify the most active users and databases on each
server. Using the Domino Change Manager, you can automate the creation and execution of
workload redistribution plans to load a new server, decommission an old one, or balance
workloads across unevenly burdened servers.
Activity Trends is part of the IBM Tivoli Analyzer for Lotus Domino, a separate product offering
from Tivoli Systems. The Activity Trends Collector is a Domino server add-in task that records
and reports statistics about database activity on a server. Information is stored in the Activity
Trends database (ACTIVITY.NSF).
The IBM Tivoli Analyzer for Lotus Domino uses the collected data to determine the load on the
server. Then, using resource-balancing functionality, the Analyzer applies trends analysis and
statistics to intelligent algorithms that can provide computer-aided load balancing on a set of
servers or simplify the server decommissioning process.
Integrated with the IBM Tivoli Analyzer for Lotus Domino, the Domino Change Manager
provides workflow capability that creates resource-balancing plans and implements database
moves, using the Tivoli Analyzer tools and analysis. The Domino Change Control database
(DOMCHANGE.NSF) and Domino Change Manager are part of the Domino server core
functionality.
Activity trends charting -- You can chart a selected group of statistics for a single server
or a group of servers.
Resource balancing -- Analyzes server resource use and creates recommendations for
balancing the servers based on specified resource goals.
In this example, two Domino servers, Mail1 and Mail3, route messages from the Acme
organization destined for other Internet domains (external addresses) and receive mail
addressed to the Acme Internet domain (acme.com). Mail1 and Mail3 have the field "SMTP
used when sending messages outside of the local Internet domain" enabled on the
Router/SMTP-Basics tab of the Configuration Settings document that applies to the servers and
have the SMTP listener task enabled on the Basics tab of their Server documents.
If a user on the Acme internal mail server Mail2 sends a message to an external address -- one
with a domain other than acme.com -- the server routes the message to Mail1, which can route
mail to external domains. If a user on the Acme internal mail server Mail4 sends a message to
an external address -- one with a domain other than acme.com -- the server routes the
message to Mail3, which can route mail to external domains. This splits the load of outbound
messages -- half route to Mail1 and half route to Mail3.
Any mail from an external Internet domain -- one other than acme.com -- is routed to either
Mail1 or Mail3. The external DNS has two MX records for the acme.com domain, one for Mail1
and one for Mail3. When an Internet mail server tries to connect to the acme.com domain to
transfer a message, it looks up acme.com in the DNS. The server finds the MX records for
acme.com and, based on the record preferences of the MX records, returns the IP address of
either Mail1 or Mail3. If the MX records have equal weight, the server randomly selects one of
the records and returns the IP address of that record's server. Should that server be
unavailable, the other MX record is selected and the IP address of the other server is returned.
This provides load balancing through the random selection of the MX records when record
preferences are equal and provides failover since the DNS shifts to another MX record when a
connection fails. Once the mail reaches Mail1 or Mail3, that server routes the message to its
destination.
The internal mail servers can route Internet mail to the server with SMTP enabled for external
mail either via Notes routing, with a Foreign SMTP Domain document and SMTP Connection
document linking to the SMTP server, or via SMTP routing, with the SMTP server configured as
the relay host.
Enabling "SMTP used when sending messages outside of the local Internet domain"
for Mail1 and Mail3.
Setting up DNS correctly to include MX records for Mail1 and Mail3, indicating to
external SMTP systems that these are the hosts that receive inbound mail for the
acme.com domain.
Either enabling "SMTP allowed outside of the local Internet domain" for the internal
mail servers, Mail2 and Mail4, and listing Mail1 or Mail3 as the relay host, or creating
a Foreign SMTP Domain document and SMTP Connection document that define the
route to Mail1 or Mail3.
See Also
Sample mail routing configurations
The Domain Name System (DNS) is a directory used by SMTP to convert a name, such as
acme.com, to a list of servers that can receive connections for that name and to find the IP
address of a specific server. By looking up a destination server's address in the DNS, the
sending server can properly route a message to a recipient. DNS uses two kinds of records:
Mail Exchanger (MX) records and A records. An MX record maps a domain name to the names
of one or more mail hosts. An A record maps a host name to the IP address of a server.
Mail servers also use other DNS records. For example, servers that receive Internet mail
perform a reverse lookup to a DNS PTR record to determine the host name for a given IP
address. Reverse lookups are useful in verifying the source of a message, an important tool for
restricting relay access through your server or preventing unsolicited commercial e-mail (UCE).
You must correctly configure DNS to support your use of SMTP. To determine the IP address of
the mail server for the destination domain, Domino does the following:
1. The server looks up the domain part of each recipient's address in DNS.
2. If DNS finds an MX record, the server tries to connect to the server listed in that MX
record. If there is more than one MX record, the server tries to connect to the record that
has the lowest cost. If more than one MX record has the lowest cost, the server randomly
selects one and tries to connect to the server listed in that MX record.
Note There may be more than one MX record for a specific domain name. The host
name is looked up in DNS to find an A record. An A record contains the IP address for the
host.
3. If DNS finds only an A record, Domino routes the message to the IP address in that A
record.
4. If DNS does not find a record, Domino cannot deliver the message and sends a
nondelivery message to the sender.
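The lookup steps above can be sketched in Python. This is a minimal illustration, not Domino's implementation: the function name pick_destination, the in-memory dictionaries standing in for DNS, and the 192.0.2.x addresses are all assumptions made for the example.

```python
import random


def pick_destination(domain, mx_records, a_records, rng=random):
    """Return the IP address to connect to for `domain`, or None.

    mx_records: dict mapping a domain name to a list of (cost, host_name)
    a_records:  dict mapping a host name to an IP address
    Both dictionaries are hypothetical stand-ins for real DNS queries.
    """
    mx = mx_records.get(domain, [])
    if mx:
        # Step 2: use the MX record with the lowest cost, choosing at
        # random when several records share that cost.
        lowest = min(cost for cost, _ in mx)
        candidates = [host for cost, host in mx if cost == lowest]
        host = rng.choice(candidates)
        # The host name from the MX record is then looked up as an A record.
        return a_records.get(host)
    # Step 3: only an A record exists for the name.
    # Step 4: no record at all -> None, meaning a nondelivery message.
    return a_records.get(domain)


mx = {"acme.com": [(5, "mail1.acme.com"), (10, "mail2.acme.com")]}
a = {"mail1.acme.com": "192.0.2.1", "mail2.acme.com": "192.0.2.2"}
print(pick_destination("acme.com", mx, a))
```

A return value of None corresponds to step 4, where the message cannot be delivered.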
An MX record maps a domain name to one or more host names. An A record maps a host name
to the IP address of a server. You may want to use a host name in the MX record instead of just
an A record for the following reasons:
If you replace or relocate a machine, you can assign the existing host name and IP
address to the new or relocated machine. This change is transparent to users, and
messages continue to route properly.
You can use DNS to provide failover and load-balancing for your mail servers by creating
multiple MX records for a domain name on the DNS server. When you set more than one MX
record for a name, you can set preference values to control how DNS selects those records.
DNS selects lower value preferences first -- for example, DNS selects 5 before 10. If more than
one MX record has the same preference value, DNS randomly selects from among those MX
records. If one of those MX records fails -- for example, because a server is unavailable -- DNS
caches that failure and tries other MX records of equal weight, followed by less-preferred MX
records.
When a server tries to connect to acme.com, the DNS first uses MX records with preferences of
5. If there are two MX records with preferences of 5, DNS randomly selects between the MX
records for mail1.acme.com and mail2.acme.com. If the DNS returns the MX record for
mail1.acme.com and mail1.acme.com is unavailable, the DNS returns the MX record for
mail2.acme.com. If mail2.acme.com is unavailable, both MX records with a preference of 5 have
failed. The DNS then selects MX records that have a preference of 10 and uses them the same
way it used the MX records that have a preference of 5.
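The failover order in this example can be sketched as follows. The sketch assumes that records are tried tier by tier, lowest preference value first, with a random order inside each tier. The function names, the is_up callback, and the host backup.acme.com are hypothetical and exist only for illustration.

```python
import random
from itertools import groupby


def connection_order(mx_records, rng=random):
    """Return host names in the order they would be tried.

    mx_records is a list of (preference, host_name) tuples. Lower
    preference values come first; hosts that share a value are shuffled.
    """
    order = []
    by_preference = sorted(mx_records, key=lambda record: record[0])
    for _, tier in groupby(by_preference, key=lambda record: record[0]):
        hosts = [host for _, host in tier]
        rng.shuffle(hosts)
        order.extend(hosts)
    return order


def first_available(mx_records, is_up, rng=random):
    """Return the first host that is_up() accepts, or None if all fail."""
    for host in connection_order(mx_records, rng):
        if is_up(host):
            return host
    return None


records = [
    (5, "mail1.acme.com"),
    (5, "mail2.acme.com"),
    (10, "backup.acme.com"),  # hypothetical less-preferred host
]
print(connection_order(records))
```

With both preference-5 hosts down, first_available falls through to the preference-10 host, which mirrors the sequence described in the example.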
You can specify how the Router determines the IP address(es) for destination SMTP systems
(for example, the Internet). Known as address resolution, the method you select determines
how the Router performs domain-name-to-IP-address translation.
If you configure TCP/IP to use the Domain Name System (DNS), select Dynamic mapping only
or Dynamic then local. For Dynamic mapping only, the Router queries a DNS server to map a
fully qualified host name to an IP address.
For Dynamic then local, the Router first queries the DNS and then checks a file on your local
drive. This file, known as a hosts file, maps destination host names to IP addresses. The
Dynamic then local option can be useful if you need to connect to internal hosts that are not
listed in the DNS.
If you configure TCP/IP to use local hosts lookup, select "Local lookup only." If you use this
option, the IP address and fully qualified host name for each destination must exist in the hosts
file. This option requires more administrative attention than the Dynamic mapping only option
because you need to maintain the file.
If the DNS does not list a destination host name, the Router designates the message as
nondeliverable. If the DNS is unavailable, the Router retries delivery up to the configured number
of times as indicated in the Initial transfer retry field on the Configuration Settings document.
To set how host names are looked up
1. Make sure you already have a Configuration Settings document for the server(s) to be
configured.
2. From the Domino Administrator, click the Configuration tab and then expand the
Messaging section.
3. Choose Configurations.
4. Select the Configuration Settings document to be edited and then click Edit
Configuration.
6. On the Basics tab, complete this field, and then save the document:
Field             Enter
Host name lookup  Choose one:
                  Dynamic lookup only (DNS only) - The Router determines
                  the IP address for a host by looking it up in DNS. SMTP
                  transfer can occur only if the destination host is listed in
                  DNS.
                  Local lookup only (host files only) - The Router
                  determines the IP address for a host by looking it up in a
                  hosts file on the local machine.
                  Dynamic then local - (default) The Router determines the
                  IP address for a host by looking it up in DNS first and
                  then checking the local hosts file if no DNS entry exists.
7. The change takes effect after the next Router configuration update. To put the new
setting into effect immediately, reload the routing configuration.
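The three lookup choices can be illustrated with a short Python sketch. It is not how the Router is implemented. The mode strings, the resolve_host function, the dns_lookup callable, and the sample host names and addresses are assumptions made for the example.

```python
def resolve_host(host, mode, dns_lookup, hosts_file):
    """Resolve a host name according to the selected lookup mode.

    mode:        "dynamic", "local", or "dynamic_then_local" (labels
                 invented for this sketch, not Domino field values)
    dns_lookup:  callable returning an IP address string or None
    hosts_file:  dict mapping host names to IP addresses
    """
    if mode == "dynamic":
        return dns_lookup(host)
    if mode == "local":
        return hosts_file.get(host)
    if mode == "dynamic_then_local":
        # Check the hosts file only when DNS has no entry.
        return dns_lookup(host) or hosts_file.get(host)
    raise ValueError(f"unknown lookup mode: {mode}")


# Hypothetical data: one host known to DNS, one listed only locally.
dns = {"mail1.acme.com": "192.0.2.1"}
hosts = {"legacy.acme.com": "192.0.2.50"}

print(resolve_host("legacy.acme.com", "dynamic_then_local", dns.get, hosts))
```

The internal host that is missing from DNS resolves only in the modes that consult the hosts file, which is the situation the Dynamic then local option is meant for.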
After distributing databases and mail servers to balance the workload, you should track the
cluster events and statistics to be sure that the workload is acceptable and that failover occurs
as expected. If the statistics show a problem, you may have to make some adjustments.
When Domino fails over to balance the workload, the event may look like this in the Domino
server log file:
08/23/2002 11:08:48 AM Load balancing off of Sales/Acme!!Customer.nsf for
replica ID 852560C9:007232D, directing open to Sales2/Acme
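If you export log events as text, a small script can pull out the servers and database involved. The following Python sketch parses only the sample event shown above. The pattern is an assumption based on that one line, so real log entries may need a different expression, and database paths that contain spaces are not handled.

```python
import re

# Pattern derived from the single sample event in this topic (assumption).
EVENT_PATTERN = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) "
    r"Load balancing off of (?P<server>[^!]+)!!(?P<database>\S+) "
    r"for replica ID (?P<replica_id>[0-9A-Fa-f]+:[0-9A-Fa-f]+), "
    r"directing open to (?P<target>.+)$"
)


def parse_load_balancing_event(line):
    """Return the event fields as a dict, or None if the line does not match."""
    normalized = " ".join(line.split())  # collapse wrapped lines and extra spaces
    match = EVENT_PATTERN.match(normalized)
    return match.groupdict() if match else None


SAMPLE = (
    "08/23/2002 11:08:48 AM Load balancing off of Sales/Acme!!Customer.nsf for "
    "replica ID 852560C9:007232D, directing open to Sales2/Acme"
)
event = parse_load_balancing_event(SAMPLE)
print(event["server"], "->", event["target"], event["database"])
```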
You can view these events in the log file. Do one of the following.
From the Domino Administrator or the Web Administrator
1. In the Server pane, expand All Servers or expand Clusters.
2. Select the server that stores the log file you want to view.
1. In the Server pane of the Domino Administrator or the Web Administrator, expand All
Servers or expand Clusters.
4. In the Task pane, expand Monitoring Results, and then expand Statistics Reports.
5. Click Clusters.
6. In the Results pane, open the document you want, and look in the "Server cluster
statistics" section of the document.
Note If you prefer, you can view these reports directly in the Monitoring Results database
(STATREP.NSF). Open the database, expand Statistics Reports, and then click Clusters.
Viewing a list of Cluster Manager statistics
You can view a list of Cluster Manager statistics from the Domino Administrator, the Web
Administrator, or the server console.
Note To see the availability index, the availability threshold, and the expansion factor of
the current server, look in the Server section of the statistics, not the Server - Cluster
section.
The Cluster Manager statistics begin with "Server.Cluster." They give you information about
failover, workload balancing, and the state of the servers in the cluster. Among other things,
the statistics tell you how often the Cluster Manager attempted failover and workload
balancing, and how many of these attempts were successful.
Note To see the availability index, the availability threshold, and the expansion factor of the
current server, send the Domino command show stat server from the server console.
As the number of Domino servers on your network increases, so does the amount of replication
required to distribute information across the network. Because replication uses memory and
processing time, plan how servers connect to perform replication. If you allow servers to
replicate at random, so that a given server replicates a single database with multiple servers,
or perhaps replicates different databases with different servers, servers can become so
overloaded with replication requests that the load interferes with their ability to respond to
client requests.
To provide for efficient replication, consider setting up some servers as dedicated replication
servers. Using dedicated servers to handle replication greatly reduces the amount of work that
database servers have to devote to replication, because the database servers have to replicate
with the replication servers only, instead of having to replicate with every server that
maintains a copy of a given database. To control replication, you create Connection documents
that specify which servers to replicate with and when.
How you connect servers for replication depends on many factors, including the layout of the
physical network and the size of your organization, as well as the extent to which you want to
re-use existing Connection documents created for mail routing. There are several different
configurations, or topologies, you can use to control how replication occurs between servers:
Hub-and-spoke
Peer-to-peer
Ring
Choose the replication strategy that provides the most efficient replication performance. In
many cases, you'll use different topologies in different parts of the network.
Using a hub-and-spoke topology to manage replication
A hub-and-spoke topology is generally the most common and efficient replication topology in
larger organizations, because it minimizes network traffic. Hub-and-spoke replication
establishes one central server as the hub, which schedules and initiates all replication with all
of the other servers, or spokes. The spokes update the hub server by replication (and mail
routing), and the hub in turn updates each spoke. Hub servers replicate with each other or with
master hub servers in organizations that use more than one hub. In short, the hub server acts
as the traffic manager of the system, overseeing system resources, ensuring that replication
takes place with each spoke in an orderly way, and guaranteeing that all changes are
replicated to all spoke servers.
To set up replication in a hub-and-spoke system, you create one Connection document for each
hub-and-spoke connection. To ensure that the replication task on the hub, rather than on the
spokes, always assumes most of the work, specify the following in each Connection document:
the hub server as the source server, the spoke server as the destination server, and pull-push
as the replication method.
The major drawback of hub-and-spoke topology is that the hub is a single point of failure: if
the hub is not working, replication stops. Deploying a backup server that replicates the hub and
can quickly be reconfigured into a hub server if the primary hub goes down can alleviate this
shortcoming.
3. Centralize administration of the Domino Directory, standardize database ACLs, and limit
access to the hub. You can designate the hub with Manager access and the spokes with
Reader access so that you make those changes on one replica on the hub to synchronize
the spokes.
4. Designate hubs by role -- for example, replication hubs and mail hubs.
5. Place server programs such as message transfer agents on hubs to make them easily
accessible.
8. Centralize data backup at the hub. By backing up databases on the hub only, you
conserve resources on spoke servers.
9. Improve server load balancing. However, network traffic increases on the hub LAN
segment. If you have more than 25 servers per hub, establish tiers of hubs. If a hub goes
down, replication for that hub and its spokes is disabled until the hub is repaired or
replaced.
Note Do not use hub-and-spoke replication for databases larger than 100 MB that have
replicas on fewer than four servers. Instead, schedule replication for these databases to occur
separately from other replications.
Using a peer-to-peer topology to manage replication
In a peer-to-peer topology, replication is less centralized than in a hub-and-spoke
configuration, with every server being connected to every other server. Because peer-to-peer
replication quickly disseminates changes to all servers, it is often the best choice for use in
small organizations, or for sharing databases locally among a few servers. However, it can be
inefficient when a database resides on more than a few servers.
In a peer-to-peer topology, the potential for replication problems decreases, because only two
servers communicate for each replication and no hub or intermediary servers are involved.
However, peer-to-peer replication requires many Connection documents, increases
administration since you must avoid overlap in replication schedules, and prevents you from
standardizing ACL requirements.
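To see why peer-to-peer replication requires many Connection documents, compare the counts for the two topologies. The following Python sketch assumes one Connection document per hub-to-spoke link and one per pair of servers in a full peer-to-peer mesh. The function names are invented for the example.

```python
def hub_and_spoke_documents(server_count):
    """One hub plus (server_count - 1) spokes: one document per spoke."""
    return max(server_count - 1, 0)


def peer_to_peer_documents(server_count):
    """Every server paired with every other server: n * (n - 1) / 2 documents."""
    return server_count * (server_count - 1) // 2


for n in (4, 10, 25):
    print(n, hub_and_spoke_documents(n), peer_to_peer_documents(n))
```

Under these assumptions, 10 servers need 9 Connection documents in a hub-and-spoke layout and 45 in a full peer-to-peer mesh, and the gap widens quickly as servers are added.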
Other topology strategies
Another method of managing replication is to use Cluster replication. This ensures constant
access to data, because data on one server is duplicated on one or more cluster mates. If the
primary server becomes unavailable, data can be obtained from other servers in the cluster.
For more information on setting up clusters, see the topic Setting up a cluster.
End-to-end - Also known as a chain topology, connects two or more servers in a chain.
Information travels in one direction along the chain and then travels back in the other
direction. End-to-end replication is less efficient than ring replication but is useful in
situations where information needs to travel in only one direction.
Ring - Similar to an end-to-end topology, but connects servers in a circle so that
replication occurs within a closed loop. Ring replication can be useful in a large
organization for replicating information between hub servers.
Binary tree - Connects servers in a pyramid fashion: the top server connects to two
servers below, each of which connects to two servers below, and so on. Information
travels down the pyramid and then back up.
Using existing mail routing connections for replication
As you plan for replication, consider re-using the connections you may have already set up for
Notes mail routing. If you previously created a Connection document for mail routing, you can
easily enable the replication task on that document.
Unlike mail routing, which works in one direction and requires a pair of Connection documents
to enable two-way routing, replication between servers works in both directions and requires
only one Connection document between each pair of servers. Because the server that initiates
replication takes on the larger share of the replication workload, if you decide to add
replication to one of the Connection documents already used for mail routing between two
servers, add the replication task to the document on the more powerful server in the pair.
You can view cluster replication events that the Cluster Replicator generates. Do one of the
following.
From the Domino Administrator or the Web Administrator
1. In the Server pane, expand All Servers or expand Clusters.
2. Select the server that stores the log file you want to view.
4. In the Task pane, expand Notes Log, and then click Replication Events.
5. In the Results pane, open the replication document you want to view.
From the Domino server log file
1. Open the Domino server log file.
Sales/Acme
Events
You can also run Log Analysis to gather all of the replication events into a database.
1. In the Server pane of the Domino Administrator or the Web Administrator, expand All
Servers or expand Clusters.
4. In the Task pane, expand Monitoring Results, and then expand Statistics Reports.
5. Click Clusters.
6. In the Results pane, open the document you want, and then look in the "Replica cluster
statistics" section of the document.
Note If you prefer, you can view these reports directly in the Monitoring Results database
(STATREP.NSF). Open the database, expand Statistics Reports, and then click Clusters.
The cluster replication statistics begin with "Replica.Cluster." They give you information about
cluster replication events, such as the number of documents updated, the number of times the
Cluster Replicator retried pending replication, and the number of bytes received during cluster
replication.
For an explanation of all the cluster replicator statistics, see Cluster Statistics.
See Also
Viewing Cluster Manager events and statistics
Monitoring all the servers in a cluster at the same time
Monitoring a cluster
You can use the Domino server monitor to monitor all the servers in a cluster at once. You can
decide what information you want to monitor and how to display that information. You can
monitor the cluster while you monitor other Domino servers. To tell Domino which servers to
monitor and what information to monitor on each server, you create or customize a monitoring
profile.
When you start the server monitor, the Statistic Collector task starts, if it is not already
running.
Note The Domino server monitor and monitoring profiles are not available in the Web
Administrator.
To start the server monitor manually
1. From the Domino Administrator, click the Server - Monitoring tab.
2. In the "Monitoring profiles" field, select the profile for the cluster you want to monitor. By
default, Domino creates a profile for each cluster in the domains you are monitoring.
2. Click Monitoring.
4. Make any other changes you want, and then click OK.
Customizing a monitoring profile for a cluster
You can create new profiles and edit existing profiles to customize the tasks and statistics that
Domino displays.
Selecting a profile initializes the server monitor if it is not already initialized. You cannot
make changes to a profile until the server monitor is initialized.
3. To add one or more tasks to monitor, choose Monitoring - Monitor New Task, select the
tasks you want to add, and then click OK.
For clustering, it can be useful to monitor the Cluster Database Directory Manager and
the Cluster Replicator.
4. To add one or more statistics to monitor, choose Monitoring - Monitor New Statistic, do
the following in the "Add Statistic(s) to this profile" dialog box, and then click OK.
Expand Replica - Cluster, and then select the statistics you want to monitor for
cluster replication.
There are many statistics that are helpful, but SecondsOnQueue and
WorkQueueDepth are particularly helpful in determining whether you need to
increase the number of Cluster Replicators you are running on the server.
Expand Server - Cluster, and then select the other cluster statistics you want to
monitor.
If Availability Index and Availability Threshold are not already included in your profile,
it is helpful to monitor those. It is also helpful to monitor OpenRedirects - Failover and
OpenRedirects - LoadBalance, as well as OpenRequest - LoadBalanced and
OpenRequest - ClusterBusy to track how often failover occurs.
5. (Optional) To add a server to the profile, select Monitoring - Monitor New Server, and then
select the server from the list; or drag a server from the Server pane to the server
monitor.
6. (Optional) To remove a server from the profile, click the name of the server you want to
remove, and then select Monitoring - Remove Server.
To save this profile as a new profile while also preserving the original profile, choose
Monitoring - Profiles - Save As, and then enter a name for the profile.
To have this modified profile replace the original profile, you do not have to do
anything. The profile is saved automatically when you close the Domino
Administrator.
For more information about monitoring Domino servers, see the topic Monitoring.
See Also
Viewing Cluster Manager events and statistics
Viewing cluster replication events and statistics
Monitoring a cluster
Monitoring a cluster
Domino provides several ways to find out what is happening in a cluster and make adjustments
to keep the cluster running smoothly and efficiently, so that no server is overloaded. When
running as part of a cluster, a Domino server constantly monitors its workload, the workload of
the other servers in the cluster, and the availability of databases throughout the cluster. In
addition, Domino monitors statistics and events that are relevant to a cluster.
There are many ways to view this information. For example, you can view it from the server
console or in the log file or in the Statistics pane in the Domino Administrator. In addition, you
can collect statistic reports in the Monitoring Results database and then use the Domino
Administrator to look at the statistic reports.
When using the ICM, failover and workload balancing work the same as in standard Domino
clusters. Domino computes the server availability index based on all open sessions, whether
they are from Notes clients, HTTP clients, or other Domino services. To configure workload
balancing and failover, you use the same settings, such as Server_Restricted,
Server_Availability_Threshold, and Server_MaxUsers. For database availability, you also use the
same settings, such as marking a database out of service or pending delete.
Note Unlike in a standard Domino cluster, the ICM can direct a client to a server that is in the
MAXUSERS state, if no other server is available.
The ICM maintains the following information so that it can find a replica when a client asks for
one:
Information about which databases are available in the cluster and where they are
stored. The ICM obtains this information from the Cluster Database Directory.
Information about the availability of each server. The ICM obtains this information each
time it probes the servers in the cluster.
Information about which Web servers are configured for HTTP and which are configured
for HTTPS. The ICM obtains this information from the Server documents of each server in
the cluster.
To determine which replica of a database to open, the ICM does the following:
Determines where replicas reside and whether they are marked out of service or pending
delete.
Checks the server availability index of each server that contains a replica.
Checks the availability of the server by pinging the HTTP port or the HTTPS port,
depending on the client request.
Selects a server from those remaining. If there are no servers remaining, the ICM chooses
a server that is BUSY or in the MAXUSERS state, if one is available. If there are multiple
servers remaining, the ICM chooses the server with the lightest current workload.
After choosing the server to access, the ICM looks at the Server document to determine which
port to use to access the server.
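The selection steps above can be sketched in Python. This is an illustration only, not the ICM's actual logic. It assumes that replicas marked out of service or pending delete are skipped, that unreachable servers are skipped, and that a higher availability index means a lighter workload. The Replica class, its field names, and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    server: str
    out_of_service: bool = False
    pending_delete: bool = False
    reachable: bool = True          # result of probing the HTTP or HTTPS port
    busy: bool = False              # server is in the BUSY state
    max_users: bool = False         # server is in the MAXUSERS state
    availability_index: int = 100   # assumed: higher means lighter workload


def choose_replica(replicas):
    """Return the replica to open, or None if no server can be used."""
    usable = [
        r for r in replicas
        if not (r.out_of_service or r.pending_delete) and r.reachable
    ]
    preferred = [r for r in usable if not (r.busy or r.max_users)]
    # Fall back to BUSY or MAXUSERS servers only when nothing else remains.
    pool = preferred or usable
    if not pool:
        return None
    return max(pool, key=lambda r: r.availability_index)


replicas = [
    Replica("Web1/Acme", availability_index=40),
    Replica("Web2/Acme", availability_index=85),
    Replica("Web3/Acme", out_of_service=True, availability_index=99),
]
print(choose_replica(replicas).server)
```

The fallback to a BUSY or MAXUSERS server reflects the note that the ICM can still direct a client to such a server when no other server is available.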
Click the Back button in the browser one or more times to connect to a page through the
ICM
Use a bookmark
The user may or may not have to reauthenticate with the new server. This is determined by the
following factors:
If the user already authenticated with the new server during this session, no
authentication is necessary
If the HTTP client and the server both support SSL3, reauthentication occurs
automatically
See Also
How the Internet Cluster Manager works
1. From the Domino Administrator or the Web Administrator, click the Configuration tab.
2. In the Task pane, expand Messaging.
3. Click Configurations.
From the Domino Administrator, select the Configuration document for the server or
server group you want, and click Edit Configuration.
From the Web Administrator, open the Configuration document for the server or
server group you want, and click Edit Server Configuration.
If you do not have a Configuration document for the server or server group you want,
create one by clicking Add Configuration.
Disabled
Note This setting affects delivery to a client but does not affect sending a message from a
client when the mail server is unavailable. If a user sends a message when the mail server is
unavailable, the delivery fails over to another server in the cluster, and the router on that
server sends the message.
To set up shared mail in a cluster and have replicated messages stored in the shared mail
database, you use the same procedure you use for setting up shared mail with replicas that are
not in a cluster. This procedure includes the Load Object Set - Always command. You do this on
every server that uses shared mail in the cluster.
See Also
Mail failover in a cluster
Setting up a cluster