Windows Cluster Service Troubleshooting and Maintenance
Microsoft Product Support Services White Paper
Written by Martin Lucas, Microsoft Alliance Support
Published on July 14, 2003

Abstract

This white paper discusses troubleshooting and maintenance techniques for the Cluster service that is included with Microsoft Windows 2000 Advanced Server and Microsoft Windows 2000 Datacenter Server. Because cluster configurations vary, this document discusses techniques generally. Many of these techniques can be applied to different configurations and conditions.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2003 Microsoft Corporation. All rights reserved.

Active Directory, Microsoft, Windows, Windows Server, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
CONTENTS
WINDOWS CLUSTER SERVICE TROUBLESHOOTING AND MAINTENANCE
MICROSOFT PRODUCT SUPPORT SERVICES WHITE PAPER
INTRODUCTION
    Server Clustering
CHAPTER 1: PRE-INSTALLATION
    Cluster Hardware Compatibility List
    Configuring the Hardware
    Installing the Operating System
    Configuring Network Adapters
    Name Resolution Settings
    Configuring the Shared Storage
    Configuring SCSI Host Adapters
    SCSI Cables
    SCSI Termination
    Fibre Channel Storage
    Drives, Partitions, and File Systems
    CD-ROM Drives and Tape Drives
    Pre-Installation Checklist
    Installation on Systems with Custom Disk Hardware
CHAPTER 2: INSTALLATION PROBLEMS
    Installation Problems with the First Node
    Is the Hardware Compatible?
    Is the Shared SCSI Bus Connected and Configured Correctly?
    Does the Server Have a Correctly Sized System Paging File?
    Do All Servers Belong to the Same Domain?
    Is the Primary Domain Controller Accessible?
    Are You Installing While Logged On as an Administrator?
    Do the Drives on the Shared Bus Appear to Function Correctly?
    Are Any Errors Listed in the Event Log?
    Is the Network Configured and Functioning Correctly?
    Problems Configuring Other Nodes
    Are You Specifying the Same Cluster Name to Join?
    Is the RPC Service Running on Each System?
    Can Each Node Communicate Over Configured Networks?
    Are Both Nodes Connected to the Same Network or Subnet?
    Why Can You Not Configure a Node After It Was Evicted?
CHAPTER 3: POST-INSTALLATION PROBLEMS
    The Whole Cluster Is Down
    One Node Is Down
    Applying Service Packs and Hotfixes
    One or More Servers Stop Responding
    Cluster Service Does Not Start
    Cluster Service Starts but Cluster Administrator Does Not Connect
    Group/Resource Failover Problems
    Physical Disk Resource Problems
    File Share Does Not Go Online
    Problems Accessing a Drive
    Network Name Resource Does Not Go Online
CHAPTER 4: PROBLEMS ADMINISTERING THE CLUSTER
    Cannot Connect to the Cluster Through Cluster Administrator
    Cluster Administrator Stops Responding on Failover
    Cannot Move a Group
    Cannot Delete a Group
    Problems Adding, Deleting, or Moving Resources
    Adding Resources
    Using the Generic Resource for Non-Microsoft Applications
    Deleting Resources
    Moving Resources from One Group to Another
    Chkdsk and Autochk
    Failover and Failback
    Failover
    Failback
    Move Group
    Factors That Influence Failover Time
CHAPTER 5: SCSI STORAGE
    Verifying Configuration
    Adding Devices to the Shared SCSI Bus
    Verifying Cables and Termination
CHAPTER 6: FIBRE CHANNEL STORAGE
    Verifying Configuration
    Adding Devices
    Testing Devices
    Snapshot Volumes
CHAPTER 7: CLIENT CONNECTIVITY PROBLEMS
    Clients Have Intermittent Connectivity Based on Group Ownership
    Clients Do Not Have Any Connectivity with the Cluster
    Clients Have Problems Accessing Data Through a File Share
    Clients Cannot Access Cluster Resources Immediately After You Change an IP Address
    Clients Experience Intermittent Access
CHAPTER 8: PRINT SPOOLER PROBLEMS
    About Printer Drivers
    Driver Problems
    Point and Print
    Driver Synchronization
    Rolling Upgrade from Windows NT 4.0 to Windows 2000
    Migration from Windows NT 4.0 (as Opposed to Upgrade)
CHAPTER 9: KERBEROS FEATURES AVAILABLE WITH WINDOWS 2000 SERVICE PACK 3
    Publishing Printers to Active Directory
    Effects on Microsoft Exchange 2000 Server or Microsoft SQL Server
    Effects on File Shares
    More Information About the Cluster Service and Kerberos
CHAPTER 10: DHCP AND WINS
    WINS
    DHCP
    Jetpack
    DHCP and WINS Reference Material
CHAPTER 11: FILE SHARES
    Basic File Shares
    Dynamic File Shares (Share Subdirectories)
    DFS Root
CHAPTER 12: CHKDSK AND NTFS
    The Importance of Chkdsk
    Delaying Chkdsk
    Handling Corrupted Volumes
    Duration of Chkdsk
    Faster Chkdsk
    Proactive Ways to Test a Volume
    Chkdsk and the Cluster Service
    Chkdsk Performance: Windows NT 4.0 vs. Windows 2000
    SAN Solutions with Data Snapshot Capability
    Things You Can Do to Increase Speed
    Related Microsoft Knowledge Base Articles
CHAPTER 13: MAINTENANCE
    Installing Service Packs
    Service Packs and Interoperability Problems
    Replacing Adapters
    Shared Disk Subsystem Replacement
    System Backups and Recovery
    Administrative Suggestions
    Use the Resource Description Field
    Remove Empty Groups
    Avoid Unnecessary or Redundant Dependencies
    Creating Redundancy for Network Adapters
    Check the Size of the Quorum Log File
    Monitor Performance
    Load Balance
    Practices to Avoid on a Cluster
APPENDIX A: EVENT MESSAGES
    Cluster Service Events
    Related Event Messages
APPENDIX B: THE CLUSTER LOG FILE
    CLUSTERLOG Environment Variable
    Annotated Cluster Log
    Version and Service Pack Information
    Initialization
    Determining the Node ID
    Determining the Cluster Service Account That the Node Uses
    Trying to Join First
    No Join Sponsors Are Available
    Reading from the Quorum Disk
    Enumerating Drives
    Reading from Quolog.log File
    Checking for a Quorum Tombstone Entry
    Checking the Time to Load a New CLUSDB
    Enumerated Networks
    Disabling Mixed Operation
    Forming Cluster Membership
    Identifying Resource Types
    Cluster Membership Limit
    Cluster Service Started
    Log File Entries for Common Failures
    Example 1: Disk Failure
    Example 2: Duplicate Cluster IP Address
APPENDIX C: COMMAND-LINE ADMINISTRATION
    Using Cluster.exe
    Basic Syntax
    Cluster Commands
    Node Commands
    Group Commands
    Resource Commands
    Example of a Batch Job
INDEX
INTRODUCTION
This white paper discusses troubleshooting and maintenance techniques for the Cluster service that is included with Microsoft Windows 2000 Advanced Server and Microsoft Windows 2000 Datacenter Server. Although these versions are very similar to the first implementation of the service, Microsoft Cluster Service (MSCS) version 1.0, this document is specific to Windows 2000. (MSCS was included with Microsoft Windows NT 4.0.) MSCS and Windows 2000 Advanced Server support a maximum of two servers, frequently referred to as nodes, in a cluster. Windows 2000 Datacenter Server supports up to four nodes. Because there are many different types of resources that can be managed in a cluster, it may sometimes be difficult for an administrator to determine which component or resource is causing failures. In many cases, the Cluster service can automatically detect and recover from server or application failures. However, sometimes you may have to troubleshoot attached resources, devices, or applications. This document is based on the original cluster troubleshooting white paper, Microsoft Cluster Server Troubleshooting and Maintenance. That paper was specific to Windows NT Server, Enterprise Edition, version 4.0.
Server Clustering
Clustering is an old term in the computing industry. Many readers think that clustering is a complicated subject because early implementations were large, complex, and sometimes difficult to configure. These early clusters were difficult to maintain unless you had an extensively trained and experienced administrator. Microsoft extended the capabilities of the Windows NT Server operating system through the Enterprise Edition. Microsoft Windows NT Server, Enterprise Edition, includes MSCS. MSCS adds clustering capabilities to Windows NT to achieve high availability, easier manageability, and greater scalability through server clustering. Windows 2000 Advanced Server and Windows 2000 Datacenter Server also include these high-availability features through the Cluster service (ClusSvc). In these versions of Windows, the core functions and features of the Cluster service have not changed, although they include improvements and new features.
Although you can configure a cluster that uses only one network adapter in each server, Microsoft strongly recommends that you have a second isolated network for cluster communications. Validated cluster solutions contain at least one isolated network for cluster communications. This is referred to as the private interconnect. You can also configure the cluster to use the primary nonisolated network for cluster communications if the isolated network fails. The cluster nodes must communicate with each other on a time-critical basis. Communication between nodes is sometimes referred to as the heartbeat. It is important that the heartbeat packets are sent and received on schedule, so Microsoft recommends that you use only PCI-based network adapters, because the PCI bus
has the highest priority. Fault-tolerant network adapters are not supported on the private interconnect. During port failure recovery, fault-tolerant network adapters can delay heartbeat packets significantly and actually cause cluster node failure. For redundancy of the private interconnect, it is more effective to form a second isolated network that can take over if the primary private interconnect fails.

Figure 1. (Diagram of cluster networks; not reproduced.)
Cluster storage is made up of a compatible PCI-based storage adapter in each server that is separate from local storage. Each server in the cluster is connected to the storage that is allocated specifically for cluster use. When you use SCSI technology for cluster storage, each cluster uses at least one SCSI adapter that is dedicated for use with the intended external cluster storage. Because both servers in the cluster connect to the same bus at the same time, one SCSI host adapter uses the default ID, 7, and the other adapter uses ID 6. This configuration makes sure that the host adapters have the highest priority on the bus. The bus is referred to as the shared SCSI bus, because both systems share connectivity on this bus but arbitrate (negotiate) for exclusive access to one or more attached disk devices. The Cluster service controls exclusive access to the disk devices through the SCSI reserve and release commands.

Fibre Channel storage is frequently used as the storage medium for highly available clusters. Common implementations use Fibre Channel Arbitrated Loop (FC-AL) or Switched Fibre Channel Fabrics (FC-SW). Although the word fibre suggests that the technology uses fiber optic cables, the Fibre Channel specifications allow the use of fiber optic or copper cable interconnects.

Before you configure the Cluster service, see the documentation that is included with the validated cluster hardware for installation guidelines and instructions. Some configurations may require a different sequence of installation steps than the steps that are described in the Cluster service documentation.
Figure 2. (Network adapter configuration; not reproduced.)

In fact, because the private network is isolated, you can use any matching IP address combination that you want for this network. If you want, you can use addresses that the Internet Assigned Numbers Authority (IANA) designates for private use. The private-use address ranges are noted in Figure 3.
Address class    Starting address    Ending address
Class A          10.0.0.0            10.255.255.255
Class B          172.16.0.0          172.31.255.255
Class C          192.168.0.0         192.168.255.255

Figure 3
The first and last addresses are designated as the network and broadcast addresses for the address range. For example, in the reserved Class C address range, the actual range for host addresses is 192.168.0.1 through 192.168.255.254. Use 192.168.0.1 and 192.168.0.2 to keep it simple, because you will have only two adapters on this isolated network. Do not declare default gateway, Windows Internet Naming Service (WINS), or DNS server addresses for this network. You may want to talk to your network administrator about these addresses, in case any of the addresses may already be in use in your organization. When you have obtained the correct addresses for network adapters in each system, set these options appropriately for the adapters in the system. Use the Ping utility at a command prompt to check each network adapter for connectivity with the loopback address (127.0.0.1), the card's own IP address, and the IP address of another system. Before you try to install the Cluster service, make sure that the driver for each adapter loads correctly without errors and that each adapter communicates correctly on each network. For more information about network adapter configuration, see Windows Help, the Windows 2000 Resource Kits, or the Microsoft Knowledge Base.
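For example, if you assign 192.168.0.1 to the private adapter in the first node and 192.168.0.2 to the private adapter in the second node, you can run tests like the following from the first node (substitute your own addresses if they differ):

Example:
C:\>ping 127.0.0.1
C:\>ping 192.168.0.1
C:\>ping 192.168.0.2

The first command checks the loopback address, the second checks the node's own adapter, and the third checks connectivity with the other node across the private network. Replies from all three indicate that the adapter and the isolated network are functioning.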
Do not use static mappings in WINS for cluster resources. Static mappings in a WINS database for cluster-related resources may cause cluster resource failures.
A SCSI bus that uses a 5-megabit transfer rate can have a maximum total cable length of approximately 6 meters, and the maximum length decreases as the transfer rate increases. Most SCSI devices that are available today reach much higher transfer rates and therefore demand a shorter total cable length. Some manufacturers of complete systems that are validated for the Cluster service may use differential SCSI with a maximum total cable length of 25 meters. Consider these implications when you add devices to an existing bus or a validated system. Sometimes, you may have to install another shared SCSI bus.

SCSI Termination

Microsoft recommends that you use active terminators for each end of the shared SCSI bus. Passive terminators may not reliably maintain satisfactory termination under certain conditions. A SCSI bus has two ends and must have termination on each end. For best results, do not rely on automatic termination that is provided by host adapters or newer SCSI devices. It is best to avoid duplicate termination and to avoid putting termination in the middle of the bus.

Fibre Channel Storage

Fibre Channel storage typically requires device drivers to be loaded for the associated interface adapters in each server. With technologies that support redundant paths, you might have to use multiple-path software to achieve redundancy. Fibre Channel Arbitrated Loop solutions do not require that you set IDs, as you do with SCSI. Each device that is connected to the loop has its own identifier. When a device connects to the loop, it generates a loop initialization primitive (LIP) event. A LIP event is similar to a reset on a conventional bus. LIP events occur when a device attaches to the loop, restarts, or is otherwise removed from the loop. Too many LIP events may cause data loss or other instability. Whenever a device on the loop wants to communicate with another device, it first negotiates for access to the loop. While most FC-AL implementations support up to 126 devices, the more devices on the loop, the longer the path. Therefore, the latency of individual operations on the loop increases with the number of attached devices. Arbitrated loops are a good solution for a small number of hosts and devices. Switched Fibre Channel Fabric is similar to a routed technology because it uses Fibre Channel switches. With this technology, devices that want to communicate set up point-to-point connectivity. Some switches may allow you to define storage zones or to connect arbitrated loop systems to a larger switched storage area network. Zoning is a mechanism that is provided by switches to isolate host or storage ports. Ports or devices in a zone are visible only to devices in the specific zone and are not visible to other devices attached to the fabric. This feature helps protect access to data by allowing only specific hosts to access specific storage ports. Therefore, if two separate clusters, ClusterA and ClusterB, are attached to the same certified multiple-cluster storage area network through zoning, ClusterA cannot access ClusterB's storage on the same SAN.
Pre-Installation Checklist
Before you install the Cluster service, there are several items to check to help ensure correct operation and configuration. After you correctly finish configuring and testing the hardware, most installations will complete without error. The following checklist is general; it may not include all possible system options that you might evaluate before installation:
- Use only validated ("certified") hardware as listed on the cluster HCL.
- Determine what role these servers will have in the domain. Will each server be a domain controller or a member server?
- Install the operating system on each server.
- Install the latest Windows service pack.
- Verify the cables and the termination of the shared storage.
- Verify the drive letter assignment and the NTFS formatting of shared drives with only one server turned on at a time. If both systems have ever been permitted to access drives on the shared bus at the same time (without the Cluster service installed), the drives must be repartitioned and reformatted before the next installation. Failure to do so may result in unexpected file system corruption.
- Make sure that only physical disks or hardware RAID arrays are attached to the shared bus.
- Make sure that disks on the shared SCSI bus are not members of any software fault-tolerance sets.
- Check network connectivity with the primary network adapters on each system.
- Evaluate network connectivity on any secondary network adapters that will be used for private cluster communications.
- Make sure that the system and application event logs show no errors or warnings.
- Make sure that each server is a member of the same domain and that you have administrative rights to each server.
- Make sure that each server has a correctly sized paging file and that the paging files reside only on local disks. Do not store paging files on any drives that are attached to the shared bus.
- Determine what name you will use for the cluster. This name will be used for administrative purposes in the cluster and must not conflict with any existing names on the network (such as computer, server, printer, or domain names). This is not a network name for clients to attach to.
- Obtain a static IP address and subnet mask for the cluster. This address will be associated with the cluster name. You may need additional IP addresses later for groups of resources (virtual servers) in the cluster.
- Set multiple-speed network adapters to a specific speed. Do not use the autodetect setting if the adapter has one. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
  174812 Effects of Using Autodetect Setting on Cluster Network Interface Card (NIC)
- Decide the name of the folder and the location for cluster files that will be stored on each server. The default location is %WinDir%\Cluster, where %WinDir% is your Windows folder.
- Determine what account the Cluster service (ClusSvc) will run under. If you must create a new account for this purpose, create that account before installation. Make the domain account a member of the local Administrators group. Although the Domain Admins group may be a member of the Administrators group, this is not sufficient; the account must be a direct member of the Administrators group. Do not put any password restrictions on the account. Also make sure that the account has the "Log on as a service" and "Lock pages in memory" rights.
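For example, you can add the account to the local Administrators group from a command prompt by using the Net.exe utility (the domain and account names here are placeholders for your own):

Example:
C:\>net localgroup Administrators MYDOMAIN\ClusterSvc /add

Run the command on each node, because local group membership is not shared between servers.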
system failure occurs. Also, make sure that paging files are stored on local disks only, not on shared drives.

Note The System Monitor ActiveX control may be a valuable resource for troubleshooting virtual memory problems. Access this control from the Performance console under Administrative Tools.

Do All Servers Belong to the Same Domain?

Each server in the cluster must have membership in the same domain. Also, the service account that the Cluster service uses must be the same on each server. Cluster nodes may be domain controllers or domain member servers. However, if a domain controller is functioning as a domain member server, that domain controller must be accessible for Cluster service account authentication. This is required for any service that uses a domain account when it starts.

Is the Primary Domain Controller Accessible?

During configuration, Setup must be able to communicate with the primary domain controller (PDC), if you are operating in a Windows NT 4.0 domain, or with a Windows 2000 domain controller. Otherwise, configuration will fail. Additionally, the Cluster service may not start if domain controllers are not available to authenticate the Cluster service account. To obtain the best results, make sure that each system can connect and communicate with one or more domain controllers.

Are You Installing While Logged On as an Administrator?

To configure the Cluster service, you must have administrator rights on each server. To obtain the best results, log on to the server with an administrator account before you configure the service.

Do the Drives on the Shared Bus Appear to Function Correctly?

Devices on the shared SCSI bus must be turned on, configured, and functioning correctly.

Are Any Errors Listed in the Event Log?

Before you install any new software, it is good practice to check the system and application event logs for errors. These resources indicate the state of the system before you make configuration changes. Events may be posted to these logs if errors or hardware malfunctions occur during the configuration process. Try to correct any problems that you find. Appendix A of this document, "Event Messages," contains information about some events that may be related to the Cluster service, and it provides possible resolutions.

Is the Network Configured and Functioning Correctly?

The Cluster service relies on configured networks for communication between cluster nodes and for client access. If the network is not configured correctly or functioning as expected, the cluster software cannot function correctly. The configuration process tries to validate attached networks and must use them
during the process. Make sure that the network adapters and the TCP/IP protocol are configured with correct IP addresses. You may have to discuss correct addressing with your network administrator. For best results, use statically assigned addresses and do not rely on Dynamic Host Configuration Protocol (DHCP) to supply addresses for these servers. Also, make sure that you are using the correct network adapter driver. Some adapter drivers may appear to work because they are similar to the actual driver that is needed but are not an exact match. For example, an OEM or integrated network adapter may use the same chipset as a standard version of the adapter. Because the chipset is the same, the standard version of the driver may load instead of the OEM-supplied driver. Some of these adapters work more reliably with the driver that is supplied by the OEM, and they may not achieve acceptable performance if you use the standard driver. Sometimes, this combination may prevent the adapter from functioning at all, even though no errors appear in the system event log for the adapter.
connect to the cluster. Try using the name of the other node for the join process. If that succeeds, there may be a name resolution problem.

Are Both Nodes Connected to the Same Network or Subnet?

Both nodes must use unique addresses on the same network or subnet. The cluster nodes must be able to communicate directly, without routers or bridges between them. If the nodes are not directly connected to the same public network, the Cluster service cannot fail over IP addresses.

Why Can You Not Configure a Node After It Was Evicted?

If you evict a node from the cluster, that node can no longer participate in cluster operations. If you restart the evicted node but have not removed the Cluster service from it, the node will still try to join, and cluster membership will be denied. You must use the Add/Remove Programs utility in Control Panel to remove the Cluster service.
If the RequireDNS property is set to 1, the DNS HOST (A) record for the virtual server must be registered. If it is not registered, the network name resource does not come online. If the DNS server accepts dynamic updates but the record cannot be updated, the resource fails. If the DNS server does not accept dynamic updates (older implementations of the Domain Name System [DNS] protocol) or if there are no DNS servers specified for the resource's associated network, the network name resource may go online without error. If a network name resource does not come online after Kerberos support is turned on, the Cluster service account may not have correct permissions for Active Directory. For additional information about a resource that does not come online, click the article numbers below to view the articles in the Microsoft Knowledge Base: 307532 How to Troubleshoot the Cluster Service Account When It Modifies Computer Objects 257903 Cluster Network Name May Not Come Online with Event ID 1052 217199 Static WINS Entries Cause the Network Name to Go Offline
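You can view or set this private property with Cluster.exe. The following is a sketch only; the resource name is a placeholder for the name of your network name resource:

Example:
C:\>cluster res "Network Name Resource" /priv
C:\>cluster res "Network Name Resource" /priv RequireDNS=1

The first command lists the private properties of the resource, including RequireDNS; the second sets the property.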
CHAPTER 4: PROBLEMS ADMINISTERING THE CLUSTER

Cannot Connect to the Cluster Through Cluster Administrator
The most common way to administer the cluster from a remote workstation is to use the network name that you defined during the setup process as ClusterName. This resource is located in the Cluster group. Cluster Administrator uses RPC to establish a connection. If the RPC service has failed on the cluster node that owns the Cluster group, you cannot connect through the cluster name or the name of that computer. Instead, try to use the computer names of each cluster node to connect. If this works, there is a problem with either the IP address or network name resources in the Cluster group. There may also be a name resolution problem on the network that may prevent access through the cluster name. If you cannot connect by using the cluster name or the computer names of the nodes, this may indicate problems with the server, RPC connectivity, or security. Make sure that you are logged on with an administrative account in the domain and that the account has access to administer the cluster. You can use Cluster Administrator on one of the cluster nodes to grant administrator access to additional accounts. If Cluster Administrator cannot connect from the local console of one of the cluster nodes, verify that the Cluster service is started. Check the system event log for errors. You may want to enable diagnostic logging for the Cluster service. If the problem occurs shortly after you start the system, wait 30 to 60 seconds for the Cluster service to start, and then try to run Cluster Administrator again.
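For example, from the console of a node, you can check whether the service is running and start it if it is not:

Example:
C:\>net start | find "Cluster"
C:\>net start clussvc

The first command lists started services and filters for the Cluster service; the second starts the service if it is stopped.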
If a node is in a paused state, the node is a fully active member in the cluster but cannot own or run groups. Both cluster nodes should be listed in the Possible Owners list for the resources in the group. If they are not on the list, the group can be owned by only a single node and will not fail over. While this restriction may be intentional in some configurations, generally it is a mistake, because it prevents the whole group from failing over. You also cannot move a group if resources in the group are in a pending state. Before you can initiate a Move Group request, resources in the group must be in one of the following three states: online, offline, or failed.
A dependency on the network name implies a dependency on the address. You can think of this as a cascading dependency. You might ask, "What about the disk where the data will be? Should the share depend on the existence or online status of the disk?" The answer is yes. It is a good idea to create a dependency on the physical disk resource, although this dependency is not required. If the New Resource wizard did impose this requirement, it would imply that the only data source that can be used for a file share is a physical disk resource on the shared SCSI bus. However, for volatile data, shared storage is the best option, and it is best to create a dependency for it. When you use shared storage, if the disk experiences a momentary failure, the share is taken offline and restored when the disk becomes available. However, because there is no requirement for dependency on a physical disk resource, the administrator has additional flexibility to use other disk storage for holding data. If you use nonphysical disk data storage for the share and you want to move the share to the other node, that node must have equivalent storage, and the same drive letter and information must be available on that node. Also, if the data is volatile, Microsoft recommends that you provide some method of data replication or mirroring for this type of storage. You might consider using software from another manufacturer to handle this situation. Microsoft does not recommend that you use local storage in this manner for read-and-write shares. For read-only information, the two data sources can remain synchronized ("in sync"), so that you avoid problems with "out-of-sync" data. If you use a shared drive for data storage, make sure to establish the dependency with the share and with any other resources that depend on it. If you do not establish this dependency, you may experience irregular or unwanted behavior of resources that depend on the disk resource. Without the dependency established, some applications or services that rely on the disk may quit unexpectedly. If you use Cluster.exe to create the same resources, you can create a network name resource without the required IP address resource. However, the network name will not go online, and any attempt to bring it online will generate errors.

Using the Generic Resource for Non-Microsoft Applications

Although some non-Microsoft services may require that you modify them to use them in a cluster, many services may function as usual while they are controlled by the generic service resource type that is provided with the Cluster service. If you have a program that runs as an application on the server's desktop and you want that program to be highly available, you may be able to use the generic application resource type to control this application in the cluster. The parameters for each of these generic resource types are similar. However, when you plan to have the cluster manage these resources, you must be familiar with the software and with the resources that the software requires. For example, the software might create a share of some kind for clients to access data. Applications frequently need access to their installation directory to access .dll or .ini files, to access stored data, or to create temporary files. In some cases, it
may be best to install the software on a shared drive in the cluster so that the software and the necessary components are available to any node if the group that contains the service moves to another cluster node. For example, consider the following scenario: You have a service that is named SomeService. Assume that this is a non-Microsoft service that does something useful. The service requires that the share, SS_SHARE, must exist, and that the share maps to a directory that is named Data and that is located under the installation directory. The startup mode for the service is set for automatic so that the service starts automatically when the system starts. Typically, the service is installed in the C:\SomeService folder, and it stores dynamic configuration details in the following registry key:

HKEY_LOCAL_MACHINE\Software\SomeCompany\SomeService

If you want to configure the Cluster service to manage this service and make it available through the cluster, follow steps similar to these (the details might vary depending on the specific service's requirements):

1. Use Cluster Administrator to create a new group. Name the group SomeGroup to remain consistent with the software naming convention.
2. Make sure that the group has a physical disk resource to store the data and the software, an IP address resource, and a network name resource. For the network name, use SomeServer. Clients will use this name on the network to access the share that will be in the group.
3. Install the software on the shared drive (drive Y, for example).
4. Use Cluster Administrator to create a file share resource and name it SS_SHARE. Make the file share resource dependent on the physical disk and the network name. If either of these resources fails or goes offline, you want the share to follow the state of either dependent resource. Set the path to the Data directory on the shared drive. According to what you know about the software, this is Y:\SomeService\Data.
5. Set the startup mode for the service to manual. Because the Cluster service will control the service, the service does not have to start itself before the Cluster service starts and brings the physical disk and other resources online.
6. Create a generic service resource in the group. Use a descriptive name, such as SomeService, to match the service name. Make both cluster nodes possible owners. Make the resource dependent on the physical disk resource and network name. Specify the service name and any necessary service parameters. Select the Use network name for computer name check box. This causes the application's API call that requests the computer name to return the network name in the group. Specify that the registry key should be replicated by adding the following line on the Registry Replication tab:
   Software\SomeCompany\SomeService
7. Bring all the resources in the group online, and then test the service.
8. If the service works correctly, stop the service by taking the generic service resource offline.
9. Move the group to the other node.
10. Install the service on the other node. Use the same parameters and installation directory on the shared drive.
11. Use the Services utility in Control Panel to set the startup mode to manual.
12. Bring all the required resources and the generic service resource online, and then test the service.

Note If you evict a node from the cluster and have to completely reinstall the node from the beginning, you will likely have to repeat steps 10 through 12 on the node when you add it back to the cluster.

The procedure described here is generic in nature and may be adapted to various applications. If you do not know how to configure a service in the cluster, contact the application software vendor for more information. The procedure to configure the cluster to manage an application is similar, except that you substitute the generic application resource type for the generic service resource type that was used in the previous procedure. If you have a simple application that is already installed on both systems, you may adapt the following steps to the procedure that was previously described:

1. Create a generic application resource in a group. This example makes Notepad.exe a highly available application.
2. For the command line, specify c:\winnt\system32\notepad.exe (or a different directory, depending on your Windows installation directory). The path must be the same on each cluster node. Make sure that you specify the working directory, and then select the Allow application to interact with the desktop check box so that Notepad.exe does not go into the background.
3. Skip the Registry Replication tab, because Notepad.exe does not have registry keys that require replication.
4. Bring the resource online, and see that it appears on the desktop. Click Move Group, and the application should appear on the other node's desktop.

Some cluster-aware applications may not require this type of setup, and they may have setup wizards to create necessary cluster resources.

Deleting Resources

Some resources may be difficult to delete if any cluster nodes are offline. For example, you may be able to delete an IP address resource if only one cluster node is online. However, if only one cluster node is online when you try to delete
a physical disk resource, an error message may appear that indicates other nodes must be online to perform the action. Physical disk resources affect the disk configuration on each node in the cluster. Therefore, all cluster nodes must be online to remove this type of resource from the cluster. If you try to remove a resource that other resources depend on, a dialog box appears that lists the related resources. These resources will also be deleted, because they are linked by dependency to the resource that you chose to remove. To avoid removal of these resources, first change or remove the configured dependencies.

Moving Resources from One Group to Another

To move resources from one group to another group, both groups must be owned by the same cluster node. If you try to move resources between groups with different owners, you may receive the following error message:

The cluster node is not the owner of the group.

To easily correct this error, move one of the groups so that both groups have the same owner. Remember that resources that you move may have dependent resources. When you try to move resources between groups, if you experience problems other than those mentioned in this section, they may be caused by system problems or configuration-related problems. Check event logs or cluster log files for more information that may relate to the specific resource.
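For example, you can move a group from the command line by using Cluster.exe (the group and node names here are placeholders for your own):

Example:
C:\>cluster group "GroupB" /moveto:NodeA

After both groups are owned by the same node, you can move the resource between the groups in Cluster Administrator.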
Autochk does not perform file system checks on shared drives when the system starts, even if the operations are required. The Cluster service performs a file system integrity test for each drive when it brings a physical disk online. The cluster automatically starts Chkdsk if it is necessary. If you have to run Chkdsk on a drive, click the following article numbers to view the articles in the Microsoft Knowledge Base:

174617 CHKDSK Runs While Running Microsoft Cluster Server Setup
176970 How to Run the CHKDSK /F Command on a Shared Cluster Disk
Recovery operations may be necessary before a database resource can be brought online. Taking the resource offline may require that pending database transactions be completed first so that they do not have to be rolled back at online time.
If your configuration is an arbitrated loop that serves multiple clusters, make sure that the firmware settings on all related devices match what the vendor requires for the configuration. For this type of configuration, the vendor may disable LIPs in the firmware. LIP events can be a natural response to a virtual bus reset or a reset to a device when the configuration is incorrect. When you replace HBAs or add new servers to a SAN, make sure that you use the correct HBA for the topology. An FC-AL HBA on a switched fabric can cause many problems. If you use multiple HBAs to provide redundancy, make sure that the multiple-path software for this feature operates correctly. Make sure that the software is installed on all applicable nodes that are attached to the SAN. Without multiple-path software, Windows sees each HBA as a separate storage bus and enumerates the attached devices once for each HBA. Duplicate enumeration of these devices may quickly result in data corruption or loss of disk signatures.
Adding Devices
When you add disk devices to a fiber-based storage system, it is a good idea to review the vendor documentation. Typically, you do not have to turn everything off to plug a new device into the hub or switch. However, it is better to shut down the nodes (except for one) when you add a new disk. Until you make the cluster aware of the new disk device, so that access can be managed by reserve/release methods, the disk may be open to writes from multiple hosts. If multiple hosts can write to a disk without the reservations in place, corruption is very likely to occur. Some corruption may not be immediately obvious and may appear later, when the device contains critical data. For best results, add the device and configure it with one node, and then start any remaining cluster nodes. Test the device and the ability to change ownership of the device before you trust it with critical data. If the SAN allows you to restrict access to the new disk to only one node, you may do that instead of shutting down nodes. In this way, you can prevent multiple nodes from accessing an otherwise unprotected disk. With a shared SCSI bus, removable devices and tape drives are not supported. However, you can connect such devices with fiber cables. Generally, such devices must reside in a separate isolated zone without any cluster-related devices.
Testing Devices
Follow vendor guidelines to test SAN devices. You might think that a good way to test a working cluster that is attached to a SAN is to unplug the fiber cable from the HBA on a node. However, this is not a good way to test the configuration's ability to recover from failures. SAN implementations respond differently when you disconnect cables. Some HBAs do not signal the operating system and may actually cause a failure.
Snapshot Volumes
Many SAN implementations support the use of snapshot volumes. This is a great way to continue operations on one volume with a snapshot of data that exists on another volume. You can think of this as having a mirrored volume at the SAN level that you can break whenever you want. You can use one copy to continue operations and the other copy as a snapshot to go back to or perform a backup. You must make sure that the snapshot volume is hidden from the operating system because it will have the same disk signature as the live volume.
CHAPTER 7: CLIENT CONNECTIVITY PROBLEMS

Clients Have Intermittent Connectivity Based on Group Ownership
If clients can successfully connect to clustered resources only when a specific node is the owner, there are several possible causes for this problem. Check the system event log on each server for possible errors. Make sure that the group has at least one IP address resource and one network name resource, and that clients use one of these to access the resource or resources within the group. If clients connect with any other network name or IP address, they may not access the correct server if ownership of the resources changes. As a result of improper addressing, access to these resources may appear limited to a particular node. If you can confirm that clients use correct addressing for the resource or resources, check the IP address and network name resources to see that they are online. Check network connectivity with the server that owns the resources. For example, try the following tests:

- ping the server's primary adapter IP address (on the client network)
- ping the other server's primary adapter IP address (on the client network)
- ping the IP address of the group
- ping the network name of the group
- ping the router/gateway between the client and server (if any)
- ping the client IP address

If the tests up to the router/gateway test work correctly, the problem may be elsewhere on the network, because you have connectivity with the other server and local addresses. If tests complete successfully up to the client IP address test, there may be a client configuration or routing problem. From the client, run these tests:

- ping the client IP address
- ping the router/gateway between the client and server (if any)
- ping the server's primary adapter IP address (on the client network)
- ping the other server's primary adapter IP address (on the client network)
- ping the IP address of the group
- ping the network name of the group

If the tests from the server all pass but you experience failures when you do the tests from the client, there may be client configuration problems. If all tests complete successfully except the test that uses the network name of the group, there may be a name-resolution problem. This may be related to client configuration, or it may be a problem with the client's designated WINS server. These problems may require a network administrator to intervene. If the ping tests work only when a particular node owns the group, try the tests again with both nodes attached to a single hub that is feeding into the same switch
port. If this works, the problem may be that the switch is not handling the IP address ownership transition. Additionally, if you use fault-tolerant network adapters for redundancy or traffic load balancing, disable this feature, and then try the tests again. These adapters provide additional network features through vendor-supplied hardware and software in a manner that is transparent to the operating system.
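As a concrete illustration, suppose the group uses IP address 172.16.5.20 and network name FILESRV (placeholder values). From the client, the group tests would look like the following:

Example:
C:\>ping 172.16.5.20
C:\>ping FILESRV

If the address responds but the name does not, suspect name resolution (WINS or DNS) rather than basic network connectivity.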
Clients Cannot Access Cluster Resources Immediately After You Change an IP Address
If you create a new IP address resource or change the IP address of an existing resource, clients may experience some delay if you use WINS for name resolution on
the network. This problem may occur if there are delays in replication between WINS servers on the network. The Cluster service cannot control these delays. Allow sufficient time for the WINS servers to replicate. If you suspect there is a WINS database problem, ask your network administrator or contact Microsoft Product Support Services for TCP/IP support. Verify that static DNS entries are correct. If you use a DNS server that supports the dynamic update protocol, check the system event log for errors that may indicate a problem updating the registration. If you detect from the event log that the registration update did not occur, make sure that the DNS server configuration supports dynamic updates. Check permissions on the DNS entry in case they have been locked down. Also, check to make sure that the "Register this connection's addresses in DNS" setting is enabled for the corresponding adapter in the node.
Driver Problems
Changing from independent software vendor (ISV) drivers to Microsoft drivers may improve stability because of the extensive testing that the Microsoft-version drivers receive. However, the Microsoft-version drivers may not provide all available printer features. There may also be a change in settings or a change in data structure between drivers. In other words, data stored in the registry for an ISV driver may not be compatible with a Microsoft-version driver.
With Point and Print, printer drivers are obtained from the print server and stored on the client. The print spooler resource that is provided by the Cluster service supports Point and Print. Clients that use the same print architecture as the server do not require any other special drivers. However, if, for example, a Windows NT 4.0 client connects to a Windows 2000-based server, a version-2 driver must be loaded on the server so that the client can automatically obtain the version-2 driver. A Windows XP client that connects to a Windows 2000-based server obtains a copy of the version-3 driver that is already on the server. Microsoft Windows 95, Microsoft Windows 98, and Microsoft Windows Millennium Edition clients obtain their driver from the server if the driver is loaded for them on the server. However, these clients do not obtain settings. Printer settings for Windows 95, Windows 98, and Windows Millennium Edition clients still require configuration to match the printers on the server.
Driver Synchronization
Server and client drivers must be synchronized. If the client has a newer version of a driver than the server has, the client will send a temporary copy of its driver with a print job for rendering the document. This will at least affect performance. If the driver has compatibility problems with other drivers that are loaded into the spooler, other problems may also occur. If a client has an older driver than the server has, the client will obtain the driver update when it connects to the server. On a cluster, make sure that all servers and all clients have the same version of each driver. The exception to this is if the server stores drivers for other operating systems. In this case, make sure that the stored drivers on the server match those that are loaded on the clients that have the same operating system. Some IHV drivers may cause problems, while other IHV drivers may work perfectly. Some drivers may include their own port or language monitors that you do not need because of the monitors that are provided by Windows 2000 and Windows Server 2003. If you experience problems with a particular printer or a clustered print spooler, remove any non-Microsoft port or language monitors and use the monitors that are supplied by Windows. Also try removing any non-Microsoft print processors that are not required. A good way to synchronize driver versions on clients with those that are loaded on the servers is to load the same driver on each node. Then, take a subset of clients and remove the relevant printer driver from the clients. The next time that the client connects to the print spooler, the client will obtain the correct driver through Point and Print. If the drivers are not synchronized, the system event log on the server may report Event ID 20 to signal that a temporary copy of the client's driver was loaded on the server to render the client's job.
You can enable Kerberos support on other network name resources in the same cluster, but do not enable Kerberos on the network name resource that either of these applications depends on.
WINS
When you configure WINS on a server in a cluster, the database files are stored in the cluster storage in a path that you specify during configuration. The database files are all stored there, with one exception: the Winstmp.mdb file is in the %SystemRoot%\System32\WINS folder on the local hard disk. This behavior is different from a single-server environment that runs WINS. Do not move or rename the Winstmp.mdb file. The WINS service requires the Winstmp.mdb file for reindexing the WINS database. The file is not used for database recovery and must not be backed up. If you try to back up this file, you may receive an error message that states that the file is open or that there is a sharing violation. You can ignore these error messages. WINS and DHCP both use .mdb files, but DHCP does not have a file that is equivalent to Winstmp.mdb. Do not change the default settings for the NTFS permissions on the %SystemRoot%\System32\WINS folder. These settings include full control permissions for the System account. After you configure WINS on a cluster, update the TCP/IP configuration of any adapters that serve clients so that they specify the virtual IP address of the WINS server. When you do this on each cluster node, the Cluster service on each node registers the addresses with WINS.
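For example, you might set the WINS address on a client-facing adapter from a command prompt. This is a sketch only; the connection name and address are placeholders, and you can make the same change in the adapter's TCP/IP properties instead:

Example:
C:\>netsh interface ip set wins name="Local Area Connection" source=static addr=10.1.1.50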
DHCP
As with WINS, when you configure DHCP on a server in a cluster, the database files are stored in the cluster storage in a path that you specify during configuration. This is necessary for failover to work correctly. Do not configure address scopes before you configure the DHCP resource. If the server is a clustered DHCP server in a Windows 2000 Active Directory environment, the DHCP server must be authorized in
Active Directory before it can respond to client requests for IP addresses in configured scopes.
Jetpack
You can use Jetpack.exe on a cluster to compact a WINS or DHCP database for the same reasons that you would on a stand-alone server that is running either of these network services. The procedures for using it on a cluster differ slightly because of the location of the data files. Also, you must first use Cluster Administrator to take the resource offline.
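The following is a sketch of compacting a clustered WINS database; the resource name, path, and drive letter are placeholders for your own configuration:

Example:
C:\>cluster res "WINS Service" /offline
C:\>cd /d W:\Wins
W:\Wins>jetpack wins.mdb tmp.mdb
W:\Wins>cluster res "WINS Service" /online

Jetpack compacts the database into the temporary file and then replaces the original database with the compacted copy.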
DFS Root
If you use the Distributed File System, a cluster can host a single DFS root structure. If you try to add additional DFS roots to a cluster, you will receive the following error message:
Error ID: 5088
DFS child nodes may have shares on other stand-alone servers, other clusters, or even on the same cluster as the DFS root. You can even use the DFS root to create a single namespace on the network for file shares in a single cluster that are supported by different virtual servers. For more information about DFS on a cluster and specific administrative steps, click the following article numbers to view the articles in the Microsoft Knowledge Base: 301588 How to Use DFS on Server Cluster to Maintain a Single Namespace 224508 How to Migrate a DFS Root Configuration to a Windows 2000 Cluster 224384 How to Force Deletion of DFS Configuration Information For more information about creating file shares on a cluster, click the following article numbers to view the articles in the Microsoft Knowledge Base: 224967 How to Create File Shares on a Cluster 284838 How to Create a Server Cluster File Share with Cluster.exe 254219 Security Considerations When Implementing Clustered File Shares 256926 Implementing Home Folders on a Server Cluster
If the operating system and the file system succeed in these tasks, the file system generally does not require a Chkdsk operation, even if an unexpected power failure or system failure occurs. In rare cases, a system failure may cause one of these rules to be violated. This can cause corruption and require you to run Chkdsk. If the system needs Chkdsk to run because of a detected problem, something under the file system is experiencing a problem that causes a violation of one of the four rules.

The rules are the same for local storage as for cluster storage. However, the ability to follow these rules and help protect disks against multiple systems relies heavily on the external disk storage system or SAN hardware and the related drivers.

The file system contains a number of checks to make sure that the data looks correct. When the system finds something that does not look right, it indicates in the system event log that the file system may be corrupted and that Chkdsk must be run. The "dirty" bit on the volume is set so that Chkdsk will run the next time that the disk is mounted. As an administrator, if you see Event ID 55 in the system event log, it is a good idea to run Chkdsk as soon as possible. Do not just ignore the message. Although the problem may be as minor as a corrupted attribute on a rarely used file, it may be more serious. The longer the situation is left waiting for Chkdsk, the greater the potential is for additional corruption.

Microsoft's goal is for customers to never have to run Chkdsk on a volume. However, situations may exist when you do have to run Chkdsk to make sure that you maintain data integrity. When you start the operating system, the system performs a check of locally attached storage. During this process, Autochk (the boot-time version of Chkdsk) gives you the option to bypass the Chkdsk operation on the volume.
Delaying Chkdsk
It may be dangerous to delay running Chkdsk. This delay may allow you to use a problem volume in the meantime, but the volume might become more corrupted. Generally, Microsoft recommends that you delay Chkdsk only for emergency reasons, such as creating a read-only emergency backup on another disk so that the problem disk can be repaired. Think carefully before you bring the potentially corrupted data online.
Duration of Chkdsk
Disk size has only a minor effect on the amount of time that it takes to perform a Chkdsk operation. The number of files and directories in the file system is the most important factor that affects Chkdsk duration. Other factors include available resources (such as the CPU and memory), the speed of the disk hardware, and the degree of fragmentation. If you manually invoke Chkdsk and specify the /r option, Chkdsk reads every sector on the disk drive so that it can locate bad sectors. In some situations, you may want to use this type of scan, but on a large volume, this type of scan takes significantly longer.
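For example, to check every sector on volume G: (the drive letter is only an example), you might run:
C:\>chkdsk g: /r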
Faster Chkdsk
With Windows NT 4.0 SP4, the /c and /i options became available. These switches allow you to run an abbreviated Chkdsk that targets common forms of corruption. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
314835 An Explanation of the New /C and /I Switches That Are Available to Use with Chkdsk.exe
Although this article was written for Windows XP, most of the information is not unique to Windows XP.
Note The /c and /i switches are available when you run Chkdsk interactively. When you use these switches, a volume can remain corrupted even after you run Chkdsk. Therefore, Microsoft recommends that you use these switches only if you must keep downtime to a minimum. These switches are intended for situations when you must run Chkdsk on exceptionally large volumes and you require flexibility in managing the downtime that occurs.
You can use the /c and /i switches on a Cluster service volume, but you must first bring the cluster down. This is useful on extremely large volumes where Chkdsk might otherwise take hours or even days to complete. To do this on a two-node cluster, follow these steps (a command-line sketch follows the list):
1. Turn off one node.
2. On the other node, disable the Cluster service and the Clusdisk driver.
3. Restart the node where you disabled the Cluster service in step 2.
4. Run Chkdsk on all volumes in the array.
5. Re-enable the Cluster service and the Clusdisk driver.
6. Restart the node where you disabled and re-enabled the Cluster service, and then restart the other cluster nodes.
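If the Sc.exe tool is available (it is included in the Windows 2000 Resource Kit), steps 2, 4, and 5 might be performed as in the following sketch. The drive letter is an example, and the original start types shown here are assumptions; note the existing start types for the Cluster service and the Clusdisk driver before you change them:
C:\>sc config clussvc start= disabled
C:\>sc config clusdisk start= disabled
(restart the node, and then run the abbreviated check)
C:\>chkdsk g: /c /i
C:\>sc config clusdisk start= system
C:\>sc config clussvc start= auto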
The Chkntfs.exe tool accepts the following options:
volume - Specifies the drive letter (followed by a colon), mount point, or volume name.
/d - Restores the computer to the default behavior. All drives are checked at boot time, and Chkdsk is run on those that are "dirty."
/t:time - Changes the Autochk initiation countdown time to the specified amount of time in seconds. If a time is not specified, the current setting is displayed.
/x - Excludes a drive from the default boot-time check. Excluded drives are not accumulated between command invocations.
/c - Schedules a drive to be checked at boot time. Chkdsk will run if the drive is "dirty."
Example: C:\>chkntfs c:
The type of the file system is NTFS. C: is not dirty.
The command chkdsk driveletter, without additional parameters, performs a read-only scan of the disk. It reports problems at a high level but does not fix any that it may find.
Otherwise, you may have to restore data from the last backup if you experience a failure. Snapshots are one potential way to avoid potentially lengthy Chkdsk operations if you have a mechanism to achieve data consistency between the live volume that is potentially corrupted and the snapshot volume that you may have run an offline Chkdsk on by using another server. Contact your SAN vendor for more information about these capabilities.
For more information, click the following article numbers to view the articles in the Microsoft Knowledge Base:
272244 Location of the CHKDSK Results for Windows Clustering Resources
223023 Enhanced Disk Resource Private Properties Using Cluster Server
218461 Description of Enhanced Chkdsk, Autochk, and Chkntfs Tools in Windows 2000
Replacing Adapters
When you replace a network adapter, make sure that the new adapter configuration for TCP/IP exactly matches the configuration of the old adapter. In some configurations, if you replace a SCSI adapter that has Y cables with external termination, you may be able to disconnect the SCSI adapter without affecting the remaining cluster node. Contact your hardware vendor for correct replacement techniques if you want to try to replace adapters without shutting down the whole cluster.
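Before you remove the old network adapter, it may help to record its current TCP/IP settings so that you can configure the replacement identically. For example:
C:\>ipconfig /all > C:\adapter-config.txt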
Some non-Microsoft backup software may not include the cluster registry hive when it backs up system registry files. Therefore, if you rely on a non-Microsoft backup solution, you must verify your ability to back up and restore this hive. The registry file for the cluster hive is in the directory where the cluster software was installed, not on the quorum disk.
Because most backup software (at the time of this writing) is not cluster-aware, it may be important to establish a network path to shared data for use in system backups. For example, if you use a local path to the data (for example, G:\) and the node loses ownership of the drive, the backup operation may fail because it cannot reach the data by using the local device path. However, if you create a cluster-available share to the disk structure and map a drive letter to the share, the connection may be reestablished if ownership of the actual disk changes. Although the ultimate solution would be a fully cluster-aware backup utility, this technique may provide an alternative until such a utility is available.
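For example, assuming a cluster file share named Data on a virtual server named VS1 (both names are placeholders for your own configuration), the backup job might connect through a mapped drive like this:
C:\>net use X: \\VS1\Data /persistent:yes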
Administrative Suggestions
Because the main purpose of a cluster is to keep important resources highly available, it is best to administer the cluster with this goal in mind. This section contains hints, suggestions, and techniques for optimizing administrative efforts.
Use the Resource Description Field
Many administrators overlook the use of the description field for cluster resources. You can use this field to describe a resource, but more importantly, you can use it as a quick reference for correctly identifying resources. For example, the cluster IP address resource does not indicate the actual IP address. To find the address, you must view the resource properties and select the Parameters tab. You can simplify this process by typing the IP address in the description field so that you can see the address quickly while you view other resources. This also prevents you from accidentally modifying the information on the Parameters tab of a resource when you look for the IP address in the usual way.
Remove Empty Groups
If you create new groups and move disk resources from their default groups, do not retain the empty default groups. You will make administrative tasks easier if you delete these empty groups.
Avoid Unnecessary or Redundant Dependencies
Avoid configuring resources with redundant or unnecessary dependencies. The goal of the dependency relationship between resources is to specify that a resource must have another resource online to function correctly. Conversely, it also specifies an order for resources to be taken offline to prevent resource failure. For example, consider the following group and resources:
File Share Group
Physical Disk: Disk H:
IP Address: 192.168.0.12
Network Name: VirtualServer1
File Share: H:\Users
The correct dependency relationship is for the file share resource to depend on a physical disk resource and a network name. The file share resource already has an effective dependency on the IP address resource, because the network name resource requires a dependency on the IP address resource. You can specify an explicit dependency between the file share resource and the IP address, but that dependency is redundant and requires additional overhead. (A command-line sketch for reviewing dependencies follows this section.)
Creating Redundancy for Network Adapters
Many clusters include a public network for clients and a private network for cluster communications. Common roles for this configuration are all communications for the public network and internal cluster communications only for the private network. These roles provide an alternate network for cluster communications if the private network fails. However, some administrators who use the same two networks make the public network client access only. This choice provides only one network for cluster communications. If you want to keep cluster communication traffic completely isolated from the public network, install a second private network with the role of internal cluster communications for the cluster nodes to use.
Note Microsoft does not support the use of fault-tolerant network adapters on the private interconnects between cluster nodes. Adapters of this type are suitable for other attached networks.
Check the Size of the Quorum Log File
The default size for the quorum log file may not be sufficient for your cluster. To check the size, right-click the cluster name in Cluster Administrator, click Properties, and then click the Quorum tab. A size of 2,048 KB or 4,096 KB is sufficient for most clusters that have 10 to 20 resources and many printers. If you notice a loss of printers or if resources remain offline after a group moves to another node, check the quorum log size that is reported in the Reset quorum log at field on the Quorum tab.
Monitor Performance
When you first bring a cluster into production, it is a good idea to obtain some baseline performance information. You can use System Monitor to collect performance data every 10 or 15 minutes for a day. You can then save this file and refer back to it later. Note the number of clients or any other pertinent information. This information may be helpful later as a comparison with logs for capacity planning, troubleshooting, or diagnosing memory leaks.
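Returning to the dependency example above, you can review a resource's dependencies from the command line with Cluster.exe. This sketch assumes the file share resource is named "File Share"; use the names from your own cluster:
C:\>cluster res "File Share" /listdependencies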
Load Balance
The Cluster service does not provide any mechanism for automatic load balancing. As an administrator, you can statically balance the load between servers in a cluster. A single cluster node does not have to do all the work while other cluster nodes wait to be useful. You can distribute groups of resources between cluster nodes to achieve better performance. You can manually move groups to do this, but you can also specify a preferred owner for a group so that groups can be owned (when possible) by the nodes that you specify. However, if resources and applications on each node consume more than 50 percent of each server's available CPU capacity, and if a single cluster node must sometimes carry the whole load, do not expect spectacular performance at those times. Plan server capacity so that if a failure occurs that requires one node to host all cluster resources, that node can provide an acceptable level of performance.
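For example, a static balancing sketch with Cluster.exe, assuming groups named "File Share Group" and "Print Group" and nodes named NODEA and NODEB (all four names are placeholders):
C:\>cluster group "File Share Group" /moveto:NODEA
C:\>cluster group "Print Group" /moveto:NODEB
C:\>cluster group "File Share Group" /setowners:NODEA,NODEB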
Do not try to configure cluster resources to use unsupported network protocols or related network services (such as IPX, DLC, AppleTalk, or Services for Macintosh). The Microsoft Cluster service works only with the TCP/IP protocol. Do not use non-Microsoft print monitors, language monitors, or print processors for printers that are hosted by a print spooler resource.
Event ID: 1000
Source: ClusSvc
Description: Microsoft Clustering Service suffered an unexpected fatal error at line LineNumber of source module ModuleNumber. The error code was ErrorNumber.

Event ID: 1001
Source: ClusSvc
Description: Microsoft Clustering Service failed a validity check on line LineNumber of source module ModuleNumber.

Problem: Messages like these may appear when a fatal error occurs that may cause the Cluster service to quit on the node that experienced the error.
Solution: Check the system event log and the cluster diagnostic log file for additional information. The Cluster service may restart itself after the error. This event message may indicate serious problems that may be related to hardware or other causes.

Event ID: 1002
Source: ClusSvc
Description: Microsoft Clustering Service handled an unexpected error at line LineNumber of source module Module. The error code was ErrorCode.

Problem: Messages like this may occur after you install the Cluster service. If the Cluster service starts and successfully forms or joins the cluster, you can ignore the message. Otherwise, these errors may indicate a corrupted quorum log file or another problem.
Solution: Ignore the error if the cluster appears to be working correctly. Otherwise, you may want to try to create a new quorum log file. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
257905 Error Message: Could Not Start the Cluster Service on the Local Computer
Event ID: 1003
Source: ClusSvc
Description: The DllName value for the type resource type does not exist. Resources of this type will not be monitored. The error was error.

Event ID: 1004
Source: ClusSvc
Description: The LooksAlive poll interval for the type resource type does not exist. Resources of this type will not be monitored. The error was error.

Event ID: 1005
Source: ClusSvc
Description: The IsAlive poll interval for the type resource type does not exist. Resources of this type will not be monitored. The error was error.

Event ID: 1006
Source: ClusSvc
Description: Microsoft Clustering Service was halted due to a cluster membership or communications error. The error code was ErrorCode.

Problem: An error occurred between communicating cluster nodes that affected cluster membership. This error may occur if nodes lose the ability to communicate with each other.
Solution: Check network adapters and connections between nodes. Check the system event log for errors. There may be a network problem that is preventing reliable communication between cluster nodes.

Event ID: 1007
Source: ClusSvc

Event ID: 1008
Source: ClusSvc
Description: The Cluster Resource Monitor was started with an invalid command line option OptionName.
Event ID: 1009
Source: ClusSvc
Description: The Clustering Service could not join an existing cluster and could not form a new cluster. The Clustering Service has terminated.

Problem: The Cluster service started and tried to join a cluster. The node may not be a member of an existing cluster because it was evicted by an administrator. After a cluster node has been evicted from the cluster, the cluster software must be removed from that computer and reinstalled if you want the computer to join the cluster again. In addition, because a cluster already exists with the same cluster name, the node could not form a new cluster with the same name. This event may also mean that there is a communication problem that is preventing the node from joining a cluster that already owns the quorum disk.
Solution: Examine event logs and cluster log files to determine if a communication problem exists. If this node was previously evicted, remove the Cluster service from the affected node, and then reconfigure the Cluster service on that system if you want.

Event ID: 1010
Source: ClusSvc
Description: The Clustering Service is shutting down because the current node is not a member of any cluster. The Clustering Service must be reinstalled to make this node a member of a cluster.

Event ID: 1011
Source: ClusSvc

Event ID: 1012
Source: ClusSvc
Description: Microsoft Clustering Service did not start because the current version of Windows 2000 is not correct. Microsoft Clustering Service runs only on Windows 2000 Advanced Server and Windows 2000 Datacenter Server.

Event ID: 1013
Source: ClusSvc
Description: The quorum log checkpoint could not be restored to the cluster database. There may be open handles to the cluster database that prevent this operation from succeeding.

Event ID: 1014
Source: ClusSvc
Description: Microsoft Clustering Service can be started from Services in Control Panel or by issuing the command net start clussvc at a command prompt.
Event ID: 1015
Source: ClusSvc
Description: No checkpoint record was found in the log file FileName; the checkpoint file is invalid or was deleted.

Event ID: 1016
Source: ClusSvc
Description: Microsoft Clustering Service failed to obtain a checkpoint from the cluster database for log file FileName.

Event ID: 1017
Source: ClusSvc
Description: The log file FileName exceeds its maximum size. An attempt will be made to reset the log, or you should use the Cluster Administrator utility to adjust the maximum size.

Solution: You may have to change the quorum log size limit. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
225081 Cluster Resources Quorum Log Size Defaults to 64 KB
Event ID: 1019
Source: ClusSvc
Description: The log file FileName was found to be corrupt. An attempt will be made to reset it, or you should use the Cluster Administrator utility to adjust the maximum size.

Solution: You may have to change the quorum log size limit. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
225081 Cluster Resources Quorum Log Size Defaults to 64 KB

Event ID: 1021
Source: ClusSvc
Description: There is insufficient disk space remaining on the quorum device. Please free up some space on the quorum device. If there is no space on the disk for the quorum log files, then changes to the cluster registry will be prevented.

Solution: Free disk space. It is best if the quorum disk partition is not used by other applications, both to avoid performance degradation and to prevent problems that may cause the cluster to be unavailable.
Event ID: 1022
Source: ClusSvc
Description: There is insufficient space left on the quorum device. The Microsoft Clustering Service cannot start.

Solution: Free disk space. It is best if the quorum disk partition is not used by other applications, both to avoid performance degradation and to prevent problems that may cause the cluster to be unavailable.

Event ID: 1023
Source: ClusSvc
Description: The quorum resource was not found. The Microsoft Clustering Service has terminated.

Problem: The device that is designated as the quorum resource was not found. This may be because the device failed at the hardware level or because the disk resource that corresponds to the quorum drive letter does not match or no longer exists.
Solution: Use the -fixquorum startup option for the Cluster service. Investigate and resolve the problem with the quorum disk. You can designate another disk as the quorum device, restart the Cluster service, and then start the other nodes.
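A sketch of this recovery sequence from a command prompt (designating the new quorum disk itself is done in Cluster Administrator):
C:\>net start clussvc /fixquorum
(use Cluster Administrator to designate another disk as the quorum resource)
C:\>net stop clussvc
C:\>net start clussvc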
Event ID: 1024
Source: ClusSvc
Description: The registry checkpoint for cluster resource ResourceName could not be restored to registry key HKEY_LOCAL_MACHINE\Subkey. The resource may not function correctly. Make sure that no other processes have open handles to registry keys in this registry sub-tree.

Problem: The registry key checkpoint that was imposed by the Cluster service failed because an application or process has an open handle to the registry key or subkey.
Solution: Close any applications that may have an open handle to the registry key so that the key can be replicated as specified in the resource properties. If you need more information, contact the application vendor. This error may indicate that a thread in a service stopped responding with the handle open while other process threads may be functioning as usual. If you receive this error, troubleshoot the related application or service.

Event ID: 1034
Source: ClusSvc
Description: The disk associated with cluster disk resource ResourceName could not be found. The expected signature of the disk was Signature. If the disk was removed from the cluster, the resource should be deleted. If the disk was replaced, the resource must be deleted and created again in order to bring the disk online. If the disk has not been removed or replaced, it may be inaccessible at this time because it is reserved by another cluster node.

Information: The Cluster service tried to mount a physical disk resource in the cluster. The cluster disk driver was not able to find a disk with this signature. The disk may be offline or may have failed. This error may also occur if the drive has been replaced, reformatted, or is currently mounted and reserved by another server cluster node. This error may also occur if the node cannot join an existing cluster where the disk is already mounted. Therefore, this error may be a result of a communication problem or a problem joining an existing cluster. There may not actually be anything wrong with the disk.
Event ID: 1035
Source: ClusSvc

Problem: The Cluster service tried to mount a disk resource in the cluster and was not able to complete the operation. This may be because of a file system problem, a hardware problem, or a drive letter conflict.
Solution: Look for drive letter conflicts, evidence of file system problems in the system event log, and hardware problems.

Event ID: 1036
Source: ClusSvc
Description: Cluster disk resource ResourceName did not respond to a SCSI maintenance command.

Problem: The disk did not respond to a SCSI command. This typically indicates a hardware problem.
Solution: Check the SCSI bus configuration. Check the configuration of SCSI adapters and devices. This may indicate a hardware configuration problem or a device failure.

Event ID: 1037
Source: ClusSvc
Description: Cluster disk resource ResourceName has failed a file system check. Please check your disk configuration.

Problem: The Cluster service tried to mount a disk resource in the cluster. A file system check was necessary, and that check failed during the process.
Solution: Check cables, termination, and device configuration. If the drive has failed, replace the drive, and then restore the data. The volume may be corrupted. The error may also indicate that you must reformat the partition and restore data from a current backup.

Event ID: 1038
Source: ClusSvc
Description: Reservation of cluster disk DiskName has been lost. Please check your system and disk configuration.

Problem: The Cluster service had exclusive use of the disk and lost the reservation of the device on the shared SCSI bus.
Solution: The disk may have gone offline or failed. Another node may have taken control of the disk, or a SCSI bus reset command was issued on the bus that caused a loss of the reservation. A device may be malfunctioning or misconfigured.

Event ID: 1039
Source: ClusSvc
Event ID: 1040
Source: ClusSvc

Problem: The Cluster service tried to bring the specified generic service resource online. The Cluster service could not find or manage the service.
Solution: Remove the generic service resource if this service is no longer installed. The parameters for the resource may not be valid. Check the generic service resource properties and confirm that the configuration is correct. Try using the name of the service's corresponding key in the registry instead of the display name of the service.
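For example, the Print Spooler service has the display name "Print Spooler," but its registry key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services is named Spooler. If a generic service resource for this service fails when it is configured with the display name, try specifying Spooler instead.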
Event ID: 1041
Source: ClusSvc

Problem: The Cluster service tried to bring the specified generic service resource online. The service could not be started at the operating system level.
Solution: Remove the generic service resource if this service is no longer installed. The parameters for the resource may not be valid. Check the generic service resource properties and confirm that the configuration is correct. Make sure that the service account has not expired, has the correct password, and has the necessary rights for the service to start. Check the system event log for any related errors. Try to run the service as an executable file from a command prompt, if the service has this capability. When you use this method, you may see error messages that you might otherwise miss.

Event ID: 1042
Source: ClusSvc

Problem: The service that is associated with the mentioned generic service resource failed.
Solution: Check the generic service properties and service configuration for errors. Check the system and application event logs for errors. The cluster log may have an entry that corresponds to this event that may contain a useful error code.

Event ID: 1043
Source: ClusSvc
Description: The NetBIOS interface for cluster IP Address resource ResourceName has failed.

Problem: The network adapter for the specified IP address resource or the network itself may have experienced a failure. As a result, the IP address is either offline or the group has moved to a surviving node in the cluster.
Solution: Check the network adapter and network connection for problems. Resolve the network-related problem. Check the system event log for other, related errors.
Event ID: 1044
Source: ClusSvc
Description: Cluster IP Address resource ResourceName could not create the required NetBIOS interface.

Problem: The Cluster service tried to initialize an IP address resource and could not establish a context with NetBIOS.
Solution: There may be a network adapter problem or a network adapter driver-related problem. Make sure that the adapter is using a current, correct driver. If this is an embedded adapter, contact the OEM to determine if a specific OEM version of the driver is required. If you already have many IP address resources defined, make sure that you have not reached the NetBIOS limit of 64 addresses. If you have IP address resources defined that do not need NetBIOS affiliation, use the IP address private property to disable NetBIOS for the address.
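A sketch of disabling NetBIOS for an IP address resource by setting its EnableNetBIOS private property with Cluster.exe; the resource name is an assumption:
C:\>cluster res "IP Address 192.168.0.12" /priv EnableNetBIOS=0
You may have to take the resource offline and bring it online again for the change to take effect.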
Event ID: 1045
Source: ClusSvc
Description: Cluster IP Address resource ResourceName could not create the required TCP/IP interface.

Problem: The Cluster service tried to bring an IP address online. The resource properties may specify a network that is not valid or a malfunctioning adapter. This error may occur if you replace a network adapter with a different model and continue to use an old or inappropriate driver. As a result, the IP address resource cannot be bound to the specified network.
Solution: Resolve the network adapter problem, or change the properties of the IP address resource to show the correct network for the resource.

Event ID: 1046
Source: ClusSvc
Description: Cluster IP Address resource ResourceName cannot be brought online because the subnet mask parameter is invalid. Please check your network configuration.

Problem: The Cluster service tried to bring an IP address resource online but could not. The subnet mask for the resource is either blank or not valid.
Solution: Correct the subnet mask for the resource.

Event ID: 1047
Source: ClusSvc
Description: Cluster IP Address resource ResourceName cannot be brought online because the IP address parameter is invalid. Please check your network configuration.

Problem: The Cluster service tried to bring an IP address resource online but could not. The IP address property contains a value that is not valid. This problem may occur if the resource was incorrectly created through an API or the command prompt interface.
Solution: Correct the IP address properties for the resource.
Event ID: 1048
Source: ClusSvc
Description: Cluster IP Address resource ResourceName cannot be brought online because the adapter name parameter is invalid. Please check your network configuration.

Problem: The Cluster service tried to bring the IP address online. The resource properties may specify a network that is not valid or a malfunctioning adapter. This error may occur if you replace a network adapter with a different model. As a result, the IP address resource cannot be bound to the specified network.
Solution: Resolve the network adapter problem, or change the properties of the IP address resource to show the correct network for the resource.
Event ID: 1049
Source: ClusSvc
Description: Cluster IP Address resource ResourceName cannot be brought online because address Address is already present on the network. Please check your network configuration.

Problem: The Cluster service tried to bring an IP address online. The address is already in use on the network and cannot be registered. Therefore, the resource cannot be brought online.
Solution: Resolve the IP address conflict, or use another address for the resource.

Event ID: 1050
Source: ClusSvc
Description: Cluster Network Name resource ResourceName cannot be brought online because the name Name is already present on the network. Please check your network configuration.

Problem: The Cluster service tried to bring a network name resource online. The name is already in use on the network and cannot be registered. Therefore, the resource cannot be brought online.
Solution: Resolve the conflict, or use another network name.

Event ID: 1051
Source: ClusSvc
Description: Cluster Network Name resource ResourceName cannot be brought online because it does not depend on an IP Address resource. Please add an IP Address dependency.

Problem: The Cluster service tried to bring a network name resource online. The resource either does not have a dependency on a resource of type IP Address, or the IP Address resource is not enabled to use NetBIOS.
Solution: Resolve the network configuration problem, or enable the corresponding IP address resource to use NetBIOS.
Event ID: 1052
Source: ClusSvc
Description: Cluster Network Name resource ResourceName cannot be brought online because the name could not be added to the system.

Problem: The Cluster service tried to bring the network name resource online, but the attempt failed.
Solution: Check the system event log for errors. Check the network adapter configuration and operation. Check the TCP/IP configuration and name resolution methods. Check the WINS servers for possible database problems or static mappings that are not valid.
Event ID: 1053
Source: ClusSvc
Description: Cluster File Share ShareName cannot be brought online because the share could not be created.

Problem: The Cluster service tried to bring the share online, but the attempt to create the share failed.
Solution: Make sure that the Server service is started and functioning correctly. Check the path for the share. Check ownership and permissions on the directory. Check the system event log for details. Also, if diagnostic logging is enabled, check the log for an entry that may be related to this failure.

Event ID: 1054
Source: ClusSvc

Problem: The share that corresponds to the named file share resource was deleted by a mechanism other than Cluster Administrator. This problem may occur if you select the share in Windows Explorer, and then click Not Shared.
Solution: Delete shares or take them offline through Cluster Administrator or the command-line program Cluster.exe.

Event ID: 1055
Source: ClusSvc
Description: Cluster File Share resource ResourceName has failed a status check. The error code is ErrorCode.

Problem: The Cluster service (through resource monitors) periodically monitors the status of cluster resources. In this case, a file share failed a status check. This may mean that someone tried to delete the share through Windows NT Explorer or Server Manager, instead of through Cluster Administrator. This event may also indicate a problem with the Server service or access to the shared directory.
Solution: Check the system event log for errors. Check the cluster diagnostic log (if it is enabled) for status codes that may be related to this event. Check the resource properties for correct configuration. Also, make sure that the file share has proper dependencies defined for related resources.
Event ID: 1056
Source: ClusSvc
Description: The cluster database on the local node is in an invalid state. Please start another node before starting this node.

Problem: The cluster database on the local node may be in a default state from the installation process, and the node has not correctly joined with an existing node.
Solution: Make sure that another node of the same cluster is online before you start this node. After the node joins with another cluster node, the node will receive an updated copy of the official cluster database. That should correct this error.
Event ID: 1057
Source: ClusSvc
Description: The cluster database could not be loaded. The file CLUSDB is either corrupt or missing.

Problem: The Cluster service tried to open the CLUSDB registry hive but could not. Therefore, the Cluster service cannot be brought online.
Solution: See if there is a file named Clusdb in the cluster installation directory. Make sure that the registry file is not held open by any applications and that permissions on the file give the Cluster service access to this file and directory.

Event ID: 1058
Source: ClusSvc
Description: The Cluster Resource Monitor could not load the DLL FileName for resource type ResourceType.

Problem: The Cluster service tried to bring a resource online that requires a specific resource DLL for the resource type. The DLL is either missing, corrupted, or an incompatible version. As a result, the resource cannot be brought online.
Solution: Check the cluster installation directory for the existence of the named resource DLL. Make sure that the DLL exists in the correct directory on both nodes.

Event ID: 1059
Source: ClusSvc
Description: Cluster resource DLL FileName for resource type ResourceType failed to initialize.

Problem: The Cluster service tried to load the specified resource DLL, and the DLL failed to initialize. The DLL may be corrupted, or it may be an incompatible version. As a result, the resource cannot be brought online.
Solution: Check the cluster installation directory for the existence of the named resource DLL. Make sure the DLL exists in the correct directory on both nodes and is the correct version. If the DLL is Clusres.dll, this is the default resource DLL that is included with the Cluster service. Check to make sure that the version/date stamp is equivalent to or later than the date on the version in the service pack that is in use.

Event ID: 1060
Source: ClusSvc
Description: Cluster resource DLL FileName for resource type ResourceType returned an invalid function table.
Event ID: 1061
Source: ClusSvc
Description: Microsoft Clustering Service successfully formed a cluster on this node.

Information: This message indicates that an existing cluster of the same name was not detected on the network, and that this node elected to form the cluster and own access to the quorum disk.

Event ID: 1062
Source: ClusSvc

Event ID: 1063
Source: ClusSvc

Event ID: 1064
Source: ClusSvc
Description: The quorum resource was changed. The old quorum resource could not be marked as obsolete. If there is a partition in time, you may lose changes to your database because the node that is down will not be able to get to the new quorum resource.

Problem: The administrator changed the quorum disk designation when some cluster nodes were not online.
Solution: When other cluster nodes try to join the existing cluster, they may not be able to connect to the quorum disk and may not participate in the cluster because their configuration indicates a different quorum device. For any nodes that meet this condition, you may have to use the -fixquorum option to start the Cluster service on these nodes and make configuration changes.

Event ID: 1065
Source: ClusSvc

Problem: The Cluster service tried to bring the resource online, but the resource did not reach an online state. The resource may have exhausted the timeout period that is allotted for the resource to reach an online state.
Solution: Check any parameters that are related to the resource, and check the event log for details.
Event ID: 1066
Source: ClusSvc

Problem: The Cluster service detected corruption on the indicated disk resource and started chkdsk /f on the volume to repair the structure. The Cluster service automatically performs this operation, but only for cluster-defined disk resources (not local disks).
Solution: Scan the event log for additional errors. The disk corruption may indicate other problems. Check related hardware and devices on the shared bus, and check the cables and termination. This error may be a symptom of failing hardware or a deteriorating drive.

Event ID: 1067
Source: ClusSvc
Description: Cluster disk resource ResourceName: has corrupt files. Running ChkDsk /F to repair problems.

Problem: The Cluster service detected corruption on the indicated disk resource and started chkdsk /f on the volume to repair the structure. The Cluster service will automatically perform this operation, but only for cluster-defined disk resources (not local disks).
Solution: Scan the event log for additional errors. The disk corruption may indicate other problems. Check related hardware and devices on the shared bus, and check cables and termination. This error may be a symptom of failing hardware or a deteriorating drive.

Event ID: 1068
Source: ClusSvc
Description: Cluster file share resource ResourceName failed to start with error error.

Problem: The file share cannot be brought online. The problem may be caused by permissions to the directory or disk where the directory resides. This may also be related to permission problems in the domain.
Solution: Check to make sure that the Cluster service account has rights to the directory that you want shared. Make sure that a domain controller is accessible on the network. Make sure that dependencies for the share and for other resources in the group are set correctly. Run net helpmsg to determine the meaning of the error code.
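For example, if the error code is 5:
C:\>net helpmsg 5
Access is denied.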
Event ID: 1069
Source: ClusSvc
Description: Cluster resource ResourceName failed.

Problem: The named resource failed, and the Cluster service logged the event.
Solution: For disk resources, check the device for correct operation. Check cables, termination, and log files on all cluster nodes. For other resources, check resource properties for correct configuration, and check to make sure that dependencies are configured correctly. Check the diagnostic log (if it is enabled) for status codes that correspond to the failure.
Event ID: 1070
Source: ClusSvc
Description: The node failed to begin the process of joining the cluster. The error code was ErrorCode.

Problem: The cluster node tried to join an existing cluster but could not complete the process. This problem may occur if the node was previously evicted from the cluster.
Solution: If the node was previously evicted from the cluster, you must remove, and then reinstall, the Cluster service on the affected server.

Event ID: 1071
Source: ClusSvc
Description: Cluster node NodeName attempted to join but was refused. The error code was ErrorCode.

Problem: Another node tried to join the cluster, and this node refused the request.
Solution: If the node was previously evicted from the cluster, you must remove and then reinstall the Cluster service on the affected server. Look in Cluster Administrator to see if the other node is listed as a possible cluster member. Run net helpmsg to determine the meaning of the error code.

Event ID: 1072
Source: ClusSvc
Description: The Microsoft Clustering Service node initialization failed with error error.

Event ID: 1073
Source: ClusSvc
Description: Microsoft Clustering Service was halted to prevent an inconsistency within the cluster. The error code was ErrorCode.

Problem: The Cluster service on the affected node was stopped because of some inconsistency between cluster nodes.
Solution: Check connectivity between systems. This error may indicate configuration problems, hardware problems, or network problems.

Event ID: 1074
Source: ClusSvc
Description: An unrecoverable error occurred on node NodeName while attempting to evict node NodeName from the cluster. The error code was ErrorCode.

Event ID: 1075
Source: ClusSvc
Description: Microsoft Clustering Service failed to set the priority for cluster network NetworkName. The error code was ErrorCode. Communication between cluster nodes may not operate as configured.
Event ID: 1076
Source: ClusSvc
Description: Microsoft Clustering Service failed to register the interface for node NodeName on network NetworkName with the cluster network provider. The error code was ErrorCode.

Event ID: 1077
Source: ClusSvc

Problem: The IP address resource depends on a specific network interface as configured in the resource properties. The network interface failed.
Solution: Check the system event log for errors. Check the network adapter, and replace the adapter if it is not working correctly. Check to make sure that the correct adapter driver is loaded for the device, and look for newer versions of the driver.

Event ID: 1078
Source: ClusSvc

Event ID: 1079
Source: ClusSvc
Description: The node cannot join the cluster because it cannot communicate with node NodeName over any network configured for internal cluster communication. Check the network configuration of the node and the cluster.

Event ID: 1080
Source: ClusSvc
Description: The Microsoft Clustering Service could not write file FileName. The disk may be low on disk space, or some other serious condition exists.

Problem: The Cluster service tried to create a temporary file in the MSCS directory on the quorum disk. Lack of disk space or other factors prevented the operation from completing successfully.
Solution: Check the quorum drive for available disk space. The file system may be corrupted, or the device may be failing. Check file system permissions to make sure that the Cluster service account has full access to the drive and directory.

Event ID: 1081
Source: ClusSvc
Description: The Microsoft Clustering Service failed to save the registry key KeyName when a resource was brought offline. The error code was ErrorCode. Some changes may be lost.
Event ID: 1082
Source: ClusSvc
Description: The Microsoft Clustering Service failed to restore a registry key for resource ResourceName when it was brought online. The error code was ErrorCode. Some changes may be lost.

Event ID: 1083
Source: ClusSvc
Description: Microsoft Clustering Service failed to initialize the Windows Sockets interface. The error code was ErrorCode. Please check the network configuration and system installation.

Event ID: 1084
Source: ClusSvc
Description: Microsoft Clustering Service failed to allocate a necessary system resource. The error code was ErrorCode.

Event ID: 1085
Source: ClusSvc
Description: Microsoft Clustering Service failed to determine the local computer name. The error code was ErrorCode.

Event ID: 1086
Source: ClusSvc
Description: Microsoft Clustering Service failed to access the Cluster Network driver. The error code was ErrorCode. Please check the MSCS installation.

Event ID: 1087
Source: ClusSvc
Description: A required Cluster Network Driver operation failed. The error code was ErrorCode.

Event ID: 1088
Source: ClusSvc
Description: Microsoft Clustering Service failed to open a registry key. The key name was KeyName. The error code was ErrorCode.
Event ID: 1089
Source: ClusSvc
Description: Microsoft Clustering Service failed to query a registry value. The value name was Name. The error code was ErrorCode.

Event ID: 1090
Source: ClusSvc
Description: Microsoft Clustering Service failed to perform an operation on the registry. The error code was ErrorCode.

Event ID: 1091
Source: ClusSvc
Description: The Cluster Network endpoint value, value, specified in the registry is not valid. The default value, DefaultValue, will be used.

Event ID: 1092
Source: ClusSvc
Description: Microsoft Clustering Service failed to form the cluster membership. The error code was ErrorCode.

Event ID: 1093
Source: ClusSvc
Description: Node NodeName is not a member of cluster ClusterName. If the name of the node has changed, Microsoft Clustering Service must be reinstalled.

Event ID: 1094
Source: ClusSvc
Description: Microsoft Clustering Service failed to obtain information from the cluster configuration database. The error code was ErrorCode.

Event ID: 1095
Source: ClusSvc
Description: Microsoft Clustering Service failed to obtain information about the network configuration of the node. The error code was ErrorCode. Check that the TCP/IP services are installed.

Event ID: 1096
Source: ClusSvc
Description: Microsoft Clustering Service cannot use network adapter AdapterName because it does not have a valid IP address assigned to it.

Problem: The network configuration for the adapter has changed, and the Cluster service cannot use the adapter.
Solution: Check the network configuration. If a DHCP address was used for the primary address of the adapter, the address may have been lost. For best results, use a static address.
Event ID: 1097
Source: ClusSvc
Description: Microsoft Clustering Service did not find any network adapters with valid IP addresses installed in the system. The node will not be able to join a cluster.

Solution: Check the network configuration, and make sure that it agrees with the working node of the cluster. Make sure that the same networks are accessible from all systems in the cluster.

Event ID: 1098
Source: ClusSvc
Description: The node is no longer attached to cluster network NetworkName by adapter AdapterName. Microsoft Clustering Service will delete network interface interface from the cluster configuration.

Information: The Cluster service noticed a change in network configuration that might be caused by a change of adapter type or by a removal of a network. The network will be removed from the list of available networks.

Event ID: 1099
Source: ClusSvc
Description: Microsoft Clustering Service failed to delete an unused network interface from the cluster configuration. The error code was ErrorCode.

Event ID: 1100
Source: ClusSvc
Description: Microsoft Clustering Service discovered that the node is now attached to cluster network NetworkName by adapter AdapterName. A new cluster network interface will be added to the cluster configuration.

Event ID: 1101
Source: ClusSvc
Description: Microsoft Clustering Service failed to add a network interface to the cluster configuration. The error code was ErrorCode.

Event ID: 1102
Source: ClusSvc
Description: Microsoft Clustering Service discovered that the node is attached to a new network by adapter AdapterName. A new network and a network interface will be added to the cluster configuration.
Event ID: 1103
Source: ClusSvc
Description: Microsoft Clustering Service failed to add a network to the cluster configuration. The error code was ErrorCode.

Event ID: 1104
Source: ClusSvc
Description: Microsoft Clustering Service failed to update the configuration for one of the node's network interfaces. The error code was ErrorCode.

Event ID: 1105
Source: ClusSvc
Description: Microsoft Clustering Service failed to initialize the RPC services. The error code was ErrorCode.

Event ID: 1106
Source: ClusSvc
Description: The node failed to make a connection to cluster node NodeName over network NetworkName. The error code was ErrorCode.

Event ID: 1107
Source: ClusSvc
Description: Cluster node NodeName failed to make a connection to the node over network NetworkName. The error code was ErrorCode.

Event ID: 1108
Source: ClusSvc
Description: The join of node NodeName to the cluster timed out and was aborted.
Event ID: 1109
Source: ClusSvc
Description: The node was unable to secure its connection to cluster node NodeName. The error code was ErrorCode. Check that both nodes can communicate with their domain controllers.

Problem: The Cluster service tried to connect to another cluster node and could not establish a secure connection. This may indicate domain connectivity problems.
Solution: Check to make sure that the networks are available and functioning correctly. This may be a symptom of larger network problems or domain security problems.

Event ID: 1110
Source: ClusSvc
Description: The node failed to join the active cluster membership. The error code was ErrorCode.

Event ID: 1111
Source: ClusSvc
Description: An unrecoverable error occurred while the node was joining the cluster. The error code was ErrorCode.

Event ID: 1112
Source: ClusSvc
Description: The node's attempt to join the cluster was aborted by the sponsor node. Check the event log of an active cluster node for more information.

Event ID: 1113
Source: ClusSvc
Description: The node's attempt to join the cluster was abandoned after too many failures.

Event ID: 1114
Source: ClusSvc
Description: The node failed to make a connection to joining node NodeName over network NetworkName. The error code was ErrorCode.

Event ID: 1115
Source: ClusSvc
Description: An unrecoverable error caused the join of node NodeName to the cluster to be aborted. The error code was ErrorCode.

Problem: A node tried to join the cluster but was unable to join.
Solution: Use the net helpmsg errorcode command to obtain more information about the error that prevented the join operation. For example, error code 1393 indicates that a disk structure is corrupted and not readable. An error code like this may indicate a corrupted quorum disk.
Event ID: 1116
Source: ClusSvc
Description: The network name NetworkName could not be instantiated on its associated cluster network. No DNS servers were available, and the network name's IP address resource does not have NetBIOS configured for it.
Event ID: 1117
Source: ClusSvc

Event ID: 1118
Source: ClusSvc

Event ID: 1119
Source: ClusSvc
Description: The registration of DNS name DNSname for network name resource ResourceName failed for the following reason: Reason

Event ID: 1120
Source: ClusSvc
Description: The DLL associated with cluster disk resource ResourceName is attempting to change the drive letter parameters from OldParameter to NewParameter in the cluster hive. If your restore database operation failed, this could be caused by the replacement of a quorum drive with a different partition layout from the original quorum drive at the time the database backup was made. If this is the case and you would like to use the new disk as the quorum drive, we suggest that you stop the cluster disk service, change the drive letter(s) of this drive to Letter, start the cluster disk service, and retry the restore database procedure.

Event ID: 1121
Source: ClusSvc
Description: The crypto checkpoint for cluster resource ResourceName could not be restored to the container name ContainerName. The resource may not function correctly.

Event ID: 1122
Source: ClusSvc
Description: The node (re)established communication with cluster node NodeName on network NetworkName.
Event ID: 1123
Source: ClusSvc
Description: The node lost communication with cluster node NodeName on network NetworkName.
Event ID: 1124
Source: ClusSvc

Event ID: 1125
Source: ClusSvc
Description: The interface for cluster node NodeName on network NetworkName is operational (up). The node can communicate with all other available cluster nodes on the network.

Event ID: 1126
Source: ClusSvc
Description: The interface for cluster node NodeName on network NetworkName is unreachable by at least one other cluster node attached to the network. The cluster was not able to determine the location of the failure. Look for additional entries in the system event log indicating which other nodes have lost communication with node NodeName. If the condition persists, check the cable connecting the node to the network. Next, check for hardware or software errors in the node's network adapter. Finally, check for failures in any other network components to which the node is connected, such as hubs, switches, or bridges.

Event ID: 1127
Source: ClusSvc
Description: The interface for cluster node NodeName on network NetworkName failed. If the condition persists, check the cable connecting the node to the network. Next, check for hardware or software errors in the node's network adapter. Finally, check for failures in any network components to which the node is connected, such as hubs, switches, or bridges.

Event ID: 1128
Source: ClusSvc
Description: Cluster network NetworkName is operational (up). All available cluster nodes attached to the network can communicate using it.
Event ID: 1129
Source: ClusSvc
Description: Cluster network NetworkName is partitioned. Some attached cluster nodes cannot communicate with each other over the network. The cluster was not able to determine the location of the failure. Look for additional entries in the system event log indicating which nodes have lost communication. If the condition persists, check for failures in any network components to which the nodes are connected, such as hubs, switches, or bridges. Also check for hardware or software errors in the adapters that attach the nodes to the network.
Event ID: 1130
Source: ClusSvc
Description: Cluster network NetworkName is down. None of the available nodes can communicate using this network. If the condition persists, check for failures in any network components to which the nodes are connected, such as hubs, switches, or bridges. Next, check the cables connecting the nodes to the network. Finally, check for hardware or software errors in the adapters that attach the nodes to the network.

Event ID: 1131
Source: ClusSvc

Event ID: 1132
Source: ClusSvc

Event ID: 1133
Source: ClusSvc

Event ID: 1134
Source: ClusSvc

Event ID: 1135
Source: ClusSvc
Description: Cluster node NodeName was removed from the active cluster membership. The Clustering Service may have been stopped on the node, the node may have failed, or the node may have lost communication with the other active cluster nodes.
Event ID: 1136
Source: ClusSvc
Description: Cluster node NodeName failed a critical operation. It will be removed from the active cluster membership. Check that the node is functioning properly and that it can communicate with the other active cluster nodes.

Event ID: 1137
Source: ClusSvc
Description: Event Log replication queue QueueName is full. Number event(s) is(are) discarded of total size.

Information: The Cluster service replicates events from the local node's event log to the event logs of other nodes in the same cluster. Events to be replicated are queued and replicated when possible. If a node cannot process events quickly or if the node is offline, this event may appear in the event log when the maximum queue length is exceeded. By default, event log replication is enabled. For information about configuring event log replication, click the following article number to view the article in the Microsoft Knowledge Base:
224969 HOW TO: Configure Event Log Replication in Windows 2000 Cluster Servers
Event ID: 1138
Source: ClusSvc
Description: Cluster File Share ShareName could not be made a DFS root share because a DFS root could not be created in this node. This could possibly be caused because a DFS root already exists in this node. If this is the case and you really wish to make file share ShareName a DFS root share, you may try to first open the DFS Manager UI in the Administrative Tools folder in Control Panel, delete any existing DFS root, and then bring the file share ShareName online.

Event ID: 1139
Source: ClusSvc
Description: Physical Disk resource received a notification that the drive letter on the quorum drive was changed from OldLetter: to NewLetter.

Event ID: 1140
Source: ClusSvc
Description: A DNS query or update operation started by the Network Name resource did not complete. To avoid further problems, the Network Name resource has stopped the host Resource Monitor process. If this condition recurs, move the resource into a separate Resource Monitor so it can't affect other resources. Then, consult your network administrator to resolve the DNS problem.
Event ID: 1141
Source: ClusSvc
Description: Cluster DFS Root RootName could not initialize the DFS service on this node. This is preventing the DFS Root resource from coming online. If the Distributed File System is not listed as one of the services that have been started on this node when you issue the command net start, then you may manually try to start the DFS service by using the command net start dfs and then try to bring the DFS root resource online.
Event ID: 1142
Source: ClusSvc
Description: Cluster DFS Root RootName could not be brought online on this node. The error code is ErrorCode.

Event ID: 1143
Source: ClusSvc
Description: The Microsoft Clustering Service is attempting to stop the active clustering services on all other cluster nodes as a part of the restore database operation. After the restoration procedure is successful on this node, you have to manually restart the clustering services on all other cluster nodes.

Event ID: 1144
Source: ClusSvc
Description: Microsoft Clustering Service failed to register the network NetworkName with the cluster network provider. The error code was ErrorCode.

Event ID: 1145
Source: ClusSvc
Description: Cluster resource ResourceName timed out. If the pending timeout is too short for this resource, then you should consider increasing the pending timeout value.

Event ID: 1146
Source: ClusSvc
Description: The cluster resource monitor died unexpectedly, an attempt will be made to restart it.
Event ID 1147
Source ClusSvc
Description The Microsoft Clustering Service encountered a fatal error. The vital quorum log file FileName could not be found. If you have a backup of the quorum log file, you may try to start the cluster service by entering clussvc -debug -noquorumlogging at a command window, copy the backed up quorum log file to the MSCS directory in the quorum drive, stop the cluster service, and restart the cluster service normally using the net start clussvc command. If you do not have a backup of the quorum log file, you may try to start the cluster service as clussvc -debug -resetquorumlog and this will attempt to create a new quorum log file based on possibly stale information in the cluster hive. You may then stop the cluster service and restart it normally using the net start clussvc command.
Event ID 1148
Source ClusSvc
Description The Microsoft Clustering Service encountered a fatal error. The vital quorum log file FileName is corrupt. If you have a backup of the quorum log file, you may try to start the cluster service by entering clussvc -debug -noquorumlogging at a command window, copy the backed up quorum log file to the MSCS directory in the quorum drive, stop the cluster service, and restart the cluster service normally using the net start clussvc command. If you do not have a backup of the quorum log file, you may try to start the cluster service as clussvc -debug -resetquorumlog and this will attempt to create a new quorum log file based on possibly stale information in the cluster hive. You may then stop the cluster service and restart it normally using the net start clussvc command.
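Condensed into console form, the recovery paths that events 1147 and 1148 describe look like the following (the backup path is illustrative; the switches come directly from the event text):

C:\>REM --- If you have a backup of the quorum log ---
C:\>clussvc -debug -noquorumlogging

C:\>REM From a second command window, restore the backed-up log:
C:\>copy D:\Backup\quolog.log Q:\MSCS\quolog.log

C:\>REM Press CTRL+C in the first window to stop the debug instance, then:
C:\>net start clussvc

C:\>REM --- If you have no backup of the quorum log ---
C:\>clussvc -debug -resetquorumlog

C:\>REM Press CTRL+C to stop the debug instance once it is running, then:
C:\>net start clussvc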
Event ID 9
Source DriverName
Description The device, \Device\ScsiPort2, did not respond within the timeout period.

Problem An I/O request was sent to a SCSI device and was not serviced within an acceptable time. The device timeout was logged by this event.

Solution You may have a device or controller problem. Check SCSI cables, termination, and adapter configuration. Frequent recurrences of this event message may indicate a serious problem with potential for data loss or corruption. For help troubleshooting this problem, contact your hardware vendor. For more information, click the following article number to view the article in the Microsoft Knowledge Base: 259237 Troubleshooting Event ID 9, 11, and 15 on Cluster Servers
Event ID 11
Source DriverName
Description The driver detected a controller error on \Device\ScsiPort1.

Problem An input/output (I/O) request was sent to a SCSI device, but the controller returned an error. This is a more serious error than an I/O timeout. This error may occur because there is a bad or corrupted device driver, a hardware problem, a malfunctioning device, poor cabling, or termination problems. The actual error comes from the driver or hardware itself, not the Cluster service.

Solution Check the version of the SCSI controller BIOS and the device firmware revision. Contact the manufacturer for the latest updates. Check the SCSI device driver version. The SCSI driver is located in the %SystemRoot%\System32\Drivers folder. Look for the version in the file properties, and check whether the SCSI manufacturer has a newer version. For more information, click the following article number to view the article in the Microsoft Knowledge Base: 259237 Troubleshooting Event ID 9, 11, and 15 on Cluster Servers
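To read a driver's embedded version from the command line, the Filever.exe utility from the Windows 2000 Support Tools works well, assuming the Support Tools are installed on the node (the miniport driver file name below is hypothetical; substitute your adapter's driver):

C:\>REM Display embedded version information for a SCSI miniport driver.
C:\>filever %SystemRoot%\System32\Drivers\aic78xx.sys

C:\>REM The file date stamps can also hint at the driver's age.
C:\>dir %SystemRoot%\System32\Drivers\aic78xx.sys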
Event ID 15
Source Disk
Description

Problem The device is not ready. This may be the result of SCSI host adapter configuration problems or other problems. Contact the manufacturer for updated firmware, drivers, or known problems. This may also indicate a malfunctioning device. The error occurs at the device level; the cluster software is not likely to be involved. This event may also be accompanied by the following error events from the Cluster service: Event ID 1038, Event ID 1036, Event ID 1069.

Solution Check the version of the SCSI controller BIOS and the device firmware revision. Contact the manufacturer for the latest updates. Check the SCSI device driver version. The SCSI driver is located in the %SystemRoot%\System32\Drivers folder. Look for the version in the file properties, and check whether the SCSI manufacturer has a newer version. Examine the problem device. For more information, click the following article number to view the article in the Microsoft Knowledge Base: 259237 Troubleshooting Event ID 9, 11, and 15 on Cluster Servers
Event ID 51
Source Disk
Description

Problem The event is a generic error for any type of error that is encountered when paged I/O is used. In a paging operation, the operating system either swaps a page of memory from memory to disk or retrieves a page of memory from disk to memory. It is part of the memory management of Windows. This event does not mean that the error involved a paging file; it may be in response to an error that occurred during memory-mapped file I/O.

Solution For more information, click the following article number to view the article in the Microsoft Knowledge Base: 244780 Information About Event ID 51
Event ID 101
Source W3SVC
Description The server was unable to add the virtual root / for the directory path because of the following error: The system cannot find the path specified. The data is the error.

Problem The World Wide Web Publishing service could not create a virtual root for the Microsoft Internet Information Services (IIS) Virtual Root resource. The directory path may have been deleted.

Solution Re-create or restore the directory and contents. Check the resource properties for the IIS Virtual Root resource and make sure that the path is correct. This problem may occur if you had an IIS virtual root resource defined and then you removed the Cluster service but did not first delete the resource. In this case, you can use the Internet Service Manager to evaluate and change virtual root properties.
Event ID 1004
Source DHCP
Description DHCP IP address lease IPAddress for the card with network address MediaAccessControlAddress has been denied.

Problem This system uses a DHCP-assigned IP address for a network adapter. The system tried to renew the leased address, and the DHCP server denied the request. The address may already be allocated to another system, or the DHCP server may have a problem. Network connectivity may be affected by this problem.

Solution Correct the DHCP server problems or assign a static IP address. For best results with a cluster, use statically assigned IP addresses.
Event ID 1005
Source DHCP
Description DHCP failed to renew a lease for the card with network address MACAddress. The following error occurred: The semaphore timeout period has expired.

Problem This system uses a DHCP-assigned IP address for a network adapter. The system tried to renew the leased address and could not renew the lease. Network operations on this system may be affected.

Solution There may be a connectivity problem that is preventing access to the DHCP server that leased the address, or the DHCP server may be offline. For best results with a cluster, use statically assigned IP addresses.
Event ID 2019
Source Server
Description The server was unable to allocate from the system non-paged pool because the pool was empty.

Problem A device driver or other component is using too much non-paged pool memory, so new allocations cannot succeed for other tasks that require it.

Solution Contact Microsoft Product Support Services to help identify the component with the problem. This may be a symptom of a non-paged pool memory leak.
Event ID 2020
Source Server
Description The server was unable to allocate from the system paged pool because the pool was empty.

Problem A device driver or other component is using too much paged pool memory, so new allocations cannot succeed for other tasks that require it.

Solution Contact Microsoft Product Support Services to help identify the component with the problem. This may be a symptom of a paged pool memory leak.
Event ID 2511
Source Server
Description The server service was unable to recreate the share ShareName because the directory path no longer exists.

Problem The Server service tried to create a share by using the specified directory path. This problem may occur if you create a share (outside Cluster Administrator) on a cluster shared device. If the device is not exclusively available to this computer, the Server service cannot create the share. Also, the directory may no longer exist, or there may be RPC-related problems.

Solution Create a shared resource through Cluster Administrator, or correct the problem of the missing directory. Check file dates and version numbers of RPC files in the System32 directory. Make sure that the information matches that from the file in the service pack in use or in any hotfixes that you have applied.
Event ID 4199
Source TCPIP
Description The system detected an address conflict for IP address IPAddress with the system having network hardware address MediaAccessControlAddress. Network operations on this system may be disrupted as a result.

Problem Another system on the network may be using one of the addresses that is configured on this computer.

Solution Resolve the IP address conflict. Check the network adapter configuration and any IP address resources that are defined in the cluster.
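One quick way to find the other system is to populate the ARP cache and read back the hardware address that answers for the conflicting address (the address shown is illustrative):

C:\>REM Provoke a reply from whichever system currently holds the address.
C:\>ping 157.57.152.23

C:\>REM Display the hardware (MAC) address now cached for that address and
C:\>REM compare it with the adapters in the cluster.
C:\>arp -a 157.57.152.23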
Event ID 5719
Source Netlogon
Description No Windows NT Domain controller is available for domain domain. (This event is expected and can be ignored when booting with the No Net hardware profile.) The following error occurred: There are currently no logon servers available to service the logon request.

Problem A domain controller for the domain could not be contacted. As a result, authentication of accounts could not be completed. This problem may occur if the network is disconnected or disabled through system configuration.

Solution Resolve the connectivity problem with the domain controller, and then restart the system.
Event ID 7000
Source Service Control Manager
Description The Cluster service failed to start because of the following error: The service did not start because of a logon failure.

Problem The service control manager tried to start a service (possibly ClusSvc). It could not authenticate the service account. This error frequently occurs with Event 7013.

Solution The service account could not be authenticated. This may be because the service control manager failed when it tried to contact a domain controller, or because the account credentials are not valid. Check the service account name and password, and make sure that the account is available and that the credentials are correct. You might also try running the Cluster service from a command prompt (if you are logged on as an administrator) by changing to the %SystemRoot%\Cluster directory (or the directory where you installed the software), and then typing clussvc -debug. If the service starts and runs correctly, stop it by pressing CTRL+C, and then troubleshoot the service account problem. This error may also occur if network connectivity is disabled through the system configuration or hardware profile. The Cluster service requires network connectivity.
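In console form, the debug test that the solution describes looks like the following (the install directory is the default used throughout this paper):

C:\>REM Run the Cluster service interactively to separate service-account
C:\>REM problems from problems in the service itself.
C:\>cd /d %SystemRoot%\Cluster
C:\WINNT\Cluster>clussvc -debug

C:\WINNT\Cluster>REM If it runs correctly, press CTRL+C to stop it, fix the
C:\WINNT\Cluster>REM service account, and then restart the service normally:
C:\WINNT\Cluster>net start clussvc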
Event ID 7013
Source Service Control Manager
Description Logon attempt with current password failed with the following error: There are currently no logon servers available to service the logon request.

Information The description for this error message may vary based on the actual error. For example, another error that may be listed in the event detail is: Logon Failure: unknown username or bad password.

Problem The service control manager tried to start a service (possibly ClusSvc). It was not able to authenticate the service account with a domain controller.

Solution The service account may be in another domain, or this system is not a domain controller. The node can be a non-domain controller, but it must have access to a domain controller in its own domain, in addition to the domain that the service account belongs to. Inability to contact the domain controller may be because of a problem with the server, the network, or other factors. This problem is not related to the cluster software, and it must be resolved before you start the cluster software. This error may also occur if network connectivity is disabled through the system configuration or hardware profile. The Cluster service requires network connectivity.
Event ID 7023
Source Service Control Manager
Description The Cluster Server service terminated with the following error: The quorum log could not be created or mounted successfully.

Problem The Cluster service tried to start but could not gain access to the quorum log on the quorum disk. This may be because of problems gaining access to the disk or problems joining a cluster that has already formed.

Solution Check the disk and quorum log for problems. For more information, check the cluster log file. There may also be other events in the system event log that provide more information.
000003f0.000005c4::2002/09/19-14:53:12.728 [CS] Cluster Service started
000003f0.000005c4::2002/09/19-14:53:12.728 Cluster Node Version 3.2195
000003f0.000005c4::2002/09/19-14:53:12.728 OS Version 5.0.2195 - Service Pack 2 (AS)
Initialization
The Cluster service next loads the cluster registry hive (CLUSDB). If the hive is already loaded from a previous execution of the Cluster service, the service requests that the hive be unloaded and loaded again.
000003f0.000004a4::2002/09/19-14:53:12.728 [CS] Service Starting...
000003f0.000004a4::2002/09/19-14:53:12.728 [EP] Initialization...
000003f0.000004a4::2002/09/19-14:53:12.728 [DM]: Initialization
000003f0.000004a4::2002/09/19-14:53:12.744 [DM] DmInitialize: The hive was loaded- rollback, unload and reload again
000003f0.000004a4::2002/09/19-14:53:12.744 [DM] DmpRestartFlusher: Entry
000003f0.000004a4::2002/09/19-14:53:12.744 [DM] DmpUnloadHive: unloading the hive
000003f0.000004a4::2002/09/19-14:53:12.744 [DM]: Loading cluster database from C:\WINNT\cluster\CLUSDB
000003f0.000004a4::2002/09/19-14:53:12.978 [DM] DmpStartFlusher: Entry
000003f0.000004a4::2002/09/19-14:53:12.978 [DM] DmpStartFlusher: thread created
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Initializing...
Determining the Node ID
The Cluster service determines the ID number of the node and the node name by examining the registry on the current server.
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Local node name = NodeA.
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Local node ID = 1.
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Creating object for node 1 (NodeA)
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Initializing networks.
000003f0.000004a4::2002/09/19-14:53:12.978 [NM] Initializing network interfaces.
000003f0.000004a4::2002/09/19-14:53:12.994 [NM] Initialization complete.
000003f0.000004a4::2002/09/19-14:53:12.994 [FM] Starting worker thread...
000003f0.000004a4::2002/09/19-14:53:12.994 [API] Initializing
000003f0.000004bc::2002/09/19-14:53:12.994 [FM] Worker thread running
000003f0.000004a4::2002/09/19-14:53:12.994 [LM] :LmInitialize Entry.
000003f0.000004a4::2002/09/19-14:53:12.994 [LM] :TimerActInitialize Entry.
Determining the Cluster Service Account That the Node Uses
The account that the Cluster service runs under must be the same on all nodes in the same cluster. If you are looking at cluster log files from more than one node in a cluster, an entry like the following should appear and be the same in each log.
000003f0.000004a4::2002/09/19-14:53:12.994 [CS] Service Domain Account = MyDomain\clussvc
000003f0.000004a4::2002/09/19-14:53:12.994 [CS] Initializing RPC server.
Trying to Join First
The Cluster service always tries to join an existing cluster first. It tries to join through the known cluster IP address and cluster name, and through the name and address of each cluster node, over each known interface.
000003f0.000004a4::2002/09/19-14:53:13.041 [INIT] Attempting to join cluster MyCluster
000003f0.000004a4::2002/09/19-14:53:13.041 [JOIN] Spawning thread to connect to sponsor 169.254.87.239
000003f0.00000408::2002/09/19-14:53:13.041 [JOIN] Asking 169.254.87.239 to sponsor us.
000003f0.000004a4::2002/09/19-14:53:13.041 [JOIN] Spawning thread to connect to sponsor 10.0.0.139
000003f0.000004a4::2002/09/19-14:53:13.041 [JOIN] Spawning thread to connect to sponsor NodeB
000003f0.000005b0::2002/09/19-14:53:13.041 [JOIN] Asking 10.0.0.139 to sponsor us.
000003f0.000005a0::2002/09/19-14:53:13.041 [JOIN] Asking NodeB to sponsor us.
000003f0.000004a4::2002/09/19-14:53:14.041 [JOIN] Spawning thread to connect to sponsor 10.0.0.53
000003f0.000004a4::2002/09/19-14:53:14.041 [JOIN] Waiting for all connect threads to terminate.
000003f0.000005b4::2002/09/19-14:53:14.041 [JOIN] Asking 10.0.0.53 to sponsor us.
For each failed connection attempt to an unavailable sponsor, the failing JOIN thread logs error 1722. This error code indicates "The RPC server is unavailable." This is expected and does not necessarily indicate RPC problems.
000003f0.000005a0::2002/09/19-14:53:15.322 [JOIN] Sponsor NodeB is not available (JoinVersion), status=1722.
000003f0.000005a0::2002/09/19-14:53:15.322 [JOIN] JoinVersion data for sponsor NodeB is invalid, status 1722.
000003f0.00000408::2002/09/19-14:53:45.056 [JOIN] Sponsor 169.254.87.239 is not available (JoinVersion), status=1722.
000003f0.000005b0::2002/09/19-14:53:45.056 [JOIN] Sponsor 10.0.0.139 is not available (JoinVersion), status=1722.
000003f0.00000408::2002/09/19-14:53:45.056 [JOIN] JoinVersion data for sponsor 169.254.87.239 is invalid, status 1722.
000003f0.000005b0::2002/09/19-14:53:45.056 [JOIN] JoinVersion data for sponsor 10.0.0.139 is invalid, status 1722.
000003f0.000005b4::2002/09/19-14:53:46.056 [JOIN] Sponsor 10.0.0.53 is not available (JoinVersion), status=1722.
000003f0.000005b4::2002/09/19-14:53:46.056 [JOIN] JoinVersion data for sponsor 10.0.0.53 is invalid, status 1722.
000003f0.000004a4::2002/09/19-14:53:46.056 [JOIN] All connect threads have terminated.
000003f0.000004a4::2002/09/19-14:53:46.056 [JOIN] Unable to connect to any sponsor node.
No Join Sponsors Are Available
If the join attempt fails, the Cluster service concludes that the cluster is not already online and tries to establish the cluster on the network. This is a multiple-step process. The Cluster service loads the local copy of the cluster registry hive CLUSDB in read-only mode. It enumerates the information it needs to identify the quorum disk, and it tries to establish ownership of the quorum device through a reservation. The first indication that the join process has failed is the following entry with error code 53. When you run net helpmsg 53, as shown in the example that follows, you find that this code signifies "The network path was not found." If the error code is something other than 53, the error may indicate another reason why the join process failed.
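You can confirm these error-code translations yourself with net helpmsg; the messages below match the codes cited in this section:

C:\>net helpmsg 53

The network path was not found.

C:\>net helpmsg 1722

The RPC server is unavailable.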
000003f0.000004a4::2002/09/19-14:53:46.056 [INIT] Failed to join cluster, status 53
000003f0.000004a4::2002/09/19-14:53:46.056 [INIT] Attempting to form cluster MyCluster
000003f0.000004a4::2002/09/19-14:53:46.056 [API] Online read only
00000408.000005b0::2002/09/19-14:53:46.087 [RM] Main: Initializing.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Creating group a83c0989-55b5-4563-97a3-f677cf78966e
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Initializing group a83c0989-55b5-4563-97a3-f677cf78966e from the registry.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Name for Group a83c0989-55b5-4563-97a3-f677cf78966e is 'Cluster Group'.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Group a83c0989-55b5-4563-97a3-f677cf78966e contains Resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Creating resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Initializing resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7 from the registry.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Name for Resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7 is 'Cluster IP Address'.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpAddPossibleEntry: adding node 1 as possible host for resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpQueryResTypeInfo: Calling FmpAddPossibleNodeToList for restype IP Address
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpAddPossibleNodeToList: adding node 1 to resource type's possible node list
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpAddPossibleNodeToList: Warning, node 2 not found
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] All dependencies for resource b96c7b54-4fb4-43ef-8104-c79c743bd9b7 created.
.
.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Initializing resource e8d3afbd-13da-41ec-9fb7-96755a008607 from the registry.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Name for Resource e8d3afbd-13da-41ec-9fb7-96755a008607 is 'Disk Q: R: S:'.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpAddPossibleEntry: adding node 1 as possible host for resource e8d3afbd-13da-41ec-9fb7-96755a008607.
.
.
000003f0.000004a4::2002/09/19-14:53:46.165 [FMX] Found the quorum resource e8d3afbd-13da-41ec-9fb7-96755a008607.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] All dependencies for resource e8d3afbd-13da-41ec-9fb7-96755a008607 created.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] arbitrate for quorum resource id e8d3afbd-13da-41ec-9fb7-96755a008607.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Initializing resource e8d3afbd-13da-41ec-9fb7-96755a008607 from the registry.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] Name for Resource e8d3afbd-13da-41ec-9fb7-96755a008607 is 'Disk Q: R: S:'.
000003f0.000004a4::2002/09/19-14:53:46.165 [FM] FmpRmCreateResource: creating resource e8d3afbd-13da-41ec-9fb7-96755a008607 in shared resource monitor
000003f0.000004a4::2002/09/19-14:53:46.212 [FM] FmpRmCreateResource: created resource e8d3afbd-13da-41ec-9fb7-96755a008607, resid 773848
00000408.00000390::2002/09/19-14:53:46.228 Physical Disk: PnP window created successfully.
000003f0.000004a4::2002/09/19-14:53:46.228 [MM] MmSetQuorumOwner(1,1), old owner 0.
Reading from the Quorum Disk
The Cluster service has identified the quorum disk and will now try to arbitrate for it. During arbitration, the Cluster service issues a reset to the device to clear any reservations and then watches to see whether any attached system issues another reservation. If none does, the Cluster service issues its own reservation and brings the cluster online. If this critical step fails, the Cluster service ends after it logs events in the system event log to indicate the problem that it encountered when it tried to obtain access to the disk. If the Cluster service can open and attach to the disk, no other node has it reserved.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb]Wait for offline thread to complete...
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb]------- DisksArbitrate -------.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb]DisksOpenResourceFileHandle: Attach successful.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb]DisksOpenResourceFileHandle: CreateFile successful.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] Arbitration Parameters (1 9999).
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] Issuing GetSectorSize on signature 5005e134.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] GetSectorSize completed, status 0.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] ArbitrationInfo.SectorSize is 512
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] Issuing GetPartInfo on signature 5005e134.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb] GetPartInfo completed, status 0.
00000408.000005c8::2002/09/19-14:53:46.228 Physical Disk <Disk Q: R: S:>: [DiskArb]Successful read (sector 12) [:0] (0,00000000:00000000).
00000408.00000390::2002/09/19-14:53:46.228 Physical Disk: AddVolume : \\?\Volume{637951b8-adcd-11d5-a236-806d6172696f}\ ' ', 18 (559216)
00000408.000005c8::2002/09/19-14:53:46.243 Physical Disk <Disk Q: R: S:>: [DiskArb]Successful write (sector 11) [NodeA:0] (0,5aee4f06:01c25fec).
00000408.000005c8::2002/09/19-14:53:46.243 Physical Disk <Disk Q: R: S:>: [DiskArb]Successful read (sector 12) [:0] (0,00000000:00000000).
00000408.000005c8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb]Successful write (sector 12) [NodeA:0] (0,5aee4f06:01c25fec).
00000408.000005c8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb]Successful read (sector 11) [NodeA:0] (0,5aee4f06:01c25fec).
00000408.000005c8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb] Issuing Reserve on signature 5005e134.
00000408.000005c8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb] Reserve completed, status 0.
.
.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb]------- DisksOnline -------.
.
.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb]DisksOpenResourceFileHandle: Attach successful.
000003f0.000004a4::2002/09/19-14:53:46.259 [FM] FmpPropagateResourceState: resource e8d3afbd-13da-41ec-9fb7-96755a008607 pending event.
000003f0.000004a4::2002/09/19-14:53:46.259 [FM] FmpRmOnlineResource: Resource e8d3afbd-13da-41ec-9fb7-96755a008607 pending
000003f0.000004a4::2002/09/19-14:53:46.259 [FM] FmpRmOnlineResource: Returning. Resource e8d3afbd-13da-41ec-9fb7-96755a008607, state 129, status 997.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: [DiskArb]DisksOpenResourceFileHandle: CreateFile successful.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: Mountie[0]: 1, let=?, start=7E00, len=15145800.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: Mountie[1]: 2, let=?, start=1514D600, len=1514D600.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: Mountie[2]: 3, let=?, start=2A29AC00, len=15925800.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: Online
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: MountieVerify: ClusReg-DiskInfo selected.
00000408.000006d8::2002/09/19-14:53:46.259 Physical Disk <Disk Q: R: S:>: MountieVerify: DriveLetters mask is now 00070000.
Enumerating Drives
The Cluster service has access to the quorum disk, and it will now enumerate disk devices and drive letters.
00000408.00000390::2002/09/19-14:53:46.493 Physical Disk: AddVolume : \\?\Volume{637951c1-adcd-11d5-a236-806d6172696f}\ 'C', 7 (560992)
00000408.00000390::2002/09/19-14:53:46.525 Physical Disk: AddVolume : \\?\Volume{5bd61ef7-ae06-11d5-a766-00508bb08906}\ 'D', 7 (555128)
00000408.00000390::2002/09/19-14:53:46.540 Physical Disk: AddVolume : \\?\Volume{ddb339f6-b76e-11d5-a706-806d6172696f}\ 'E', 7 (555384)
00000408.00000390::2002/09/19-14:53:46.572 Physical Disk: AddVolume : \\?\Volume{919587dc-c7ee-11d5-bb36-806d6172696f}\ 'F', 7 (561664)
00000408.00000390::2002/09/19-14:53:46.603 Physical Disk: AddVolume : \\?\Volume{7051f405-ae9f-11d5-a767-00508bb08906}\ 'Q', 7 (562112)
00000408.00000390::2002/09/19-14:53:46.618 Physical Disk: AddVolume : \\?\Volume{7051f407-ae9f-11d5-a767-00508bb08906}\ 'R', 7 (562520)
00000408.000006d8::2002/09/19-14:53:46.650 Physical Disk <Disk Q: R: S:>: DriveLetterIsAlive called for Online check
00000408.000006d8::2002/09/19-14:53:46.650 Physical Disk <Disk Q: R: S:>: DiskspCheckPath: Open Q:\MSCS\chkAFB3.tmp succeeded.
00000408.000006d8::2002/09/19-14:53:46.650 Physical Disk <Disk Q: R: S:>: DiskspCheckPath: Open Q:\MSCS\quolog.log succeeded.
00000408.00000390::2002/09/19-14:53:46.650 Physical Disk: AddVolume : \\?\Volume{7051f409-ae9f-11d5-a767-00508bb08906}\ 'S', 7 (562928)
00000408.00000390::2002/09/19-14:53:46.665 Physical Disk: AddVolume : \\?\Volume{637951bc-adcd-11d5-a236-806d6172696f}\ 'T', 7 (563336)
00000408.00000390::2002/09/19-14:53:46.681 Physical Disk: AddVolume : \\?\Volume{637951bd-adcd-11d5-a236-806d6172696f}\ 'U', 7 (563744)
00000408.00000390::2002/09/19-14:53:46.697 Physical Disk: AddVolume : \\?\Volume{637951be-adcd-11d5-a236-806d6172696f}\ 'V', 7 (564152)
00000408.000006d8::2002/09/19-14:53:46.697 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Creating test file (Q:\zClusterOnlineChk.tmp)
00000408.00000390::2002/09/19-14:53:46.712 Physical Disk: AddVolume : \\?\Volume{637951bf-adcd-11d5-a236-806d6172696f}\ 'W', 7 (565752)
00000408.00000390::2002/09/19-14:53:46.915 Physical Disk: AddVolume: GetPartitionInfo(\??\Volume{019a31d0-b120-11d5-a4b6-806d6172696f}), error 1
00000408.000006d8::2002/09/19-14:53:46.993 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Returns status 0
00000408.000006d8::2002/09/19-14:53:47.009 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Creating test file (R:\zClusterOnlineChk.tmp)
00000408.000006d8::2002/09/19-14:53:47.259 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Returns status 0
00000408.000006d8::2002/09/19-14:53:47.275 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Creating test file (S:\zClusterOnlineChk.tmp)
00000408.000006d8::2002/09/19-14:53:47.525 Physical Disk <Disk Q: R: S:>: DisksWriteTestFile: Returns status 0
00000408.000006d8::2002/09/19-14:53:47.525 Physical Disk <Disk Q: R: S:>: DisksMountDrives: letter mask is 00070000.
00000408.000006d8::2002/09/19-14:53:47.525 [RM] RmpSetResourceStatus, Posting state 2 notification for resource <Disk Q: R: S:>
Reading from Quolog.log File
The quorum resource is online. Now the Cluster service must open the quorum log file, Quolog.log, verify that the file is valid, and determine what changes must be made to its local copy of the CLUSDB cluster registry. If the number of changes is significant, the Cluster service loads the current copy of CLUSDB that exists on the quorum disk.
000003f0.000006d8::2002/09/19-14:53:47.525 [DM] DmpQuoObjNotifyCb: Own quorum resource, try open the quorum log
000003f0.000006d8::2002/09/19-14:53:47.525 [DM] DmpQuoObjNotifyCb: the name of the quorum file is Q:\MSCS\quolog.log
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogCreate : Entry FileName=Q:\MSCS\quolog.log MaxFileSize=0x00400000
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpCreate : Entry
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpMountLog : Entry pLog=0x000a5828
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpMountLog::Quorumlog File size=0x00008000
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpMountLog::reading 1024 bytes at offset 0x00000400
.
.
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpValidateChkPoint: Entry
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpValidateChkPoint: Exit, returning 0x00000000
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpMountLog : NextLsn=0x000011b8 FileAlloc=0x00001400 ActivePageOffset=0x00001000
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] LogpCreate : Exit with success
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] AddTimerActivity:: hTimer = 0x000003dc pfnTimerCb=0x0107fb50 dwInterval(in msec)=120000
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] AddTimerActivity:: Interval(high)=0xffffffff Interval(low)=0xb8797400
000003f0.000006d8::2002/09/19-14:53:47.525 [LM] AddTimerActivity:: returns 0x00000000
000003f0.000004b4::2002/09/19-14:53:47.525 [LM] :ReSyncTimerHandles Entry.
000003f0.000006d8::2002/09/19-14:53:47.540 [LM] LogCreate : Exit *LastLsn=0x000011b8 Log=0x000a5828
000003f0.000004b4::2002/09/19-14:53:47.540 [LM] :ReSyncTimerHandles Exit gdwNumHandles=2
000003f0.000006d8::2002/09/19-14:53:47.540 [DM] DmpQuoObjNotifyCb: Quorum log opened
000003f0.000006d8::2002/09/19-14:53:47.540 [LM] AddTimerActivity:: hTimer = 0x000003e0 pfnTimerCb=0x0106d31a dwInterval(in msec)=14400000
000003f0.000006d8::2002/09/19-14:53:47.540 [LM] AddTimerActivity:: Interval(high)=0xffffffde Interval(low)=0x78ee6000
000003f0.000006d8::2002/09/19-14:53:47.540 [LM] AddTimerActivity:: returns 0x00000000
000003f0.000004b4::2002/09/19-14:53:47.540 [LM] :ReSyncTimerHandles Entry.
000003f0.000006d8::2002/09/19-14:53:47.540 [FM] HandleResourceTransition: Resource Name = e8d3afbd-13da-41ec-9fb7-96755a008607 old state=129 new state=2
000003f0.000004b4::2002/09/19-14:53:47.540 [LM] :ReSyncTimerHandles Exit gdwNumHandles=3
000003f0.000006d8::2002/09/19-14:53:47.540 [FM] FmpPropagateResourceState: signalling the ghQuoOnlineEvent
Checking for a Quorum Tombstone Entry
If an administrator has designated another disk to be the quorum disk while this node was offline, the quorum disk will have a tombstone entry that indicates what the new quorum disk is. In this example, there is no tombstone entry, so the process continues as usual.
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmpChkQuoTombStone Entry
000003f0.000006d8::2002/09/19-14:53:47.540 [GUM] GumSendUpdate: Locker waiting type 0 context 8
000003f0.000006d8::2002/09/19-14:53:47.540 [GUM] Thread 0x6d8 UpdateLock wait on Type 0
000003f0.000006d8::2002/09/19-14:53:47.540 [GUM] DoLockingUpdate successful, lock granted to 1
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmpChkQuoTombStone Exit, returning 0x00000000
.
.
000003f0.000004a4::2002/09/19-14:53:47.540 ... returning 0x00000000
000003f0.000004a4::2002/09/19-14:53:47.540 ... LSN=0x00000000, returning 0x00000000
Checking the Time to Load a New CLUSDB
After the Cluster service makes changes to CLUSDB (or loads a complete current version from the quorum disk), the Cluster service destroys the temporary objects and groups that it created previously to access the quorum disk. The Cluster service then begins to create all the objects that are necessary to form the cluster. These include nodes, networks, groups, resources, resource types, and resource monitors. After the service creates the objects, they are brought online.
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmpApplyChanges: Exit, returning 0x00000000
000003f0.000004a4::2002/09/19-14:53:47.540 [FM] FmFormNewClusterPhase1, Entry. Quorum quorum will be deleted
000003f0.000004a4::2002/09/19-14:53:47.540 [FM] DestroyGroup: destroying a83c0989-55b5-4563-97a3-f677cf78966e
000003f0.000004a4::2002/09/19-14:53:47.540 [FM] DestroyResource: destroying b96c7b54-4fb4-43ef-8104-c79c743bd9b7
.
.
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Beginning cluster form process.
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Synchronizing node information.
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmBeginLocalUpdate Entry
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmBeginLocalUpdate Exit, pLocalXsaction=0x000a9b30
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmAbortLocalUpdate Entry
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmpRestartFlusher: Entry
000003f0.000005a4::2002/09/19-14:53:47.540 [DM] DmpRegistryFlusher: restarting
000003f0.000004a4::2002/09/19-14:53:47.540 [DM] DmAbortLocalUpdate Exit, returning 0x00000000
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Creating node objects.
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Creating object for node 2 (NodeB)
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Synchronizing network information.
000003f0.000004a4::2002/09/19-14:53:47.540 [NM] Synchronizing interface information.
000003f0.000004a4::2002/09/19-14:53:47.556 [NM] Running network configuration engine.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Processing network configuration changes.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Matched 2 networks, created 0 new networks.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Resynchronizing network information.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Resynchronizing interface information.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating network objects.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for network 524f32e5-ed95-42b8-9917-77560bfb7880 (VLAN).
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for network 78e44a1e-a4f6-4c77-a755-ae35b12c286f (Private).
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for network b0176e30-7e3c-4acd-88f5-0259d3a11d6b (Private(1)).
.
.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating interface objects.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for interface 18b0338c-ea5e-4a21-bf16-e581400e3ceb (Private(1) - NodeA).
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Assigned index 0 to interface 18b0338c-ea5e-4a21-bf16-e581400e3ceb.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Registering network b0176e30-7e3c-4acd-88f5-0259d3a11d6b (Private(1)) with cluster transport.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Bringing network b0176e30-7e3c-4acd-88f5-0259d3a11d6b online.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Registering interface 18b0338c-ea5e-4a21-bf16-e581400e3ceb (Private(1) - NodeA) with cluster transport, addr 10.11.12.13, endpoint 3343.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Scheduled network connectivity report worker thread.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for interface 894e5bfc-dd63-47ba-b317-0f1a6833a075 (Private - NodeB).
.
.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Creating object for interface d0c5daa3-d7d7-4763-99aa-86be1c102014 (VLAN - NodeA).
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Assigned index 0 to interface d0c5daa3-d7d7-4763-99aa-86be1c102014.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Registering network 524f32e5-ed95-42b8-9917-77560bfb7880 (VLAN) with cluster transport.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Bringing network 524f32e5-ed95-42b8-9917-77560bfb7880 online.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Registering interface d0c5daa3-d7d7-4763-99aa-86be1c102014 (VLAN - NodeA) with cluster transport, addr 10.0.0.144, endpoint 3343.
.
.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Registering interface ece486bc-3197-4873-9e7d-973c40726870 (VLAN - NodeB) with cluster transport, addr 10.0.0.139, endpoint 3343.
After network enumeration and initialization, the node manager [NM] declares that this node has membership in the cluster that is being formed. [RGP] is the component that handles membership and regrouping events.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Initializing membership...
000003f0.000004a4::2002/09/19-14:53:47.681 [RGP] Node 1: RGP Init called : 0x1, 0x10, 0x2364a8, 0x236480.
000003f0.00000660::2002/09/19-14:53:47.681 [NM] State recalculation timer expired for network 524f32e5-ed95-42b8-9917-77560bfb7880 (VLAN).
000003f0.000004a4::2002/09/19-14:53:47.681 [ClMsg] Initializing.
000003f0.00000660::2002/09/19-14:53:47.681 [NM] Scheduled worker thread to service network 524f32e5-ed95-42b8-9917-77560bfb7880.
000003f0.00000660::2002/09/19-14:53:47.681 [NM] State recalculation timer expired for network b0176e30-7e3c-4acd-88f5-0259d3a11d6b (Private(1)).
000003f0.00000660::2002/09/19-14:53:47.681 [NM] Scheduled worker thread to service network b0176e30-7e3c-4acd-88f5-0259d3a11d6b.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Worker thread processing network 524f32e5-ed95-42b8-9917-77560bfb7880.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Beginning phase 1 of state computation for network 524f32e5-ed95-42b8-9917-77560bfb7880.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Examining connectivity data for interface 0 (d0c5daa3-d7d7-4763-99aa-86be1c102014) on network d0c5daa3-d7d7-4763-99aa-86be1c102014.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] The report from interface 1 is not valid on network 524f32e5-ed95-42b8-9917-77560bfb7880.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Interface 0 (d0c5daa3-d7d7-4763-99aa-86be1c102014) is up on network 524f32e5-ed95-42b8-9917-77560bfb7880.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Node is down for interface 1 (ece486bc-3197-4873-9e7d-973c40726870) on network 524f32e5-ed95-42b8-9917-77560bfb7880
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Completed phase 1 of state computation for network 524f32e5-ed95-42b8-9917-77560bfb7880.
Enumerated Networks
The following is an example of a network summary that indicates that one network was unavailable, one network was reachable, and a total of one network was declared "up." The unavailable network in this case is the private network that is connected to the second node by a crossover cable. Because the other node is not turned on, "unavailable" is the correct status for this network.
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Unavailable=1, Failed = 0, Unreachable=0, Reachable=1, Up=1 on network 524f32e5-ed95-42b8-9917-77560bfb7880
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Network 524f32e5-ed95-42b8-9917-77560bfb7880 is now in state 3
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Interface d0c5daa3-d7d7-4763-99aa-86be1c102014 is up (node: NodeA, network: VLAN).
000003f0.00000654::2002/09/19-14:53:47.681 [NM] Network 524f32e5-ed95-42b8-9917-77560bfb7880 (VLAN) is up.
.
.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Membership initialization complete.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] NmpValidateNodeVersion: Node=1, HighestVersion=0x00030893, LowestVersion=0x000200e0
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] NmpCalcClusterVersion: status = 0 ClusHighestVer=0x00030893, ClusLowestVer=0x000200e0
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] [NmpResetClusterVersion] ClusterHighestVer=0x00030893 ClusterLowestVer=0x000200e0
Disabling Mixed Operation
The Cluster service contains a backward compatibility mode for rolling upgrades. If all declared nodes are Windows 2000-based, the backward compatibility mode is disabled.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Disabling mixed NT4/NT5 operation.
Forming Cluster Membership
The next step in this log shows the cluster membership forming.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] Forming cluster membership.
000003f0.000004a4::2002/09/19-14:53:47.681 [RGP] Node 1: RGP SetRGPInfo : 0x1, 0x0, 0x4, 0x0.
000003f0.000004a4::2002/09/19-14:53:47.681 [RGP] Node 1: RGP Start called: 0x0, 0x107ab6d, 0x0, 0x0.
000003f0.000004a4::2002/09/19-14:53:47.681 [NM] NmPerformFixups Entry, dwFixupType=1
Any updates to the nodes are performed through the Global Update Manager [GUM]. If one or more nodes are offline, these changes are recorded in the quorum log. Nodes that are online will have [GUM] events in their cluster logs with the same sequence number.
000003f0.000004a4::2002/09/19-14:53:47.681 [GUM] GumSendUpdate: Locker waiting type 2 context 17
000003f0.000004a4::2002/09/19-14:53:47.681 [GUM] Thread 0x4a4 UpdateLock wait on Type 2
000003f0.000004a4::2002/09/19-14:53:47.681 [GUM] DoLockingUpdate successful, lock granted to 1
000003f0.000004a4::2002/09/19-14:53:47.697 [GUM] GumSendUpdate: Locker dispatching seq 45014 type 2 context 17
000003f0.000004a4::2002/09/19-14:53:47.697 [DM] DmBeginLocalUpdate Entry
000003f0.000004a4::2002/09/19-14:53:47.853 [DM] DmBeginLocalUpdate Exit, pLocalXsaction=0x0009ec20
Identifying Resource Types
To properly bring resources online, the resource types and corresponding resource DLL information must be enumerated. For each resource type, the nodes that may host the resource type must be set.
000003f0.000004a4::2002/09/19-14:53:47.853 [FM] processing resource types.
000003f0.000004a4::2002/09/19-14:53:47.853 [FM] FmpQueryResTypeInfo: Calling FmpAddPossibleNodeToList for restype DHCP Service
000003f0.000004a4::2002/09/19-14:53:47.853 [FM] FmpAddPossibleNodeToList: adding node 1 to resource type's possible node list
000003f0.000004a4::2002/09/19-14:53:47.853 [FM] FmpAddPossibleNodeToList: adding node 2 to resource type's possible node list
.
.
000003f0.000004a4::2002/09/19-14:53:47.853 [NM] NmFixupNotifyCb: Calculating Cluster Node Limit
Cluster Membership Limit
With this cluster, only two nodes have formally joined the cluster through configuration of the Cluster service. As additional nodes are added, this value increases. The value decreases when you evict a node. The limit is also constrained by the maximum number of nodes permitted for the current version of the operating system.
000003f0.000004a4::2002/09/19-14:53:47.853 [NM] Calculated cluster node limit = 2
000003f0.000004a4::2002/09/19-14:53:47.853 [NM] NmpUpdatePerformJoinFixups2: called postfixup notifycb function with status 0
000003f0.000004a4::2002/09/19-14:53:47.853 [DM] DmCommitLocalUpdate Entry
000003f0.000004a4::2002/09/19-14:53:47.853 [DM] DmCommitLocalUpdate Exit, returning 0x00000000
000003f0.000004a4::2002/09/19-14:53:47.853 [GUM] GumpDoUnlockingUpdate releasing lock ownership
000003f0.000004a4::2002/09/19-14:53:47.853 [GUM] GumSendUpdate: completed update seq 45015 type 2 context 17
000003f0.000004a4::2002/09/19-14:53:47.853 [NM] NmPerformFixups Exit, dwStatus=0
000003f0.0000051c::2002/09/19-14:53:50.509 [GUM] DoLockingUpdate successful, lock granted to 1
.
.
Cluster Service Started
When all startup events complete for the Cluster service, the following entry appears in the log. If a failure that prevents startup occurs before this step, the Cluster service destroys any objects it created in memory, and the service ends. The following entry corresponds to event 1061 or 1062 in the system event log. After that, the log shows events for various updates, resource state changes, and the remaining resources being brought online.
000003f0.000004a4::2002/09/19-14:53:50.509 [INIT] Cluster started.
000003f0.0000051c::2002/09/19-14:53:50.509 [GUM] GumSendUpdate: Locker dispatching seq 45052 type 1 context 4098
.
.
00000408.0000053c::2002/09/19-14:53:57.821 Network Name ClusterName: Registered DNS PTR record 53.0.0.10.in-addr.arpa. for host MyCluster.mydomain.com.
00000408.0000053c::2002/09/19-14:53:57.821 Network Name ClusterName: Network Name MyCluster is now online
00000408.0000053c::2002/09/19-14:53:57.821 [RM] RmpSetResourceStatus, Posting state 2 notification for resource ClusterName
Note Error code 21 means "The device is not ready." This indicates a possible problem with the device. In this case, the device was turned off, and the error status correctly indicates the problem.

Example 2: Duplicate Cluster IP Address
If another computer on the network has the same IP address as the cluster IP address resource, the resource will fail. Also, the cluster name will not be registered on the network, because it depends on the IP address resource. Because this name is the network name that is used for cluster administration, you cannot administer the cluster by using this name while the network name resource is offline. However, you may be able to use the computer name of the cluster node to connect from Cluster Administrator. You may also be able to connect locally from the console by using the loopback address. The following sample entry is from a cluster log file during this type of failure:
00000408.00000b9::2002/09/19-14:53:57.821 IP Address ClusterIPAddress: The IP address is already in use on the network, status 5057.
Using Cluster.exe
Cluster.exe is a companion program that is installed with Cluster Administrator. The Cluster service documentation details the basic syntax for this utility. This section complements the Administrator's Guide and provides examples. All examples in this section assume a cluster named MyCluster that is installed in the MyDomain domain and that contains the NODEA and NODEB servers. MyGroup is a group name used in the examples.

Note In commands, enclose any names that contain spaces in quotation marks.
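For example, the default group that appears as 'Cluster Group' in the log excerpts earlier in this paper contains a space, so it must be quoted:

C:\>cluster /cluster:mycluster group "Cluster Group" /status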
Basic Syntax
Except for the cluster /? command, which returns basic syntax for the command, every command line uses the following syntax:
cluster /cluster:clustername /option
To test connectivity with a cluster, or to make sure that you can use Cluster.exe, try the simple command in the next section to check the version number (/version).

Cluster Commands
Version Number
To check the version number of your cluster, use a command similar to the following:
cluster /cluster:mycluster /version
If your cluster is named MyCluster, the command returns the version information for the product.

Listing Clusters in the Domain
To list all clusters in a single domain that is named MyDomain, use a command that includes the /list option:
cluster /cluster:mycluster /list:mydomain
Node Commands
All commands that are directed toward a specific cluster node use the following syntax:

cluster /cluster:clustername node nodename /option
Node Status
To obtain the status of a particular cluster node, use the /status command. For example:
cluster /cluster:mycluster node nodea /status
The node name is optional only for the /status command, so the following command will report the status of all nodes in the cluster:
cluster /cluster:mycluster node /status
Pause or Resume
The /pause option leaves the Cluster service running and communicating in the cluster; however, a paused node may not own groups or resources. To pause a node, use the /pause option:
cluster /cluster:mycluster node nodeb /pause
One reason to use this command is to transfer groups to another node while you perform some other kind of task, such as running a backup or disk defragmenter utility.
To resume the node, use the /resume option instead:

cluster /cluster:mycluster node nodeb /resume
Evict a Node
The /evict option removes the ability of a node to participate in the cluster. In other words, the cluster node loses membership rights in the cluster. To perform this action, use a command similar to the following:
cluster /cluster:mycluster node nodeb /evict
The only way to grant membership rights again to an evicted node is to:
1. Remove the cluster software from the evicted node through Add/Remove Programs in Control Panel.
2. Restart the node.
3. Reconfigure the Cluster service on the previously evicted node.

Changing Node Properties
The cluster node has only one property that Cluster.exe can change: the node description. For example:
cluster /cluster:mycluster node nodea /properties description="the best node in mycluster."
NOTE The command lines have been wrapped because of space constraints in this document. The lines do not typically wrap.
A good use for this property is a cluster with multiple administrators. For example, you pause a node to run a large application on it, and you change the node description to reflect this. The information in this field can remind you and other administrators why the node was paused and that someone must run the /resume option on it later. Consider scripting the /pause command and the description change together in a batch file that prepares the node for the designated task, as sketched below.
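A minimal batch sketch of that idea (the cluster, node, and task names are illustrative):

@echo off
REM Pause NODEA and record why, run the maintenance task, then resume
REM the node and restore its description. All names here are examples.
cluster /cluster:mycluster node nodea /pause
cluster /cluster:mycluster node nodea /properties description="Paused for nightly backup; resume when finished"

REM ... run the backup or defragmenter utility here ...

cluster /cluster:mycluster node nodea /resume
cluster /cluster:mycluster node nodea /properties description="the best node in mycluster."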
Group Commands
All group commands use the following syntax:

cluster clustername group groupname /option
Group Status
To obtain the status of a group, you can use the /status option. This option is the only group option for which the group name is optional; if you omit the group name, the status of all groups is displayed. Another status option (/node) displays group status by node.

Example: To list the status of all groups, type:
cluster /cluster:mycluster group /status
Example: To list the status of all groups that are owned by a specific node, type:
cluster /cluster:mycluster group /status /node:nodea
Create a New Group
It is easy to create a new group from the command prompt. The following example creates a group that is named MyGroup.
cluster /cluster:mycluster group mygroup /create
Delete a Group
The /delete option is as simple to use as the /create option. However, the group must be empty before you can delete it.
cluster /cluster:mycluster group mygroup /delete
Rename a Group
To rename a group that is named MyGroup as YourGroup, use the following command:
cluster /cluster:mycluster group mygroup /rename:yourgroup
Move, Online, and Offline Group Commands
You can use the move group command to transfer ownership of a group and its resources to another node. By design, the move command must take the group offline and bring it online on the other node. You can specify a timeout value (in seconds) to wait before the move request is canceled. By default, Cluster.exe waits indefinitely until the state of the group changes to the appropriate state. Examples:
cluster /cluster:mycluster group mygroup /moveto:nodeb /wait:120
cluster /cluster:mycluster group mygroup /offline
cluster /cluster:mycluster group mygroup /online
Group Properties
Use the /property option to display or set group properties. Documentation on common properties for groups is in the Microsoft Cluster Server Administrator's Guide. There is one additional property that is not documented: LoadBalState. This property is reserved for future use. Examples:
cluster /cluster:mycluster group mygroup /properties
cluster /cluster:mycluster group mygroup /properties description="my favorite group"
Preferred Owner
You can specify a preferred owner for a group. The preferred owner is the node that you prefer each group to run on. If a node fails, the remaining node takes over the groups from the failed node. By setting the failback option at the group level, groups can fail back to their preferred server when the node becomes available. A group does not fail back if a preferred owner is not specified. MSCS version 1.0 is limited to two nodes in a cluster. With that version, if you specify more than one preferred owner, group ownership can change when a node becomes available, even though the node that owns the group is also on the list. For best results, specify no more than one preferred owner unless the cluster has more than two nodes.

Example: To list the preferred owner for a group, type:
cluster /cluster:mycluster group mygroup /listowner
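Cluster.exe can also set the preferred-owner list. As a hedged sketch, the /setowners option shown here takes a node list; verify the exact syntax against the cluster /? output for your version:

C:\>REM Make NODEA the preferred owner of MyGroup.
C:\>cluster /cluster:mycluster group mygroup /setowners:nodea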
Resource Commands
Resource Status
To list the status of resources or a particular resource, you can use the /status option.
Examples:
cluster /cluster:mycluster resource /status
cluster /cluster:mycluster resource myshare /status
Create a New Resource
To create a new resource, use the /create option.

Note With the /create option, you can create resources in an incomplete state. To avoid errors, specify all required parameters for the resource, and set additional resource properties as appropriate with subsequent commands.

Example: The following command sequence adds a file share resource named MyShare.
cluster /cluster:mycluster resource myshare /create /group:mygroup /type:"file share"
cluster /cluster:mycluster resource myshare /privprop sharename="myshare"
cluster /cluster:mycluster resource myshare /privprop path="w:\myshare"
cluster /cluster:mycluster resource myshare /privprop maxusers=-1
cluster /cluster:mycluster resource myshare /adddependency:"disk w"
Simulating Resource Failure
You can use the /fail option for a resource from a command prompt to simulate resource failure in a cluster. This option is similar to using the Initiate Failure command from Cluster Administrator. The command assumes that the resource is already online. Example:
cluster /cluster:mycluster resource myshare /fail
Online/Offline Resource Commands
The /online and /offline resource commands work the same way as the corresponding group commands. You can also use the /wait option to specify a time limit (in seconds) for the operation to complete. Examples:
cluster /cluster:mycluster resource myshare /offline /wait
cluster /cluster:mycluster resource myshare /online
Dependencies
You can also list or change resource dependency relationships from a command prompt. To add or remove a dependency, you must know the name of the resource to be added or removed as a dependency. Examples:
cluster /cluster:mycluster resource myshare /listdependencies
cluster /cluster:mycluster resource myshare /adddependency:"disk w:"
cluster /cluster:mycluster resource myshare /removedependency:"disk w:"
Example of a Batch Job
The following example takes an existing group, MyGroup, and creates resources in the group. The example creates a network name resource, and then it initiates failures to test failover. During the process, it uses various reporting commands to obtain the status of the group and resources. This example shows the output from all the commands. Keep in mind that if you use these commands, you may have to change them depending on the cluster configuration, group, resource, network, and IP addresses in your environment.

Note The LoadBal properties reported in the example are reserved for future use. The EnableNetBIOS property for the IP address resource was introduced in Microsoft Windows NT Server Service Pack 4. This property must be set to a value of 1 for the resource to be a valid dependency for a network name resource.
C:\>REM Get group status. C:\>cluster /cluster:mycluster group mygroup /status Listing status for resource group 'mygroup': Group Node Status -------------------- --------------- -----mygroup NodeA Online C:\>REM Create the IP Address resource: myip. C:\>cluster /cluster:mycluster resource myip /create /group:mygroup /Type:"Ip Address" Creating resource 'myip'... Resource Group Node Status -------------------- -------------------- --------------- -----myip mygroup NodeA Offline C:\>REM Define the IP Address parameters. C:\>cluster /cluster:mycluster resource myip /priv network:client C:\>cluster /cluster:mycluster resource myip /priv address:157.57.152.23 C:\>REM Redundant. Subnet mask should already be same as network uses. C:\>cluster /cluster:mycluster resource myip /priv subnetmask:255.255.252.0 C:\>cluster /cluster:mycluster resource myip /priv enablenetbios:1 C:\>REM Check the status. C:\>cluster /cluster:mycluster resource myip /stat Listing status for resource 'myip': Resource Group Node Status -------------------- -------------------- --------------- -----myip mygroup NodeA Offline C:\>REM View the properties. C:\>cluster /cluster:mycluster resource myip /prop Listing properties for 'myip': R Name Value --------------------------------- ------------------------------R Name myip Type IP Address Description DebugPrefix
  SeparateMonitor                   0 (0x0)
  PersistentState                   0 (0x0)
  LooksAlivePollInterval            5000 (0x1388)
  IsAlivePollInterval               60000 (0xea60)
  RestartAction                     2 (0x2)
  RestartThreshold                  3 (0x3)
  RestartPeriod                     900000 (0xdbba0)
  PendingTimeout                    180000 (0x2bf20)
  LoadBalStartupInterval            300000 (0x493e0)
  LoadBalSampleInterval             10000 (0x2710)
  LoadBalAnalysisInterval           300000 (0x493e0)
  LoadBalMinProcessorUnits          0 (0x0)
  LoadBalMinMemoryUnits             0 (0x0)
C:\>REM View the private properties.
C:\>cluster /cluster:mycluster resource myip /priv
Listing private properties for 'myip':

R Name                              Value
  --------------------------------- ------------------------------
  Network                           Client
  Address                           157.57.152.23
  SubnetMask                        255.255.252.0
  EnableNetBIOS                     1 (0x1)

C:\>REM Bring online and wait 60 seconds for completion.
C:\>cluster /cluster:mycluster resource myip /online /wait:60
Bringing resource 'myip' online...

Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
myip                 mygroup              NodeA           Online

C:\>REM Check the status again.
C:\>cluster /cluster:mycluster resource myip /stat
Listing status for resource 'myip':

Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
myip                 mygroup              NodeA           Online

C:\>REM Define a network name resource.
C:\>cluster /cluster:mycluster resource mynetname /create /group:mygroup /type:"network name"
Creating resource 'mynetname'...

Resource             Group                Node            Status
-------------------- -------------------- --------------- -------
mynetname            mygroup              NodeA           Offline

C:\>cluster /cluster:mycluster resource mynetname /priv name:"mynetname"
C:\>cluster /cluster:mycluster resource mynetname /adddependency:myip
Making resource 'mynetname' depend on resource 'myip'...

C:\>REM Status check.
C:\>cluster /cluster:mycluster resource mynetname /stat
Listing status for resource 'mynetname':

Resource             Group                Node            Status
-------------------- -------------------- --------------- -------
mynetname            mygroup              NodeA           Offline
C:\>REM Bring the network name online.
C:\>cluster /cluster:mycluster resource mynetname /online /wait:60
Bringing resource 'mynetname' online...

Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
mynetname            mygroup              NodeA           Online

C:\>REM Status check.
C:\>cluster /cluster:mycluster group mygroup /stat
Listing status for resource group 'mygroup':

Group                Node            Status
-------------------- --------------- ------
mygroup              NodeA           Online

C:\>REM Simulate a failure of the IP address.
C:\>cluster /cluster:mycluster resource myip /fail
Failing resource 'myip'...

Resource             Group                Node            Status
-------------------- -------------------- --------------- --------------
myip                 mygroup              NodeA           Online Pending

C:\>REM Get group status.
C:\>cluster /cluster:mycluster group mygroup /status
Listing status for resource group 'mygroup':

Group                Node            Status
-------------------- --------------- ------
mygroup              NodeA           Online
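Because each step is a single command, the entire sequence lends itself to a batch file that can be rerun after configuration changes. The following sketch is illustrative only, not part of the recorded session above: it reuses the hypothetical cluster, group, resource names, and IP address parameters from the example, and it assumes that Cluster.exe returns a nonzero exit code when an operation fails.

@echo off
REM testfailover.cmd - repeatable version of the resource-creation
REM and failover test shown above. Adjust all names and addresses
REM to match your environment.
set CLUS=/cluster:mycluster

REM Create and configure the IP address resource.
cluster %CLUS% resource myip /create /group:mygroup /type:"Ip Address"
cluster %CLUS% resource myip /priv network:client
cluster %CLUS% resource myip /priv address:157.57.152.23
cluster %CLUS% resource myip /priv subnetmask:255.255.252.0
cluster %CLUS% resource myip /priv enablenetbios:1
cluster %CLUS% resource myip /online /wait:60
if errorlevel 1 goto failed

REM Create the dependent network name resource.
cluster %CLUS% resource mynetname /create /group:mygroup /type:"network name"
cluster %CLUS% resource mynetname /priv name:"mynetname"
cluster %CLUS% resource mynetname /adddependency:myip
cluster %CLUS% resource mynetname /online /wait:60
if errorlevel 1 goto failed

REM Simulate a failure and report the resulting group state.
cluster %CLUS% resource myip /fail
cluster %CLUS% group mygroup /status
goto end

:failed
echo A resource did not come online within the time limit.

:end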
INDEX
A
add resources 28, 61
administrator 8, 19, 29, 40, 65, 74, 92
C
CD-ROM 15, 35
certified 9
Chkdsk 32, 33, 52, 53, 54, 55, 75
  /c 53, 54, 55
  /i 53, 54
Cluster Administrator 24, 25, 27, 30, 36, 49, 60, 66, 72, 76, 91, 105, 106, 110
cluster is down 22
cluster log file 23, 24, 32, 63, 65, 92, 93, 94, 105
Cluster service 10, 12, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 33, 36, 37, 41, 42, 43, 44, 47, 48, 50, 54, 61, 62, 63, 65, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 80, 81, 88, 89, 92, 93, 94, 95, 96, 98, 99, 100, 103, 104, 105, 106, 107
Cluster service account 19, 23, 26, 41, 50, 77
Cluster.exe 29, 72, 106, 107
CLUSTERLOG 93
connectivity 10, 11, 12, 14, 16, 19, 20, 27, 40, 41, 43, 76, 81, 90, 91, 92, 101, 102, 106
D
delete
  group 28, 108
  resource 31
DHCP 20, 48, 80, 90
domain 11, 16, 17, 19, 23, 27, 41, 75, 81, 82, 91, 92, 106
domain controller 11, 15, 19, 23, 75, 81, 91, 92
drive letter assignment 16
drives 15, 16, 19, 35, 36, 38, 55
E
error
  access denied 50
  incorrect function 25
Event ID
  9 88
  101 89
  1004 64, 90
  1005 64, 90
  2511 91
  4199 91
  5719 91
  7000 23, 91
  7013 23, 92
  7023 23, 92
evict 21, 23, 31, 61, 76, 104, 107
F
failback 33
failover 24, 27, 33
failover problems 24
file
  correct share 72
file share 25, 28, 29, 30, 41, 47, 50, 51, 60, 72, 75, 86, 110
H
hardware compatibility list 9
HCL 9, 15, 18, 37
host adapters 10, 13, 14, 35
I
installation problems 18
J
Jetpack 49
L
Lock pages in memory 17, 23
log file 18, 22, 23, 24, 60, 63, 64, 66, 75, 87, 88, 93, 99, 105
Logon as a service 17, 23
M
move
  group 24, 27, 28, 31, 33, 109
  resource 32, 58
MSCS 8, 63, 109
N
network
  private 11, 12, 19, 60, 102
  public 11, 12, 19, 21, 60
network adapter 9, 11, 13, 16, 24, 35, 41, 42, 60, 64, 69, 70, 71, 72, 80, 81, 90
network name 12, 26, 30, 40, 60, 71, 83, 86, 105, 111, 112, 113
node is down 22
NTFS 15, 16, 48, 52, 55, 56
P
paging file 16
partition 15, 56, 66, 67, 68, 74, 83
physical disk 25, 29, 30, 41, 105
ping 12, 40
Q
quorum disk 18, 25, 59, 65, 66, 67, 74, 77, 82, 92, 95, 96, 98, 99, 100, 105
R
Registry Replication 30, 31
RPC 20, 24, 27, 81, 91, 94, 95
S
SCSI cables 13, 88
SCSI termination 13, 14, 16, 25, 35, 36, 58, 68, 75, 88
service pack 16, 22, 58, 73, 91, 93
shared SCSI 9, 10, 13, 14, 15, 16, 25, 29, 35, 36, 38, 68
signature 39, 67, 97
V
validated 9, 10, 13, 18
W
Windows Internet Naming Service (WINS) 12, 26, 40, 41, 48, 49, 61, 72, 77