Cluster Installation Time Out Issues
If you specify multiple nodes in the Create Cluster Wizard when creating a new cluster, the creation process fails with a timeout. Creating a single-node cluster on any of the servers that will participate in the cluster succeeds, but if you then try to add another node using the Add Node Wizard, that process also times out.
If the Create Cluster Wizard is used to create the cluster, the following output is displayed:
Configuring node ‘name’
---------------------------------------
12% Validating cluster state on node ‘name’.
25% Getting current node membership of cluster ‘name’.
37% Adding node ‘name’ to Cluster configuration data.
50% Validating installation of the Microsoft Failover Cluster Virtual Adapter on node ‘name’.
62% Validating installation of the Cluster Disk Driver on node ‘name’.
75% Configuring Cluster Service on node ‘name’.
87% Starting Cluster Service on node ‘name’
100% Waiting for notification that node ‘name’ is a fully functional member of the cluster. This phase has failed for Cluster object 'name' with an error status of 1460 (0x000005B4).
Cleaning up ‘name’.
If you use the Cluster.exe /ADD [NODE] command (from an elevated command line interface (CLI) prompt) to add a node, you will see the following error:
"System error 1460 has occurred (0x000005b4) This operation returned because the timeout period expired"
You will also see this error if you use the Add-ClusterNode cmdlet from an elevated Windows PowerShell Modules command line interface (CLI) prompt to add a node.
NOTE: We recommend using Windows PowerShell cmdlets, based on the following statement from http://technet.microsoft.com/en-us/library/dd443539(WS.10).aspx:
If you have scripts based on Cluster.exe, you can continue to use them in Windows Server 2008 R2, but we recommend that you rewrite them with Windows PowerShell cmdlets. In future releases, Windows PowerShell will be the only command-line interface available for failover clusters.
To learn more about how Cluster.exe commands map to Windows PowerShell cmdlets, please visit the following link:
Based on our current case data, we’ve identified two common causes for this behavior.
The first cause is a duplicate account name in Active Directory (AD) that matches a node name. The node name is the name of a cluster server. During the join, the lookup finds the non-computer account instead of the node's computer account and tries to connect to it. Because that account is not the actual node, the connection times out.
In most cases the duplicate account is created by an AD-integrated application. The most effective way to find the duplicate name, if it exists, is to use LDIFDE.exe from a command prompt run as Administrator. Here is the command to run:
ldifde -f output.ldf -r "(samAccountName=W2K8-R2*)"
In the example above:
-f = the name of the file to write the results to
-r = the LDAP search filter to apply
W2K8-R2* = match every account whose name starts with W2K8-R2, which is the node name in this example
This creates the file output.ldf in the current directory, which you can open in Notepad. If an entry in the file is a computer account, you will see the following information:
objectclass: computer
servicePrincipalName: StevenAndress$
If it is a user or service account, it will not have the entries above and will instead have:
userPrincipalName: StevenAndress$
The file also shows the OU in which the object currently resides. For the node to join, you must rename the user or service account to something else. To do this, go to the listed OU, find the object, and rename it.
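The review of output.ldf described above can also be scripted. The following is a minimal Python sketch, not part of the original procedure: the function names parse_ldif and non_computer_accounts, the example.com domain, and the sample entries are hypothetical, and the parser only handles plain and folded LDIF lines (base64 values are left undecoded).

```python
from collections import defaultdict

# Hypothetical sample of what an LDIFDE export might contain: a real node's
# computer account and a user/service account with a clashing name.
SAMPLE = """\
dn: CN=W2K8-R2-N1,CN=Computers,DC=example,DC=com
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
objectClass: computer
sAMAccountName: W2K8-R2-N1$

dn: CN=W2K8-R2-N1,OU=Service Accounts,DC=example,DC=com
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
sAMAccountName: W2K8-R2-N1
"""


def parse_ldif(text):
    """Split LDIF text into entries mapping lower-cased attribute -> list of values."""
    entries = []
    current = defaultdict(list)
    last_key = None
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(dict(current))
            current, last_key = defaultdict(list), None
            continue
        if raw.startswith("#"):
            continue  # LDIF comment line
        if raw.startswith(" ") and last_key:
            # Folded line: append to the previous value, minus the leading space.
            current[last_key][-1] += raw[1:]
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        last_key = key.strip().lower()
        current[last_key].append(value.lstrip(":").lstrip(" "))
    if current:
        entries.append(dict(current))
    return entries


def non_computer_accounts(entries):
    """Return (sAMAccountName, dn) for each entry that is not a computer object."""
    hits = []
    for entry in entries:
        classes = {c.lower() for c in entry.get("objectclass", [])}
        # Computer accounts also carry objectClass "user", so test for "computer".
        if "computer" not in classes:
            name = entry.get("samaccountname", ["?"])[0]
            hits.append((name, entry.get("dn", ["?"])[0]))
    return hits


if __name__ == "__main__":
    for name, dn in non_computer_accounts(parse_ldif(SAMPLE)):
        print(f"Possible duplicate (not a computer account): {name} at {dn}")
```

To run it against a real export, read output.ldf into a string and pass it to parse_ldif. The file's encoding depends on how LDIFDE was invoked, so the encoding you pass to open() is something to verify rather than assume.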
The second common cause of this issue is Anti-Virus/Firewall (Security) applications. These applications appear to be closing the required network endpoints. You can determine if this is the likely cause by generating a cluster log file to review. You do this from an administrator command prompt using the following syntax.
CLUSTER [[/CLUSTER:]cluster-name] LOG
Running cluster log /gen generates a cluster log and places it in %systemroot%\Cluster\Reports.
Once you have generated the log, open it, go to the bottom, and search upward for the word "graceful". You may see entries similar to the following:
00000e40.00000c24::2011/04/27-16:01:04.513 INFO [CHANNEL 1.1.1.1:~52099~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
00000e40.00000c24::2011/04/27-16:01:04.513 WARN mscs::ListenerWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 1.1.1.1:~52099~ is closed
Note: IP Address changed to protect the innocent
These entries indicate that an endpoint has been closed at the application layer, and as a result, cluster communications fail. The only way to conclusively determine that an Anti-Virus/Firewall (Security) application is the culprit is to fully uninstall it. Disabling the service(s) will not suffice, because kernel-level drivers may still load into memory even with the service(s) disabled. If removing the Anti-Virus/Firewall (Security) application resolves the failure, you should contact the application vendor.
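On a large log, the search for "graceful" entries described above can be done with a short script instead of by hand. The following is a minimal Python sketch under stated assumptions: the function name graceful_closes and the regular expression are hypothetical and were written only against the two sample lines shown earlier, so other log formats may need a different pattern.

```python
import re
from collections import Counter

# The two sample entries from this article, used as test input.
SAMPLE_LOG = """\
00000e40.00000c24::2011/04/27-16:01:04.513 INFO [CHANNEL 1.1.1.1:~52099~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
00000e40.00000c24::2011/04/27-16:01:04.513 WARN mscs::ListenerWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 1.1.1.1:~52099~ is closed
"""

# Assumed pattern: an address followed by ":~port~", introduced either by
# "[CHANNEL " or by "remote endpoint ".
ENDPOINT = re.compile(r"(?:\[CHANNEL |remote endpoint )(?P<ip>\S+?):~(?P<port>\d+)~")


def graceful_closes(lines):
    """Count graceful-close log entries per remote address."""
    counts = Counter()
    for line in lines:
        # Case-insensitive so both "graceful close" and "GracefulClose" match.
        if "graceful" not in line.lower():
            continue
        match = ENDPOINT.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    for ip, count in graceful_closes(SAMPLE_LOG.splitlines()).most_common():
        print(f"{ip}: {count} graceful close entries")
```

A high count against another node's address during the join attempt is only a hint that something is closing the channel; as noted above, it does not by itself prove which application is responsible.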