Server Health Checks – 2
Check Network Connections
Here are some other checks you should perform to ensure proper network connectivity:
1. ipconfig /all displays all your TCP/IP settings, including your MAC address.
2. ipconfig /flushdns flushes your DNS resolver cache.
3. ipconfig /displaydns displays the contents of your DNS resolver cache.
4. netstat -an shows all connections and listening ports on a machine, with addresses and ports in numeric form.
5. nbtstat shows NetBIOS over TCP/IP connection statistics.
6. tracert <IP or DNS name> shows the path the packet takes, the routers along the way, and the response time for each hop.
7. pathping <IP or DNS name> combines ping and tracert. By default it pings each hop 100 times, which makes it great for testing WAN connectivity.
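When a machine has hundreds of connections, the raw netstat -an listing is hard to read at a glance. The following is a minimal Python sketch that tallies TCP connections by state. It assumes the Windows output layout (rows starting with TCP, followed by local address, foreign address, and state); the function names are made up for this example.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections by state from `netstat -an` text (Windows layout assumed)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row: TCP  <local address>  <foreign address>  <STATE>
        # UDP rows have no state column and are skipped.
        if len(fields) >= 4 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


def run_netstat():
    """Run `netstat -an` on the local machine and return its text output."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling count_tcp_states(run_netstat()).most_common() on the server lists the states from most to least frequent. An unusually large number of connections in a single state is worth a closer look.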
Disk Space
All kinds of bad stuff can happen when your disk is filling up. The best way to stay ahead of it is to write a script that notifies you when free space drops below a certain threshold. In a future post I'll share a method for doing just that. However, if there is a problem and you need to perform a health check now, here is how to check the space the old-fashioned way.
To check disk space manually:
1. Right Click on My Computer
2. Select Manage
3. Select Disk Management
4. Verify that each disk has more than 10 percent free space
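The same 10 percent check can be run from a script. This is only a rough sketch using Python's standard shutil.disk_usage, not the notification method promised for a later post; the function names and the default threshold constant are assumptions made for illustration.

```python
import shutil

THRESHOLD_PERCENT = 10.0  # mirrors the 10 percent rule in the manual steps


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def is_low_on_space(path, threshold=THRESHOLD_PERCENT):
    """True when the volume holding `path` has `threshold` percent free or less."""
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) <= threshold
```

On a Windows server you would call is_low_on_space("C:\\") once per drive letter, and adjust the threshold to whatever margin suits your environment.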
Event Logs
Event logs give you a more historical perspective on what is going on with the system and its applications. When troubleshooting, query the System or Application log and look for events with a timestamp near the time of the issue you are investigating.
Events fall into three categories in Event Viewer:
· Informational: Noted with a white icon and letter ‘i’. Successful operations are logged as informational. These are usually not used when troubleshooting problems or failures.
· Warning: Noted with a yellow icon and exclamation point. These are worth looking up because they often predict future failures, such as disk space running low, DHCP IP address lease renewal failures, etc.
· Error: Noted with a red circle icon and ‘x’. These indicate that something has failed outright and are a good starting point for troubleshooting.
When looking at event logs, use the information to determine the following:
· Is the incident tied to a particular time or outage incident?
· Is this a one-off, or has this particular error occurred multiple times in the past?
· Does this error appear on other systems or is it unique to the system that has failed?
Also make sure you take a look at EventCombMT from Microsoft. This tool lets you search the event logs of multiple machines at once, so you can see whether a specific error or warning is also occurring on other systems. That helps you rule out causes that are specific to one server.
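The three questions above (is it tied to the incident time, is it a repeat, does it appear elsewhere) can be answered mechanically once the events are exported. The sketch below assumes the events have already been loaded into simple records; the Event class, its field names, and the 15 minute window are hypothetical choices for this example and not the format of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    host: str
    level: str  # "Information", "Warning", or "Error"
    source: str
    event_id: int
    time: datetime


def near_incident(events, incident_time, window=timedelta(minutes=15)):
    """Keep warnings and errors logged within `window` of the incident."""
    return [
        e for e in events
        if e.level in ("Warning", "Error")
        and abs(e.time - incident_time) <= window
    ]


def summarize(events):
    """Map (source, event_id) to (occurrence count, sorted list of hosts)."""
    counts = defaultdict(int)
    hosts = defaultdict(set)
    for e in events:
        key = (e.source, e.event_id)
        counts[key] += 1
        hosts[key].add(e.host)
    return {key: (counts[key], sorted(hosts[key])) for key in counts}
```

Running summarize over the full log history shows whether an error is a one-off or a repeat, and the host list shows whether it is unique to the failed system.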
Services
Troubleshooting services should be limited to the specific services affected by the problem you are investigating. Each server runs a different set of services depending on the applications it hosts. You should document how the services on your servers are meant to be configured, then compare that baseline to the server in question to see whether anything is misconfigured.
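Comparing a documented baseline against a live server is easy to automate once both are in the same shape. Here is a minimal sketch, assuming each configuration is a dictionary that maps a service name to a (startup type, state) pair. That data layout and the function name service_drift are assumptions for this example; how you collect the live values is up to you.

```python
def service_drift(baseline, actual):
    """Return {service: (expected, found)} for every service that differs.

    A value of None means the service is missing on that side.
    """
    drift = {}
    for name in sorted(set(baseline) | set(actual)):
        expected = baseline.get(name)
        found = actual.get(name)
        if expected != found:
            drift[name] = (expected, found)
    return drift


# Example data: the documented configuration versus what the server reports.
documented = {
    "W3SVC": ("Automatic", "Running"),
    "Spooler": ("Disabled", "Stopped"),
}
observed = {
    "W3SVC": ("Manual", "Stopped"),
    "Spooler": ("Disabled", "Stopped"),
}

for name, (expected, found) in service_drift(documented, observed).items():
    print(f"{name}: expected {expected}, found {found}")
```

Only the services that differ are reported, which keeps attention on the ones that could be related to the problem.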
Cluster
Servers that host applications and services requiring high availability should be clustered so that if one node fails, the other can pick up the workload. Clustered servers need the same health checks as stand-alone systems, and you will also want to check the health of the cluster itself.
Check Cluster Resource Status
1. Open Cluster Administrator: Log on to the server, select Start –> Run –> cluadmin
2. Check the Resources and ensure all are Online
3. If Cluster Administrator does not open, ensure that the Cluster Service is running on the node.
4. Cluster resource status can also be checked from a remote server. From a command prompt, type cluster /cluster:<cluster name> res
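If you capture the text that the cluster res command prints, a short script can flag anything that is not Online. The sketch below assumes the output is a fixed-width table with a header row, a ruler of dashes beneath it, and a Status column. That layout is an assumption, so check it against the output on your own systems; both function names are made up for this example.

```python
import re


def parse_status_table(text):
    """Parse a fixed-width table whose columns are marked by a ruler of dashes."""
    lines = text.splitlines()
    ruler_index = None
    for i, line in enumerate(lines):
        # The ruler is a line made only of dash runs separated by spaces.
        if i > 0 and re.fullmatch(r"\s*-+(\s+-+)*\s*", line):
            ruler_index = i
            break
    if ruler_index is None:
        return []
    starts = [m.start() for m in re.finditer(r"-+", lines[ruler_index])]
    # Each column runs to the start of the next one; the last runs to end of line.
    bounds = list(zip(starts, starts[1:] + [None]))
    header = lines[ruler_index - 1]
    names = [header[a:b].strip() for a, b in bounds]
    rows = []
    for line in lines[ruler_index + 1:]:
        if line.strip():
            rows.append({n: line[a:b].strip() for n, (a, b) in zip(names, bounds)})
    return rows


def not_online(rows, status_column="Status"):
    """Return the rows whose status is anything other than Online."""
    return [r for r in rows if r.get(status_column, "").lower() != "online"]
```

An empty list from not_online means every resource reported Online, which matches step 2 above.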
Client Side Health
1. Right click on My Computer, select Manage
2. Open Device Manager
3. Drill down to SCSI and RAID Controllers and verify that the HBA hardware is visible and does not show any errors
4. If the HBA does not show up in Device Manager, you may need to re-scan for the hardware, re-seat the fiber card, or re-install the driver.
5. If the HBA is showing healthy in Device Manager, open the tool that you use to view configuration and settings for the fiber card and verify there aren’t any transmit/receive errors in the link statistics or counters
Switch Health
1. Make sure the fiber is properly connected to each switch
2. Make sure the switch reports no errors
3. If you’re using zoning, verify that it is properly configured
Check Fiber and SAN Connectivity
1. Log on to the SAN appliance and verify that the SAN is in generally good health, with no major errors reported for the controllers, loops, switches, or ports.
2. Ensure that the LUNs are presented to the servers in the cluster
NLBS
Some applications will require you to spread the load across multiple servers. Web servers are a very popular candidate for network load balancing. As with clusters, we need to check the status of the load balancing.
Check NLBS Status from the Command Line
1. From a command prompt on the local system, run ‘wlbs query’. This will give you the convergence status of the local node with the NLBS cluster.
2. Other useful NLBS commands: wlbs stop (stops NLBS), wlbs start (starts NLBS), wlbs drainstop (drains the node)
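The convergence check in step 1 can also be scripted. This is a rough sketch that assumes the wlbs query output contains the word "converged" once the node has finished converging, and a different form such as "converging" while it has not. Treat that wording as an assumption to verify on your own hosts. The function names are made up for this example.

```python
import re
import subprocess


def is_converged(query_output):
    """True if the text reports a converged node (output wording is assumed)."""
    return re.search(r"\bconverged\b", query_output, re.IGNORECASE) is not None


def query_local_node():
    """Run `wlbs query` locally; only works on a Windows host with NLB installed."""
    result = subprocess.run(["wlbs", "query"], capture_output=True, text=True)
    return result.stdout
```

On a load-balanced node, is_converged(query_local_node()) returning False is the cue to look at the full command output by hand.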