A good health check mechanism in your load balancer is critical. If the load balancer cannot recognize when a server is unable to process client requests, it will keep sending traffic to that server, and customers will have a poor experience.
For example, imagine that you just ping your web servers to check their availability, but one web server can no longer reach its database for some reason (for example, a wrong routing table or a misconfigured firewall). Ping will still work, and even the web server's TCP ports will still respond, but any client connection forwarded to that server will result in an error.
Azure Load Balancer supports TCP probes and HTTP probes (see the Azure Load Balancer documentation on health probes for more information). You can use HTTP probes not only to monitor web server availability, but also to add extra functionality to your health checking scheme.
The idea is to have an HTTP probe poll a web page that performs some checks before returning a 200 status code. For example, the web page might check that certain daemons are running correctly on the system, or that certain hosts are reachable over the network.
For example, I use HTTP probes in my Advanced Networking Lab. There I verify that the virtual firewalls have Internet reachability (by sending a single ICMP echo request to “bing.com”) before including them in the firewall load balancing farm. I decided to generalize the web page I use there and publish it in this GitHub repository.
As you can see, it is a very simple PHP page (obviously, you need a web server such as Apache, together with PHP, installed on your server) that optionally performs some checks before deciding whether to return a 200 or something else. I went for a 409, but you could choose any other HTTP status code, even one in the 201-299 range, since the probe only treats a 200 as healthy. For completeness, here are some sections of the code.
The first one is a brief section where you can quickly configure the tests that you want to perform:
<?php
    $reachabilityTest = True;
    $hosts = array ("bing.com", "google.com");
    $daemonTest = True;
    $daemons = array ("httpd", "sshd");
    $localTCPTest = True;
    $ports = array ("22", "80");
?>
After that, different sections follow, each corresponding to one of the tests. Here is the one for host reachability, which also prints some information along the way for troubleshooting:
<?php
    // Reachability test
    if ($reachabilityTest === True) {
        print ("<h2>Reachability Test</h2>\n");
        $allReachable = true;
        print (" <ul>\n");
        foreach ($hosts as $host) {
            // Send a single ICMP echo request with a 1-second timeout
            // and keep only the summary line of the ping output
            $result = exec ("ping -c 1 -W 1 " . $host . " 2>&1 | grep received");
            print (" <li>" . $host . ": " . $result . "</li>\n");
            $pos = strpos ($result, "1 received");
            if ($pos === false) {
                $allReachable = false;
                break;
            }
        }
        print (" </ul>\n");
        if ($allReachable === false) {
            // Ping did not work
            print (" <p>At least one target host does not seem to be reachable (" . $host . ")</p>\n");
        } else {
            // Ping did work
            print (" <p>All target hosts seem to be reachable</p>\n");
        }
    }
?>
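The sections for the daemon test and the local TCP port test are not reproduced in this post. As a rough idea of what they could look like, here is a minimal sketch that produces the `$allRunning` and `$allOpen` flags used in the final evaluation. It is not the code from the repository: the helper names `daemonIsRunning` and `localPortIsOpen` are made up for illustration, and the daemon check assumes a Linux host where the `pgrep` utility is available.

```php
<?php
// Hypothetical helpers for illustration, not taken from the repository.

// Returns true if at least one process with exactly this name is running.
// Assumption: Linux host with the pgrep utility installed.
function daemonIsRunning(string $daemon): bool
{
    $output = array();
    $exitCode = 1;
    exec("pgrep -x " . escapeshellarg($daemon) . " 2>/dev/null", $output, $exitCode);
    // pgrep exits with 0 only when it found a matching process
    return $exitCode === 0;
}

// Returns true if a TCP connection to the given local port can be opened
// within the timeout (in seconds).
function localPortIsOpen(int $port, float $timeout = 1.0): bool
{
    $errno = 0;
    $errstr = "";
    // The @ suppresses the PHP warning that a refused connection would raise
    $socket = @fsockopen("127.0.0.1", $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Same configuration variables as in the configuration section
$daemons = array ("httpd", "sshd");
$ports = array ("22", "80");

// Daemon test: stop at the first daemon that is not running
$allRunning = true;
foreach ($daemons as $daemon) {
    if (!daemonIsRunning($daemon)) {
        $allRunning = false;
        break;
    }
}

// Local TCP test: stop at the first port that does not accept connections
$allOpen = true;
foreach ($ports as $port) {
    if (!localPortIsOpen((int) $port)) {
        $allOpen = false;
        break;
    }
}
```

Both helpers use short timeouts on purpose: the page has to answer before the load balancer probe gives up, so each check should fail fast rather than hang.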
And finally, the test results are evaluated to either return a 200, or something else (in this case a 409, “Conflict”):
<?php
    // Return code evaluation
    if ( (!($reachabilityTest) || $allReachable)
         && (!($daemonTest) || $allRunning)
         && (!($localTCPTest) || $allOpen) ) {
        http_response_code (200);
    } else {
        http_response_code (409);
    }
?>
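The condition above reads as "every test is either disabled or passed". If you add more tests, the same rule can be written as a small function, which is easier to extend than a growing boolean expression. This is only a sketch of that idea: the function name `probeStatusCode` and the array layout are hypothetical and do not come from the repository.

```php
<?php
// Hypothetical helper for illustration: returns 200 if every enabled test
// passed, and 409 otherwise. Disabled tests are ignored.
function probeStatusCode(array $tests): int
{
    foreach ($tests as $test) {
        if ($test["enabled"] && !$test["passed"]) {
            return 409;
        }
    }
    return 200;
}

// Example with the three tests of the page; the daemon test is disabled here,
// so its result does not influence the outcome.
$status = probeStatusCode(array(
    array("enabled" => true,  "passed" => true),   // reachability
    array("enabled" => false, "passed" => false),  // daemons (disabled)
    array("enabled" => true,  "passed" => true),   // local TCP ports
));

// In the probe page, this value would then be handed to http_response_code()
```

Keep in mind that `http_response_code()` can only set the status while the response headers have not been sent yet. Since the page prints troubleshooting output before the evaluation, it presumably relies on PHP output buffering being enabled, which is worth checking if the probe always sees a 200.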
With a page like this you might, for example, verify that your VM has the reachability it needs by pinging some other host in your network, and that your required daemons are running correctly. Or, in the case of a web server, you might run a simple SQL query to verify connectivity to the database.
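The database check is not part of the page shown here, but a minimal sketch could use PDO, which ships with PHP. The function name `databaseIsReachable` is made up for illustration, and the DSN, user and password in the usage comment are placeholders that you would replace with your own values.

```php
<?php
// Hypothetical helper for illustration, not taken from the repository.
// Returns true if a connection can be opened and a trivial query is answered.
function databaseIsReachable(string $dsn, ?string $user = null, ?string $password = null): bool
{
    try {
        $db = new PDO($dsn, $user, $password, array(
            // Throw exceptions instead of returning false on errors
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
            // Timeout in seconds; how it is applied depends on the PDO driver
            PDO::ATTR_TIMEOUT => 2,
        ));
        $statement = $db->query("SELECT 1");
        return $statement !== false && $statement->fetchColumn() !== false;
    } catch (Throwable $e) {
        // Missing driver, refused connection, bad credentials, failed query...
        return false;
    }
}

// Possible usage in the probe page (placeholder values):
// $databaseOk = databaseIsReachable("mysql:host=db.example.internal;dbname=app", "probe", "secret");
```

The resulting flag can then be added to the return code evaluation in the same way as the other tests.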
If you are using Windows machines or prefer another scripting language, you can still have a look at this code to see the overall architecture.
Happy Load Balancing!