Watchdog

From Alteeve Wiki


Watchdog timers are devices built into many servers that constantly count down to zero. If the timer reaches zero, the watchdog device hard-resets the server. This way, if the machine's operating system or software fails, the server will hopefully restart in a healthy state.

To avoid a reboot, software running on the server needs to periodically reset the watchdog's timer. In most cases, this software first runs one or more tests to ensure that the operating system and critical software are operating properly. If all tests pass, the timer is reset. If any test fails, the timer is not reset. A single failed run may not cause a reboot, though. Generally, the timer is set to a long enough time span that the tests will have several chances to run.
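The test-then-reset cycle described above can be sketched as follows. This is a minimal illustration, not how any particular watchdog daemon is implemented. The function name pet_if_healthy and the checks it receives are hypothetical. It assumes a device that treats any write as a timer reset, which is how the Linux /dev/watchdog character device behaves; an in-memory buffer can stand in for the device so the sketch is safe to run anywhere.

```python
import io
from typing import BinaryIO, Callable, Iterable


def pet_if_healthy(checks: Iterable[Callable[[], bool]], device: BinaryIO) -> bool:
    """Run every health check; reset the watchdog timer only if all pass.

    `checks` are hypothetical callables that return True when healthy.
    `device` is any writable binary stream. On Linux this would typically be
    an open handle on /dev/watchdog (an assumption, not taken from the article).
    """
    for check in checks:
        try:
            if not check():
                return False  # a test failed: leave the timer counting down
        except Exception:
            return False      # a crashing test is treated as a failed test
    device.write(b"\0")       # assumed: any write resets the countdown
    device.flush()
    return True


# Stand-in device, so nothing here can actually reboot a machine.
fake_device = io.BytesIO()
pet_if_healthy([lambda: True], fake_device)
```

In real use, this function would be called in a loop at an interval several times shorter than the watchdog's timeout, so that a single failed run does not immediately lead to a reset.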

Multiple failures can therefore be tolerated, so long as whatever caused the tests to fail clears before the timer expires. This means that the administrator needs to balance the timeout: short enough that real failures trigger the reboot in an acceptably short period of time, but not so short that transient issues cause an unnecessary reboot. The right value will be up to each administrator, but timeouts of around 60 to 300 seconds are usual.
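The trade-off between timeout length and tolerance for transient failures can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: tests run at a fixed interval, take no time themselves, and a timer that reaches exactly zero counts as expired. The function name survives and its parameters are hypothetical.

```python
from typing import Iterable


def survives(check_results: Iterable[bool], timeout: int = 60, interval: int = 10) -> bool:
    """Return True if the simulated server is never reset by the watchdog.

    check_results: one boolean per test run (True means all tests passed).
    timeout:       seconds the watchdog counts down from after each reset.
    interval:      seconds between test runs.
    """
    remaining = timeout
    for passed in check_results:
        remaining -= interval    # time that elapsed since the previous run
        if remaining <= 0:
            return False         # timer reached zero: hard reset
        if passed:
            remaining = timeout  # healthy: the timer is reset
    return True


# A 60 second timeout rides out four failed runs in a row...
long_timeout_ok = survives([True, False, False, False, False, True], timeout=60)
# ...while a 20 second timeout resets the server after a single failed run.
short_timeout_ok = survives([True, False, True], timeout=20)
```

With the values above, long_timeout_ok is True and short_timeout_ok is False, which shows why a timeout that is only slightly longer than the test interval leads to unnecessary reboots.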

Software-based watchdog timers are available and are effective in protecting against software faults, but not operating system faults. The reason is that software watchdogs rely on the operating system being able to trigger the reboot. If anything blocks the operating system from functioning, the system will never reboot. For this reason, a hardware-based watchdog is always preferred.


© Alteeve's Niche! Inc. 1997-2024   Anvil! "Intelligent Availability®" Platform
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.