FLAPPINGSTOP and FLAPPINGSTART alerts in Nagios
FLAPPINGSTART and FLAPPINGSTOP alerts appear regularly in Nagios, and customers often ask me what exact problem lies behind them. When you look at such an alert, two points matter:
- What do FLAPPINGSTART and FLAPPINGSTOP mean?
- What is the status of the service or host?
The Nagios Core documentation gives a clear explanation of the basics of flapping events:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/flapping.html
Here is the text from that page that is most relevant to what flapping means.
Flapping occurs when a service or host changes state too frequently, resulting in a storm of problem and recovery notifications. Flapping can be indicative of configuration problems (i.e. thresholds set too low), troublesome services, or real network problems.
Whenever Nagios checks the status of a host or service, it will check to see if it has started or stopped flapping. It does this by:
- Storing the results of the last 21 checks of the host or service
- Analyzing the historical check results and determining where state changes/transitions occur
- Using the state transitions to determine a percent state change value (a measure of change) for the host or service
- Comparing the percent state change value against low and high flapping thresholds
A host or service is determined to have started flapping when its percent state change first exceeds a high flapping threshold.
A host or service is determined to have stopped flapping when its percent state change goes below a low flapping threshold (assuming that it was previously flapping).
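To make the steps above concrete, here is a minimal Python sketch of the detection logic. It is not Nagios code: the names `FlapDetector`, `percent_state_change` and `add_check` are made up for illustration, the 5% and 20% thresholds are assumed example values (the real ones come from your Nagios configuration), and the calculation is simplified to an unweighted count, whereas the Nagios documentation describes giving newer state changes more weight than older ones.

```python
from collections import deque

HISTORY_SIZE = 21  # number of check results kept, as described above


def percent_state_change(states):
    """Unweighted percent state change over the most recent checks.

    Simplification: every transition counts equally. 21 stored results
    allow at most 20 transitions, so that is the fixed denominator.
    """
    recent = list(states)[-HISTORY_SIZE:]
    transitions = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return 100.0 * transitions / (HISTORY_SIZE - 1)


class FlapDetector:
    """Hypothetical helper that tracks one host or service."""

    def __init__(self, low_threshold=5.0, high_threshold=20.0):
        # Assumed example thresholds, in percent.
        self.low = low_threshold
        self.high = high_threshold
        self.history = deque(maxlen=HISTORY_SIZE)
        self.flapping = False

    def add_check(self, state):
        """Record a check result and return an event name or None."""
        self.history.append(state)
        change = percent_state_change(self.history)
        if not self.flapping and change > self.high:
            self.flapping = True
            return "FLAPPINGSTART"
        if self.flapping and change < self.low:
            self.flapping = False
            return "FLAPPINGSTOP"
        return None


if __name__ == "__main__":
    detector = FlapDetector()
    # A service bouncing between OK and CRITICAL, then settling on OK.
    for state in ["OK", "CRITICAL"] * 3 + ["OK"] * 21:
        event = detector.add_check(state)
        if event:
            print(event, "while state is", state)
```

The gap between the two thresholds is what keeps the alerts from toggling on every check: in this sketch the service starts flapping once more than 20% of the possible transitions have occurred, and it only stops once the stored history is almost entirely free of state changes.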
