Alerting
Understanding Alerts vs. Incidents
Knowing the difference between alerts and incidents helps you manage and respond to critical or potentially dangerous events effectively.
Alert
An alert is a notification generated when a predefined condition or threshold is met. Alerts are intended to draw attention to potential issues or unusual patterns that could require further attention, but do not necessarily indicate that a critical issue is impacting your system's performance.
- Purpose: Informing users of conditions that may need monitoring or prompt preventive action.
- Examples:
- CPU usage exceeds 80% for a specific period.
- A single failed login attempt.
- A minor network latency spike.
- Response: Alerts can be configured to trigger automatically. They may require investigation, but they usually don’t require immediate action unless they escalate or recur frequently.
Incident
An incident is a more severe event that represents an actual problem impacting the functionality, performance, or security of your system. Incidents typically result from a critical alert that has been validated as a true problem in need of a resolution.
- Purpose: Signaling an issue that requires immediate investigation, diagnosis, and resolution.
- Examples:
- System downtime or a service outage.
- Unauthorized access to sensitive data.
- A complete failure of a critical application component.
- Response: Incidents require an escalation process and often a coordinated effort from a response team. They may also initiate incident reports, root-cause analysis, and preventive measures (troubleshooting).
Alert vs Incident
Aspect | Alert | Incident |
---|---|---|
Severity | Potential/minor issue | Confirmed/significant issue |
Response | Monitoring, may require investigation | Immediate action and escalation |
Examples | High CPU usage, minor latency spikes | Service outage, unauthorized data access |
Alert sources
The built-in HTTP and ICMP monitors are one possible alert source. You can also link your own Alertmanager instance to Incidite for automatic incident creation and service updates. To do so, follow these steps:
- Navigate to Settings.
- Click on Alerting in the sidebar.
- Click on Create Source.
Alert source settings
- Alert Processor: Select which Alert Processor should handle the correlation of incoming alerts.
- Type: Specify the type of alert source you want to ingest (currently only Alertmanager is supported).
- Name: Provide an alias for your alert source for reference.
After creating the alert source, you will receive a webhook endpoint and a webhook secret. Add these to your Alertmanager instance to forward alerts to Incidite.
Alert processors
Alert processors correlate incoming alerts based on their attributes, linking impacted services by their correlation group and mapping them to ongoing incidents.
We currently offer two types of processors, with a third in development:
- Statistical: Uses statistical methods to correlate alerts based on overlapping key-value pairs and predefined weights. You can also define weights for specific attributes.
- ML (Beta): Uses machine learning to determine if alerts correlate with each other.
- Fusion: Work in progress.
Statistical alert processing
The statistical alert processor compares two or more alerts and calculates how similar they are. The similarity is expressed as a score between 0 and 1, where 1 means the alerts are identical and 0 means they have nothing in common.
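To make the score more concrete, here is a minimal sketch of one way a weighted similarity over key-value pairs could be computed. It is an illustration only, not Incidite's actual algorithm: the function name `weighted_similarity`, the default weight of 1.0, and the sample labels are all assumptions.

```python
from typing import Dict, Optional


def weighted_similarity(
    a: Dict[str, str],
    b: Dict[str, str],
    weights: Optional[Dict[str, float]] = None,
    default_weight: float = 1.0,
) -> float:
    """Return a score in [0, 1]: the weighted share of keys whose values match.

    Hypothetical scoring rule for illustration; the real processor may differ.
    """
    weights = weights or {}
    keys = set(a) | set(b)
    if not keys:
        # Nothing to compare, so treat two empty alerts as unrelated.
        return 0.0
    total = sum(weights.get(k, default_weight) for k in keys)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(k, default_weight)
        for k in keys
        if k in a and k in b and a[k] == b[k]
    )
    return matched / total


# Two sample alerts that differ only in the instance label.
alert_a = {"alertname": "HighCPU", "service": "checkout",
           "instance": "node-1", "severity": "warning"}
alert_b = {"alertname": "HighCPU", "service": "checkout",
           "instance": "node-2", "severity": "warning"}

print(weighted_similarity(alert_a, alert_b))  # 3 of 4 keys match: 0.75
# Raising the weight of "service" and lowering "instance" increases the score.
print(weighted_similarity(alert_a, alert_b, {"service": 3.0, "instance": 0.5}))
```

In this sketch, giving a key a higher weight makes agreement or disagreement on that key count for more, which is the general idea behind defining weights for specific attributes.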
Alerts are correlated when their similarity reaches the configured Similarity Threshold. The default is 0.7, but you can change it depending on how strict the correlation should be: a higher value requires alerts to be more alike before they are grouped. If you are unsure which Similarity Threshold suits you best, configure your threshold, keys, and weights, then click Simulate Alert Processor to see how many incidents would be created from your previous alerts with the current settings.
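The effect of the threshold can be explored offline with a rough sketch like the one below. It assumes a simple greedy grouping (each alert joins the first group whose first alert it matches) and an unweighted stand-in score; `label_overlap`, `count_incidents`, and the sample history are hypothetical and do not reflect how Simulate Alert Processor is implemented.

```python
from typing import Callable, Dict, List, Sequence

Alert = Dict[str, str]


def label_overlap(a: Alert, b: Alert) -> float:
    """Unweighted stand-in score: share of keys with identical values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


def count_incidents(
    alerts: Sequence[Alert],
    threshold: float = 0.7,
    score: Callable[[Alert, Alert], float] = label_overlap,
) -> int:
    """Greedily group alerts and return the number of resulting groups."""
    groups: List[List[Alert]] = []
    for alert in alerts:
        for group in groups:
            # Compare against the first alert of the group only.
            if score(group[0], alert) >= threshold:
                group.append(alert)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([alert])
    return len(groups)


history = [
    {"alertname": "HighCPU", "service": "checkout",
     "instance": "node-1", "severity": "warning"},
    {"alertname": "HighCPU", "service": "checkout",
     "instance": "node-2", "severity": "warning"},
    {"alertname": "HighCPU", "service": "checkout",
     "instance": "node-2", "severity": "critical"},
    {"alertname": "DiskFull", "service": "billing",
     "instance": "db-1", "severity": "critical"},
]

for t in (0.5, 0.7, 0.9):
    print(t, count_incidents(history, threshold=t))
# prints: 0.5 2, then 0.7 3, then 0.9 4
```

With this sample history, a stricter threshold splits the same alerts into more groups, which is the trade-off to watch for when tuning the value.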
Machine-learning alert processing
The machine-learning alert processor uses advanced algorithms to detect patterns and correlations in alerts based on historical data and learned behaviors. Unlike the statistical processor, which uses a fixed Similarity Threshold, the machine-learning processor dynamically adapts as it processes new data. This allows it to evolve over time, enhancing accuracy in identifying related alerts based on complex patterns beyond straightforward similarity scores.
Statistical vs ML
Feature | Statistical Alert Processing | Machine-learning Alert Processing |
---|---|---|
Correlation Basis | Uses a fixed Similarity Threshold and weights to correlate alerts based on similarity score. | Employs adaptive algorithms that learn from data, allowing detection of complex patterns beyond a single similarity score. |
Adaptability | Requires manual adjustment of the Similarity Threshold/weights for changing conditions. | Automatically adjusts correlation criteria as it learns, reducing the need for manual tuning. |
Use Case Flexibility | Best for stable environments with predictable patterns or for users needing strict control over correlation criteria. | Ideal for dynamic environments with changing patterns, as it can automatically adapt to new correlations. |