Using Alertmanager

  • Environment: CentOS 7, Alertmanager 0.15.1
  • github

Foreword

  • What

    ○ The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

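    ☆ Alertmanager receives alerts and sends out notifications, and is usually paired with Prometheus. It groups and deduplicates alerts, then delivers them to the configured receiver, such as email. It also handles silencing and inhibition of alerts.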

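    What is silencing? A silence mutes notifications for alerts that match a given set of label matchers for a fixed period of time.

    What is inhibition? It is covered under Concepts.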

Concepts

Grouping

○ Grouping categorizes alerts of similar nature into a single notification. This is especially useful during larger outages when many systems fail at once and hundreds to thousands of alerts may be firing simultaneously.

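☆ Grouping merges related alerts into one notification. This is easy to picture: if 100 machines fail at the same time, you certainly do not want 100 emails. With grouping you receive a single email that lists the alerts for all 100 machines.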
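The idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Alertmanager's actual implementation; the `group_alerts` helper, the alert dictionaries, and the label values are all hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts that lack a label share the empty-string value for it.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# 100 hypothetical machines behind the same proxy report the same problem.
alerts = [
    {"labels": {"alertname": "InstanceDown", "proxy": "edge", "instance": f"host-{i}"}}
    for i in range(100)
]

groups = group_alerts(alerts, group_by=["proxy"])
for key, members in groups.items():
    # One notification per group, listing every alert in it.
    print(f"1 notification for {key}: {len(members)} alerts")
```

Grouping by a different label, for example `instance`, would produce 100 groups here, which is why the choice of `group_by` matters.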

Inhibition

○ Inhibition is a concept of suppressing notifications for certain alerts if certain other alerts are already firing.
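A rough Python sketch of the rule, assuming the rule shape of Alertmanager's `inhibit_rules` (`source_match`, `target_match`, `equal`). The helper names and sample alerts are hypothetical, and the real matching logic has more cases than this simplified version.

```python
def matches(alert, match):
    """True if the alert carries every label/value pair in match."""
    return all(alert["labels"].get(k) == v for k, v in match.items())


def is_inhibited(target, firing, rule):
    """True if some firing source alert suppresses the target alert."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # Source and target must agree on every label listed in 'equal'.
        if all(source["labels"].get(name) == target["labels"].get(name)
               for name in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "instance"],
}
critical = {"labels": {"alertname": "DiskFull", "instance": "host-1", "severity": "critical"}}
warning_same = {"labels": {"alertname": "DiskFull", "instance": "host-1", "severity": "warning"}}
warning_other = {"labels": {"alertname": "DiskFull", "instance": "host-2", "severity": "warning"}}
firing = [critical, warning_same, warning_other]

for alert in firing:
    print(alert["labels"]["instance"], alert["labels"]["severity"],
          "inhibited" if is_inhibited(alert, firing, rule) else "notify")
```

Only the warning on `host-1` is suppressed: a critical alert with the same `alertname` and `instance` is already firing, so the warning adds nothing.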

Execute Alertmanager

Build with Binary and Config Systemd

Build Binary

  • Download
    # Install under /opt. The download URL is listed on the Prometheus website.
    $ cd /opt
    $ wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz
    $ tar -xvf alertmanager-0.15.1.linux-amd64.tar.gz
    # Rename the directory
    $ mv alertmanager-0.15.1.linux-amd64 alertmanager-0.15.1

Config Systemd

$ vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager.service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/opt/alertmanager-0.15.1/alertmanager --config.file=/opt/alertmanager-0.15.1/alertmanager.yml
Restart=on-failure
# SIGHUP makes Alertmanager reload its configuration file.
ExecReload=/bin/kill -s HUP $MAINPID
# SIGTERM shuts Alertmanager down cleanly; SIGQUIT would make the Go runtime dump stacks and exit non-zero.
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start the service
$ systemctl daemon-reload
$ systemctl enable alertmanager && systemctl start alertmanager

Build with Docker

Configuration

  • alertmanager.yml example
    $ vim /opt/alertmanager-0.15.1/alertmanager.yml
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.mxhichina.com:25'
      smtp_from: 'xiaoxiangyoupin@basestonedata.com'
      smtp_auth_username: 'xiaoxiangyoupin@basestonedata.com'
      smtp_auth_password: 'Xiaoxiangyoupin1'
    templates:
      - '/tmp/alert_test.txt'  # Notification template.
    route:
      group_by: ['proxy']
      group_wait: 10s
      group_interval: 20s
      repeat_interval: 5m  # After a notification is sent successfully, wait this long before sending it again.
      receiver: 'proxy-team'
      # The settings above are inherited or overridden by child routes.
      # routes:
    receivers:  # List of receivers
      - name: 'proxy-team'
        email_configs:
          - to: '363054731@qq.com,jiangwe@basestonedata.com'
            text: 'Alert'
    # Prevent excessive alerting.
    #inhibit_rules:
    #  - source_match:
    #      severity: 'critical'
    #    target_match:
    #      severity: 'warning'
    #    equal: ['alertname', 'dev', 'instance']
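To check that the route and receiver actually deliver mail, one option is to push a hand-made alert carrying the `proxy` label into Alertmanager. The sketch below assumes Alertmanager listens on its default port 9093 on localhost and uses the v1 alerts endpoint of the 0.15.x series (later releases moved to `/api/v2/alerts`). The alert name, label values, and helper functions are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(proxy, instance):
    """Build the JSON body for one test alert carrying the 'proxy' label."""
    alert = {
        "labels": {"alertname": "TestAlert", "proxy": proxy, "instance": instance},
        "annotations": {"summary": "Manual test alert"},
        # RFC 3339 timestamp in UTC.
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }
    # The endpoint expects a JSON list of alerts.
    return json.dumps([alert]).encode("utf-8")


def send_alert(body, url="http://localhost:9093/api/v1/alerts"):
    """POST the alert body to Alertmanager (URL is an assumed default)."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


body = build_test_alert("edge", "host-1")
print(body.decode("utf-8"))
# send_alert(body)  # uncomment once Alertmanager is running
```

With the configuration above, the first email should arrive roughly `group_wait` (10s) after the alert is accepted.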