[Monitoring] 03. 프로메테우스 (CentOS Stream 8 기준) - Target 노드 추가, 경고 알림 설정(E-mail)

Date:     Updated:

카테고리:

태그:


03-1. 프로메테우스 - Target 노드 추가 (node_exporter 설치)


  • node exporter는 Linux 시스템의 메트릭 데이터(CPU/Memory/Disk/Network Traffic 등)를 수집하고, 제공한다.

  • https://prometheus.io/download/#node_exporter


■ node_exporter 사용자 생성

useradd -m -s /bin/false node_exporter


■ node_exporter 패키지 다운로드

{
    wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
    tar -zxpvf node_exporter-1.6.1.linux-amd64.tar.gz
    cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
    chown node_exporter:node_exporter /usr/local/bin/node_exporter
}


■ 서비스 파일 생성

cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

{
    systemctl daemon-reload
    systemctl enable node_exporter --now
    systemctl status node_exporter
    netstat -nlp | grep 9100
}


■ 방화벽 설정

{
	firewall-cmd --add-port=9100/tcp --permanent
	firewall-cmd --reload
}


■ node_exporter 노드 정보 추가 (프로메테우스 서버)

vi /etc/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  # 추가
  - job_name: "node"
    static_configs:
    - targets: ["localhost:9100", "node01.test.srv:9100"]

  # 추가
  - job_name: "Test"
    static_configs:
    - targets: ["node01.test.srv:9100"]


■ prometheus 서비스를 재기동

systemctl restart prometheus


■ [Status] - [Targets]를 클릭하여 새 노드가 나열되는지 확인

77777



03-2. 프로메테우스 - 경고 알림 설정 (E-mail)


  • 알림을 받는 방법은 여러가지가 있지만 예제에서는 이메일로 알림을 구성한다.

  • Alerting에 대한 문서


■ SMTP 서버 설치 및 설정

  • 이 예제에서는 SMTP 서버가 localhost에서 실행되는 환경을 기반으로 한다.
{
	dnf -y install postfix
	sed -i 's/#myhostname = host.domain.tld/myhostname = dlp.test.srv/gi' /etc/postfix/main.cf
	sed -i 's/#mydomain = domain.tld/mydomain = test.srv/gi' /etc/postfix/main.cf
	sed -i 's/#myorigin = $mydomain/myorigin = $mydomain/gi' /etc/postfix/main.cf
	sed -i 's/inet_interfaces = localhost/inet_interfaces = all/gi' /etc/postfix/main.cf
	sed -i 's/inet_protocols = all/inet_protocols = ipv4/gi' /etc/postfix/main.cf
	sed -i 's/mydestination = $myhostname, localhost.$mydomain, localhost/mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain/gi' /etc/postfix/main.cf
	sed -i 's/#mynetworks = 168.100.189.0\/28, 127.0.0.0\/8/mynetworks = 127.0.0.0\/8, 192.168.219.0\/24/gi' /etc/postfix/main.cf
	sed -i 's/#home_mailbox = Maildir\//home_mailbox = Maildir\//gi' /etc/postfix/main.cf
cat <<EOF >> /etc/postfix/main.cf


# add follows to the end
# disable SMTP VRFY command
disable_vrfy_command = yes

# require HELO command to sender hosts
smtpd_helo_required = yes

# limit an email size
# example below means 10M bytes limit
message_size_limit = 10240000

# SMTP-Auth settings
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain = \$myhostname
smtpd_recipient_restrictions = permit_mynetworks, permit_auth_destination, permit_sasl_authenticated, reject
EOF
	systemctl enable --now postfix
	firewall-cmd --add-service=smtp
	firewall-cmd --runtime-to-permanent
	dnf -y install dovecot
	sed -i 's/#listen = \*\, \:\:/listen = \*\, \:\:/gi' /etc/dovecot/dovecot.conf
	sed -i 's/#disable_plaintext_auth = yes/disable_plaintext_auth = no/gi' /etc/dovecot/conf.d/10-auth.conf
	sed -i 's/auth_mechanisms = plain/auth_mechanisms = plain login/gi' /etc/dovecot/conf.d/10-auth.conf
	sed -i 's/#mail_location =/mail_location = maildir\:\~\/Maildir/gi' /etc/dovecot/conf.d/10-mail.conf
	sed -i 's/#unix_listener \/var\/spool\/postfix\/private\/auth {/unix_listener \/var\/spool\/postfix\/private\/auth {\n  mode = 0666\n  user = postfix\n  group = postfix\n}/' /etc/dovecot/conf.d/10-master.conf
	sed -i 's/ssl = required/ssl = yes/gi' /etc/dovecot/conf.d/10-ssl.conf
	systemctl enable --now dovecot
	firewall-cmd --add-service={pop3,imap}
	firewall-cmd --runtime-to-permanent
	dnf -y install mailx
	echo 'export MAIL=$HOME/Maildir' >> /etc/profile.d/mail.sh
	sed -i 's/SELINUX=enforcing/SELINUX=disabled/gi' /etc/selinux/config
	reboot
}


■ 경고 알림 설정


▶ 프로메테우스 서버에 Alertmanager 설치

  • https://prometheus.io/download/#alertmanager
{
    mkdir /etc/alertmanager
    dnf -y install wget
    wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz -P /tmp
	  cd /tmp/
    tar -zxpvf alertmanager-0.25.0.linux-amd64.tar.gz
    cp alertmanager-0.25.0.linux-amd64/* /etc/alertmanager/
}
cat <<EOF > /etc/systemd/system/alertmanager.service
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/etc/alertmanager/alertmanager --config.file=/etc/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target
EOF
{
	systemctl daemon-reload
	systemctl enable alertmanager --now
	systemctl status alertmanager
	netstat -nlp | grep 9093
 	firewall-cmd --add-port=9093/tcp --permanent
	firewall-cmd --reload
}


▶ 프로메테우스 알림 설정

mv /etc/alertmanager/alertmanager.yml /etc/alertmanager/alertmanager.yml.org

vim /etc/alertmanager/alertmanager.yml
# create new
global:
  # SMTP server to use
  smtp_smarthost: 'localhost:25'
  # require TLS or not
  smtp_require_tls: false
  # notification sender's Email address
  smtp_from: 'Alertmanager <root@dlp.test.srv>'
  # if set SMTP Auth on SMTP server, set below, too
  # smtp_auth_username: 'alertmanager'
  # smtp_auth_password: 'password'

route:
  # Receiver name for notification
  receiver: 'email-notice'
  # grouping definition
  group_by: ['alertname', 'Service', 'Stage', 'Role']
  group_wait:      30s
  group_interval:  5m
  repeat_interval: 4h

receivers:
# any name of Receiver
- name: 'email-notice'
  email_configs:
  # destination Email address
  - to: "root@localhost"
# configure Alerting rules
vi /etc/prometheus/alert_rules.yml
# create new
# for example, monitor node-exporter's [Up/Down]
groups:
- name: Instances
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      description: ' of job  has been down for more than 5 minutes.'
      summary: 'Instance  down'
vi /etc/prometheus/prometheus.yml

...
...

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # line 12 : change to (Alertmanager Host):(Port)
      - 'localhost:9093'

# line 18 : add alert rules created above
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "alert_rules.yml"
{
  systemctl restart prometheus alertmanager
  systemctl enable alertmanager
}


■ 결과 확인


▶ alertmanager 페이지 (“http://prometheus_server_ip:9090”)

164644


▶ 타겟 서버가 다운되면 이메일이 전송

66664444

[root@dlp ~]# mail
Heirloom Mail version 12.5 7/5/10.  Type ? for help.
"/root/Maildir": 1 messages 1 new 1 unread
>N  1 Alertmanager          Fri Aug  4 15:14 260/12747 "[FIRING:2] InstanceDown (node01.test.srv:9100 critica"
& 1
Message  1:
From root@dlp.test.srv Fri Aug  4 15:14:44 2023
Return-Path: <root@dlp.test.srv>
X-Original-To: root@localhost
Delivered-To: root@localhost
Subject: [FIRING:2] InstanceDown (node01.test.srv:9100 critical)
To: root@localhost
From: Alertmanager <root@dlp.test.srv>
Date: Fri, 04 Aug 2023 15:14:44 +0900
Content-Type: multipart/alternative;  boundary=ae33085aecff41c3ddeb142b0a8700539d1ece2f28e86356b1b78bdb65af
Status: R

Part 1:
Content-Type: text/html; charset=UTF-8

....
....

55555556666666

MONITORING 카테고리 내 다른 글 보러가기

댓글 남기기