[Monitoring] 03. 프로메테우스 (CentOS Stream 8 기준) - Target 노드 추가, 경고 알림 설정(E-mail)
카테고리: MONITORING
태그: monitoring linux
03-1. 프로메테우스 - Target 노드 추가 (node_exporter 설치)
-
node exporter는 Linux 시스템의 메트릭 데이터(CPU/Memory/Disk/Network Traffic 등)를 수집하고, 제공한다.
-
https://prometheus.io/download/#node_exporter
■ node_exporter 사용자 생성
useradd -m -s /bin/false node_exporter
■ node_exporter 패키지 다운로드
{
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -zxpvf node_exporter-1.6.1.linux-amd64.tar.gz
cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter
}
■ 서비스 파일 생성
cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
{
systemctl daemon-reload
systemctl enable node_exporter --now
systemctl status node_exporter
netstat -nlp | grep 9100
}
■ 방화벽 설정
{
firewall-cmd --add-port=9100/tcp --permanent
firewall-cmd --reload
}
■ node_exporter 노드 정보 추가 (프로메테우스 서버)
vi /etc/prometheus/prometheus.yml
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "prometheus"
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ["localhost:9090"]
# 추가
- job_name: "node"
static_configs:
- targets: ["localhost:9100", "node01.test.srv:9100"]
# 추가
- job_name: "Test"
static_configs:
- targets: ["node01.test.srv:9100"]
■ prometheus 서비스를 재기동
systemctl restart prometheus
■ [Status] - [Targets]를 클릭하여 새 노드가 나열되는지 확인
03-2. 프로메테우스 - 경고 알림 설정 (E-mail)
-
알림을 받는 방법은 여러가지가 있지만 예제에서는 이메일로 알림을 구성한다.
■ SMTP 서버 설치 및 설정
- 이 예제에서는 SMTP 서버가 localhost에서 실행되는 환경을 기반으로 한다.
{
dnf -y install postfix
sed -i 's/#myhostname = host.domain.tld/myhostname = dlp.test.srv/gi' /etc/postfix/main.cf
sed -i 's/#mydomain = domain.tld/mydomain = test.srv/gi' /etc/postfix/main.cf
sed -i 's/#myorigin = $mydomain/myorigin = $mydomain/gi' /etc/postfix/main.cf
sed -i 's/inet_interfaces = localhost/inet_interfaces = all/gi' /etc/postfix/main.cf
sed -i 's/inet_protocols = all/inet_protocols = ipv4/gi' /etc/postfix/main.cf
sed -i 's/mydestination = $myhostname, localhost.$mydomain, localhost/mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain/gi' /etc/postfix/main.cf
sed -i 's/#mynetworks = 168.100.189.0\/28, 127.0.0.0\/8/mynetworks = 127.0.0.0\/8, 192.168.219.0\/24/gi' /etc/postfix/main.cf
sed -i 's/#home_mailbox = Maildir\//home_mailbox = Maildir\//gi' /etc/postfix/main.cf
cat <<EOF >> /etc/postfix/main.cf
# add follows to the end
# disable SMTP VRFY command
disable_vrfy_command = yes
# require HELO command to sender hosts
smtpd_helo_required = yes
# limit an email size
# example below means 10M bytes limit
message_size_limit = 10240000
# SMTP-Auth settings
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain = \$myhostname
smtpd_recipient_restrictions = permit_mynetworks, permit_auth_destination, permit_sasl_authenticated, reject
EOF
systemctl enable --now postfix
firewall-cmd --add-service=smtp
firewall-cmd --runtime-to-permanent
dnf -y install dovecot
sed -i 's/#listen = \*\, \:\:/listen = \*\, \:\:/gi' /etc/dovecot/dovecot.conf
sed -i 's/#disable_plaintext_auth = yes/disable_plaintext_auth = no/gi' /etc/dovecot/conf.d/10-auth.conf
sed -i 's/auth_mechanisms = plain/auth_mechanisms = plain login/gi' /etc/dovecot/conf.d/10-auth.conf
sed -i 's/#mail_location =/mail_location = maildir\:\~\/Maildir/gi' /etc/dovecot/conf.d/10-mail.conf
sed -i 's/#unix_listener \/var\/spool\/postfix\/private\/auth {/unix_listener \/var\/spool\/postfix\/private\/auth {\n mode = 0666\n user = postfix\n group = postfix\n}/' /etc/dovecot/conf.d/10-master.conf
sed -i 's/ssl = required/ssl = yes/gi' /etc/dovecot/conf.d/10-ssl.conf
systemctl enable --now dovecot
firewall-cmd --add-service={pop3,imap}
firewall-cmd --runtime-to-permanent
dnf -y install mailx
echo 'export MAIL=$HOME/Maildir' >> /etc/profile.d/mail.sh
sed -i 's/SELINUX=enforcing/SELINUX=disabled/gi' /etc/selinux/config
reboot
}
■ 경고 알림 설정
▶ 프로메테우스 서버에 Alertmanager 설치
- https://prometheus.io/download/#alertmanager
{
mkdir /etc/alertmanager
dnf -y install wget
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz -P /tmp
cd /tmp/
tar -zxpvf alertmanager-0.25.0.linux-amd64.tar.gz
cp alertmanager-0.25.0.linux-amd64/* /etc/alertmanager/
}
cat <<EOF > /etc/systemd/system/alertmanager.service
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target
[Service]
User=root
Group=root
Type=simple
ExecStart=/etc/alertmanager/alertmanager --config.file=/etc/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
EOF
{
systemctl daemon-reload
systemctl enable alertmanager --now
systemctl status alertmanager
netstat -nlp | grep 9093
firewall-cmd --add-port=9093/tcp --permanent
firewall-cmd --reload
}
▶ 프로메테우스 알림 설정
mv /etc/alertmanager/alertmanager.yml /etc/alertmanager/alertmanager.yml.org
vim /etc/alertmanager/alertmanager.yml
# create new
global:
# SMTP server to use
smtp_smarthost: 'localhost:25'
# require TLS or not
smtp_require_tls: false
# notification sender's Email address
smtp_from: 'Alertmanager <root@dlp.test.srv>'
# if set SMTP Auth on SMTP server, set below, too
# smtp_auth_username: 'alertmanager'
# smtp_auth_password: 'password'
route:
# Receiver name for notification
receiver: 'email-notice'
# grouping definition
group_by: ['alertname', 'Service', 'Stage', 'Role']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receivers:
# any name of Receiver
- name: 'email-notice'
email_configs:
# destination Email address
- to: "root@localhost"
# configure Alerting rules
vi /etc/prometheus/alert_rules.yml
# create new
# for example, monitor node-exporter's [Up/Down]
groups:
- name: Instances
rules:
- alert: InstanceDown
expr: up == 0
for: 5m
labels:
severity: critical
annotations:
description: ' of job has been down for more than 5 minutes.'
summary: 'Instance down'
vi /etc/prometheus/prometheus.yml
...
...
alerting:
alertmanagers:
- static_configs:
- targets:
# line 12 : change to (Alertmanager Host):(Port)
- 'localhost:9093'
# line 18 : add alert rules created above
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
- "alert_rules.yml"
{
systemctl restart prometheus alertmanager
systemctl enable alertmanager
}
■ 결과 확인
▶ alertmanager 페이지 (“http://prometheus_server_ip:9090”)
▶ 타겟 서버가 다운되면 이메일이 전송
[root@dlp ~]# mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/root/Maildir": 1 messages 1 new 1 unread
>N 1 Alertmanager Fri Aug 4 15:14 260/12747 "[FIRING:2] InstanceDown (node01.test.srv:9100 critica"
& 1
Message 1:
From root@dlp.test.srv Fri Aug 4 15:14:44 2023
Return-Path: <root@dlp.test.srv>
X-Original-To: root@localhost
Delivered-To: root@localhost
Subject: [FIRING:2] InstanceDown (node01.test.srv:9100 critical)
To: root@localhost
From: Alertmanager <root@dlp.test.srv>
Date: Fri, 04 Aug 2023 15:14:44 +0900
Content-Type: multipart/alternative; boundary=ae33085aecff41c3ddeb142b0a8700539d1ece2f28e86356b1b78bdb65af
Status: R
Part 1:
Content-Type: text/html; charset=UTF-8
....
....
댓글 남기기