[Monitoring] 02. Installing Prometheus (on CentOS Stream 8)


01. Installing Prometheus



■ Create the prometheus user

{
	useradd -m -s /bin/false prometheus
	id prometheus
}


■ Create the prometheus directories

{
	mkdir /etc/prometheus
	mkdir /var/lib/prometheus
	chown prometheus /var/lib/prometheus
}


■ Download the prometheus package

  • https://prometheus.io/download/#prometheus
{
	dnf -y install wget
	wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz -P /tmp
	cd /tmp/
	tar -zxpvf prometheus-2.46.0.linux-amd64.tar.gz
	cd prometheus-2.46.0.linux-amd64
	cp -pr prometheus /usr/local/bin/
	cp -pr promtool /usr/local/bin/
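	cp -pr prometheus.yml /etc/prometheus/
	# The service file below points at these two console directories.
	cp -pr consoles console_libraries /etc/prometheus/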
}
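
Before the binaries are copied into place, it may be worth checking the tarball against the published checksums. The sketch below assumes that a `sha256sums.txt` file from the same release has been downloaded next to the tarball and that it uses the usual `<hash>  <filename>` layout. `verify_release` is a hypothetical helper name, not part of Prometheus.

```shell
#!/usr/bin/env bash
# verify_release DIR TARBALL
# Compares the SHA-256 of DIR/TARBALL with the entry for TARBALL in
# DIR/sha256sums.txt. Returns 0 on a match, 1 on a mismatch or missing entry.
# Assumption: sha256sums.txt lines look like "<hash>  <filename>".
verify_release() {
    local dir="$1" tarball="$2" expected actual
    # Pick the hash recorded for this exact file name.
    expected=$(awk -v f="$tarball" '$2 == f { print $1 }' "$dir/sha256sums.txt")
    [ -n "$expected" ] || return 1
    # Compute the hash of the file that was actually downloaded.
    actual=$(sha256sum "$dir/$tarball" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Example, assuming sha256sums.txt was saved to /tmp beside the tarball:
#   verify_release /tmp prometheus-2.46.0.linux-amd64.tar.gz && echo "checksum OK"
```

Stopping the installation when the function returns non-zero avoids installing a truncated or altered download.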


■ Create the service file

cat <<EOF > /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Time Series Collection and Processing Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \\
    --config.file /etc/prometheus/prometheus.yml \\
    --storage.tsdb.path /var/lib/prometheus/ \\
    --web.console.templates=/etc/prometheus/consoles \\
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF


{
	systemctl daemon-reload
	systemctl enable prometheus --now
	systemctl status prometheus
	ss -nltp | grep 9090
}
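
A listening port does not by itself mean the server has finished starting. Prometheus exposes a `/-/ready` endpoint that returns success once it can serve traffic, so a small polling loop can confirm readiness. This is a minimal sketch; `wait_ready`, the retry count, and the delay are illustrative choices, not values from the original post.

```shell
#!/usr/bin/env bash
# wait_ready URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl succeeds or ATTEMPTS is exhausted.
# Returns 0 as soon as one request succeeds, 1 otherwise.
wait_ready() {
    local url="$1" attempts="${2:-10}" delay="${3:-2}" i
    for ((i = 1; i <= attempts; i++)); do
        # -f makes curl return non-zero on HTTP error statuses such as 503.
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_ready http://localhost:9090/-/ready && echo "Prometheus is ready"
```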


■ Configure the firewall

{
	firewall-cmd --add-port=9090/tcp --permanent
	firewall-cmd --reload
}


02. Installing node_exporter


  • node_exporter collects and exposes metrics from a Linux system (CPU, memory, disk, network traffic, and so on).

  • https://prometheus.io/download/#node_exporter


■ Create the node_exporter user

useradd -m -s /bin/false node_exporter


■ Download the node_exporter package

{
    wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
    tar -zxpvf node_exporter-1.6.1.linux-amd64.tar.gz
    cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
    chown node_exporter:node_exporter /usr/local/bin/node_exporter
}


■ Create the service file

cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

{
    systemctl daemon-reload
    systemctl enable node_exporter --now
    systemctl status node_exporter
    ss -nltp | grep 9100
}


■ Configure the firewall

{
	firewall-cmd --add-port=9100/tcp --permanent
	firewall-cmd --reload
}


■ Add the node_exporter target (on the Prometheus server)

vi /etc/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  # Added: scrape node_exporter on this host
  - job_name: "node"
    static_configs:
    - targets: ["localhost:9100"]
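
A YAML indentation slip in this file will keep Prometheus from starting, so it can help to validate the edit before restarting. The `promtool` binary copied to /usr/local/bin earlier provides `promtool check config` for this. The wrapper below is a minimal sketch; `restart_if_valid` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# restart_if_valid [CONFIG]
# Restarts the prometheus service only when promtool accepts CONFIG.
# CONFIG defaults to the path used in this guide.
restart_if_valid() {
    local config="${1:-/etc/prometheus/prometheus.yml}"
    if promtool check config "$config"; then
        systemctl restart prometheus
    else
        echo "config check failed; prometheus was not restarted" >&2
        return 1
    fi
}

# Example:
#   restart_if_valid /etc/prometheus/prometheus.yml
```

Keeping the old process running when the check fails means a typo does not interrupt scraping.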


■ Restart the prometheus service (on the Prometheus server)

systemctl restart prometheus


■ Check the metrics

curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.4137e-05
go_gc_duration_seconds{quantile="0.25"} 3.4137e-05
go_gc_duration_seconds{quantile="0.5"} 8.1547e-05
go_gc_duration_seconds{quantile="0.75"} 8.1547e-05
go_gc_duration_seconds{quantile="1"} 8.1547e-05
go_gc_duration_seconds_sum 0.000115684
go_gc_duration_seconds_count 2
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.481704e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 6.30988e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.446544e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 49214
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 8.281024e+06
...
...
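
The output above is the Prometheus text format: `# HELP` and `# TYPE` comment lines followed by `<name> <value>` samples. To pull out a single value instead of reading the whole dump, a short awk filter is enough. This is a minimal sketch; `metric_value` is a hypothetical helper, and it assumes label values contain no spaces.

```shell
#!/usr/bin/env bash
# metric_value NAME
# Reads text-format metrics on stdin and prints the value of the first
# sample whose name (including any {labels}) is exactly NAME.
# Prints nothing when no sample matches.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                  # skip HELP/TYPE comment lines
        $1 == name { print $2; exit }  # sample lines are "<name> <value>"
    '
}

# Example:
#   curl -s http://localhost:9100/metrics | metric_value go_goroutines
```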


03. Verifying the results



■ Opening “http://prometheus_server_ip:9090” in a browser displays the Prometheus web UI.



■ Clicking the metrics explorer icon next to the query box lists the metrics available for viewing time-series data.



■ Run a query for node_procs_running
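
The same query can also be run from the command line through the Prometheus HTTP API at `/api/v1/query`, which returns the result as JSON. This is a minimal sketch; `prom_query` is a hypothetical helper name and the base URL assumes the server from this guide.

```shell
#!/usr/bin/env bash
# prom_query BASE_URL EXPR
# Sends EXPR as an instant query to the Prometheus HTTP API and prints
# the JSON response.
prom_query() {
    local base="$1" expr="$2"
    # --get sends the url-encoded data as a query string on a GET request.
    curl -fsS --get --data-urlencode "query=${expr}" "${base}/api/v1/query"
}

# Example:
#   prom_query http://localhost:9090 node_procs_running
```

Using `--data-urlencode` keeps expressions with braces or quotes, such as label selectors, from being mangled in the URL.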



■ Clicking [Status]-[Targets] shows the state of each node.
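
Target state is also available from the HTTP API at `/api/v1/targets`, where each active target carries a `health` field such as `up` or `down`. The sketch below counts targets in a given state with plain grep, assuming the compact JSON (no spaces around the colon) that the API returns. `count_targets` is a hypothetical helper; a real JSON parser would be more robust.

```shell
#!/usr/bin/env bash
# count_targets STATE
# Reads the JSON from /api/v1/targets on stdin and prints how many
# targets report "health":"STATE".
# Assumption: the JSON is compact, i.e. written as "health":"up".
count_targets() {
    local state="$1"
    # grep -o prints each match on its own line; wc -l counts them.
    grep -o "\"health\":\"${state}\"" | wc -l | tr -d ' '
}

# Example:
#   curl -s http://localhost:9090/api/v1/targets | count_targets up
```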

