Monitoring (Cluster Admin)


Last modified: 2025-03-31 23:52

Cloud Z CP monitoring is built from the following open-source components:

Prometheus, which collects and stores metrics
Alertmanager, which handles the alerts generated by Prometheus
Various exporters (node-exporter, kube-state-metrics, blackbox-exporter, elasticsearch-exporter, etc.), which expose third-party metrics so that Prometheus can collect them
Grafana, which visualizes the collected metrics with Prometheus queries and presents them as easy-to-read dashboards
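As a minimal sketch of how these components fit together, the Python snippet below builds a query URL for Prometheus's HTTP API (`/api/v1/query`) and parses a vector result, the same kind of data that Grafana panels render. The Prometheus address is a hypothetical placeholder; how ZCP exposes Prometheus is not described here, so treat this as an illustration only.

```python
import json
from urllib.parse import urlencode

# Hypothetical address: replace with the Prometheus endpoint reachable in your environment.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> list[tuple[dict, float]]:
    """Return (labels, value) pairs from an /api/v1/query response of type 'vector'."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample value is [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


if __name__ == "__main__":
    # 'up' is 1 for every target whose last scrape succeeded, 0 otherwise.
    print(instant_query_url(PROMETHEUS_URL, "up"))
    sample = ('{"status":"success","data":{"resultType":"vector","result":'
              '[{"metric":{"job":"node-exporter"},"value":[1700000000,"1"]}]}}')
    print(parse_vector(sample))
```

Fetching the URL with any HTTP client returns JSON in the shape `parse_vector` expects; Grafana issues similar queries behind each panel.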

This section explains how to use Grafana dashboards and describes each item of the dashboards provided by default.

For details on Grafana and how to use it, see the Grafana Docs.

To use the service, click [Monitoring] in the side menu of the ZCP Console.

When a version update adds, modifies, or deletes dashboards or panels, the change is marked with the following legend:

Version, content: added dashboard or panel
Version, content: modified dashboard or panel
Version, content: deleted dashboard or panel

Navigating to a Dashboard

Select the Home menu at the top.
Recently selected dashboards (Recent) and the four default folders are displayed.
Selecting one of the default folders expands the dashboards that belong to it.
Selecting a dashboard opens a screen made up of various panels.

Built-in Dashboards

This section describes the dashboards that Cloud Z CP Public provides by default.

Addon Dashboards

ElasticSearch

Displays information about Elasticsearch (JVM, CPU, memory, documents, indices, etc.)

Row	Panel	Description
KPI	Cluster health	Current state of the Elasticsearch cluster (N/A / Green / Yellow / Red)
	Tripped for breakers	Average number of circuit-breaker trips in the cluster
	CPU usage Avg.	Average CPU usage
	JVM memory used Avg.	Average JVM memory usage
	Nodes	Number of nodes in the cluster.
	Data nodes	Number of data nodes in the cluster.
	Pending tasks	Cluster-level changes that have not yet been executed.
	Openfile descriptors per cluster	Total number of open file descriptors used by Elasticsearch
Shards	Active primary shards	The number of primary shards in your cluster. This is an aggregate total across all indices.
	Active shards	Aggregate total of all shards across all indices, which includes replica shards.
	Initializing shards	Count of shards that are being freshly created.
	Relocating shards	The number of shards that are currently moving from one node to another node.
	Delayed shards	Shards delayed to reduce reallocation overhead.
	Unassigned shards	The number of shards that exist in the cluster state, but cannot be found in the cluster itself.
JVM Garbage Collection	GC count	Number of garbage collections performed
	GC time	Time spent on garbage collection
CPU and Memory	Load average	Load average of Elasticsearch
	CPU usage	CPU usage of Elasticsearch
	JVM memory usage	JVM memory usage of Elasticsearch
	JVM memory committed	Amount of JVM memory committed by Elasticsearch
Disk and Network	Disk usage	Disk usage of Elasticsearch
	Network usage	Network usage of Elasticsearch
Documents	Documents count on node	Number of documents stored on the data node
	Documents indexed rate	Rate at which documents are indexed
	Documents deleted rate	Rate at which documents are deleted
	Documents merged rate	Rate at which documents are merged
	Documents merged bytes	Size of merged documents (bytes)
Times	Query time	Query execution time
	Indexing time	Indexing execution time
	Merging time	Merge execution time
	Throttle time for index store	Throttle time for storing indices
Indices: Count of documents and Total size	Count of documents with only primary shards	Number of documents in primary shards
	Total size of stored index data in bytes with only primary shards on all nodes	Total size of index data stored in primary shards
	Total size of stored index data in bytes with all shards on all nodes	Total size of index data stored in all shards
Indices: Index writer	Index writer with only primary shards on all nodes in bytes	Size written by the index writer for primary shards
	Index writer with all shards on all nodes in bytes	Size written by the index writer for all shards
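The Cluster health KPI maps a gauge to one of N/A / Green / Yellow / Red. A minimal sketch, assuming the panel reads the elasticsearch-exporter gauge `elasticsearch_cluster_health_status`, which carries a `color` label and is 1 for the current color (the actual panel query is not shown in this document). The JVM helper likewise assumes used and maximum heap bytes as inputs.

```python
def cluster_health(status_by_color: dict[str, float]) -> str:
    """Map {color: gauge value} to the label shown by the Cluster health panel."""
    # Check the worst state first so a 'red' sample wins over the others.
    for color in ("red", "yellow", "green"):
        if status_by_color.get(color) == 1:
            return color.capitalize()
    return "N/A"  # no sample with value 1: exporter down or metric missing


def jvm_heap_used_percent(used_bytes: float, max_bytes: float) -> float:
    """Heap usage as a percentage of the maximum heap size."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return 100.0 * used_bytes / max_bytes


if __name__ == "__main__":
    print(cluster_health({"green": 1, "yellow": 0, "red": 0}))
    print(cluster_health({}))
    print(jvm_heap_used_percent(3 * 2**30, 4 * 2**30))
```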

ZCP Services Status

Health check of the zcp-system namespace (CPU usage, status values)

Panel	Description
Duration	Probe duration (seconds)
Status : alertmanager	Alertmanager health (UP / DOWN)
alertmanager Status Code	Alertmanager status code
Status : grafana	Grafana health (UP / DOWN)
grafana Status Code	Grafana status code
Status : prometheus	Prometheus health (UP / DOWN)
prometheus Status Code	Prometheus status code
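The Status and Status Code panels are consistent with blackbox-exporter probes. The sketch below assumes the standard blackbox-exporter metrics `probe_success`, `probe_http_status_code`, and `probe_duration_seconds`; the exact queries used by this dashboard are not documented here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    success: float      # probe_success: 1 if the probe succeeded, else 0
    status_code: float  # probe_http_status_code: 0 when no HTTP response was received
    duration_s: float   # probe_duration_seconds


def health(probe: Probe) -> str:
    """UP/DOWN as shown by the 'Status : <service>' panels."""
    return "UP" if probe.success == 1 else "DOWN"


def summarize(probes: dict[str, Probe]) -> dict[str, str]:
    """One entry per service, e.g. 'UP (200, 0.012s)'."""
    return {
        name: f"{health(p)} ({int(p.status_code)}, {p.duration_s:.3f}s)"
        for name, p in sorted(probes.items())
    }


if __name__ == "__main__":
    print(summarize({
        "grafana": Probe(1, 200, 0.012),
        "alertmanager": Probe(0, 0, 5.0),
    }))
```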

Cluster Dashboards

Etcd Cluster

Etcd status values (RPC rate, DB size, disk sync duration, etc.)

Panel	Description
Etcd has a leader?	Checks whether etcd has a leader (YES / NO)
The number of leader changes seen	Number of times the etcd leader has changed
The total number of failed proposals seen	Total number of failed proposals
RPC Rate	Number of gRPC requests started or handled over 5 minutes
Etcd DB Size	Total size of the etcd MVCC database in bytes (etcd debugging metric)
Etcd Disk Sync Duration	99th-percentile duration of WAL fsync operations on the etcd disk over 5 minutes (histogram)
Etcd Memory	Memory usage of the 'etcd' job
Etcd Client Traffic In	Total client gRPC traffic received by etcd over 5 minutes
Etcd Client Traffic Out	Total client gRPC traffic sent by etcd over 5 minutes
Etcd Peer Traffic In	Total peer traffic received by etcd over 5 minutes
Etcd Peer Traffic Out	Total peer traffic sent by etcd over 5 minutes
Etcd Proposals rate(Fail,Pending,commit,apply)	Proposals handled by the etcd server over 5 minutes (failed, pending, committed, applied)
Etcd Disk operations(AVG)	Total number of backend commits on the etcd disk over 2 minutes
Network	Total client gRPC traffic received by etcd over 2 minutes
Snapshot duration	Abnormally high snapshot duration (snapshot_save_total_duration_seconds) indicates disk issues and might cause the cluster to be unstable.
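Etcd Disk Sync Duration reports a 99th percentile derived from a histogram. As a simplified sketch of how Prometheus's `histogram_quantile` interpolates within cumulative buckets, the function below takes buckets as `(le, cumulative_count)` pairs ending in `+Inf`; it omits some edge cases Prometheus handles (for example negative bucket bounds). A panel like this typically uses a query of the form `histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))`, but the exact query in this dashboard is an assumption, and the bucket data below is made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) buckets sorted by le."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must be le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    lower, below = 0.0, 0.0  # upper bound and cumulative count of the previous bucket
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # cannot interpolate into +Inf; use the highest finite bound
            if count == below:
                return upper
            # Linear interpolation inside the bucket that contains the rank.
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return lower  # not reached for well-formed buckets (last count == total >= rank)


if __name__ == "__main__":
    # Hypothetical WAL fsync buckets (seconds) accumulated over 5 minutes.
    wal_fsync = [(0.001, 50), (0.002, 90), (0.004, 99), (0.008, 100), (math.inf, 100)]
    print(histogram_quantile(0.99, wal_fsync))
```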

Kubernetes: Cluster Overview

Resource information for the whole cluster, per-node averages, and cluster averages (number of nodes/pods/containers, CPU/memory/network usage, etc.)

Row	Panel	Description
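Resource Dashboard	Alertmanager Alerts Firing	Total number of firing alerts
	Node Not Ready	Number of nodes in the "Not Ready" state
	Node Unschedulable	Number of nodes in the "Unschedulable" state
	Node Memory Pressure	Number of nodes in the "Memory Pressure" state
	Node Disk Pressure	Number of nodes in the "Disk Pressure" state
	Running Pod Total	Number of pods currently in the "Running" state
	Running Pod Total by Node	Number of pods currently in the "Running" state on each node
	Running Container Total	Number of containers currently in the "Running" state
	Running Container Total by Node	Number of containers currently in the "Running" state on each node
Node Resource Usage	Number of Node	Total number of nodes currently in the cluster
	Total CPU	Total CPU of the nodes in the cluster
	Used Memory	Memory used by the nodes in the cluster
	Total Memory	Total memory of the nodes in the cluster
	Disk Usage	Disk usage of the nodes in the cluster
	Disk Total	Total disk capacity of the nodes in the cluster
	Avg CPU Usage	Average CPU usage of the nodes in the cluster
	Avg Memory Usage	Average memory usage of the nodes in the cluster
	Avg Disk Usage	Average disk usage of the nodes in the cluster
	Network Usage (Node NIC)	Network usage of the nodes in the cluster
Cluster Resource Usage	Cluster CPU Usage(Used/Total)	Overall CPU usage (%) of the nodes in the cluster; the total CPU (cores) and the amount used are also shown below it
	Cluster Memory Usage(Used/Total)	Overall memory usage (%) of the nodes in the cluster; the total memory (GiB) and the amount used are also shown below it
	Cluster Disk Usage(Used/Total)	Overall disk usage (%) of the nodes in the cluster; the total disk (GiB) and the amount used are also shown below it
	Pod Count by namespace	Number of pods registered in Kubernetes, per namespace
	Container Count by namespace	Number of containers registered in Kubernetes, per namespace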
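This dashboard shows both a cluster-wide figure (Cluster CPU Usage(Used/Total)) and per-node averages (Avg CPU Usage), which can differ when nodes have different capacities. The sketch below illustrates the difference with made-up numbers; it assumes the cluster-wide figure is total usage divided by total capacity and the average is the unweighted mean of per-node percentages, which is an assumption about the panels rather than their documented queries.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    used: float   # resource in use, e.g. CPU cores
    total: float  # capacity of the node in the same unit


def cluster_usage_percent(nodes: list[Node]) -> float:
    """Sum of usage over sum of capacity, as a percentage."""
    total = sum(n.total for n in nodes)
    return 100.0 * sum(n.used for n in nodes) / total if total else 0.0


def avg_node_usage_percent(nodes: list[Node]) -> float:
    """Unweighted mean of each node's own usage percentage."""
    ratios = [100.0 * n.used / n.total for n in nodes if n.total]
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    nodes = [Node("small", used=1, total=2), Node("large", used=1, total=8)]
    print(cluster_usage_percent(nodes))   # 2 of 10 cores in use
    print(avg_node_usage_percent(nodes))  # mean of 50% and 12.5%
```

A small, heavily loaded node raises the per-node average much more than it raises the cluster-wide figure.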

Kubernetes: Performance Overview

API server requests/latency, pod/container running trends, creation rates, etc.

Panel	Description
APIServer Request Rate	Total number of requests to the API server, per 2-minute interval
APIServer Latency	Average API server request latency
Kubelet POD Start Latency	Latency in microseconds for a single pod to go from pending to running. Broken down by podname.
Running Pod Trends	Number of pods in the "running" state in kubelet
Create Rate of Pods	Rate of pods newly created by kubelet over 2 minutes
Running Containers Trends	Number of containers in the "running" state in kubelet
Create Rate of Containers	Rate of containers newly created by kubelet over 2 minutes
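Create Rate of Pods and Create Rate of Containers show per-second rates of counters over a 2-minute window. The sketch below is a simplified counter rate: it sums the increases between consecutive samples and treats any drop as a counter reset, as Prometheus does, but unlike Prometheus `rate()` it does not extrapolate to the edges of the window. The function name and sample data are hypothetical.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_s, value) samples in time order."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter restarted from zero (e.g. after a process restart).
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical pod-creation counter scraped every 30 s over 2 minutes, with one reset.
    samples = [(0, 100), (30, 104), (60, 110), (90, 3), (120, 9)]
    print(counter_rate(samples))
```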

Kubernetes: Resource Requests

Displays information about node CPU/memory usage and pod count

Panel	Description
Cluster CPU(Allocated/Request)	This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster. For comparison, the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.
Cluster Memory(Allocated/Request)	This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster. For comparison, the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.
Cluster Pod(Allocated/Request)	This represents the total number of [pods](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run) in the cluster. For comparison, the total number of [allocatable pods](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.
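The Cluster CPU(Allocated/Request) panel compares the sum of CPU requests with allocatable CPU. As a sketch of that comparison, the code below parses Kubernetes CPU quantities in the two common forms (plain cores such as `2` or `0.5`, and millicores such as `250m`); other quantity suffixes are deliberately not handled, and the request and allocatable values are made up for the example.

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m', '0.5' or '2' to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000  # millicores
    return Decimal(q)


def cpu_request_ratio(requests: list[str], allocatable: list[str]) -> tuple[Decimal, Decimal, Decimal]:
    """Return (requested cores, allocatable cores, requested as % of allocatable)."""
    requested = sum((parse_cpu(r) for r in requests), Decimal(0))
    alloc = sum((parse_cpu(a) for a in allocatable), Decimal(0))
    percent = (requested / alloc * 100) if alloc else Decimal(0)
    return requested, alloc, percent


if __name__ == "__main__":
    # Hypothetical container requests and per-node allocatable CPU.
    print(cpu_request_ratio(["500m", "1", "250m"], ["2", "2"]))
```

Decimal is used so that millicore sums such as 0.1 + 0.2 stay exact.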

Container Dashboards

Kubernetes: DaemonSet Overview

Information about DaemonSets (replicas, CPU/memory/network/filesystem (v1.1.0), etc.)

Panel	Description
Desired Replicas (v1.1.0, DESIRED)	Number of nodes that should be running the DaemonSet pod (requested for scheduling)
CURRENT (v1.1.0)	Number of DaemonSet pods currently scheduled
READY (v1.1.0)	Number of DaemonSet pods currently running and ready
Available Replicas (v1.1.0, AVAILABLE)	Number of DaemonSet pods currently running and available
Metadata Generation	Generation of the DaemonSet's desired state (metadata.generation)
DaemonSet Create Time	Creation time of the earliest-created DaemonSet
Total CPU	Total CPU used by containers created by the DaemonSet (cores)
Total Memory	Total memory used by containers created by the DaemonSet (MiB)
Total Network	Total network usage of containers created by the DaemonSet (MBps)
CPU Usage	CPU usage of containers created by the DaemonSet
Memory Usage	Memory usage of containers created by the DaemonSet
Filesystem Read/Write (v1.1.0)	Filesystem read/write usage of containers created by the DaemonSet
Network TX/RX (v1.1.0)	Network transmit/receive usage of containers created by the DaemonSet
Replicas Status	Status of the DaemonSet's replicas (Ready / Available / Unavailable / Misscheduled)
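Replicas Status breaks a DaemonSet down into Ready / Available / Unavailable / Misscheduled. A minimal sketch, assuming the values come from kube-state-metrics series such as `kube_daemonset_status_desired_number_scheduled`, `kube_daemonset_status_current_number_scheduled`, `kube_daemonset_status_number_ready`, `kube_daemonset_status_number_available`, and `kube_daemonset_status_number_misscheduled` (the dashboard's actual queries are not listed here); the hypothetical `problems` helper turns those numbers into readable warnings.

```python
from dataclasses import dataclass


@dataclass
class DaemonSetStatus:
    desired: int       # nodes that should run the daemon pod
    current: int       # of those, nodes running at least one daemon pod
    ready: int         # of those, nodes whose daemon pod is ready
    available: int     # of those, nodes whose daemon pod is available
    misscheduled: int  # nodes running a daemon pod that they should not run


def problems(s: DaemonSetStatus) -> list[str]:
    """Describe how the observed counts deviate from the desired state."""
    issues = []
    if s.current < s.desired:
        issues.append(f"{s.desired - s.current} node(s) without a scheduled pod")
    if s.ready < s.desired:
        issues.append(f"{s.desired - s.ready} node(s) whose pod is not ready")
    if s.available < s.desired:
        issues.append(f"{s.desired - s.available} node(s) whose pod is unavailable")
    if s.misscheduled > 0:
        issues.append(f"{s.misscheduled} node(s) running a misscheduled pod")
    return issues


if __name__ == "__main__":
    print(problems(DaemonSetStatus(desired=3, current=3, ready=2, available=2, misscheduled=1)))
```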

Kubernetes: Deployment Overview

Information about Deployments (replicas, CPU/memory/network/filesystem (v1.1.0), etc.)

Panel	Description
Desired Replicas (v1.1.0, DESIRED)	Number of Deployment replicas requested for scheduling
Available Replicas (v1.1.0, AVAILABLE)	Number of Deployment replicas available
Observed Generation	Generation most recently observed by the Deployment controller (status.observedGeneration)
Metadata Generation	Generation of the Deployment's desired state (metadata.generation)
Deployment Create Time	Creation time of the earliest-created Deployment
AVG CPU (v1.1.0, Total CPU)	Average CPU used by containers created by the Deployment (v1.1.0: total CPU used by all containers of the pods created by the Deployment, in cores)
AVG Memory (v1.1.0, Total Memory)	Average memory used by containers created by the Deployment (MiB) (v1.1.0: total memory used by all containers of the pods created by the Deployment, in MiB)
AVG Network (v1.1.0, Total Network)	Average network usage of containers created by the Deployment (kBps) (v1.1.0: total network usage of all containers of the pods created by the Deployment, in MiB)
CPU Usage	CPU usage of containers created by the Deployment
Memory Usage	Memory usage of containers created by the Deployment
Filesystem Read/Write (v1.1.0)	Filesystem read/write usage of containers created by the Deployment
Network TX/RX (v1.1.0)	Network transmit/receive usage of containers created by the Deployment
Replicas Status	Status of the Deployment's replicas
Spec	Spec of the Deployment's replicas (Replicas / Paused)

Kubernetes: POD Overview

Information about pods (pod status, restart count, and the CPU/memory/network/volume (v1.1.0)/filesystem (v1.1.0) used by pods)

Panel	Description
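POD Count	Number of pods in the selected namespace
Pod Status	Status of the pods in the selected namespace
Pod Restart Count	Restart count of the pods in the selected namespace
CPU Usage	CPU usage and trend of the containers in the selected namespace and pod
Memory Usage	Memory usage and trend of the containers in the selected namespace and pod
Volume Usage (v1.1.0)	Usage and trend of the persistent volumes used by the containers in the selected namespace and pod
Filesystem Read/Write (v1.1.0)	Trend of filesystem read/write usage of the containers in the selected namespace and pod
Network TX/RX	Network transmit/receive usage and trend of the containers in the selected namespace and pod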

Kubernetes: StatefulSets Overview

Information about StatefulSets (replicas, CPU/memory/network/filesystem (v1.1.0))

Panel	Description
Desired Replicas (v1.1.0, DESIRED)	Number of StatefulSet replicas requested for scheduling
Available Replicas (v1.1.0, AVAILABLE)	Number of StatefulSet replicas available
Observed Generation	Generation most recently observed by the StatefulSet controller (status.observedGeneration)
Metadata Generation	Generation of the StatefulSet's desired state (metadata.generation)
Statefulset Create Time	Creation time of the earliest-created StatefulSet
Total CPU	Total CPU used by containers created by the StatefulSet (cores)
Total Memory	Total memory used by containers created by the StatefulSet (MiB)
Total Network	Total network usage of containers created by the StatefulSet (MBps)
CPU Usage	CPU usage of containers created by the StatefulSet
Memory Usage	Memory usage of containers created by the StatefulSet
Filesystem Read/Write (v1.1.0)	Filesystem read/write usage of containers created by the StatefulSet
Network TX/RX (v1.1.0)	Network transmit/receive usage of containers created by the StatefulSet
Replicas Status	Status of the StatefulSet's replicas (Current / Available)

System Dashboards

System Disk Space

Trend of disk usage on each node

Panel	Description
Root Disk Capacity Check	Amount of disk space used and available on various mount points. Running out of disk space on the OS volume, a database volume, or a volume used for temporary space can cause downtime. Some storage may also perform worse when only a small amount of space is available.
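Disk usage of this kind is commonly computed from node-exporter's `node_filesystem_size_bytes` and `node_filesystem_avail_bytes`. A minimal sketch of that calculation, assuming the panel uses these two metrics (its query is not given here); the local example uses Python's `shutil.disk_usage` to reproduce the same arithmetic on the machine running the script.

```python
import shutil


def used_percent(size_bytes: float, avail_bytes: float) -> float:
    """Percentage of a filesystem that is not available to unprivileged users."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100.0 * (size_bytes - avail_bytes) / size_bytes


if __name__ == "__main__":
    # On POSIX, shutil.disk_usage reports 'free' as space available to unprivileged
    # users, which corresponds to node_filesystem_avail_bytes (not _free_bytes).
    usage = shutil.disk_usage("/")
    print(f"/ is {used_percent(usage.total, usage.free):.1f}% used")
```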

System Usage Overview

Usage information for each node (idle CPU, disk I/O, network received/transmitted, memory/disk usage, etc.)

Panel	Description
Idle per CPU Core	5-minute average idle time of each CPU on the selected node
System Load(1,5,15)	Average load on the selected node (1 / 5 / 15 minutes)
Memory Usage	Overall memory usage (%) on the selected node
Disk I/O	Disk usage on the selected node by type (read / written)
Disk Usage	Overall disk usage (%) on the selected node
Received per Network Interface (Bytes)	Bytes received over the network on the selected node in 5 minutes
Transmitted per Network Interface (Bytes)	Bytes transmitted over the network on the selected node in 5 minutes
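Idle per CPU Core shows a 5-minute average of idle time. Assuming the panel is based on node-exporter's per-mode counter `node_cpu_seconds_total` (an assumption; the query is not shown here), the idle percentage of one CPU is the growth of the `idle` counter divided by the growth of all modes between two snapshots. The function name and numbers below are hypothetical.

```python
def idle_percent(start: dict[str, float], end: dict[str, float]) -> float:
    """Idle share (%) of one CPU between two snapshots of cumulative seconds per mode."""
    deltas = {mode: end[mode] - start.get(mode, 0.0) for mode in end}
    elapsed_cpu_seconds = sum(deltas.values())
    if elapsed_cpu_seconds <= 0:
        return 0.0
    return 100.0 * deltas.get("idle", 0.0) / elapsed_cpu_seconds


if __name__ == "__main__":
    # Hypothetical snapshots taken 300 s (5 minutes) apart for cpu="0".
    t0 = {"idle": 1000.0, "user": 400.0, "system": 100.0, "iowait": 5.0}
    t1 = {"idle": 1240.0, "user": 445.0, "system": 112.0, "iowait": 8.0}
    print(idle_percent(t0, t1))
```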

System: Overview

Summary information for each node (load average, swap, CPU/memory/network usage, etc.)

Panel	Description
System Uptime	Uptime of the selected node during the selected interval
Virtual CPU	Number of virtual CPUs on the selected node
RAM	Total RAM of the selected node
Memory Available	Current memory usage (%) of the selected node
Load Average	Load average of the selected node over the selected interval (min, max, and avg shown separately)
Memory	Memory usage (GiB) of the selected node over the selected interval, by type (Total / Used / Available); min, max, and avg shown separately
CPU Usage	CPU usage (%) of the selected node over the selected interval, by mode (idle / user / system / steal / iowait / softirq / nice); min, max, and avg shown separately
Memory Distribution	Memory distribution (GiB) of the selected node over the selected interval, by type (Cached / Used / Free / Buffers); min, max, and avg shown separately
Network Traffic(KBps)	Network traffic (kBps) of the selected node over the selected interval, inbound / outbound for each item; min, max, and avg shown separately
Network Utilization	Network utilization (MiB) of the selected node over the selected interval, by type (Sent / Received); min, max, and avg shown separately
Swap	Swap usage (B) of the selected node over the selected interval, by type (Used / Free); min, max, and avg shown separately
Swap Activity	Swap activity (Bps) of the selected node over the selected interval, by type (Swap In / Swap Out); min, max, and avg shown separately
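Most System: Overview panels show min, max, and avg separately for the selected interval, similar to Grafana legend values. As an illustration, assuming missing points (null) are skipped rather than counted as zero, the summary for one series can be computed as below; the function name and sample series are hypothetical.

```python
import math
from statistics import fmean
from typing import Optional


def legend_values(points: list[Optional[float]]) -> Optional[dict[str, float]]:
    """min/max/avg of a series, ignoring missing (None or NaN) points."""
    values = [p for p in points if p is not None and not math.isnan(p)]
    if not values:
        return None
    return {"min": min(values), "max": max(values), "avg": fmean(values)}


if __name__ == "__main__":
    # Hypothetical load-average samples with one missing scrape.
    print(legend_values([0.42, 0.55, None, 1.30, 0.73]))
```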

Dashboard Creation Guide

http://docs.grafana.org/reference/templating/
