Prometheus is an open-source systems monitoring and alerting toolkit.
Key features:
A multi-dimensional data model (time series identified by metric name and key/value pairs)
A flexible query language
No reliance on distributed storage
Time series collection via a pull model over HTTP
Pushing time series is supported via an intermediary push gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support
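To make the pull model and static target configuration above concrete, here is a minimal prometheus.yml sketch; the job name and target address are illustrative placeholders, not taken from this article:

```yaml
global:
  scrape_interval: 15s        # how often Prometheus pulls metrics from targets

scrape_configs:
  # Pull metrics over HTTP from a statically configured target
  # (here a hypothetical node-exporter endpoint).
  - job_name: "node"
    static_configs:
      - targets: ["192.168.100.102:9100"]
```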
The Prometheus architecture is as follows:
Prometheus components include the Prometheus server, push gateway, Alertmanager, Web UI, and so on.
The Prometheus server periodically pulls data from its data sources and persists it to disk. Prometheus can be configured with rules that are evaluated on a schedule: a recording rule aggregates data into new time series, while an alerting rule, when its condition is met, pushes an alert to the configured Alertmanager, which then handles the alert according to its configuration. The collected data can also be visualized through the HTTP API or with Grafana.
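The rule evaluation flow described above can be sketched as a minimal rules file; the group name, alert name, and expressions are illustrative assumptions, not taken from this article:

```yaml
groups:
  - name: example
    rules:
      # Recording rule: periodically evaluated, result stored as a new time series.
      - record: instance:node_cpu:rate5m
        expr: rate(node_cpu_seconds_total{mode!="idle"}[5m])
      # Alerting rule: when the condition holds for 5m, an alert is pushed
      # to the configured Alertmanager.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```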
II. Installing Prometheus Operator
1. The Prometheus Operator simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.
# wget https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.18.0 -O prometheus-operator-0.18.0.tar.gz
# tar -zxvf prometheus-operator-0.18.0.tar.gz
# cd prometheus-operator-0.18.0
# kubectl apply -f bundle.yaml
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
deployment "prometheus-operator" created
# cd contrib/kube-prometheus
# hack/cluster-monitoring/deploy
namespace "monitoring" created
clusterrolebinding "prometheus-operator" created
clusterrole "prometheus-operator" created
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding "node-exporter" created
clusterrole "node-exporter" created
daemonset "node-exporter" created
serviceaccount "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" created
configmap "grafana-dashboard-definitions-0" created
configmap "grafana-dashboards" created
configmap "grafana-datasources" created
deployment "grafana" created
service "grafana" created
configmap "prometheus-k8s-rules" created
serviceaccount "prometheus-k8s" created
servicemonitor "alertmanager" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-scheduler" created
servicemonitor "kube-state-metrics" created
servicemonitor "kubelet" created
servicemonitor "node-exporter" created
servicemonitor "prometheus-operator" created
servicemonitor "prometheus" created
service "prometheus-k8s" created
prometheus "k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
clusterrole "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created
# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          15h
alertmanager-main-1                    2/2     Running   0          15h
alertmanager-main-2                    2/2     Running   0          15h
grafana-567fcdf7b7-44ldd               1/1     Running   0          15h
kube-state-metrics-76b4dc5ffb-2vbh9    4/4     Running   0          15h
node-exporter-9wm8c                    2/2     Running   0          15h
node-exporter-kf6mq                    2/2     Running   0          15h
node-exporter-xtm4r                    2/2     Running   0          15h
prometheus-k8s-0                       2/2     Running   0          15h
prometheus-k8s-1                       2/2     Running   0          15h
prometheus-operator-7466f6887f-9nsk8   1/1     Running   0          15h
# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.244.69.39     <none>        9093:30903/TCP      15h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   15h
grafana                 NodePort    10.244.86.54     <none>        3000:30902/TCP      15h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   15h
node-exporter           ClusterIP   None             <none>        9100/TCP            15h
prometheus-k8s          NodePort    10.244.226.104   <none>        9090:30900/TCP      15h
prometheus-operated     ClusterIP   None             <none>        9090/TCP            15h
prometheus-operator     ClusterIP   10.244.9.203     <none>        8080/TCP            15h
# kubectl -n monitoring get endpoints
NAME                    ENDPOINTS                                                        AGE
alertmanager-main       10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093               15h
alertmanager-operated   10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093 + 3 more...   15h
grafana                 10.244.2.8:3000                                                  15h
kube-state-metrics      10.244.2.9:9443,10.244.2.9:8443                                  15h
node-exporter           192.168.100.102:9100,192.168.100.103:9100,192.168.100.105:9100   15h
prometheus-k8s          10.244.2.11:9090,10.244.35.5:9090                                15h
prometheus-operated     10.244.2.11:9090,10.244.35.5:9090                                15h
prometheus-operator     10.244.35.3:8080                                                 15h
# kubectl -n monitoring get servicemonitors
NAME                      AGE
alertmanager              15h
kube-apiserver            15h
kube-controller-manager   15h
kube-scheduler            15h
kube-state-metrics        15h
kubelet                   15h
node-exporter             15h
prometheus                15h
prometheus-operator       15h
# kubectl get customresourcedefinitions
NAME                                    AGE
alertmanagers.monitoring.coreos.com     11d
prometheuses.monitoring.coreos.com      11d
servicemonitors.monitoring.coreos.com   11d
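When checking a deploy like the one above, it can help to filter `kubectl get pod` output down to anything that is not fully ready or not Running. The `not_ready` helper below is a hypothetical convenience, not part of the Operator; the sample input is illustrative:

```shell
# Print the name of any pod whose READY count is not n/n or whose STATUS
# is not "Running". Reads `kubectl get pod` output on stdin.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}

# Illustrative input; in practice: kubectl get pod -n monitoring | not_ready
printf '%s\n' \
  'NAME READY STATUS RESTARTS AGE' \
  'alertmanager-main-0 2/2 Running 0 15h' \
  'prometheus-k8s-0 1/2 CrashLoopBackOff 4 15h' \
  | not_ready
# → prometheus-k8s-0
```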
Note:
During deployment I changed all image addresses to pull from a local registry, but some pods still pulled images from the remote registry, as shown below:
Here the alertmanager image could not be pulled. The fix is to pull the image locally first, then save it and distribute it to each node:
# docker save 23744b2d645c -o alertmanager-v0.14.0.tar.gz
# ansible node -m copy -a 'src=alertmanager-v0.14.0.tar.gz dest=/root'
# ansible node -a 'docker load -i /root/alertmanager-v0.14.0.tar.gz'
192.168.100.104 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
192.168.100.103 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
# ansible node -a 'docker images quay.io/prometheus/alertmanager'
192.168.100.103 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB
192.168.100.104 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB
2. Adding etcd monitoring
Prometheus Operator ships an etcd dashboard, but extra configuration is required before it is fully populated. Official documentation: Monitoring external etcd
a. Create the secret in the namespace
# kubectl -n monitoring create secret generic etcd-certs \
    --from-file=/etc/kubernetes/ssl/ca.pem \
    --from-file=/etc/kubernetes/ssl/etcd.pem \
    --from-file=/etc/kubernetes/ssl/etcd-key.pem
secret "etcd-certs" created
# kubectl -n monitoring get secrets etcd-certs
NAME         TYPE     DATA   AGE
etcd-certs   Opaque   3      16h
Note: these certificates were created when the etcd cluster was deployed; change the paths to wherever your own certificates are stored.
b. Give the Prometheus Operator access to the secret
# vim manifests/prometheus/prometheus-k8s.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  secrets:
  - etcd-certs
  version: v2.2.1
# kubectl -n monitoring replace -f manifests/prometheus/prometheus-k8s.yaml
prometheus "k8s" replaced
Note:
Only the following lines need to be added here:
secrets:
- etcd-certs
c. Create the Service, Endpoints, and ServiceMonitor
# vim manifests/prometheus/prometheus-etcd.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.100.102
    nodeName: etcd1
  - ip: 192.168.100.103
    nodeName: etcd2
  - ip: 192.168.100.104
    nodeName: etcd3
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      # use insecureSkipVerify only if you cannot use a Subject Alternative Name
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - monitoring
# kubectl create -f manifests/prometheus/prometheus-etcd.yaml
Note 1: change the etcd IP addresses and node names to match your own configuration.
Note 2: in the three entries under tlsConfig, only the trailing ca.pem, etcd.pem, and etcd-key.pem need to be changed to your own certificate file names. If you are unsure, you can exec into the prometheus-k8s pod to check:
# kubectl exec -ti -n monitoring prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem  etcd-key.pem  etcd.pem
3. Once the Prometheus Operator deployment is complete, it exposes three ports: 30900 for Prometheus, 30902 for Grafana, and 30903 for Alertmanager.
The Prometheus UI is shown below; if everything is working, all targets should be up.
Alertmanager is shown below
Grafana's monitoring dashboards are shown below
The etcd monitoring dashboard is shown below
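For reference, etcd dashboard panels are typically built on a few well-known etcd metrics; the expressions below are common examples and may differ from the exact panels in the bundled Grafana dashboard:

```promql
# 1 if the member currently has a leader, 0 if not
etcd_server_has_leader

# leader churn over the last hour
rate(etcd_server_leader_changes_seen_total[1h])

# 99th percentile WAL fsync latency
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
```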
The Kubernetes cluster dashboard is shown below
The node monitoring dashboard is shown below
Monitoring Kubernetes with Prometheus Operator
Original article: http://blog.51cto.com/wangzhijian/2096182