Adapted from brother Yangming's tech blog; I found a few problems in the original and have added pitfalls from my own hands-on practice.
Normally we just watch the resource charts on the Dashboard, but for production you clearly need a more automated way to monitor the cluster, the Pods, and even individual containers. Kubernetes ships with a built-in monitoring stack: influxdb + grafana + heapster. However, since our application-level business monitoring already uses Prometheus, we will use Prometheus for cluster monitoring here as well.
Introduction to Prometheus
Prometheus is a monitoring system open-sourced by SoundCloud. Its design draws on Google's internal monitoring systems, which makes it a natural fit for Kubernetes, itself born at Google. Compared with the influxdb stack it also performs better and has alerting built in. It was designed for large clusters and uses a pull model for data collection: you simply implement a metrics endpoint in your application and tell Prometheus about it.
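To get a feel for what "implementing a metrics endpoint" means, you can inspect any Prometheus-style endpoint with curl (a sketch; <node-ip> is a placeholder, and it assumes node-exporter's default port 9100, which we deploy below):
$ # Every Prometheus endpoint serves plain text: "# HELP" / "# TYPE" comments
$ # followed by one "metric_name value" sample per line.
$ curl -s http://<node-ip>:9100/metrics | head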
Deploying the Prometheus configuration file
First we ship the Prometheus configuration file as a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'kubernetes-cluster'
      scheme: https
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - api_servers:
        - 'http://10.139.15.113:8080'
        role: node
    - job_name: 'kubernetes-nodes-cadvisor'
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - api_servers:
        - 'http://10.139.15.113:8080'
        in_cluster: true
        role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:4194'
        target_label: __address__
    - job_name: 'kubernetes-apiserver-cadvisor'
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - api_servers:
        - 'http://10.139.15.113:8080'
        in_cluster: true
        role: apiserver
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:10255'
        target_label: __address__
    - job_name: 'kubernetes-node-exporter'
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - api_servers:
        - 'http://10.139.15.113:8080'
        in_cluster: true
        role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:31672'
        target_label: __address__
Save the above as prometheus-config.yaml, then run:
$ kubectl create -f prometheus-config.yaml
Notes:
- For job_name=kubernetes-apiserver-cadvisor, port 10250 is replaced with 10255. Port 10255 is the kubelet's read-only metrics endpoint; you can inspect it on a node with curl http://<node-ip>:10255/metrics.
- For job_name=kubernetes-nodes-cadvisor, port 10250 is replaced with 4194, the cAdvisor container-monitoring port built into Kubernetes. Before k8s 1.7 port 10255 was enough, but from 1.7 on the cAdvisor data is no longer served through the kubelet's metrics endpoint, so be careful here.
- For job_name=kubernetes-node-exporter, port 10250 is replaced with 31672, the NodePort exposed by node-exporter; adjust this to your own setup (a quick way to verify all three ports is sketched right after this list).
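If you are not sure which of these ports are actually serving on your nodes, a quick probe settles it (a sketch; <node-ip> is a placeholder, and the ports assume the versions used in this article):
$ curl -s http://<node-ip>:10255/metrics | head   # kubelet read-only metrics
$ curl -s http://<node-ip>:4194/metrics | head    # cAdvisor (k8s 1.7+)
$ curl -s http://<node-ip>:31672/metrics | head   # node-exporter NodePort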
Deploying node-exporter
Deploy node-exporter first. To collect metrics from every node, we deploy it as a DaemonSet so that one pod runs per node:
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter
Save the above as node-exporter.yaml, then run:
$ kubectl create -f node-exporter.yaml
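A quick sanity check (the label and NodePort come from the manifest above; <node-ip> is a placeholder):
$ # One node-exporter pod should be Running per node:
$ kubectl get pods -n kube-system -l k8s-app=node-exporter -o wide
$ # And the NodePort should answer on any node:
$ curl -s http://<node-ip>:31672/metrics | head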
Deploying Prometheus
Next we deploy Prometheus itself as a Deployment; prometheus-deploy.yaml looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v1.0.1
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "-config.file=/etc/prometheus/prometheus.yml"
        - "-storage.local.path=/prometheus"
        - "-storage.local.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
          subPath: prometheus
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 1Gi
      volumes:
      - name: data
        emptyDir: {}
      - configMap:
          name: prometheus-config
        name: config-volume
Save the above as prometheus-deploy.yaml, then run:
$ kubectl create -f prometheus-deploy.yaml
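Verify that the pod came up; a crash-looping pod at this point usually means a mistake in the ConfigMap (the pod name below is a placeholder):
$ kubectl get pods -n kube-system -l k8s-app=prometheus
$ kubectl logs -n kube-system <prometheus-pod-name>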
Create prometheus-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  labels:
    k8s-app: prometheus
  namespace: kube-system
spec:
  ports:
  - protocol: TCP
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
Create the Prometheus Service:
$ kubectl create -f prometheus-service.yaml
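The Service should pick up the Prometheus pod as an endpoint; an empty ENDPOINTS column usually means the selector does not match the pod labels:
$ kubectl get svc prometheus -n kube-system
$ kubectl get endpoints prometheus -n kube-system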
Next, expose the service so the Prometheus UI is reachable. You can forward it to your local machine with kubectl port-forward:
$ POD=`kubectl get pod -l k8s-app=prometheus -n kube-system -o go-template --template '{{range .items}}{{.metadata.name}}{{end}}'`
$ kubectl port-forward $POD -n kube-system 9090:9090
Then open http://localhost:9090 in a browser to reach the Prometheus UI; the Status > Targets page shows whether each scrape job is up.
Here, however, we expose it to the outside world through an Ingress instead.
Deploying ingress
Deploy the ingress-controller. Following the official docs, I enabled the RBAC authorization switch.
- First, create the defaultbackend Deployment:
$ curl -o default-backend.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/default-backend.yaml
The default-backend.yaml file looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: default-http-backend
  template:
    metadata:
      labels:
        app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        # Any image is permissible as long as:
        # 1. It serves a 404 page at /
        # 2. It serves 200 on a /healthz endpoint
        image: kirago/defaultbackend:1.4
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---
apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  namespace: kube-system
  labels:
    app: default-http-backend
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: default-http-backend
- Run the create command:
$ kubectl create -f default-backend.yaml
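Verify the backend behaves as the comment in the manifest demands: 200 on /healthz and 404 on / (checked here via the pod IP from a machine that can reach it, e.g. a cluster node; <pod-ip> is a placeholder):
$ kubectl get pods -n kube-system -l app=default-http-backend -o wide
$ curl -s -o /dev/null -w '%{http_code}\n' http://<pod-ip>:8080/healthz   # expect 200
$ curl -s -o /dev/null -w '%{http_code}\n' http://<pod-ip>:8080/          # expect 404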
- Create the ingress ConfigMap:
$ curl -o ingress-configmap.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/configmap.yaml
The ingress-configmap.yaml file looks like this:
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: kube-system
  labels:
    app: ingress-nginx
- Run the create command:
$ kubectl create -f ingress-configmap.yaml
- Create tcp-services-configmap.yaml:
$ curl -o tcp-services-configmap.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/tcp-services-configmap.yaml
- The tcp-services-configmap.yaml file looks like this:
kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: kube-system
- Run the create command:
$ kubectl create -f tcp-services-configmap.yaml
- Create udp-services-configmap.yaml:
$ curl -o udp-services-configmap.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/udp-services-configmap.yaml
- The udp-services-configmap.yaml file looks like this:
kind: ConfigMap
apiVersion: v1
metadata:
  name: udp-services
  namespace: kube-system
- Run the create command:
$ kubectl create -f udp-services-configmap.yaml
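At this point all three ConfigMaps the controller will be pointed at should exist:
$ kubectl get configmap nginx-configuration tcp-services udp-services -n kube-system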
- Create the ingress-rbac.yaml file:
$ curl -o ingress-rbac.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/rbac.yaml
- The ingress-rbac.yaml file looks like this:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-ingress-serviceaccount
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: nginx-ingress-clusterrole
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - nodes
  - pods
  - secrets
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
- apiGroups:
  - "extensions"
  resources:
  - ingresses/status
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: nginx-ingress-role
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - secrets
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - configmaps
  resourceNames:
  # Defaults to "<election-id>-<ingress-class>"
  # Here: "<ingress-controller-leader>-<nginx>"
  # This has to be adapted if you change either parameter
  # when launching the nginx-ingress-controller.
  - "ingress-controller-leader-nginx"
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: nginx-ingress-role-nisa-binding
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nginx-ingress-role
subjects:
- kind: ServiceAccount
  name: nginx-ingress-serviceaccount
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: nginx-ingress-clusterrole-nisa-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-ingress-clusterrole
subjects:
- kind: ServiceAccount
  name: nginx-ingress-serviceaccount
  namespace: kube-system
- Run the following command:
$ kubectl create -f ingress-rbac.yaml
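Confirm the RBAC objects exist before starting the controller; under RBAC, a missing ServiceAccount is usually why the controller pod fails to start:
$ kubectl get serviceaccount nginx-ingress-serviceaccount -n kube-system
$ kubectl get clusterrole nginx-ingress-clusterrole
$ kubectl get role nginx-ingress-role -n kube-system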
- Create the ingress-with-rbac.yaml file:
$ curl -o ingress-with-rbac.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/with-rbac.yaml
- The ingress-with-rbac.yaml file looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --annotations-prefix=nginx.ingress.kubernetes.io
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
- Run the create command:
$ kubectl create -f ingress-with-rbac.yaml
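The controller exposes its own health endpoint on port 10254, the same port the probes in the manifest use, so a quick check is easy (run the curl from a machine that can reach pod IPs, e.g. a cluster node; <controller-pod-ip> is a placeholder):
$ kubectl get pods -n kube-system -l app=ingress-nginx -o wide
$ curl -s http://<controller-pod-ip>:10254/healthz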
Deploying an Ingress resource
- The YAML file looks like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: kube-system
spec:
  rules:
  - host: prometheus.local # replace with your own domain
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus
          servicePort: 9090
Save the above as prometheus-ingress.yaml, then run:
$ kubectl create -f prometheus-ingress.yaml
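Before touching DNS you can already test the rule by sending the Host header directly to wherever the controller is reachable (a sketch; <controller-ip> is a placeholder for the controller pod or whatever address you exposed it on):
$ kubectl get ingress prometheus-ingress -n kube-system
$ curl -H 'Host: prometheus.local' http://<controller-ip>/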
Heads up
Since the Ingress is what gets exposed outside the cluster, anything sitting in front of it, such as an F5 hardware load balancer, must be able to resolve the domain the Ingress serves and route it into the cluster. I fell into exactly this pit when building my local environment; once the resolution was configured, access worked fine (keep this in mind when testing yourself). A minimal hosts-file workaround for local testing is sketched below.
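For local testing without a real DNS record, a hosts-file entry pointing the domain at the controller's address is enough (a sketch; <controller-ip> is a placeholder):
$ echo '<controller-ip> prometheus.local' | sudo tee -a /etc/hosts
$ curl http://prometheus.local/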
Final result: the Prometheus UI is now reachable through the Ingress at the configured domain.