Using Jib
Environment
- Jib 0.9.10
- Jdk1.8
Foreword
What
Jib builds container images for Java applications and ships as a Maven plugin and a Gradle plugin for fast image builds.
Strengths: 1. fast builds; 2. no dependency on the docker command and no Dockerfile to write; 3. layered image builds, just like Docker.
Drawbacks: 1. volume mounts cannot be declared; 2. some Java launch commands cannot be used, e.g. java -jar **.jar --spring.profiles.active=prod ...
Features
Simple
Build steps before Jib
Build steps with Jib
Developers do not need to know the details of a Dockerfile; Jib builds the image and pushes it for them.
Fast
Why is it fast? Jib adopts Docker's layer concept and builds incrementally. A traditional build packs the whole jar into the image, whereas Jib uploads the dependencies, resources and compiled class files as separate layers.
No dependency on Docker
The build process does not depend on Docker being installed locally at all.
Drawbacks
A volume cannot be specified through Jib; it has to be built into the base image in advance.
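As a workaround, data paths can simply be bind-mounted when the container is started instead of being declared in the image. A minimal sketch, assuming a placeholder image name and host path:

```bash
# Placeholder image and paths: mount the host directory at run time
# rather than declaring a VOLUME inside the image.
docker run -d \
  -v /data/myapp/logs:/app/logs \
  registry.example.com/myapp:latest
```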
Usage
Configuration details

    <build>
Common commands
Build the application image and push it
Without local Docker

    $ mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.11:build -Dimage=<MY IMAGE>
With local Docker

    $ mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.10:dockerBuild
Package the image as a tar archive

    # Generates jib-image.tar under the project's target directory.
Export the image's Dockerfile (Docker build context)

    # Generated under target/jib-docker-context by default. -DjibTargetDir specifies a custom output path.
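A full invocation sketch; the goal name exportDockerContext is my assumption for jib-maven-plugin 0.9.x, and the output directory is a placeholder:

```bash
# Assumed goal name for jib-maven-plugin 0.9.x; -DjibTargetDir is optional.
mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.10:exportDockerContext \
  -DjibTargetDir=target/my-docker-context
```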
Configuring credentials for a private registry
If no credHelper is used, credentials can be supplied in one of the following ways.
Via Docker Credential Helpers
Via Maven configuration
Configure the settings.xml file
- For Maven password encryption, see Password Encryption.

    <settings>
Example
Goal
Build a Java application image with Spring Boot and run it with a specified profile.
To be able to run with a specified profile, the main startup code has to be changed.
The command normally used to start the application is:

    java -jar -Xms512m -Xmx512m **.jar --spring.profiles.active=prod

In the image built by Jib, the actual startup command in the ENTRYPOINT is:

    java -Xms512m -Xmx512m -cp app/libs/*:app/resources:app/classes package.MonitorApplication

To select a profile, an argument has to be passed when the container is started.

    @SpringBootApplication
Configure pom.xml

    <?xml version="1.0" encoding="UTF-8"?>
Build and push the image

    $ mvn package
Start the container

    $ docker pull [image name]
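A run sketch under the assumptions above: the image name is a placeholder, and the trailing profile argument is consumed by the modified main method to set spring.profiles.active.

```bash
# Placeholder image name; the trailing "prod" argument is appended to the
# ENTRYPOINT and picked up by the modified SpringApplication main method.
docker run -d --name monitor registry.example.com/monitor:latest prod
```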
Combining with Ansible scripts
Configure the Ansible playbook

    $ mkdir -p /opt/ansible-biz/roles/monitor/tasks

Run

    $ ansible-playbook -i hosts -e profile=prod sity.yml
Extended Study
Use an alpine or distroless image as the base image for the build.
Reference Resources
- GitHub: Jib (google/jib)
- Jib introduction video
- Jib Blog
- Jib PPT
ZooKeeper learning map
Foreword
The map above is my summary of what I need to learn about ZooKeeper, and I will record my notes following this outline.
Learning resources
- Official website
- Book: From Paxos to ZooKeeper (《从Paxos到ZooKeeper》)
Server deployment
Standard deployment
Docker deployment
ansible-playbook deployment
Clients
ZKClient
Curator
ZAB
Typical use cases
Operations
Source code walkthrough
Docker in practice: a ZooKeeper cluster
Environment
- centos7
- zookeeper 3.4.12
- docker 18.03.1-ce
Preface
Build a ZooKeeper cluster on top of Docker.
Install
Before starting, sort out a few key points:
- The ZooKeeper configuration files, logs and data files need to be mapped onto the host.
- For the image provided on Docker Hub, find out the ZooKeeper paths inside the container (check its Dockerfile).
Pull Image
    $ docker pull zookeeper:3.4.12
Prepare
create persistent directory
    # Create the data persistence directory, the log directory and the startup configuration directory
create zoo.cfg and myid

    # Adjust the configuration to the number of cluster nodes you need; a three-node cluster is used as the example here.
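A preparation sketch for the three-node example; the host paths are placeholders for the directories created above, and dataDir points at the container-side data path, which I assume to be /data for the official image (check its Dockerfile as noted earlier):

```bash
# Placeholder host directories created in the previous step.
CONF_DIR=/opt/zookeeper/conf
DATA_DIR=/opt/zookeeper/data

# zoo.cfg for a three-node ensemble; server.N must match each node's myid,
# and dataDir is the assumed data path inside the container.
cat > ${CONF_DIR}/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data
clientPort=2181
server.1=server1:2888:3888
server.2=server2:2888:3888
server.3=server3:2888:3888
EOF

# On each host, write that node's own id (1, 2 or 3) into myid.
echo 1 > ${DATA_DIR}/myid
```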
Start
    # Start each machine separately; remember to change the value of --name
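A start-up sketch, assuming the official zookeeper:3.4.12 image whose container paths I take to be /conf, /data and /datalog (verify against its Dockerfile), host networking, and the placeholder host directories from above:

```bash
# Assumed container paths for the official image; --network host lets the
# nodes reach each other directly on 2181/2888/3888.
docker run -d --name zk-node1 \
  --network host \
  -v /opt/zookeeper/conf:/conf \
  -v /opt/zookeeper/data:/data \
  -v /opt/zookeeper/logs:/datalog \
  zookeeper:3.4.12
```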
Don’t Forget
    # Open the required ports on the firewall
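For firewalld on CentOS 7, a sketch of the ports ZooKeeper typically needs: 2181 for clients and 2888/3888 for quorum traffic.

```bash
# Open the client and quorum ports, then reload firewalld.
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --reload
```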
Use Ansible-playbook
To be completed.
References
docker zookeeper
Docker部署Zookeeper集群
Understanding Spring Boot Actuator
Environment
- Spring boot 2.0.4
- Jdk1.8
Preface
Today the log files in production filled up the disk, which was rather unfortunate. Since the problem happened, the cause and a once-and-for-all solution had to be found.
After sorting things out, the following problems were identified:
- Logging in the code does not follow an effective convention; there is no clear rule for when to use debug vs. info.
- Some debug logs help with troubleshooting in production, so the ability to switch log levels dynamically is needed.
Everyone sees logging conventions differently; my own thoughts are written up in 《Java日志规范看法》 (Thoughts on Java logging conventions).
For the second problem, it turns out Spring Boot Actuator provides exactly this capability, which prompted me to dig into it.
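Jumping ahead, the capability comes from the loggers endpoint described below. A usage sketch, assuming the endpoint is exposed over HTTP and using a placeholder port and logger name:

```bash
# Inspect the current level of a logger (placeholder port and logger name).
curl http://localhost:8080/actuator/loggers/com.example.service

# Switch it to DEBUG at run time; POST INFO again to switch back.
curl -X POST -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "DEBUG"}' \
  http://localhost:8080/actuator/loggers/com.example.service
```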
Spring Boot Actuator
○ Spring Boot includes a number of additional features to help you monitor and manage your application when you push it to production. You can choose to manage and monitor your application by using HTTP endpoints or with JMX. Auditing, health, and metrics gathering can also be automatically applied to your application.
☆ Spring Boot lets you manage and monitor an application over HTTP or JMX, covering auditing, health and metrics gathering.
Dependencies
Maven project

    <dependencies>
Available endpoints
- See Usage for detailed usage.
The table below lists 16 endpoints usable over both HTTP and JMX. Another 4 endpoints are available only in a web application.
- Requests are prefixed with /actuator
For example, checking the application's health over HTTP: /actuator/health

    $ curl http://localhost:[port]/actuator/health
Endpoint list:
| ID | Description | Enabled by default |
|---|---|---|
| auditevents | Exposes audit events information for the current application. | Yes |
| beans | Displays a complete list of all the Spring beans in your application. | Yes |
| conditions | Shows the conditions that were evaluated on configuration and auto-configuration classes and the reasons why they did or did not match. | Yes |
| configprops | Displays a collated list of all @ConfigurationProperties. | Yes |
| env | Exposes properties from Spring's ConfigurableEnvironment. | Yes |
| flyway | Shows any Flyway database migrations that have been applied. | Yes |
| health | Shows application health information. | Yes |
| httptrace | Displays HTTP trace information (by default, the last 100 HTTP request-response exchanges). | Yes |
| info | Displays arbitrary application info. | Yes |
| loggers | Shows and modifies the configuration of loggers in the application. | Yes |
| liquibase | Shows any Liquibase database migrations that have been applied. | Yes |
| metrics | Shows 'metrics' information for the current application. | Yes |
| mappings | Displays a collated list of all @RequestMapping paths. | Yes |
| scheduledtasks | Displays the scheduled tasks in your application. | Yes |
| sessions | Allows retrieval and deletion of user sessions from a Spring Session-backed session store. Not available when using Spring Session's support for reactive web applications. | Yes |
| shutdown | Lets the application be gracefully shutdown. | No |
| threaddump | Performs a thread dump. | Yes |
○ If your application is a web application (Spring MVC, Spring WebFlux, or Jersey), you can use the following additional endpoints.
| ID | Description | Enabled by default |
|---|---|---|
| heapdump | Returns a GZip compressed hprof heap dump file. | Yes |
| jolokia | Exposes JMX beans over HTTP (when Jolokia is on the classpath, not available for WebFlux). | Yes |
| logfile | Returns the contents of the logfile (if logging.file or logging.path properties have been set). Supports the use of the HTTP Range header to retrieve part of the log file's content. | Yes |
| prometheus | Exposes metrics in a format that can be scraped by a Prometheus server. | Yes |
Enabling endpoints
- The 20 endpoints provided by Actuator have the following default exposure:
| ID | JMX | Web |
|---|---|---|
| auditevents | Yes | No |
| beans | Yes | No |
| conditions | Yes | No |
| configprops | Yes | No |
| env | Yes | No |
| flyway | Yes | No |
| health | Yes | Yes |
| heapdump | N/A | No |
| httptrace | Yes | No |
| info | Yes | Yes |
| jolokia | N/A | No |
| logfile | N/A | No |
| loggers | Yes | No |
| liquibase | Yes | No |
| metrics | Yes | No |
| mappings | Yes | No |
| prometheus | N/A | No |
| scheduledtasks | Yes | No |
| sessions | Yes | No |
| shutdown | Yes | No |
| threaddump | Yes | No |
JMX
All endpoints are enabled over JMX by default, except prometheus, logfile, jolokia and heapdump, which are not available over JMX.
Web
Over the web, only the health and info endpoints are exposed by default.
How to configure this properly:
JMX

    # expose:  management.endpoints.jmx.exposure.include
    # exclude: management.endpoints.jmx.exposure.exclude
    # e.g. exclude the mappings and shutdown endpoints:
    management.endpoints.jmx.exposure.exclude=mappings,shutdown
    # e.g. exclude all endpoints:
    management.endpoints.jmx.exposure.exclude=*
    # Note: * has a special meaning in YAML, so it must be quoted as "*"

Web

    # expose:  management.endpoints.web.exposure.include
    # exclude: management.endpoints.web.exposure.exclude
    # e.g. expose the env and mappings endpoints:
    management.endpoints.web.exposure.include=env,mappings
    # e.g. expose all endpoints:
    management.endpoints.web.exposure.include=*
    # Note: * has a special meaning in YAML, so it must be quoted as "*"
Detailed endpoint usage
Requests are prefixed with /actuator. See the Usage documentation for specifics.
Configuring endpoint caching
○ Endpoints automatically cache responses to read operations that do not take any parameters.
☆ Officially, responses of parameterless read operations are cached by default; the exact time-to-live can only be found in the source code.
The relevant source classes are EndpointAutoConfiguration and EndpointIdTimeToLivePropertyFunction.
It is configured as follows:

    # The prefix management.endpoint.<name> is used to uniquely identify the endpoint that is being configured.
Customizing endpoint paths
base-path
Defaults to /actuator.
path-mapping
The mapping path of an endpoint; by default it is the endpoint's name for the 20 built-in endpoints.
The full request path is [base-path] + [path-mapping].

    # The health endpoint used to be /actuator/health; it is now changed to /manager/healthcheck
Custom endpoints
○ If you add a @Bean annotated with @Endpoint, any methods annotated with @ReadOperation, @WriteOperation, or @DeleteOperation are automatically exposed over JMX and, in a web application, over HTTP as well. Endpoints can be exposed over HTTP using Jersey, Spring MVC, or Spring WebFlux.
○ You can also write technology-specific endpoints by using @JmxEndpoint or @WebEndpoint. These endpoints are restricted to their respective technologies. For example, @WebEndpoint is exposed only over HTTP and not over JMX.
○ You can write technology-specific extensions by using @EndpointWebExtension and @EndpointJmxExtension. These annotations let you provide technology-specific operations to augment an existing endpoint.
○ Finally, if you need access to web-framework-specific functionality, you can implement Servlet or Spring @Controller and @RestController endpoints at the cost of them not being available over JMX or when using a different web framework.
☆ Actuator exposes endpoints over both JMX and HTTP, so it also offers annotations for each flavor:
- @Endpoint targets both JMX and HTTP.
- @JmxEndpoint or @EndpointJmxExtension targets JMX only.
- @WebEndpoint or @EndpointWebExtension targets HTTP only.
- @ReadOperation, @WriteOperation and @DeleteOperation determine the request method:

| Operation | HTTP method |
|---|---|
| @ReadOperation | GET |
| @WriteOperation | POST |
| @DeleteOperation | DELETE |

☆ Is there any special value in custom Actuator endpoints?
For plain HTTP there is little difference from exposing an endpoint myself, but for JMX monitoring this is where Actuator shines.
Also, unless there is a specific requirement, I think @Endpoint is the better choice, since it provides both a JMX monitoring endpoint and an HTTP one at the same time.
How to use it
○ To allow the input to be mapped to the operation method's parameters, Java code implementing an endpoint should be compiled with -parameters, and Kotlin code implementing an endpoint should be compiled with -java-parameters. This will happen automatically if you are using Spring Boot's Gradle plugin or if you are using Maven and spring-boot-starter-parent.
☆ When using @ReadOperation and the other operation annotations, request input is mapped to method parameters by name by default; for the parameters to be matched automatically, the Spring Boot application must be compiled with -parameters.
Adding a new endpoint
- Using @Endpoint with @ReadOperation, @WriteOperation and @DeleteOperation. Example:

    @Component

Started with the -parameters compiler flag:

    # Calls the health2 method: http://localhost:[port]/actuator/custom-health/{anystring}

Started without the -parameters flag:

    # Calls the parameterless health method

☆ Summary: multi-parameter requests to Actuator endpoints cannot be matched without parameter names, so the -parameters flag has to be added at build time.
- Using @EndpointWebExtension or @EndpointJmxExtension

    @Endpoint(id = "myhealth")
Overriding a built-in endpoint
Currently a built-in endpoint can only be overridden through an extension; I tested this by overriding the health endpoint.

    @Component
Open questions
What is the default time-to-live of the endpoint cache?
The official site has more articles on monitoring still to study.
References
spring-boot-2.0.4-doc
Custom Endpoint in Spring Boot Actuator
How to make the @Endpoint(id = “health”) working in Spring Boot 2.0?
Understanding DNS in Kubernetes
Foreword
My understanding of DNS in k8s was still somewhat fuzzy, so here I organize what I took away from reading the articles below.
- Using CoreDNS for Service Discovery
- 在 Kubernetes 中配置私有 DNS 和上游域名服务器
- DNS for Services and Pods
- Customizing DNS Service
The DNS service provided by Kubernetes
DNS is a built-in Pod/Service in Kubernetes and contains three containers:
kubedns
Watches the Kubernetes master for changes to Services and Endpoints, keeps them in memory, and serves DNS lookups.
dnsmasq
Caches DNS responses to speed up lookups.
sidecar
Provides health-check endpoints for dnsmasq and kubedns.
Kube-DNS and CoreDNS
Kubernetes offers two DNS services:
- kube-dns
- CoreDNS
Since v1.11, CoreDNS has been GA and serves as the DNS service of Kubernetes. (CoreDNS is an independent CNCF project.)
kube-dns was what was used before version 1.9.
From v1.11 on, if you still want to keep using kube-dns (not recommended), pass the following flag when initializing the cluster:

    $ kubeadm init --feature-gates=CoreDNS=false
Configuring stub domains and upstream DNS servers for kube-dns
There are two concepts here: stub domains and upstream nameservers.
- upstreamNameservers overrides the /etc/resolv.conf of the node, and at most three upstream nameservers can be configured.
Example:

    apiVersion: v1
If a DNS query has the suffix acme.local, it is answered by the custom DNS server at 1.2.3.4.
| Domain name | Server answering the query |
|---|---|
| kubernetes.default.svc.cluster.local | kube-dns |
| foo.acme.local | custom DNS (1.2.3.4) |
| widget.com | upstream DNS (one of 8.8.8.8, 8.8.4.4) |
How a Pod's dnsPolicy affects DNS lookups
When a Pod's dnsPolicy is set to Default or None, custom stub domains and upstream nameservers do not take effect.
When dnsPolicy is ClusterFirst:
Without custom stub domains and upstream nameservers configured
Queries without a matching domain suffix, such as www.kubernetes.io, are forwarded to the upstream nameserver.
With custom stub domains and upstream nameservers configured
First the kube-dns DNS cache is consulted.
Then the custom stub domain, i.e. the custom DNS in the diagram.
Finally the upstream DNS.
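A quick verification sketch from inside the cluster, assuming a throwaway busybox pod and the example domains above:

```bash
# Resolve a cluster-internal name and a stub-domain name from a temporary pod
# to see which server answers (names taken from the example above).
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup foo.acme.local
```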
Configuring stub domains and upstream DNS servers for CoreDNS
CoreDNS is extended through a chain of plugins, which makes it very flexible. A default CoreDNS installation ships with around 30 plugins, and its features are composed from one or more of them. Anyone who knows Go and understands how DNS works can write a plugin.
CoreDNS is configured through the Corefile, with the following syntax:

    coredns.io {

See the official CoreDNS site for details.
Since v1.10, kubeadm can automatically convert the ConfigMap into a Corefile.
Example:
kube-dns uses the following configuration, with stubDomains and upstream nameservers.

    apiVersion: v1

The equivalent Corefile configuration is:
- For federations:

    federation cluster.local {

- For stubDomains:

    abc.com:53 {

The full configuration is as follows; DNS uses the UDP protocol on port 53.

    .:53 {
How DNS records are generated
DNS holds A records and SRV records. An A record maps a domain name to an IP; an SRV record maps ports.
Service
Services are either headless or non-headless (a headless Service has .spec.clusterIP set to None).
After a Service is created, a DNS A record is generated by default in the form [.metadata.name].[namespace].svc.cluster.local.
A DNS SRV (port) record is also generated, in the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local.
For a non-headless Service, the port's DNS mapping is [.metadata.name].[namespace].svc.cluster.local.
For a headless Service I am not entirely clear yet, so for now I quote the original text: "For a headless service, this resolves to multiple answers, one for each pod that is backing the service, and contains the port number and the domain name of the pod of the form auto-generated-name.my-svc.my-namespace.svc.cluster.local."
Pod
Creating a Pod generates a DNS A record: pod-ip-address.my-namespace.pod.cluster.local.
Inside the cluster a Pod can be looked up with the form [.metadata.name].[.spec.subdomain].[namespace].svc.cluster.local.
DNS policies for Pods
Set via the .spec.dnsPolicy field. There are four policies:
Default
Despite the name, this is not the default policy.
The Pod inherits the name resolution configuration from the node that the pods run on.
ClusterFirst
Cluster DNS takes precedence; if nothing is found there, the query goes to the upstream nameserver. Both the cluster DNS and the upstream DNS can be configured.
ClusterFirstWithHostNet
For Pods running with hostNetwork, you should explicitly set its DNS policy.
None
DNS settings from the Kubernetes environment are ignored and the custom DNS configuration given in .spec.dnsConfig is used instead.
How to set a Pod's DNS resolution configuration manually
The feature must be enabled in the cluster: --feature-gates=CustomPodDNS=true.
For example:

    apiVersion: v1

Once running, the following is generated in the Pod's /etc/resolv.conf:

    nameserver 1.2.3.4
Custom DNS service
Translation of "DNS for Services and Pods"
Original link:
DNS for Services and Pods
This article is the Kubernetes overview of DNS.
- Introduction
- Services
- Pods
Introduction
Kubernetes DNS schedules a DNS Pod and Service in the cluster and configures the kubelets to tell individual containers to use the DNS Service's IP to resolve DNS names.
What things get DNS names?
Every Service in the cluster is assigned a DNS name. By default, a client Pod's DNS search list includes the Pod's own namespace and the cluster's default domain. An example illustrates this best:
Assume a Service named foo in the Kubernetes namespace bar. A Pod running in namespace bar can look up this Service simply by doing a DNS query for foo. A Pod running in namespace quux can look it up by doing a DNS query for foo.bar.
The following sections detail the supported record types and the layouts that are supported.
Services
A Record
Normal (not headless) Services are assigned a DNS A record of the form my-svc.my-namespace.svc.cluster.local. This resolves to the cluster IP of the Service.
Headless Services (without a cluster IP) are also assigned a DNS A record of the form my-svc.my-namespace.svc.cluster.local. Unlike for normal Services, this resolves to the set of IPs of the Pods backing the Service.
SRV Record
SRV records are created for named ports that are part of normal or headless Services. For each named port, the SRV record has the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local. For a headless service, this resolves to multiple answers, each containing the port number and a Pod domain name of the form auto-generated-name.my-svc.my-namespace.svc.cluster.local.
Pods
A Record
Pods are assigned a DNS A record of the form pod-ip-address.my-namespace.pod.cluster.local.
For example, a Pod with IP 1.2.3.4 in the namespace default with a DNS name of cluster.local has the record 1-2-3-4.default.pod.cluster.local.
Pod’s hostname and subdomain fields
When a Pod is created, its hostname defaults to the Pod's metadata.name value.
The Pod spec has an optional hostname field for specifying the Pod's hostname, and .spec.hostname takes precedence over .metadata.name. For example, if a Pod's hostname is set to "my-host", its hostname becomes "my-host".
The Pod spec also has an optional subdomain field for specifying the Pod's subdomain. For example, a Pod with hostname "foo" and subdomain "bar" in namespace "my-namespace" gets the fully qualified name foo.bar.my-namespace.svc.cluster.local.
Example:

    apiVersion: v1

If a headless Service exists in the same namespace as the Pod and has the same name as the Pod's subdomain, the Kubernetes DNS service also resolves that name. For example, in the configuration above, with a Pod whose hostname is "busybox-1" and whose subdomain is "default-subdomain", and a headless Service named "default-subdomain", the DNS name is busybox-1.default-subdomain.my-namespace.svc.cluster.local.
Common Kubernetes commands
Controllers
Deployment
    # nginx-deployment.yaml
Creating Deployments
- With --record set, the command is recorded with the Deployment revision, and the recorded command shows up under Annotations in describe output.

    $ kubectl create -f nginx-deployment.yaml --record
Viewing the rollout history
- "rollout" literally means going live; I read it as "release".
Replace [metadata.name] with metadata.name from the file above, i.e. nginx-deployment.

    # kubectl rollout history deployment/[metadata.name]
Viewing the details of a specific revision

    $ kubectl rollout history deployment/nginx-deployment --revision=1
Listing Deployments

    $ kubectl get deployment [metadata.name] (optional)
Listing ReplicaSets (RS)

    $ kubectl get rs
Updating Deployments (triggers an automatic rollout)

    # Option 1: change the spec.template.spec.containers[0].image value in the file
Checking rollout status

    # kubectl rollout status deployment/[metadata.name]
Describing a Deployment
- Pay attention to Events, which record the rollout: a new ReplicaSet (nginx-deployment-6ccc5f4cbb) is created, and the Pods of the old ReplicaSet (nginx-deployment-966857787) are gradually scaled down.

    # kubectl describe deployment/[metadata.name]
Rolling back
After a rollback, the corresponding revision entry disappears from the history.
undo rolls back to the previous revision.
Assume four versions (nginx:1.4.7, nginx:1.7.9, nginx:1.9.7, nginx:1.10.3), with nginx:1.10.3 currently deployed.
1. Roll back to 1.7.9 with --to-revision, then run undo without --to-revision: you are back on 1.10.3.
2. Run undo without --to-revision, which rolls back to 1.9.7; run undo again and you are restored to 1.10.3.
3. Roll back with --to-revision to 1.7.9, then to 1.9.7, then to 1.4.7; a fourth undo without --to-revision rolls back to 1.9.7.
Now assume three versions (nginx:1.7.9, nginx:1.9.7, nginx:1.10.3), with nginx:1.10.3 currently deployed.
    # View the rollout history

The first undo goes back to the previous version, nginx:1.9.7; running it a second time rolls back to the original version, nginx:1.10.3.

    # kubectl rollout undo deployment/[metadata.name]

Use --to-revision to roll back to a specific revision

    # kubectl rollout undo deployment/[metadata.name] --to-revision=[number]
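A walkthrough sketch of the three-version scenario above; the deployment name and revision number are placeholders, so check kubectl rollout history for the real values:

```bash
# List revisions and note which one corresponds to nginx:1.9.7 (placeholder: 2).
kubectl rollout history deployment/nginx-deployment

# Plain undo: 1.10.3 -> 1.9.7; a second plain undo would return to 1.10.3.
kubectl rollout undo deployment/nginx-deployment

# Or roll back explicitly to a chosen revision.
kubectl rollout undo deployment/nginx-deployment --to-revision=2
```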
Scaling a Deployment
Scale up or down to a given number of replicas

    # kubectl scale deployment/[metadata.name] --replicas=[number]

Once horizontal pod autoscaling is enabled in the cluster, the Deployment can be scaled up or down within a range based on CPU utilization. (TODO: I have not worked out how yet.)

    $ kubectl autoscale deployment nginx-deployment --min=10 --max=15 --cpu-percent=80
Limiting the Deployment rollout history
Set the spec.revisionHistoryLimit field in nginx-deployment.yaml; by default all history is kept, so this can usually be left alone.
Setting the Deployment rollout strategy
spec.strategy specifies the strategy for replacing old Pods with new ones. spec.strategy.type can be Recreate or RollingUpdate; RollingUpdate is the default.
- Recreate kills all existing Pods before new ones are created. (Strongly discouraged.)
- RollingUpdate performs a rolling upgrade, replacing Pods one by one. maxUnavailable and maxSurge control the rolling update process (see the sketch after this list).
  - maxUnavailable: .spec.strategy.rollingUpdate.maxUnavailable sets how many Pods may be unavailable during the upgrade. The default is 1; a percentage is also allowed. If set to 30%, the old ReplicaSet is immediately scaled down to 70%.
  - maxSurge: .spec.strategy.rollingUpdate.maxSurge caps the total number of old plus new Pods during the upgrade. If set to 30%, the new ReplicaSet scales up immediately when the rolling update starts, and the total number of old and new Pods may not exceed 130% of the desired Pod count.
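A hedged sketch of setting these fields on an existing Deployment with kubectl patch; the deployment name and values are placeholders:

```bash
# Patch the rolling-update parameters of an existing Deployment
# (placeholder name and values; adjust to your own Deployment).
kubectl patch deployment nginx-deployment --type=strategic -p \
  '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":"30%"}}}}'
```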
The paused setting
.spec.paused is an optional boolean field, false by default.
Once paused is set, changes to the Deployment's PodTemplateSpec no longer trigger a new rollout.
To study later
How to enable horizontal pod autoscaling in the cluster.
References
Kubernetes apis
Getting started with Gluster
Notice
Using the XFS filesystem is recommended.
○ Typically, XFS is recommended but it can be used with other filesystems as well. Most commonly EXT4 is used when XFS isn’t, but you can (and many, many people do) use another filesystem that suits you.
☆ XFS is the recommended filesystem, but EXT4 and other filesystems work as well.
Correct DNS entries (forward and reverse) and NTP are essential.
☆ DNS usually needs no special configuration; the defaults are fine. NTP means the clocks of all machines must be synchronized and all nodes kept in the same time zone. There are many ways to synchronize; just use one consistently.
Firewalls are great, except when they aren't. In case you absolutely need to set up a firewall, have a look at Setting up clients for information on the ports used.
☆ Opening a firewall between Gluster nodes is not recommended. If a firewall really is necessary, I allow traffic per IP, which keeps the complexity down.
2 CPUs, 2 GB of RAM, 1 GbE (gigabit network)
☆ This is the minimum configuration a server needs.
Clients
Gluster Native Client
○ The Gluster Native Client is a FUSE-based client running in user space. Gluster Native Client is the recommended method for accessing volumes when high concurrency and high write performance is required.
☆ This is the recommended client. It is built on the kernel's FUSE support and performs better under high concurrency and heavy writes.
NFS Client
Foreword
What
○ GlusterFS is a scalable network filesystem suitable for data-intensive tasks such as cloud storage and media streaming. GlusterFS is free and open source software and can utilize common off-the-shelf hardware.
GlusterFS isn’t really a filesystem in and of itself. It concatenates existing filesystems into one (or more) big chunks so that data being written into or read out of Gluster gets distributed across multiple hosts simultaneously
☆ Gluster is open-source, scalable, distributed data storage management software. It is not a filesystem itself; it only provides the glue that assembles distributed filesystems into one larger file storage system.
Concept
TSP
○ A trusted storage pool(TSP) is a trusted network of storage servers. Before you can configure a GlusterFS volume, you must create a trusted storage pool of the storage servers that will provide bricks to the volume by peer probing the servers. The servers in a TSP are peers of each other.
☆ Gluster uses the TSP (trusted storage pool) to determine which machines may provide storage services.
Brick
○ A brick is used to refer to any device (really this means filesystem) that is being used for Gluster storage.
☆ The storage unit Gluster works with. On Linux, fdisk -l is commonly used to list the attached disks; to Gluster, a brick is its "attached disk", bound to a disk in a one-to-one mapping.
Gluster volume
○ A Gluster volume is a collection of one or more bricks (of course, typically this is two or more). This is analogous to /etc/exports entries for NFS.
☆ A collection of bricks.
Global Namespace
○ The term Global Namespace is a fancy way of saying a Gluster volume.
☆ Another name for a Gluster volume.
Export
○ An export refers to the mount path of the brick(s) on a given server, for example, /export/brick1.
☆ I cannot quite make sense of this yet; to be filled in later.
GNFS and kNFS
○ GNFS is how we refer to our inline NFS server. kNFS stands for kernel NFS, or, as most people would say, just plain NFS. Most often, you will want kNFS services disabled on the Gluster nodes. Gluster NFS doesn’t take any additional configuration and works just like you would expect with NFSv3. It is possible to configure Gluster and NFS to live in harmony if you want to.
☆ GNFS is Gluster's built-in NFS server; once the glusterd daemon is running, Gluster nodes exchange data with each other through GNFS. kNFS is the NFS service of the Linux kernel. The two do not interfere with each other and can be used together.
How
Install
For CentOS, see "Installing Gluster" below; other systems have their own installation guides.
System package versions
The package versions and dependencies for each distribution: Packages.
Managing the Trusted Storage Pool
○ The firewall on the servers must be configured to allow access to port 24007.
☆ Storage nodes communicate over port 24007.
Assume four machines, server1, server2, server3 and server4, in one TSP.
Add servers to the trusted storage pool
Run on any one of the machines.
- gluster peer probe

    $ gluster peer probe <server>
List Servers
Run on any machine.
- gluster pool list

    # Assume this is run on server1.
Viewing peer status
- gluster peer status

    # Assume this is run on server1.
Removing Servers
    # gluster peer detach <server>
Brick Naming Conventions
The recommended layout is /data/glusterfs/<volname>/<brick>/brick, an alias bound to a Linux disk. For a disk such as /dev/sdb, the <volname> segment can name the environment it serves (for example test) so that environments can be told apart, and the <brick> segment can be named freely; I use it to distinguish business domains, e.g. es for the search engine and logs for log storage. Different workloads may also have different disk performance needs, which can be separated across multiple disks.
To make sense of the brick naming convention you first need to understand what a brick is. On Linux, fdisk -l is commonly used to list the attached disks; to Gluster, a brick is its attached disk, bound to a physical disk in a one-to-one mapping.
For example:
Say a physical disk /dev/sdb is used in the test environment for ordinary business workloads, storing logs and the like.

    $ mkdir -p /data/glusterfs/test/biz

One question here: why are bricks needed at all?
Suppose server1 has two disks, /dev/sda and /dev/sdb, while server2 has /dev/sdb1 and /dev/sdb2. The disk names differ, and bricks are used to hide these differences in the names and performance of the underlying disks.

    $ mkdir -p /data/glusterfs/test/biz
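A brick-preparation sketch, assuming an empty disk /dev/sdb that may be formatted as XFS and the example path above; all device names and paths are placeholders, and the LVM-based approach belongs to the next section.

```bash
# WARNING: formats /dev/sdb (placeholder device); only do this on an empty disk.
mkfs.xfs -i size=512 /dev/sdb

# Mount it at the path used in the example above and persist the mount.
mkdir -p /data/glusterfs/test/biz
echo '/dev/sdb /data/glusterfs/test/biz xfs defaults 0 0' >> /etc/fstab
mount -a

# Create the actual brick directory inside the mount point.
mkdir -p /data/glusterfs/test/biz/brick
```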
Formatting and Mounting Bricks
To be completed. This mainly involves the Linux volume concepts: LV (logical volume), VG (volume group) and PV (physical volume).
https://wiki.archlinux.org/index.php/LVM
https://linux.cn/article-5117-1.html
https://askubuntu.com/questions/417642/logical-volume-physical-volume-and-volume-groups
Set ACL
To be completed. This mainly covers using Linux ACLs with Gluster.
Volume Types
○ A volume is a logical collection of bricks.
The volume types provided by Gluster are listed below.
Distributed - Distributed volumes distribute files across the bricks in the volume. You can use distributed volumes where the requirement is to scale storage and the redundancy is either not important or is provided by other hardware/software layers.
Replicated – Replicated volumes replicate files across bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.
Distributed Replicated - Distributed replicated volumes distribute files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high-reliability is critical. Distributed replicated volumes also offer improved read performance in most environments.
Dispersed - Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. It stores an encoded fragment of the original file to each brick in a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator on volume creation time.
☆ The key here is the erasure code algorithm. Erasure code (EC) refers to the Reed-Solomon erasure coding scheme built on a Vandermonde matrix.
It can recover lost data with relatively little redundancy; compared with the 100% redundancy of replicas, this approach saves much more space.
The drdr.xp Blog article on the topic is recommended reading.
Distributed Dispersed - Distributed dispersed volumes distribute files across dispersed subvolumes. This has the same advantages of distribute replicate volumes, but using disperse to store the data into the bricks.
- Striped [Deprecated] 、Distributed Striped [Deprecated] 、Distributed Striped Replicated [Deprecated]、Striped Replicated [Deprecated]
☆ So there are three main volume types (Distributed, Replicated, Dispersed) plus two combined ones: Distributed Replicated and Distributed Dispersed.
Create Command
stripe has been deprecated, so only the replica and disperse volume options remain.

    # gluster volume create [stripe | replica | disperse] [transport tcp | rdma | tcp,rdma]
Distributed
Pros
Saves space and is easy to scale.
Cons
Risk of data loss: since there is no redundancy, data is lost as soon as a machine fails.
Note: Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
    # gluster volume create [transport tcp | rdma | tcp,rdma]
Replicated
Pros
Data is redundant, so it remains available even after some of it is lost.
Cons
Consumes considerably more storage space.
Note:
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
GlusterFS will fail to create a replicate volume if more than one brick of a replica set is present on the same peer. For eg. a four node replicated volume where more than one brick of a replica set is present on the same peer.
☆ This volume type must not place multiple bricks of one replica set on the same machine. For example:

    $ gluster volume create <volname> replica 4 server1:/brick1 server1:/brick2 server2:/brick3 server4:/brick4
    volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Here server1:/brick1 and server1:/brick2 were both specified, hence the error: two copies of a replica set cannot live on the same machine.

    # gluster volume create [replica ] [transport tcp | rdma | tcp,rdma]
Dispersed
Pros
Like replica, data stays highly available, while using far less storage.
Cons
None found so far.
○ Dispersed volumes are based on erasure codes.
☆ Based on the erasure code algorithm. Unlike replica, it keeps data highly available while drastically reducing the amount of redundant data.
In distributed systems, three replicas are usually chosen to keep data highly available, with an expected durability of roughly eleven nines (a 99.999999999% chance of not losing data). Industry reports back this figure up, e.g. the hard-drive failure statistics published by Backblaze.

    # Normally redundancy can be left unspecified; the create command will prompt for a count.
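A dispersed-volume sketch, assuming six peers already in the pool and bricks prepared at placeholder paths; disperse 6 with redundancy 2 tolerates the loss of any two bricks.

```bash
# Placeholder servers and brick paths: 6 bricks in total, any 2 may fail
# without losing data.
gluster volume create disp-vol disperse 6 redundancy 2 \
  server1:/data/glusterfs/disp/brick1/brick \
  server2:/data/glusterfs/disp/brick2/brick \
  server3:/data/glusterfs/disp/brick3/brick \
  server4:/data/glusterfs/disp/brick4/brick \
  server5:/data/glusterfs/disp/brick5/brick \
  server6:/data/glusterfs/disp/brick6/brick
gluster volume start disp-vol
```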
Distributed Replicated
Pros
Data is redundant and remains available after data loss.
Cons
Unlike plain Replicated, where the redundant copies land is somewhat random, which makes management harder; storage consumption is also high.
Note: - Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
GlusterFS will fail to create a distribute replicate volume if more than one brick of a replica set is present on the same peer. For eg. for a four node distribute (replicated) volume where more than one brick of a replica set is present on the same peer.
☆ This type of volume likewise refuses to put more than one brick of a replica set on the same machine, otherwise it errors out.

    $ gluster volume create <volname> replica 4 server1:/brick1 server1:/brick2 server2:/brick3 server4:/brick4
    volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Here again server1:/brick1 and server1:/brick2 were both specified, hence the error.
Judging by the command alone, Replicated and Distributed Replicated look the same; the difference is simply that more bricks are listed than the replica count.

    # gluster volume create [replica ] [transport tcp | rdma | tcp,rdma]
Distributed Dispersed
Installing Gluster
Environment
- Centos 7
- Gluster 4.1
Three machines are used:
- 192.168.1.100 server1
- 192.168.1.101 server2
- 192.168.1.102 server3
Foreword
Kubernetes persistence needs Gluster. These are my notes from working through the Quick Start tutorial on CentOS.
Install
Prepare
DNS and NTP
If there are no special DNS requirements, the default configuration is fine; every server's clock should be synchronized and all servers kept in the same time zone.
Hosts setup
It is recommended to manage Gluster by server name, which requires an IP-to-name mapping configured on every machine.

    $ vim /etc/hosts
Add Yum Repository
Add the Gluster download repository

    $ yum install centos-release-gluster
Use XFS
The XFS filesystem is recommended; here I stayed on ext4 and did not reformat. There are plenty of formatting tutorials online.
Install Gluster And Start
    $ yum install glusterfs-server
Set Firewalld
○ By default, glusterd will listen on tcp/24007. But each time you add a brick, it will open a new port (that you’ll be able to see with “gluster volume status”)
☆ The default port is 24007, but every brick you add listens on an additional port, which you can see with gluster volume status. When configuring the firewall I therefore allow traffic per IP instead.

    # The TCP Port shown here, 49152, is the port that would need to be opened.

CentOS 7 uses firewalld by default.

    # Allow the specific IPs. With three test machines, each machine has to allow the other two servers' IPs.
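A firewalld sketch of the per-IP approach, run on server1 and assuming the example addresses above; repeat on each node with the other peers' IPs.

```bash
# Trust all traffic from the two peer nodes (placeholder IPs from the example),
# then reload firewalld to apply the permanent rules.
firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="192.168.1.101" accept'
firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="192.168.1.102" accept'
firewall-cmd --reload
```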
Set Trusted Pool
Three machines are used here; on any one server, simply add the other two to the trusted pool.
For example, on server1, add server2 and server3 to the trusted pool.

    # Note: the name mappings must be configured in /etc/hosts.
Create a Volume
Create the /bricks/brick1/gv0 directory on each of the three servers.

    # Brick naming convention recommended by the docs: /data/glusterfs/<volume>/<brick>/brick

Attach the directories to Gluster as a volume (run the command on any node).

    # The replica mode is chosen here to guard against data loss; other modes are not covered here.

View the volume information

    $ gluster volume info
Testing
    # Mount server1:/gv0 at the /mnt directory
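A test sketch under the assumptions above (volume gv0, mount point /mnt); the client machine needs the glusterfs FUSE client installed.

```bash
# Mount the volume through the native FUSE client and write a few test files.
mount -t glusterfs server1:/gv0 /mnt
for i in $(seq 1 10); do cp -v /var/log/messages /mnt/copy-$i; done

# On each server, the files should show up under the brick directory.
ls -l /bricks/brick1/gv0/
```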
Summary
- An odd number of nodes is recommended, otherwise split-brain can occur. (I suspect a Paxos-like algorithm is used.)
- Gluster recommends the XFS filesystem; CentOS / Red Hat Enterprise Linux 7 uses it by default, though it may not have been specified when the disk was formatted.