Using Jib
Environment
- Jib 0.9.10
- Jdk1.8
Foreword
What
Jib builds container images for Java applications and ships as a Maven plugin and a Gradle plugin for fast image builds.
Strengths: 1. fast builds; 2. no dependency on the docker command and no Dockerfile to write; 3. layered image builds, just like Docker.
Drawbacks: 1. volume mounts cannot be declared; 2. some Java launch commands cannot be used, e.g. java -jar **.jar --spring.profiles.active=prod ...
Features
Simple
Build steps before Jib
Build steps with Jib
Developers do not need to know the details of a Dockerfile; Jib builds the image and pushes it for them.
Fast
Why is it fast? Jib adopts Docker's layer concept and builds incrementally. A traditional build packs the whole jar into the image, whereas Jib uploads the dependencies, resources and compiled class files as separate layers.
No dependency on Docker
The build process does not depend on Docker being installed locally at all.
Drawbacks
A volume cannot be specified through Jib; it has to be built into the base image in advance.
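As a workaround, data paths can simply be bind-mounted when the container is started instead of being declared in the image. A minimal sketch, assuming a placeholder image name and host path:

```bash
# Placeholder image and paths: mount the host directory at run time
# rather than declaring a VOLUME inside the image.
docker run -d \
  -v /data/myapp/logs:/app/logs \
  registry.example.com/myapp:latest
```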
Usage
Configuration details

    <build>
Common commands
Build the application image and push it
Without local Docker

    $ mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.11:build -Dimage=<MY IMAGE>
With local Docker

    $ mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.10:dockerBuild
Package the image as a tar archive

    # Generates jib-image.tar under the project's target directory.
Export the image's Dockerfile (Docker build context)

    # Generated under target/jib-docker-context by default. -DjibTargetDir specifies a custom output path.
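A full invocation sketch; the goal name exportDockerContext is my assumption for jib-maven-plugin 0.9.x, and the output directory is a placeholder:

```bash
# Assumed goal name for jib-maven-plugin 0.9.x; -DjibTargetDir is optional.
mvn compile com.google.cloud.tools:jib-maven-plugin:0.9.10:exportDockerContext \
  -DjibTargetDir=target/my-docker-context
```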
Configuring credentials for a private registry
If no credHelper is used, credentials can be supplied in one of the following ways.
Via Docker Credential Helpers
Via Maven configuration
Configure the settings.xml file
- For Maven password encryption, see Password Encryption.

    <settings>
Example
Goal
Build a Java application image with Spring Boot and run it with a specified profile.
To be able to run with a specified profile, the main startup code has to be changed.
The command normally used to start the application is:

    java -jar -Xms512m -Xmx512m **.jar --spring.profiles.active=prod

In the image built by Jib, the actual startup command in the ENTRYPOINT is:

    java -Xms512m -Xmx512m -cp app/libs/*:app/resources:app/classes package.MonitorApplication

To select a profile, an argument has to be passed when the container is started.

    @SpringBootApplication
Configure pom.xml

    <?xml version="1.0" encoding="UTF-8"?>
Build and push the image

    $ mvn package
Start the container

    $ docker pull [image name]
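A run sketch under the assumptions above: the image name is a placeholder, and the trailing profile argument is consumed by the modified main method to set spring.profiles.active.

```bash
# Placeholder image name; the trailing "prod" argument is appended to the
# ENTRYPOINT and picked up by the modified SpringApplication main method.
docker run -d --name monitor registry.example.com/monitor:latest prod
```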
Combining with Ansible scripts
Configure the Ansible playbook

    $ mkdir -p /opt/ansible-biz/roles/monitor/tasks

Run

    $ ansible-playbook -i hosts -e profile=prod sity.yml
Extended Study
Use an alpine or distroless image as the base image for the build.
Reference Resources
- GitHub: Jib (google/jib)
- Jib introduction video
- Jib Blog
- Jib PPT
ZooKeeper learning map
Foreword
The map above is my summary of what I need to learn about ZooKeeper, and I will record my notes following this outline.
Learning resources
- Official website
- Book: From Paxos to ZooKeeper (《从Paxos到ZooKeeper》)
Server deployment
Standard deployment
Docker deployment
ansible-playbook deployment
Clients
ZKClient
Curator
ZAB
Typical use cases
Operations
Source code walkthrough
Docker in practice: a ZooKeeper cluster
Environment
- centos7
- zookeeper 3.4.12
- docker 18.03.1-ce
Preface
Build a ZooKeeper cluster on top of Docker.
Install
Before starting, sort out a few key points:
- The ZooKeeper configuration files, logs and data files need to be mapped onto the host.
- For the image provided on Docker Hub, find out the ZooKeeper paths inside the container (check its Dockerfile).
Pull Image
    $ docker pull zookeeper:3.4.12
Prepare
create persistent directory
    # Create the data persistence directory, the log directory and the startup configuration directory
create zoo.cfg and myid

    # Adjust the configuration to the number of cluster nodes you need; a three-node cluster is used as the example here.
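A preparation sketch for the three-node example; the host paths are placeholders for the directories created above, and dataDir points at the container-side data path, which I assume to be /data for the official image (check its Dockerfile as noted earlier):

```bash
# Placeholder host directories created in the previous step.
CONF_DIR=/opt/zookeeper/conf
DATA_DIR=/opt/zookeeper/data

# zoo.cfg for a three-node ensemble; server.N must match each node's myid,
# and dataDir is the assumed data path inside the container.
cat > ${CONF_DIR}/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data
clientPort=2181
server.1=server1:2888:3888
server.2=server2:2888:3888
server.3=server3:2888:3888
EOF

# On each host, write that node's own id (1, 2 or 3) into myid.
echo 1 > ${DATA_DIR}/myid
```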
Start
    # Start each machine separately; remember to change the value of --name
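A start-up sketch, assuming the official zookeeper:3.4.12 image whose container paths I take to be /conf, /data and /datalog (verify against its Dockerfile), host networking, and the placeholder host directories from above:

```bash
# Assumed container paths for the official image; --network host lets the
# nodes reach each other directly on 2181/2888/3888.
docker run -d --name zk-node1 \
  --network host \
  -v /opt/zookeeper/conf:/conf \
  -v /opt/zookeeper/data:/data \
  -v /opt/zookeeper/logs:/datalog \
  zookeeper:3.4.12
```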
Don’t Forget
    # Open the required ports on the firewall
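For firewalld on CentOS 7, a sketch of the ports ZooKeeper typically needs: 2181 for clients and 2888/3888 for quorum traffic.

```bash
# Open the client and quorum ports, then reload firewalld.
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --reload
```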
Use Ansible-playbook
To be completed.
References
docker zookeeper
Docker部署Zookeeper集群
Understanding Spring Boot Actuator
Environment
- Spring boot 2.0.4
- Jdk1.8
Preface
Today the log files in production filled up the disk, which was rather unfortunate. Since the problem happened, the cause and a once-and-for-all solution had to be found.
After sorting things out, the following problems were identified:
- Logging in the code does not follow an effective convention; there is no clear rule for when to use debug vs. info.
- Some debug logs help with troubleshooting in production, so the ability to switch log levels dynamically is needed.
Everyone sees logging conventions differently; my own thoughts are written up in 《Java日志规范看法》 (Thoughts on Java logging conventions).
For the second problem, it turns out Spring Boot Actuator provides exactly this capability, which prompted me to dig into it.
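Jumping ahead, the capability comes from the loggers endpoint described below. A usage sketch, assuming the endpoint is exposed over HTTP and using a placeholder port and logger name:

```bash
# Inspect the current level of a logger (placeholder port and logger name).
curl http://localhost:8080/actuator/loggers/com.example.service

# Switch it to DEBUG at run time; POST INFO again to switch back.
curl -X POST -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "DEBUG"}' \
  http://localhost:8080/actuator/loggers/com.example.service
```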
Spring Boot Actuator
○ Spring Boot includes a number of additional features to help you monitor and manage your application when you push it to production. You can choose to manage and monitor your application by using HTTP endpoints or with JMX. Auditing, health, and metrics gathering can also be automatically applied to your application.
☆ Spring Boot lets you manage and monitor an application over HTTP or JMX, covering auditing, health and metrics gathering.
Dependencies
Maven project

    <dependencies>
Available endpoints
- See Usage for detailed usage.
The table below lists 16 endpoints usable over both HTTP and JMX. Another 4 endpoints are available only in a web application.
- Requests are prefixed with /actuator
For example, checking the application's health over HTTP: /actuator/health

    $ curl http://localhost:[port]/actuator/health
Endpoint list:
| ID | Description | Enabled by default |
|---|---|---|
| auditevents | Exposes audit events information for the current application. | Yes |
| beans | Displays a complete list of all the Spring beans in your application. | Yes |
| conditions | Shows the conditions that were evaluated on configuration and auto-configuration classes and the reasons why they did or did not match. | Yes |
| configprops | Displays a collated list of all @ConfigurationProperties. | Yes |
| env | Exposes properties from Spring's ConfigurableEnvironment. | Yes |
| flyway | Shows any Flyway database migrations that have been applied. | Yes |
| health | Shows application health information. | Yes |
| httptrace | Displays HTTP trace information (by default, the last 100 HTTP request-response exchanges). | Yes |
| info | Displays arbitrary application info. | Yes |
| loggers | Shows and modifies the configuration of loggers in the application. | Yes |
| liquibase | Shows any Liquibase database migrations that have been applied. | Yes |
| metrics | Shows 'metrics' information for the current application. | Yes |
| mappings | Displays a collated list of all @RequestMapping paths. | Yes |
| scheduledtasks | Displays the scheduled tasks in your application. | Yes |
| sessions | Allows retrieval and deletion of user sessions from a Spring Session-backed session store. Not available when using Spring Session's support for reactive web applications. | Yes |
| shutdown | Lets the application be gracefully shutdown. | No |
| threaddump | Performs a thread dump. | Yes |
○ If your application is a web application (Spring MVC, Spring WebFlux, or Jersey), you can use the following additional endpoints.
| ID | Description | Enabled by default |
|---|---|---|
| heapdump | Returns a GZip compressed hprof heap dump file. | Yes |
| jolokia | Exposes JMX beans over HTTP (when Jolokia is on the classpath, not available for WebFlux). | Yes |
| logfile | Returns the contents of the logfile (if logging.file or logging.path properties have been set). Supports the use of the HTTP Range header to retrieve part of the log file's content. | Yes |
| prometheus | Exposes metrics in a format that can be scraped by a Prometheus server. | Yes |
Enabling endpoints
- The 20 endpoints provided by Actuator have the following default exposure:
| ID | JMX | Web |
|---|---|---|
| auditevents | Yes | No |
| beans | Yes | No |
| conditions | Yes | No |
| configprops | Yes | No |
| env | Yes | No |
| flyway | Yes | No |
| health | Yes | Yes |
| heapdump | N/A | No |
| httptrace | Yes | No |
| info | Yes | Yes |
| jolokia | N/A | No |
| logfile | N/A | No |
| loggers | Yes | No |
| liquibase | Yes | No |
| metrics | Yes | No |
| mappings | Yes | No |
| prometheus | N/A | No |
| scheduledtasks | Yes | No |
| sessions | Yes | No |
| shutdown | Yes | No |
| threaddump | Yes | No |
JMX
All endpoints are enabled over JMX by default, except prometheus, logfile, jolokia and heapdump, which are not available over JMX.
Web
Over the web, only the health and info endpoints are exposed by default.
How to configure this properly:
JMX

    # expose:  management.endpoints.jmx.exposure.include
    # exclude: management.endpoints.jmx.exposure.exclude
    # e.g. exclude the mappings and shutdown endpoints:
    management.endpoints.jmx.exposure.exclude=mappings,shutdown
    # e.g. exclude all endpoints:
    management.endpoints.jmx.exposure.exclude=*
    # Note: * has a special meaning in YAML, so it must be quoted as "*"

Web

    # expose:  management.endpoints.web.exposure.include
    # exclude: management.endpoints.web.exposure.exclude
    # e.g. expose the env and mappings endpoints:
    management.endpoints.web.exposure.include=env,mappings
    # e.g. expose all endpoints:
    management.endpoints.web.exposure.include=*
    # Note: * has a special meaning in YAML, so it must be quoted as "*"
Detailed endpoint usage
Requests are prefixed with /actuator. See the Usage documentation for specifics.
Configuring endpoint caching
○ Endpoints automatically cache responses to read operations that do not take any parameters.
☆ Officially, responses of parameterless read operations are cached by default; the exact time-to-live can only be found in the source code.
The relevant source classes are EndpointAutoConfiguration and EndpointIdTimeToLivePropertyFunction.
It is configured as follows:

    # The prefix management.endpoint.<name> is used to uniquely identify the endpoint that is being configured.
Customizing endpoint paths
base-path
Defaults to /actuator.
path-mapping
The mapping path of an endpoint; by default it is the endpoint's name for the 20 built-in endpoints.
The full request path is [base-path] + [path-mapping].

    # The health endpoint used to be /actuator/health; it is now changed to /manager/healthcheck
Custom endpoints
○ If you add a @Bean annotated with @Endpoint, any methods annotated with @ReadOperation, @WriteOperation, or @DeleteOperation are automatically exposed over JMX and, in a web application, over HTTP as well. Endpoints can be exposed over HTTP using Jersey, Spring MVC, or Spring WebFlux.
○ You can also write technology-specific endpoints by using @JmxEndpoint or @WebEndpoint. These endpoints are restricted to their respective technologies. For example, @WebEndpoint is exposed only over HTTP and not over JMX.
○ You can write technology-specific extensions by using @EndpointWebExtension and @EndpointJmxExtension. These annotations let you provide technology-specific operations to augment an existing endpoint.
○ Finally, if you need access to web-framework-specific functionality, you can implement Servlet or Spring @Controller and @RestController endpoints at the cost of them not being available over JMX or when using a different web framework.
☆ Actuator exposes endpoints over both JMX and HTTP, so it also offers annotations for each flavor:
- @Endpoint targets both JMX and HTTP.
- @JmxEndpoint or @EndpointJmxExtension targets JMX only.
- @WebEndpoint or @EndpointWebExtension targets HTTP only.
- @ReadOperation, @WriteOperation and @DeleteOperation determine the request method:

| Operation | HTTP method |
|---|---|
| @ReadOperation | GET |
| @WriteOperation | POST |
| @DeleteOperation | DELETE |

☆ Is there any special value in custom Actuator endpoints?
For plain HTTP there is little difference from exposing an endpoint myself, but for JMX monitoring this is where Actuator shines.
Also, unless there is a specific requirement, I think @Endpoint is the better choice, since it provides both a JMX monitoring endpoint and an HTTP one at the same time.
How to use it
○ To allow the input to be mapped to the operation method's parameters, Java code implementing an endpoint should be compiled with -parameters, and Kotlin code implementing an endpoint should be compiled with -java-parameters. This will happen automatically if you are using Spring Boot's Gradle plugin or if you are using Maven and spring-boot-starter-parent.
☆ When using @ReadOperation and the other operation annotations, request input is mapped to method parameters by name by default; for the parameters to be matched automatically, the Spring Boot application must be compiled with -parameters.
Adding a new endpoint
- Using @Endpoint with @ReadOperation, @WriteOperation and @DeleteOperation. Example:

    @Component

Started with the -parameters compiler flag:

    # Calls the health2 method: http://localhost:[port]/actuator/custom-health/{anystring}

Started without the -parameters flag:

    # Calls the parameterless health method

☆ Summary: multi-parameter requests to Actuator endpoints cannot be matched without parameter names, so the -parameters flag has to be added at build time.
- Using @EndpointWebExtension or @EndpointJmxExtension

    @Endpoint(id = "myhealth")
Overriding a built-in endpoint
Currently a built-in endpoint can only be overridden through an extension; I tested this by overriding the health endpoint.

    @Component
Open questions
What is the default time-to-live of the endpoint cache?
The official site has more articles on monitoring still to study.
References
spring-boot-2.0.4-doc
Custom Endpoint in Spring Boot Actuator
How to make the @Endpoint(id = “health”) working in Spring Boot 2.0?
Understanding DNS in Kubernetes
Foreword
My understanding of DNS in k8s was still somewhat fuzzy, so here I organize what I took away from reading the articles below.
- Using CoreDNS for Service Discovery
- 在 Kubernetes 中配置私有 DNS 和上游域名服务器
- DNS for Services and Pods
- Customizing DNS Service
The DNS service provided by Kubernetes
DNS is a built-in Pod/Service in Kubernetes and contains three containers:
kubedns
Watches the Kubernetes master for changes to Services and Endpoints, keeps them in memory, and serves DNS lookups.
dnsmasq
Caches DNS responses to speed up lookups.
sidecar
Provides health-check endpoints for dnsmasq and kubedns.
Kube-DNS and CoreDNS
Kubernetes offers two DNS services:
- kube-dns
- CoreDNS
Since v1.11, CoreDNS has been GA and serves as the DNS service of Kubernetes. (CoreDNS is an independent CNCF project.)
kube-dns was what was used before version 1.9.
From v1.11 on, if you still want to keep using kube-dns (not recommended), pass the following flag when initializing the cluster:

    $ kubeadm init --feature-gates=CoreDNS=false
Configuring stub domains and upstream DNS servers for kube-dns
There are two concepts here: stub domains and upstream nameservers.
- upstreamNameservers overrides the /etc/resolv.conf of the node, and at most three upstream nameservers can be configured.
Example:

    apiVersion: v1
If a DNS query has the suffix acme.local, it is answered by the custom DNS server at 1.2.3.4.
| Domain name | Server answering the query |
|---|---|
| kubernetes.default.svc.cluster.local | kube-dns |
| foo.acme.local | custom DNS (1.2.3.4) |
| widget.com | upstream DNS (one of 8.8.8.8, 8.8.4.4) |
How a Pod's dnsPolicy affects DNS lookups
When a Pod's dnsPolicy is set to Default or None, custom stub domains and upstream nameservers do not take effect.
When dnsPolicy is ClusterFirst:
Without custom stub domains and upstream nameservers configured
Queries without a matching domain suffix, such as www.kubernetes.io, are forwarded to the upstream nameserver.
With custom stub domains and upstream nameservers configured
First the kube-dns DNS cache is consulted.
Then the custom stub domain, i.e. the custom DNS in the diagram.
Finally the upstream DNS.
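A quick verification sketch from inside the cluster, assuming a throwaway busybox pod and the example domains above:

```bash
# Resolve a cluster-internal name and a stub-domain name from a temporary pod
# to see which server answers (names taken from the example above).
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup foo.acme.local
```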
Configuring stub domains and upstream DNS servers for CoreDNS
CoreDNS is extended through a chain of plugins, which makes it very flexible. A default CoreDNS installation ships with around 30 plugins, and its features are composed from one or more of them. Anyone who knows Go and understands how DNS works can write a plugin.
CoreDNS is configured through the Corefile, with the following syntax:

    coredns.io {

See the official CoreDNS site for details.
Since v1.10, kubeadm can automatically convert the ConfigMap into a Corefile.
Example:
kube-dns uses the following configuration, with stubDomains and upstream nameservers.

    apiVersion: v1

The equivalent Corefile configuration is:
- For federations:

    federation cluster.local {

- For stubDomains:

    abc.com:53 {

The full configuration is as follows; DNS uses the UDP protocol on port 53.

    .:53 {
How DNS records are generated
DNS holds A records and SRV records. An A record maps a domain name to an IP; an SRV record maps ports.
Service
Services are either headless or non-headless (a headless Service has .spec.clusterIP set to None).
After a Service is created, a DNS A record is generated by default in the form [.metadata.name].[namespace].svc.cluster.local.
A DNS SRV (port) record is also generated, in the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local.
For a non-headless Service, the port's DNS mapping is [.metadata.name].[namespace].svc.cluster.local.
For a headless Service I am not entirely clear yet, so for now I quote the original text: "For a headless service, this resolves to multiple answers, one for each pod that is backing the service, and contains the port number and the domain name of the pod of the form auto-generated-name.my-svc.my-namespace.svc.cluster.local."
Pod
Creating a Pod generates a DNS A record: pod-ip-address.my-namespace.pod.cluster.local.
Inside the cluster a Pod can be looked up with the form [.metadata.name].[.spec.subdomain].[namespace].svc.cluster.local.
DNS policies for Pods
Set via the .spec.dnsPolicy field. There are four policies:
Default
Despite the name, this is not the default policy.
The Pod inherits the name resolution configuration from the node that the pods run on.
ClusterFirst
Cluster DNS takes precedence; if nothing is found there, the query goes to the upstream nameserver. Both the cluster DNS and the upstream DNS can be configured.
ClusterFirstWithHostNet
For Pods running with hostNetwork, you should explicitly set its DNS policy.
None
DNS settings from the Kubernetes environment are ignored and the custom DNS configuration given in .spec.dnsConfig is used instead.
How to set a Pod's DNS resolution configuration manually
The feature must be enabled in the cluster: --feature-gates=CustomPodDNS=true.
For example:

    apiVersion: v1

Once running, the following is generated in the Pod's /etc/resolv.conf:

    nameserver 1.2.3.4
Custom DNS service
Translation of "DNS for Services and Pods"
Original link:
DNS for Services and Pods
This article is the Kubernetes overview of DNS.
- Introduction
- Services
- Pods
Introduction
Kubernetes DNS schedules a DNS Pod and Service in the cluster and configures the kubelets to tell individual containers to use the DNS Service's IP to resolve DNS names.
What things get DNS names?
Every Service in the cluster is assigned a DNS name. By default, a client Pod's DNS search list includes the Pod's own namespace and the cluster's default domain. An example illustrates this best:
Assume a Service named foo in the Kubernetes namespace bar. A Pod running in namespace bar can look up this Service simply by doing a DNS query for foo. A Pod running in namespace quux can look it up by doing a DNS query for foo.bar.
The following sections detail the supported record types and the layouts that are supported.
Services
A Record
Normal (not headless) Services are assigned a DNS A record of the form my-svc.my-namespace.svc.cluster.local. This resolves to the cluster IP of the Service.
Headless Services (without a cluster IP) are also assigned a DNS A record of the form my-svc.my-namespace.svc.cluster.local. Unlike for normal Services, this resolves to the set of IPs of the Pods backing the Service.
SRV Record
SRV records are created for named ports that are part of normal or headless Services. For each named port, the SRV record has the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local. For a headless service, this resolves to multiple answers, each containing the port number and a Pod domain name of the form auto-generated-name.my-svc.my-namespace.svc.cluster.local.
Pods
A Record
Pods are assigned a DNS A record of the form pod-ip-address.my-namespace.pod.cluster.local.
For example, a Pod with IP 1.2.3.4 in the namespace default with a DNS name of cluster.local has the record 1-2-3-4.default.pod.cluster.local.
Pod’s hostname and subdomain fields
When a Pod is created, its hostname defaults to the Pod's metadata.name value.
The Pod spec has an optional hostname field for specifying the Pod's hostname, and .spec.hostname takes precedence over .metadata.name. For example, if a Pod's hostname is set to "my-host", its hostname becomes "my-host".
The Pod spec also has an optional subdomain field for specifying the Pod's subdomain. For example, a Pod with hostname "foo" and subdomain "bar" in namespace "my-namespace" gets the fully qualified name foo.bar.my-namespace.svc.cluster.local.
Example:

    apiVersion: v1

If a headless Service exists in the same namespace as the Pod and has the same name as the Pod's subdomain, the Kubernetes DNS service also resolves that name. For example, in the configuration above, with a Pod whose hostname is "busybox-1" and whose subdomain is "default-subdomain", and a headless Service named "default-subdomain", the DNS name is busybox-1.default-subdomain.my-namespace.svc.cluster.local.
Common Kubernetes commands
Controllers
Deployment
    # nginx-deployment.yaml
Creating Deployments
- With --record set, the command is recorded with the Deployment revision, and the recorded command shows up under Annotations in describe output.

    $ kubectl create -f nginx-deployment.yaml --record
Viewing the rollout history
- "rollout" literally means going live; I read it as "release".
Replace [metadata.name] with metadata.name from the file above, i.e. nginx-deployment.

    # kubectl rollout history deployment/[metadata.name]
Viewing the details of a specific revision

    $ kubectl rollout history deployment/nginx-deployment --revision=1
Listing Deployments

    $ kubectl get deployment [metadata.name] (optional)
Listing ReplicaSets (RS)

    $ kubectl get rs
Updating Deployments (triggers an automatic rollout)

    # Option 1: change the spec.template.spec.containers[0].image value in the file
Checking rollout status

    # kubectl rollout status deployment/[metadata.name]
Describing a Deployment
- Pay attention to Events, which record the rollout: a new ReplicaSet (nginx-deployment-6ccc5f4cbb) is created, and the Pods of the old ReplicaSet (nginx-deployment-966857787) are gradually scaled down.

    # kubectl describe deployment/[metadata.name]
Rolling back
After a rollback, the corresponding revision entry disappears from the history.
undo rolls back to the previous revision.
Assume four versions (nginx:1.4.7, nginx:1.7.9, nginx:1.9.7, nginx:1.10.3), with nginx:1.10.3 currently deployed.
1. Roll back to 1.7.9 with --to-revision, then run undo without --to-revision: you are back on 1.10.3.
2. Run undo without --to-revision, which rolls back to 1.9.7; run undo again and you are restored to 1.10.3.
3. Roll back with --to-revision to 1.7.9, then to 1.9.7, then to 1.4.7; a fourth undo without --to-revision rolls back to 1.9.7.
Now assume three versions (nginx:1.7.9, nginx:1.9.7, nginx:1.10.3), with nginx:1.10.3 currently deployed.
    # View the rollout history

The first undo goes back to the previous version, nginx:1.9.7; running it a second time rolls back to the original version, nginx:1.10.3.

    # kubectl rollout undo deployment/[metadata.name]

Use --to-revision to roll back to a specific revision

    # kubectl rollout undo deployment/[metadata.name] --to-revision=[number]
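A walkthrough sketch of the three-version scenario above; the deployment name and revision number are placeholders, so check kubectl rollout history for the real values:

```bash
# List revisions and note which one corresponds to nginx:1.9.7 (placeholder: 2).
kubectl rollout history deployment/nginx-deployment

# Plain undo: 1.10.3 -> 1.9.7; a second plain undo would return to 1.10.3.
kubectl rollout undo deployment/nginx-deployment

# Or roll back explicitly to a chosen revision.
kubectl rollout undo deployment/nginx-deployment --to-revision=2
```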
Scaling a Deployment
Scale up or down to a given number of replicas

    # kubectl scale deployment/[metadata.name] --replicas=[number]

Once horizontal pod autoscaling is enabled in the cluster, the Deployment can be scaled up or down within a range based on CPU utilization. (TODO: I have not worked out how yet.)

    $ kubectl autoscale deployment nginx-deployment --min=10 --max=15 --cpu-percent=80
Limiting the Deployment rollout history
Set the spec.revisionHistoryLimit field in nginx-deployment.yaml; by default all history is kept, so this can usually be left alone.
Setting the Deployment rollout strategy
spec.strategy specifies the strategy for replacing old Pods with new ones. spec.strategy.type can be Recreate or RollingUpdate; RollingUpdate is the default.
- Recreate kills all existing Pods before new ones are created. (Strongly discouraged.)
- RollingUpdate performs a rolling upgrade, replacing Pods one by one. maxUnavailable and maxSurge control the rolling update process (see the sketch after this list).
  - maxUnavailable: .spec.strategy.rollingUpdate.maxUnavailable sets how many Pods may be unavailable during the upgrade. The default is 1; a percentage is also allowed. If set to 30%, the old ReplicaSet is immediately scaled down to 70%.
  - maxSurge: .spec.strategy.rollingUpdate.maxSurge caps the total number of old plus new Pods during the upgrade. If set to 30%, the new ReplicaSet scales up immediately when the rolling update starts, and the total number of old and new Pods may not exceed 130% of the desired Pod count.
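A hedged sketch of setting these fields on an existing Deployment with kubectl patch; the deployment name and values are placeholders:

```bash
# Patch the rolling-update parameters of an existing Deployment
# (placeholder name and values; adjust to your own Deployment).
kubectl patch deployment nginx-deployment --type=strategic -p \
  '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":"30%"}}}}'
```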
The paused setting
.spec.paused is an optional boolean field, false by default.
Once paused is set, changes to the Deployment's PodTemplateSpec no longer trigger a new rollout.
To study later
How to enable horizontal pod autoscaling in the cluster.
References
Kubernetes apis
Getting started with Gluster
Notice
Using the XFS filesystem is recommended.
○ Typically, XFS is recommended but it can be used with other filesystems as well. Most commonly EXT4 is used when XFS isn’t, but you can (and many, many people do) use another filesystem that suits you.
☆ XFS is the recommended filesystem, but EXT4 and other filesystems work as well.
Correct DNS entries (forward and reverse) and NTP are essential.
☆ DNS usually needs no special configuration; the defaults are fine. NTP means the clocks of all machines must be synchronized and all nodes kept in the same time zone. There are many ways to synchronize; just use one consistently.
Firewalls are great, except when they aren't. In case you absolutely need to set up a firewall, have a look at Setting up clients for information on the ports used.
☆ Opening a firewall between Gluster nodes is not recommended. If a firewall really is necessary, I allow traffic per IP, which keeps the complexity down.
2 CPUs, 2 GB of RAM, 1 GbE (gigabit network)
☆ This is the minimum configuration a server needs.
Clients
Gluster Native Client
○ The Gluster Native Client is a FUSE-based client running in user space. Gluster Native Client is the recommended method for accessing volumes when high concurrency and high write performance is required.
☆ This is the recommended client. It is built on the kernel's FUSE support and performs better under high concurrency and heavy writes.
NFS Client
Foreword
What
○ GlusterFS is a scalable network filesystem suitable for data-intensive tasks such as cloud storage and media streaming. GlusterFS is free and open source software and can utilize common off-the-shelf hardware.
GlusterFS isn’t really a filesystem in and of itself. It concatenates existing filesystems into one (or more) big chunks so that data being written into or read out of Gluster gets distributed across multiple hosts simultaneously
☆ Gluster is open-source, scalable, distributed data storage management software. It is not a filesystem itself; it only provides the glue that assembles distributed filesystems into one larger file storage system.
Concept
TSP
○ A trusted storage pool(TSP) is a trusted network of storage servers. Before you can configure a GlusterFS volume, you must create a trusted storage pool of the storage servers that will provide bricks to the volume by peer probing the servers. The servers in a TSP are peers of each other.
☆ Gluster uses the TSP (trusted storage pool) to determine which machines may provide storage services.
Brick
○ A brick is used to refer to any device (really this means filesystem) that is being used for Gluster storage.
☆ The storage unit Gluster works with. On Linux, fdisk -l is commonly used to list the attached disks; to Gluster, a brick is its "attached disk", bound to a disk in a one-to-one mapping.
Gluster volume
○ A Gluster volume is a collection of one or more bricks (of course, typically this is two or more). This is analogous to /etc/exports entries for NFS.
☆ A collection of bricks.
Global Namespace
○ The term Global Namespace is a fancy way of saying a Gluster volume.
☆ Another name for a Gluster volume.
Export
○ An export refers to the mount path of the brick(s) on a given server, for example, /export/brick1.
☆ I cannot quite make sense of this yet; to be filled in later.
GNFS and kNFS
○ GNFS is how we refer to our inline NFS server. kNFS stands for kernel NFS, or, as most people would say, just plain NFS. Most often, you will want kNFS services disabled on the Gluster nodes. Gluster NFS doesn’t take any additional configuration and works just like you would expect with NFSv3. It is possible to configure Gluster and NFS to live in harmony if you want to.
☆ GNFS is Gluster's built-in NFS server; once the glusterd daemon is running, Gluster nodes exchange data with each other through GNFS. kNFS is the NFS service of the Linux kernel. The two do not interfere with each other and can be used together.
How
Install
For CentOS, see "Installing Gluster" below; other systems have their own installation guides.
System package versions
The package versions and dependencies for each distribution: Packages.
Managing the Trusted Storage Pool
○ The firewall on the servers must be configured to allow access to port 24007.
☆ Storage nodes communicate over port 24007.
Assume four machines, server1, server2, server3 and server4, in one TSP.
Add servers to the trusted storage pool
Run on any one of the machines.
- gluster peer probe

    $ gluster peer probe <server>
List Servers
Run on any machine.
- gluster pool list

    # Assume this is run on server1.
Viewing peer status
- gluster peer status

    # Assume this is run on server1.
Removing Servers
    # gluster peer detach <server>
Brick Naming Conventions
The recommended layout is /data/glusterfs/<volname>/<brick>/brick, an alias bound to a Linux disk. For a disk such as /dev/sdb, the <volname> segment can name the environment it serves (for example test) so that environments can be told apart, and the <brick> segment can be named freely; I use it to distinguish business domains, e.g. es for the search engine and logs for log storage. Different workloads may also have different disk performance needs, which can be separated across multiple disks.
To make sense of the brick naming convention you first need to understand what a brick is. On Linux, fdisk -l is commonly used to list the attached disks; to Gluster, a brick is its attached disk, bound to a physical disk in a one-to-one mapping.
For example:
Say a physical disk /dev/sdb is used in the test environment for ordinary business workloads, storing logs and the like.

    $ mkdir -p /data/glusterfs/test/biz

One question here: why are bricks needed at all?
Suppose server1 has two disks, /dev/sda and /dev/sdb, while server2 has /dev/sdb1 and /dev/sdb2. The disk names differ, and bricks are used to hide these differences in the names and performance of the underlying disks.

    $ mkdir -p /data/glusterfs/test/biz
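A brick-preparation sketch, assuming an empty disk /dev/sdb that may be formatted as XFS and the example path above; all device names and paths are placeholders, and the LVM-based approach belongs to the next section.

```bash
# WARNING: formats /dev/sdb (placeholder device); only do this on an empty disk.
mkfs.xfs -i size=512 /dev/sdb

# Mount it at the path used in the example above and persist the mount.
mkdir -p /data/glusterfs/test/biz
echo '/dev/sdb /data/glusterfs/test/biz xfs defaults 0 0' >> /etc/fstab
mount -a

# Create the actual brick directory inside the mount point.
mkdir -p /data/glusterfs/test/biz/brick
```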
Formatting and Mounting Bricks
To be completed. This mainly involves the Linux volume concepts: LV (logical volume), VG (volume group) and PV (physical volume).
https://wiki.archlinux.org/index.php/LVM
https://linux.cn/article-5117-1.html
https://askubuntu.com/questions/417642/logical-volume-physical-volume-and-volume-groups
Set ACL
To be completed. This mainly covers using Linux ACLs with Gluster.
Volume Types
○ A volume is a logical collection of bricks.
The volume types provided by Gluster are listed below.
Distributed - Distributed volumes distribute files across the bricks in the volume. You can use distributed volumes where the requirement is to scale storage and the redundancy is either not important or is provided by other hardware/software layers.
Replicated – Replicated volumes replicate files across bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.
Distributed Replicated - Distributed replicated volumes distribute files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high-reliability is critical. Distributed replicated volumes also offer improved read performance in most environments.
Dispersed - Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. It stores an encoded fragment of the original file to each brick in a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator on volume creation time.
☆ The key here is the erasure code algorithm. Erasure code (EC) refers to the Reed-Solomon erasure coding scheme built on a Vandermonde matrix.
It can recover lost data with relatively little redundancy; compared with the 100% redundancy of replicas, this approach saves much more space.
The drdr.xp Blog article on the topic is recommended reading.
Distributed Dispersed - Distributed dispersed volumes distribute files across dispersed subvolumes. This has the same advantages of distribute replicate volumes, but using disperse to store the data into the bricks.
- Striped [Deprecated] 、Distributed Striped [Deprecated] 、Distributed Striped Replicated [Deprecated]、Striped Replicated [Deprecated]
☆ So there are three main volume types (Distributed, Replicated, Dispersed) plus two combined ones: Distributed Replicated and Distributed Dispersed.
Create Command
stripe has been deprecated, so only the replica and disperse volume options remain.

    # gluster volume create [stripe | replica | disperse] [transport tcp | rdma | tcp,rdma]
Distributed
Pros
Saves space and is easy to scale.
Cons
Risk of data loss: since there is no redundancy, data is lost as soon as a machine fails.
Note: Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
    # gluster volume create [transport tcp | rdma | tcp,rdma]
Replicated
Pros
Data is redundant, so it remains available even after some of it is lost.
Cons
Consumes considerably more storage space.
Note:
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
GlusterFS will fail to create a replicate volume if more than one brick of a replica set is present on the same peer. For eg. a four node replicated volume where more than one brick of a replica set is present on the same peer.
☆ This volume type must not place multiple bricks of one replica set on the same machine. For example:

    $ gluster volume create <volname> replica 4 server1:/brick1 server1:/brick2 server2:/brick3 server4:/brick4
    volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Here server1:/brick1 and server1:/brick2 were both specified, hence the error: two copies of a replica set cannot live on the same machine.

    # gluster volume create [replica ] [transport tcp | rdma | tcp,rdma]
Dispersed
Pros
Like replica, data stays highly available, while using far less storage.
Cons
None found so far.
○ Dispersed volumes are based on erasure codes.
☆ Based on the erasure code algorithm. Unlike replica, it keeps data highly available while drastically reducing the amount of redundant data.
In distributed systems, three replicas are usually chosen to keep data highly available, with an expected durability of roughly eleven nines (a 99.999999999% chance of not losing data). Industry reports back this figure up, e.g. the hard-drive failure statistics published by Backblaze.

    # Normally redundancy can be left unspecified; the create command will prompt for a count.
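A dispersed-volume sketch, assuming six peers already in the pool and bricks prepared at placeholder paths; disperse 6 with redundancy 2 tolerates the loss of any two bricks.

```bash
# Placeholder servers and brick paths: 6 bricks in total, any 2 may fail
# without losing data.
gluster volume create disp-vol disperse 6 redundancy 2 \
  server1:/data/glusterfs/disp/brick1/brick \
  server2:/data/glusterfs/disp/brick2/brick \
  server3:/data/glusterfs/disp/brick3/brick \
  server4:/data/glusterfs/disp/brick4/brick \
  server5:/data/glusterfs/disp/brick5/brick \
  server6:/data/glusterfs/disp/brick6/brick
gluster volume start disp-vol
```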
Distributed Replicated
Pros
Data is redundant and remains available after data loss.
Cons
Unlike plain Replicated, where the redundant copies land is somewhat random, which makes management harder; storage consumption is also high.
Note: - Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.
GlusterFS will fail to create a distribute replicate volume if more than one brick of a replica set is present on the same peer. For eg. for a four node distribute (replicated) volume where more than one brick of a replica set is present on the same peer.
☆ This type of volume likewise refuses to put more than one brick of a replica set on the same machine, otherwise it errors out.

    $ gluster volume create <volname> replica 4 server1:/brick1 server1:/brick2 server2:/brick3 server4:/brick4
    volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Here again server1:/brick1 and server1:/brick2 were both specified, hence the error.
Judging by the command alone, Replicated and Distributed Replicated look the same; the difference is simply that more bricks are listed than the replica count.

    # gluster volume create [replica ] [transport tcp | rdma | tcp,rdma]
Distributed Dispersed
Installing Gluster
Environment
- Centos 7
- Gluster 4.1
Three machines are used:
- 192.168.1.100 server1
- 192.168.1.101 server2
- 192.168.1.102 server3
Foreword
Kubernetes persistence needs Gluster. These are my notes from working through the Quick Start tutorial on CentOS.
Install
Prepare
DNS and NTP
If there are no special DNS requirements, the default configuration is fine; every server's clock should be synchronized and all servers kept in the same time zone.
Hosts setup
It is recommended to manage Gluster by server name, which requires an IP-to-name mapping configured on every machine.

    $ vim /etc/hosts
Add Yum Repository
Add the Gluster download repository

    $ yum install centos-release-gluster
Use XFS
The XFS filesystem is recommended; here I stayed on ext4 and did not reformat. There are plenty of formatting tutorials online.
Install Gluster And Start
    $ yum install glusterfs-server
Set Firewalld
○ By default, glusterd will listen on tcp/24007. But each time you add a brick, it will open a new port (that you’ll be able to see with “gluster volume status”)
☆ The default port is 24007, but every brick you add listens on an additional port, which you can see with gluster volume status. When configuring the firewall I therefore allow traffic per IP instead.

    # The TCP Port shown here, 49152, is the port that would need to be opened.

CentOS 7 uses firewalld by default.

    # Allow the specific IPs. With three test machines, each machine has to allow the other two servers' IPs.
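A firewalld sketch of the per-IP approach, run on server1 and assuming the example addresses above; repeat on each node with the other peers' IPs.

```bash
# Trust all traffic from the two peer nodes (placeholder IPs from the example),
# then reload firewalld to apply the permanent rules.
firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="192.168.1.101" accept'
firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="192.168.1.102" accept'
firewall-cmd --reload
```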
Set Trusted Pool
Three machines are used here; on any one server, simply add the other two to the trusted pool.
For example, on server1, add server2 and server3 to the trusted pool.

    # Note: the name mappings must be configured in /etc/hosts.
Create a Volume
Create the /bricks/brick1/gv0 directory on each of the three servers.

    # Brick naming convention recommended by the docs: /data/glusterfs/<volume>/<brick>/brick

Attach the directories to Gluster as a volume (run the command on any node).

    # The replica mode is chosen here to guard against data loss; other modes are not covered here.

View the volume information

    $ gluster volume info
Testing
    # Mount server1:/gv0 at the /mnt directory
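A test sketch under the assumptions above (volume gv0, mount point /mnt); the client machine needs the glusterfs FUSE client installed.

```bash
# Mount the volume through the native FUSE client and write a few test files.
mount -t glusterfs server1:/gv0 /mnt
for i in $(seq 1 10); do cp -v /var/log/messages /mnt/copy-$i; done

# On each server, the files should show up under the brick directory.
ls -l /bricks/brick1/gv0/
```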
Summary
- An odd number of nodes is recommended, otherwise split-brain can occur. (I suspect a Paxos-like algorithm is used.)
- Gluster recommends the XFS filesystem; CentOS / Red Hat Enterprise Linux 7 uses it by default, though it may not have been specified when the disk was formatted.