1. Software Preparation
1.1 Software required by the cluster
OS: CentOS 7.2
Hosts: 172.31.8.8, 172.31.8.13, 172.31.8.107, 172.31.8.75, 172.31.8.11 (five servers)
Elasticsearch: elasticsearch-6.2.2.tar.gz
Kibana: kibana-6.2.2-linux-x86_64.tar.gz
Logstash: logstash-6.2.2.tar.xz
Redis: redis-5.0.5.tar.gz
JDK: jdk-8u51-linux-x64.tar.gz
Nginx:
Installation directory: /software/
1.2 Architecture diagram (sketch)
The diagram is for reference only. For real high availability, logstash-server and Redis each need to run on at least two separate servers (and the monitored backend web applications are not necessarily all Nginx; they could also be JBoss, Tomcat, or other web servers).
If the plan does not change, I will deploy according to the architecture diagram; otherwise I may end up putting Elasticsearch, Kibana, and Redis on a single server.
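Since the original sketch is not reproduced here, the data flow it describes is roughly the following (inferred from the host IPs used in the rest of this document):

```
nginx + logstash-agent  (172.31.8.75, 172.31.8.11)
        |  ship access-log events
        v
redis list queue + logstash-server  (172.31.8.107)
        |  filter, then index
        v
elasticsearch cluster  (172.31.8.8 master, 172.31.8.13 slave)
        |
        v
kibana + elasticsearch-head  (172.31.8.8)
```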
At the time of writing, the latest Elasticsearch/Logstash/Kibana release line is 7.2.x. Installing version 7 reportedly requires JDK 9 or later; on an older JDK the program will not start and reports errors such as:
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
2. Installation
2.1 Installing Elasticsearch
2.1.1 Adjust system limits
Raise the open-file and process limits. Edit /etc/security/limits.conf and add:
* hard nofile 65536
* soft nofile 65536
* soft nproc 2048
* hard nproc 4096
Edit /etc/sysctl.conf and add the following setting:
vim /etc/sysctl.conf
vm.max_map_count=262144
Run sysctl -p to make the setting take effect.
2.1.2 Create a non-root user
groupadd elsearch --- create the elsearch group
useradd elsearch -g elsearch --- create the elsearch user and add it to the group
2.1.3 Edit the Elasticsearch configuration file:
vim /software/elasticsearch-6.2.2/config/elasticsearch.yml --- change the following settings
cluster.name: es-cluster --- cluster name
node.name: master --- use "master" on the master node and "slave" on the standby node
path.data: /software/elasticsearch-6.2.2/data --- data directory
path.logs: /software/elasticsearch-6.2.2/logs --- log directory
network.host: 172.31.8.8 --- the local IP, or 0.0.0.0
http.port: 9200 --- default port 9200; just uncomment it
discovery.zen.ping.unicast.hosts: ["172.31.8.8", "172.31.8.13"] --- IPs of the cluster hosts
2.1.4 Set the Java environment variables
vim /software/elasticsearch-6.2.2/bin/elasticsearch-env --- add the Java environment variables at the top of the file
#!/bin/bash
JAVA_HOME=/software/jdk1.8.0_51
JRE_HOME=/software/jdk1.8.0_51/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
2.1.5 Fix ownership (Elasticsearch refuses to start as root; it must run as a normal user)
chown -R elsearch:elsearch elasticsearch-6.2.2/
2.1.6 Start the service
su - elsearch
/software/elasticsearch-6.2.2/bin/elasticsearch -d --- the -d flag runs the process in the background
Verify in a browser, on both the master and the slave node:
http://IPaddr:9200
Check the cluster status:
http://172.31.8.8:9200/_cat/health?v
2.1.7 Cluster status fields
In the URL, _cat means "show information" and health selects cluster-health information. ?v adds a header row to the output, much like appending ?pretty to a JSON response: it just makes the result easier to read. You can leave it off when you do not want the header, for example when a program or shell script parses the response and the extra header line would only have to be stripped away.
The request returns output that includes the following fields:
Cluster status (status): red means the cluster is unavailable (faulty); yellow means the cluster is usable but not fully reliable, the normal state for a single-node cluster; green means everything is healthy.
Node count (node.total): here 2, i.e. the cluster has two nodes.
Data node count (node.data): the number of nodes that store data, here 2. Data nodes are covered in the Elasticsearch concepts introduction.
Shard count (shards): the total number of active shards in the cluster.
Primary shard count (pri): primary shards only, here 6. With one replica per primary, the total shard count is twice the primary count (with two replicas it would be three times); this total will correspond to the per-index shard settings later on.
Active shards percentage (active_shards_percent): roughly, the fraction of shards that have been loaded. The cluster only counts as fully started once all shards are loaded; if you keep refreshing this page during startup, you will see the percentage grow.
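As a sketch of the header-free parsing mentioned above (all field values below are made up for illustration), a shell script can pull individual columns out of the _cat/health response with awk:

```shell
# Hypothetical _cat/health response line (without ?v there is no header row).
# Columns: epoch timestamp cluster status node.total node.data shards pri ...
line='1563328922 02:02:02 es-cluster green 2 2 12 6 0 0 0 0 - 100.0%'

# On a live cluster you would fetch it instead, e.g.:
#   line=$(curl -s 'http://172.31.8.8:9200/_cat/health')
status=$(echo "$line" | awk '{print $4}')   # 4th column is the cluster status
echo "$status"
```

A monitoring script can then alert whenever $status is anything other than green.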
2.1.8 安装elasticsearch-head 插件
因为head是一个用于管理Elasticsearch的web前端插件,该插件在es5版本以后采用独立服务的形式进行安装使用(之前的版本可以直接在es安装目录中直接安装),因此需要安装nodejs、npm
yum -y install nodejs npm
如果没有安装git,还需要先安装git:
yum -y install git
然后安装elasticsearch-head插件:
git clone https://github.com/mobz/elasticsearch-head.git
git下载完成后,进入目录,进行操作:
cd elasticsearch-head/
执行npm install 命令, 执行该命名可能会出现以下错误:
npm ERR! phantomjs-prebuilt@2.1.16 install: `node install.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the phantomjs-prebuilt@2.1.16 install script 'node install.js'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the phantomjs-prebuilt package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node install.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs phantomjs-prebuilt
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls phantomjs-prebuilt
npm ERR! There is likely additional logging output above.
npm ERR! Please include the following file with any support request:
npm ERR! /software/elasticsearch-6.2.2/elasticsearch-head/npm-debug.log
In that case, install phantomjs-prebuilt@2.1.16 with its install scripts skipped:
npm install phantomjs-prebuilt@2.1.16 --ignore-scripts
Then run again:
npm install
npm WARN deprecated coffee-script@1.10.0: CoffeeScript on NPM has moved to "coffeescript" (no hyphen)
npm WARN deprecated http2@3.3.7: Use the built-in module in node 9.0.0 or newer, instead
npm WARN deprecated phantomjs-prebuilt@2.1.16: this package is now deprecated
npm WARN deprecated json3@3.2.6: Please use the native JSON object instead of JSON 3
npm WARN deprecated json3@3.3.2: Please use the native JSON object instead of JSON 3
npm WARN prefer global coffee-script@1.10.0 should be installed with -g
> phantomjs-prebuilt@2.1.16 install /software/elasticsearch-6.2.2/elasticsearch-head/node_modules/phantomjs-prebuilt
> node install.js
PhantomJS not found on PATH
Downloading https://github.com/Medium/phantomjs/releases/download/v2.1.1/phantomjs-2.1.1-linux-x86_64.tar.bz2
Saving to /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
Receiving...
[=======---------------------------------] 19%
The plugin installation can be rather slow...
Configure the plugin:
Stop Elasticsearch:
ps -ef | grep java | grep elsearch
kill PID --- replace PID with the process ID found above; prefer a plain TERM signal over kill -9 so Elasticsearch can shut down cleanly
Edit the configuration:
vim /software/elasticsearch-6.2.2/config/elasticsearch.yml
Add the following settings:
http.cors.enabled: true
http.cors.allow-origin: "*"
Start Elasticsearch again:
/software/elasticsearch-6.2.2/bin/elasticsearch -d
Start the elasticsearch-head plugin (in the background):
nohup npm run start &
[1] 11047
nohup: ignoring input and appending output to '/home/elsearch/nohup.out'
netstat -anlp | grep 9100
tcp 0 0 0.0.0.0:9100 0.0.0.0:* LISTEN 11058/grunt
Open the plugin in a browser (http://IPaddr:9100) and interact with ES; check both the master and the slave node.
2.2 Installing Kibana
2.2.1 Edit the configuration file
tar xf kibana-6.2.2-linux-x86_64.tar.gz
cd kibana-6.2.2-linux-x86_64
vim /software/kibana-6.2.2-linux-x86_64/config/kibana.yml
server.port: 5601
server.host: "172.31.8.8"
elasticsearch.url: "http://172.31.8.8:9200" --- the Elasticsearch instance installed on this host. Only a single address can be given here; multiple nodes are not yet supported. To front a whole Elasticsearch cluster you need a dedicated coordinating-only Elasticsearch node: one that does not take part in master election and stores no data, but only accepts incoming HTTP requests, forwards the operations to the other nodes in the cluster, and then collects and returns the results. Such a coordinating node effectively also acts as a load balancer.
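For reference, a coordinating-only node of this kind is created by disabling the other node roles in its elasticsearch.yml. This is only a sketch using the 6.x setting names; the node name and bind address are illustrative:

```
# elasticsearch.yml of a hypothetical coordinating-only node (ES 6.x)
cluster.name: es-cluster
node.name: coordinator    # illustrative name
node.master: false        # not eligible for master election
node.data: false          # stores no data
node.ingest: false        # runs no ingest pipelines
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["172.31.8.8", "172.31.8.13"]
```

Kibana's elasticsearch.url would then point at this node, which spreads the actual work across the whole cluster.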
2.2.2 Kibana startup script
#!/bin/sh
RETVAL=0
# the [k] trick keeps the grep process itself out of the match
PID=`ps -ef | grep "[k]ibana.yml" | awk '{print $2}'`
KIBANA_DIR=/software/kibana-6.2.2-linux-x86_64
KIBANA=$KIBANA_DIR/bin/kibana
PROG=$(basename $KIBANA)
CONF=$KIBANA_DIR/config/kibana.yml
if [ ! -x $KIBANA ]; then
echo $"$KIBANA does not exist."
exit 1
fi
start(){
echo -n $"Starting $PROG: "
nohup $KIBANA >/dev/null 2>&1 &
RETVAL=$?
if [ $RETVAL -eq 0 ]; then
echo "start OK"
else
echo "start failure"
fi
return $RETVAL
}
stop(){
echo -n $"Stopping $PROG: "
kill -TERM $PID >/dev/null 2>&1
RETVAL=$?
echo "stop OK"
return $RETVAL
}
restart(){
stop
sleep 2
start
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
status)
ps -ef|grep $PID|grep kibana
RETVAL=$?
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
RETVAL=1
esac
exit $RETVAL
2.2.3 Start Kibana
./kibana.sh start
Starting kibana: start OK
2.3 redis 安装
wget http://45.252.224.74/files/503000000DD76BB8/download.redis.io/releases/redis-5.0.5.tar.gz
cd /software/ && tar xf redis-5.0.5.tar.gz && mkdir redis
cd redis-5.0.5
make && cd src/
make install PREFIX=/software/redis/ -- 指定redis安装目录为/software/redis/
cd ../ && mkdir /software/conf && cp redis.conf /software/redis/conf/
vim /software/redis/conf/redis.conf
修改以下参数:
bind 172.31.8.107 --- 将这里的127.0.0.1改为172.31.8.107,否则只能连接127.0.0.1本地回环地址,无法远程连接
protected-mode yes 改为 protected-mode no --- yes改为no,目的是为了解决安全模式引起的报错
port 6379 --- 打开注释
daemonize no 改为 daemonize yes --- no改为yes,目的是为了设置后台运行
pidfile /software/redis/redis.pid --- 设置redis.pid 文件存储目录
logfile "/software/redis/logs/redis.log" --- 设置redis.log 文件存储目录
Test the installation:
/software/redis/bin/redis-cli -h 172.31.8.107 -p 6379
If you get the following prompt, the connection succeeded:
172.31.8.107:6379>
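Once connected, a quick smoke test confirms that reads and writes work (an illustrative session; the key name is arbitrary):

```
172.31.8.107:6379> ping
PONG
172.31.8.107:6379> set testkey hello
OK
172.31.8.107:6379> get testkey
"hello"
172.31.8.107:6379> del testkey
(integer) 1
```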
2.4 Installing logstash-server
2.4.1 Edit the configuration file:
vim /software/logstash-6.2.2/config/logstash.yml
Change the following settings:
node.name: logstash-server --- node name, usually the hostname
path.data: /software/logstash-6.2.2/data --- persistent directory used by Logstash and its plugins
config.reload.automatic: true --- enable automatic reloading of the pipeline configuration
config.reload.interval: 10s --- how often to check for configuration changes
http.host: "172.31.8.107" --- the host to bind to, usually a domain name or IP
http.port: 9600-9700 --- uncomment the Logstash port range
vim /software/logstash-6.2.2/config/logstash_server.conf
input {
    redis {
        port => "6379"
        host => "127.0.0.1"
        data_type => "list"
        batch_count => "1"
        key => "nginx-accesslog"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    elasticsearch {
        hosts => ["172.31.8.8:9200"]
        index => "nginx-accesslog-%{+YYYY.MM.dd}"
    }
}
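The %{COMBINEDAPACHELOG} grok pattern in the filter above expects Nginx to log in the standard combined format. As a rough sanity check (this regex is only a simplified approximation of the real grok pattern, and the log line is made up):

```shell
# An illustrative combined-format access-log line:
line='127.0.0.1 - - [10/Jul/2019:12:00:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0"'

# Simplified shape: client ident user [timestamp] "request" status bytes "referer" "agent"
echo "$line" | grep -Eq \
  '^[0-9.]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} [0-9-]+ "[^"]*" "[^"]*"$' \
  && echo match
```

Lines that do not match the grok pattern are tagged _grokparsefailure by Logstash, so it is worth confirming the log format before indexing.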
2.4.2 Write a Logstash startup script
#!/bin/sh
RETVAL=0
PID=`ps -ef | grep java | grep "logstash_server\.conf" | awk '{print $2}'`
LOGSTASH_DIR=/software/logstash-6.2.2
LOGSTASH=$LOGSTASH_DIR/bin/logstash
PROG=$(basename $LOGSTASH)
CONF=$LOGSTASH_DIR/config/logstash_server.conf
LOG=$LOGSTASH_DIR/logs
if [ ! -x $LOGSTASH ]; then
echo $"$LOGSTASH does not exist."
exit 1
fi
start(){
echo -n $"Starting $PROG: "
nohup $LOGSTASH --path.config $CONF --path.logs $LOG >/dev/null 2>&1 &
RETVAL=$?
if [ $RETVAL -eq 0 ]; then
echo "start OK"
else
echo "start failure"
fi
return $RETVAL
}
stop(){
echo -n $"Stopping $PROG: "
kill -TERM $PID >/dev/null 2>&1
RETVAL=$?
echo "stop OK"
return $RETVAL
}
restart(){
stop
sleep 2
start
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
status)
ps -ef|grep $PID|grep logstash_server\.conf
RETVAL=$?
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
RETVAL=1
esac
exit $RETVAL
2.4.3 Test the startup script
2.4.4 Debugging logstash-server
Stop logstash-server:
/software/logstash-6.2.2/logstash.sh stop
Edit the configuration file:
vim /software/logstash-6.2.2/config/logstash_server.conf
Change it to the following:
input {
    redis {
        port => "6379"
        host => "127.0.0.1"
        data_type => "list"
        key => "nginx-access"
        db => "0"
        codec => "json"
    }
}
output {
    elasticsearch {
        hosts => ["172.31.8.8:9200","172.31.8.13:9200"]
        index => "nginx-access-%{+YYYY.MM.dd}"
    }
}
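The %{+YYYY.MM.dd} part of the index name is expanded from each event's @timestamp, giving one index per day. Using the local date as a stand-in for an event timestamp:

```shell
# One index per day, e.g. nginx-access-2019.07.10 (date shown is illustrative):
index="nginx-access-$(date +%Y.%m.%d)"
echo "$index"
```

Daily indices make retention simple: old days can be dropped by deleting the corresponding index.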
Adjust the logstash-server JVM settings:
vim /software/logstash-6.2.2/config/jvm.options
-Xms1g changed to -Xms500m --- adjust to your situation
-Xmx1g changed to -Xmx500m --- adjust to your situation
My log volume is currently small, so a 500 MB heap is plenty.
Verify that the configuration is correct:
/software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash_server.conf -t
Sending Logstash's logs to /software/logstash-6.2.2/logs which is now configured via log4j2.properties
[INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/software/logstash-6.2.2/modules/fb_apache/configuration"}
[INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/software/logstash-6.2.2/modules/netflow/configuration"}
[WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ][logstash.config.source.local.configpathloader] No config files found in path {:path=>"/software/logstash-6.2.2/config/logstash"}
[ERROR][logstash.config.sourceloader] No configuration found in the configured sources.
Configuration OK
[INFO ][logstash.runner ] Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
Start Logstash again:
/software/logstash-6.2.2/logstash.sh start
The program is now running normally.
2.5 Installing Nginx
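The Nginx build itself is not covered here. What matters for this pipeline is that the access log is written as JSON, because the logstash-server input reads it with codec => "json" and the agent below tails files named 172.31.8.75_json_access*. A sketch of such a log_format for the http block of nginx.conf follows; the field names and the choice of variables are illustrative:

```
# Hypothetical JSON access-log format (nginx.conf, http block)
log_format json_access '{"@timestamp":"$time_iso8601",'
                       '"host":"$server_addr",'
                       '"clientip":"$remote_addr",'
                       '"request":"$request",'
                       '"status":"$status",'
                       '"size":$body_bytes_sent,'
                       '"referer":"$http_referer",'
                       '"agent":"$http_user_agent"}';

access_log /software/nginx/logs/172.31.8.75_json_access.log json_access;
```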
2.6 Installing logstash-agent
2.6.1 Edit the configuration file
vim /software/logstash-6.2.2/config/logstash.yml
Change the following settings:
node.name: logstash-agent --- node name, usually the hostname
path.data: /software/logstash-6.2.2/data --- persistent directory used by Logstash and its plugins
config.reload.automatic: true --- enable automatic reloading of the pipeline configuration
config.reload.interval: 10s --- how often to check for configuration changes
http.host: "172.31.8.75" --- the host to bind to, usually a domain name or IP
http.port: 9600-9700 --- uncomment the Logstash port range
2.6.2 Create the pipeline configuration file:
vim /software/logstash-6.2.2/config/logstash-nginx.conf
with the following content:
input {
    file {
        type => "nginx-access"
        path => ["/software/nginx/logs/172.31.8.75_json_access*"]
    }
    file {
        type => "nginx-error"
        path => "/software/nginx/logs/nginx_error.log"
    }
}
# output to redis
output {
    if [type] == "nginx-access" {
        redis {
            host => "172.31.8.107"
            port => "6379"
            db => "0"
            data_type => "list"
            key => "nginx-access"
        }
    }
}
2.6.3 Adjust the logstash-agent JVM settings
vim /software/logstash-6.2.2/config/jvm.options
-Xms1g changed to -Xms256m --- adjust to your situation
-Xmx1g changed to -Xmx256m --- adjust to your situation
2.6.4 Set the logstash-agent Java environment variables
vim /software/logstash-6.2.2/bin/logstash
Insert the following near the top of the file:
JAVA_HOME=/software/jdk1.8.0_51
JRE_HOME=/software/jdk1.8.0_51/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
2.6.5 Validate the configuration file the same way:
/software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash-nginx.conf -t
2.6.6 Once validation passes, start the Logstash service (repeat the same steps on the other node):
nohup /software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash-nginx.conf &
3. Configuring ELK Monitoring
3.1 Log in to Redis and verify:
/software/redis/bin/redis-cli -h 172.31.8.107 -p 6379
172.31.8.107:6379> keys *
1) "nginx-access" --- the data has reached Redis
3.2 Open elasticsearch-head
http://172.31.8.8:9100
The index now shows up in Elasticsearch.
3.3 Open Kibana and create the index pattern
http://172.31.8.8:5601
Click Discover.
The data is now displayed correctly.
3.4 Generating logs with the ab load-testing tool
3.4.1 Install
yum -y install httpd-tools
3.4.2 Verify the installation
ab -V
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
3.4.3 ab options
ab --help
ab: wrong number of arguments
Usage: ab [options] [http[s]://]hostname[:port]/path
Options are:
-n requests Number of requests to perform
-c concurrency Number of multiple requests to make at a time
-t timelimit Seconds to max. to spend on benchmarking
This implies -n 50000
-s timeout Seconds to max. wait for each response
Default is 30 seconds
-b windowsize Size of TCP send/receive buffer, in bytes
-B address Address to bind to when making outgoing connections
-p postfile File containing data to POST. Remember also to set -T
-u putfile File containing data to PUT. Remember also to set -T
-T content-type Content-type header to use for POST/PUT data, eg.
'application/x-www-form-urlencoded'
Default is 'text/plain'
-v verbosity How much troubleshooting info to print
-w Print out results in HTML tables
-i Use HEAD instead of GET
-x attributes String to insert as table attributes
-y attributes String to insert as tr attributes
-z attributes String to insert as td or th attributes
-C attribute Add cookie, eg. 'Apache=1234'. (repeatable)
-H attribute Add Arbitrary header line, eg. 'Accept-Encoding: gzip'
Inserted after all normal header lines. (repeatable)
-A attribute Add Basic WWW Authentication, the attributes
are a colon separated username and password.
-P attribute Add Basic Proxy Authentication, the attributes
are a colon separated username and password.
-X proxy:port Proxyserver and port number to use
-V Print version number and exit
-k Use HTTP KeepAlive feature
-d Do not show percentiles served table.
-S Do not show confidence estimators and warnings.
-q Do not show progress when doing more than 150 requests
-g filename Output collected data to gnuplot format file.
-e filename Output CSV file with percentages served
-r Don't exit on socket receive errors.
-h Display usage information (this message)
-Z ciphersuite Specify SSL/TLS cipher suite (See openssl ciphers)
-f protocol Specify SSL/TLS protocol
(SSL3, TLS1, TLS1.1, TLS1.2 or ALL)
ab has quite a few options; the most commonly used are -c (concurrency) and -n (total number of requests).
ab -c 10 -n 100 http://172.31.8.75/
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 172.31.8.75 (be patient).....done
Server Software: nginx/1.8.1
Server Hostname: 172.31.8.75
Server Port: 80
Document Path: /
Document Length: 612 bytes
Concurrency Level: 10
Time taken for tests: 0.013 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 84400 bytes
HTML transferred: 61200 bytes
Requests per second: 7569.45 [#/sec] (mean)
Time per request: 1.321 [ms] (mean)
Time per request: 0.132 [ms] (mean, across all concurrent requests)
Transfer rate: 6238.88 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 0
Processing: 0 1 0.2 1 1
Waiting: 0 1 0.2 1 1
Total: 1 1 0.1 1 1
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 1
99% 1
100% 1 (longest request)