Deploying an ELK Log Monitoring System

1. Software to Install

1.1 Software required for the cluster
OS: CentOS 7.2
Hosts: 172.31.8.8, 172.31.8.13, 172.31.8.107, 172.31.8.75, 172.31.8.11 (five servers)
Elasticsearch: elasticsearch-6.2.2.tar.gz
Kibana: kibana-6.2.2-linux-x86_64.tar.gz
Logstash: logstash-6.2.2.tar.xz
Redis: redis-5.0.5.tar.gz
JDK: jdk-8u51-linux-x64.tar.gz
Nginx:
Install directory: /software/
1.2 Architecture diagram (draft)

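This diagram is for reference only. For high availability, logstash-server and redis should each run on at least two servers. (The monitored backend web applications are not necessarily nginx; they may be JBoss, Tomcat, or other web servers.)
[Figure: ELK monitoring architecture diagram]
If the plan does not change, I will deploy according to the design in the diagram; otherwise I may put Elasticsearch, Kibana, and redis on a single server.

At the time of writing, the latest Elasticsearch, Logstash, and Kibana releases have reached 7.2.x. By my notes, the 7.x releases call for a newer JDK than the JDK 8 used here (JDK 9 or later) and otherwise fail to start. Note that on JDK 9 and later the JVM prints the following message at startup; it is a deprecation warning, not a fatal error: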

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.

2. Installation

2.1 Installing Elasticsearch
2.1.1 Adjust system parameters:

Raise the open-file and process (thread) limits. Edit /etc/security/limits.conf and add:

*                hard    nofile          65536
*                soft    nofile          65536

# Elasticsearch's bootstrap check expects at least 4096 threads for its user
*                soft    nproc           4096
*                hard    nproc           4096

Edit /etc/sysctl.conf and add the following setting:

vim /etc/sysctl.conf

vm.max_map_count=262144

Run sysctl -p to apply the setting.
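
Before starting Elasticsearch it can help to confirm that the new limits are actually in effect. The following is a minimal sketch and not part of the original setup: the function names (meets_minimum, report, check_es_prereqs) are made up for illustration, and the thresholds are the nofile and vm.max_map_count values set above plus the 4096-thread minimum that Elasticsearch's bootstrap check expects.

```shell
#!/bin/bash
# Hypothetical helper names; thresholds follow the settings in this section.

# meets_minimum ACTUAL REQUIRED: succeed when ACTUAL >= REQUIRED.
# "unlimited" (as printed by ulimit) always passes.
meets_minimum() {
    if [ "$1" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -ge "$2" ]
}

# report NAME ACTUAL REQUIRED: print one status line for a setting.
report() {
    if meets_minimum "$2" "$3"; then
        echo "OK  $1=$2 (need >= $3)"
    else
        echo "LOW $1=$2 (need >= $3)"
    fi
}

# check_es_prereqs: read the live values for the current user and report them.
check_es_prereqs() {
    report vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144
    report nofile "$(ulimit -n)" 65536
    report nproc "$(ulimit -u)" 4096
}
```

Call check_es_prereqs in a fresh login shell of the user that will run Elasticsearch, because changes to limits.conf only apply to new sessions.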

2.1.2 Add an unprivileged user
groupadd elsearch               # create the elsearch group
useradd elsearch -g elsearch    # create the elsearch user in the elsearch group
2.1.3 Edit the Elasticsearch configuration file:
vim /software/elasticsearch-6.2.2/config/elasticsearch.yml    # change the following parameters

cluster.name: es-cluster    # cluster name
node.name: master    # name the primary Elasticsearch node master and the standby node slave
path.data: /software/elasticsearch-6.2.2/data    # data directory
path.logs: /software/elasticsearch-6.2.2/logs    # Elasticsearch log directory
network.host: 172.31.8.8    # this host's IP, or 0.0.0.0
http.port: 9200    # default port 9200; just uncomment the line
discovery.zen.ping.unicast.hosts: ["172.31.8.8", "172.31.8.13"]    # IPs of the cluster hosts
2.1.4 Set the Java environment variables
vim /software/elasticsearch-6.2.2/bin/elasticsearch-env    # add the Java environment variables at the top

#!/bin/bash
JAVA_HOME=/software/jdk1.8.0_51
JRE_HOME=/software/jdk1.8.0_51/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH    
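2.1.5 Fix ownership (Elasticsearch refuses to start as root and must run as an unprivileged user)
chown -R elsearch:elsearch /software/elasticsearch-6.2.2/
2.1.6 Start the service
su - elsearch
/software/elasticsearch-6.2.2/bin/elasticsearch -d    # -d runs Elasticsearch in the background
Open in a browser: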
http://IPaddr:9200 

master

master.png

slave
slve.png
Check the cluster health:

http://172.31.8.8:9200/_cat/health?v
[Screenshot: cluster health output]

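2.1.7 Cluster health fields
In the URL, _cat selects the compact, human-readable API and health requests the cluster health information. The ?v parameter adds a header row to the output, much as ?pretty makes JSON responses easier to read. You can leave it out: when the response is parsed by code or by a shell script, the header is often unnecessary and would only have to be stripped again.
The response contains the following fields:

Cluster status (status): red means the cluster has a fault and is not fully usable (at least one primary shard is unassigned). yellow means the cluster is usable but not fully redundant (some replicas are unassigned); a single-node cluster is typically in this state. green means everything is normal.

Node count (node.total): the number of nodes. Here it is 2, so the cluster has two nodes.

Data node count (node.data): the number of nodes that store data. Here it is 2. Data nodes are covered in the Elasticsearch concepts introduction.

Shards (shards): the total number of active shards, counting both primaries and replicas.

Primary shards (pri): the number of active primary shards. With one replica per primary, shards is twice pri; with two replicas it is three times pri. These are cluster-wide totals that correspond to the per-index shard settings discussed later.

Active shard percentage (active_shards_percent): the share of shards that have been loaded. The cluster has fully started only when all shards are active; if you keep refreshing this page during startup, you will see the percentage climb.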
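
When the health output is consumed by a script rather than read in a browser, the header row from ?v can be used to locate a column by name. The following is a small sketch under that assumption; health_field is a made-up helper name, not part of Elasticsearch.

```shell
#!/bin/bash
# health_field COLUMN: read `_cat/health?v` output on stdin and print the
# value found under the header named COLUMN (hypothetical helper).
health_field() {
    awk -v col="$1" '
        # Header row: remember which field number carries the requested name.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        # First data row: print that field if the column was found.
        NR == 2 && idx { print $idx }
    '
}

# Example (needs a reachable cluster):
#   curl -s 'http://172.31.8.8:9200/_cat/health?v' | health_field status
```

A monitoring script could, for example, poll health_field status and raise an alert whenever the value is not green.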
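2.1.8 Install the elasticsearch-head plugin

head is a web front end for managing Elasticsearch. Since Elasticsearch 5 it runs as a standalone service (earlier versions could install it directly inside the Elasticsearch directory), so nodejs and npm are required:

yum -y install nodejs npm

If git is not installed, install it first:

yum -y install git

Then fetch the elasticsearch-head plugin:

git clone https://github.com/mobz/elasticsearch-head.git

When the clone finishes, enter the directory:

cd elasticsearch-head/
Run npm install. The command may fail with the following error: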
npm ERR! phantomjs-prebuilt@2.1.16 install: `node install.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the phantomjs-prebuilt@2.1.16 install script 'node install.js'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the phantomjs-prebuilt package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node install.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs phantomjs-prebuilt
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls phantomjs-prebuilt
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /software/elasticsearch-6.2.2/elasticsearch-head/npm-debug.log

In that case, install phantomjs-prebuilt@2.1.16 on its own with its install script skipped:
npm install phantomjs-prebuilt@2.1.16 --ignore-scripts

Then run:
npm install

npm WARN deprecated coffee-script@1.10.0: CoffeeScript on NPM has moved to "coffeescript" (no hyphen)
npm WARN deprecated http2@3.3.7: Use the built-in module in node 9.0.0 or newer, instead
npm WARN deprecated phantomjs-prebuilt@2.1.16: this package is now deprecated
npm WARN deprecated json3@3.2.6: Please use the native JSON object instead of JSON 3
npm WARN deprecated json3@3.3.2: Please use the native JSON object instead of JSON 3
npm WARN prefer global coffee-script@1.10.0 should be installed with -g

> phantomjs-prebuilt@2.1.16 install /software/elasticsearch-6.2.2/elasticsearch-head/node_modules/phantomjs-prebuilt
> node install.js

PhantomJS not found on PATH
Downloading https://github.com/Medium/phantomjs/releases/download/v2.1.1/phantomjs-2.1.1-linux-x86_64.tar.bz2
Saving to /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
Receiving...
[=======---------------------------------] 19%

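Installing the plugin is relatively slow.

Configure the plugin:

Stop Elasticsearch:

ps -ef | grep java | grep elsearch
kill PID    # plain kill sends SIGTERM, so Elasticsearch can shut down cleanly

Edit the configuration:

vim /software/elasticsearch-6.2.2/config/elasticsearch.yml
Add the following parameters:
http.cors.enabled: true
http.cors.allow-origin: "*"

Start Elasticsearch:

/software/elasticsearch-6.2.2/bin/elasticsearch -d

Start the elasticsearch-head plugin (in the background):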

nohup npm run start &
[1] 11047
nohup: ignoring input and appending output to "/home/elsearch/nohup.out"
netstat -anlp | grep 9100
tcp        0      0 0.0.0.0:9100            0.0.0.0:*               LISTEN      11058/grunt 

Open the plugin in a browser and interact with ES:

master
eshead.png
slave
head.png

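2.2 Installing Kibana
2.2.1 Edit the configuration file
tar xf kibana-6.2.2-linux-x86_64.tar.gz
cd kibana-6.2.2-linux-x86_64

vim /software/kibana-6.2.2-linux-x86_64/config/kibana.yml

server.port: 5601
server.host: "172.31.8.8"
# The Elasticsearch instance installed on this host. Only a single address is accepted here; a list of nodes is not yet supported.
elasticsearch.url: "http://172.31.8.8:9200"

To connect Kibana to an Elasticsearch cluster, set up a coordinating-only Elasticsearch node: one that takes no part in master election and stores no data. It only handles incoming HTTP requests, forwards the operations to the other Elasticsearch nodes in the cluster, then gathers and returns the results. In effect this coordinating node also acts as a load balancer.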
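
As an illustration of such a coordinating-only node, the sketch below prints the relevant elasticsearch.yml settings using the 6.x setting names. The node name kibana-coordinator and the function name are made up for this example; the cluster name and unicast hosts are the ones used earlier in this guide.

```shell
#!/bin/bash
# coordinating_node_settings: print elasticsearch.yml settings (6.x names) for
# a node that is neither master-eligible, nor a data node, nor an ingest node.
# "kibana-coordinator" is a hypothetical node name.
coordinating_node_settings() {
    cat <<'EOF'
cluster.name: es-cluster
node.name: kibana-coordinator
node.master: false
node.data: false
node.ingest: false
discovery.zen.ping.unicast.hosts: ["172.31.8.8", "172.31.8.13"]
EOF
}

# Example: append to the config of a separate Elasticsearch install
# (replace the placeholder with the real path to that install):
#   coordinating_node_settings >> <coordinator-install-dir>/config/elasticsearch.yml
```

With such a node running next to Kibana, elasticsearch.url would point at that node instead of at a data node.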
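2.2.2 Kibana startup script
#!/bin/sh
# Start/stop wrapper for Kibana. The PID is stored in a pid file so that
# stop and status do not depend on grepping the ps output.
RETVAL=0
KIBANA_DIR=/software/kibana-6.2.2-linux-x86_64
KIBANA=$KIBANA_DIR/bin/kibana
PROG=$(basename "$KIBANA")
CONF=$KIBANA_DIR/config/kibana.yml
PIDFILE=$KIBANA_DIR/kibana.pid
if [ ! -x "$KIBANA" ]; then
        echo "$KIBANA does not exist or is not executable."
        exit 1
fi

# Succeeds when the pid file names a live process.
running(){
        [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start(){
        echo -n "Starting $PROG: "
        nohup "$KIBANA" -c "$CONF" >/dev/null 2>&1 &
        echo $! > "$PIDFILE"
        sleep 1
        if running; then
                echo "start OK"
                RETVAL=0
        else
                echo "start failure"
                RETVAL=1
        fi
        return $RETVAL
}

stop(){
        echo -n "Stopping $PROG: "
        if running; then
                kill -TERM "$(cat "$PIDFILE")" >/dev/null 2>&1
                RETVAL=$?
                rm -f "$PIDFILE"
                echo "stop OK"
        else
                echo "not running"
                RETVAL=0
        fi
        return $RETVAL
}

restart(){
        stop
        sleep 2
        start
}

case "$1" in
        start)
        start
        ;;
        stop)
        stop
        ;;
        restart)
        restart
        ;;
        status)
        if running; then
                echo "$PROG is running (pid $(cat "$PIDFILE"))"
                RETVAL=0
        else
                echo "$PROG is not running"
                RETVAL=3
        fi
        ;;
        *)
        echo "Usage: $0 {start|stop|status|restart}"
        RETVAL=1
esac
exit $RETVAL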
2.2.3 Start Kibana
 ./kibana.sh start

Starting kibana: start OK

Open http://172.31.8.8:5601 in a browser:
kibana.png

2.3 Installing redis
wget http://download.redis.io/releases/redis-5.0.5.tar.gz

cd /software/ && tar xf redis-5.0.5.tar.gz && mkdir redis
cd redis-5.0.5

make && cd src/
make install PREFIX=/software/redis/    # install redis under /software/redis/

cd ../ && mkdir -p /software/redis/conf /software/redis/logs && cp redis.conf /software/redis/conf/
vim /software/redis/conf/redis.conf
Change the following parameters:

# Replace 127.0.0.1 with 172.31.8.107; otherwise redis listens only on the loopback address and cannot be reached remotely
bind 172.31.8.107
# Changed from yes to no to avoid the errors that protected mode raises for remote clients
protected-mode no
# Default port; uncomment the line if it is commented out
port 6379
# Changed from no to yes so that redis runs in the background
daemonize yes
# Location of the redis.pid file
pidfile /software/redis/redis.pid
# Location of the redis.log file
logfile "/software/redis/logs/redis.log"

Start redis with this configuration:

/software/redis/bin/redis-server /software/redis/conf/redis.conf

Test the installation:

/software/redis/bin/redis-cli -h 172.31.8.107 -p 6379
If the following prompt appears, the connection succeeded:
172.31.8.107:6379>

res.png
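
If the connection test fails, a quick way to rule out a configuration slip is to read back the directives that are actually active in redis.conf. This is a minimal sketch; redis_conf_value is a made-up helper name, and it assumes each directive sits on its own uncommented line.

```shell
#!/bin/bash
# redis_conf_value FILE KEY: print the value of the last uncommented KEY
# directive in FILE (hypothetical helper). Commented lines start with "#",
# so their first field never equals KEY.
redis_conf_value() {
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^ +/, ""); val = $0 }
        END { if (val != "") print val }
    ' "$1"
}

# Example:
#   redis_conf_value /software/redis/conf/redis.conf bind
#   redis_conf_value /software/redis/conf/redis.conf daemonize
```

An empty result means the directive is missing or still commented out, in which case redis falls back to its built-in default.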

2.4 Installing logstash-server
2.4.1 Edit the configuration files:
vim /software/logstash-6.2.2/config/logstash.yml

Change these parameters:
node.name: logstash-server    # node name, usually the hostname
path.data: /software/logstash-6.2.2/data    # persistent directory used by logstash and its plugins
config.reload.automatic: true    # reload the pipeline configuration automatically
config.reload.interval: 10s      # how often to check the configuration for changes
http.host: "172.31.8.107"        # address to bind to, usually a hostname or IP
http.port: 9600-9700             # uncomment the logstash port setting

vim /software/logstash-6.2.2/config/logstash_server.conf

input {
        redis {
                port => "6379"
                # redis is bound only to 172.31.8.107 (section 2.3), so 127.0.0.1 would be refused
                host => "172.31.8.107"
                data_type => "list"
                batch_count => "1"
                key => "nginx-accesslog"
        }
}

filter {
        grok {
                match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
}

output {
        elasticsearch {
                hosts => ["172.31.8.8:9200"]
                index => "nginx-accesslog-%{+YYYY.MM.dd}"
        }
}
2.4.2 Write a logstash startup script
#!/bin/sh
RETVAL=0
LOGSTASH_DIR=/software/logstash-6.2.2
LOGSTASH=$LOGSTASH_DIR/bin/logstash
PROG=$(basename "$LOGSTASH")
CONF=$LOGSTASH_DIR/config/logstash_server.conf
LOG_DIR=$LOGSTASH_DIR/logs

# Print the PID of the logstash JVM that runs this config file, if any.
find_pid(){
    ps -ef | grep java | grep "logstash_server\.conf" | grep -v grep | awk '{print $2}'
}

if [ ! -x "$LOGSTASH" ]; then
    echo "$LOGSTASH does not exist or is not executable."
    exit 1
fi
start(){
    echo -n "Starting $PROG: "
    # Logstash 6.x takes --path.config and --path.logs (a directory)
    nohup "$LOGSTASH" --path.config "$CONF" --path.logs "$LOG_DIR" >/dev/null 2>&1 &
    # $? only tells whether the background job could be launched
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        echo "start OK"
    else
        echo "start failure"
    fi
    return $RETVAL
}
stop(){
    echo -n "Stopping $PROG: "
    PID=$(find_pid)
    if [ -n "$PID" ]; then
        kill -TERM $PID >/dev/null 2>&1
        RETVAL=$?
        echo "stop OK"
    else
        echo "not running"
        RETVAL=0
    fi
    return $RETVAL
}
restart(){
    stop
    sleep 2
    start
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        PID=$(find_pid)
        if [ -n "$PID" ]; then
            echo "$PROG is running (pid $PID)"
            RETVAL=0
        else
            echo "$PROG is not running"
            RETVAL=3
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|status|restart}"
        RETVAL=1
esac
exit $RETVAL
2.4.3 Test the startup script

[Screenshot: startup script test]

2.4.4 Debugging logstash-server

Stop logstash-server:

/software/logstash-6.2.2/logstash.sh stop

Edit the configuration file:

vim /software/logstash-6.2.2/config/logstash_server.conf
Change it to the following:

input {
        redis {
                port => "6379"
                # the address redis is bound to (section 2.3)
                host => "172.31.8.107"
                data_type => "list"
                key => "nginx-access"
                db => "0"
                codec => "json"
        }
}

output {
        elasticsearch {
                hosts => ["172.31.8.8:9200","172.31.8.13:9200"]
                index => "nginx-access-%{+YYYY.MM.dd}"
        }
}
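
The index setting above creates one index per day, named from the event's @timestamp, which Logstash keeps in UTC. As a rough sketch of how to work out which index a given day's events land in, the helper below (daily_index is a made-up name, and it assumes GNU date) builds the same name from a prefix and a date.

```shell
#!/bin/bash
# daily_index PREFIX [DATE]: print the daily index name for DATE
# (default: now), using UTC like the %{+YYYY.MM.dd} pattern does.
# Hypothetical helper; relies on GNU date's -d option.
daily_index() {
    date -u -d "${2:-now}" "+$1-%Y.%m.%d"
}

# Example (needs a reachable cluster): list today's index
#   curl -s "http://172.31.8.8:9200/_cat/indices/$(daily_index nginx-access)?v"
```

Because the date is taken in UTC, events logged shortly after local midnight can still fall into the previous day's index.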

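Adjust the logstash-server JVM heap:

vim /software/logstash-6.2.2/config/jvm.options
-Xms1g    change to    -Xms500m      # size to your own workload
-Xmx1g    change to    -Xmx500m      # size to your own workload

My log volume is currently small, so 500 MB of heap is enough.

Validate the configuration: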

/software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash_server.conf  -t
Sending Logstash's logs to /software/logstash-6.2.2/logs which is now configured via log4j2.properties
[INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/software/logstash-6.2.2/modules/fb_apache/configuration"}
[INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/software/logstash-6.2.2/modules/netflow/configuration"}
[WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ][logstash.config.source.local.configpathloader] No config files found in path {:path=>"/software/logstash-6.2.2/config/logstash"}
[ERROR][logstash.config.sourceloader] No configuration found in the configured sources.
Configuration OK
[INFO ][logstash.runner          ] Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
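
When this check is run from a script, for example before restarting logstash-server, the "Configuration OK" line in the output above can serve as the pass/fail signal. A minimal sketch, with config_ok as a made-up helper name:

```shell
#!/bin/bash
# config_ok: read the output of `logstash -f <conf> -t` on stdin and succeed
# only if it contains a line starting with "Configuration OK" (hypothetical helper).
config_ok() {
    grep -q '^Configuration OK'
}

# Example: restart only when validation passes
#   if /software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash_server.conf -t 2>&1 | config_ok; then
#       /software/logstash-6.2.2/logstash.sh restart
#   fi
```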

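Start logstash:
12.png
The program is now running normally.

2.5 Installing nginx

For the nginx installation procedure, refer to the author's separate nginx installation guide.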

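2.6 Installing logstash-agent
2.6.1 Edit the configuration file
vim /software/logstash-6.2.2/config/logstash.yml

Change these parameters:
node.name: logstash-agent    # node name, usually the hostname
path.data: /software/logstash-6.2.2/data    # persistent directory used by logstash and its plugins
config.reload.automatic: true    # reload the pipeline configuration automatically
config.reload.interval: 10s      # how often to check the configuration for changes
http.host: "172.31.8.75"         # address to bind to, usually a hostname or IP
http.port: 9600-9700             # uncomment the logstash port setting
2.6.2 Create the pipeline configuration file:
vim /software/logstash-6.2.2/config/logstash-nginx.conf
Add the following content:
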
input {
        file {
                type => "nginx-access"
                path => ["/software/nginx/logs/172.31.8.75_json_access*"]
        }

        file {
                type => "nginx-error"
                path => "/software/nginx/logs/nginx_error.log"
        }
}

# output to redis
output {
        if [type] == "nginx-access" {
                redis {
                        host => "172.31.8.107"
                        port => "6379"
                        db => "0"
                        data_type => "list"
                        key => "nginx-access"
                }
        }
}
2.6.3 Adjust the logstash-agent JVM heap
vim /software/logstash-6.2.2/config/jvm.options
-Xms1g    change to    -Xms256m      # size to your own workload
-Xmx1g    change to    -Xmx256m      # size to your own workload
2.6.4 Set the Java environment variables for logstash-agent
vim /software/logstash-6.2.2/bin/logstash

Insert the following:

JAVA_HOME=/software/jdk1.8.0_51
JRE_HOME=/software/jdk1.8.0_51/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
2.6.5 Validate the configuration file with the same command as before
/software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash-nginx.conf -t
2.6.6 Once validation passes, start logstash (repeat the same steps on the other node)
nohup /software/logstash-6.2.2/bin/logstash -f /software/logstash-6.2.2/config/logstash-nginx.conf &

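3. Configuring ELK Monitoring

3.1 Log in to redis and verify
/software/redis/bin/redis-cli -h 172.31.8.107 -p 6379
172.31.8.107:6379> keys *
1) "nginx-access"       # the data has reached redis
3.2 Open elasticsearch-head
http://172.31.8.8:9100

eshead2.png
The index is now visible in Elasticsearch.

3.3 Open Kibana and create the index pattern
http://172.31.8.8:5601

increate.png
increate2.png
[Screenshot: index pattern created]
Click Discover.
[Screenshot: Discover view]
The data is now displayed correctly.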

3.4 Generate logs with the ab load-testing tool
3.4.1 Install
yum -y install httpd-tools
3.4.2 Verify the installation
ab -V

This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
3.4.3 ab options
ab --help

ab: wrong number of arguments
Usage: ab [options] [http[s]://]hostname[:port]/path
Options are:
    -n requests     Number of requests to perform
    -c concurrency  Number of multiple requests to make at a time
    -t timelimit    Seconds to max. to spend on benchmarking
                    This implies -n 50000
    -s timeout      Seconds to max. wait for each response
                    Default is 30 seconds
    -b windowsize   Size of TCP send/receive buffer, in bytes
    -B address      Address to bind to when making outgoing connections
    -p postfile     File containing data to POST. Remember also to set -T
    -u putfile      File containing data to PUT. Remember also to set -T
    -T content-type Content-type header to use for POST/PUT data, eg.
                    'application/x-www-form-urlencoded'
                    Default is 'text/plain'
    -v verbosity    How much troubleshooting info to print
    -w              Print out results in HTML tables
    -i              Use HEAD instead of GET
    -x attributes   String to insert as table attributes
    -y attributes   String to insert as tr attributes
    -z attributes   String to insert as td or th attributes
    -C attribute    Add cookie, eg. 'Apache=1234'. (repeatable)
    -H attribute    Add Arbitrary header line, eg. 'Accept-Encoding: gzip'
                    Inserted after all normal header lines. (repeatable)
    -A attribute    Add Basic WWW Authentication, the attributes
                    are a colon separated username and password.
    -P attribute    Add Basic Proxy Authentication, the attributes
                    are a colon separated username and password.
    -X proxy:port   Proxyserver and port number to use
    -V              Print version number and exit
    -k              Use HTTP KeepAlive feature
    -d              Do not show percentiles served table.
    -S              Do not show confidence estimators and warnings.
    -q              Do not show progress when doing more than 150 requests
    -g filename     Output collected data to gnuplot format file.
    -e filename     Output CSV file with percentages served
    -r              Don't exit on socket receive errors.
    -h              Display usage information (this message)
    -Z ciphersuite  Specify SSL/TLS cipher suite (See openssl ciphers)
    -f protocol     Specify SSL/TLS protocol
                    (SSL3, TLS1, TLS1.1, TLS1.2 or ALL)

ab has many options; the ones used most often are -c (number of concurrent requests) and -n (total number of requests).

ab -c 10 -n 100 http://172.31.8.75/

This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.31.8.75 (be patient).....done


Server Software:        nginx/1.8.1
Server Hostname:        172.31.8.75
Server Port:            80

Document Path:          /
Document Length:        612 bytes

Concurrency Level:      10
Time taken for tests:   0.013 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      84400 bytes
HTML transferred:       61200 bytes
Requests per second:    7569.45 [#/sec] (mean)
Time per request:       1.321 [ms] (mean)
Time per request:       0.132 [ms] (mean, across all concurrent requests)
Transfer rate:          6238.88 [Kbytes/sec] received

Connection Times (ms)
            min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:     0    1   0.2      1       1
Waiting:        0    1   0.2      1       1
Total:          1    1   0.1      1       1

Percentage of the requests served within a certain time (ms)
50%      1
66%      1
75%      1
80%      1
90%      1
95%      1
98%      1
99%      1
100%      1 (longest request)
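
To compare runs without reading the whole report, a single figure can be pulled out of the ab output. The following is a small sketch; ab_metric is a made-up helper name, and it assumes the "Label: value" layout shown in the report above.

```shell
#!/bin/bash
# ab_metric LABEL: read ab output on stdin and print the first value that
# follows "LABEL:" (hypothetical helper). Only the first matching line is
# used, e.g. the per-request mean for "Time per request".
ab_metric() {
    awk -F':' -v label="$1" '
        $1 == label { split($2, parts, " "); print parts[1]; exit }
    '
}

# Example:
#   ab -c 10 -n 100 http://172.31.8.75/ | ab_metric 'Requests per second'
```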