Pak DBA: Configuring Logstash with Elasticsearch

Introduction

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice. Cleanse and democratize all your data for diverse advanced downstream analytics and visualization use cases. You can clean and transform your data during ingestion to gain near real-time insights immediately at index or output time. Logstash comes out-of-box with many aggregations and mutations along with pattern matching, geo mapping, and dynamic lookup capabilities.

Logstash provides Grok, which is a great way to parse unstructured log data into something structured and queryable. It works well for syslog logs, Apache and other web server logs, MySQL logs, and in general any log format written for humans rather than for computer consumption.

The Logstash agent processes events through a pipeline with three stages: inputs → filters → outputs. Inputs generate events (each with a set of properties), filters modify them, and outputs ship them elsewhere.
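
As a loose analogy only (this is not how Logstash is implemented), the three stages behave like commands joined by a Unix pipe, each handing its output to the next. The function names below are made up for illustration.

```shell
# Hypothetical stand-ins for the three stages; none of these are Logstash commands.
input_stage()  { printf '%s\n' "$@"; }       # generate one event per argument
filter_stage() { sed 's/^/[filtered] /'; }   # modify each event
output_stage() { cat; }                      # ship each event to stdout

input_stage "Hi" "Bye" | filter_stage | output_stage
```

Each line printed corresponds to one event that has passed through all three stages.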

Installation

Logstash requires Java 8. Download the Logstash installation file that matches your host environment. Unpack the file and it is ready to work with.

Test the Installation

Test your Logstash installation by running the most basic Logstash pipeline. A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins modify the data as you specify, and the output plugins write the data to a destination. You can first set OS environment variables for your installation so that the logstash command is on your PATH:

export LS_HOME=/usr/hadoopsw/elk/logstash-6.2.3
export PATH=$PATH:$LS_HOME/bin

[hdpsysuser@hdpmaster bin]$ logstash -e 'input { stdin { } } output { stdout {} }'

Sending Logstash's logs to /usr/hadoopsw/elk/logstash-6.2.3/logs which is now configured via log4j2.properties

[2018-04-23T11:15:04,818][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/fb_apache/configuration"}

[2018-04-23T11:15:04,943][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/netflow/configuration"}

[2018-04-23T11:15:07,951][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified

[2018-04-23T11:15:12,524][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.3"}

[2018-04-23T11:15:14,563][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

[2018-04-23T11:15:22,118][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}

[2018-04-23T11:15:22,580][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#"}

The stdin plugin is now waiting for input:

[2018-04-23T11:15:23,385][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}

2018-04-23T11:15:45.166Z hdpmaster Hi

The -e flag enables you to specify a configuration directly from the command line.

After starting Logstash, wait until the log shows "Pipelines running", then type "Hi" and press Enter.

Logstash adds a timestamp and the host name to the message. Exit Logstash by pressing Ctrl+D in the shell where Logstash is running.
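
If you would rather not type into the running process, a line can be piped in instead. This is a minimal sketch that assumes logstash is on your PATH as set above; smoke_test is a hypothetical helper name, and the pipeline should shut down by itself once stdin reaches end of input.

```shell
# Sketch: feed one line to the stdin input instead of typing it interactively.
# Assumes `logstash` is on PATH (see the LS_HOME export above).
smoke_test() {
    echo "Hi" | logstash -e 'input { stdin { } } output { stdout {} }'
}
```

Calling smoke_test should print the same timestamped event as the interactive run.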

Logstash Pipeline

Before you create the Logstash pipeline, configure Filebeat to send log lines to Logstash. The Filebeat client, designed for reliability and low latency, is a lightweight, resource-friendly tool that collects logs from files on the server and forwards them to your Logstash instance for processing. In a typical use case, Filebeat runs on a separate machine from the one running your Logstash instance. The default Logstash installation includes the Beats input plugin.

To install Filebeat on your data source machine, download the appropriate package from the Filebeat product page https://www.elastic.co/downloads/beats/filebeat and unpack it.

export FB_HOME=/usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64

Configure Filebeat

After installing Filebeat, you need to configure it. Open the filebeat.yml file located in your Filebeat installation directory, and replace the contents with the following lines.

Make sure that paths points to your copy of the example Apache log file, logstash-tutorial.log, which can be downloaded from https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz

filebeat.yml

filebeat.prospectors:
- type: log
  paths:
    - /usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset
output.logstash:
  hosts: ["localhost:5044"]

On the data source machine, run Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

[hdpsysuser@hdpmaster filebeat-6.2.4-linux-x86_64]$ ./filebeat -e -c filebeat.yml -d "publish"

2018-04-23T12:14:10.757Z INFO instance/beat.go:468 Home path: [/usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64] Config path: [/usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64] Data path: [/usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64/data] Logs path: [/usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64/logs]

2018-04-23T12:14:10.758Z INFO instance/beat.go:475 Beat UUID: 97f293dd-9903-4644-8892-6bb4d9e8999b

2018-04-23T12:14:10.758Z INFO instance/beat.go:213 Setup Beat: filebeat; Version: 6.2.4

2018-04-23T12:14:10.761Z INFO pipeline/module.go:76 Beat name: hdpmaster

2018-04-23T12:14:10.764Z INFO [monitoring] log/log.go:97 Starting metrics logging every 30s

2018-04-23T12:14:10.764Z INFO instance/beat.go:301 filebeat start running.

2018-04-23T12:14:10.765Z INFO registrar/registrar.go:73 No registry file found under: /usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64/data/registry. Creating a new registry file.

2018-04-23T12:14:10.869Z INFO registrar/registrar.go:110 Loading registrar data from /usr/hadoopsw/elk/filebeat-6.2.4-linux-x86_64/data/registry

2018-04-23T12:14:10.870Z INFO registrar/registrar.go:121 States Loaded from registrar: 0

2018-04-23T12:14:10.870Z WARN beater/filebeat.go:261 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.

2018-04-23T12:14:10.870Z INFO crawler/crawler.go:48 Loading Prospectors: 1

2018-04-23T12:14:10.871Z INFO log/prospector.go:111 Configured paths: [/path/to/file/logstash-tutorial.log]

2018-04-23T12:14:10.872Z INFO crawler/crawler.go:82 Loading and starting Prospectors completed. Enabled prospectors: 1

Filebeat will attempt to connect on port 5044. Until Logstash starts with an active Beats plugin, there won’t be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.

Configure Logstash for Filebeat Input

Create a Logstash configuration pipeline that uses the Beats input plugin to receive events from Beats.

The following text represents the skeleton of a configuration pipeline:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}

Configure your Logstash instance to use the Beats input plugin by adding the following lines to the input section of the first-pipeline.conf file:

vi /usr/hadoopsw/elk/logstash-6.2.3/config/first-pipeline.conf

input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}

We’ll configure Logstash to write to Elasticsearch later. For now, the stdout line in the output section prints each event to the console when you run Logstash.

Verify Configuration

To verify your configuration, run the following command:

logstash -f /usr/hadoopsw/elk/logstash-6.2.3/config/first-pipeline.conf --config.test_and_exit

[hdpsysuser@hdpmaster ~]$ logstash -f $LS_HOME/config/first-pipeline.conf --config.test_and_exit

Sending Logstash's logs to /usr/hadoopsw/elk/logstash-6.2.3/logs which is now configured via log4j2.properties

[2018-04-23T12:55:09,135][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/fb_apache/configuration"}

[2018-04-23T12:55:09,261][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/netflow/configuration"}

[2018-04-23T12:55:11,904][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified

Configuration OK

[2018-04-23T12:55:25,976][INFO ][logstash.runner ] Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash

The --config.test_and_exit option parses your configuration file and reports any errors.

Start Logstash

If the configuration file passes the configuration test, start Logstash with the following command:

[hdpsysuser@hdpmaster ~]$ logstash -f $LS_HOME/config/first-pipeline.conf --config.reload.automatic

Sending Logstash's logs to /usr/hadoopsw/elk/logstash-6.2.3/logs which is now configured via log4j2.properties

[2018-04-23T13:02:20,437][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/fb_apache/configuration"}

[2018-04-23T13:02:20,473][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/hadoopsw/elk/logstash-6.2.3/modules/netflow/configuration"}

[2018-04-23T13:02:21,322][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified

[2018-04-23T13:02:22,715][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.3"}

[2018-04-23T13:02:25,156][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

[2018-04-23T13:02:39,277][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}

[2018-04-23T13:02:40,391][INFO ][logstash.inputs.beats ] Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}

[2018-04-23T13:02:40,609][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#"}

[2018-04-23T13:02:40,853][INFO ][org.logstash.beats.Server] Starting server on port: 5044

[2018-04-23T13:02:41,071][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}

The --config.reload.automatic option enables automatic config reloading so that you don’t have to stop and restart Logstash every time you modify the configuration file.
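
The two steps above (test, then start) can be chained so that Logstash only starts when the test passes. This is a sketch under the assumption that logstash exits with a non-zero status when the configuration test fails; start_if_valid is a hypothetical helper name, and both flags are the ones used above.

```shell
# Sketch: only start the pipeline when the configuration test succeeds.
# Assumes `logstash` is on PATH and returns non-zero for an invalid config.
start_if_valid() {
    conf="$1"
    if logstash -f "$conf" --config.test_and_exit; then
        logstash -f "$conf" --config.reload.automatic
    else
        echo "Configuration test failed: $conf" >&2
        return 1
    fi
}
```

For example: start_if_valid "$LS_HOME/config/first-pipeline.conf"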

If our pipeline is working correctly, we should see a series of events like the following written to the console:

{
    "source" => "/usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset",
    "offset" => 24464,
    "beat" => {
        "name" => "hdpmaster",
        "version" => "6.2.4",
        "hostname" => "hdpmaster"
    },
    "@timestamp" => 2018-04-23T13:34:17.165Z,
    "host" => "hdpmaster",
    "@version" => "1",
    "prospector" => {
        "type" => "log"
    },
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "message" => "86.1.76.62 - - [04/Jan/2015:05:30:37 +0000] \"GET /style2.css HTTP/1.1\" 200 4877 \"http://www.semicomplete.com/projects/xdotool/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\""
}

Parsing Web Logs with the Grok Filter Plugin

Parse the log messages to create specific, named fields from the logs. The grok filter plugin is one of several plugins that are available by default in Logstash. The grok filter plugin enables you to parse the unstructured log data into something structured and queryable.

Because the grok filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to make decisions about how to identify the patterns that are of interest to your use case. A representative line from the web server log sample looks like this:

83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. To parse the data, you can use the %{COMBINEDAPACHELOG} grok pattern, which structures lines from the Apache log using the following schema:

Information	Field Name
IP Address	`clientip`
User ID	`ident`
User Authentication	`auth`
timestamp	`timestamp`
HTTP Verb	`verb`
Request body	`request`
HTTP Version	`httpversion`
HTTP Status Code	`response`
Bytes served	`bytes`
Referrer URL	`referrer`
User agent	`agent`
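
To get a feel for what these fields contain, the same line can be split by hand. The sketch below uses plain awk splitting on double quotes; it is only an approximation for illustration, not the grok pattern itself, and parse_combined is a hypothetical helper name. Unlike grok, it strips the surrounding quotes from the referrer and agent values.

```shell
# Rough approximation of the fields COMBINEDAPACHELOG extracts.
# Reads combined-format log lines on stdin and prints one name=value per line.
parse_combined() {
    awk -F'"' '{
        split($1, head, " ")         # clientip, ident, auth, [timestamp, zone]
        split($2, req, " ")          # verb, request, HTTP/version
        split($3, stat, " ")         # response, bytes
        ts = head[4] " " head[5]
        gsub(/^\[|\]$/, "", ts)      # drop the surrounding brackets
        sub(/^HTTP\//, "", req[3])   # keep only the version number
        print "clientip=" head[1]
        print "ident=" head[2]
        print "auth=" head[3]
        print "timestamp=" ts
        print "verb=" req[1]
        print "request=" req[2]
        print "httpversion=" req[3]
        print "response=" stat[1]
        print "bytes=" stat[2]
        print "referrer=" $4
        print "agent=" $6
    }'
}
```

For example: head -1 /usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset | parse_combined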

Edit the first-pipeline.conf file and replace the entire filter section with the following text:

first-pipeline.conf

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
}
output {
    stdout { codec => rubydebug }
}

Save changes. Because we’ve enabled automatic config reloading, we don’t have to restart Logstash to pick up our changes. However, we do need to force Filebeat to read the log file from scratch. To do this, go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the Filebeat registry file. Since Filebeat stores the state of each file it harvests in the registry, deleting the registry file forces Filebeat to read all the files it’s harvesting from scratch. Next, restart Filebeat.
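
The registry reset can be wrapped in a small helper. This is a sketch that assumes Filebeat has already been stopped and that FB_HOME is set as shown earlier; the registry location is the one reported in the Filebeat startup log above, and reset_filebeat_registry is a hypothetical name.

```shell
# Sketch: remove the Filebeat registry so the log file is read from the beginning.
# Assumes Filebeat is stopped and FB_HOME points at the Filebeat directory.
reset_filebeat_registry() {
    registry="$FB_HOME/data/registry"
    if [ -e "$registry" ]; then
        rm -f "$registry" && echo "Removed $registry"
    else
        echo "No registry at $registry"
    fi
}
```

After calling reset_filebeat_registry, restart Filebeat with the same ./filebeat command as before.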

After Logstash applies the grok pattern, the events will have the following JSON representation:

{
    "message" => ".1.76.62 - - [04/Jan/2015:05:30:37 +0000] \"GET /MyURL-3 HTTP/1.1\" 200 4877 \"http://www.semicomplete.com/projects/xdotool/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
    "httpversion" => "1.1",
    "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
    "ident" => "-",
    "beat" => {
        "hostname" => "hdpmaster",
        "name" => "hdpmaster",
        "version" => "6.2.4"
    },
    "source" => "/usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset",
    "verb" => "GET",
    "timestamp" => "04/Jan/2015:05:30:37 +0000",
    "auth" => "-",
    "referrer" => "\"http://www.semicomplete.com/projects/xdotool/\"",
    "@version" => "1",
    "host" => "hdpmaster",
    "offset" => 24672,
    "response" => "200",
    "prospector" => {
        "type" => "log"
    },
    "request" => "/MyURL-3",
    "bytes" => "4877",
    "clientip" => "1.76.62",
    "@timestamp" => 2018-04-23T14:05:20.967Z,
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

Notice that the event includes the original message, but the log message is also broken down into specific fields.

Enhancing Data with the Geoip Filter Plugin

In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing data. As an example, the geoip plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to the logs. Configure your Logstash instance to use the geoip filter plugin. The geoip plugin configuration requires you to specify the name of the source field that contains the IP address to look up.

Since filters are evaluated in sequence, make sure that the geoip section is after the grok section of the configuration file and that both the grok and geoip sections are nested within the filter section.

When we’re done, the contents of first-pipeline.conf should look like this:

first-pipeline.conf

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout { codec => rubydebug }
}

Save changes. To force Filebeat to read the log file from scratch, shut down Filebeat (press Ctrl+C), delete the registry file, and then restart Filebeat.

Notice that the event now contains geographic location information:

{
    "message" => "1.76.62 - - [04/Jan/2015:05:30:37 +0000] \"GET /MyURL-4 HTTP/1.1\" 200 4877 \"http://www.semicomplete.com/projects/xdotool/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
    "httpversion" => "1.1",
    "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
    "ident" => "-",
    "beat" => {
        "hostname" => "hdpmaster",
        "name" => "hdpmaster",
        "version" => "6.2.4"
    },
    "source" => "/usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset",
    "verb" => "GET",
    "timestamp" => "04/Jan/2015:05:30:37 +0000",
    "auth" => "-",
    "referrer" => "\"http://www.semicomplete.com/projects/xdotool/\"",
    "@version" => "1",
    "host" => "hdpmaster",
    "offset" => 24885,
    "geoip" => {
        "region_name" => "Tokyo",
        "city_name" => "Tokyo",
        "country_name" => "Japan",
        "location" => {
            "lon" => 139.7514,
            "lat" => 35.685
        },
        "region_code" => "13",
        "postal_code" => "190-0031",
        "country_code2" => "JP",
        "longitude" => 139.7514,
        "timezone" => "Asia/Tokyo",
        "country_code3" => "JP",
        "ip" => "1.76.0.62",
        "continent_code" => "AS",
        "latitude" => 35.685
    },
    "response" => "200",
    "prospector" => {
        "type" => "log"
    },
    "request" => "/MyURL-4",
    "bytes" => "4877",
    "clientip" => "1.76.62",
    "@timestamp" => 2018-04-23T14:19:33.037Z,
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

Indexing Data into Elasticsearch

Now that the web logs are broken down into specific fields, the Logstash pipeline can index the data into an Elasticsearch cluster. Edit the first-pipeline.conf file and replace the entire output section.

first-pipeline.conf

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

Save changes. To force Filebeat to read the log file from scratch, shut down Filebeat (press Ctrl+C), delete the registry file, and then restart Filebeat.

Testing Pipeline

Now that the Logstash pipeline is configured to index the data into an Elasticsearch cluster, we can query Elasticsearch.

First list all indices. Notice the index named logstash-CURRENTDATE that Logstash has created, e.g. logstash-2018.04.23.

[hdpsysuser@hdpmaster ~]$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

green open .kibana 4YKQ6kwWSCyQHhM1Ks_ZIQ 1 0 23 3 90.3kb 90.3kb

yellow open logstash-2018.04.23 MELDbsEdQvSlDL8nXj-lyQ 5 1 1 0 23kb 23kb

Try a test query to Elasticsearch based on the fields created by the grok filter plugin. Replace $DATE with the current date, in YYYY.MM.DD format.

curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=response=200'

curl -XGET 'localhost:9200/logstash-2018.04.23/_search?pretty&q=response=200'
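
Note that $DATE is only a placeholder: inside single quotes the shell does not expand it. If you want the index name filled in automatically, something like the following may help. It is a sketch that assumes the default daily index naming seen in the listing above and takes the date in UTC, as event timestamps are; todays_index is a hypothetical helper name.

```shell
# Sketch: print the name of today's index under the daily logstash-YYYY.MM.dd scheme.
todays_index() {
    date -u +logstash-%Y.%m.%d
}
```

It can then be used inside double quotes, for example: curl -XGET "localhost:9200/$(todays_index)/_search?pretty&q=response=200"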

[hdpsysuser@hdpmaster ~]$ curl -XGET 'localhost:9200/logstash-2018.04.23/_search?pretty&q=response=200'

{
  "took" : 48,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "logstash-2018.04.23",
        "_type" : "doc",
        "_id" : "eOfu8mIBFjVpawNmVgj1",
        "_score" : 0.2876821,
        "_source" : {
          "message" : "1.76.62 - - [04/Jan/2015:05:30:37 +0000] \"GET /MyURL-5 HTTP/1.1\" 200 4877 \"http://www.semicomplete.com/projects/xdotool/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
          "httpversion" : "1.1",
          "agent" : "\"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
          "ident" : "-",
          "beat" : {
            "hostname" : "hdpmaster",
            "name" : "hdpmaster",
            "version" : "6.2.4"
          },
          "source" : "/usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset",
          "verb" : "GET",
          "timestamp" : "04/Jan/2015:05:30:37 +0000",
          "auth" : "-",
          "referrer" : "\"http://www.semicomplete.com/projects/xdotool/\"",
          "@version" : "1",
          "host" : "hdpmaster",
          "offset" : 25098,
          "geoip" : {
            "region_name" : "Tokyo",
            "city_name" : "Tokyo",
            "country_name" : "Japan",
            "location" : {
              "lon" : 139.7514,
              "lat" : 35.685
            },
            "region_code" : "13",
            "postal_code" : "190-0031",
            "country_code2" : "JP",
            "longitude" : 139.7514,
            "timezone" : "Asia/Tokyo",
            "country_code3" : "JP",
            "ip" : "1.76.0.62",
            "continent_code" : "AS",
            "latitude" : 35.685
          },
          "response" : "200",
          "prospector" : {
            "type" : "log"
          },
          "request" : "/MyURL-5",
          "bytes" : "4877",
          "clientip" : "1.76.62",
          "@timestamp" : "2018-04-23T14:35:17.340Z",
          "tags" : [
            "beats_input_codec_plain_applied"
          ]
        }
      }
    ]
  }
}

Try another search for the geographic information derived from the IP address.

[hdpsysuser@hdpmaster ~]$ curl -XGET 'localhost:9200/logstash-2018.04.23/_search?pretty&q=geoip.city_name=Tokyo'

{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "logstash-2018.04.23",
        "_type" : "doc",
        "_id" : "eOfu8mIBFjVpawNmVgj1",
        "_score" : 0.2876821,
        "_source" : {
          "httpversion" : "1.1",
          "agent" : "\"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0\"",
          "ident" : "-",
          "beat" : {
            "hostname" : "hdpmaster",
            "name" : "hdpmaster",
            "version" : "6.2.4"
          },
          "source" : "/usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset",
          "verb" : "GET",
          "timestamp" : "04/Jan/2015:05:30:37 +0000",
          "auth" : "-",
          "referrer" : "\"http://www.semicomplete.com/projects/xdotool/\"",
          "@version" : "1",
          "host" : "hdpmaster",
          "offset" : 25098,
          "geoip" : {
            "region_name" : "Tokyo",
            "city_name" : "Tokyo",
            "country_name" : "Japan",
            "location" : {
              "lon" : 139.7514,
              "lat" : 35.685
            },
            "region_code" : "13",
            "postal_code" : "190-0031",
            "country_code2" : "JP",
            "longitude" : 139.7514,
            "timezone" : "Asia/Tokyo",
            "country_code3" : "JP",
            "ip" : "1.76.0.62",
            "continent_code" : "AS",
            "latitude" : 35.685
          },
          "response" : "200",
          "prospector" : {
            "type" : "log"
          },
          "request" : "/MyURL-5",
          "bytes" : "4877",
          "clientip" : "1.76.62",
          "@timestamp" : "2018-04-23T14:35:17.340Z",
          "tags" : [
            "beats_input_codec_plain_applied"
          ]
        }
      }
    ]
  }
}
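
The queries above pass q=response=200 and q=geoip.city_name=Tokyo. In the Elasticsearch query-string syntax a single field is addressed as field:value, so the = form may match more broadly than intended. The sketch below builds the field:value variant of the URL; es_field_query is a hypothetical helper name, and the host and port are the ones used above.

```shell
# Sketch: build a URI-search URL that restricts the match to a single field.
# Arguments: index name, field name, value to match.
es_field_query() {
    index="$1"; field="$2"; value="$3"
    printf 'localhost:9200/%s/_search?pretty&q=%s:%s\n' "$index" "$field" "$value"
}
```

For example: curl -XGET "$(es_field_query logstash-2018.04.23 geoip.city_name Tokyo)"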

If you are using Kibana to visualize your data, you can also explore the Filebeat data in Kibana.

Stashing to HDFS

Webhdfs output plugin

This plugin sends Logstash events into files in HDFS via the webhdfs REST API. The HTTP REST API supports the complete FileSystem interface for HDFS.

vi /usr/hadoopsw/elk/logstash-6.2.3/config/logstash2hdfs.conf

input {
beats {
port => "5044"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}"}
}
geoip {
source => "clientip"
}
}
output {
webhdfs {
host => "127.0.0.1" # (required)
port => 50070 # (optional, default: 50070)
path => "/user/logstash/%{+YYYY-MM-dd}logstash-%{+HH}.log" #(required) JodaFmt
user => "hdpsysuser" # (required)
}
}

output {
elasticsearch {
hosts => [ "localhost:9200" ]
}
}

Run Logstash with the new configuration file:

[hdpsysuser@hdpmaster ~]$ logstash -f $LS_HOME/config/logstash2hdfs.conf --config.reload.automatic

Create the related folder in HDFS, then add one line to the file monitored by Filebeat, e.g. /usr/hadoopsw/mydata/elkdata/logstash-tutorial-dataset.
A new file will be generated in HDFS; check its contents.

[hdpsysuser@hdpmaster ~]$ hdfs dfs -mkdir /user/logstash

[hdpsysuser@hdpmaster ~]$ hdfs dfs -ls /user/logstash

Found 1 items
-rwxr-xr-x 1 hdpsysuser supergroup 248 2018-04-23 17:03 /user/logstash/2018-04-23logstash-17.log

[hdpsysuser@hdpmaster ~]$ hdfs dfs -cat /user/logstash/2018-04-23logstash-17.log
2018-04-23T17:03:18.660Z hdpmaster .76.62 - - [04/Jan/2015:05:30:37 +0000] "GET /MyURL-HDFS1 HTTP/1.1" 200 4877 "http://www.semicomplete.com/projects/xdotool/" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0"
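
The file name comes from the date patterns in the path setting of the webhdfs output. As a rough sketch, the name for the current hour can be reproduced with date. Logstash formats each event's @timestamp (in UTC), so using the current time is an assumption that only holds for events being written right now; current_hdfs_log is a hypothetical helper name.

```shell
# Sketch: approximate the HDFS file name produced by the path setting
# "/user/logstash/%{+YYYY-MM-dd}logstash-%{+HH}.log" for the current UTC hour.
current_hdfs_log() {
    date -u '+/user/logstash/%Y-%m-%dlogstash-%H.log'
}
```

For example: hdfs dfs -cat "$(current_hdfs_log)"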

Managing Multi Input and Output

You can configure a Logstash pipeline with multiple inputs and outputs. A sample configuration is provided below.

/data/elk/logstash-6.2.3/config/multifile.cfg

##Input Plugin

input {
file {
type => "technical"
path => "/data/elk/elkdata/tech.log"
}

file {
type => "business"
path => "/data/elk/elkdata/buss.log"
}

file {
type => "application"
path => "/data/elk/elkdata/app.log"
}

}
##input ends

##Filter Plugin
filter {
if [type] == "technical" {
# processing .......
}

if [type] == "business" {
# processing .......
}

if [type] == "application" {
# processing .......
}

}
###filter ends
###Output Plugin
output {
if [type] == "technical" {

webhdfs {
codec=>"json"
host => "nn01" # (required)
port => 50070 # (optional, default: 50070)
path => "/logstash/techlog/%{+YYYY-MM-dd}-technical-%{+HH}.log" #(required) JodaFmt
user => "elk" # (required)
}

} ##technical

if [type] == "business" {

webhdfs {
host => "nn01" # (required)
port => 50070 # (optional, default: 50070)
path => "/logstash/busslog/%{+YYYY-MM-dd}-business-%{+HH}.log" #(required) JodaFmt
user => "elk" # (required)
}
} ##business

if [type] == "application" {

webhdfs {
host => "nn01" # (required)
port => 50070 # (optional, default: 50070)
path => "/logstash/applog/%{+YYYY-MM-dd}-application-%{+HH}.log" #(required) JodaFmt
user => "elk" # (required)
}
} ##application

}
##output ends
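
The routing in the output section amounts to a lookup from the event's type to an HDFS directory; an event whose type matches none of the conditions is not written by any of these outputs. The sketch below mirrors that lookup in shell purely for illustration; hdfs_dir_for_type is a hypothetical helper name, and the directories are the ones from the configuration above.

```shell
# Sketch: the type-to-directory mapping used by the output conditionals above.
hdfs_dir_for_type() {
    case "$1" in
        technical)   echo "/logstash/techlog" ;;
        business)    echo "/logstash/busslog" ;;
        application) echo "/logstash/applog" ;;
        *)           return 1 ;;   # no conditional matches this type
    esac
}
```

It can be handy for creating the target directories up front, for example: for t in technical business application; do hdfs dfs -mkdir -p "$(hdfs_dir_for_type "$t")"; done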

Managing Multi Input and Output with beats

Here is a sample configuration for managing multiple inputs and outputs with Beats.

First configure Filebeat (filebeat.yml) to harvest different log files in different locations, then configure the Logstash pipeline to process the Beats events.

/data/elk/filebeat-6.2.4-linux-x86_64/filebeat.yml

filebeat.prospectors:
- type: log
  paths:
    - /data/elk/elkdata/apache.log
  fields:
    logtype: apache
- type: log
  paths:
    - /data/elk/elkdata/jboss.log
  fields:
    logtype: jboss
output.logstash:
  hosts: ["dn04:5044"]

/data/elk/logstash-6.2.3/config/multi-beats-pipeline.cfg

input {
beats {
port => "5044"
}
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
if [fields][logtype] == "apache" {
webhdfs {
host => "nn01" # (required)
port => 50070 # (optional, default: 50070)
path => "/logstash/apache/%{+YYYY-MM-dd}-technical-%{+HH}.log" #(required) JodaFmt
user => "elk" # (required)
}
}

if [fields][logtype] == "jboss" {
webhdfs {
host => "nn01" # (required)
port => 50070 # (optional, default: 50070)
path => "/logstash/jboss/%{+YYYY-MM-dd}-technical-%{+HH}.log" #(required) JodaFmt

# path => "/logstash/jboss/%{+YYYY}/%{+MM}/%{+YYYYMMdd}-technical-%{+HH}.log" #(required) JodaFmt

user => "elk" # (required)
}
}

}


Monday, January 1, 2018
