[[how-metricbeat-works]]
== How Metricbeat works

Metricbeat consists of modules and metricsets. A Metricbeat _module_ defines the
basic logic for collecting data from a specific service, such as Redis, MySQL,
and so on. The module specifies details about the service, including how to connect,
how often to collect metrics, and which metrics to collect.

Each module has one or more metricsets. A _metricset_ is the part of the module
that fetches and structures the data. Rather than collecting each metric as a
separate event, metricsets retrieve a list of multiple related metrics in a single request
to the remote system. So, for example, the Redis module provides an `info`
metricset that collects information and statistics from Redis by running the
http://redis.io/commands/INFO[`INFO`] command and parsing the returned result.

image:./images/module-overview.png[Modules and metricsets]

Likewise, the MySQL module provides a `status` metricset that collects data
from MySQL by running a http://dev.mysql.com/doc/refman/5.7/en/show-status.html[`SHOW GLOBAL STATUS`]
SQL query. Metricsets make collection easier for you by grouping related metrics
together in a single response returned by the remote server.

Metricbeat retrieves metrics by periodically interrogating the host system based
on the `period` value that you specify when you configure the module. Because multiple
metricsets can send requests to the same service, Metricbeat reuses connections
whenever possible. If Metricbeat cannot connect to the host system within the time
specified by the `timeout` config setting, it returns an error. Metricbeat sends
the events asynchronously, which means the event retrieval is not acknowledged. If
the configured output is not available, events may be lost.

When Metricbeat encounters an error (for example, when it cannot connect to the host
system), it sends an error event to the specified output. This means that Metricbeat
always sends an event, even when there is a failure. This allows you to monitor
for errors and see debug messages to help you diagnose what went wrong.

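The behavior described above (one fetch per period, and an event sent whether or
not the fetch succeeds) can be pictured with a short Python sketch. It is a
conceptual model only, not Metricbeat's implementation: `fetch_once`, the `fetch`
callable, and the `<module>-<metricset>` payload key are hypothetical names chosen
for illustration. Only the `metricset.*`, `type`, and `error` fields mirror the
event examples in this document.

[source,python]
----
import time
from datetime import datetime, timezone


def fetch_once(module, metricset, fetch):
    """Run one collection cycle and always return an event.

    `fetch` is a hypothetical callable standing in for the request to the
    monitored service. It returns a dict of metrics or raises an exception.
    """
    started = time.monotonic()
    payload_key = f"{module}-{metricset}"  # assumed naming, for illustration
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "metricset": {"module": module, "name": metricset},
        "type": "metricsets",
    }
    try:
        event[payload_key] = fetch()
    except Exception as err:
        # A failed fetch still produces an event, carrying the error text.
        event[payload_key] = {}
        event["error"] = {"message": str(err)}
    # Round trip time in microseconds, like the `metricset.rtt` field.
    event["metricset"]["rtt"] = int((time.monotonic() - started) * 1_000_000)
    return event


def unreachable():
    raise ConnectionError("connection refused")


ok_event = fetch_once("redis", "info", lambda: {"connected_clients": 1})
failed_event = fetch_once("apache", "status", unreachable)
----

In this sketch a scheduler would call `fetch_once` once per `period` for each
configured metricset. Both calls return an event, and only `failed_event`
contains an `error` field.
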
The following topics provide more detail about the structure of Metricbeat events:

* <<metricbeat-event-structure>>
* <<error-event-structure>>

For more about the benefits of using Metricbeat, see <<key-features>>.

[[metricbeat-event-structure]]
=== Event structure

Every event sent by Metricbeat has the same basic structure. It contains the following fields:

*`@timestamp`*:: Time when the event was captured
*`beat.hostname`*:: Hostname of the server on which the Beat is running
*`beat.name`*:: Name given to the Beat
*`metricset.module`*:: Name of the module that the data is from
*`metricset.name`*:: Name of the metricset that the data is from
*`metricset.rtt`*:: Round trip time of the request in microseconds
*`type`*:: This is always "metricsets"

For example:

[source,json]
----
{
  "@timestamp": "2016-06-22T22:05:53.291Z",
  "beat": {
    "hostname": "host.example.com",
    "name": "host.example.com"
  },
  "metricset": {
    "module": "system",
    "name": "process",
    "rtt": 7419
  },
  .
  .
  .

  "type": "metricsets"
}
----

For more information about the exported fields, see <<exported-fields>>.

[[error-event-structure]]
=== Error event structure

Metricbeat sends an error event when the service is not reachable. The error event
has the same structure as the <<metricbeat-event-structure,base event>>, but also
has an error field that contains an error string. This makes it possible to check
for errors across all metric events.

The following example shows an error event sent when the Apache server is not
reachable:

[source,json]
----
{
  "@timestamp": "2016-03-18T12:18:57.124Z",
  "apache-status": {},
  "beat": {
    "hostname": "host.example.com",
    "name": "host.example.com"
  },
  "error": {
    "message": "Get http://127.0.0.1/server-status?auto: dial tcp 127.0.0.1:80: getsockopt: connection refused"
  },
  "metricset": {
    "module": "apache",
    "name": "status",
    "rtt": 1082
  },
  .
  .
  .

  "type": "metricsets"
}
----

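Because every error event carries an error field, a consumer can apply one check
to events from any module. The following Python sketch shows one way to do that
on already-decoded events. The helper names `is_error_event` and `error_message`
are hypothetical, and the sample events are trimmed to the fields the check needs.

[source,python]
----
def is_error_event(event):
    """Return True when the event carries a non-empty error field."""
    return bool(event.get("error"))


def error_message(event):
    """Return the error text, or None for a normal event.

    Accepts either an error object with a `message` key or a plain string,
    since this sketch does not assume which form a given version emits.
    """
    error = event.get("error")
    if not error:
        return None
    if isinstance(error, dict):
        return error.get("message")
    return str(error)


events = [
    {"metricset": {"module": "system", "name": "process"}, "type": "metricsets"},
    {
        "metricset": {"module": "apache", "name": "status"},
        "error": {"message": "dial tcp 127.0.0.1:80: getsockopt: connection refused"},
        "type": "metricsets",
    },
]

# Modules that reported an error, regardless of which module produced them.
failed = [e["metricset"]["module"] for e in events if is_error_event(e)]
----
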
[[key-features]]
=== Key Metricbeat features

Metricbeat has some key features that are critical to how it works:

* <<metricbeat-error-events>>
* <<no-aggregations>>
* <<more-than-numbers>>
* <<multiple-events-in-one>>

[[metricbeat-error-events]]
==== Metricbeat error events

Metricbeat sends more than just metrics. When it cannot retrieve metrics, it
sends error events. The error is not simply a flag, but a full error string that is
created while fetching from the host system. This enables you to monitor not
only the metrics, but also any errors that occur during metrics monitoring.

Because you see the full error message, you can track down the error faster.
Metricbeat is installed locally on the host machine, which means that you can
differentiate errors that happen locally from other issues, such as network problems.

Each metricset is retrieved based on a predefined period, so when Metricbeat fails to
retrieve metrics for more than one interval, you can infer that there is potentially
something wrong with the host or host connectivity.

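The "more than one interval" reasoning can be turned into a simple check on the
consumer side. The sketch below is an assumption about how you might post-process
events, not a Metricbeat feature: `consecutive_failures`, `suspect_metricsets`,
and the `threshold` parameter are hypothetical names.

[source,python]
----
from collections import defaultdict


def consecutive_failures(events):
    """Count the current run of error events per (module, metricset).

    `events` must be ordered by time. A successful event resets the count.
    """
    streaks = defaultdict(int)
    for event in events:
        key = (event["metricset"]["module"], event["metricset"]["name"])
        if event.get("error"):
            streaks[key] += 1
        else:
            streaks[key] = 0
    return dict(streaks)


def suspect_metricsets(events, threshold=2):
    """Return metricsets whose most recent `threshold` or more fetches failed."""
    streaks = consecutive_failures(events)
    return sorted(key for key, count in streaks.items() if count >= threshold)


def sample(module, name, failed):
    """Build a trimmed event for the example below."""
    event = {"metricset": {"module": module, "name": name}, "type": "metricsets"}
    if failed:
        event["error"] = {"message": "connection refused"}
    return event


history = [
    sample("apache", "status", failed=False),
    sample("apache", "status", failed=True),
    sample("redis", "info", failed=True),
    sample("apache", "status", failed=True),
    sample("redis", "info", failed=False),
]
suspects = suspect_metricsets(history)
----

Here only the Apache `status` metricset has failed for two intervals in a row,
so it is the only entry in `suspects`. The single Redis failure is reset by the
successful fetch that follows it.
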
[[no-aggregations]]
==== No aggregations when data is fetched

Metricbeat doesn't do aggregations like gauge, sum, counters, and so on. Metricbeat
sends the raw data retrieved from the host to the output for processing. When using
Elasticsearch, this has the advantage that all raw data is available on the
Elasticsearch host for drilling down into the details, and the data can be
reprocessed at any time. It also reduces the complexity of Metricbeat.

[[more-than-numbers]]
==== Sends more than just numbers

Metricbeat sends more than just numbers. The metrics that Metricbeat sends can also
contain strings to report status information. This is useful when you're using
Elasticsearch to store the metrics data. Because each metricset has a predefined
structure, Elasticsearch knows in advance which types will be stored and can
optimize storage.

Basic meta information about each metric (such as the host) is also sent as part
of each event.

[[multiple-events-in-one]]
==== Multiple metrics in one event

Rather than containing a single metric, each event created by Metricbeat
contains a list of metrics. This means that you can retrieve all the metrics
in a single request to the host system, resulting in less load on the host
system. If you are sending the metrics to Elasticsearch as the output,
Elasticsearch can directly store and query the metrics as a nested
JSON document, which makes sending metrics data to Elasticsearch very efficient.

Because the full raw event data is available, Metricbeat or Elasticsearch can
do any required transformations on the data later. For example, if you need to
store data in the http://metrics20.org/[Metrics2.0] format, you could generate
the format out of the existing event by splitting up the full event into multiple
Metrics2.0 events.

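As a rough illustration of that kind of later transformation, the following
Python sketch splits one nested event into a flat record per numeric metric.
The output layout is a simplified, hypothetical shape and does not claim to
follow the Metrics2.0 specification. The `split_event` helper, the `redis-info`
payload key, and the metric names in the sample event are assumptions made for
the example.

[source,python]
----
def split_event(event, payload_key):
    """Split one nested event into a flat record per numeric metric."""
    # Meta information shared by every metric in the event.
    shared = {
        "timestamp": event["@timestamp"],
        "host": event["beat"]["hostname"],
        "module": event["metricset"]["module"],
        "metricset": event["metricset"]["name"],
    }

    def walk(node, path):
        for name, value in node.items():
            if isinstance(value, dict):
                yield from walk(value, path + [name])
            elif isinstance(value, (int, float)) and not isinstance(value, bool):
                yield {**shared, "name": ".".join(path + [name]), "value": value}
            # Non-numeric values (such as status strings) are skipped here.

    return list(walk(event[payload_key], []))


event = {
    "@timestamp": "2016-06-22T22:05:53.291Z",
    "beat": {"hostname": "host.example.com", "name": "host.example.com"},
    "metricset": {"module": "redis", "name": "info", "rtt": 7419},
    "redis-info": {  # hypothetical payload, for illustration only
        "clients": {"connected": 2, "blocked": 0},
        "server": {"mode": "standalone"},
    },
    "type": "metricsets",
}
records = split_event(event, "redis-info")
----

Each record repeats the shared meta information, which is the trade-off the
single nested event avoids. The string value under `server.mode` is dropped in
this sketch because a per-metric numeric format has no place for it.
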
Meta information about the type of each metric is stored in the mapping
template. Meta information that is common to all metric events, such as host and
timestamp, is part of the event structure itself and is only stored once for
all events in the metricset.

Having all the related metrics in a single event also makes it easier to look
at other values when one of the metrics for a service seems off.