[[how-metricbeat-works]]
== How Metricbeat works

Metricbeat consists of modules and metricsets. A Metricbeat _module_ defines the
basic logic for collecting data from a specific service, such as Redis, MySQL,
and so on. The module specifies details about the service, including how to connect,
how often to collect metrics, and which metrics to collect.

Each module has one or more metricsets. A _metricset_ is the part of the module
that fetches and structures the data. Rather than collecting each metric as a
separate event, a metricset retrieves multiple related metrics in a single request
to the remote system. For example, the Redis module provides an `info`
metricset that collects information and statistics from Redis by running the
http://redis.io/commands/INFO[`INFO`] command and parsing the returned result.

image:./images/module-overview.png[Modules and metricsets]

Likewise, the MySQL module provides a `status` metricset that collects data
from MySQL by running a http://dev.mysql.com/doc/refman/5.7/en/show-status.html[`SHOW GLOBAL STATUS`]
SQL query. Metricsets simplify collection by grouping related metrics together
in a single response returned by the remote server.
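
To illustrate how a metricset can turn one response into one structured set of
metrics, the following Python sketch parses `INFO`-style text into nested
sections. It is not Metricbeat's implementation: the `parse_info` function and
the sample text are assumptions made for illustration, and values are kept as
strings.

[source,python]
----
def parse_info(text):
    """Parse INFO-style text ("# Section" headers, "key:value" lines)."""
    sections = {}
    current = "default"  # used only if a value appears before any header
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A header such as "# Server" starts a new section.
            current = line.lstrip("# ").lower()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key:value pairs
        sections.setdefault(current, {})[key] = value
    return sections


# Made-up sample response, shortened for the example.
SAMPLE = """\
# Server
redis_version:3.0.7
uptime_in_seconds:120

# Clients
connected_clients:2
"""

metrics = parse_info(SAMPLE)
print(metrics)
----

One request yields every metric in the response, grouped by section, rather
than one request per metric.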

Metricbeat retrieves metrics by periodically interrogating the host system based
on the `period` value that you specify when you configure the module. Because multiple
metricsets can send requests to the same service, Metricbeat reuses connections
whenever possible. If Metricbeat cannot connect to the host system within the time
specified by the `timeout` config setting, it returns an error. Metricbeat sends
events asynchronously, which means that receipt of an event is not acknowledged. If
the configured output is not available, events may be lost.

When Metricbeat encounters an error (for example, when it cannot connect to the host
system), it sends an error event to the specified output. This means that Metricbeat
always sends an event, even when there is a failure. This allows you to monitor
for errors and see debug messages to help you diagnose what went wrong.
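
The "always sends an event" behavior can be sketched in Python as follows. This
is a minimal illustration, not Metricbeat's code: the `collect` helper and the
`fetch` callables are hypothetical, and the `<module>-<name>` key for the
metrics is an assumption modeled on the `apache-status` key in the error event
example.

[source,python]
----
import time
from datetime import datetime, timezone


def collect(module, name, fetch):
    """Run one fetch and wrap the outcome in an event. Never raises."""
    started = time.monotonic()
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "metricset": {"module": module, "name": name},
        "type": "metricsets",
    }
    key = f"{module}-{name}"  # assumed naming, see the note above
    try:
        event[key] = fetch()
    except Exception as exc:
        # On failure the event is still produced, carrying the full error text.
        event[key] = {}
        event["error"] = {"message": str(exc)}
    # Round trip time of the request in microseconds.
    event["metricset"]["rtt"] = int((time.monotonic() - started) * 1_000_000)
    return event


def unreachable():
    raise ConnectionError("connection refused")


ok = collect("system", "cpu", lambda: {"user": 0.12})
failed = collect("apache", "status", unreachable)
print(failed["error"]["message"])
----

Both calls return an event. Only the second one contains an `error` field.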

The following topics provide more detail about the structure of Metricbeat events:

* <<metricbeat-event-structure>>
* <<error-event-structure>>

For more about the benefits of using Metricbeat, see <<key-features>>.

[[metricbeat-event-structure]]
=== Event structure

Every event sent by Metricbeat has the same basic structure. It contains the following fields:

*`@timestamp`*:: Time when the event was captured
*`beat.hostname`*:: Hostname of the server on which the Beat is running
*`beat.name`*:: Name given to the Beat
*`metricset.module`*:: Name of the module that the data is from
*`metricset.name`*:: Name of the metricset that the data is from
*`metricset.rtt`*:: Round trip time of the request in microseconds
*`type`*:: This is always "metricsets"

For example:

[source,json]
----
{
  "@timestamp": "2016-06-22T22:05:53.291Z",
  "beat": {
    "hostname": "host.example.com",
    "name": "host.example.com"
  },
  "metricset": {
    "module": "system",
    "name": "process",
    "rtt": 7419
  },
  .
  .
  .

  "type": "metricsets"
}
----

For more information about the exported fields, see <<exported-fields>>.

[[error-event-structure]]
=== Error event structure

Metricbeat sends an error event when the service is not reachable. The error event
has the same structure as the <<metricbeat-event-structure,base event>>, but also
has an error field that contains an error string. This makes it possible to check
for errors across all metric events.

The following example shows an error event sent when the Apache server is not
reachable:

[source,json]
----
{
  "@timestamp": "2016-03-18T12:18:57.124Z",
  "apache-status": {},
  "beat": {
    "hostname": "host.example.com",
    "name": "host.example.com"
  },
  "error": {
    "message": "Get http://127.0.0.1/server-status?auto: dial tcp 127.0.0.1:80: getsockopt: connection refused"
  },
  "metricset": {
    "module": "apache",
    "name": "status",
    "rtt": 1082
  },
  .
  .
  .

  "type": "metricsets"
}
----
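
Because error events share the base event structure, one check on the `error`
field works across all modules. The following Python sketch shows the idea on
events that have already been loaded as dictionaries. The `failed_metricsets`
helper and the sample events are assumptions made for illustration.

[source,python]
----
def failed_metricsets(events):
    """Return (module, name, message) for every event with an error field."""
    return [
        (e["metricset"]["module"], e["metricset"]["name"], e["error"]["message"])
        for e in events
        if "error" in e
    ]


# Shortened sample events that follow the structure shown above.
events = [
    {
        "metricset": {"module": "system", "name": "process", "rtt": 7419},
        "type": "metricsets",
    },
    {
        "metricset": {"module": "apache", "name": "status", "rtt": 1082},
        "error": {"message": "connection refused"},
        "type": "metricsets",
    },
]

errors = failed_metricsets(events)
print(errors)
----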

[[key-features]]
=== Key Metricbeat features

Metricbeat has some key features that are critical to how it works:

* <<metricbeat-error-events>>
* <<no-aggregations>>
* <<more-than-numbers>>
* <<multiple-events-in-one>>

[[metricbeat-error-events]]
==== Metricbeat error events

Metricbeat sends more than just metrics. When it cannot retrieve metrics, it
sends error events. The error is not simply a flag, but a full error string that is
created while fetching from the host system. This enables you to monitor not
only the metrics, but also any errors that occur during metrics monitoring.

Because you see the full error message, you can track down the error faster.
Metricbeat is installed locally on the host machine, which means that you can
differentiate errors that happen locally from other issues, such as network problems.

Each metricset is retrieved based on a predefined period, so when Metricbeat fails to
retrieve metrics for more than one interval, you can infer that there is potentially
something wrong with the host or host connectivity.

[[no-aggregations]]
==== No aggregations when data is fetched

Metricbeat doesn't do aggregations like gauge, sum, counters, and so on. Metricbeat
sends the raw data retrieved from the host to the output for processing. When using
Elasticsearch, this has the advantage that all raw data is available on the
Elasticsearch host for drilling down into the details, and the data can be
reprocessed at any time. It also reduces the complexity of Metricbeat.
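
As an example of processing that can happen after collection, the following
Python sketch derives a per-second rate from two raw counter samples. The
`rate_per_second` helper and the `requests_total` field are hypothetical and
are not part of any Metricbeat module.

[source,python]
----
from datetime import datetime


def rate_per_second(prev, curr, field):
    """Derive a per-second rate from two raw counter samples."""
    t0 = datetime.fromisoformat(prev["@timestamp"])
    t1 = datetime.fromisoformat(curr["@timestamp"])
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return (curr[field] - prev[field]) / elapsed


# Two made-up raw samples taken 10 seconds apart.
prev = {"@timestamp": "2016-06-22T22:05:53+00:00", "requests_total": 1000}
curr = {"@timestamp": "2016-06-22T22:06:03+00:00", "requests_total": 1250}

rate = rate_per_second(prev, curr, "requests_total")
print(rate)
----

Because the raw counter values are stored, the same samples can later be
reprocessed with a different window or a different calculation.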

[[more-than-numbers]]
==== Sends more than just numbers

Metricbeat sends more than just numbers. The metrics that Metricbeat sends can also
contain strings to report status information. This is useful when you're using
Elasticsearch to store the metrics data. Because each metricset has a predefined
structure, Elasticsearch knows in advance which types will be stored, and it can
optimize storage.

Basic meta information about each metric (such as the host) is also sent as part
of each event.

[[multiple-events-in-one]]
==== Multiple metrics in one event

Rather than containing a single metric, each event created by Metricbeat
contains a list of metrics. This means that you can retrieve all the metrics
in a single request to the host system, resulting in less load on the host
system. If you are sending the metrics to Elasticsearch as the output,
Elasticsearch can directly store and query the metrics as a nested
JSON document, which makes sending metrics data to Elasticsearch very efficient.

Because the full raw event data is available, Metricbeat or Elasticsearch can
do any required transformations on the data later. For example, if you need to
store data in the http://metrics20.org/[Metrics2.0] format, you could generate
the format out of the existing event by splitting up the full event into multiple
Metrics2.0 events.
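
Splitting one event into several single-metric records could look like the
following Python sketch. It does not produce the actual Metrics2.0 format. The
`split_event` helper, the `name` and `value` keys, and the sample `redis-info`
fields are assumptions that only show the splitting step.

[source,python]
----
def split_event(event, metrics_key):
    """Yield one flat record per leaf metric under event[metrics_key]."""
    # Fields shared by every record, such as the timestamp and host.
    common = {k: v for k, v in event.items() if k != metrics_key}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                yield from walk(value, path + [key])
        else:
            yield {**common, "name": ".".join(path), "value": node}

    yield from walk(event[metrics_key], [metrics_key])


# Made-up event with two nested metrics.
event = {
    "@timestamp": "2016-06-22T22:05:53.291Z",
    "beat": {"hostname": "host.example.com"},
    "redis-info": {"clients": {"connected": 2}, "server": {"uptime": 120}},
}

records = list(split_event(event, "redis-info"))
for record in records:
    print(record["name"], record["value"])
----

Each record repeats the common fields, which is the duplication that the
single-event structure avoids.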

Meta information about the type of each metric is stored in the mapping
template. Meta information that is common to all metric events, such as host and
timestamp, is part of the event structure itself and is only stored once for
all events in the metricset.

Having all the related metrics in a single event also makes it easier to look
at other values when one of the metrics for a service seems off.