[[defining-processors]]
=== Define processors

You can use processors to filter and enhance data before sending it to the
configured output. To define a processor, you specify the processor name, an
optional condition, and a set of parameters:

[source,yaml]
------
processors:
 - <processor_name>:
     when:
        <condition>
     <parameters>

 - <processor_name>:
     when:
        <condition>
     <parameters>

...
------

Where:

* `<processor_name>` specifies a <<processors,processor>> that performs some
kind of action, such as selecting the fields that are exported or adding
metadata to the event.
* `<condition>` specifies an optional <<conditions,condition>>. If the
condition is present, the action is executed only if the condition is
fulfilled. If no condition is set, the action is always executed.
* `<parameters>` is the list of parameters to pass to the processor.
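
For example, the following configuration drops a hypothetical `message.body`
field whenever the HTTP response code equals 200 (the field name here is
purely illustrative):

[source,yaml]
------
processors:
 - drop_fields:
     when:
        equals:
           http.response.code: 200
     fields: ["message.body"]
------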

[[where-valid]]
==== Where are processors valid?

// TODO: ANY NEW BEATS THAT RE-USE THIS TOPIC NEED TO DEFINE processor-scope.

ifeval::["{beatname_lc}"=="filebeat"]
:processor-scope: input
endif::[]

ifeval::["{beatname_lc}"=="auditbeat" or "{beatname_lc}"=="metricbeat"]
:processor-scope: module
endif::[]

ifeval::["{beatname_lc}"=="packetbeat"]
:processor-scope: protocol
endif::[]

ifeval::["{beatname_lc}"=="heartbeat"]
:processor-scope: monitor
endif::[]

ifeval::["{beatname_lc}"=="winlogbeat"]
:processor-scope: event log shipper
endif::[]

Processors are valid:

* At the top level in the configuration. The processor is applied to all data
collected by {beatname_uc}.
* Under a specific {processor-scope}. The processor is applied to the data
collected for that {processor-scope}.
ifeval::["{beatname_lc}"=="filebeat"]
For example:
+
[source,yaml]
------
- type: <input_type>
  processors:
  - <processor_name>:
      when:
        <condition>
      <parameters>
  ...
------
+
Similarly, for {beatname_uc} modules, you can define processors under the
`input` section of the module definition.
endif::[]
ifeval::["{beatname_lc}"=="metricbeat"]
|
||
[source,yaml]
|
||
----
|
||
- module: <module_name>
|
||
metricsets: ["<metricset_name>"]
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
endif::[]
|
||
ifeval::["{beatname_lc}"=="auditbeat"]
|
||
For example:
|
||
+
|
||
[source,yaml]
|
||
----
|
||
auditbeat.modules:
|
||
- module: <module_name>
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
endif::[]
|
||
ifeval::["{beatname_lc}"=="packetbeat"]
|
||
For example:
|
||
+
|
||
[source,yaml]
|
||
----
|
||
packetbeat.protocols:
|
||
- type: <protocol_type>
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
|
||
* Under `packetbeat.flows`. The processor is applied to the data in
|
||
<<configuration-flows,network flows>>:
|
||
+
|
||
[source,yaml]
|
||
----
|
||
packetbeat.flows:
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
endif::[]
|
||
ifeval::["{beatname_lc}"=="heartbeat"]
|
||
For example:
|
||
+
|
||
[source,yaml]
|
||
----
|
||
heartbeat.monitors:
|
||
- type: <monitor_type>
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
endif::[]
|
||
ifeval::["{beatname_lc}"=="winlogbeat"]
|
||
For example:
|
||
+
|
||
[source,yaml]
|
||
----
|
||
winlogbeat.event_logs:
|
||
- name: <network_shipper_name>
|
||
processors:
|
||
- <processor_name>:
|
||
when:
|
||
<condition>
|
||
<parameters>
|
||
----
|
||
endif::[]
|
||

[[processors]]
==== Processors

The supported processors are:

* <<add-cloud-metadata,`add_cloud_metadata`>>
* <<add-docker-metadata,`add_docker_metadata`>>
* <<add-host-metadata,`add_host_metadata`>>
* <<add-kubernetes-metadata,`add_kubernetes_metadata`>>
* <<add-locale,`add_locale`>>
* <<add-process-metadata,`add_process_metadata`>>
* <<decode-json-fields,`decode_json_fields`>>
* <<dissect,`dissect`>>
* <<processor-dns,`dns`>>
* <<drop-event,`drop_event`>>
* <<drop-fields,`drop_fields`>>
* <<include-fields,`include_fields`>>
* <<rename-fields,`rename`>>

[[conditions]]
==== Conditions

Each condition receives a field to compare. You can specify multiple fields
under the same condition; they are combined with a logical `AND` (that is,
`field1 AND field2`).
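
For example, the following sketch matches only when both fields have the
given values (the field names are illustrative):

[source,yaml]
------
equals:
  http.response.code: 200
  status: OK
------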

For each field, you can specify a simple field name or a nested map, for
example `dns.question.name`.

See <<exported-fields>> for a list of all the fields that are exported by
{beatname_uc}.

The supported conditions are:

* <<condition-equals,`equals`>>
* <<condition-contains,`contains`>>
* <<condition-regexp,`regexp`>>
* <<condition-range,`range`>>
* <<condition-has_fields,`has_fields`>>
* <<condition-or,`or`>>
* <<condition-and,`and`>>
* <<condition-not,`not`>>

[float]
[[condition-equals]]
===== `equals`

With the `equals` condition, you can check whether a field has a certain value.
The condition accepts only an integer or a string value.

For example, the following condition checks if the response code of the HTTP
transaction is 200:

[source,yaml]
-------
equals:
  http.response.code: 200
-------

[float]
[[condition-contains]]
===== `contains`

The `contains` condition checks if a value is part of a field. The field can be
a string or an array of strings. The condition accepts only a string value.

For example, the following condition checks if an error is part of the
transaction status:

[source,yaml]
------
contains:
  status: "Specific error"
------

[float]
[[condition-regexp]]
===== `regexp`

The `regexp` condition checks the field against a regular expression. The
condition accepts only strings.

For example, the following condition checks if the process name starts with
`foo`:

[source,yaml]
-----
regexp:
  system.process.name: "foo.*"
-----

[float]
[[condition-range]]
===== `range`

The `range` condition checks if the field is in a certain range of values. The
condition supports `lt`, `lte`, `gt` and `gte`. The condition accepts only
integer or float values.

For example, the following condition checks for failed HTTP transactions by
testing whether the `http.response.code` field is greater than or equal to 400:

[source,yaml]
------
range:
  http.response.code:
    gte: 400
------

This can also be written as:

[source,yaml]
----
range:
  http.response.code.gte: 400
----

The following condition checks if the CPU usage in percent has a value between
0.5 and 0.8:

[source,yaml]
------
range:
  system.cpu.user.pct.gte: 0.5
  system.cpu.user.pct.lt: 0.8
------

[float]
[[condition-has_fields]]
===== `has_fields`

The `has_fields` condition checks if all the given fields exist in the
event. The condition accepts a list of string values denoting the field names.

For example, the following condition checks if the `http.response.code` field
is present in the event:

[source,yaml]
------
has_fields: ['http.response.code']
------

[float]
[[condition-or]]
===== `or`

The `or` operator receives a list of conditions.

[source,yaml]
-------
or:
 - <condition1>
 - <condition2>
 - <condition3>
 ...
-------

For example, to configure the condition
`http.response.code = 304 OR http.response.code = 404`:

[source,yaml]
------
or:
  - equals:
      http.response.code: 304
  - equals:
      http.response.code: 404
------

[float]
[[condition-and]]
===== `and`

The `and` operator receives a list of conditions.

[source,yaml]
-------
and:
 - <condition1>
 - <condition2>
 - <condition3>
 ...
-------

For example, to configure the condition
`http.response.code = 200 AND status = OK`:

[source,yaml]
------
and:
  - equals:
      http.response.code: 200
  - equals:
      status: OK
------

To configure a condition like `<condition1> OR <condition2> AND <condition3>`:

[source,yaml]
------
or:
  - <condition1>
  - and:
    - <condition2>
    - <condition3>
------

[float]
[[condition-not]]
===== `not`

The `not` operator receives the condition to negate.

[source,yaml]
-------
not:
  <condition>
-------

For example, to configure the condition `NOT status = OK`:

[source,yaml]
------
not:
  equals:
    status: OK
------

[[add-cloud-metadata]]
=== Add cloud metadata

The `add_cloud_metadata` processor enriches each event with instance metadata
from the machine's hosting provider. At startup it detects the hosting
provider and caches the instance metadata.

The following cloud providers are supported:

- Amazon Elastic Compute Cloud (EC2)
- Digital Ocean
- Google Compute Engine (GCE)
- https://www.qcloud.com/?lang=en[Tencent Cloud] (QCloud)
- Alibaba Cloud (ECS)
- Azure Virtual Machine
- Openstack Nova

The simple configuration below enables the processor.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_cloud_metadata: ~
-------------------------------------------------------------------------------

The `add_cloud_metadata` processor has one optional configuration setting named
`timeout` that specifies the maximum amount of time to wait for a successful
response when detecting the hosting provider. The default timeout value is
`3s`.

If a timeout occurs, no instance metadata is added to the events. This makes it
possible to enable this processor for all your deployments (in the cloud or
on-premises).
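
For example, a sketch that raises the detection timeout for slow metadata
endpoints (the `10s` value is arbitrary):

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_cloud_metadata:
    timeout: 10s
-------------------------------------------------------------------------------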

The metadata that is added to events varies by hosting provider. Below are
examples for each of the supported providers.

_EC2_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "availability_zone": "us-east-1c",
      "instance_id": "i-4e123456",
      "machine_type": "t2.medium",
      "provider": "ec2",
      "region": "us-east-1"
    }
  }
}
-------------------------------------------------------------------------------

_Digital Ocean_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "instance_id": "1234567",
      "provider": "digitalocean",
      "region": "nyc2"
    }
  }
}
-------------------------------------------------------------------------------

_GCE_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "availability_zone": "projects/1234567890/zones/us-east1-b",
      "instance_id": "1234556778987654321",
      "machine_type": "projects/1234567890/machineTypes/f1-micro",
      "project_id": "my-dev",
      "provider": "gce"
    }
  }
}
-------------------------------------------------------------------------------

_Tencent Cloud_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "availability_zone": "gz-azone2",
      "instance_id": "ins-qcloudv5",
      "provider": "qcloud",
      "region": "china-south-gz"
    }
  }
}
-------------------------------------------------------------------------------

_Alibaba Cloud_

This metadata is only available when VPC is selected as the network type of the
ECS instance.

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "availability_zone": "cn-shenzhen",
      "instance_id": "i-wz9g2hqiikg0aliyun2b",
      "provider": "ecs",
      "region": "cn-shenzhen-a"
    }
  }
}
-------------------------------------------------------------------------------

_Azure Virtual Machine_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "provider": "az",
      "instance_id": "04ab04c3-63de-4709-a9f9-9ab8c0411d5e",
      "instance_name": "test-az-vm",
      "machine_type": "Standard_D3_v2",
      "region": "eastus2"
    }
  }
}
-------------------------------------------------------------------------------

_Openstack Nova_

[source,json]
-------------------------------------------------------------------------------
{
  "meta": {
    "cloud": {
      "provider": "openstack",
      "instance_name": "test-998d932195.mycloud.tld",
      "availability_zone": "xxxx-az-c",
      "instance_id": "i-00011a84",
      "machine_type": "m2.large"
    }
  }
}
-------------------------------------------------------------------------------

[[add-locale]]
=== Add the local time zone

The `add_locale` processor enriches each event with the machine's time zone
offset from UTC or with the name of the time zone. It supports one
configuration option named `format` that controls whether an offset or time
zone abbreviation is added to the event. The default format is `offset`. The
processor adds a `beat.timezone` value to each event.

The configuration below enables the processor with the default settings.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_locale: ~
-------------------------------------------------------------------------------

This configuration enables the processor and configures it to add the time zone
abbreviation to events.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_locale:
    format: abbreviation
-------------------------------------------------------------------------------

NOTE: The `add_locale` processor differentiates between daylight saving time
(DST) and regular time. For example, `CEST` indicates DST and `CET` is regular
time.

[[decode-json-fields]]
=== Decode JSON fields

The `decode_json_fields` processor decodes fields containing JSON strings and
replaces the strings with valid JSON objects.

[source,yaml]
-----------------------------------------------------
processors:
 - decode_json_fields:
     fields: ["field1", "field2", ...]
     process_array: false
     max_depth: 1
     target: ""
     overwrite_keys: false
-----------------------------------------------------

The `decode_json_fields` processor has the following configuration settings:

`fields`:: The fields containing JSON strings to decode.
`process_array`:: (Optional) A boolean that specifies whether to process
arrays. The default is `false`.
`max_depth`:: (Optional) The maximum parsing depth. The default is `1`.
`target`:: (Optional) The field under which the decoded JSON will be written.
By default the decoded JSON object replaces the string field from which it was
read. To merge the decoded JSON fields into the root of the event, specify
`target` with an empty string (`target: ""`). Note that the `null` value
(`target:`) is treated as if the field was not set at all.
`overwrite_keys`:: (Optional) A boolean that specifies whether keys that
already exist in the event are overwritten by keys from the decoded JSON
object. The default is `false`.
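
To illustrate, assume an event whose hypothetical `inner` field holds the
string `{"count": 3}`. With `fields: ["inner"]` and the default `target`, the
processor would replace the string with the decoded object, roughly:

[source,json]
-----------------------------------------------------
{"inner": {"count": 3}}
-----------------------------------------------------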

[[drop-event]]
=== Drop events

The `drop_event` processor drops the entire event if the associated condition
is fulfilled. The condition is mandatory, because without one, all the events
are dropped.

[source,yaml]
------
processors:
 - drop_event:
     when:
        condition
------
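
For example, the following sketch drops all events with an HTTP response code
of 200:

[source,yaml]
------
processors:
 - drop_event:
     when:
        equals:
           http.response.code: 200
------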

See <<conditions>> for a list of supported conditions.

[[drop-fields]]
=== Drop fields from events

The `drop_fields` processor specifies which fields to drop if a certain
condition is fulfilled. The condition is optional. If it's missing, the
specified fields are always dropped. The `@timestamp` and `type` fields cannot
be dropped, even if they show up in the `drop_fields` list.

[source,yaml]
-----------------------------------------------------
processors:
 - drop_fields:
     when:
        condition
     fields: ["field1", "field2", ...]
-----------------------------------------------------
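
For example, the following sketch unconditionally drops two illustrative
fields:

[source,yaml]
-----------------------------------------------------
processors:
 - drop_fields:
     fields: ["cpu.user", "cpu.system"]
-----------------------------------------------------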

See <<conditions>> for a list of supported conditions.

NOTE: If you define an empty list of fields under `drop_fields`, then no fields
are dropped.

[[include-fields]]
=== Keep fields from events

The `include_fields` processor specifies which fields to export if a certain
condition is fulfilled. The condition is optional. If it's missing, the
specified fields are always exported. The `@timestamp` and `type` fields are
always exported, even if they are not defined in the `include_fields` list.

[source,yaml]
-------
processors:
 - include_fields:
     when:
        condition
     fields: ["field1", "field2", ...]
-------

See <<conditions>> for a list of supported conditions.

You can specify multiple `include_fields` processors under the `processors`
section, as the sketch below shows.
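
In this sketch, the first processor keeps two field trees and the second,
conditional processor narrows 404 responses down further. Note that processors
run in order, so the second operates on the output of the first (the field
names are illustrative):

[source,yaml]
-------
processors:
 - include_fields:
     fields: ["http", "message"]
 - include_fields:
     when:
        equals:
           http.response.code: 404
     fields: ["http"]
-------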

NOTE: If you define an empty list of fields under `include_fields`, then only
the required fields, `@timestamp` and `type`, are exported.

[[rename-fields]]
=== Rename fields from events

The `rename` processor specifies a list of fields to rename. Under the `fields`
key, each entry contains a `from: old-key` and a `to: new-key` pair, where
`from` is the original field name and `to` is the target field name.

Renaming fields can be useful in cases where field names cause conflicts. For
example, if an event has two fields, `c` and `c.b`, that are both assigned
scalar values (e.g. `{"c": 1, "c.b": 2}`), this will result in an Elasticsearch
error at ingest time because the value of `c` cannot simultaneously be a scalar
and an object. To prevent this, the `rename` processor can be used to rename
`c` to `c.value`.

The `rename` processor cannot be used to overwrite fields. To overwrite a
field, either rename the target field first, or use the `drop_fields`
processor to drop the field and then rename the field.

[source,yaml]
-------
processors:
- rename:
    fields:
     - from: "a.g"
       to: "e.d"
    ignore_missing: false
    fail_on_error: true
-------

The `rename` processor has the following configuration settings:

`ignore_missing`:: (Optional) If set to `true`, no error is logged when a field
that should be renamed is missing. Default is `false`.

`fail_on_error`:: (Optional) If set to `true`, renaming stops when the first
error occurs and the original event is returned. If set to `false`, renaming
continues even if an error occurs. Default is `true`.

See <<conditions>> for a list of supported conditions.

You can specify multiple `rename` processors under the `processors` section.
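
To illustrate the overwrite workaround described above, the following sketch
drops an existing hypothetical `e.d` field and then renames `a.g` onto the
now-free key:

[source,yaml]
-------
processors:
- drop_fields:
    fields: ["e.d"]
- rename:
    fields:
     - from: "a.g"
       to: "e.d"
-------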

[[add-kubernetes-metadata]]
=== Add Kubernetes metadata

The `add_kubernetes_metadata` processor annotates each event with relevant
metadata based on which Kubernetes pod the event originated from. Each event is
annotated with:

* Pod Name
* Namespace
* Labels

The `add_kubernetes_metadata` processor has two basic building blocks:

* Indexers
* Matchers

Indexers take a pod's metadata and build indices based on it. For example, the
`ip_port` indexer can take a Kubernetes pod and index the pod metadata based on
all `pod_ip:container_port` combinations.

Matchers are used to construct lookup keys for querying indices. For example,
when the `fields` matcher takes `["metricset.host"]` as a lookup field, it
constructs a lookup key with the value of the field `metricset.host`.

Each Beat can define its own default indexers and matchers, which are enabled
by default. For example, Filebeat enables the `container` indexer, which
indexes pod metadata based on all container IDs, and a `logs_path` matcher,
which takes the `source` field, extracts the container ID, and uses it to
retrieve metadata.

The configuration below enables the processor when {beatname_lc} is run as a
pod in Kubernetes.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_kubernetes_metadata:
    in_cluster: true
-------------------------------------------------------------------------------

The configuration below enables the processor on a Beat running as a process on
the Kubernetes node.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ${HOME}/.kube/config
-------------------------------------------------------------------------------

The configuration below disables the default indexers and matchers, and
enables only the ones you are interested in.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ~/.kube/config
    default_indexers.enabled: false
    default_matchers.enabled: false
    indexers:
      - ip_port:
    matchers:
      - fields:
          lookup_fields: ["metricset.host"]
-------------------------------------------------------------------------------

The `add_kubernetes_metadata` processor has the following configuration
settings:

`in_cluster`:: (Optional) Use in-cluster settings for the Kubernetes client.
`true` by default.
`host`:: (Optional) Identify the node where {beatname_lc} is running, in case
it cannot be accurately detected, as when running {beatname_lc} in host network
mode.
`kube_config`:: (Optional) Use the given config file as configuration for the
Kubernetes client.
`default_indexers.enabled`:: (Optional) Enable or disable the default pod
indexers, in case you want to specify your own.
`default_matchers.enabled`:: (Optional) Enable or disable the default pod
matchers, in case you want to specify your own.

[[add-docker-metadata]]
=== Add Docker metadata

The `add_docker_metadata` processor annotates each event with relevant metadata
from Docker containers:

* Container ID
* Name
* Image
* Labels

[NOTE]
=====
When running {beatname_uc} in a container, you need to provide access to
Docker’s unix socket in order for the `add_docker_metadata` processor to work.
You can do this by mounting the socket inside the container. For example:

`docker run -v /var/run/docker.sock:/var/run/docker.sock ...`

To avoid privilege issues, you may also need to add `--user=root` to the
`docker run` flags. Because the user must be part of the docker group in order
to access `/var/run/docker.sock`, root access is required if {beatname_uc} is
running as non-root inside the container.
=====

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
    #match_fields: ["system.process.cgroup.id"]
    #match_pids: ["process.pid", "process.ppid"]
    #match_source: true
    #match_source_index: 4
    #match_short_id: true
    #cleanup_timeout: 60
    # To connect to Docker over TLS you must specify a client and CA
    # certificate.
    #ssl:
    #  certificate_authority: "/etc/pki/root/ca.pem"
    #  certificate: "/etc/pki/client/cert.pem"
    #  key: "/etc/pki/client/cert.key"
-------------------------------------------------------------------------------

It has the following settings:

`host`:: (Optional) Docker socket (UNIX or TCP socket). It uses
`unix:///var/run/docker.sock` by default.

`ssl`:: (Optional) SSL configuration to use when connecting to the Docker
socket.

`match_fields`:: (Optional) A list of fields used to match a container ID. At
least one of them must hold a container ID for the event to be enriched.

`match_pids`:: (Optional) A list of fields that contain process IDs. If the
process is running in Docker, the event will be enriched. The default value is
`["process.pid", "process.ppid"]`.

`match_source`:: (Optional) Match the container ID from a log path present in
the `source` field. Enabled by default.

`match_short_id`:: (Optional) Match the container short ID from a log path
present in the `source` field. Disabled by default.
This allows matching of directory names that contain the first 12 characters
of the container ID, for example `/var/log/containers/b7e3460e2b21/*.log`.

`match_source_index`:: (Optional) Index in the source path, split by `/`, to
look for the container ID. It defaults to 4 to match
`/var/lib/docker/containers/<container_id>/*.log`.

`cleanup_timeout`:: (Optional) Time of inactivity after which a container's
metadata is cleaned up and forgotten. 60s by default.
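
As an illustration, an enriched event carries container metadata along these
lines (this is a sketch; the exact field layout may vary by version, and the
values are invented):

[source,json]
-------------------------------------------------------------------------------
{
  "docker": {
    "container": {
      "id": "b7e3460e2b21d4a83ffab0907163f4f1ab32cf98a73ac9120bb0e0bbbcfa6f",
      "name": "web-1",
      "image": "nginx:latest",
      "labels": {
        "com.example.env": "staging"
      }
    }
  }
}
-------------------------------------------------------------------------------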

[[add-host-metadata]]
=== Add Host metadata

beta[]

The `add_host_metadata` processor annotates each event with relevant metadata
from the host machine.

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_host_metadata:
    netinfo.enabled: false
-------------------------------------------------------------------------------

It has the following settings:

`netinfo.enabled`:: (Optional) When `true`, includes IP addresses and MAC
addresses in the `host.ip` and `host.mac` fields. Default is `false`.

The fields added to the event look as follows:

[source,json]
-------------------------------------------------------------------------------
{
  "host":{
    "architecture":"x86_64",
    "name":"example-host",
    "id":"",
    "os":{
      "family":"darwin",
      "build":"16G1212",
      "platform":"darwin",
      "version":"10.12.6"
    },
    "ip": ["192.168.0.1", "10.0.0.1"],
    "mac": ["00:25:96:12:34:56", "72:00:06:ff:79:f1"]
  }
}
-------------------------------------------------------------------------------

NOTE: The host information is refreshed every 5 minutes.

[[dissect]]
=== Dissect strings

The `dissect` processor tokenizes incoming strings using defined patterns.

[source,yaml]
-------
processors:
- dissect:
    tokenizer: "%{key1} %{key2}"
    field: "message"
    target_prefix: "dissect"
-------
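
For example, given the configuration above, an event whose `message` field is
`"hello world"` would (as a sketch) gain the following keys:

[source,json]
-------
{"dissect": {"key1": "hello", "key2": "world"}}
-------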

The `dissect` processor has the following configuration settings:

`field`:: (Optional) The event field to tokenize. Default is `message`.

`target_prefix`:: (Optional) The name of the field where the values will be
extracted. When an empty string is defined, the processor creates the keys at
the root of the event. Default is `dissect`. When the target key already exists
in the event, the processor won't replace it and will log an error; you need to
either drop or rename the key before using dissect.

For tokenization to be successful, all keys must be found and extracted. If one
of them cannot be found, an error is logged and the original event is not
modified.

NOTE: A key can contain any characters except reserved suffix or prefix
modifiers: `/`, `&`, `+` and `?`.

See <<conditions>> for a list of supported conditions.

[[processor-dns]]
=== DNS Reverse Lookup

The DNS processor performs reverse DNS lookups of IP addresses. It caches the
responses that it receives in accordance with the time-to-live (TTL) value
contained in the response. It also caches failures that occur during lookups.
Each instance of this processor maintains its own independent cache.

The processor uses its own DNS resolver to send requests to nameservers and
does not use the operating system's resolver. It does not read any values
contained in `/etc/hosts`.

This processor can significantly slow down your pipeline's throughput if you
have a high latency network or slow upstream nameserver. The cache will help
with performance, but if the addresses being resolved have a high cardinality,
the cache benefits will be diminished due to the high miss ratio.

By way of example, if each DNS lookup takes 2 milliseconds, the maximum
throughput you can achieve is 500 events per second (1000 milliseconds / 2
milliseconds). If you have a high cache hit ratio, your throughput can be
higher.

This is a minimal configuration example that resolves the IP addresses
contained in two fields.

[source,yaml]
----
processors:
- dns:
    type: reverse
    fields:
      source.ip: source.hostname
      destination.ip: destination.hostname
----

Next is a configuration example showing all options.

[source,yaml]
----
processors:
- dns:
    type: reverse
    action: append
    fields:
      server.ip: server.hostname
      client.ip: client.hostname
    success_cache:
      capacity.initial: 1000
      capacity.max: 10000
    failure_cache:
      capacity.initial: 1000
      capacity.max: 10000
      ttl: 1m
    nameservers: ['192.0.2.1', '203.0.113.1']
    timeout: 500ms
    tag_on_failure: [_dns_reverse_lookup_failed]
----

The `dns` processor has the following configuration settings:

`type`:: The type of DNS lookup to perform. The only supported type is
`reverse`, which queries for a PTR record.

`action`:: This defines the behavior of the processor when the target field
already exists in the event. The options are `append` (default) and `replace`.

`fields`:: This is a mapping of source field names to target field names. The
value of the source field will be used in the DNS query, and the result will be
written to the target field.

`success_cache.capacity.initial`:: The initial number of items that the success
cache will be allocated to hold. When initialized, the processor will allocate
the memory for this number of items. Default value is `1000`.

`success_cache.capacity.max`:: The maximum number of items that the success
cache can hold. When the maximum capacity is reached, a random item is evicted.
Default value is `10000`.

`failure_cache.capacity.initial`:: The initial number of items that the failure
cache will be allocated to hold. When initialized, the processor will allocate
the memory for this number of items. Default value is `1000`.

`failure_cache.capacity.max`:: The maximum number of items that the failure
cache can hold. When the maximum capacity is reached, a random item is evicted.
Default value is `10000`.

`failure_cache.ttl`:: The duration for which failures are cached. Valid time
units are "ns", "us" (or "µs"), "ms", "s", "m", "h". Default value is `1m`.

`nameservers`:: A list of nameservers to query. If there are multiple servers,
the resolver queries them in the order listed. If none are specified, it reads
the nameservers listed in `/etc/resolv.conf` once at initialization. On Windows
you must always supply at least one nameserver.

`timeout`:: The duration after which a DNS query times out. This timeout
applies to each DNS request, so if you have two nameservers the total timeout
will be twice this value. Valid time units are "ns", "us" (or "µs"), "ms", "s",
"m", "h". Default value is `500ms`.

`tag_on_failure`:: A list of tags to add to the event when any lookup fails.
The tags are only added once, even if multiple lookups fail. By default no tags
are added upon failure.

[[add-process-metadata]]
=== Add process metadata

The `add_process_metadata` processor enriches events with information from
running processes, identified by their process ID (PID).

[source,yaml]
-------------------------------------------------------------------------------
processors:
- add_process_metadata:
    match_pids: [system.process.ppid]
    target: system.process.parent
-------------------------------------------------------------------------------

The fields added to the event look as follows:

[source,json]
-------------------------------------------------------------------------------
"process": {
  "name": "systemd",
  "title": "/usr/lib/systemd/systemd --switched-root --system --deserialize 22",
  "exe": "/usr/lib/systemd/systemd",
  "args": ["/usr/lib/systemd/systemd", "--switched-root", "--system", "--deserialize", "22"],
  "pid": 1,
  "ppid": 0,
  "start_time": "2018-08-22T08:44:50.684Z"
}
-------------------------------------------------------------------------------

Optionally, the process environment can be included, too:

[source,json]
-------------------------------------------------------------------------------
  ...
  "env": {
    "HOME": "/",
    "TERM": "linux",
    "BOOT_IMAGE": "/boot/vmlinuz-4.11.8-300.fc26.x86_64",
    "LANG": "en_US.UTF-8"
  }
  ...
-------------------------------------------------------------------------------

It has the following settings:

`match_pids`:: A list of fields to look up for a PID. The processor searches
the list sequentially until the field is found in the current event, and the
PID lookup is applied to the value of this field.

`target`:: (Optional) Destination prefix where the `process` object will be
created. The default is the event's root.

`include_fields`:: (Optional) A list of fields to add. By default, the
processor adds all the available fields except `process.env`.

`ignore_missing`:: (Optional) When set to `false`, events that don't contain
any of the fields in `match_pids` will be discarded and an error will be
generated. By default, this condition is ignored.

`overwrite_keys`:: (Optional) By default, if a target field already exists, it
will not be overwritten, and an error will be logged. If `overwrite_keys` is
set to `true`, this condition will be ignored.

`restricted_fields`:: (Optional) By default, the `process.env` field is not
output, to avoid leaking sensitive data. If `restricted_fields` is `true`, the
field will be present in the output.