youtubebeat/vendor/github.com/elastic/beats/filebeat/docs/inputs/input-common-harvester-options.asciidoc

//////////////////////////////////////////////////////////////////////////
//// This content is shared by Filebeat inputs that use the prospector
//// but do not process files (the options for managing files
//// on disk are not relevant)
//// If you add IDs to sections, make sure you use attributes to create
//// unique IDs for each input that includes this file. Use the format:
//// [id="{beatname_lc}-input-{type}-option-name"]
//////////////////////////////////////////////////////////////////////////

[float]
===== `encoding`

The file encoding to use for reading data that contains international
characters. See the encoding names http://www.w3.org/TR/encoding/[recommended by
the W3C for use in HTML5].

Here are some sample encodings from W3C recommendation:

    * plain, latin1, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030,
    gbk, hz-gb-2312,
    * euc-kr, euc-jp, iso-2022-jp, shift-jis, and so on

The `plain` encoding is special, because it does not validate or transform any input.

[float]
[id="{beatname_lc}-input-{type}-exclude-lines"]
===== `exclude_lines`

A list of regular expressions to match the lines that you want {beatname_uc} to
exclude. {beatname_uc} drops any lines that match a regular expression in the
list. By default, no lines are dropped. Empty lines are ignored.

If <<multiline,multiline>> settings are also specified, each multiline message
is combined into a single line before the lines are filtered by `exclude_lines`.

The following example configures {beatname_uc} to drop any lines that start with
`DBG`.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  exclude_lines: ['^DBG']
----

See <<regexp-support>> for a list of supported regexp patterns.

[float]
[id="{beatname_lc}-input-{type}-include-lines"]
===== `include_lines`

A list of regular expressions to match the lines that you want {beatname_uc} to
include. {beatname_uc} exports only the lines that match a regular expression in
the list. By default, all lines are exported. Empty lines are ignored.

If <<multiline,multiline>> settings also specified, each multiline message is
combined into a single line before the lines are filtered by `include_lines`.

The following example configures {beatname_uc} to export any lines that start
with `ERR` or `WARN`:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  include_lines: ['^ERR', '^WARN']
----

NOTE: If both `include_lines` and `exclude_lines` are defined, {beatname_uc}
executes `include_lines` first and then executes `exclude_lines`. The order in
which the two options are defined doesn't matter. The `include_lines` option
will always be executed before the `exclude_lines` option, even if
`exclude_lines` appears before `include_lines` in the config file.

The following example exports all log lines that contain `sometext`,
except for lines that begin with `DBG` (debug messages):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  include_lines: ['sometext']
  exclude_lines: ['^DBG']
----

See <<regexp-support>> for a list of supported regexp patterns.

[float]
===== `harvester_buffer_size`

The size in bytes of the buffer that each harvester uses when fetching a file.
The default is 16384.

[float]
===== `max_bytes`

The maximum number of bytes that a single log message can have. All bytes after
`max_bytes` are discarded and not sent. This setting is especially useful for
multiline log messages, which can get large. The default is 10MB (10485760).

[float]
[id="{beatname_lc}-input-{type}-config-json"]
===== `json`
These options make it possible for {beatname_uc} to decode logs structured as
JSON messages. {beatname_uc} processes the logs line by line, so the JSON
decoding only works if there is one JSON object per line.

The decoding happens before line filtering and multiline. You can combine JSON
decoding with filtering and multiline if you set the `message_key` option. This
can be helpful in situations where the application logs are wrapped in JSON
objects, as with like it happens for example with Docker.

Example configuration:

[source,yaml]
----
json.keys_under_root: true
json.add_error_key: true
json.message_key: log
----

You must specify at least one of the following settings to enable JSON parsing
mode:

*`keys_under_root`*:: By default, the decoded JSON is placed under a "json" key
in the output document. If you enable this setting, the keys are copied top
level in the output document. The default is false.

*`overwrite_keys`*:: If `keys_under_root` and this setting are enabled, then the
values from the decoded JSON object overwrite the fields that {beatname_uc}
normally adds (type, source, offset, etc.) in case of conflicts.

*`add_error_key`*:: If this setting is enabled, {beatname_uc} adds a
"error.message" and "error.type: json" key in case of JSON unmarshalling errors
or when a `message_key` is defined in the configuration but cannot be used.

*`message_key`*:: An optional configuration setting that specifies a JSON key on
which to apply the line filtering and multiline settings. If specified the key
must be at the top level in the JSON object and the value associated with the
key must be a string, otherwise no filtering or multiline aggregation will
occur.

*`ignore_decoding_error`*:: An optional configuration setting that specifies if
JSON decoding errors should be logged or not. If set to true, errors will not
be logged. The default is false.

[float]
===== `multiline`

Options that control how {beatname_uc} deals with log messages that span
multiple lines. See <<multiline-examples>> for more information about
configuring multiline options.