[[filebeat-modules-devguide]]
== Creating a New Filebeat Module

This guide will walk you through creating a new Filebeat module.

All Filebeat modules currently live in the main
https://github.com/elastic/beats[Beats] repository. To clone the repository and
build Filebeat (which you will need for testing), please follow the general
instructions in <<beats-contributing>>.

[float]
=== Overview

Each Filebeat module is composed of one or more "filesets". We usually create a
module for each service that we support (`nginx` for Nginx, `mysql` for Mysql,
and so on) and a fileset for each type of log that the service creates. For
example, the Nginx module has `access` and `error` filesets. You can contribute
a new module (with at least one fileset), or a new fileset for an existing
module.

NOTE: In this guide we use `{module}` and `{fileset}` as placeholders for the
module and fileset names. You need to replace these with the actual names you
entered when your created the module and fileset. Only use characters `[a-z]` and, if required, underscores (`_`). No other characters are allowed.

[float]
=== Creating a new module

Run the following command in the `filebeat` folder:

[source,bash]
----
make create-module MODULE={module}
----

After running the `make create-module` command, you'll find the module,
along with its generated files, under `module/{module}`. This
directory contains the following files:

[source,bash]
----
module/{module}
├── module.yml
└── _meta
    └── docs.asciidoc
    └── fields.yml
    └── kibana
----

Let's look at these files one by one.

[float]
==== module.yml

This file contains list of all the dashboards available for the module and used by `export_dashboards.go` script for exporting dashboards.
Each dashboard is defined by an id and the name of json file where the dashboard is saved locally.
At generation new fileset this file will be automatically updated with "default" dashboard settings for new fileset.
Please ensure that this settings are correct.

[float]
==== _meta/docs.asciidoc

This file contains module-specific documentation. You should include information
about which versions of the service were tested and the variables that are
defined in each fileset.

[float]
==== _meta/fields.yml

The module level `fields.yml` contains descriptions for the module-level fields.
Please review and update the title and the descriptions in this file. The title
is used as a title in the docs, so it's best to capitalize it.

[float]
==== _meta/kibana

This folder contains the sample Kibana dashboards for this module. To create
them, you can build them visually in Kibana and then export them with `export_dashboards`.

The tool will export all of the dashboard dependencies (visualizations,
saved searches) automatically.

You can see various ways of using `export_dashboards` at <<export-dashboards>>.
The recommended way to export them is to list your dashboards in your module's
`module.yml` file:

[source,yaml]
----
dashboards:
- id: 69f5ae20-eb02-11e7-8f04-beef1daadb05
  file: mymodule-overview.json
- id: c0a7ce90-cafe-4242-8647-534bb4c21040
  file: mymodule-errors.json
----

Then run `export_dashboards` like this:

[source,shell]
----
$ cd dev-tools/cmd/dashboards
$ make # if export_dashboard is not built yet
$ ./export_dashboards -yml '../../../filebeat/module/{module}/module.yml'
----

New Filebeat modules might not be compatible with Kibana 5.x. To export dashboards
that are compatible with 5.x, run the following command inside the developer virtualenv:

[source,shell]
----
$ cd filebeat
$ make python-env
$ cd module/{module}/
$ python ../../../dev-tools/export_5x_dashboards.py --regex {module} --dir _meta/kibana/5.x
----

Where the `--regex` parameter should match the dashboard you want to export.

Please note that dashboards exported from Kibana 5.x are not compatible with Kibana 6.x.

You can find more details about the process of creating and exporting the Kibana
dashboards by reading {beatsdevguide}/new-dashboards.html[this guide].

[float]
=== Creating a new fileset

Run the following command in the `filebeat` folder:

[source,bash]
----
make create-fileset MODULE={module} FILESET={fileset}
----

After running the `make create-fileset` command, you'll find the fileset,
along with its generated files, under `module/{module}/{fileset}`. This
directory contains the following files:

[source,bash]
----
module/{module}/{fileset}
├── manifest.yml
├── config
│   └── {fileset}.yml
├── ingest
│   └── pipeline.json
├── _meta
│   └── fields.yml
│   └── kibana
│       └── default
└── test
----

Let's look at these files one by one.

[float]
==== manifest.yml

The `manifest.yml` is the control file for the module, where variables are
defined and the other files are referenced. It is a YAML file, but in many
places in the file, you can use built-in or defined variables by using the
`{{.variable}}` syntax.

The `var` section of the file defines the fileset variables and their default
values. The module variables can be referenced in other configuration files,
and their value can be overridden at runtime by the Filebeat configuration.

As the fileset creator, you can use any names for the variables you define. Each
variable must have a default value. So in it's simplest form, this is how you
can define a new variable:

[source,yaml]
----
var:
  - name: pipeline
    default: with_plugins
----

Most fileset should have a `paths` variable defined, which sets the default
paths where the log files are located:

[source,yaml]
----
var:
  - name: paths
    default:
      - /example/test.log*
    os.darwin:
      - /usr/local/example/test.log*
      - /example/test.log*
    os.windows:
      - c:/programdata/example/logs/test.log*
----

There's quite a lot going on in this file, so let's break it down:

* The name of the variable is `paths` and the default value is an array with one
  element: `"/example/test.log*"`.
* Note that variable values don't have to be strings.
  They can be also numbers, objects, or as shown in this example, arrays.
* We will use the `paths` variable to set the input `paths`
  setting, so "glob" values can be used here.
* Besides the `default` value, the file defines values for particular
  operating systems: a default for darwin/OS X/macOS systems and a default for
  Windows systems. These are introduced via the `os.darwin` and `os.windows`
  keywords. The values under these keys become the default for the variable, if
  Filebeat is executed on the respective OS.

Besides the variable definition, the `manifest.yml` file also contains
references to the ingest pipeline and input configuration to use (see next
sections):

[source,yaml]
----
ingest_pipeline: ingest/pipeline.json
input: config/testfileset.yml
----

These should point to the respective files from the fileset.

Note that when evaluating the contents of these files, the variables are
expanded, which enables you to select one file or the other depending on the
value of a variable. For example:

[source,yaml]
----
ingest_pipeline: ingest/{{.pipeline}}.json
----

This example selects the ingest pipeline file based on the value of the
`pipeline` variable. For the `pipeline` variable shown earlier, the path would
resolve to `ingest/with_plugins.json` (assuming the variable value isn't
overridden at runtime.)

[float]
==== config/*.yml

The `config/` folder contains template files that generate Filebeat input
configurations. The Filebeat inputs are primarily responsible for tailing
files, filtering, and multi-line stitching, so that's what you configure in the
template files.

A typical example looks like this:

[source,yaml]
----
type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
----

You'll find this example in the template file that gets generated automatically
when you run `make create-fileset`. In this example, the `paths` variable is
used to construct the `paths` list for the input `paths` option.

Any template files that you add to the `config/` folder need to generate a valid
Filebeat input configuration in YAML format. The options accepted by the
input configuration are documented in the
{filebeat}/configuration-filebeat-options.html[Filebeat Inputs] section of
the Filebeat documentation.

The template files use the templating language defined by the
https://golang.org/pkg/text/template/[Go standard library].

Here is another example that also configures multiline stitching:

[source,yaml]
----
type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
multiline:
  pattern: "^# User@Host: "
  negate: true
  match: after
----

Although you can add multiple configuration files under the `config/` folder,
only the file indicated by the `manifest.yml` file will be loaded. You can use
variables to dynamically switch between configurations.

[float]
==== ingest/*.json

The `ingest/` folder contains Elasticsearch
{elasticsearch}/ingest.html[Ingest Node] pipeline configurations. The Ingest
Node pipelines are responsible for parsing the log lines and doing other
manipulations on the data.

The files in this folder are JSON documents representing
{elasticsearch}/pipeline.html[pipeline definitions]. Just like with the `config/`
folder, you can define multiple pipelines, but a single one is loaded at runtime
based on the information from `manifest.yml`.

The generator creates a JSON object similar to this one:

[source,json]
----
{
  "description": "Pipeline for parsing {module} {fileset} logs",
  "processors": [
    ],
  "on_failure" : [{
    "set" : {
      "field" : "error.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}
----

From here, you would typically add processors to the `processors` array to do
the actual parsing. For details on how to use ingest node processors, see the
{elasticsearch}/ingest-processors.html[ingest node documentation]. In
particular, you will likely find the
{elasticsearch}/grok-processor.html[Grok processor] to be useful for parsing.
Here is an example for parsing the Nginx access logs.

[source,json]
----
{
  "grok": {
    "field": "message",
    "patterns":[
      "%{IPORHOST:nginx.access.remote_ip} - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""
      ],
    "ignore_missing": true
  }
}
----

Note that you should follow the convention of naming of fields prefixed with the
module and fileset name: `{module}.{fileset}.field`, e.g.
`nginx.access.remote_ip`. Also, please review our <<event-conventions>>.

While developing the pipeline definition, we recommend making use of the
{elasticsearch}/simulate-pipeline-api.html[Simulate Pipeline API] for testing
and quick iteration.

By default Filebeat does not update Ingest pipelines if already loaded. If you
want to force updating your pipeline during development, use
`./filebeat setup --pipelines` command. This uploads pipelines even if they
are already available on the node.

[float]
==== _meta/fields.yml

The `fields.yml` file contains the top-level structure for the fields in your
fileset. It is used as the source of truth for:

* the generated Elasticsearch mapping template
* the generated Kibana index pattern
* the generated documentation for the exported fields

Besides the `fields.yml` file in the fileset, there is also a `fields.yml` file
at the module level, placed under `module/{module}/_meta/fields.yml`, which
should contain the fields defined at the module level, and the description of
the module itself. In most cases, you should add the fields at the fileset
level.

After `pipeline.json` is created, it is possible to generate a base `field.yml`.

[source,bash]
----
make create-fields MODULE={module} FILESET={fileset}
----

Please, always check the generated file and make sure the fields are correct.
Documenatation of fields must be added manually.

If the fields are correct, it is time to generate documentation, configuration
and Kibana index patterns.

[source,bash]
----
make update
----

[float]
==== test

In the `test/` directory, you should place sample log files generated by the
service. We have integration tests, automatically executed by CI, that will run
Filebeat on each of the log files under the `test/` folder and check that there
are no parsing errors and that all fields are documented.

In addition, assuming you have a `test.log` file, you can add a
`test.log-expected.json` file in the same directory that contains the expected
documents as they are found via an Elasticsearch search. In this case, the
integration tests will automatically check that the result is the same on each
run.