A universal truth and recurring theme in the DevOps world is automation. From provisioning infrastructure to testing code to deploying to production, many parts of the DevOps lifecycle are already automated. One popular technology for managing infrastructure and configuration in an automated way is Ansible, but are we fully utilizing its capabilities yet?
This presentation gives a broad overview of Ansible, its architecture, and its use cases before exploring a relatively new feature, Event-driven Ansible (EDA). By analyzing applications of Event-driven Ansible, participants will see that automated management is nice, but automatic management is awesome, not just with regard to DevOps principles, but also in terms of reaction times, the human tendency to make minor mistakes, and toil for operators.
Participants will get first-hand insights into Ansible, its strengths, weaknesses, and the potential of event-driven automation within the DevOps world.
The setup is done with Ansible, too. It will install **Ansible, EDA, Prometheus**, and **Alertmanager** on a VM to demonstrate some of the capabilities of EDA.
<details>
<summary>Ansible from the CLI via ansible</summary>
#### Ansible from the CLI via `ansible`
The first example installs a webserver on all hosts in the `webservers` group. The installed webserver is defined as a **host variable** in the inventory file `hosts.yml` (*see above*).
```console
ansible \
webservers \
-m package \
-a 'name="{{ webserver }}"' \
--one-line
```
Afterwards, we can start the webserver on all hosts in the `webservers` group.
```console
ansible \
webservers \
-m service \
-a 'name="{{ webserver }}" state=started' \
--one-line
```
Go ahead and check whether the webservers are running on the respective hosts.
> Ansible is **idempotent** - try running the commands again and see how the output differs.
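
The two ad hoc commands above can also be captured in a playbook, which makes the idempotent behavior easy to observe across repeated runs. The following is a minimal sketch, assuming the `webserver` host variable from `hosts.yml`. The file name `webserver.yml` and the use of `become` are assumptions, not part of the lab repository.

```yaml
---
# webserver.yml (hypothetical file name, not part of the lab repository)
- name: Install and start the webserver
  hosts: webservers
  become: true  # assumption: package and service management require root
  tasks:
    - name: Install the webserver package
      ansible.builtin.package:
        name: "{{ webserver }}"  # host variable defined in hosts.yml
        state: present

    - name: Start the webserver service
      ansible.builtin.service:
        name: "{{ webserver }}"
        state: started
```

Running `ansible-playbook -i hosts.yml webserver.yml` twice should report `changed` tasks on the first run and `ok` on the second, provided nothing else modified the hosts in between.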
</details>
<details>
<summary>Ansible from the CLI via ansible-playbook</summary>
#### Ansible from the CLI via `ansible-playbook`
The second example uses the following **playbook** to **gather** and **display information** for all hosts in the `webservers` group, using the **example** role from the lab repository.
</details>
<details>
<summary>Receive Generic Events via Webhook</summary>
#### Receive Generic Events via Webhook
If you followed the setup instructions for the EDA lab, you should already have a running EDA instance on the `eda-controller.example.com` VM.
If you navigate to `/etc/edacontroller/rulebook.yml` on the VM, you'll see the following rulebook:
```yaml
---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Debug event output
      condition: 1 == 1
      action:
        debug:
          msg: "{{ event }}"

- name: Listen to Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 9000
        data_alerts_path: alerts
        data_host_path: labels.instance
        data_path_separator: .
  rules:
    - name: Restart MySQL server
      condition: event.alert.labels.alertname == 'MySQL not running' and event.alert.status == 'firing'
      action:
        run_module:
          name: ansible.builtin.service
          module_args:
            name: mysql
            state: restarted
    - name: Debug event output
      condition: 1 == 1
      action:
        debug:
          msg: "{{ event }}"
```
For this part of the lab, the **first ruleset** is the one we're interested in: it listens for generic webhook events on port `5000` and prints the entire **event** (payload and metadata) to its logs.
To test this, we can use the `curl` command to send a `POST` request to the webhook `/endpoint` from the VM itself:
```console
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{"foo": "bar"}' \
http://localhost:5000/endpoint
```
If you now check the logs of the EDA controller, you should see the following output:
A rule that always evaluates to `true` is not very useful, so let's change the rulebook to print the value of `foo` if the `foo` key is present in the event's payload, and `no foo :(` otherwise:
```yaml
---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Foo
      condition: event.payload.foo is defined
      action:
        debug:
          msg: "{{ event.payload.foo }}"
    - name: No foo
      condition: 1 == 1
      action:
        debug:
          msg: "no foo :("
```
Send the same `curl` request again and check the logs; you should now see a line saying `bar`.
Let's also try a `curl` request with a different payload:
```console
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{"bar": "baz"}' \
http://localhost:5000/endpoint
```
This time, the output should be `no foo :(`.
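
The fallback rule above still uses an always-true condition. If the two rules are meant to be mutually exclusive, one option is to state the fallback condition explicitly instead of relying on how the rule engine handles an event that matches several rules. This is a minimal sketch that reuses the rule names from above and assumes `is not defined` matches exactly those events that lack the `foo` key:

```yaml
---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Foo
      # Matches only events whose payload contains a "foo" key
      condition: event.payload.foo is defined
      action:
        debug:
          msg: "{{ event.payload.foo }}"
    - name: No foo
      # Explicit negation instead of the catch-all "1 == 1"
      condition: event.payload.foo is not defined
      action:
        debug:
          msg: "no foo :("
```

With this variant, each of the two `curl` requests should trigger exactly one of the rules.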
</details>
<details>
<summary>Restarting Services Automatically with EDA</summary>
#### Restarting Services Automatically with EDA
The last lab is more of a demo - it shows how you can use EDA to automatically react to events observed by **Prometheus** and **Alertmanager**.
For this demo, the second **ruleset** in our rulebook is the one we're interested in:
```yaml
- name: Listen to Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 9000
        data_alerts_path: alerts
        data_host_path: labels.instance
        data_path_separator: .
  rules:
    - name: Restart MySQL server
      condition: event.alert.labels.alertname == 'MySQL not running' and event.alert.status == 'firing'
      action:
        run_module:
          name: ansible.builtin.service
          module_args:
            name: mysql
            state: restarted
```
With this rule, we can restart our MySQL server if it's not running! But how do we get the event to trigger? With **Prometheus** and **Alertmanager**!
When you ran the setup playbook, it installed **Prometheus** and **Alertmanager** on the `eda-controller.example.com` VM. You can access the **Prometheus** UI at `http://<eda-controller-ip>:9090` and the **Alertmanager** UI at `http://<eda-controller-ip>:9093`.
It also installed a **Prometheus exporter** for the **MySQL** database that runs on the server.
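
For orientation, the alert that drives this rule could be wired up roughly as follows. This is a sketch under assumptions, not the lab's actual configuration: `mysql_up` is the metric commonly exposed by `mysqld_exporter`, and the file names, the `for` duration, the `severity` label, the receiver name `eda`, and the URL path are all hypothetical. Only the alert name `MySQL not running` and port `9000` are taken from the rulebook above.

```yaml
---
# Hypothetical Prometheus rule file (for example rules/mysql.yml)
groups:
  - name: mysql
    rules:
      - alert: MySQL not running  # must match the alertname in the EDA condition
        expr: mysql_up == 0       # assumption: metric provided by the MySQL exporter
        for: 30s                  # assumption: wait before the alert starts firing
        labels:
          severity: critical
        annotations:
          summary: "MySQL is down on {{ $labels.instance }}"
---
# Hypothetical alertmanager.yml fragment that forwards alerts to EDA
route:
  receiver: eda
receivers:
  - name: eda
    webhook_configs:
      # Port 9000 is where the ansible.eda.alertmanager source listens;
      # the path segment is an assumption.
      - url: http://localhost:9000/endpoint
```

The two YAML documents belong in separate files; they are shown together only to make the chain from metric to alert to EDA event visible in one place.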
With this setup, we can now shut down our MySQL server and see what happens - make sure to watch the output of the EDA controller's logs:
```console
systemctl stop mysql
journalctl -fu edacontroller
```
Within 30-90 seconds, you should see EDA executing the rule's **action** and restarting the MySQL server. You can track that process by watching the Prometheus/Alertmanager UIs for firing alerts.
Once you see the action being executed in the logs, you can check the MySQL state once more: