How I keep my file folders tidy with Ansible

I try to use Ansible often, even for tasks that I know how to do with a shell script because I know that Ansible is easy to scale. Even though I might develop an Ansible playbook just for my personal workstation, sometimes it ends up being a lot more useful than intended, and it's easy to apply that same playbook to all the computers on my network. And besides, sometimes the greatest enemy of getting really good at something is the impression that it's only meant for serious professionals, or big projects, or whatever you feel that you're not. I use Ansible because it's a great open source tool, but I benefit from it the most because it scales.

One of the tasks I recently assigned to Ansible was the monumental one of keeping my Downloads folder tidy. If you're like me, you end up downloading many files from the Internet throughout the day and then forget that the files exist. On the one hand, I don't mind this habit. There have been times when I realize I still need a file in my Downloads folder, so forgetting about a file rather than promptly removing it can be helpful. However, there are other files that I download expressly to use once and then ought to remove.

I decided to use a highly specific Ansible task to find files I know I don't need and then remove them.

Ansible boilerplate

Ansible playbooks generally start in exactly the same way: Define your hosts and announce a task:

---
- hosts: localhost
  tasks:

Commit those three lines to memory. They're the "shebang" (#!) of Ansible playbooks. Once you have those lines in a text file, you can start defining the steps in your task.

Finding files with Ansible

You can locate files on a system using the find Ansible module. If an Ansible module is a command, its parameters are its command options. In this example playbook, I want to find files explicitly located in the ~/Downloads folder and I can define that using the paths parameter.

This is my process when I start writing a playbook: I find a module in the Ansible module index that seems likely to do what I need, and then I read through its parameters to find out what kind of control I have over the module.

In my case, the files I accidentally collect in my Downloads folder are CSV files. They get downloaded weekly, processed, and then ought to disappear. But they hang around for weeks until I get overwhelmed and delete them. Here's how to find CSV files in Downloads with Ansible:

---
- hosts: localhost
  tasks:
    - name: Find CSV in Downloads
      find:
        paths: ~/Downloads
        recurse: false
        patterns: '*.csv,*.CSV'
      register: result

The paths parameter tells Ansible where to search for files.

The recurse: false parameter forbids Ansible from searching in subdirectories of Downloads. This gives me the ability to retain CSV files that I've downloaded and saved into a subdirectory. Ansible only targets the CSV files I save straight to Downloads (which is my habit).

The patterns parameter tells Ansible what to count as a match. All of the CSV files I download end in .csv, but I'm confident that I'm willing to remove .CSV (in all capital letters) as well.

The finishing touch to this step is to invoke the register module, which saves the results of the find process into a variable called result.

This is important because I want Ansible to perform a second action on the results of find, so those results need to be stored somewhere for the next step.

Removing files with Ansible

The next step in the task is to remove the files that find has uncovered. The module used to remove files is the file module.

This step relies entirely on the find step, so it uses several variables:

    - name: Remove CSV files
      file:
        path: "{{ item.path }}"
        state: absent
      with_items: "{{ result.files }}"

The path parameter uses the built-in "{{ item.path }}" variable, which confusingly isn't actually defined yet. The variable has no information on the path until the file module is used in a loop by the with_items keyword. The with_items step uses the contents of the result variable to extract one filename at a time, which becomes the item for the path parameter. Once the current item's path is extracted, Ansible uses the state: absent rule to ensure that the file located at that path is not left on the system (in other words, it's deleted.)

This is a very dangerous step, especially during testing. If you get this step wrong, you can easily remove files you don't intend to delete.

Verify the playbook

Ansible playbooks are written in YAML, which has a strict syntax. Verify that your YAML is correct using the yamllint command:

$ yamllint cleanup.yaml
$

No results means no errors. This playbook must have been written by someone who really knows and loves YAML!

Testing Ansible plays safely

To avoid deleting my entire home directory by accident, I ran my first attempt with the --check option. This ensures that Ansible doesn't actually make changes to your system.

$ ansible-playbook --check example.yaml
[WARNING]: provided hosts list is empty, only localhost is available.
'all'

PLAY [localhost] ****************************************************

TASK [Gathering Facts] **********************************************
ok: [localhost]

TASK [Find CSV files in Downloads] **********************************
ok: [localhost]

TASK [Remove CSV files] *********************************************
changed: [localhost] => (item={'path': '/home/tux/Downloads/foo.csv', [...]
changed: [localhost] => (item={'path': '/home/tux/Downloads/bar.csv', [...]
changed: [localhost] => (item={'path': '/home/tux/Downloads/baz.csv', [...]

PLAY RECAP **********************************************************
localhost                  : ok=3    changed=1    unreachable=0 [...]

The output is very verbose, but it shows that my playbook is correct: Only CSV files within Downloads have been marked for removal.

Running Ansible playbooks

To run an Ansible playbook, you use the ansible-playbook command:

$ ansible-playbook example.yaml

Confirm the results:

$ ls *.csv  ~/Downloads/
ls: cannot access '*.csv': No such file or directory
/home/tux/Downloads/:
file.txt

Schedule the Ansible playbook

The Ansible playbook has been confirmed, but I want it to run at least every week. I use Anacron rather than Cron, so I created an Anacron job to run weekly:

$ cat << EOF >> ~/.local/etc/cron.weekly/cleanup
#!/bin/sh
ansible-playbook $HOME/Ansible/cleanup.yaml
EOF
$ chmod +x ~/.local/etc/cron.daily/cleanup

What can you do with Ansible?

Generally, Ansible is meant as a system maintenance tool. It's finely tuned to bootstrap complex systems to help with course correction when something's gone wrong and to keep a system in a specific state. I've used it for simple but repetitive tasks, like setting up a complex directory tree that would typically require several commands or clicks. I've also used it for tasks I don't want to do wrong, like removing old files from directories. I've also used it for tasks that are just too complex for me to bother trying to remember, like synchronizing several changes made to a production system with its redundant backup system.

I don't use this cleanup script on my servers because I don't download CSV files every week on my servers, but I do use a variation of it. Ansible isn't a replacement for shell or Python scripting, but for some tasks, it's a very precise method to perform some set of tasks that you might want to run on many more systems.