Query your Linux operating system like a database

Image by:

Lewis Cowles, CC BY-SA 4.0

Linux offers a lot of commands to help users gather information about their host operating system: listing files or directories to check attributes; querying to see what packages are installed, processes are running, and services start at boot; or learning about the system's hardware.

Each command uses its own output format to list this information. You need to use tools like grep, sed, and awk to filter the results to find specific information. Also, a lot of this information changes frequently, leading to changes in the system's state.

It would be helpful to view all of this information formatted like the output of a database SQL query. Imagine that you could query the output of the ps and rpm commands as if you were querying an SQL database table with similar names.

Fortunately, there is a tool that does just that and much more: Osquery is an open source "SQL powered operating system instrumentation, monitoring, and analytics framework."

Many applications that handle security, DevOps, compliance, and inventory management (to name a few) depend upon the core functionalities provided by Osquery at their heart.

Install Osquery

Osquery is available for Linux, macOS, Windows, and FreeBSD. Install the latest version for your operating system by following its installation instructions. (I'll use version 4.7.0 in these examples.)

After installation, verify it's working:

$ rpm -qa | grep osquery
osquery-4.7.0-1.linux.x86_64
$ 
$ osqueryi --version
osqueryi version 4.7.0
$

Osquery components

Osquery has two main components:

osqueri is an interactive SQL query console. It is a standalone utility that does not need super-user privileges (unless you are querying tables that need that level of access).
osqueryd is like a monitoring daemon for the host it is installed on. This daemon can schedule queries to execute at regular intervals to gather information from the infrastructure.

You can run the osqueri utility without having the osqueryd daemon running. Another utility, osqueryctl, controls starting, stopping, and checking the status of the daemon.

$ rpm -ql osquery-4.8.0-1.linux.x86_64 | grep bin
/usr/bin/osqueryctl
/usr/bin/osqueryd
/usr/bin/osqueryi
$

Use the osqueryi interactive prompt

You interact with Osquery much like you would use an SQL database. In fact, osqueryi is a modified version of the SQLite shell. Running the osqueryi command drops you into an interactive shell where you can run commands specific to Osquery, which often start with a .:

$ osqueryi 
Using a virtual database. Need help, type '.help'
osquery>

To quit the interactive shell, run the .quit command to get back to the operating system's shell:

osquery> 
osquery> .quit
$

Find out what tables are available

As mentioned, Osquery makes data available as the output of SQL queries. Information in databases is often saved in tables. But how can you query these tables if you don't know their names? Well, you can run the .tables command to list all the tables that you can query. If you are a long-time Linux user or a sysadmin, the table names will be familiar, as you have been using operating system commands to get this information:

osquery> .tables 
  => acpi_tables
  => apparmor_events
  => apparmor_profiles
  => apt_sources

<< snip >>

  => arp_cache
  => user_ssh_keys
  => users
  => yara
  => yara_events
  => ycloud_instance_metadata
  => yum_sources
osquery>

Check the schema for individual tables

Now that you know the table names, you can see what information each table provides. As an example, choose processes, since the ps command is used quite often to get this information. Run the .schema command followed by the table name to see what information is saved in this table. If you want to check the results, you could quickly run ps -ef or ps aux and compare the output with the contents of the table:

osquery> .schema processes
CREATE TABLE processes(`pid` BIGINT, `name` TEXT, `path` TEXT, `cmdline` TEXT, `state` TEXT, `cwd` TEXT, `root` TEXT, `uid` BIGINT, `gid` BIGINT, `euid` BIGINT, `egid` BIGINT, `suid` BIGINT, `sgid` BIGINT, `on_disk` INTEGER, `wired_size` BIGINT, `resident_size` BIGINT, `total_size` BIGINT, `user_time` BIGINT, `system_time` BIGINT, `disk_bytes_read` BIGINT, `disk_bytes_written` BIGINT, `start_time` BIGINT, `parent` BIGINT, `pgroup` BIGINT, `threads` INTEGER, `nice` INTEGER, `is_elevated_token` INTEGER HIDDEN, `elapsed_time` BIGINT HIDDEN, `handle_count` BIGINT HIDDEN, `percent_processor_time` BIGINT HIDDEN, `upid` BIGINT HIDDEN, `uppid` BIGINT HIDDEN, `cpu_type` INTEGER HIDDEN, `cpu_subtype` INTEGER HIDDEN, `phys_footprint` BIGINT HIDDEN, PRIMARY KEY (`pid`)) WITHOUT ROWID;
osquery>

To drive home the point, use the following command to see the schema for the RPM packages and compare the information with rpm -qa and rpm -qi operating system commands:

osquery> 
osquery> .schema rpm_packages
CREATE TABLE rpm_packages(`name` TEXT, `version` TEXT, `release` TEXT, `source` TEXT, `size` BIGINT, `sha1` TEXT, `arch` TEXT, `epoch` INTEGER, `install_time` INTEGER, `vendor` TEXT, `package_group` TEXT, `pid_with_namespace` INTEGER HIDDEN, `mount_namespace_id` TEXT HIDDEN, PRIMARY KEY (`name`, `version`, `release`, `arch`, `epoch`, `pid_with_namespace`)) WITHOUT ROWID;
osquery>

You learn more in Osquery's tables documentation.

Use the PRAGMA command

In case that schema information is too cryptic for you, there is another way to print the table information in a verbose, tabular format: the PRAGMA command. For example, I'll use PRAGMA to see information for the rpm_packages table in a nice format:

osquery> PRAGMA table_info(rpm_packages);

One benefit of this tabular information is that you can focus on the field you want to query and see the type of information that it provides:

osquery> PRAGMA table_info(users);
+-----+-------------+--------+---------+------------+----+
| cid | name        | type   | notnull | dflt_value | pk |
+-----+-------------+--------+---------+------------+----+
| 0   | uid         | BIGINT | 1       |            | 1  |
| 1   | gid         | BIGINT | 0       |            | 0  |
| 2   | uid_signed  | BIGINT | 0       |            | 0  |
| 3   | gid_signed  | BIGINT | 0       |            | 0  |
| 4   | username    | TEXT   | 1       |            | 2  |
| 5   | description | TEXT   | 0       |            | 0  |
| 6   | directory   | TEXT   | 0       |            | 0  |
| 7   | shell       | TEXT   | 0       |            | 0  |
| 8   | uuid        | TEXT   | 1       |            | 3  |
+-----+-------------+--------+---------+------------+----+
osquery>

Run your first query

Now that you have all the required information from the table, the schema, and the items to query, run your first SQL query to view the information. The query below returns the users that are present on the system and each one's user ID, group ID, home directory, and default shell. Linux users could get this information by viewing the contents of the /etc/passwd file and doing some grep, sed, and awk magic.

osquery> 
osquery> select uid,gid,directory,shell,uuid FROM users LIMIT 7;
+-----+-----+----------------+----------------+------+
| uid | gid | directory      | shell          | uuid |
+-----+-----+----------------+----------------+------+
| 0   | 0   | /root          | /bin/bash      |      |
| 1   | 1   | /bin           | /sbin/nologin  |      |
| 2   | 2   | /sbin          | /sbin/nologin  |      |
| 3   | 4   | /var/adm       | /sbin/nologin  |      |
| 4   | 7   | /var/spool/lpd | /sbin/nologin  |      |
| 5   | 0   | /sbin          | /bin/sync      |      |
| 6   | 0   | /sbin          | /sbin/shutdown |      |
+-----+-----+----------------+----------------+------+
osquery>

Run queries without entering interactive mode

What if you want to run a query without entering the osqueri interactive mode? This could be very useful if you are writing shell scripts around it. In this case, you could echo the SQL query and pipe it to osqueri right from the Bash shell:

$ echo "select uid,gid,directory,shell,uuid FROM users LIMIT 7;" | osqueryi
+-----+-----+----------------+----------------+------+
| uid | gid | directory      | shell          | uuid |
+-----+-----+----------------+----------------+------+
| 0   | 0   | /root          | /bin/bash      |      |
| 1   | 1   | /bin           | /sbin/nologin  |      |
| 2   | 2   | /sbin          | /sbin/nologin  |      |
| 3   | 4   | /var/adm       | /sbin/nologin  |      |
| 4   | 7   | /var/spool/lpd | /sbin/nologin  |      |
| 5   | 0   | /sbin          | /bin/sync      |      |
| 6   | 0   | /sbin          | /sbin/shutdown |      |
+-----+-----+----------------+----------------+------+
$

Learn what services start when booting up

Osquery can also return all the services set to start at boot. For example, to query the startup_items table and get the name, status, and path of the first five services that run at startup:

osquery> SELECT name,type,status,path FROM startup_items LIMIT 5;
  name = README
  type = Startup Item
status = enabled
  path = /etc/rc.d/init.d/README

  name = anamon
  type = Startup Item
status = enabled
  path = /etc/rc.d/init.d/anamon

  name = functions
  type = Startup Item
status = enabled
  path = /etc/rc.d/init.d/functions

  name = osqueryd
  type = Startup Item
status = enabled
  path = /etc/rc.d/init.d/osqueryd

  name = AT-SPI D-Bus Bus
  type = Startup Item
status = enabled
  path = /usr/libexec/at-spi-bus-launcher --launch-immediately
osquery>

Look up ELF information for a binary

Imagine you want to find out more details about the ls binary. Usually, you would do it with the readelf -h command followed by the ls command's path. You can query the elf_info table with Osquery and get the same information:

osquery> SELECT * FROM elf_info WHERE path="/bin/ls";
      class = 64
        abi = sysv
abi_version = 0
       type = dyn
    machine = 62
    version = 1
      entry = 24064
      flags = 0
       path = /bin/ls
osquery>

Now you have a taste of how to use osqueri to look for information of interest to you. However, this information is stored on a huge number of tables; one system I queried had 156 different tables, which can be overwhelming:

$ echo ".tables" | osqueryi | wc -l
156
$

To make things easier, you can start with these tables to get information about your Linux system:

System information table

osquery> select * from system_info;

System limit information

osquery> select * from ulimit_info;

Files opened by various processes

osquery> select * from process_open_files;

Open ports on a system

osquery> select * from listening_ports;

Running processes information

osquery> select * from processes;

Installed packages information

osquery> select * from rpm_packages;

User login information

osquery> select * from last;

System log information

osquery> select * from syslog_events;

Learn more

Osquery is a powerful tool that provides a lot of host information that can be used to solve various use cases. You can learn more about Osquery by reading its documentation.