Borrowing from the Red Hat Developer Blog entry, here’s an introduction to the “fedora-tools” image for Fedora Atomic Host.

When Red Hat’s performance team first started experimenting with Atomic, it became clear that our needs for low-level debug capabilities were at odds with the stated goal of Atomic to maintain a very small footprint. If you consider your current production environment, most standard builds do not include full debug capabilities, so this is nothing new. What is new is that on Red Hat Enterprise Linux (RHEL) you could easily install any debug/tracing/analysis utility, but on Atomic:

-bash-4.2# dnf
bash: dnf: command not found

Whoops! What’s this now??? If you haven’t played with Fedora Atomic yet, keep the first rule of Atomic in mind:

You don’t install software on Atomic. You build containers on RHEL, CentOS, or Fedora, then run them on Atomic. Sys admin tools are no exception.

We always knew we needed an equivalent of the rhel-tools image for Fedora, and we’re happy to announce today the availability of the fedora-tools image.

How Do I Use This Thing?

Here’s a short video that shows how to use the tools container to do common root system administrator tasks:

  • sosreport
  • Snooping bridge traffic
  • System service container (sadc/sar)
  • Using the perf profiling tool

Real-World Usage in the Field

One capability that we love having in the tools container is the GNU Debugger (GDB). GDB is an interactive debugger used to troubleshoot application crashes by analyzing process core dumps.

Here’s a quick demo of how to analyze userspace core dumps on Atomic, using the tools container.

-bash-4.3# atomic host status
  TIMESTAMP (UTC)         VERSION    ID             OSNAME            REFSPEC
* 2015-09-01 22:50:33     22.102     132444ceed     fedora-atomic     fedora-atomic:fedora-atomic/f22/x86_64/docker-host
  2015-08-26 17:38:55     22.98      8b40b4d962     fedora-atomic     fedora-atomic:fedora-atomic/f22/x86_64/docker-host

Download the Fedora and new Fedora Tools images:

-bash-4.3# docker pull fedora
latest: Pulling from docker.io/fedora
...
Status: Downloaded newer image for docker.io/fedora:latest

-bash-4.3# docker pull fedora/tools
latest: Pulling from docker.io/fedora/tools
...
Status: Downloaded newer image for docker.io/fedora/tools:latest

Launch a tools container:

-bash-4.3# atomic run --name tools fedora/tools
docker run -it --name tools --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=tools -e IMAGE=fedora/tools -v /run:/run -v /var/log:/var/log -v /etc/localtime:/etc/localtime -v /:/host fedora/tools
[root@f22-atomic /]# exit
exit

Now for the demo part. Let’s run a daemonized container that runs the sleep command. Sleep could represent any daemonized application. We are going to capture a core dump from it, as we would for a crashed application, and analyze the dump with gdb.

-bash-4.3# docker run -d fedora sleep infinity
9921bdd687eea85faa3f0365bb510c4e4e5df142295a2c8775e4c4a0912376a6

Get the Process ID (PID) of the process we are going to dump.

-bash-4.3# pid=$(pgrep sleep)
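Note that pgrep sleep matches every process named sleep that it can see, so on a host running more than one, $pid would hold several PIDs. As a hedged alternative, Docker can report the host PID of a container's main process through docker inspect. The sketch below wraps that call in a hypothetical helper named container_pid (the name and the validation logic are illustrative assumptions, not part of the tools image):

```shell
# Print the host PID of a container's main process.
# container_pid is a hypothetical helper name used for illustration only.
container_pid() {
    cpid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    # Docker normally reports 0 for a container that is not running;
    # treat that, an empty value, or any non-numeric value as a failure.
    case $cpid in
        ''|0|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$cpid"
}
```

With the container started above, this could be used as pid=$(container_pid 9921bdd687ee), which avoids picking up unrelated sleep processes.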

Using the gcore utility from the gdb package, try to dump a core of the sleep process running in the container we created. (gcore writes the core file without killing the process.)

-bash-4.3# docker run fedora/tools gcore -o /host/tmp/democore $pid
ptrace: No such process.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create /host/tmp/democore.1822

Ahh, why did that fail? It failed because of PID namespace isolation. A normal docker run invocation gives you a dedicated PID namespace for that container, starting with PID 1.

-bash-4.3# docker run -it fedora/tools ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  45860  2632 ?        Rs+  19:10   0:00 ps aux

So, the value of $pid does not exist inside this new container. But, if you use --pid=host, which is what the atomic run command does for you, you skip the PID namespace creation for the container, and operate in the host’s PID namespace, where the value of $pid is valid.

-bash-4.3# atomic run fedora/tools pgrep sleep
1822
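To check namespace membership directly, the kernel exposes each process's PID namespace as a symlink at /proc/<pid>/ns/pid. Two processes share a namespace when those links have the same target. The following is a minimal sketch, assuming a Linux /proc and enough privilege to read the target's ns entry. The helper names pidns_of and same_pidns, and the PROC_ROOT variable, are hypothetical and exist only for this illustration:

```shell
# Root of the proc filesystem to inspect; /proc unless overridden.
# (PROC_ROOT is an assumption made for this sketch.)
PROC_ROOT=${PROC_ROOT:-/proc}

# Print the PID namespace identifier of a process, e.g. "pid:[4026531836]".
pidns_of() {
    readlink "$PROC_ROOT/$1/ns/pid"
}

# Return 0 if both PIDs are in the same PID namespace, 1 if they are not,
# and 2 if either namespace link could not be read.
same_pidns() {
    ns_a=$(pidns_of "$1") || return 2
    ns_b=$(pidns_of "$2") || return 2
    [ "$ns_a" = "$ns_b" ]
}
```

Run from the host shell, same_pidns $$ "$pid" would be expected to fail for the containerized sleep, because a normal docker run gave that container its own PID namespace.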

Now let’s use the tools container to try and take a gcore of the sleep process running in another container.

-bash-4.3# atomic run fedora/tools gcore -o /host/tmp/democore $pid
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable
0x00007fc297ba62c0 in __nanosleep_nocancel () from /lib64/libc.so.6
warning: target file /proc/2240/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 8192 bytes at 0x7ffe73d4e000.
Saved corefile /host/tmp/democore.1822

Okay, this time we were able to save a core at /host/tmp/democore.1822. But what is /host? /host is a volume mount that Red Hat has chosen to standardize on for its super-privileged containers. It is one of the run options embedded in the RUN label of the tools image:

# docker inspect fedora/tools | grep RUN
            "RUN": "docker run -it --name NAME --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=NAME -e IMAGE=IMAGE -v /run:/run -v /var/log:/var/log -v /etc/localtime:/etc/localtime -v /:/host IMAGE"

So, a tools image launched with the atomic command will be launched as above, including the -v /:/host volume. This gives us a handy way to write stuff to the host from within a container, such as when using gcore and choosing where it should write out the core dump.
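Comparing the RUN label with the docker run line that atomic run printed earlier, the NAME and IMAGE placeholders have been filled in with the container name and the image name. The sketch below imitates that substitution in plain shell. It is an illustrative approximation under that assumption, not the atomic command's actual implementation, and expand_run_label is a hypothetical name:

```shell
# Fill the NAME and IMAGE placeholders of a RUN label.
# Usage: expand_run_label LABEL NAME IMAGE
# The body runs in a subshell so that "set -f" and the variables stay local.
expand_run_label() (
    label=$1 name=$2 image=$3
    out=
    set -f                      # no globbing while splitting the label
    for word in $label; do      # intentionally unquoted: split on whitespace
        case $word in
            NAME)        word=$name ;;
            IMAGE)       word=$image ;;
            NAME=NAME)   word=NAME=$name ;;
            IMAGE=IMAGE) word=IMAGE=$image ;;
        esac
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
)
```

Called with the label text shown above, the name tools, and the image fedora/tools, this produces the same docker run line that atomic run printed when the tools container was launched.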

Also, notice the warning about being in different PID namespaces. It means that some data in the core file, such as process and thread IDs, will not make sense when read back from a different PID namespace, like the one our gdb process runs in. The warning gives thread lists as its example, which for our purposes can be thought of as lists of PIDs.

Back on the Atomic host system, we can see the core file was in fact written to /tmp:

-bash-4.3# file /tmp/democore.$pid
/tmp/democore.2240: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'sleep'
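The path changes because the host's / is mounted at /host inside the tools container, so /host/tmp inside the container is /tmp on the host. A small sketch of that translation follows. It relies on the HOST environment variable that the RUN label sets to /host, and host_path is a hypothetical helper name:

```shell
# Translate a path seen inside the tools container into the host path.
# Uses $HOST (set to /host by the RUN label), defaulting to /host.
host_path() {
    prefix=${HOST:-/host}
    case $1 in
        "$prefix")   printf '/\n' ;;                     # the mount point itself
        "$prefix"/*) printf '%s\n' "${1#"$prefix"}" ;;   # strip the prefix
        *)           return 1 ;;                         # not under the host mount
    esac
}
```

For example, host_path /host/tmp/democore.1822 prints /tmp/democore.1822, while a path outside the mount makes the function return a non-zero status.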

We can now use the tools container to run gdb, and analyze the core dump:

-bash-4.3# atomic run fedora/tools gdb /host/tmp/democore.$pid -q -ex bt -batch
[New LWP 2240]
Core was generated by `sleep'.
#0  0x00007fc297ba62c0 in ?? ()
"/host/tmp/democore.2240" is a core file.
Please specify an executable to debug.
#0  0x00007fc297ba62c0 in ?? ()
#1  0x0000000000403e0f in ?? ()
#2  0x000000000001869f in ?? ()
#3  0x0000000000000000 in ?? ()
-bash-4.3#

The raw addresses above can be resolved into human-readable function names by installing the corresponding debuginfo package inside the tools container:

# dnf install --disablerepo='*' --enablerepo=fedora-debuginfo coreutils-debuginfo

Coming soon: sosreport

Unfortunately, the container support for sosreport has not yet been merged into Fedora. We’ve discussed this, and are working quickly to update the Fedora tools image accordingly.

Go Forth and Debug!

For more information, head over to the Red Hat Customer Portal and check out the official rhel-tools documentation.