Scanning Dockerfiles for security issues + Contributing to semgrep

Recently I had to scan some Dockerfiles to identify potential security issues. In this case I wanted to use an automatic scanner. Automatic scanners have the well-known problems of false positives and false negatives, but depending on the kind of work you want to do and the depth you need, they offer a good benefit/effort ratio.

Hadolint

A friend of mine told me about hadolint. Its GitHub repository describes it as follows:

“A smarter Dockerfile linter that helps you build best practice Docker images. The linter parses the Dockerfile into an AST and performs rules on top of the AST. It stands on the shoulders of ShellCheck to lint the Bash code inside RUN instructions.”

As it is written in Haskell, building it would have required installing Haskell and the stack build tool, so I used the container image instead:

podman run --rm -i docker.io/hadolint/hadolint < Dockerfile

These are the results I got for a specific Dockerfile, along with the offending lines:

-:6 DL3007 warning: Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag
FROM myregistry.local/testing/test-image:latest as build
-:7 DL3002 warning: Last USER should not be root
USER root
-:16 DL3007 warning: Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag
FROM myregistry.local/testing/test-image-minimal:latest

The results are simple, but interesting.

Regarding the first finding, it is not clear, at least to me and to other people, whether it is better to pin the version or to use latest as the tag. You can see an interesting conversation about this in this Twitter thread. If you use latest, you always pull the most recent version, so you can be sure you are getting the latest patches and security fixes. On the other hand, by using latest without validating the version first, you could introduce security issues or even malicious code shipped in the latest version.

The second recommendation, not using root within the container, is a basic security control. Although containers implement isolation mechanisms, working as a non-root user within the container reduces the attack surface and the risk. You can read more about container security in this post.
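To make the idea behind this check concrete, here is a minimal Python sketch that finds the last USER instruction in a Dockerfile and reports whether it is root. The function name `last_user_is_root` is hypothetical, and this is not how hadolint implements DL3002: hadolint works on a parsed AST, while this illustration only looks at lines.

```python
import re

# Matches a USER instruction and captures the user part (before an optional ":group").
# Dockerfile instructions are case-insensitive, hence re.IGNORECASE.
USER_RE = re.compile(r"^\s*USER\s+([^\s:]+)", re.IGNORECASE)


def last_user_is_root(dockerfile_text: str) -> bool:
    """Return True if the last USER instruction sets root (or UID 0).

    A Dockerfile with no USER instruction returns False here, even though
    the base image may still default to root.
    """
    last_user = None
    for line in dockerfile_text.splitlines():
        match = USER_RE.match(line)
        if match:
            last_user = match.group(1)
    return last_user in ("root", "0")


example = """\
FROM myregistry.local/testing/test-image:latest as build
USER root
RUN echo building
"""

print(last_user_is_root(example))                 # True
print(last_user_is_root(example + "USER 1001\n"))  # False
```

Adding a non-root USER line after the privileged commands is enough to clear the finding in this sketch, which mirrors the advice in the scanner's message.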

The third finding is the same as the first one, but the line is a bit different: it does not have the “as something” at the end.
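The "latest" check reported in the first and third findings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not hadolint's implementation: the helper names `image_tag` and `find_latest_tags` are hypothetical, and the parsing is deliberately simple. It handles the optional "as something" suffix and avoids mistaking a registry port for a tag.

```python
from typing import List, Optional, Tuple


def image_tag(reference: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag.

    The tag is whatever follows a colon in the last path component, so the
    port in "localhost:5000/app" is not mistaken for a tag.
    """
    reference = reference.split("@", 1)[0]  # drop a digest such as @sha256:...
    last_component = reference.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return None
    return last_component.rsplit(":", 1)[1]


def find_latest_tags(dockerfile_text: str) -> List[Tuple[int, str]]:
    """Return (line number, image) for every FROM line that uses :latest."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        words = line.split()
        if len(words) >= 2 and words[0].upper() == "FROM":
            # The image is the first argument that is not a flag like --platform=...
            image = next((w for w in words[1:] if not w.startswith("--")), None)
            if image and image_tag(image) == "latest":
                findings.append((number, image))
    return findings


example = """\
FROM myregistry.local/testing/test-image:latest as build
USER root
FROM myregistry.local/testing/test-image-minimal:latest
FROM localhost:5000/app
"""

for number, image in find_latest_tags(example):
    print(f"line {number}: {image} uses the latest tag")
```

Run on the example, the sketch reports lines 1 and 3 and ignores the untagged `localhost:5000/app` reference on line 4.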

 

Semgrep

Although the findings are nice, I would like to work with as few tools as possible. I like semgrep a lot because it’s flexible, it’s open source, and it has an awesome team behind it. I also know they are trying to semgrep everything, so I assumed they would have something for Dockerfiles.

I browsed to the semgrep registry and found this ruleset for Dockerfiles. It has 34 rules. Note the description of the ruleset:

“Selected rules from Hadolint, a Dockerfile linter, rewritten in Semgrep.”

It seems that someone has already done the work of translating hadolint rules to semgrep rules.

As the rules already exist, I ran semgrep on the Dockerfiles. These are the results I got for the same Dockerfile discussed above, so the results can be compared:

  splunk-quarkus/Dockerfile.jvm
     generic.dockerfile.security.last-user-is-root.last-user-is-root
        The last user in the container is 'root'. This is a security hazard because if an attacker
        gains control of the container they will have root access. Switch back to another user after
        running commands as 'root'.
        Details: https://sg.run/N461
          7┆ USER root

Only one finding, the one about the root user. What happened to the finding about using latest as the image tag? I thought the rule probably had not been implemented in semgrep for whatever reason, and that this might be a good occasion to contribute something that seemed easy.

By searching within the Dockerfile ruleset, I found that a rule doing what I was looking for already existed: avoid-latest-version.

If the rule exists, why didn’t it detect the same issue?

Clicking on the rule opens a view with the rule at the top and some example code to test it below.

There you can test the rule by clicking “Run”, but in this case I got an error:

{"code":3,"level":"warn","message":"[WARN] Semgrep Core — Syntax error\nAn error occurred while invoking the Semgrep engine. Please help us fix this by creating an issue at https://github.com/returntocorp/semgrep\n\nAt line targetDockerfile:11: `FORM debian:jessie as blah2` was unexpected\n","path":"targetDockerfile","type":"Syntax error"}

The pattern documentation has full details on pattern syntax.

The error happens because the test code itself has an error: it uses FORM instead of FROM. This is another issue I could help fix by contributing, but to continue troubleshooting the issue I was working on, I followed a different path. Semgrep's website has another feature called the playground, where we can enter rules and example code and see whether they work. So, I browsed to the playground.

The source code of the rule is here: https://github.com/returntocorp/semgrep-rules/blob/release/generic/dockerfile/best-practice/avoid-latest-version.yaml. I copied the code of the rule into the top section and pasted some example lines into the bottom section.

Then I executed the scan. Only line 5 was detected as wrong:

FROM debian:latest

But not lines 7 and 9.

Through some trial and error I saw that the issue is that $IMAGE does not match something containing dots and slashes. I’m not sure of the reason, but I think it might be related to the fact that Dockerfile support is recent in semgrep.

I remembered that semgrep has an ellipsis operator, three dots ("..."), that matches almost anything. I tried replacing $IMAGE with the ellipsis and ran the rule again:

This is probably how we wanted the rule to behave.
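One plausible explanation, offered here as an assumption based on semgrep's documentation for generic pattern matching rather than anything confirmed above: in generic mode a metavariable such as $IMAGE captures a single word made of letters, digits and underscores, while the ellipsis matches a sequence of arbitrary tokens. An image reference like myregistry.local/testing/test-image contains dots, slashes and hyphens, so it is not a single word. The Python sketch below uses regular expressions as rough stand-ins for the two patterns; it is an analogy, not semgrep's real matcher.

```python
import re

# Rough regex stand-ins (assumptions for illustration only):
#   $IMAGE -> one "word" of letters, digits and underscores
#   ...    -> any run of characters on the same line
METAVAR_PATTERN = re.compile(r"^FROM\s+\w+:latest(?:\s|$)")
ELLIPSIS_PATTERN = re.compile(r"^FROM\s+.*:latest(?:\s|$)")

lines = [
    "FROM debian:latest",
    "FROM myregistry.local/testing/test-image:latest",
    "FROM myregistry.local/testing/test-image:latest as blah",
    "FROM debian:jessie",
]

# Map each line to (matched by metavariable stand-in, matched by ellipsis stand-in).
results = {
    line: (bool(METAVAR_PATTERN.match(line)), bool(ELLIPSIS_PATTERN.match(line)))
    for line in lines
}

for line, (metavar_hit, ellipsis_hit) in results.items():
    print(f"metavar={metavar_hit!s:5} ellipsis={ellipsis_hit!s:5} {line}")
```

With these stand-ins, only `FROM debian:latest` is caught by the metavariable version, while the ellipsis version also catches both myregistry.local lines and still leaves `FROM debian:jessie` alone, which matches the behavior seen in the playground.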

Thinking that I had discovered an opportunity to improve the rule, and since semgrep is an open source project, I looked for their guidelines on contributing to semgrep rules.

 

Contributing to semgrep

The contribution was trivial because it only changed an existing rule rather than creating a new one, which has more requirements.

This is the process I followed to clone the repo and contribute:

First, I forked the repo to my own account and continued from there.

git clone git@github.com:fcano/semgrep-rules.git

cd semgrep-rules

git remote add --track develop upstream git@github.com:returntocorp/semgrep-rules.git

git fetch upstream

git checkout -b fix-avoid-latest-version-rule upstream/develop

# here I did the change

code generic/dockerfile/best-practice/avoid-latest-version.yaml

# here I modified the test cases to add the case the rule was failing with

code generic/dockerfile/best-practice/avoid-latest-version.dockerfile

In summary, these are the modifications I made:

diff --git a/generic/dockerfile/best-practice/avoid-latest-version.dockerfile b/generic/dockerfile/best-practice/avoid-latest-version.dockerfile
index b2910018..f6b28996 100644
--- a/generic/dockerfile/best-practice/avoid-latest-version.dockerfile
+++ b/generic/dockerfile/best-practice/avoid-latest-version.dockerfile
@@ -1,11 +1,23 @@
 # ruleid: avoid-latest-version
 FROM debian:latest

+# ruleid: avoid-latest-version
+FROM myregistry.local/testing/test-image:latest
+
 # ruleid: avoid-latest-version
 FROM debian:latest as blah

+# ruleid: avoid-latest-version
+FROM myregistry.local/testing/test-image:latest as blah
+

 # ok: avoid-latest-version
 FROM debian:jessie

 # ok: avoid-latest-version
-FORM debian:jessie as blah2
+FROM myregistry.local/testing/test-image:42ee222
+
+# ok: avoid-latest-version
+FROM debian:jessie as blah2
+
+# ok: avoid-latest-version
+FROM myregistry.local/testing/test-image:2a4af68 as blah2

diff --git a/generic/dockerfile/best-practice/avoid-latest-version.yaml b/generic/dockerfile/best-practice/avoid-latest-version.yaml
index 35eb69ac..e0010e37 100644
--- a/generic/dockerfile/best-practice/avoid-latest-version.yaml
+++ b/generic/dockerfile/best-practice/avoid-latest-version.yaml
@@ -17,4 +17,4 @@ rules:
       include:
         - "*dockerfile*"
         - "*Dockerfile*"
-    pattern: FROM $IMAGE:latest
+    pattern: FROM ...:latest
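The `# ruleid:` and `# ok:` comments in the test file tell semgrep's test mode what to expect: the line following a `ruleid` comment should produce a finding, and the line following an `ok` comment should not. The Python sketch below illustrates that convention. It is not semgrep's test runner: the function `check_annotations` is hypothetical, and a regular expression stands in for the fixed rule.

```python
import re
from typing import List, Tuple

# Stand-in for the fixed rule (an assumption for illustration; the real rule
# is the semgrep pattern "FROM ...:latest", not this regex).
RULE = re.compile(r"^FROM\s+.*:latest(?:\s|$)")
ANNOTATION = re.compile(r"^#\s*(ruleid|ok):\s*(\S+)")


def check_annotations(test_file_text: str) -> List[Tuple[int, str, bool]]:
    """Return (line number, annotation, passed) for each annotated line."""
    results = []
    expected = None  # annotation that applies to the next non-empty line
    for number, line in enumerate(test_file_text.splitlines(), start=1):
        annotation = ANNOTATION.match(line)
        if annotation:
            expected = annotation.group(1)
        elif line.strip() and expected is not None:
            matched = bool(RULE.match(line))
            # "ruleid" lines must match; "ok" lines must not.
            results.append((number, expected, matched == (expected == "ruleid")))
            expected = None
    return results


test_file = """\
# ruleid: avoid-latest-version
FROM debian:latest

# ruleid: avoid-latest-version
FROM myregistry.local/testing/test-image:latest as blah

# ok: avoid-latest-version
FROM myregistry.local/testing/test-image:42ee222
"""

for number, annotation, passed in check_annotations(test_file):
    print(f"line {number}: {annotation} -> {'pass' if passed else 'FAIL'}")
```

All three annotated lines pass with the stand-in for the fixed rule. Adding test cases for the previously failing image references, as in the diff above, is what keeps the rule from regressing later.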

With the code changed, there are some checks that the semgrep project requires you to run before sending the pull request:

pre-commit run --all
python -m semgrep --quiet --test generic/dockerfile/best-practice/
git add .
git commit -m "Fix dockerfile avoid-latest-version rule"
git push -u origin fix-avoid-latest-version-rule

Then, I browsed to https://github.com/fcano/semgrep-rules/pull/new/fix-avoid-latest-version-rule to create the pull request, which is here: https://github.com/returntocorp/semgrep-rules/pull/1923. It was approved and merged some minutes later.

If you want to read more about the git workflow usually followed to contribute to open source projects, you can review this article.

Please note what happened here. First, I used one tool, hadolint, just because a colleague told me about it. I installed and ran it in a few minutes, thanks mainly to the fact that it is open source. Then, I switched to one of my favorite tools, semgrep, which is also open source, and searched for the same functionality; and it is there! Moreover, the functionality is based on hadolint’s. That’s probably possible because both tools are open source. But not only that. I saw something not working as expected and I was able to send a fix to the authors, who included it, again in only a few hours. From idea to change in an existing tool in an afternoon. This is the power of open source.

Well, it’s true that semgrep has a release cycle and they release roughly every 10 days, so I will need to wait a bit to see the change included in a release, but it will be there.
