Secret Scanning GitHub Repositories Using Concourse CI & Trufflehog

February 20, 2023

Recently I’ve become responsible for a number of GitHub organizations with many repositories in each. During a review, I noticed a Slack webhook URL saved in a config file in GitHub. For those that don’t know, Slack webhooks should be considered secret, as anyone with the URL can push a message into Slack. Slack has this to say about them:

“Keep it secret, keep it safe. Your webhook URL contains a secret. Don't share it online, including via public version control repositories. Slack actively searches out and revokes leaked secrets.”

Fortunately, everything I found was in private GitHub repos, so nothing was exposed publicly. The Slack webhook URL had been sitting in the config file for a year when I discovered it. So it seems either there aren’t any internal processes actively monitoring for secrets in source code, or a Slack webhook URL isn’t considered an issue. I was curious what other secrets I might find, so I downloaded the trufflehog CLI and started scanning.

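Trufflehog is a tool designed to find secrets in source code. Early versions relied on scanning for high entropy strings. Current versions (v3, which the commands in this post use) match known credential formats with detectors and can verify a match against the issuing service, which is what the --only-verified flag filters on. You can learn more about the tool here: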

https://github.com/trufflesecurity/trufflehog
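
To give a feel for what “high entropy” means, here is a minimal Python sketch of Shannon entropy over the characters of a string. This only illustrates the idea and is not trufflehog’s actual implementation. The `is_high_entropy` helper and its 3.5 threshold are hypothetical values I picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum -p * log2(p) over the frequency p of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy(text, threshold=3.5):
    """Hypothetical helper: flag strings whose entropy exceeds an arbitrary threshold."""
    return shannon_entropy(text) > threshold


# A dictionary word repeats characters, so it scores lower than a random-looking token.
print(shannon_entropy("password"))
print(shannon_entropy("q8Zr2LmX9vKp4TbN"))
```

Entropy alone produces a lot of false positives (hashes, UUIDs, minified code), which is one reason format-specific detection plus verification is attractive.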

It’s quite easy to scan an entire GitHub organization. All it requires is a single command:

trufflehog github --endpoint https://github.com --org my-org --token my-token -j --only-verified

If trufflehog finds secrets in your repositories it will return a result similar to what is shown below:

{"SourceMetadata":{"Data":{"Github":{"link":"https://github.com/org/repo/file.yml","repository":"https://github.com/org/repo.git","commit":"commit-hash","email":"user@test.com","file":"file.yml","timestamp":"2023-01-17 09:50:57 -0500 -0500","line":40,"visibility":1}}},"SourceID":0,"SourceType":7,"SourceName":"trufflehog - github","DetectorType":30,"DetectorName":"SlackWebhook","DecoderName":"PLAIN","Verified":true,"Raw":"https://hooks.slack.com/fake/webhook/url","Redacted":"","ExtraData":null,"StructuredData":null}
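
With more than a handful of findings, the raw JSON gets hard to read. Here is a rough Python sketch that groups findings by repository and detector. It assumes one JSON object per line, with the field names taken from the sample output above. The `summarize_findings` function is my own helper for illustration and is not part of trufflehog.

```python
import json
from collections import Counter


def summarize_findings(lines):
    """Count verified findings per (repository, detector) pair."""
    summary = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            finding = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(finding, dict) or not finding.get("Verified"):
            continue
        # Walk the nested metadata defensively, since any level may be missing or null.
        metadata = finding.get("SourceMetadata") or {}
        github = (metadata.get("Data") or {}).get("Github") or {}
        repo = github.get("repository", "unknown")
        detector = finding.get("DetectorName", "unknown")
        summary[(repo, detector)] += 1
    return summary


def print_summary(path):
    """Print one line per (repository, detector) pair, most frequent first."""
    with open(path) as handle:
        for (repo, detector), count in summarize_findings(handle).most_common():
            print(f"{count:4d}  {detector:20s}  {repo}")
```

Calling `print_summary("result.json")` on a file of saved scan output would list the noisiest repositories first.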

After trufflehog finished scanning, it had identified a number of secrets in a few repositories. The most common issue, however, was Slack webhook URLs embedded in config files. Once identified, I moved these secrets to a HashiCorp Vault instance and updated the config files to remove the Slack webhook. Unfortunately, it’s not enough to just update the file to remove the secret. You also need to rewrite the entire history to remove it completely. To do this, you can use a tool like BFG Repo-Cleaner. Since copies may survive in existing clones, it’s also worth rotating the secret itself.

A few days later, I decided to automate the secret scanning process using trufflehog and the Concourse CI tool. Concourse CI is known as a “continuous thing doer” and I’ve used it for CI/CD for a number of years. You can read more about it here:

https://concourse-ci.org/

I wanted my secret scanning pipeline to run every night to warn us if any secrets were found. Here is a simplified version of the pipeline I developed:

resource_types:
  - name: slack-notification
    type: registry-image
    source:
      repository: cfcommunity/slack-notification-resource
      tag: latest

resources:
- name: once-nightly
  type: time
  icon: clock
  check_every: 2h
  source:
    start: 1:00 AM
    stop: 4:00 PM
    days: [Monday, Tuesday, Wednesday, Thursday, Friday]
    location: America/New_York

- name: slack-notify
  type: slack-notification
  icon: slack
  source:
    url: ((slack-webhook))

- name: trufflehog-image
  type: registry-image
  icon: docker
  source:
    repository: ((docker-repository))
    username: ((docker-username))
    password: ((docker-password))

jobs:
- name: scan-organizations
  plan:
  - get: once-nightly
    trigger: true
  - get: trufflehog-image
    trigger: true
  - task: trufflehog-scan-org
    image: trufflehog-image
    config:
      platform: linux
      outputs:
        - name: results
      run:
        path: bash
        args:
        # -e stops on the first error; -x is left out so the token is not echoed into the build log.
        - -ec
        - |
          # Scan the organization to find any verified secrets:
          trufflehog github --endpoint https://github.com --org org --token "${GIT_TOKEN}" -j --only-verified > result.json
          # Count the JSON objects in the file to determine how many secrets were found:
          issues=$(jq -s length result.json)
          # Print the message and write it to a file that will be sent to notify the team:
          echo "Trufflehog found $issues verified issues in the organization." | tee results/org.txt
      params:
        GIT_TOKEN: ((git.access-token))
  on_success:
    do:
      - put: slack-notify
        params:
          username: Concourse
          silent: true
          text_file: results/org.txt
          text: |
            :concourse-succeeded: [*SUCCESS*] *$BUILD_PIPELINE_NAME* | *$BUILD_JOB_NAME*
            Result: $TEXT_FILE_CONTENT
  on_failure:
    do:
      - put: slack-notify
        params:
          username: Concourse
          silent: true
          text: |
            The scanning pipeline failed.

The Concourse pipeline is fairly simple. It sets up a few resources, and the main job scans the specified organization’s repositories for any secrets. The results of the scan are written to a file and inspected to identify how many issues were found. Two things are left out of the simplified version: the real job also stores the report in S3 so the developers can take action on any secrets found, and it scans multiple GitHub organizations in parallel.
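
In my pipeline the parallel scanning is handled by Concourse itself, but the idea is easy to sketch outside of it. The Python below is an illustration only, not what the pipeline runs: it assumes the trufflehog CLI is on the PATH, reuses the same flags as the command earlier in this post, and the function names and the `runner` parameter are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_trufflehog(org, token):
    """Run the same scan command used earlier in this post and return its stdout."""
    cmd = [
        "trufflehog", "github",
        "--endpoint", "https://github.com",
        "--org", org,
        "--token", token,
        "-j", "--only-verified",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def count_findings(output):
    """Count findings, assuming one JSON object per non-empty line of output."""
    return sum(1 for line in output.splitlines() if line.strip())


def scan_orgs(orgs, token, runner=run_trufflehog):
    """Scan each organization in its own thread and return {org: finding count}."""
    with ThreadPoolExecutor(max_workers=len(orgs) or 1) as pool:
        outputs = pool.map(lambda org: runner(org, token), orgs)
        # pool.map yields results in the same order as the input list.
        return {org: count_findings(out) for org, out in zip(orgs, outputs)}
```

Threads are enough here because each worker just waits on an external process. The `runner` argument exists so the scan can be swapped for a stub when testing.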

A Docker image was also created that contains the trufflehog CLI and other tools required by the pipeline. That Dockerfile looks something like this:

FROM golang:alpine

RUN apk add --no-cache git jq curl gcc bash musl-dev openssl-dev ca-certificates aws-cli && update-ca-certificates
RUN git clone https://github.com/trufflesecurity/trufflehog.git
RUN cd trufflehog && go install

If you are not currently scanning source code for secrets, I would highly recommend setting up a system like this to scan continually. If you have public repos, some companies like Stripe and Slack will scan them for their leaked keys. GitHub has secret scanning too, but for private repos it requires a GitHub Advanced Security license as far as I’ve seen. If that isn’t an option for you, or you’re using source control other than GitHub or an on-prem version, you should consider setting up something like this to catch any secrets. Better safe than sorry.


Written by William Applegate

© 2023