FossID Documentation

FossID Toolbox User Guide

Supported Platforms

FossID Toolbox is delivered as a multi-arch Docker image and as binaries for Linux amd64, Linux arm64, macOS Universal, and Windows amd64.

In Workbench 25.1, fossid-cli is still used by default. To enable FossID Toolbox, uncomment the following line in fossid.conf (remove the leading ;):

;webapp_cli_command=/fossid/bin/fossid-toolbox -c /fossid/etc/fossid.conf

Prerequisites

Many features of FossID Toolbox talk directly to the FossID scan server, so you need a scan server host and a scan server token. Both can be retrieved from the Evaluation Portal or the Delivery Portal, depending on whether you are conducting an evaluation or are an active client.

Command: filescan

This command scans a file. It has multiple modes of operation (--mode):

  • mixed: Combines regular and user-contribution volumes

  • regular: Uses only regular volumes

  • uc: Uses only user-contribution volumes

  • vsf: Does a VSF scan (to find vulnerable snippets)

For details about the returned response, please see Appendix A.

A filescan works by generating a signature for each scanned file. The signature contains the file name and hashes for various scanning purposes: some hashes are required for full-file matching, and others for finding snippets. The hashes may change over time as we improve the scan engine.

Origin of the code: Oldest version

Please note that the goal of a filescan is to find the origin of the source code. If a file matches multiple versions of a component, the match will point to the oldest version. This version might differ from the version of the component that is being scanned.

Command: diffscan

The fossid toolbox supports integrating FossID code scanning in CI/CD pipelines using the diffscan command. Its primary purpose is to prevent the accidental merge of code with license restrictions or security vulnerabilities.

The diffscan command compares two commits, the base commit and the compare commit, and reports compliance and security issues.

Show help by running fossid diffscan --help:

> docker run quay.io/fossid/fossid-toolbox:latest diffscan --help
(... help shown here)

Notable features of diffscan

  • Snippet identification: Find license issues down to the snippet level.
  • VSF: Find code with security vulnerabilities.
  • Ignore-file support: Add a .fossidignore file to the root of your git repo, and it will be automatically detected.
  • Policy-file support: Add a .fossidpolicy file to specify allowed licenses.
  • Autodetection of PR refs: Automatically detect the base and compare refs.
  • Prevent accidental merging: Fail pipelines if an issue is encountered. For false positives, you can simply update the .fossidignore file.
  • PR annotation: Annotate the PR with the detected issues. GitHub only.
  • Server-Side Noise Filtering: Reduce false positives using Server-Side Noise Filtering (SSNF), which can be disabled.

Ways of integrating

  • Job image: Specifying the image the job should run on.
    • Example: GitLab job:image: with DOCKER_AUTH_CONFIG secret.
    • Example: GitHub job:container:image: with job:container:credentials.
    • Example: Azure DevOps container with configured endpoint (service connection).
  • Docker run: If the job is set up to allow running Docker, you can authenticate and call docker run. You have to mount the source code and manually provide the base and compare refs, since autodetection cannot analyze the job environment.
  • Downloading tarball: You can download, decompress and run the fossid command line tool.

Scan mode

Both license and VSF scanning are configured to run in a “mode”, which can be set to off, new, or all. With new, both the before and after revisions are scanned and their results compared, and an issue is reported only if it did not exist in the before revision. With all, only the after revision is scanned, and all issues are reported.
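The difference between new and all can be illustrated as a set comparison. The following Python sketch is not FossID's implementation. It assumes that an issue can be identified by a hypothetical (file path, license) key and that off disables reporting. The function name issues_to_report is made up for this example.

```python
from typing import Iterable, Literal, Set, Tuple

# Hypothetical issue key: (file path, license or finding), e.g. ("src/a.c", "GPL-2.0").
Issue = Tuple[str, str]
Mode = Literal["off", "new", "all"]


def issues_to_report(mode: Mode, before: Iterable[Issue], after: Iterable[Issue]) -> Set[Issue]:
    """Return the issues a scan in `mode` would report (illustration only)."""
    if mode == "off":
        return set()  # assumed: off disables this kind of scanning
    after_issues = set(after)
    if mode == "all":
        return after_issues  # only the after revision counts; report everything
    if mode == "new":
        # Both revisions are scanned; drop issues that already existed before.
        return after_issues - set(before)
    raise ValueError(f"unknown mode: {mode!r}")


before = [("src/util.c", "GPL-2.0")]
after = [("src/util.c", "GPL-2.0"), ("src/new.c", "GPL-3.0")]
print(issues_to_report("new", before, after))  # {('src/new.c', 'GPL-3.0')}
print(len(issues_to_report("all", before, after)))  # 2
```

In this sketch, new reports only the GPL-3.0 finding introduced in the after revision. all reports both findings, because the before revision is not consulted.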

Policy support

You can specify a policy file that fossid will use to determine what licenses are allowed. This file can either be exported from the FossID Workbench, or hand-crafted. Example contents:

[
  {
    "id":"MIT",
    "blocked": false,
    "reason": "Permissive License"
  },
  {
    "id":"Apache-2.0",
    "blocked": false,
    "reason": "Permissive License"
  }
]

If you place this in a file called .fossidpolicy in the repository root, it will automatically be used. Licenses not mentioned in this file will be prohibited.
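The policy rules above can be sketched in Python as follows. This is an illustration, not how fossid evaluates policies internally. The helper names are hypothetical, and treating "blocked": true as prohibited is an assumption based on the field name.

```python
import json
from typing import Dict


def load_policy(text: str) -> Dict[str, bool]:
    """Map each license ID in a .fossidpolicy document to its "blocked" flag."""
    return {entry["id"]: bool(entry["blocked"]) for entry in json.loads(text)}


def is_allowed(policy: Dict[str, bool], license_id: str) -> bool:
    # Licenses missing from the policy are prohibited; listed ones are
    # allowed unless marked "blocked" (assumed meaning of the flag).
    return license_id in policy and not policy[license_id]


EXAMPLE_POLICY = """
[
  {"id": "MIT", "blocked": false, "reason": "Permissive License"},
  {"id": "Apache-2.0", "blocked": false, "reason": "Permissive License"}
]
"""

policy = load_policy(EXAMPLE_POLICY)
print(is_allowed(policy, "MIT"))      # True
print(is_allowed(policy, "GPL-2.0"))  # False, not listed in the policy
```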

Example: GitHub Integration

This example demonstrates scanning and Pull Request annotation using the fossid image from Quay. You can add this job to your GitHub workflow.

The following secrets must be set up in your repository:

  • QUAY_USERNAME: The username for logging into quay.io container registry
  • QUAY_PASSWORD: The password for logging into quay.io container registry
  • FOSSID_TOKEN: The token used for scanning
  • FOSSID_HOST: The host used for scanning

Here is a complete working GitHub workflow file that can be placed in .github/workflows/fossid.yaml:

name: FossID Scanning

on: pull_request

jobs:
  run-fossid-diffscan:
    name: FossID Annotate PR
    runs-on: ubuntu-latest
    container:
      image: quay.io/fossid/fossid-toolbox:latest
      credentials:
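        username: ${{ secrets.QUAY_USERNAME }}
        password: ${{ secrets.QUAY_PASSWORD }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run fossid diffscan
        run: |
          fossid \
          diffscan \
          --fossid-host ${{ secrets.FOSSID_HOST }} \
          --fossid-token ${{ secrets.FOSSID_TOKEN }} \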
          --github-workflow-errors

The --github-workflow-errors flag annotates the PR if any issues are found.

The following image shows how the annotations look:

[Image: GitHub PR annotations]

Clicking on the annotation will bring you to the code snippet:

[Image: GitHub annotated snippet]

Example: GitLab Integration

The following GitLab CI/CD configuration (.gitlab-ci.yml) runs diffscan for merge requests:

workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_TAG'
# ...
run-fossid-diffscan:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: always
    - when: never
  stage: build
  image:
    name: quay.io/fossid/fossid-toolbox:latest
    entrypoint: [""]
  tags:
    - linux
  script:
    - fossid
      -v trace
      diffscan
      --fossid-host $FOSSID_HOST
      --fossid-token $FOSSID_TOKEN
      --fail-on-any-issues 1

Some things to note about the example:

  • The entrypoint must be overridden, because the fossid Dockerfile sets the entrypoint to the tool itself. With the default entrypoint, the GitLab runner has trouble launching the shell.
  • The base and compare refs are autodetected from GitLab CI/CD environment variables.
  • The job must run in a merge request pipeline. Otherwise, autodetection fails and the tool does not run.
  • Update stage to match your pipeline.
  • FOSSID_HOST and FOSSID_TOKEN must be defined as CI/CD variables, and registry credentials for quay.io must be provided through DOCKER_AUTH_CONFIG.

Example: Azure DevOps

pr:
- master

resources:
  containers:
  - container: fossid-toolbox
    image: quay.io/fossid/fossid-toolbox:latest
    endpoint: quayServiceConnection
    options: --entrypoint ""

pool:
  vmImage: 'ubuntu-latest'

jobs:
- job: fossid_diff_scan
  condition: eq(variables['Build.Reason'], 'PullRequest')
  container: fossid-toolbox
  steps:
  - checkout: self
    persistCredentials: true
  - script: |
      fossid \
      -v info \
      diffscan \
      --fossid-host $(FOSSID_HOST) \
      --fossid-token $(FOSSID_TOKEN) \
      --fail-on-any-issues 1
    displayName: 'Run fossid diffscan'
    env:
      FOSSID_HOST: $(FOSSID_HOST)
      FOSSID_TOKEN: $(FOSSID_TOKEN)

This is a complete example of running diffscan in Azure DevOps. Some important points:

  • You must set up a service connection for Quay (referenced by endpoint).
  • You must define FOSSID_HOST and FOSSID_TOKEN as pipeline variables.
  • The pipeline must be enabled to run for pull requests (Repos → Branches → master → … → Branch policies → Build Validation).
  • The image entrypoint must be overridden with options. Otherwise, Azure DevOps cannot run the image.
  • checkout:persistCredentials is required. Without it, fossid cannot fetch missing refs for diffing.
  • job:condition ensures that the job only runs for PRs.

Example: Docker run

This example runs the tool locally to show the difference between two commits:

> docker run -v .:/code quay.io/fossid/fossid-toolbox:latest \
  diffscan \
  --base-ref c32f7d43ee9c0db29a546a94dfe90da6fe1fef21 \
  --compare-ref 4176569ef5181bd727c1bf839af9c5e14a33a2ce \
  --repository-path /code \
  --fossid-host $FOSSID_HOST \
  --fossid-token $FOSSID_TOKEN \
  --fail-on-any-issues 1

Issue found in file: test_data/snippet_linux: License: "GPL-2.0", Component: linux/linux/2.6.10, url: "https://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.gz", license: Some("GPL-2.0") }), Match Type: Partial
==> void fastcall
==> prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
==> {
==>     unsigned long flags;
==>
==>     wait->flags &= ~WQ_FLAG_EXCLUSIVE;
==>     spin_lock_irqsave(&q->lock, flags);
==>     if (list_empty(&wait->task_list))
==>         __add_wait_queue(q, wait);
==>     /*
==>      * don't alter the task state if this is just going to
==>      * queue an async wait queue callback
==>      */
==>     if (is_sync_wait(wait))

> echo $?
1

Some things to note about the example:

  • The current working directory is expected to be a git repository, since . is mounted as the volume.
  • The repository is mounted in the /code folder in the container, so we pass this path to fossid with --repository-path.
  • With --fail-on-any-issues 1, the tool is instructed to exit with code 1 if any issues are detected. In this example, issues were found, so $? is 1.
  • The example commits have been hand-crafted to demonstrate the introduction of a GPL snippet.
  • FOSSID_HOST and FOSSID_TOKEN are expected to be set as environment variables with valid values.

You can also use an ignore file with docker run, but you must provide its full path and make sure the file is mounted inside the Docker container.

Appendix A: Filescan Response

Overview

This appendix describes the different kinds of responses you can get when performing a filescan, including both successful and error responses.

Response

When scanning a file a few different things can happen:

  • You get a match (type component, file, or partial):
    {"type":"component/file/partial", "component": {"artifact":"linux","version":"4.6", ...}, ...}

  • You get a VSF match:
    {"type":"vulnerability", "vulnerability": {"id": "CVE-2024-...", ...}, ...}

  • The file was ignored:
    {"type":"ignored", "ignore_reason": "skipping empty file", ...}

  • You get an invalid or empty response from the server (see the Server Error section):
    {"type":"error", "reason":"...", "response": "...", "response_code": 400, ...}

  • There was a filtered match (see the whitelisting options):
    {"type":"filtered", "wid":"<whitelist id>", "rid":"<whitelist rule id>"}

  • There was a match with noise that was removed (if --ssnf is enabled):
    {"type":"noise", "noise":{...}}
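If you store filescan results as newline-delimited JSON, with one response object per line, you can group them by their type field. This shows at a glance how many files matched, were ignored, or failed. The following is a minimal Python sketch. The function name group_by_type is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_type(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group filescan response objects (one JSON object per line) by "type"."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        groups[obj.get("type", "unknown")].append(obj)
    return dict(groups)


sample = [
    '{"type": "partial", "component": {"artifact": "linux", "version": "4.6"}}',
    '{"type": "ignored", "ignore_reason": "skipping empty file"}',
    '{"type": "error", "reason": "invalid response", "response_code": 400}',
]
for kind, objects in group_by_type(sample).items():
    print(kind, len(objects))
```

Iterating over an open text file yields its lines, so a saved result file can be passed directly: with open("result.ndjson") as f: groups = group_by_type(f).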

General Structure

A successful match response contains several objects. Depending on the match type, not all of them are present.

{
  # Type of match:
  # - component: a match to a full component, for example a .jar file or other archives.
  # - file: a full file match to a file within a component.
  # - partial: a match to a snippet.
  # - vulnerability: a vsf match.
  "type": "partial",

  # A unique identifier for this specific match.
  "id": "387a0b23df7dc465",

  # Component details (see section COMPONENT).
  "component": { ... },

  # File details (see section FILE).
  "file": { ... },

  # Snippet details (see section SNIPPET).
  "snippet": { ... },

  # Noise details (see section NOISE).
  "noise": { ... },

  # Vulnerability details for VSF scans (see section VULNERABILITY).
  "vulnerability": { ... }
}
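The annotated structure above is for documentation only. Actual responses are plain JSON without # comments, which is why tools such as jq can parse them. The following hypothetical Python helper shows how to dispatch on type. It builds a one-line summary from the component, file, and vulnerability objects described in the sections below.

```python
def describe_match(match: dict) -> str:
    """Build a one-line summary of a match response (illustrative helper)."""
    kind = match.get("type")
    if kind == "vulnerability":
        return f"vulnerability {match['vulnerability']['id']}"
    component = match.get("component", {})
    summary = f"{kind}: {component.get('artifact')}@{component.get('version')}"
    file_info = match.get("file")
    if file_info:  # add the path inside the component when a file object is present
        summary += f" ({file_info.get('path')})"
    return summary


example = {
    "type": "partial",
    "id": "387a0b23df7dc465",
    "component": {"artifact": "bootstrap", "version": "4.1.2"},
    "file": {"path": "js/src/button.js"},
}
print(describe_match(example))  # partial: bootstrap@4.1.2 (js/src/button.js)
```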

Component

The component object includes all component details.

"component": {
  # A unique ID for this particular component.
  "id": "9455f745c67462ead575ed8000000000",

  # The author of the component.
  "author": "The Bootstrap Authors",

  # The name of the component.
  "artifact": "bootstrap",

  # The version of the component.
  # Please note that this is the oldest identified version.
  # Example: If a match is found in version 4.1.2, 4.1.3 and 4.1.4 of a
  # component, we report 4.1.2.
  # This means that if you scan a certain version of a component, 
  # we might report an earlier version.
  "version": "4.1.2",

  # The release date of the component, in ISO 8601 format.
  # Note that ISO 8601 allows several representations, so the format of the date may vary.
  "release_date": "2018-07-12",

  # The URL from which the component archive can be downloaded.
  "url": "https://registry.npmjs.org/bootstrap/-/bootstrap-4.1.2.tgz",

  # The component licenses (see section LICENSES).
  "licenses": [ ... ],

  # The component copyright (see section COPYRIGHT).
  "copyright": [ ... ],

  # The ID of the file identified as the component license file.
  "license_file_id": "568d01e0545c0c51b17c063100000000",

  # The path of the file identified as the component license file.
  "license_file_path": "LICENSE",

  # The purl (Package URL) of the archive (https://github.com/package-url/purl-spec).
  "purl": "pkg:npm/bootstrap@4.1.2"
}

File

The file object contains details about the remote file in which the match was found.

"file": {
  # A unique ID for this particular file.
  "id": "4e0f2d05ccc21a752459bb0800000000",

  # The path to this file inside the component.
  "path": "js/src/button.js",

  # The size of this file in bytes.
  "size": 4699,

  # Whether or not this file is available in the knowledge base mirror.
  "available": true,

  # The encoding of this file.
  "encoding": "UTF-8",

  # The file licenses (see section LICENSES).
  "licenses": [ ... ],

  # The file copyright (see section COPYRIGHT).
  "copyright": [ ... ]
}

Licenses

The licenses array can be found in both the file object and the component object. It contains a list of licenses, which can be of different types:

  • DECLARED: the license as declared by the author (only as component license)
  • LICENSE: a license text
  • EXCEPTION: a license exception
  • REFERENCE: a reference to a license or exception
  • WARNING: a license warning
  • LINK: a link to a license

"licenses": [
  # An example of a declared license (declared licenses only appear as component licenses).
  {
    # The type of license, in this case a declared license.
    "type": "DECLARED",

    # The license ID.
    "id": "MIT"
  },

  # An example of a found license.
  {
    # The type of license, in this case a license text.
    "type": "LICENSE",

    # The license ID.
    "id": "BSD-3-Clause",

    # The name of the license.
    "name": "BSD 3-Clause \"New\" or \"Revised\" License",

    # A unique id for this license (not always present).
    "reference": "fossid_BSD-3-Clause.json",

    # Offset and length in chars for the license.
    # Can be used for highlighting of the license.
    "offset": 973127,
    "length": 1533,

    # The probability of this license being correct, as determined by
    # the license extractor.
    "probability": 0.9417472,

    # How much modification has been detected by the license extractor.
    "modification": 0.057017542
  },

  # An example of a found license reference.
  # Please see earlier examples for meaning of fields.
  {
    "type": "REFERENCE",
    "id": "MIT",
    "name": "MIT License",
    "reference": "MIT.json",
    "offset": 142,
    "length": 18,
    "probability": 1
  }
]
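Detected licenses carry a probability, while declared ones do not, so a consumer may want to drop low-confidence detections. The following is a minimal Python sketch. The function name and the threshold are hypothetical choices, not FossID defaults.

```python
from typing import Dict, List


def license_ids(licenses: List[dict], min_probability: float = 0.0) -> Dict[str, List[str]]:
    """Group license IDs by entry type, dropping detections below min_probability.

    Entries without a "probability" field (such as DECLARED) are always kept.
    """
    result: Dict[str, List[str]] = {}
    for entry in licenses:
        if entry.get("probability", 1.0) < min_probability:
            continue  # low-confidence detection
        license_id = entry.get("id")
        if license_id is None:
            continue  # skip entries that carry no license ID
        result.setdefault(entry["type"], []).append(license_id)
    return result


licenses = [
    {"type": "DECLARED", "id": "MIT"},
    {"type": "LICENSE", "id": "BSD-3-Clause", "probability": 0.9417472},
    {"type": "REFERENCE", "id": "MIT", "probability": 1},
]
print(license_ids(licenses))        # all three entries kept
print(license_ids(licenses, 0.95))  # the BSD-3-Clause detection is dropped
```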

Copyright

The copyright array can be found in both the file object and the component object. It contains a list of copyright statements.

"copyright": [
  {
    # The complete copyright statement.
    "default": "(c) 1998-2016 The OpenSSL Project",

    # The name of the copyright holder.
    "name": "The OpenSSL Project",

    # Offset and length in chars for the copyright.
    # Can be used for highlighting of the copyright.
    "offset": 508,
    "length": 33,

    # Year of the last revision.
    "year_last_revision": "2016",

    # Year of the publication.
    "year_publication": "1998"
  }
]

Snippet

This object is present when the match is a snippet (partial) match.

"snippet": {
  # A unique ID of the snippet.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # The size of the snippet in the local file, in hashes.
  "local_size": 10,

  # The coverage in percent of the snippet in the local file.
  "local_coverage": 0.0901,

  # The highlight data for the snippet in the local file. (see HIGHLIGHT)
  "local_highlight": { ... },

  # The size of the snippet in the remote file, in hashes.
  "remote_size": 10,

  # The coverage in percent of the snippet in the remote file.
  "remote_coverage": 0.084,

  # The highlight data for the snippet in the remote file. (see HIGHLIGHT)
  "remote_highlight": { ... }
}

Highlight

This object contains the highlighting information for a snippet.

"local/remote_highlight": {
  # A unique id of this highlighting.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # Array of blocks, each containing highlight ranges.
  "blocks": [
    {
      # A unique id identifying this block.
      "id": "39175286b39340afc4f0c10f43bdecd4",

      # Range in chars for this highlight block.
      "char_range": {
        # Begin char (inclusive).
        "begin": 2362,

        # End char (inclusive).
        # begin = 0 and end = 0 mean 1 char.
        "end": 2838
      }

      # NOTE: byte_range is deprecated, and replaced by char_range.
    }
  ],

  # Encoding of the remote file.
  "encoding": "UTF-8"
}
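Because end is inclusive, converting a char_range to a half-open slice requires adding 1 to end. The Python sketch below extracts the highlighted text of each block. It assumes that the file has already been decoded using the encoding field and that "chars" correspond to Python string indices (Unicode code points). Verify this against your own data, particularly for non-ASCII files. The function name is hypothetical.

```python
from typing import List


def highlighted_text(text: str, highlight: dict) -> List[str]:
    """Return the text covered by each block of a local/remote highlight object.

    char_range.end is inclusive (begin = 0, end = 0 covers one char),
    so the exclusive slice end is end + 1.
    """
    pieces = []
    for block in highlight.get("blocks", []):
        char_range = block["char_range"]
        pieces.append(text[char_range["begin"]:char_range["end"] + 1])
    return pieces


source = "int main(void) { return 0; }"
highlight = {
    "id": "example",
    "blocks": [{"id": "block-1", "char_range": {"begin": 4, "end": 7}}],
    "encoding": "UTF-8",
}
print(highlighted_text(source, highlight))  # ['main']
```

When reading the local file, pass newline="" to open(), for example open(path, encoding=highlight["encoding"], newline=""). This stops Python from translating \r\n line endings, which could otherwise shift the offsets.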

Noise

Noise information from Server-Side Noise Filtering (SSNF) is returned in the “noise” object. This information can be used to highlight or filter matches locally.

{
  # For partial matches, blocks maps noise data to specific snippet blocks.
  # Using chars.total and chars.noise, you can see how much noise there is in
  # a specific block.
  "blocks": {
    # The id of the remote snippet block.
    "77259f5c2d6e879ed7ca62a1172dbb78": {
      # Information about amount of noise, in chars.
      "chars": {
        "noise": 455,
        "total": 567
      },
      # List of types of noise detected in this specific block.
      "types": [
        "SimpleCode",
        "Comment",
        ... # potentially many types.
      ]
    }
    # ... potentially many blocks present
  },
  # Information about amount of noise in the match.
  # This is the most important piece of information when doing match filtering.
  # It adapts to the specific match (file or snippet).
  "chars": {
    "noise": 455,
    "total": 567
  },
  # Information about noise in the remote file.
  # Even if it's a snippet match, this describes noise in the remote
  # file as a whole.
  "file": {
    "chars": {
      "noise": 21373,
      "total": 100660
    },
    # Set to something other than null, empty string or "None" if the
    # remote file was classified as noise at the file level.
    "type": "Junk"
  }
}
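The following Python sketch shows one way to filter matches locally. It computes the share of noise from the top-level chars object and separately checks the file-level classification, following the rule for file.type described above. The 0.5 threshold and the function names are hypothetical. Choose values that fit your own review process.

```python
def noise_ratio(noise: dict) -> float:
    """Share of the match (in chars) classified as noise, from 0.0 to 1.0."""
    chars = noise["chars"]
    return chars["noise"] / chars["total"] if chars["total"] else 0.0


def file_is_noise(noise: dict) -> bool:
    """True if the remote file was classified as noise at the file level."""
    file_type = noise.get("file", {}).get("type")
    return file_type not in (None, "", "None")


def keep_match(match: dict, max_noise: float = 0.5) -> bool:
    """Keep a match unless its noise share exceeds max_noise (hypothetical threshold)."""
    noise = match.get("noise")
    if noise is None:
        return True  # no noise information available, keep the match
    return noise_ratio(noise) <= max_noise


match = {
    "type": "partial",
    "noise": {
        "chars": {"noise": 455, "total": 567},
        "file": {"chars": {"noise": 21373, "total": 100660}, "type": "Junk"},
    },
}
print(round(noise_ratio(match["noise"]), 3))  # 0.802
print(keep_match(match))                      # False
print(file_is_noise(match["noise"]))          # True
```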

Vulnerability

This object contains vulnerability details from a VSF scan.

"vulnerability": {

  # The unique ID of the vulnerability, in this case a CVE from NVD (nvd.nist.gov).
  "id": "CVE-2014-3506",

  # The official vulnerability JSON document.
  # In the case of CVEs, this would be from nvd.nist.gov, provided as-is.
  "details": { ... },

  # The URL to the vulnerability description.
  "url": "https://nvd.nist.gov/vuln/detail/CVE-2014-3506"
}

Server Error

When you get a server error, for example with status code 500, the CLI detects this and outputs an error JSON.

The error JSON from the CLI looks like this:

{
  # Path of the local file.
  "local_path": "test-scan",

  # Match type is error.
  "type": "error",

  # Reason for error.
  "reason": "invalid response",

  # Response code from the server.
  "response_code": 500,

  # Date of error (ISO 8601).
  "date": "2020-10-25T12:38:11Z",

  # The response from the server.
  # This could be:
  #  - A complete JSON document from the server.
  #  - Another response, for example an error from a proxy (non JSON).
  "response": "..."
}

In some cases the server error is itself a JSON document, embedded as a string in the response field of the CLI error.
You can extract it manually or, for example, with the jq utility:

$ fossid ... | jq .response -r | jq

This yields a document looking similar to the following:

{
  # Match type is error.
  "type": "error",

  # The error text.
  "error": "Failed to execute command",

  # Details about the error, in this case the command that triggered the error.
  "error_details": "Alfred missing or cannot be executed",

  # The host that generated the error.
  "host": "...",

  # The date of the error (ISO 8601).
  "error_date": "2020-10-25T12:33:54.918Z",

  # The version of the software serving the request.
  "version": "3.7.0"
}
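The same extraction can also be done in Python. This sketch returns the embedded server document when the response field contains JSON. Otherwise, it falls back to the raw string, for example for a non-JSON proxy error. The function name is hypothetical.

```python
import json
from typing import Any


def server_error_details(cli_error: dict) -> Any:
    """Return the server document embedded in a CLI error's "response" field.

    Returns the parsed JSON when possible, otherwise the raw string
    (for example an HTML error page from a proxy).
    """
    raw = cli_error.get("response", "")
    try:
        return json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return raw  # not JSON: keep the response as-is


cli_error = {
    "local_path": "test-scan",
    "type": "error",
    "reason": "invalid response",
    "response_code": 500,
    "response": json.dumps({"type": "error", "error": "Failed to execute command"}),
}
print(server_error_details(cli_error)["error"])  # Failed to execute command
```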

If you have scanned many files and need to know which ones failed, you can extract the local_path of every error JSON object like this:

$ jq -r 'select(.type == "error") | .local_path' result.ndjson