FossID Documentation

Supported Platforms

The fossid toolbox is delivered as a multi-arch Docker image and as binaries for Linux amd64, Linux arm64, macOS Universal and Windows amd64.

In Workbench 25.1, fossid-cli is still used by default. To enable FossID Toolbox, uncomment this line in fossid.conf:

;webapp_cli_command=/fossid/bin/fossid-toolbox -c /fossid/etc/fossid.conf

Prerequisites

Many features of the fossid toolbox communicate directly with the FossID scan server, so you need a scan server host and a scan server token. These can be retrieved from the Evaluation Portal or the Delivery Portal, depending on whether you are conducting an evaluation or are an active client.

Command: `filescan`

This command scans a file. It has multiple modes of operation (--mode):

mixed: Combines regular and user-contribution volumes
regular: Uses only regular volumes
uc: Uses only user-contribution volumes
vsf: Does a VSF scan (to find vulnerable snippets)

For details about the returned response, please see Appendix A.

A filescan works by generating a signature for each scanned file, containing the name of the file and hashes for various scanning purposes. Some of the hashes are required for full file matching and some for finding snippets. The hashes might change over time as we improve the scan engine.
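To make the idea of a signature concrete, here is a generic Python illustration: a record holding the file name, one hash of the whole file (for full file matching) and a list of hashes over sliding windows of lines (for snippet matching). The SHA-256 choice, the four-line window and the line normalization are assumptions made for this sketch only; FossID's actual signature format and hash functions are not described here and will differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileSignature:
    """Illustrative signature: file name plus hashes for two purposes."""
    name: str
    full_file_hash: str  # used for full file matching
    snippet_hashes: list = field(default_factory=list)  # used for snippet matching


def make_signature(name: str, content: str, window: int = 4) -> FileSignature:
    # Whole-file hash: identical file contents get identical hashes.
    full = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Normalize: strip surrounding whitespace and drop blank lines, so that
    # indentation or spacing changes do not alter the snippet hashes.
    lines = [ln.strip() for ln in content.splitlines() if ln.strip()]
    # One hash per run of `window` consecutive lines; a copied snippet
    # shares some of these window hashes with the file it came from.
    snippets = [
        hashlib.sha256("\n".join(lines[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(max(len(lines) - window + 1, 0))
    ]
    return FileSignature(name, full, snippets)
```

Two files that differ as a whole can still share window hashes, which is what makes snippet matches possible even when the full-file hashes do not match.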

Ignore projects

This command supports ignoring projects with the --ignore-projects argument. For detailed information on how to use this feature, see the “Ignoring Projects” section below.

Origin of the code: Oldest version

Please note that the goal of a filescan is to find the origin of the source code. If a file matches multiple versions of a component, the match will point to the oldest version. This version might differ from the version of the component that is being scanned.
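The selection rule can be sketched in Python as follows, assuming each candidate carries the "version" and "release_date" fields of the component object (see Appendix A) and that "oldest" means the earliest release date. This only illustrates the rule; it is not how the scan server implements it, and the function name oldest_match is hypothetical.

```python
from datetime import date


def oldest_match(candidates: list) -> dict:
    """Return the candidate component with the earliest release date.

    Each candidate is a dict with at least "version" and "release_date".
    Plain YYYY-MM-DD dates are assumed; other ISO 8601 forms would need a
    more flexible parser.
    """
    if not candidates:
        raise ValueError("no candidates")
    return min(candidates, key=lambda c: date.fromisoformat(c["release_date"]))
```

With the bootstrap example from Appendix A, a file found in versions 4.1.2, 4.1.3 and 4.1.4 is reported as 4.1.2, even if you scanned 4.1.4.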

Command: `diffscan`

The fossid toolbox supports integrating FossID code scanning in CI/CD pipelines using the diffscan command. Its primary purpose is to prevent the accidental merge of code with license restrictions or security vulnerabilities.

The diffscan command compares two commits, the base commit and the compare commit, and reports compliance and security issues.

Show help by running fossid diffscan --help:

> docker run quay.io/fossid/fossid-toolbox:latest diffscan --help

Notable features of `diffscan`

Snippet identification: Find license issues down to snippet level.
VSF: Find code with security vulnerabilities (off by default; see Scan Mode below).
Ignore-file support: Add a .fossidignore file to the root of your git repo, and it will be automatically detected.
Policy-file support: Add a .fossidpolicy file to specify allowed licenses.
Autodetection of PR refs: Automatically detect the base and compare refs.
Prevent accidental merging: Fail pipelines if an issue is encountered. For false positives, you can simply update the .fossidignore file.
PR annotation: Annotate the PR with the detected issues. GitHub only.
Server-Side Noise Filtering: Reduce false positives with Server-Side Noise Filtering (SSNF). Can be disabled.
Ignore Projects: Supports ignoring certain projects (see the Ignoring Projects section).

Ways of integrating

Job image: Specifying the image the job should run on.
- Example: GitLab job:image: with DOCKER_AUTH_CONFIG secret.
- Example: GitHub job:container:image: with job:container:credentials.
- Example: Azure DevOps container with configured endpoint (service connection).
Docker run: If the job is set up to allow running Docker, you can authenticate and call docker run. You have to mount the source code and manually provide the base and compare refs, since autodetection cannot analyze the job environment.
Downloading the tarball: You can download, decompress and run the fossid command line tool.

Scan Mode

Both license and VSF scanning run in a configurable “mode” (see --license-mode and --vsf-mode in diffscan --help). The mode can be set to:

off: Disable this kind of scanning
new: If an issue is found, also scan the previous version of the file to check whether the issue is new or already existed. The issue is reported only if it did not exist in the base commit.
all: Report the issue regardless of whether it is new or already existed. This is especially useful for VSF scanning, as new vulnerabilities are continuously added to the Knowledge Base.
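The difference between the three modes can be summarized in a small Python sketch. Issues are modelled here as hashable values such as (path, license) tuples, and the function name issues_to_report is hypothetical; diffscan's internal representation is not documented.

```python
from collections.abc import Iterable


def issues_to_report(compare_issues: Iterable, base_issues: Iterable, mode: str) -> list:
    """Apply an off/new/all scan mode to the issues found in the compare commit."""
    if mode == "off":
        # This kind of scanning is disabled: nothing is reported.
        return []
    found = list(compare_issues)
    if mode == "all":
        # Report every issue, whether or not it already existed.
        return found
    if mode == "new":
        # Report only issues that did not exist in the base commit.
        existing = set(base_issues)
        return [issue for issue in found if issue not in existing]
    raise ValueError(f"unknown mode: {mode!r}")
```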

Policy support

You can specify a policy file that fossid will use to determine what licenses are allowed. This file can either be exported from the FossID Workbench, or hand-crafted. Example contents:

[
  {
    "id":"MIT",
    "blocked": false,
    "reason": "Permissive License"
  },
  {
    "id":"Apache-2.0",
    "blocked": false,
    "reason": "Permissive License"
  }
]

If you place this in a file called .fossidpolicy in the repository root, it will automatically be used. Licenses not mentioned in this file will be prohibited.
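If you want to check a license against a policy file outside of diffscan, for example in a custom script, the rules above can be sketched in Python: listed licenses are allowed unless "blocked" is true, and unlisted licenses are prohibited. The function names load_policy and is_allowed are hypothetical.

```python
import json


def load_policy(text: str) -> dict:
    """Map each license id in a .fossidpolicy-style JSON list to its "blocked" flag."""
    return {entry["id"]: bool(entry["blocked"]) for entry in json.loads(text)}


def is_allowed(license_id: str, policy: dict) -> bool:
    # Unlisted licenses are prohibited; listed ones are allowed unless blocked.
    return license_id in policy and not policy[license_id]


POLICY = """[
  {"id": "MIT", "blocked": false, "reason": "Permissive License"},
  {"id": "Apache-2.0", "blocked": false, "reason": "Permissive License"}
]"""

policy = load_policy(POLICY)
print(is_allowed("MIT", policy))      # True
print(is_allowed("GPL-2.0", policy))  # False (not listed)
```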

Ignore projects

This command supports ignoring projects with the --ignore-projects argument. For detailed information on how to use this feature, see the “Ignoring Projects” section below.

Known Limitations

Maximum GitHub Annotations: At the time of writing, GitHub has an undocumented limitation on the number of annotations created per job and workflow. It appears to be approximately 10 per job step and 50 per job. If these limits are exceeded, additional annotations may not be displayed in the PR. For more information and the latest updates, see the following GitHub community discussions:
- https://github.com/orgs/community/discussions/26680
- https://github.com/orgs/community/discussions/68471

Example: GitHub Integration

This example demonstrates scanning and Pull Request annotation using the fossid image from Quay. You can add this job to your GitHub workflow.

The following secrets must be configured in your repository:

QUAY_USERNAME: The username for logging into quay.io container registry
QUAY_PASSWORD: The password for logging into quay.io container registry
FOSSID_TOKEN: The token used for scanning
FOSSID_HOST: The host used for scanning

Here is a complete working GitHub workflow file that can be placed in .github/workflows/fossid.yaml:

name: FossID Scanning

on: pull_request

jobs:
  run-fossid-diffscan:
    name: FossID Annotate PR
    runs-on: ubuntu-latest
    container:
      image: quay.io/fossid/fossid-toolbox:latest
      credentials:
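        username: ${{ secrets.QUAY_USERNAME }}
        password: ${{ secrets.QUAY_PASSWORD }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run fossid diffscan
        run: |
          fossid \
          diffscan \
          --fossid-host ${{ secrets.FOSSID_HOST }} \
          --fossid-token ${{ secrets.FOSSID_TOKEN }} \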
          --github-workflow-errors

The --github-workflow-errors flag annotates the PR if any issues are found.

The following image shows how the annotations look:

GitHub PR Annotations

Clicking on the annotation will bring you to the code snippet:

GitHub Annotated Snippet

Example: GitLab Integration

workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_TAG'
# ...
run-fossid-diffscan:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: always
    - when: never
  stage: build
  image:
    name: quay.io/fossid/fossid-toolbox:latest
    entrypoint: [""]
  tags:
    - linux
  script:
    - fossid
      -v trace
      diffscan
      --fossid-host $FOSSID_HOST
      --fossid-token $FOSSID_TOKEN
      --fail

Some things to note about the example:

The entrypoint must be overridden, because the fossid Dockerfile sets the entrypoint to the tool itself. With the default entrypoint, the GitLab runner has trouble launching the shell.
The base and compare refs are autodetected using environment variables.
The job must run in a merge request pipeline; otherwise autodetection fails and the tool does not run.
Update the stage to match your pipeline.

Example: Azure DevOps

pr:
- master

resources:
  containers:
  - container: fossid-toolbox
    image: quay.io/fossid/fossid-toolbox:latest
    endpoint: quayServiceConnection
    options: --entrypoint ""

pool:
  vmImage: 'ubuntu-latest'

jobs:
- job: fossid_diff_scan
  condition: eq(variables['Build.Reason'], 'PullRequest')
  container: fossid-toolbox
  steps:
  - checkout: self
    persistCredentials: true
  - script: |
      fossid \
      -v info \
      diffscan \
      --fossid-host $(FOSSID_HOST) \
      --fossid-token $(FOSSID_TOKEN) \
      --fail
    displayName: 'Run fossid diffscan'
    env:
      FOSSID_HOST: $(FOSSID_HOST)
      FOSSID_TOKEN: $(FOSSID_TOKEN)

This is a full example for running diffscan in Azure DevOps. Some important points:

You must have set up a service connection for Quay (referenced using endpoint).
You must have set FOSSID_HOST and FOSSID_TOKEN as variables for the pipeline.
The pipeline must be enabled to run for pull requests (Repos → Branches → master → … → Branch policies → Build Validation).
The image entrypoint must be overridden with options; otherwise Azure DevOps cannot run the image.
checkout:persistCredentials is required; otherwise fossid cannot fetch missing refs for diffing.
job:condition ensures the job runs only on PRs.

Example: Docker run

This example runs the tool locally to show the difference between two commits:

> docker run -v .:/code quay.io/fossid/fossid-toolbox:latest \
  diffscan \
  --base-ref c32f7d43ee9c0db29a546a94dfe90da6fe1fef21 \
  --compare-ref 4176569ef5181bd727c1bf839af9c5e14a33a2ce \
  --repository-path /code \
  --fossid-host $FOSSID_HOST \
  --fossid-token $FOSSID_TOKEN \
  --fail

Issue found in file: test_data/snippet_linux: License: "GPL-2.0", Component: linux/linux/2.6.10, url: "https://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.gz", license: Some("GPL-2.0") }), Match Type: Partial
==> void fastcall
==> prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
==> {
==>     unsigned long flags;
==>
==>     wait->flags &= ~WQ_FLAG_EXCLUSIVE;
==>     spin_lock_irqsave(&q->lock, flags);
==>     if (list_empty(&wait->task_list))
==>         __add_wait_queue(q, wait);
==>     /*
==>      * don't alter the task state if this is just going to
==>      * queue an async wait queue callback
==>      */
==>     if (is_sync_wait(wait))

> echo $?
1

Some things to note about the example:

The current working directory must be a Git repository, since . is mounted as the volume.
The repository is mounted at /code in the container, so this path is passed to fossid with --repository-path.
The --fail flag instructs the tool to exit with code 1 if any issues are detected. Issues were found in this example, so $? is 1.
The example commits have been hand-crafted to demonstrate the introduction of a GPL snippet.
FOSSID_HOST and FOSSID_TOKEN must be set as environment variables with valid values.

You can also use an ignore file with docker run, but you must provide the full path to it and make sure the file is mounted inside the Docker container.
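If your pipeline is scripted in Python rather than shell, the docker run example above can be wrapped with subprocess. This is a minimal sketch assuming Docker is installed and FOSSID_HOST and FOSSID_TOKEN are exported; the helper names diffscan_docker_cmd and run_diffscan are hypothetical. The repository path is made absolute so the mount does not depend on relative-path support in -v.

```python
import os
import subprocess

IMAGE = "quay.io/fossid/fossid-toolbox:latest"


def diffscan_docker_cmd(base_ref: str, compare_ref: str, host: str, token: str,
                        repo_dir: str = ".") -> list:
    """Build the docker run invocation from the example above as an argv list."""
    return [
        "docker", "run",
        # Mount the repository at /code, using an absolute host path.
        "-v", f"{os.path.abspath(repo_dir)}:/code",
        IMAGE,
        "diffscan",
        "--base-ref", base_ref,
        "--compare-ref", compare_ref,
        "--repository-path", "/code",
        "--fossid-host", host,
        "--fossid-token", token,
        "--fail",
    ]


def run_diffscan(base_ref: str, compare_ref: str, repo_dir: str = ".") -> int:
    """Run diffscan in Docker and return its exit code (with --fail, 1 means issues were found)."""
    cmd = diffscan_docker_cmd(base_ref, compare_ref,
                              os.environ["FOSSID_HOST"], os.environ["FOSSID_TOKEN"],
                              repo_dir)
    # Passing a list (no shell) avoids quoting problems with refs and tokens.
    return subprocess.run(cmd, check=False).returncode
```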

Ignoring Projects

Some commands have an argument called --ignore-projects, which allows you to exclude specific projects from being considered during scanning. The --ignore-projects argument expects a path to a file containing a list of projects to ignore. A project can be one of the following:

A project_id from the match JSON (match.component.project_id)
A GitHub project (github.com/author/project)
A GitHub organization (github.com/author)

This ignore file can be checked into your Git repository.

The project_id is a unique identifier returned in the component object of the match response. (See Appendix A for details on the component object, including the project_id.)

Below is a step-by-step example of how to retrieve a project_id from a match response and use it to ignore a project:

# Get the project_id of the tokio-rs/axum project
$ fossid filescan test_data/axum_snippet --limit 1 | jq -r '.component.url + " (" + .component.project_id + ")"'
https://github.com/tokio-rs/axum/archive/axum-macros-v0.4.2.tar.gz (20146bea053f10e3b7c39b3800000000)

# Add the tokio-rs/axum project to a file called "ignoreme"
$ echo 20146bea053f10e3b7c39b3800000000 > ignoreme

# Scan again with ignore-projects enabled
$ fossid filescan test_data/axum_snippet --limit 1 --ignore-projects ignoreme | jq -r '.component.url + " (" + .component.project_id + ")"'
https://github.com/hoprnet/hoprnet/archive/refs/tags/singapore.tar.gz (415118458ff7880ffe35f18b00000000)

The second filescan still returns a match, but for a different repository.

Here is an example ignoreme file containing various types of projects to ignore:

$ cat ignoreme
# This file contains an example list of projects to ignore
# Empty lines and lines starting with '#' are skipped.

# This ignores github.com/facebook/react
# (the ID is from match.component.project_id)
24772ed0d274588a4b5a210a00000000

# You can also ignore a GitHub project like this:
github.com/facebook/react

# Or a whole GitHub organization from this:
github.com/apache

# (Do not include the https:// prefix for GitHub projects or organizations)
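The file format above is simple enough to read in a few lines of Python, for example to validate an ignore file in a pre-commit check. This sketch skips empty lines and lines starting with '#' and classifies each remaining entry. The classification rule (anything starting with github.com/ is a GitHub entry, everything else a project_id) is an assumption based on the examples, and the function names are hypothetical.

```python
def parse_ignore_file(text: str) -> list:
    """Return the entries of an --ignore-projects file, skipping blanks and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


def classify(entry: str) -> str:
    """Classify an entry as a GitHub organization, GitHub project or project_id."""
    if entry.startswith(("http://", "https://")):
        # GitHub entries must be written without a URL scheme.
        raise ValueError(f"remove the URL scheme from {entry!r}")
    if entry.startswith("github.com/"):
        # github.com/author -> organization, github.com/author/project -> project
        parts = entry.rstrip("/").split("/")
        return "github_org" if len(parts) == 2 else "github_project"
    # Anything else is assumed to be a project_id from match.component.project_id.
    return "project_id"
```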

Appendix A: Filescan Response

Overview

This appendix describes the different kinds of responses you can get when performing a filescan. It describes both successful responses and error responses.

Response

When scanning a file, a few different things can happen:

- You get a match (type component, file or partial): {"type":"component/file/partial", "component": {"artifact":"linux","version":"4.6", ...}, ...}
- You get a VSF match: {"type":"vulnerability", "vulnerability": {"id": "CVE-2024-...", ...}, ...}
- The file was ignored: {"type":"ignored", "ignore_reason": "skipping empty file", ...}
- You get an invalid or empty response from the server: {"type":"error", "reason":"...", "response": "...", "response_code": 400, ...}
  See the Server Error section.
- There was a filtered match (see whitelisting options): {"type":"filtered", "wid":"<whitelist id>", "rid":"<whitelist rule id>"}
- There was a match with noise that was removed (if --ssnf is enabled): {"type":"noise", "noise":{...}}
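Because every result carries a "type" field, a consumer of filescan output can dispatch on it. The following Python sketch counts results per type in newline-delimited JSON output (one object per line) and produces a one-line description per result, using only the fields shown above. The function names describe and count_types are hypothetical.

```python
import json
from collections import Counter


def describe(result: dict) -> str:
    """One-line summary of a filescan result, based on its "type" field."""
    kind = result.get("type")
    if kind in ("component", "file", "partial"):
        comp = result["component"]
        return f"{kind} match: {comp['artifact']} {comp['version']}"
    if kind == "vulnerability":
        return f"vulnerability match: {result['vulnerability']['id']}"
    if kind == "ignored":
        return f"ignored: {result['ignore_reason']}"
    if kind == "error":
        return f"error: {result['reason']} (response code {result.get('response_code')})"
    if kind == "filtered":
        return f"filtered by whitelist {result['wid']}, rule {result['rid']}"
    if kind == "noise":
        return "match removed as noise"
    return f"unrecognized result type: {kind!r}"


def count_types(ndjson_text: str) -> Counter:
    """Count results per "type" in newline-delimited JSON output."""
    return Counter(json.loads(line)["type"] for line in ndjson_text.splitlines() if line.strip())
```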

General Structure

A successful match response contains several objects. Depending on the type of match, not all of them are present.

{
  # Type of match:
  # - component: a match to a full component, for example a .jar file or other archives.
  # - file: a full file match to a file within a component.
  # - partial: a match to a snippet.
  # - vulnerability: a vsf match.
  "type": "partial",

  # A unique identifier for this specific match.
  "id": "387a0b23df7dc465",

  # Component details (see section COMPONENT).
  "component": { ... },

  # File details (see section FILE).
  "file": { ... },

  # Snippet details (see section SNIPPET).
  "snippet": { ... },

  # Noise details (see section NOISE).
  "noise": { ... },

  # Vulnerability details for VSF scans (see section VULNERABILITY).
  "vulnerability": { ... }
}

Component

The component object includes all component details.

"component": {
  # A unique ID for this particular component.
  "id": "9455f745c67462ead575ed8000000000",

  # The author of the component.
  "author": "The Bootstrap Authors",

  # The name of the component.
  "artifact": "bootstrap",

  # The version of the component.
  # Please note that this is the oldest identified version.
  # Example: If a match is found in version 4.1.2, 4.1.3 and 4.1.4 of a
  # component, we report 4.1.2.
  # This means that if you scan a certain version of a component, 
  # we might report an earlier version.
  "version": "4.1.2",

  # The release date of the component, in ISO 8601 format.
  # Note that ISO 8601 allows several representations, so the format can vary.
  "release_date": "2018-07-12",

  # The url to the archive, to download.
  "url": "https://registry.npmjs.org/bootstrap/-/bootstrap-4.1.2.tgz",

  # The component licenses (see section LICENSES).
  "licenses": [ ... ],

  # The component copyright (see section COPYRIGHT).
  "copyright": [ ... ],

  # The ID of the file identified as the component license file.
  "license_file_id": "568d01e0545c0c51b17c063100000000",

  # The path to the file identified as the component license file.
  "license_file_path": "LICENSE",

  # The purl to the archive (https://github.com/package-url/purl-spec).
  "purl": "pkg:npm/bootstrap@4.1.2",

  # The project_id for the project this component belongs to. The project in this case is bootstrap (all versions).
  "project_id": "e6cd8a4da0dd745c9753928e00000000"
}

File

The file object includes details about the remote file in which the match was found.

"file": {
  # A unique ID for this particular file.
  "id": "4e0f2d05ccc21a752459bb0800000000",

  # The path to this file inside the component.
  "path": "js/src/button.js",

  # The size of this file in bytes.
  "size": 4699,

  # Whether or not this file is available in the knowledge base mirror.
  "available": true,

  # The encoding of this file.
  "encoding": "UTF-8",

  # The file licenses (see section LICENSES).
  "licenses": [ ... ],

  # The file copyright (see section COPYRIGHT).
  "copyright": [ ... ]
}

Licenses

The licenses array can be found in both the file object and the component object. It contains a list of licenses, which can be of different types:

DECLARED: the license as declared by the author (only as component license)
LICENSE: a license text
EXCEPTION: a license exception
REFERENCE: a reference to a license or exception
WARNING: a license warning
LINK: a link to a license

"licenses": [
  # An example of a declared license.
  {
    # The type of license, in this case a declared license.
    "type": "DECLARED",

    # The license ID.
    "id": "MIT"
  },

  # An example of a found license.
  {
    # The type of license, in this case a license text.
    "type": "LICENSE",

    # The license ID.
    "id": "BSD-3-Clause",

    # The name of the license.
    "name": "BSD 3-Clause \"New\" or \"Revised\" License",

    # A unique id for this license (not always present).
    "reference": "fossid_BSD-3-Clause.json",

    # Offset and length in chars for the license.
    # Can be used for highlighting of the license.
    "offset": 973127,
    "length": 1533,

    # The probability of this license being correct, as determined by
    # the license extractor.
    "probability": 0.9417472,

    # How much modification has been detected by the license extractor.
    "modification": 0.057017542
  },

  # An example of a found license reference.
  # Please see earlier examples for meaning of fields.
  {
    "type": "REFERENCE",
    "id": "MIT",
    "name": "MIT License",
    "reference": "MIT.json",
    "offset": 142,
    "length": 18,
    "probability": 1
  }
]
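To feed these arrays into a policy check, you typically need only the license IDs. A minimal Python sketch, assuming entries without a "probability" field (such as DECLARED licenses) should always be kept; the min_probability threshold and the function name license_ids are hypothetical local choices, not FossID defaults.

```python
def license_ids(licenses: list, min_probability: float = 0.0) -> set:
    """Collect license IDs from a "licenses" array.

    Entries without "probability" (e.g. DECLARED) are always kept; others
    are kept only if their probability is at least min_probability.
    """
    return {
        lic["id"]
        for lic in licenses
        if "id" in lic and lic.get("probability", 1.0) >= min_probability
    }
```

Applied to the example above, this yields MIT and BSD-3-Clause; with min_probability=0.95, the BSD-3-Clause entry (probability 0.9417472) is dropped.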

Copyright

This array contains a list of copyright statements.

"copyright": [
  {
    # The complete copyright statement.
    "default": "(c) 1998-2016 The OpenSSL Project",

    # The name of the copyright holder.
    "name": "The OpenSSL Project",

    # Offset and length in chars for the copyright.
    # Can be used for highlighting of the copyright.
    "offset": 508,
    "length": 33,

    # Year of the last revision.
    "year_last_revision": "2016",

    # Year of the publication.
    "year_publication": "1998"
  }
]

Snippet

This object is present when the match is a snippet (partial) match.

"snippet": {
  # A unique ID of the snippet.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # The size of the snippet in the local file, in hashes.
  "local_size": 10,

  # The coverage in percent of the snippet in the local file.
  "local_coverage": 0.0901,

  # The highlight data for the snippet in the local file. (see HIGHLIGHT)
  "local_highlight": { ... },

  # The size of the snippet in the remote file, in hashes.
  "remote_size": 10,

  # The coverage in percent of the snippet in the remote file.
  "remote_coverage": 0.084,

  # The highlight data for the snippet in the remote file. (see HIGHLIGHT)
  "remote_highlight": { ... }
}

Highlight

This object contains the highlighting information for a snippet.

"local/remote_highlight": {
  # A unique id of this highlighting.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # Array of blocks, each containing highlight ranges.
  "blocks": [
    {
      # A unique id identifying this block.
      "id": "39175286b39340afc4f0c10f43bdecd4",

      # Range in chars for this highlight block.
      "char_range": {
        # Begin char
        "begin": 2362,

        # End char (inclusive).
        # begin = 0 and end = 0 means a single char.
        "end": 2838
      }

      # NOTE: byte_range is deprecated, and replaced by char_range.
    }
  ],

  # Encoding of the remote file.
  "encoding": "UTF-8"
}
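Because end is inclusive, the highlighted text of a block is text[begin:end + 1] in Python slicing terms. The sketch below assumes the file has already been decoded using the highlight's "encoding" and that the offsets are character indices into that decoded text, as the name char_range suggests; the function name highlighted_text is hypothetical.

```python
def highlighted_text(text: str, highlight: dict) -> list:
    """Return the text covered by each block of a local/remote highlight.

    char_range.end is inclusive, so begin == end selects one character.
    """
    return [
        text[block["char_range"]["begin"]: block["char_range"]["end"] + 1]
        for block in highlight["blocks"]
    ]
```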

Noise

Noise detected by the server is returned in the “noise” object. The information here can be used to highlight or filter matches locally.

{
  # For partial matches, blocks maps noise data to specific snippet blocks.
  # Using chars.total and chars.noise, you can see how much noise there is in
  # a specific block.
  "blocks": {
    # The id of the remote snippet block.
    "77259f5c2d6e879ed7ca62a1172dbb78": {
      # Information about amount of noise, in chars.
      "chars": {
        "noise": 455,
        "total": 567
      },
      # List of types of noise detected in this specific block.
      "types": [
        "SimpleCode",
        "Comment",
        ... # potentially many types.
      ]
    }
    # ... potentially many blocks present
  },
  # Information about amount of noise in the match.
  # This is the most important piece of information when doing match filtering.
  # It adapts to the specific match (file or snippet).
  "chars": {
    "noise": 455,
    "total": 567
  },
  # Information about noise in the remote file.
  # Even if it's a snippet match, this describes noise in the remote
  # file as a whole.
  "file": {
    "chars": {
      "noise": 21373,
      "total": 100660
    },
    # Set to something other than null, an empty string or "None" if the
    # remote file was classified as noise at the file level.
    "type": "Junk"
  }
}
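A local filter could combine the match-level "chars" ratio with the file-level "type". In the example above, the match-level ratio is 455/567, roughly 0.80. The Python sketch below treats a match as noise if the remote file was classified as noise or if the ratio reaches a threshold; the 0.5 threshold and the function names are hypothetical choices, not FossID defaults.

```python
def noise_ratio(noise: dict) -> float:
    """Fraction of the matched characters that were classified as noise."""
    chars = noise["chars"]
    return chars["noise"] / chars["total"] if chars["total"] else 0.0


def is_mostly_noise(noise: dict, threshold: float = 0.5) -> bool:
    """Hypothetical local filter on top of the server's noise data."""
    file_type = noise.get("file", {}).get("type")
    if file_type not in (None, "", "None"):
        # The remote file as a whole was classified as noise (e.g. "Junk").
        return True
    return noise_ratio(noise) >= threshold
```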

Vulnerability

This object contains vulnerability details from a VSF scan.

"vulnerability": {

  # The unique ID of the vulnerability, in this case a CVE from NVD (nvd.nist.gov).
  "id": "CVE-2014-3506",

  # The official vulnerability JSON document.
  # In the case of CVEs, this would be from nvd.nist.gov, provided as-is.
  "details": { ... },

  # The URL to the vulnerability description.
  "url": "https://nvd.nist.gov/vuln/detail/CVE-2014-3506"
}

Server Error

When you get a server error, for example with status code 500, the CLI detects this and outputs an error JSON.

The error JSON from the CLI looks like this:

{
  # Path of the local file.
  "local_path": "test-scan",

  # Match type is error.
  "type": "error",

  # Reason for error.
  "reason": "invalid response",

  # Response code from the server.
  "response_code": 500,

  # Date of error (ISO 8601).
  "date": "2020-10-25T12:38:11Z",

  # The response from the server.
  # This could be:
  #  - A complete JSON document from the server.
  #  - Another response, for example an error from a proxy (non JSON).
  "response": "..."
}

Some server errors are JSON documents embedded inside the CLI error response field.
You can extract the response manually or (for example) using the jq utility:

$ fossid ... | jq .response -r | jq

This yields a document looking similar to the following:

{
  # Match type is error.
  "type": "error",

  # The error text.
  "error": "Failed to execute command",

  # Details about the error, in this case the command that triggered the error.
  "error_details": "Alfred missing or cannot be executed",

  # The host that generated the error.
  "host": "...",

  # The date of the error (ISO 8601).
  "error_date": "2020-10-25T12:33:54.918Z",

  # The version of the software serving the request.
  "version": "3.7.0"
}

If you have scanned many files and need to know which ones failed, you can extract the “local_path” of every error JSON like this:

$ jq -r 'select(.type == "error") | .local_path' result.ndjson
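The same selection can be done in Python, which also makes it easy to decode the embedded server response where it is JSON and fall back to the CLI's "reason" otherwise (for example, when a proxy returned a non-JSON page). This is a sketch; the function name failed_files is hypothetical.

```python
import json


def failed_files(ndjson_text: str) -> list:
    """Return (local_path, error text) for every error result in ndjson output."""
    failures = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        result = json.loads(line)
        if result.get("type") != "error":
            continue
        detail = result.get("reason", "")
        try:
            # Some server errors embed a JSON document in "response".
            server = json.loads(result.get("response") or "")
        except json.JSONDecodeError:
            server = None  # e.g. a non-JSON error page from a proxy
        if isinstance(server, dict) and "error" in server:
            detail = server["error"]
        failures.append((result.get("local_path"), detail))
    return failures
```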

FossID Toolbox User Guide
