FossID Documentation

Command Line Interface (CLI) Introduction

FossID CLI (often referred to as the client) is a command line application that analyzes source code and checks similarities against the FossID open source database (Knowledge Base). It generates cryptographic hashes from the provided source code, and only these hashes are exchanged with the database. This means that there is no source code exchange between the client application and the database servers.

It can either be accessed directly via the shell, or indirectly as the communication interface used by the Workbench and API.

OVERVIEW

Usage:

./fossid-cli [OPTION...] <TARGET-1, TARGET-2, ...TARGET-n>

Where TARGET can be either a single file, or a directory for recursive scanning.

For details on the output, please see INTERPRETING THE SCANNING RESPONSE.

CONFIGURATION

Server configuration

  --host H             sets server hostname and optional port (e.g. https://hostname:80).
                       can be used to specify multiple hosts: host1,host2:8080,...
  --token T            sets user account token
  --config P           specify path of the configuration file
                       Default: autodetection will search in multiple folders, in the following order:
                         - current working directory
                         - exe directory
                         - $XDG_CONFIG_HOME/fossid/ on Linux
                         - %APPDATA%/fossid/ on Windows
                         - ~/Library/Application Support/FossID/ on Mac
                         - /fossid/etc
                         - /etc
                       NOTE: support for the .cfg extension has been removed.

Scan configuration

  --limit N, -l N      limits output to N most significant matches
                       (default: 10)
  --sensitivity N      sets snippet sensitivity to a minimum of N lines
                       (default: 10)
  --fields F, -f F     modify the JSON result (see FIELDS for examples)
  --hide-nomatch       omits non-matching files from results
  --pretty             indent JSON response for easier screen reading
  --threads N, -t N    use N number of threads when scanning multiple targets
                       (default: auto)
  --symlinks           follows symlinks (default: disabled, Linux ONLY)
  --retry N            retries N times in case of network issues (default: 2 times)
  --retry-wait W       before retrying, wait W seconds (default: 10 seconds)
  --timeout N          http connections will timeout after N seconds (default: 300)
  --skip-binaries      skips binary files regardless of the file extension
                       (based on chr(0) detection)
  --marker             Sets the snippet marker used with --mirror action
                       (default: "==>")
  --debug              write debug information to fossid-cli.log
  --local              Generate signatures for Blind Audits (on STDOUT). Please see BLIND AUDITS.
  --alfred-version V, -a V  Enable Alfred version V. Example: -a3.1.7r
  --cli-user-agent X   Sets a custom HTTP header x-fossid-cli-user-agent to the specified value.
                       Example: CustomApp/1.0
  --scan-depth D       Sets a scan depth to speed up scans at the cost of match quality. (Default: 1.0)
                       This feature is experimental and might be removed in the future.
                       Example: 0.1, for 10% scan depth
                       Max value: 1
  --match-format N     Select match format, default: 2. Please refrain from using this unless you
                       need to use match format 1 temporarily during a transition to match format 2.
  --scoring-profile P  Select scoring profile. 0 means enable Legacy Scoring Profile. If you do not specify
                       a scoring profile, the server decides.
  --ssnf               Enable Server Side Noise Filtering. This means matches that the server considers noise
                       are filtered.

Scan Modes

The following options allow changing scan mode from the default scan mode.

The Default scan mode will provide results from both regular and uc volumes, if available.

  --regular             performs a regular scan.
                        (requires that the server has enabled volumes of type regular)
  --uc                  performs a User Contribution (Stack Overflow) scan.
                        (requires that the server has enabled volumes of type uc)
  --vsf                 performs a VulnSnippetFinder scan.
                        (requires that the server has enabled volumes of type vsf)

Signature generation

  --dependency-analysis E  whether to enable (1) or disable (0) dependency analysis. Default: disabled.
                           this will include the file contents of package files (package.json, pom.xml, etc)
                           into the signature.
  --full-file-only         perform only full file matching
  --snippet-only           perform only snippet matching
  --min-file BYTES         skip files smaller than BYTES number of bytes
                           (default: 0, which means do not skip any files)
  --max-file BYTES         skip snippet search past the first BYTES number of bytes
                           (default: 524288 bytes)
  --strip-tags EXTS        strips html/xml tags from comma-separated list of extensions
                           (for example: html,xml)
  --send-filename N        enable (1) or disable (0) including the filename in the signature
                           when scanning (not for local signature generation).
                           The filename is used to provide better source code identification.
                           (default: 1)
  --enable-sha1 B          whether to enable or disable sha-1 hashes. Default: disabled.
                           SHA1 hashes are useful for SPDX-report generation.

Exclude

Please see the EXCLUDE FUNCTIONALITY section below for more information about the exclude functionality.

  --exclude-pattern P  do not scan entries matching pattern P
                       (Example: --exclude-pattern .zip$ for all zip files)
  --exclude-from F     do not scan entries matching patterns in file F
  --exclude-dir D      do not scan directory D
                       (Example: --exclude-dir .git to skip all .git directories)

Actions

  --component-details C   get component details for component C (Please see MATCH FORMAT for output details)
                          C can either be a URL or a match.component.id
  --component-licenses C  get component licenses for component C
                          add --underlying for underlying licenses.
  --mirror M              returns the source code for the given mid or match.file.id (in original encoding)
  --mirror-utf8 M         returns the source code for the given mid or match.file.id (in UTF-8 encoding)
  --convert-to-utf8 F     returns the local file F in UTF-8 encoding
  --highlight-local M,F   highlights matching snippets from MID in local FILE
  --cpe                   looks up CVEs for the given comma-separated list of CPEs
  --stdin-signatures      read signatures from stdin
  --stdin-paths           read file paths from stdin (not folders)
  --stdin-file P          read file from stdin (where P is path which goes into signature)
  --credits               displays credits (and exits)
  --help                  displays this help
  --help-scan             displays scan overview help
  --help-response         displays scan response help
  --version, -v           displays version (and exits)

Snippet Highlighting

These commands are used to highlight snippets found in partial file matches.

Example for highlighting a local file:

$ fossid-cli -h /path/to/local/file -i '{json from match.snippet.local_highlight}'  

Example for highlighting a remote file:

$ fossid-cli -h 4e0f2d05ccc21a752459bb0800000000 -i '{json from match.snippet.remote_highlight}'

NOTE: this might look slightly different on different shells or on different platforms. The goal here is to provide the CLI with the unmodified JSON. On Windows, for example, you might even need to escape each quotation mark. When the CLI fails to parse the JSON, it tells you that the input is invalid and shows you what it received, so that you can adjust the input accordingly.

NOTE: if you have --strip-tags enabled for scanning, you need to have it enabled for highlighting as well.

  --highlight F, -h F          highlight a file or a file id.
                               F is either a path to a local file for local highlighting,
                               or the ID of the remote file (match.file.id) for remote highlighting
  --highlight-input I, -i I    the local or remote snippet highlight data from the match json.  
                               use match.snippet.local_highlight for local highlighting, and  
                               match.snippet.remote_highlight for remote highlighting.
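
One way to sidestep shell quoting altogether is to launch the CLI from a script and pass the highlight JSON as a single argument. The following is a minimal Python sketch of that idea. The helper name build_highlight_command and the abbreviated sample JSON are hypothetical; only the -h and -i options come from this document.

```python
import json

def build_highlight_command(target, highlight_json, cli="fossid-cli"):
    """Return an argv list for a highlight call (hypothetical helper).

    target is a local path or a match.file.id; highlight_json is the
    unmodified JSON text taken from the match result.
    """
    # Fail early if the text is not valid JSON (raises ValueError).
    json.loads(highlight_json)
    # Each list element is passed as one argument, so no shell escaping applies.
    return [cli, "-h", target, "-i", highlight_json]

# Abbreviated, made-up highlight data for illustration only.
sample_json = '{"id": "570ccfec92d089df6beeafa52bbc0a91", "blocks": []}'
argv = build_highlight_command("/path/to/local/file", sample_json)
print(argv)
```

The resulting list can be handed to subprocess.run(argv) without shell=True, so the JSON reaches the CLI exactly as written.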

Whitelisting

  --whitelist W        specifies a whitelist for scanning, or for
                       --whitelist-{rm,add,ls} if present
  --whitelist-rm R     remove rule from a whitelist using rid R
                       (requires --whitelist)
  --whitelist-add M    add whitelist rule to a whitelist using mid M
                       (requires --whitelist)
  --whitelist-ls [W]   list whitelist rules for specified whitelist,
                       or lists all whitelists if argument omitted

Testing

  --test-scan          performs a scan request (can be combined with --vsf)
  --test-nomatch       performs a no-match test (tests server performance)
  --test-benchmark     performs a benchmark test (tests server performance)
  --test-route         shows the route a scan request takes
  --test-route-json    shows the route a scan request takes, in json format

Proxy

  --proxy-host         specifies the proxy host (with or without protocol://)
  --proxy-port         specifies the proxy port
  --proxy-user         specifies the proxy user
  --proxy-pass         specifies the proxy password
  --proxy-secure-pass  specifies an encrypted proxy password (see --password)
  --password PASS      encrypts the provided password
                       (use with --proxy-secure-pass or cli_proxy_secure_pass)
  --proxy-cert         specifies the client certificate location
  --proxy-key          specifies the client private key location

Score list

Options related to the maintenance of the score list. Use these to modify priorities in the result list by assigning a predefined score (S) to a given author (AU) and artifact (AR) pair. Setting a score to 0 will hide the matches if there are other matches found with score > 0.

   --score-list-set AU,AR,S   set score S for AU artifact AR
                              (example: --score-list-set linux,linux_kernel,9999)
   --score-list-clear AU,AR   clear score for AU artifact AR
                              (example: --score-list-clear linux,linux_kernel)
   --score-list-ls            display score list

SSL options

  --ssl-verify N              SSL verification on (1) or off (0) (default: 1)
  --ssl-revoke-best-effort N  Set to 1 to perform certificate revocation checks in a "best effort" manner.
                              Windows only. Please see https://curl.se/libcurl/c/CURLOPT_SSL_OPTIONS.html
                              (CURLSSLOPT_REVOKE_BEST_EFFORT) for more information. (Default: 0)

CONFIGURATION FILE

Please see the example fossid.conf provided with the package.

ENVIRONMENT VARIABLES

The following environment variables are used by the CLI:

- FOSSID_CLI_HOST
  takes precedence over cli_server_host, but not --host
- FOSSID_CLI_TOKEN
  takes precedence over cli_token, but not --token
- http_proxy / https_proxy
  Please see PROXY SUPPORT THROUGH ENVIRONMENT VARIABLES below.

PROXY SUPPORT THROUGH ENVIRONMENT VARIABLES

You can use the environment variables https_proxy or http_proxy, e.g.:

  $ export http_proxy="http://user:password@example.com:8080"

For more information, please see the curl proxy documentation.

OPTION PRECEDENCE

The general rule of option precedence is the following:

  1. Command line
  2. Environment variables
  3. Configuration file
  4. Default option, if available

NOTE: configuring proxy using command line or configuration file takes precedence over proxy environment variables.
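
The general rule can be pictured as a simple lookup chain. The sketch below is illustrative Python, not the CLI's actual implementation; the function resolve_option and the dictionary standing in for the configuration file are hypothetical, while FOSSID_CLI_HOST and cli_server_host are the names mentioned in this document. It does not model the proxy exception described in the note above.

```python
import os

def resolve_option(cli_value, env_name, config, config_key, default=None, env=None):
    """Resolve one option using the general precedence rule (hypothetical helper)."""
    env = os.environ if env is None else env
    if cli_value is not None:        # 1. Command line
        return cli_value
    if env_name in env:              # 2. Environment variables
        return env[env_name]
    if config_key in config:         # 3. Configuration file
        return config[config_key]
    return default                   # 4. Default option, if available

# Made-up values for illustration only.
config = {"cli_server_host": "https://config.example.com"}
env = {"FOSSID_CLI_HOST": "https://env.example.com"}

print(resolve_option(None, "FOSSID_CLI_HOST", config, "cli_server_host", env=env))
print(resolve_option("https://cli.example.com", "FOSSID_CLI_HOST", config,
                     "cli_server_host", env=env))
```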

FIELDS

Using the --fields flag, it’s possible to enable or disable fields in the match result.

Example:

 --fields -score                               (remove the score field)
 --fields component.author,component.artifact  (show only these specific fields)

BLIND AUDITS

The CLI is capable of generating file signatures for blind audit scanning using the --local flag.

Example:

$ fossid-cli --local /path/to/file
  {"path":"/path/to/file","hashes_ffm":[{"format":1,"data":"XrY7u+Ae7tCTyyK7j1rNww"}]}

A file can be ignored due to various reasons:

  • file not found
  • file not readable
  • skipping symlinks (Please see --symlinks)
  • skipping binary files (Please see --skip-binaries)

Example:

$ fossid-cli --local /path/to/symlink
  {"path":"/path/to/symlink","ignore_reason":"skipping symlink"}
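
A script that consumes --local output may want to separate usable signatures from ignored entries. The following Python sketch shows one way to do that. It assumes one JSON object per output line, as in the examples above, which this document does not state explicitly; the function name split_local_output is hypothetical.

```python
import json

def split_local_output(lines):
    """Split --local output lines into (signatures, ignored) lists."""
    signatures, ignored = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        # Ignored files carry an ignore_reason instead of hashes.
        if "ignore_reason" in entry:
            ignored.append(entry)
        else:
            signatures.append(entry)
    return signatures, ignored

# The two example outputs from this section.
sample = [
    '{"path":"/path/to/file","hashes_ffm":[{"format":1,"data":"XrY7u+Ae7tCTyyK7j1rNww"}]}',
    '{"path":"/path/to/symlink","ignore_reason":"skipping symlink"}',
]
signatures, ignored = split_local_output(sample)
for entry in ignored:
    print(entry["path"], "->", entry["ignore_reason"])
```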

ERROR HANDLING

A successful run of the CLI returns exit status 0.

The CLI can encounter unrecoverable exceptions, in which case it will stop scanning, print the error on stderr and exit with non-zero error code. For example:

  • Network related issues (timeout, DNS lookup failed, etc. Please see --retry flag.)
  • Other unexpected issues (out of memory, unexpected filesystem errors)

Some errors are less serious, for example invalid responses from internal proxy servers. These will not interrupt the scan but let the scan finish. This benefits large long-running scans that would otherwise have to be restarted. You can read more about these kinds of errors in the section INTERPRETING THE SCANNING RESPONSE.
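
When the CLI is driven from a script, the exit status and stderr described above can be checked as follows. This is a minimal Python sketch; the helper run_cli is hypothetical, and a stand-in command is used so the example runs without the CLI installed.

```python
import subprocess
import sys

def run_cli(argv):
    """Run a command and return (succeeded, stdout, stderr)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    # Exit status 0 means success; anything else is an unrecoverable error.
    return result.returncode == 0, result.stdout, result.stderr

# Stand-in for a failing run: writes to stderr and exits with status 3.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('network timeout'); sys.exit(3)"]
ok, out, err = run_cli(failing)
if not ok:
    print("scan failed:", err)
```

In real use, argv would be the fossid-cli command line, for example ["./fossid-cli", "--pretty", "/path/to/target"].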

EXCLUDE FUNCTIONALITY

The CLI can keep a list of directories or patterns that it matches against all files and directories that it encounters during scanning. If one of these patterns matches a file, the file will be skipped. If it matches a directory, the directory will not be recursed into.

The directories added using --exclude-dir use plain text matching in order to find matches.

The patterns added using --exclude-pattern and --exclude-from use more advanced pattern matching following the ECMAScript regular expression pattern syntax.

The option --exclude-from points to a file containing one pattern on each line. Lines starting with # are ignored, and can be used as comments. Example patterns (for --exclude-pattern and --exclude-from):

# skip all .git directories
\.git/
# skip all zip files
\.zip$
# skip all hidden files and directories
(^|\/|\\)\..+

Tip: The CLI will log when it excludes a path from scanning, when you run it with --debug. Use this to make sure you don’t get unexpected exclusions.
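
To preview what a pattern file would exclude before running a scan, the patterns can be tried against sample paths. The sketch below uses Python's re module as a stand-in for the ECMAScript syntax the CLI uses; the two dialects agree for the simple patterns shown here but differ in other areas. It also assumes that a pattern may match anywhere in the path and that blank lines are skipped, neither of which this document states. The function names are hypothetical.

```python
import re

def load_patterns(text):
    """Parse --exclude-from style content: one pattern per line, # for comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skipping blank lines is an assumption
        patterns.append(re.compile(line))
    return patterns

def is_excluded(path, patterns):
    """True if any pattern matches somewhere in the path (assumed semantics)."""
    return any(p.search(path) for p in patterns)

EXCLUDE_FILE = r"""# skip all .git directories
\.git/
# skip all zip files
\.zip$
# skip all hidden files and directories
(^|\/|\\)\..+
"""

patterns = load_patterns(EXCLUDE_FILE)
for path in ["project/.git/config", "dist/app.zip", "src/.env", "src/main.c"]:
    print(path, is_excluded(path, patterns))
```

Running the CLI with --debug remains the authoritative check, since it logs the exclusions that actually happened.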

UTF8 SUPPORT

Since the match results are in JSON, the CLI needs to be run in a UTF-8 enabled console in order to fully support the JSON response from the server. If the console is not UTF-8 compatible, the CLI will output a warning on stderr. It’s known for example that certain versions of PowerShell generate UTF-16 files when output is redirected. This behaviour is possible to override with some PowerShell configuration.

INTERPRETING THE SCANNING RESPONSE

This section describes the different kinds of responses you can get from the CLI when scanning. This can be a match result or an error when something goes wrong. The responses are encoded in JSON. You can view the below information directly from fossid-cli by running:

./fossid-cli --help-response

RESPONSE

When scanning a file a few different things can happen:

  • You get a match (type component, file or partial):
    {"type":"component/file/partial", "component": {"artifact":"linux","version":"4.6", ...}, ...}

  • You get a vsf match:
    {"type":"vulnerability", "vulnerability": {"id": "CVE-2024-...", ...}, ...}

  • The file was ignored:
    {"type":"ignored", "ignore_reason": "skipping empty file", ...}

  • You can get an invalid or empty response from the server:
    {"type":"error", "reason":"...", "response": "...", "response_code": 500, ...}

    See SERVER ERROR

  • There was a filtered match (see whitelisting options):
    {"type":"filtered", "wid":"<whitelist id>", "rid":"<whitelist rule id>"}

  • There was a match with noise that was removed (if --ssnf is enabled):
    {"type":"noise", "noise":{...}}
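
A consumer of the scan output typically branches on the type field. The following Python sketch tallies already-parsed results by category. The function name summarize and the sample list are made up for illustration; the type values are the ones listed above.

```python
from collections import Counter

# Types that represent an actual match, as listed in this section.
MATCH_TYPES = {"component", "file", "partial", "vulnerability"}

def summarize(results):
    """Count results per category, grouping all match types under 'match'."""
    counts = Counter()
    for result in results:
        kind = result.get("type", "unknown")
        counts["match" if kind in MATCH_TYPES else kind] += 1
    return counts

# Made-up, heavily abbreviated results for illustration only.
results = [
    {"type": "partial"},
    {"type": "file"},
    {"type": "ignored", "ignore_reason": "skipping empty file"},
    {"type": "error", "reason": "invalid response", "response_code": 500},
    {"type": "filtered", "wid": "w1", "rid": "r1"},
]
print(dict(summarize(results)))
```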

MATCH RESPONSE FOR MATCH FORMAT 2

Please note that this is a fabricated example.

GENERAL STRUCTURE

Depending on the type of the match, not all objects in the match are present.

{
  # Type of match:
  # - component: a match to a full component, for example a .jar file or other archives.
  # - file: a full file match to a file within a component.
  # - partial: a match to a snippet.
  # - vulnerability: a vsf match.
  "type": "partial",

  # A unique identifier for this specific match.
  "id": "387a0b23df7dc465",

  # Component details (see section COMPONENT).
  "component": { ... },

  # File details (see section FILE).
  "file": { ... },

  # Snippet details (see section SNIPPET).
  "snippet": { ... },

  # Noise details (see section NOISE).
  "noise": { ... },

  # Vulnerability details for VSF scans (see section VULNERABILITY).
  "vulnerability": { ... }
}

COMPONENT

The component object includes all component details.

"component": {
  # A unique ID for this particular component.
  "id": "9455f745c67462ead575ed8000000000",

  # The author of the component.
  "author": "The Bootstrap Authors",

  # The name of the component.
  "artifact": "bootstrap",

  # The version of the component.
  # Please note that this is the oldest identified version.
  # Example: If a match is found in version 4.1.2, 4.1.3 and 4.1.4 of a
  # component, we report 4.1.2.
  # This means that if you scan a certain version of a component,
  # we might report an earlier version.
  "version": "4.1.2",

  # The release date of the component, in ISO 8601 format.
  # Please note this means the date can have many different appearances.
  "release_date": "2018-07-12",

  # The url to the archive, to download.
  "url": "https://registry.npmjs.org/bootstrap/-/bootstrap-4.1.2.tgz",

  # The component licenses (see section LICENSES).
  "licenses": [ ... ],

  # The component copyright (see section COPYRIGHT).
  "copyright": [ ... ],

  # The ID of the file identified as being the component license file.
  "license_file_id": "568d01e0545c0c51b17c063100000000",

  # The path to the file identified as being the component license file.
  "license_file_path": "LICENSE",

  # The purl to the archive (https://github.com/package-url/purl-spec).
  "purl": "pkg:npm/bootstrap@4.1.2"
}

FILE

The file object includes all file details.

"file": {
  # A unique ID for this particular file.
  "id": "4e0f2d05ccc21a752459bb0800000000",

  # The path to this file inside the component.
  "path": "js/src/button.js",

  # The size of this file in bytes.
  "size": 4699,

  # Whether or not this file is available in the knowledge base mirror.
  "available": true,

  # The encoding of this file.
  "encoding": "UTF-8",

  # The file licenses (see section LICENSES).
  "licenses": [ ... ],

  # The file copyright (see section COPYRIGHT).
  "copyright": [ ... ]
}

LICENSES

This array contains a list of licenses. The licenses can be of different types:

  • DECLARED: the license as declared by the author
  • LICENSE: a license text
  • EXCEPTION: a license exception
  • REFERENCE: a reference to a license or exception
  • WARNING: a license warning
  • LINK: a link to a license

"licenses": [
  # An example of a declared license.
  {
    # The type of license, in this case a declared license.
    "type": "DECLARED",

    # The license ID.
    "id": "MIT"
  },

  # An example of an extracted license.
  {
    # The type of license, in this case a license text.
    "type": "LICENSE",

    # The license ID.
    "id": "BSD-3-Clause",

    # The name of the license.
    "name": "BSD 3-Clause \"New\" or \"Revised\" License",

    # A unique id for this license (not always present).
    "reference": "fossid_BSD-3-Clause.json",

    # Offset and length in chars for the license.
    # Can be used for highlighting of the license.
    "offset": 973127,
    "length": 1533,
    
    # The probability of this license being correct, as determined by
    # the license extractor.
    "probability": 0.9417472,

    # How much modification has been detected by the license extractor.
    "modification": 0.057017542
  },

  # An example of an extracted license reference.
  # Please see earlier examples for meaning of fields.
  {
    "type": "REFERENCE",
    "id": "MIT",
    "name": "MIT License",
    "reference": "MIT.json",
    "offset": 142,
    "length": 18,
    "probability": 1
  }
]
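
As an illustration of how the licenses array might be consumed, the Python sketch below collects license IDs that are either declared or extracted with a sufficiently high probability. The 0.9 threshold and the function name confident_license_ids are assumptions made for this example, not recommendations from this document.

```python
def confident_license_ids(licenses, min_probability=0.9):
    """Return unique license IDs that are declared or confidently extracted."""
    ids = []
    for lic in licenses:
        if lic["type"] == "DECLARED":
            ids.append(lic["id"])  # declared licenses carry no probability
        elif lic["type"] in ("LICENSE", "REFERENCE"):
            if lic.get("probability", 0) >= min_probability:
                ids.append(lic["id"])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(ids))

# The three entries from the example above, reduced to the fields used here.
licenses = [
    {"type": "DECLARED", "id": "MIT"},
    {"type": "LICENSE", "id": "BSD-3-Clause", "probability": 0.9417472},
    {"type": "REFERENCE", "id": "MIT", "probability": 1},
]
print(confident_license_ids(licenses))
```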

COPYRIGHT

This array contains a list of copyright statements.

"copyright": [
  {
    # The complete copyright statement.
    "default": "(c) 1998-2016 The OpenSSL Project",

    # The name of the copyright holder.
    "name": "The OpenSSL Project",

    # Offset and length in chars for the copyright.
    # Can be used for highlighting of the copyright.
    "offset": 508,
    "length": 33,

    # Year of the last revision.
    "year_last_revision": "2016",

    # Year of the publication.
    "year_publication": "1998"
  }
]

SNIPPET

"snippet": {
  # A unique ID of the snippet.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # The size of the snippet in the local file, in hashes.
  "local_size": 10,

  # The coverage in percent of the snippet in the local file.
  "local_coverage": 0.0901,

  # The highlight data for the snippet in the local file. (see HIGHLIGHT)
  "local_highlight": { ... },

  # The size of the snippet in the remote file, in hashes.
  "remote_size": 10,

  # The coverage in percent of the snippet in the remote file.
  "remote_coverage": 0.084,

  # The highlight data for the snippet in the remote file. (see HIGHLIGHT)
  "remote_highlight": { ... }
}

HIGHLIGHT

"local/remote_highlight": {
  # A unique id of this highlighting.
  "id": "570ccfec92d089df6beeafa52bbc0a91",

  # Array of blocks, each containing highlight ranges.
  "blocks": [
    {
      # A unique id identifying this block.
      "id": "39175286b39340afc4f0c10f43bdecd4",

      # Range in chars for this highlight block.
      "char_range": {
        # Begin char
        "begin": 2362,

        # End char (including).
        # begin = 0 and end = 0 mean 1 char.
        "end": 2838
      }

      # NOTE: byte_range is deprecated, and replaced by char_range.
    }
  ],

  # Encoding of the remote file.
  "encoding": "UTF-8"
}
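
Because the end offset is inclusive, extracting the highlighted text needs a +1 when using half-open slicing. The Python sketch below shows this on a made-up string. It assumes the offsets count characters of the decoded text, which is how Python string indexing works; the function name highlighted_text is hypothetical.

```python
def highlighted_text(text, block):
    """Return the characters covered by one highlight block."""
    char_range = block["char_range"]
    # "end" is inclusive, Python slices are exclusive at the end, hence + 1.
    return text[char_range["begin"]:char_range["end"] + 1]

# Made-up file content and block for illustration only.
text = "int main(void) { return 0; }"
block = {"id": "example", "char_range": {"begin": 4, "end": 7}}
print(highlighted_text(text, block))  # prints: main
```

With begin = 0 and end = 0 the function returns exactly one character, matching the note in the structure above.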

NOISE

The server runs noise classification on all matches. This information is then returned in the “noise” object. The information here can be used to highlight or filter matches locally.

{
  # For partial matches, blocks maps noise data to specific snippet blocks.
  # Using chars.total and chars.noise, you can see how much noise there is in
  # a specific block.
  "blocks": {
    # The id of the remote snippet block.
    "77259f5c2d6e879ed7ca62a1172dbb78": {
      # Information about amount of noise, in chars.
      "chars": {
        "noise": 455,
        "total": 567
      },
      # List of types of noise detected in this specific block.
      "types": [
        "SimpleCode",
        "Comment",
        ... # potentially many types.
      ]
    }
    # ... potentially many blocks present
  },
  # Information about amount of noise in the match.
  # This is the most important piece of information when doing match filtering.
  # It adapts to the specific match (file or snippet).
  "chars": {
    "noise": 455,
    "total": 567
  },
  # Information about noise in the remote file.
  # Even if it's a snippet match, this describes noise in the remote
  # file as a whole.
  "file": {
    "chars": {
      "noise": 21373,
      "total": 100660
    },
    # Set to something other than null, empty string or "None" if the
    # remote file was classified as noise at file-level.
    "type": "Junk"
  }
}
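
For local filtering, the match-level chars object can be turned into a noise ratio and compared against a threshold. The following Python sketch illustrates this; the 0.8 threshold and the function names are assumptions chosen for the example, not values taken from the server.

```python
def noise_ratio(noise):
    """Fraction of noise characters in the match (0.0 if total is zero)."""
    chars = noise["chars"]
    return chars["noise"] / chars["total"] if chars["total"] else 0.0

def is_mostly_noise(noise, threshold=0.8):
    """True when the match-level noise ratio reaches the chosen threshold."""
    return noise_ratio(noise) >= threshold

# Match-level and file-level numbers from the example above.
noise = {
    "chars": {"noise": 455, "total": 567},
    "file": {"chars": {"noise": 21373, "total": 100660}, "type": "Junk"},
}
print(round(noise_ratio(noise), 3))
print(is_mostly_noise(noise))
```

The same ratio can be computed per block or for the remote file as a whole by passing the corresponding sub-object.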

VULNERABILITY

This object contains vulnerability details from a VSF scan.

"vulnerability": {

  # The unique id of the vulnerability, in this case a CVE from NVD (nvd.nist.gov).
  "id": "CVE-2014-3506",

  # The official vulnerability json document.
  # In the case of CVEs, this would be from nvd.nist.gov, provided as-is.
  "details": { ... },

  # The URL to the vulnerability description.
  "url": "https://nvd.nist.gov/vuln/detail/CVE-2014-3506"
}

SERVER ERROR

When you get a server error, for example with status code 500, the CLI detects this and outputs an error JSON.

The error JSON from the CLI looks like this:

{
  # Path of the local file.
  "local_path": "test-scan",

  # Match type is error.
  "type": "error",

  # Reason for error.
  "reason": "invalid response",

  # The response from the server.
  # This could be:
  #  - A complete JSON document from the server.
  #  - Another response, for example an error from a proxy (non JSON).
  "response": "...",

  # Response code from the server.
  "response_code": 500,

  # Date of error (ISO 8601).
  "date": "2020-10-25T12:38:11Z"
}

Some server errors are JSON documents embedded inside the CLI error response field. You can extract the response manually or (for example) using the “jq” utility:

fossid-cli ... | jq -r .response | jq

This yields a document looking like the following:

{
  # Match type is error.
  "type": "error",

  # The error text.
  "error": "Failed to execute command",

  # Details about the error, in this case the command that triggered the error.
  "error_details": "SCAN",

  # The host that generated the error.
  "host": "node0",

  # The date of the error (ISO 8601).
  "error_date": "2020-10-25T12:33:54.918Z",

  # The exit status of the command.
  "error_exit_status": 1,

  # The version of the node serving the request.
  "version": "3.1.14"
}
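
The jq extraction shown above can also be done in a script. The Python sketch below parses the response field and falls back gracefully when it holds non-JSON content, such as an error page from a proxy. The function name embedded_server_error is hypothetical, and the sample documents are abbreviated versions of the examples in this section.

```python
import json

def embedded_server_error(cli_error):
    """Return the server's JSON error document, or None if there is none."""
    response = cli_error.get("response")
    if not isinstance(response, str):
        return None
    try:
        document = json.loads(response)
    except ValueError:
        return None  # e.g. a non-JSON error from a proxy
    return document if isinstance(document, dict) else None

# Abbreviated examples based on the documents shown above.
server_error = {"type": "error", "error": "Failed to execute command",
                "error_details": "SCAN", "host": "node0"}
cli_error = {"local_path": "test-scan", "type": "error",
             "reason": "invalid response",
             "response": json.dumps(server_error), "response_code": 500}

details = embedded_server_error(cli_error)
if details is not None:
    print(details["host"], details["error"])
```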