Question regarding license_expression_parser behavior #806

@billie-alsup

src/spdx_tools/spdx/parser/jsonlikedict/license_expression_parser.py uses Licensing().parse(expr) directly, rather than get_spdx_licensing().parse(expr) as used in parser/tagvalue/parser.py. The difference results in a different LicenseSymbol for GPL-2.0, e.g.

>>> from license_expression import Licensing
>>> Licensing().parse('GPL-2.0')
LicenseSymbol('GPL-2.0', is_exception=False)
>>> from license_expression import get_spdx_licensing
>>> get_spdx_licensing().parse('GPL-2.0')
LicenseSymbol('GPL-2.0-only', aliases=('GPL-2.0', 'GPL 2.0', 'LicenseRef-GPL-2.0'), is_exception=False)

As you can see, GPL-2.0-only is the official name, and GPL-2.0 is an alias. However, when parsing directly with Licensing(), we get a GPL-2.0 node rather than a GPL-2.0-only node. This causes a problem later in validation, when GPL-2.0 comes back as an invalid symbol, e.g.

2024-06-18 16:28:31,476:WARNING:root: Unrecognized license reference: GPL-2.0. license_expression must only use IDs from the license list or extracted licensing info, but is: GPL-2.0
2024-06-18 16:28:31,476:WARNING:root: ValidationContext(spdx_id=None, parent_id='SPDXRef-base-files-Package-base-files', element_type=<SpdxElementType.LICENSE_EXPRESSION: 1>, full_element=LicenseSymbol('GPL-2.0', is_exception=False))

I'm wondering whether this is expected behavior (and you do not wish to allow aliases) or a bug. Should I filter my JSON file in advance to switch to GPL-2.0-only? Certainly GPL-2.0 should not be listed in the extracted_licensing_info section (as that would require changing it to LicenseRef-GPL-2.0 or similar), right?
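For reference, this is roughly what the pre-filtering workaround could look like. It is only a sketch using the standard library, not anything spdx_tools provides: the ALIASES map is hand-written (only the GPL-2.0 entry comes from the get_spdx_licensing() output above), the function names are made up for illustration, and the field names licenseConcluded and licenseDeclared are my assumption about where the expressions live in the JSON.

```python
import re

# Hypothetical, hand-maintained alias map. Only the GPL-2.0 entry is taken
# from the get_spdx_licensing() output shown above; extend as needed.
ALIASES = {"GPL-2.0": "GPL-2.0-only"}

# Assumed JSON field names that carry license expressions.
LICENSE_FIELDS = {"licenseConcluded", "licenseDeclared"}

# A token is a run of characters other than whitespace and parentheses,
# so "GPL-2.0-only" or "LicenseRef-GPL-2.0" is never partially rewritten.
TOKEN = re.compile(r"[^\s()]+")


def normalize_expression(expr):
    """Replace whole tokens that are known aliases; leave everything else."""
    return TOKEN.sub(lambda m: ALIASES.get(m.group(0), m.group(0)), expr)


def normalize_document(node):
    """Recursively rewrite license fields of a parsed JSON document in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in LICENSE_FIELDS and isinstance(value, str):
                node[key] = normalize_expression(value)
            else:
                normalize_document(value)
    elif isinstance(node, list):
        for item in node:
            normalize_document(item)
    return node
```

The idea would be to json.load the document, pass it through normalize_document, and json.dump the result before handing it to the parser. Matching whole tokens rather than substrings is deliberate, so that identifiers which merely contain GPL-2.0 are left alone.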
