
Complexity in validate SPDX ID causing slowdown #742

@lumjjb

Description


There is quite a bit of slowdown in the validation routine for SPDX documents. One likely offender is this function in `spdx_id_validators.py`, which rebuilds the entire list of IDs in the document on every call and then does a linear search for the ID:

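```python
from typing import List

def is_spdx_id_present_in_document(spdx_id: str, document: Document) -> bool:
    # Rebuilds the complete ID list on every call, then scans it linearly.
    all_spdx_ids_in_document: List[str] = get_list_of_all_spdx_ids(document)
    return spdx_id in all_spdx_ids_in_document
```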
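To make the cost concrete, here is a small self-contained sketch of the same pattern. `ToyDocument`, `get_list_of_all_ids` and `is_id_present` are hypothetical stand-ins, not spdx-tools classes or functions; the sketch only counts how many IDs get copied into freshly built lists. Because the list is rebuilt and scanned on every lookup, checking k IDs against a document with n elements costs O(k·n), so when the number of relationships grows with the number of elements the total is roughly quadratic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDocument:
    # Hypothetical stand-in for an SPDX document: just a flat list of element IDs.
    element_ids: List[str] = field(default_factory=list)


# Running total of IDs copied into newly built lists.
ids_materialized = 0


def get_list_of_all_ids(document: ToyDocument) -> List[str]:
    """Mimics the original helper: builds a brand-new list on every call."""
    global ids_materialized
    ids = [element_id for element_id in document.element_ids]
    ids_materialized += len(ids)
    return ids


def is_id_present(spdx_id: str, document: ToyDocument) -> bool:
    # Same shape as is_spdx_id_present_in_document: rebuild, then linear search.
    return spdx_id in get_list_of_all_ids(document)


doc = ToyDocument([f"SPDXRef-{i}" for i in range(1000)])
for i in range(1000):
    assert is_id_present(f"SPDXRef-{i}", doc)

print(ids_materialized)  # 1000 lookups x 1000 IDs = 1000000
```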

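This came up due to a slowdown when running ntia-checker. In the profile below (sorted by cumulative time), `validate_relationships` accounts for about 25.8 s of the 31.7 s total. `is_spdx_id_present_in_document` is called 7,992 times (exactly twice the 3,996 relationships), and nearly all of its 25.6 s cumulative time is spent rebuilding the ID list in `get_list_of_all_spdx_ids` (25.2 s).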

```
  133911027 function calls (133748641 primitive calls) in 31.696 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    585/1    0.007    0.000   31.700   31.700 {built-in method builtins.exec}
        1    0.000    0.000   31.700   31.700 ntia-checker:1(<module>)
        1    0.000    0.000   31.454   31.454 main.py:42(main)
        1    0.000    0.000   31.446   31.446 sbom_checker.py:19(__init__)
        1    0.000    0.000   30.759   30.759 document_validator.py:19(validate_full_spdx_document)
        1    0.007    0.007   25.766   25.766 relationship_validator.py:12(validate_relationships)
     3996    0.035    0.000   25.758    0.006 relationship_validator.py:22(validate_relationship)
    15428    0.080    0.000   25.724    0.002 spdx_id_validators.py:46(validate_spdx_id)
     7992    0.363    0.000   25.589    0.003 spdx_id_validators.py:25(is_spdx_id_present_in_document)
     7993    0.067    0.000   25.229    0.003 spdx_id_validators.py:31(get_list_of_all_spdx_ids)
     7993    0.032    0.000   24.981    0.003 document_utils.py:11(get_contained_spdx_element_ids)
     7993    8.097    0.001   23.264    0.003 document_utils.py:12(<listcomp>)
 59639927    9.758    0.000   16.273    0.000 dataclass_with_properties.py:46(get_field)
 60112853    6.556    0.000    6.556    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    4.771    4.771 package_validator.py:22(validate_packages)
      435    0.002    0.000    4.771    0.011 package_validator.py:36(validate_package_within_document)
     7871    0.019    0.000    4.748    0.001 license_expression_validator.py:26(validate_license_expression)
      148    0.054    0.000    3.700    0.025 __init__.py:812(get_spdx_licensing)
      148    0.001    0.000    3.004    0.020 __init__.py:860(build_spdx_licensing)
<TRUNCATED>
```
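For anyone wanting to reproduce numbers like these, a report in the same format can be produced with the standard-library `cProfile` and `pstats` modules (from the command line, `python -m cProfile -s cumulative <script>` gives similar output). This is a minimal sketch; the helper name `profile_cumulative` and the `sorted` workload are illustrative and not part of ntia-checker or spdx-tools.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, *args, limit: int = 15, **kwargs):
    """Run func under cProfile and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    # Write the report into a string instead of stdout so callers can inspect it.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical workload; in practice this would be the validation call under investigation.
    _, report = profile_cumulative(sorted, range(100_000), reverse=True)
    print(report)
```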

Ask:

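Could there be a function that handles multiple invocations efficiently, for example by building a dictionary of the document's IDs once and reusing it, instead of rebuilding and scanning the list on every call?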
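One way to address this is to collect the IDs once per validation run and reuse that collection for every check. The sketch below uses a `frozenset` (hash-based, like a dictionary) so each lookup is O(1) on average, making the total cost O(n + k) instead of O(k·n). `SpdxIdIndex` and `find_missing_ids` are hypothetical names, not existing spdx-tools APIs, and wiring this into the real validators (which currently take a `Document`) would presumably require changing their signatures.

```python
from typing import FrozenSet, Iterable, List


class SpdxIdIndex:
    """Hypothetical helper: collects a document's SPDX IDs once, then answers membership in O(1)."""

    def __init__(self, spdx_ids: Iterable[str]) -> None:
        # Hash-based storage, so each lookup avoids a linear scan.
        self._ids: FrozenSet[str] = frozenset(spdx_ids)

    def __contains__(self, spdx_id: str) -> bool:
        return spdx_id in self._ids

    def __len__(self) -> int:
        return len(self._ids)


def find_missing_ids(requested: Iterable[str], index: SpdxIdIndex) -> List[str]:
    """Return the requested IDs that are not present, reusing one prebuilt index."""
    return [spdx_id for spdx_id in requested if spdx_id not in index]


# With spdx-tools, the index would be built once per validation run, e.g.
#   index = SpdxIdIndex(get_list_of_all_spdx_ids(document))
# and then passed to each relationship check instead of rebuilding the list.
index = SpdxIdIndex(["SPDXRef-DOCUMENT", "SPDXRef-Package-A", "SPDXRef-File-1"])
print(find_missing_ids(["SPDXRef-Package-A", "SPDXRef-Package-B"], index))
# ['SPDXRef-Package-B']
```

If callers also need the element itself rather than just membership, the same idea works with a `dict` mapping each SPDX ID to its element.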
