Skip to content

SetImpliedBits is inefficient #155808

@wjr-z

Description

@wjr-z

Most Implies only have a small amount of bits.

for (const SubtargetFeatureKV &FE : FeatureTable)
    if (Implies.test(FE.Value))
      SetImpliedBitsImpl(Bits, FE.Implies.getAsBitset(), FeatureTable);

But it has to traverse all of FeatureTable.
And maybe same implies visited more than once (Simple debugging, uncertain).

A more effective way is to create a bitmask for the FeatureTable, and only need to traverse intersection of them.
A simple fix below (Need to add a data function for FeatureBitset):

static
void MySetImpliedBitsImpl(FeatureBitset &Bits, const FeatureBitset &Implies,
                    ArrayRef<SubtargetFeatureKV> FeatureTable) {
   std::array<uint16_t, MAX_SUBTARGET_FEATURES> featureMap;
   uint16_t idx = 0;
   FeatureBitset Mask;
   for (const auto & FE: FeatureTable) {
      Mask.set(FE.Value);
      featureMap[FE.Value] = idx++;
   }

   FeatureBitset impl(Implies);

   while (true) {
    Bits |= impl;
    auto newImplies = Mask & impl;
    if (newImplies.none()) {
      break;
    }

    Mask ^= newImplies;
    impl = FeatureBitset();
    
    for (size_t i =0 ;i < MAX_SUBTARGET_WORDS; ++i) {
      uint64_t bits = newImplies.data()[i];
      while (bits) {
        unsigned Bit = llvm::countr_zero(bits);
        bits &= bits - 1;
        unsigned idx = i * 64 + Bit;
        impl |= FeatureTable[featureMap[idx]].Implies.getAsBitset();
      }
    }
   }
}

This will ensure a worst-case complexity of O (n), n is size of FeatureTable.

It should be similar for ClearImpliedBits as well.

In my case, this will reduce the overhead in perf from 4% to approximately 0.3
If tables can be generated in compile time, it will be faster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions