-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Open
Labels
Description
Most Implies only have a small amount of bits.
for (const SubtargetFeatureKV &FE : FeatureTable)
if (Implies.test(FE.Value))
SetImpliedBitsImpl(Bits, FE.Implies.getAsBitset(), FeatureTable);
But it has to traverse all of FeatureTable.
And maybe same implies visited more than once (Simple debugging, uncertain).
A more effective way is to create a bitmask for the FeatureTable, and only need to traverse intersection of them.
A simple fix below (Need to add a data function for FeatureBitset):
static
void MySetImpliedBitsImpl(FeatureBitset &Bits, const FeatureBitset &Implies,
ArrayRef<SubtargetFeatureKV> FeatureTable) {
std::array<uint16_t, MAX_SUBTARGET_FEATURES> featureMap;
uint16_t idx = 0;
FeatureBitset Mask;
for (const auto & FE: FeatureTable) {
Mask.set(FE.Value);
featureMap[FE.Value] = idx++;
}
FeatureBitset impl(Implies);
while (true) {
Bits |= impl;
auto newImplies = Mask & impl;
if (newImplies.none()) {
break;
}
Mask ^= newImplies;
impl = FeatureBitset();
for (size_t i =0 ;i < MAX_SUBTARGET_WORDS; ++i) {
uint64_t bits = newImplies.data()[i];
while (bits) {
unsigned Bit = llvm::countr_zero(bits);
bits &= bits - 1;
unsigned idx = i * 64 + Bit;
impl |= FeatureTable[featureMap[idx]].Implies.getAsBitset();
}
}
}
}
This will ensure a worst-case complexity of O (n), n is size of FeatureTable.
It should be similar for ClearImpliedBits as well.
In my case, this will reduce the overhead in perf from 4% to approximately 0.3
If tables can be generated in compile time, it will be faster.