feat(core): add delegate.hpp as std::function alternative #713

Open
wants to merge 1 commit into
base: master

Contributor

@qq978358810 qq978358810 commented Aug 3, 2025

feat: replace std::function with lightweight delegate based on EnTT

  • Implement lightweight delegate wrapper under taskflow/utility/delegate.hpp
  • Replace internal std::function usage with small-buffer optimized delegate
  • Delegate implementation adapted from EnTT's signal module:
    https://github.com/skypjack/entt/tree/master/src/entt/signal
  • Improves performance in tight-loop callable invocations (about 2x faster than std::function in the benchmark below)
  • Verified correctness and benchmarked via test cases under benchmark/test_delegate.cpp

taskflow/
├── utility/
│ └── delegate.hpp

This PR introduces a lightweight delegate wrapper in taskflow/utility/delegate.hpp as a lower-overhead replacement for std::function for callable storage and invocation within the Taskflow runtime.

The delegate is adapted from the EnTT library (MIT license), specifically from:
https://github.com/skypjack/entt/tree/master/src/entt/signal

✅ Includes benchmark (test_delegate.cpp) showing ~2x performance improvement.
✅ API remains flexible with support for free functions, lambdas, and functors.

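#include "taskflow/utility/delegate.hpp"

#include <chrono>
#include <functional>
#include <iostream>

// Trivial callee, so the loops below measure call overhead only.
int normal_func(int /*x*/) noexcept {
    return 1;
}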

int main() {
    constexpr int iterations = 100'000'000;
    int result = 0;


    // std::function
    {
        auto start = std::chrono::high_resolution_clock::now();
        std::function<int(int)> func = normal_func;
        for (int i = 0; i < iterations; ++i) {
            result += func(i);
        }
        auto end = std::chrono::high_resolution_clock::now();
        std::cout << "std::function:   "
                  << std::chrono::duration<double>(end - start).count() << "s\n";
    }

    // tf::delegate
    {
        auto start = std::chrono::high_resolution_clock::now();
        tf::delegate<int(int)> del;
        del.connect<&normal_func>();
        for (int i = 0; i < iterations; ++i) {
            result += del(i);
        }
        auto end = std::chrono::high_resolution_clock::now();
        std::cout << "tf::delegate:  "
                  << std::chrono::duration<double>(end - start).count() << "s\n";
    }



    std::cout << "result: " << result << std::endl;

    return 0;

    // release
    // std::function:   0.24929s
    // tf::delegate:  0.100528s
    // result: 200000000
}

- Implement lightweight delegate pattern for task callable storage
- Replace direct std::function usage with type-erased delegate wrapper
- Support efficient inline storage for small callables (small-buffer optimization, SBO)
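
To make the type-erasure idea in the list above concrete, here is a minimal sketch of how an EnTT-style delegate can bind a target at compile time. It is not the code in taskflow/utility/delegate.hpp: the `sketch` namespace, `add_one`, `Counter`, and `run_example` are hypothetical names, and the sketch stores only a non-owning instance pointer, so it does not attempt small-buffer storage. It assumes C++17 for `template <auto>`.

```cpp
#include <cassert>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of Taskflow

template <typename Signature>
class delegate;  // primary template; only the function-signature form is defined

template <typename Ret, typename... Args>
class delegate<Ret(Args...)> {
  using stub_type = Ret (*)(void*, Args...);

  void* _instance{nullptr};  // non-owning pointer to the bound object, if any
  stub_type _stub{nullptr};  // trampoline that knows the concrete target

public:
  // Bind a free function (or any callable constant) known at compile time.
  template <auto Candidate>
  void connect() noexcept {
    _instance = nullptr;
    _stub = [](void*, Args... args) -> Ret {
      return Candidate(std::forward<Args>(args)...);
    };
  }

  // Bind a member function plus a mutable instance that the caller keeps alive.
  template <auto Candidate, typename T>
  void connect(T& instance) noexcept {
    _instance = &instance;
    _stub = [](void* obj, Args... args) -> Ret {
      return (static_cast<T*>(obj)->*Candidate)(std::forward<Args>(args)...);
    };
  }

  explicit operator bool() const noexcept { return _stub != nullptr; }

  // Precondition: a target has been connected.
  Ret operator()(Args... args) const {
    return _stub(_instance, std::forward<Args>(args)...);
  }
};

}  // namespace sketch

int add_one(int x) noexcept { return x + 1; }

struct Counter {
  int total{0};
  int add(int x) { total += x; return total; }
};

// Exercises both binding forms and returns a value that can be checked.
int run_example() {
  sketch::delegate<int(int)> d;
  d.connect<&add_one>();
  const int a = d(41);          // calls add_one(41)

  Counter c;
  d.connect<&Counter::add>(c);  // c must outlive every call through d
  d(5);
  d(5);
  return a + c.total;
}
```

Because the target is a template argument, the call goes through a single function pointer to a stub in which the target is a compile-time constant, which is one plausible reason such a delegate can beat std::function in a tight loop. The cost is that the delegate owns nothing: any state the callable needs must live elsewhere.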
@qq978358810
Contributor Author

Usage Example:

//#define TF_ENABLE_DELEGATE

#ifdef TF_ENABLE_DELEGATE
#include <taskflow/taskflow.hpp>
#include <chrono>
#include <iostream>
#include <vector>
#include <numeric>

int main() {
    tf::Executor executor;
    tf::Taskflow taskflow;

    const std::vector<int> num_calls_list = {100000, 1000000, 10000000};
    const int num_runs = 5;


    std::vector<double> avg_times;


    int result = 0;
    taskflow.emplace<[](int& result) mutable {
        result++;
    }>(result);


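    for (int num_calls : num_calls_list) {
        std::vector<long long> durations;
        std::cout << "\nTesting with " << num_calls << " calls:\n";

        for (int run = 0; run < num_runs; ++run) {
            result = 0;

            auto start = std::chrono::high_resolution_clock::now();
            executor.run_n(taskflow, num_calls).wait();
            auto end = std::chrono::high_resolution_clock::now();

            auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
            durations.push_back(duration);

            std::cout << "Run " << run + 1 << ": " << duration << " ns (result = " << result << ")\n";
        }

        // Mean wall-clock time over the runs for this call count.
        double avg = std::accumulate(durations.begin(), durations.end(), 0.0) / num_runs;
        avg_times.push_back(avg);
        std::cout << "Average time for " << num_calls << " calls: " << avg << " ns\n";
    }

    double overall = std::accumulate(avg_times.begin(), avg_times.end(), 0.0) / avg_times.size();
    std::cout << "\nOverall average time across all tests: " << overall << " ns\n";
    return 0;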


// Testing with 100000 calls:
// Run 1: 14986700 ns (result = 100000)
// Run 2: 16282600 ns (result = 100000)
// Run 3: 15341900 ns (result = 100000)
// Run 4: 15342200 ns (result = 100000)
// Run 5: 16130900 ns (result = 100000)
// Average time for 100000 calls: 1.56169e+07 ns

// Testing with 1000000 calls:
// Run 1: 155600400 ns (result = 1000000)
// Run 2: 151828900 ns (result = 1000000)
// Run 3: 155381100 ns (result = 1000000)
// Run 4: 158884500 ns (result = 1000000)
// Run 5: 161871000 ns (result = 1000000)
// Average time for 1000000 calls: 1.56713e+08 ns

// Testing with 10000000 calls:
// Run 1: 1545152600 ns (result = 10000000)
// Run 2: 1579407400 ns (result = 10000000)
// Run 3: 1600501200 ns (result = 10000000)
// Run 4: 1606155200 ns (result = 10000000)
// Run 5: 1597838300 ns (result = 10000000)
// Average time for 10000000 calls: 1.58581e+09 ns

// Overall average time across all tests: 5.86047e+08 ns
}

#else
#include <taskflow/taskflow.hpp>
#include <chrono>
#include <iostream>
#include <vector>
#include <numeric>

int main() {
    tf::Executor executor;
    tf::Taskflow taskflow;

    const std::vector<int> num_calls_list = {100000, 1000000, 10000000};
    const int num_runs = 5;


    std::vector<double> avg_times;


    int result = 0;
    taskflow.emplace([&]() mutable {
        result++;
    });

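    for (int num_calls : num_calls_list) {
        std::vector<long long> durations;
        std::cout << "\nTesting with " << num_calls << " calls:\n";

        for (int run = 0; run < num_runs; ++run) {
            result = 0;

            auto start = std::chrono::high_resolution_clock::now();
            executor.run_n(taskflow, num_calls).wait();
            auto end = std::chrono::high_resolution_clock::now();

            auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
            durations.push_back(duration);

            std::cout << "Run " << run + 1 << ": " << duration << " ns (result = " << result << ")\n";
        }

        // Mean wall-clock time over the runs for this call count.
        double avg = std::accumulate(durations.begin(), durations.end(), 0.0) / num_runs;
        avg_times.push_back(avg);
        std::cout << "Average time for " << num_calls << " calls: " << avg << " ns\n";
    }

    double overall = std::accumulate(avg_times.begin(), avg_times.end(), 0.0) / avg_times.size();
    std::cout << "\nOverall average time across all tests: " << overall << " ns\n";
    return 0;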


// Testing with 100000 calls:
// Run 1: 16228200 ns (result = 100000)
// Run 2: 13067700 ns (result = 100000)
// Run 3: 15962700 ns (result = 100000)
// Run 4: 17233100 ns (result = 100000)
// Run 5: 16129700 ns (result = 100000)
// Average time for 100000 calls: 1.57243e+07 ns

// Testing with 1000000 calls:
// Run 1: 170002200 ns (result = 1000000)
// Run 2: 171117000 ns (result = 1000000)
// Run 3: 173660700 ns (result = 1000000)
// Run 4: 172738200 ns (result = 1000000)
// Run 5: 174223700 ns (result = 1000000)
// Average time for 1000000 calls: 1.72348e+08 ns

// Testing with 10000000 calls:
// Run 1: 1687622700 ns (result = 10000000)
// Run 2: 1720427500 ns (result = 10000000)
// Run 3: 1705274100 ns (result = 10000000)
// Run 4: 1723315600 ns (result = 10000000)
// Run 5: 1709833300 ns (result = 10000000)
// Average time for 10000000 calls: 1.70929e+09 ns

// Overall average time across all tests: 6.32456e+08 ns
}
#endif

@qq978358810
Contributor Author

Usage Example:

#include <taskflow/taskflow.hpp>
#include <iostream>
#include <vector>

#ifdef TF_ENABLE_DELEGATE

struct MyClass {

    static void static_member() {
        std::cout << "MyClass::static_member()\n";
    }

    void member() {
        std::cout << "MyClass::member()\n";
    }

    void const_member() const {
        std::cout << "MyClass::const_member()\n";
    }

    virtual void virtual_member() {
        std::cout << "MyClass::virtual_member()\n";
    }

    void runtime_task(tf::Runtime& rt) {
        std::cout << "MyClass::runtime_task() with Runtime\n";
    }

    void subflow_task(tf::Subflow& sf) {
        std::cout << "MyClass::subflow_task() with Subflow\n";

        sf.emplace<[]() { std::cout << "  Subtask in subflow\n"; }>();
    }

    int condition_task() {
        std::cout << "MyClass::condition_task() returns 1\n";
        return 1;
    }

    tf::SmallVector<int> multi_condition_task() {
        std::cout << "MyClass::multi_condition_task() returns {1, 2}\n";
        return {1, 2};
    }

    void operator()() const {
        std::cout << "MyClass::operator\n";
    }
};

void free_function() {
    std::cout << "free_function()\n";
}

void function_with_ref(int& x) {
    ++x;
    std::cout << "function_with_ref: " << x << '\n';
}

void runtime_free_function(tf::Runtime& rt) {
    std::cout << "runtime_free_function() with Runtime\n";
}

void subflow_free_function(tf::Subflow& sf) {
    std::cout << "subflow_free_function() with Subflow\n";
    sf.emplace<[]() { std::cout << "  Subtask in subflow_free_function\n"; }>();
}

int condition_free_function() {
    std::cout << "condition_free_function() returns 0\n";
    return 0;
}

tf::SmallVector<int> multi_condition_free_function() {
    std::cout << "multi_condition_free_function() returns {0, 1}\n";
    return {0, 1};
}

int main() {
    tf::Executor executor;
    tf::Taskflow taskflow;

    int x = 0;
    MyClass obj;

    // 1. Static Task: Free function
    taskflow.emplace<&free_function>();
    std::cout << "Add Static Task: Free function\n";

    // 2. Static Task: Lambda without capture
    taskflow.emplace<[]() {
        std::cout << "lambda no capture\n";
    }>();
    std::cout << "Add Static Task: Lambda without capture\n";

    // 3. Static Task with Instance: Lambda with int& parameter
    taskflow.emplace<[](int& val) {
        ++val;
        std::cout << "lambda with int&: " << val << '\n';
    }>(x);
    std::cout << "Add Static Task with Instance: Lambda with int& parameter\n";

    // 4. Static Task with Instance: Free function with reference
    taskflow.emplace<&function_with_ref>(x);
    std::cout << "Add Static Task with Instance: Free function with reference\n";

    // 5. Static Task: Static member function
    taskflow.emplace<&MyClass::static_member>();
    std::cout << "Add Static Task: Static member function\n";

    // 6. Static Task with Instance: Non-static member function
    taskflow.emplace<&MyClass::member>(obj);
    std::cout << "Add Static Task with Instance: Non-static member function\n";

    // 7. Static Task with Instance: Const member function
    taskflow.emplace<&MyClass::const_member>(obj);
    std::cout << "Add Static Task with Instance: Const member function\n";

    // 8. Static Task with Instance: Const operator()
    taskflow.emplace<&MyClass::operator()>(obj);
    std::cout << "Add Static Task with Instance: Const operator()\n";

    // 9. Static Task with Instance: Virtual function
    taskflow.emplace<&MyClass::virtual_member>(obj);
    std::cout << "Add Static Task with Instance: Virtual function\n";

    // 10. Runtime Task: Free function
    taskflow.emplace<&runtime_free_function>();
    std::cout << "Add Runtime Task: Free function\n";

    // 11. Runtime Task with Instance: Member function
    taskflow.emplace<&MyClass::runtime_task>(obj);
    std::cout << "Add Runtime Task with Instance: Member function\n";

    // 12. Subflow Task: Free function
    taskflow.emplace<&subflow_free_function>();
    std::cout << "Add Subflow Task: Free function\n";

    // 13. Subflow Task with Instance: Member function
    taskflow.emplace<&MyClass::subflow_task>(obj);
    std::cout << "Add Subflow Task with Instance: Member function\n";

    // 14. Condition Task: Free function
    auto cond_task = taskflow.emplace<&condition_free_function>();
    std::cout << "Add Condition Task: Free function\n";

    // 15. Condition Task with Instance: Member function
    auto cond_task_instance = taskflow.emplace<&MyClass::condition_task>(obj);
    std::cout << "Add Condition Task with Instance: Member function\n";

    cond_task_instance.work<[](int& val) {
        ++val;
        std::cout << "12321321&: " << val << '\n';
    }>(x);

    // 16. MultiCondition Task: Free function
    taskflow.emplace<&multi_condition_free_function>();
    std::cout << "Add MultiCondition Task: Free function\n";

    // 17. MultiCondition Task with Instance: Member function
    taskflow.emplace<&MyClass::multi_condition_task>(obj);
    std::cout << "Add MultiCondition Task with Instance: Member function\n";

    // Set condition task dependencies (example)
    cond_task.precede(taskflow.emplace<[]() { std::cout << "Task after condition\n"; }>());
    cond_task_instance.precede(taskflow.emplace<[]() { std::cout << "Task after condition with instance\n"; }>());

    executor.run(taskflow).wait();

    return 0;
}

#endif // TF_ENABLE_DELEGATE

@tsung-wei-huang
Member

tsung-wei-huang commented Aug 3, 2025

@qq978358810 thank you for this great effort! We did discuss whether to use the delegate pattern during the early stages of Taskflow's design. However, one big challenge we found is its limited support for lambdas, especially in applications where capturing local variables is essential. For instance, it is quite challenging to implement the following code cleanly using a delegate:

taskflow.emplace([a=int{0}, &b, &c]() mutable {
  ++a;
});

Similarly, the delegate pattern has a hard time handling slightly more complicated scenarios such as recursive parallelism. In that case, it would be the user's job to manage the lifetime of all intermediate results, which can become very messy.
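
The ownership difference described here can be sketched in a few lines. A lambda with captures cannot be used as a non-type template argument (in C++20 only captureless closure types are structural), so a compile-time-bound delegate has to borrow its state, while std::function stores a copy of the closure. The names `callback`, `bump`, `make_owning_counter`, and `make_borrowing_counter` below are hypothetical and stand in for a delegate bound to an instance; this is an illustration, not the PR's API.

```cpp
#include <cassert>
#include <functional>

// Owning: the closure carries its own copy of `a`, and std::function stores
// the closure, so the state lives as long as the returned object.
std::function<int()> make_owning_counter() {
  return [a = int{0}]() mutable { return ++a; };
}

// Non-owning: a C-style callback pair (function pointer + void* state),
// comparable to a delegate bound to an external instance.
struct callback {
  int (*fn)(void*);
  void* state;
  int operator()() const { return fn(state); }
};

int bump(void* p) { return ++*static_cast<int*>(p); }

// The caller supplies the storage and must keep it alive while the
// callback may still be invoked.
callback make_borrowing_counter(int& storage) {
  return callback{&bump, &storage};
}
```

With the owning form, a task can be created and handed off without further bookkeeping. With the borrowing form, every intermediate value in something like a recursive decomposition needs storage that outlives the task, and that is the lifetime burden mentioned above.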

I might be wrong or missing something here, but having this delegate interface would be a great feature too.

@qq978358810
Contributor Author

> @qq978358810 thank you for this great effort! We indeed discussed whether or not to use delegate pattern during the early stage of the Taskflow design. However, one big challenge we found is the limited support for lambda - especially in applications where capturing local variables are essential. For instance, its
>
> taskflow.emplace([a=int{0}, &b, &c]() mutable {
>   ++a;
> });
>
> Or, how can this delegate pattern handle a slightly complicated scenario like recursive parallelism? In this case, it would be users' jobs to manage the lifetime of all intermediate results, which can become very messy.
>
> I might be wrong or missing something here, but having this delegate interface would be a great feature too.

Thank you for your feedback! You are correct: my implementation does have limitations, especially around capturing variables, which significantly reduces usability. Not being able to pass lambdas that capture local variables is a clear defect, and requiring users to manage the lifetime of intermediate results manually can indeed become quite troublesome. Thank you for your insights. I will consider how to improve the delegate interface to address these challenges!

@tsung-wei-huang
Member

@qq978358810 Thank you, but I still like the idea of having a delegate interface like the one you are proposing here. It could be useful for applications that don't rely on heavy variable capturing (e.g., pure functions or C-style void* callbacks). I think this is definitely interesting. Perhaps we need to think about a way to template the Node so users can switch between std::function and delegate.

template <typename Handle>
class Node {
  Handle _handle;
};
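
As a rough illustration of that direction, the sketch below fills in the `Node<Handle>` skeleton so the same node type works with either an owning or a non-owning handle. The `invoke` member, the `fn_handle` alias (a plain function pointer standing in for a delegate), and `run_both` are assumptions made for this example, not Taskflow's actual Node interface.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a delegate: a plain function pointer.
using fn_handle = void (*)(int&);

template <typename Handle>
class Node {
  Handle _handle;

public:
  explicit Node(Handle h) : _handle(std::move(h)) {}

  // Hypothetical entry point: forwards to whatever the handle wraps.
  void invoke(int& x) { _handle(x); }
};

// Runs one node of each kind against the same integer.
int run_both() {
  int step = 2;

  // Owning handle: the capturing lambda is copied into std::function.
  Node<std::function<void(int&)>> a{[step](int& v) { v += step; }};

  // Non-owning handle: a captureless lambda decays to a function pointer.
  Node<fn_handle> b{+[](int& v) { v += 1; }};

  int x = 0;
  a.invoke(x);
  b.invoke(x);
  return x;
}
```

Users who need captures would pick the std::function handle, and those with pure functions or C-style callbacks would pick the lighter one. A real design would also have to decide how the graph stores nodes with different handle types together, which this sketch leaves open.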

Please keep me posted about your ideas. Thank you very much for contributing to the library 🥇
