3

In sycl we create a kernel like this:

queue.submit(
[&d_cells, &d_count_occupied](sycl::handler& cgh)
{
     auto cells_accessor = d_cells.get_access<sycl::access_mode::read>(cgh);
     auto count_accessor =
         d_count_occupied.get_access<sycl::access_mode::write>(cgh);
     cgh.parallel_for(
         d_cells.range(),
         [cells_accessor,
          count_accessor](sycl::id<3> id, sycl::kernel_handler kh)
         {
             auto cell = cells_accessor.at(kh, id);
             if (cell.is_occupied())
             {
                 sycl::atomic_ref<
                     unsigned,
                     sycl::memory_order::relaxed,
                     sycl::memory_scope::device>
                     count{count_accessor[0]};
                 count++;
             }
         }
     );
}
)

This kernel takes 2 buffers, 1 which holds cell information, the other is designed to count the number of "occupied" cells. Imagine now that i have the d_cells buffer wrapped up into a class which has knowledge of the occupied cells or not. We could conceivably offer a smart function that takes a user supplied lambda to operate on the cells:

class Cell {
   bool is_occupied() const;
   int get_position() const;

   // implementation details.

};

class Grid {

// Apply some user function to all of the occupied cells.
template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args);

private: 
sycl::buffer<Cell> d_cells;

};

The intended call pattern would be something like this:

sycl::buffer<unsigned> d_count_occupied{
    count_occupied.data(), count_occupied.size()};
auto function = [](auto grid_cell, sycl::kernel_handler, auto count_accessor)
{
    sycl::atomic_ref<
        unsigned,
        sycl::memory_order::relaxed,
        sycl::memory_scope::device>
        count{count_accessor[0]};
    count++;
};
grid.apply_all_occupied(queue, function, d_count_occupied).wait_and_throw();

This would be very cool, it simplifies and abstracts the implementation of the "Grid" substantially which is nice. But here we have a problem. The implementation of the functor that the user gives must be able to run on the device. Therefore the buffers provided need to be converted to "accessors" before being passed to the user supplied function. We could maybe work it out with some meta programming like:


template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args) {

queue.submit(
[this, function, &args...](sycl::handler& cgh)
{
     auto cells_accessor = d_cells_.get_access<sycl::access_mode::write>(cgh);

     // Somehow get the access to all of the arguments here?
     std::tuple accessors = {args.get_access<sycl::access_mode::read>(cgh), ...};

     cgh.parallel_for(
         d_cells.range(),
         [cells_accessor,
          accessors, function](sycl::id<3> id, sycl::kernel_handler kh)
         {
             auto cell = cells_accessor.at(kh, id);
             function(kh, cell, accessors);
         }
     );
}

But this has serious issues:

  1. The user would need their lambda to receive some ambiguous tuple type with accessors.
  2. There is no way to customise the access mode for each of the get_access calls.

Is there a sensible way to implement this type of behaviour?

1 Answer 1

2
+50

Yes, there is a way. Your second requirement to customize the access mode means that you want to pass in a transform operation for each buffer into apply_all_occupied() instead of the buffer itself. I.e. you receive a parameter pack BufferAccessFuncsT &&... get_access_funcs, where each element is a callable. For example:

int main()
{
  sycl::buffer d_count_occupied;
  sycl::other_buffer other_buf;
  sycl::queue q;
  Grid grid;

  auto function = [](auto grid_cell,
                     sycl::kernel_handler & kh,
                     sycl::buffer::accessor & count_accessor,
                     sycl::other_buffer::accessor & buf2) {
    std::cout << "Called" << std::endl;
    // Do stuff, e.g.:
    // sycl::atomic_ref<...> count{count_accessor[0]};
    // count++;
  };

  grid.apply_all_occupied(
      q,
      function,
      [&d_count_occupied](sycl::handler & cgh) { return d_count_occupied.get_access<sycl::access_mode::write>(cgh); },
      [&other_buf](sycl::handler & cgh) { return other_buf.get_access<sycl::access_mode::read>(cgh); });
}

In this example I pass in two lambdas into apply_all_occupied() that return the accessor for each buffer. Of course, it also works with just one or zero or multiple accessors. The function in main() expects the accessors in the same order as the lambdas were passed into apply_all_occupied().

Regarding your first requirement, that the user defined function should not receive a tuple but rather the parameters directly, you basically want something like a "local parameter pack variable"

// Invalid, does not compile
auto &&... accessors = (std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...;

that you can then forward to your actual function. As far as I know, something like this does not exist. But instead you can do the transform and pass the result directly to another helper function. Like so:

struct Grid
{
  template <class FuncT, class... BufferAccessFuncsT>
  void apply_all_occupied(sycl::queue & q, FuncT && func, BufferAccessFuncsT &&... get_access_funcs)
  {
    q.submit([&](sycl::handler & cgh) {
      auto cells_accessor = 0; // Or whatever

      // Helper function that receives the transformed arguments in the parameter pack get_access_funcs.
      auto call_parallel_for_with_accessors = [&](auto &&... accessors) {
        cgh.parallel_for([&](sycl::kernel_handler & kh) {
          int grid_cell = cells_accessor; // Or whatever
          func(grid_cell, kh, accessors...);
        });
      };

      call_parallel_for_with_accessors((std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...);
    });
  }
};

The call_parallel_for_with_accessors in Grid::apply_all_occupied() is the helper function that receives the accessors.

Note that I removed some of the non-essential stuff from your original code to get a minimal example.

Full example (live on godbolt):

#include <iostream>
#include <utility>

namespace sycl
{
struct kernel_handler
{
};

struct handler
{
  kernel_handler kh;

  template <class FuncT, class... ArgsT>
  void parallel_for(FuncT && func, ArgsT &&... args)
  {
    func(kh, std::forward<ArgsT>(args)...);
  }
};


enum class access_mode
{
    read,
    write
};

struct buffer
{
  struct accessor
  {
  };

  template <access_mode mode>
  accessor get_access(handler &)
  {
    return accessor{};
  }
};

// Just to have another buffer type.
struct other_buffer
{
  struct accessor
  {
  };

  template <access_mode mode>
  accessor get_access(handler &)
  {
    return accessor{};
  }
};


struct queue
{
  handler cgh;

  template <class FuncT>
  void submit(FuncT func)
  {
    func(cgh);
  }
};
} // namespace sycl


struct Grid
{
  template <class FuncT, class... BufferAccessFuncsT>
  void apply_all_occupied(sycl::queue & q, FuncT && func, BufferAccessFuncsT &&... get_access_funcs)
  {
    q.submit([&](sycl::handler & cgh) {
      auto cells_accessor = 0; // Or whatever

      // Helper function that receives the transformed arguments in the parameter pack get_access_funcs.
      auto call_parallel_for_with_accessors = [&](auto &&... accessors) {
        cgh.parallel_for([&](sycl::kernel_handler & kh) {
          int grid_cell = cells_accessor; // Or whatever
          func(grid_cell, kh, accessors...);
        });
      };

      call_parallel_for_with_accessors((std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...);
    });
  }
};


int main()
{
  sycl::buffer d_count_occupied;
  sycl::other_buffer other_buf;
  sycl::queue q;
  Grid grid;

  auto function = [](auto grid_cell,
                     sycl::kernel_handler & kh,
                     sycl::buffer::accessor & count_accessor,
                     sycl::other_buffer::accessor & buf2) {
    std::cout << "Called" << std::endl;
    // Do stuff, e.g.:
    // sycl::atomic_ref<...> count{count_accessor[0]};
    // count++;
  };

  grid.apply_all_occupied(
      q,
      function,
      [&d_count_occupied](sycl::handler & cgh) { return d_count_occupied.get_access<sycl::access_mode::write>(cgh); },
      [&other_buf](sycl::handler & cgh) { return other_buf.get_access<sycl::access_mode::read>(cgh); });
}

EDIT: If the flexibility of passing a full lambda to apply_all_occupied() is not required but only the access_mode should be specified per buffer, you can introduce an additional helper function

template <sycl::access_mode mode, class BufferT>
auto AccessAs(BufferT & buffer)
{
    return [&] (sycl::handler & cgh) { 
        return buffer.template get_access<mode>(cgh); 
    };
};

and call apply_all_occupied() like this:

  grid.apply_all_occupied(
      q,
      function,
      AccessAs<sycl::access_mode::write>(d_count_occupied),
      AccessAs<sycl::access_mode::read>(other_buf));

Full example on godbolt.


You can further abbreviate this by defining

template <class BufferT>
auto AsWritableBuffer(BufferT & buffer)
{
    return AccessAs<sycl::access_mode::write>(buffer);
};

template <class BufferT>
auto AsReadableBuffer(BufferT & buffer)
{
    return AccessAs<sycl::access_mode::read>(buffer);
};

and using it like this

  grid.apply_all_occupied(
      q,
      function,
      AsWritableBuffer(d_count_occupied),
      AsReadableBuffer(other_buf));

Full example on godbolt.

3
  • Its a good suggestion. I was hoping not to go down the route of individual functions for each one though. i will have a think about it. Nov 1, 2022 at 23:05
  • @FantasticMrFox I edited my answer with an example helper function AccessAs, which abbreviates the arguments passed to apply_all_occupied(). But maybe I have not understood you correctly: If you say that you didn't want to pass individual functions to apply_all_occupied(), did you mean you just wanted to pass in references to the buffers directly? However, how do you then imagine your requirement that the caller can "customize the access mode"? For this you necessarily need to pass in the information about the access_mode somehow? What is the interface you are looking for?
    – Sedenion
    Nov 2, 2022 at 19:43
  • Yeah, this is close to what i was looking for. I will mess around with this and see if it works. There are some peculiar things that occur because the output from the access function must be used within a kernel, but this should work. Nov 2, 2022 at 22:42

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.