In SYCL we create a kernel like this:
queue.submit(
[&d_cells, &d_count_occupied](sycl::handler& cgh)
{
auto cells_accessor = d_cells.get_access<sycl::access_mode::read>(cgh);
auto count_accessor =
// read_write: the atomic increment below both reads and writes the counter.
d_count_occupied.get_access<sycl::access_mode::read_write>(cgh);
cgh.parallel_for(
d_cells.range(),
[cells_accessor,
count_accessor](sycl::id<3> id, sycl::kernel_handler kh)
{
auto cell = cells_accessor.at(kh, id);
if (cell.is_occupied())
{
sycl::atomic_ref<
unsigned,
sycl::memory_order::relaxed,
sycl::memory_scope::device>
count{count_accessor[0]};
count++;
}
}
);
}
);
This kernel takes two buffers: one holds the cell information, and the other counts the number of "occupied" cells. Now imagine that the d_cells buffer is wrapped in a class that knows which cells are occupied. We could conceivably offer a function that takes a user-supplied lambda to operate on the cells:
class Cell {
public:
bool is_occupied() const;
int get_position() const;
// implementation details.
};
class Grid {
public:
// Apply some user function to all of the occupied cells.
template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args);
private:
sycl::buffer<Cell> d_cells;
};
The intended call pattern would be something like this:
sycl::buffer<unsigned> d_count_occupied{
count_occupied.data(), count_occupied.size()};
auto function = [](auto grid_cell, sycl::kernel_handler, auto count_accessor)
{
sycl::atomic_ref<
unsigned,
sycl::memory_order::relaxed,
sycl::memory_scope::device>
count{count_accessor[0]};
count++;
};
grid.apply_all_occupied(queue, function, d_count_occupied).wait_and_throw();
This would be very cool: it substantially simplifies and abstracts the implementation of the "Grid". But there is a problem. The user-supplied functor must be able to run on the device, so the buffers passed in need to be converted to accessors before they reach it. We could maybe work it out with some metaprogramming like:
template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args) {
return q.submit(
[this, function, &args...](sycl::handler& cgh)
{
auto cells_accessor = d_cells.get_access<sycl::access_mode::write>(cgh);
// Somehow get the access to all of the arguments here?
std::tuple accessors{args.template get_access<sycl::access_mode::read>(cgh)...};
cgh.parallel_for(
d_cells.range(),
[cells_accessor,
accessors, function](sycl::id<3> id, sycl::kernel_handler kh)
{
auto cell = cells_accessor.at(kh, id);
// Same argument order as the user lambda: cell, kernel handler, accessors.
function(cell, kh, accessors);
}
);
}
);
}
But this has serious issues:
- The user's lambda would have to accept an opaque tuple of accessors instead of individually named arguments.
- There is no way to customise the access mode for each of the get_access calls.
Is there a sensible way to implement this type of behaviour?
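One direction that might address both issues is to tag each buffer with the access mode it needs at the call site, and to unpack the accessor tuple with std::apply so the user lambda receives ordinary parameters. The sketch below is host-only and uses stand-in types so it compiles without a SYCL toolchain: mock_buffer, mock_accessor, tagged_buffer, with_mode and make_accessor are all hypothetical names, not SYCL API. In real SYCL the same shape would presumably call get_access<Mode>(cgh) inside the command group and unpack the tuple inside the kernel, but that part is an untested assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for sycl::access_mode, sycl::accessor and sycl::buffer.
enum class access_mode { read, write, read_write };

template <typename T, access_mode Mode>
struct mock_accessor {
    std::vector<T>* data;
    T& operator[](std::size_t i) const {
        assert(i < data->size());
        return (*data)[i];
    }
};

template <typename T>
struct mock_buffer {
    std::vector<T> storage;
    template <access_mode Mode>
    mock_accessor<T, Mode> get_access() { return {&storage}; }
};

// Hypothetical tag: pairs a buffer with the access mode the caller wants.
template <access_mode Mode, typename TBuffer>
struct tagged_buffer { TBuffer& buffer; };

template <access_mode Mode, typename TBuffer>
tagged_buffer<Mode, TBuffer> with_mode(TBuffer& buffer) { return {buffer}; }

// Turns one tagged buffer into an accessor using that buffer's own mode.
template <access_mode Mode, typename TBuffer>
auto make_accessor(tagged_buffer<Mode, TBuffer> tagged) {
    return tagged.buffer.template get_access<Mode>();
}

struct Cell {
    bool occupied;
    bool is_occupied() const { return occupied; }
};

template <typename TFunctor, typename... TTagged>
void apply_all_occupied(std::vector<Cell>& cells, TFunctor function, TTagged... tagged) {
    // Pack expansion: one accessor per argument, each with its requested mode.
    auto accessors = std::make_tuple(make_accessor(tagged)...);
    for (Cell& cell : cells) {
        if (!cell.is_occupied()) continue;
        // std::apply unpacks the tuple, so the functor never sees a tuple type.
        std::apply([&](auto&... acc) { function(cell, acc...); }, accessors);
    }
}

// Usage: count the occupied cells through a read_write "accessor".
unsigned count_occupied_demo() {
    std::vector<Cell> cells{{true}, {false}, {true}};
    mock_buffer<unsigned> count{{0u}};
    apply_all_occupied(
        cells,
        [](Cell&, auto& count_accessor) { count_accessor[0]++; },
        with_mode<access_mode::read_write>(count));
    return count.storage[0];
}
```

With this shape the call site decides the mode per buffer, e.g. with_mode<access_mode::read_write>(d_count_occupied), and the lambda signature stays close to the intended call pattern above.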