Global device variables in CUDA: bad practice?
I am designing a library that has a large contingent of CUDA kernels to perform parallel computations. The kernels act on a common object, a computational grid, which is defined using C++-style objects. The computational domain doesn't need to be accessed on the host side, so creating it on the device side and keeping it there makes sense for now. I'm wondering if the following is considered "good practice":
Suppose the computational grid class is called `domain`. First, define a global device-side variable to store the computational domain:

```cuda
__device__ domain* d;
```
Then initialize the computational domain using a CUDA kernel:

```cuda
__global__ void initDomain(ParamType p) {
    d = new domain(p);
}
```
Then perform computations using the domain in other kernels:

```cuda
__global__ void doComputation(double* x, double* y) {
    d->doThing(x, y);
    //...
}
```
If the domain remains fixed (i.e. the kernels don't modify the domain once it's created), is this OK? Is there a better way? I tried creating the `domain` object on the host side and copying it to the device, but it turned out to be a hassle because `domain` is a relatively complex type that is a pain to copy over using e.g. `cudaMemcpy` or `thrust::device_new` (at least, I couldn't get it to work nicely).
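For reference, a minimal self-contained sketch of the pattern described in the question; the `domain` class here is a hypothetical stand-in, and error checking is omitted for brevity:

```cuda
#include <cstdio>

// Hypothetical stand-in for the real, more complex domain class.
struct domain {
    int n;
    __device__ domain(int n_) : n(n_) {}
    __device__ void doThing(const double* x, double* y) const {
        for (int i = 0; i < n; ++i) y[i] = 2.0 * x[i];
    }
};

// Global device-side pointer to the domain.
__device__ domain* d;

__global__ void initDomain(int n) { d = new domain(n); }
__global__ void freeDomain()      { delete d; }

__global__ void doComputation(const double* x, double* y) {
    d->doThing(x, y);
}

int main() {
    const int n = 4;
    double *x, *y;
    cudaMallocManaged(&x, n * sizeof(double));
    cudaMallocManaged(&y, n * sizeof(double));
    for (int i = 0; i < n; ++i) x[i] = i;

    initDomain<<<1, 1>>>(n);       // construct the domain on the device
    doComputation<<<1, 1>>>(x, y); // use it from another kernel
    freeDomain<<<1, 1>>>();        // device-side delete to match device-side new
    cudaDeviceSynchronize();

    for (int i = 0; i < n; ++i) printf("%f\n", y[i]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Note that memory allocated with device-side `new` comes from the device heap and must be freed with device-side `delete`; the heap size can be raised via `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)` if the domain is large.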
Yes, it's OK.
Maybe you can improve performance by using `__constant__` memory: with this keyword, the object is available in all kernels in fast, cached memory.
In order to copy the object, you must use `cudaMemcpyToSymbol`. Please note that there are some restrictions: the object is read-only in device code, and it must not have a non-trivial default constructor (`__constant__` variables cannot be dynamically initialized).
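A minimal sketch of the `__constant__` approach, assuming the domain can be reduced to a flat POD struct (the field names here are made up for illustration):

```cuda
// Hypothetical POD-style domain: no constructor, so it is legal
// as a __constant__ variable (no dynamic initialization allowed).
struct domain {
    int    nx, ny;
    double dx, dy;
};

__constant__ domain d;  // read-only in device code, served from the constant cache

__global__ void doComputation(const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Every thread reads d from fast constant memory.
    if (i < d.nx * d.ny) y[i] = d.dx * x[i];
}

int main() {
    domain h = {64, 64, 0.1, 0.1};              // build the object on the host
    cudaMemcpyToSymbol(d, &h, sizeof(domain));  // copy it into constant memory
    // ... allocate x, y and launch doComputation ...
    return 0;
}
```

Constant memory works best when all threads in a warp read the same address, which is exactly the access pattern for shared grid parameters like these.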
You can find more information here.
If the object is complex and hard to copy, maybe you can try unified memory, or pass the variable by value to the kernel.
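For the unified-memory route, one possible sketch (again with a hypothetical `domain`): construct the object with placement-new inside a managed allocation, so host and device dereference the same pointer:

```cuda
#include <new>  // placement new

struct domain {
    int    n;
    double dx;
    domain(int n_, double dx_) : n(n_), dx(dx_) {}
    __host__ __device__ void doThing(const double* x, double* y) const {
        for (int i = 0; i < n; ++i) y[i] = dx * x[i];
    }
};

// The domain is passed explicitly instead of through a global variable.
__global__ void doComputation(const domain* d, const double* x, double* y) {
    d->doThing(x, y);
}

int main() {
    void* buf;
    cudaMallocManaged(&buf, sizeof(domain));
    domain* d = new (buf) domain(16, 0.5);  // construct on the host, in managed memory

    // ... allocate x, y and launch doComputation<<<grid, block>>>(d, x, y) ...

    cudaDeviceSynchronize();
    d->~domain();   // placement-new requires an explicit destructor call
    cudaFree(buf);
    return 0;
}
```

If the object is small and trivially copyable, passing it by value as a kernel argument is even simpler: kernel arguments land in constant memory automatically, with no explicit copy call at all.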