Ops supported by CUDA

As of now, only the most basic ops support CUDA:

Elementwise unary operations:

Elementwise binary operations (only the arithmetic operations support CUDA):

From a lot of profiling of this author’s personal projects, the ops that really matter are tanh, sigmoid, expm1, exp and cube, which are basically the activation functions. The other operations work fine with MKL+AVX and are not the major cause of slowness in a neural network.
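
For illustration, an elementwise unary op such as tanh maps naturally onto a one-thread-per-element CUDA kernel. The standalone sketch below shows that general pattern only; it is not the library's actual generated kernel, and the kernel and buffer names are made up for this example.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdio.h>

// Elementwise unary op: each thread applies tanh to exactly one element.
__global__ void tanh_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = tanhf(in[i]);
    }
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers with some dummy input data.
    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) h_in[i] = (float)i / n;

    // Device buffers.
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // One thread per element, rounded up to whole blocks.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    tanh_kernel<<<blocks, threads>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("tanh(%f) = %f\n", h_in[1], h_out[1]);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}
```

The same pattern applies to the other unary ops listed above (sigmoid, expm1, exp, cube): only the per-element expression inside the kernel changes.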