
First speed test of GPU RTX 3090 using OpenCL

So I wanted to test NVIDIA's new RTX 3090 GPU. I wrote a simple OpenCL C++ program that compares single-threaded CPU execution with GPU execution of Newton's method.


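The kernel code run by each GPU core is super simple: just a simple Newton-style iteration (strictly speaking, a fixed-step gradient-descent update on (x - 5)^2, which converges to 5):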

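__kernel void make_step_kernel(__global float *outp, int batchSize)
{
    // One work-item per array element.
    int tid = get_global_id(0);

    // Guard against work-items past the end of the batch.
    if (tid < batchSize)
    {
        float tmp = outp[tid];
        for (int j = 1; j < 30000; j++)
        {
            // Move tmp a small step toward 5.
            tmp -= 0.0001f * 2.0f * (tmp - 5.0f);
        }
        outp[tid] = tmp;
    }
}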

Each work-item just iterates about 30,000 times (29,999 to be exact, since j starts at 1).
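For comparison, the single-threaded CPU side of such a test could look roughly like the sketch below. The post does not show the CPU code, so this is an assumption: the function name make_step_cpu and the use of std::vector are hypothetical, and only the inner update is copied from the kernel above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-threaded CPU counterpart of make_step_kernel
// (not from the original post): the same update is applied to every
// element, one element after another.
void make_step_cpu(std::vector<float>& outp)
{
    for (std::size_t tid = 0; tid < outp.size(); ++tid)
    {
        float tmp = outp[tid];
        // Same loop bounds as the kernel: j = 1 .. 29999.
        for (int j = 1; j < 30000; j++)
        {
            tmp -= 0.0001f * 2.0f * (tmp - 5.0f);
        }
        outp[tid] = tmp;
    }
}
```

Each step shrinks the distance to 5 by a factor of 0.9998, so any starting value ends up close to 5 after the loop. Timing a call to this function (for example with std::chrono::steady_clock) and dividing by the GPU time for the same batch would give a speedup figure of the kind quoted in this post.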

The speedup over a single CPU thread is ~4200 times. Considering that the CPU frequency is 3.8 GHz and the GPU core frequency is 1.7 GHz, the frequency-normalized speedup is 4204.213465 * 3.8 / 1.7 ~ 9397.65, which is close to (about 90% of) the number of CUDA cores the RTX 3090 has (10496).

Pretty impressive GPU :).