CUDA is a C/C++ extension that allows you to program NVIDIA GPUs efficiently. If your program needs to run on non-NVIDIA GPUs, you should use something more general: OpenCL is a widespread standard for parallel programming on accelerators. Though less portable, CUDA programs typically run significantly faster, because the compiler can produce code optimized for specific NVIDIA GPU architectures.
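To give a feel for what the C/C++ extension looks like, here is a minimal vector-addition sketch: a `__global__` kernel is launched with the `<<<blocks, threads>>>` syntax, and each GPU thread processes one array element independently. This assumes a CUDA-capable GPU and the CUDA toolkit; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c, independently of all others.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate device memory and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[10] = %.1f\n", h_c[10]);  // 10 + 20 = 30.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compiled with `nvcc`, this runs the addition of all 1024 element pairs in parallel, which is exactly the kind of independent, fine-grained work GPUs are built for.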
When considering the use of a GPU with CUDA or OpenCL, it is important that the computations can be executed in parallel without conflicts. GPUs are highly efficient when many small, mutually independent calculations can be performed simultaneously. Here are some examples where GPUs are used efficiently:
1) Rendering: The rendering of many triangles in a scene can be done as a sequence of parallel computations. This was the initial use case for GPUs, as they allowed for impressively fast frame rates in games and other interactive 3D applications. Initially, GPUs acted as fixed-function pipelines [1], but as their impressive aggregate processing power was sought after in other disciplines, GPUs became programmable through high-level languages such as CUDA or OpenCL. Now, GPUs can be programmed for rendering tasks such as ray tracing or path tracing [2], where thousands of rays are emitted from an eye position through the scene.
2) Linear algebra: Calculations such as matrix multiplications or solving systems of linear equations are performed frequently in high-performance computing problems such as physical simulations. Reed et al. (including Turing Award winner Jack Dongarra) survey the computations most frequently performed in HPC today [3]. These applications can benefit drastically from GPU-accelerated linear algebra libraries such as cuBLAS [4], which can save days of computation time.
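As a sketch of how little code such a library call requires, the snippet below multiplies two 2x2 matrices with cuBLAS's `cublasSgemm` (single-precision general matrix multiply, C = alpha*A*B + beta*C). It assumes a CUDA-capable GPU with the cuBLAS library installed; note that cuBLAS expects matrices in column-major order, and error checking is again omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    // 2x2 matrices, stored column-major as cuBLAS expects.
    const int n = 2;
    const float h_a[] = {1.0f, 2.0f, 3.0f, 4.0f};  // A = [[1, 3], [2, 4]]
    const float h_b[] = {1.0f, 0.0f, 0.0f, 1.0f};  // B = identity
    float h_c[4] = {0.0f};

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, sizeof(h_a));
    cudaMalloc(&d_b, sizeof(h_b));
    cudaMalloc(&d_c, sizeof(h_c));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C, with no transposition of A or B.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_a, n, d_b, n, &beta, d_c, n);

    cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);
    // B is the identity, so C equals A.
    printf("C = [[%.0f, %.0f], [%.0f, %.0f]]\n", h_c[0], h_c[2], h_c[1], h_c[3]);

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

The same call scales to matrices with thousands of rows and columns, where the GPU's parallelism pays off; the library chooses a tuned kernel for the target architecture automatically.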
3) Machine learning: Training models on large data sets is computationally intensive and frequently requires a lot of time. For this reason, GPUs are widely used to accelerate the process, though most users rely on existing packages that drive the GPU for them. See [5] for a detailed explanation.
As a closing note: before implementing a CUDA program yourself, first check whether your problem maps well onto the GPU architecture, and second, check whether a library for the problem already exists.
[2] SJOHOLM, J. Best practices: Using NVIDIA RTX ray tracing. NVIDIA Developer Blog, 2020. URL: https://developer.nvidia.com/blog/best-practices-using-nvidia-rtx-ray-tracing/
[3] REED, Daniel; GANNON, Dennis; DONGARRA, Jack. Reinventing high performance computing: challenges and opportunities. arXiv preprint arXiv:2203.02544, 2022.
[5] ILIEVSKI, Andrej; ZDRAVESKI, Vladimir; GUSEV, Marjan. How CUDA powers the machine learning revolution. In: 2018 26th Telecommunications Forum (TELFOR). IEEE, 2018. pp. 420-425.