Evolution of CUDA and new parallel programming paradigms

Authors

Molina-Chalacán, L. J., Jalón-Arias, E. J., & Albarracín-Zambrano, L. O.

DOI:

https://doi.org/10.62452/wq0t2e62

Keywords:

Parallel programming, high-performance computing, deep learning, scientific simulation, generative artificial intelligence, hybrid architectures

Abstract

This article analyzed the evolution of CUDA (Compute Unified Device Architecture) and its impact on parallel programming paradigms, with the aim of exploring its contributions to high-performance computing and the challenges it faces amid emerging technological trends. The research employed a methodology based on a systematic review of the scientific and technical literature, complemented by a comparative analysis of CUDA against other parallel programming models such as OpenCL and SYCL. Additionally, structured expert consultations were conducted using the Delphi method, which allowed qualitative perspectives on the current state and future trends of this technology to be integrated. The results highlighted that CUDA has been pivotal in areas such as deep learning, scientific simulation, and artificial intelligence by providing specialized tools that optimize computational performance and enhance efficiency on NVIDIA GPU-based systems. However, significant challenges were identified, including its exclusive reliance on proprietary hardware, the need to improve its portability to heterogeneous platforms, and energy sustainability in large-scale applications. The conclusions emphasized the importance of adapting CUDA to more abstract and automated paradigms, facilitating its integration into hybrid architectures and distributed computing environments. The research provided a novel analysis of CUDA's evolution and potential as a key technology in parallel programming, reinforcing its relevance for the development of computational solutions to complex problems in science and engineering.
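As a minimal illustration of the programming paradigm the article examines, the sketch below shows the canonical CUDA vector-addition pattern (a hypothetical example, not code from the article): each GPU thread computes one array element, deriving its global index from the block and thread coordinates of the launch grid.

```cuda
#include <cuda_runtime.h>

// Each thread handles one element; the global index is derived
// from the thread's position within the launch grid.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

// Host-side launch sketch: copy inputs to the device, run the
// kernel over ceil(n / 256) blocks of 256 threads, copy back.
void addOnGpu(const float *a, const float *b, float *c, int n) {
    float *da, *db, *dc;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
}
```

The explicit grid/block decomposition and manual host-device memory transfers shown here are exactly the low-level details that the more abstract, portable models compared in the article (OpenCL, SYCL) aim to generalize or hide.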


References

Alves de Araujo, G. (2022). Data and stream parallelism optimizations on GPUs [Master's thesis, Pontifícia Universidade Católica do Rio Grande do Sul].

Breyer, M., Van Craen, A., & Pflüger, D. (2022). A comparison of SYCL, OpenCL, CUDA, and OpenMP for massively parallel support vector machine classification on multi-vendor hardware. Proceedings of the 10th International Workshop on OpenCL. Bristol, United Kingdom.

Caicedo Goyes, F. L. (2024). Exploración de estrategias avanzadas en computación de alto rendimiento: Un Análisis Integral y Perspectivas Emergentes. REVISTA ODIGOS, 5(2), 9–32. https://doi.org/10.35290/ro.v5n2.2024.1174

Calatayud, R., Navarro-Modesto, E., Navarro-Camba, E. A., & Sangary, N. T. (2020). Nvidia CUDA parallel processing of large FDTD meshes in a desktop computer: FDTD-matlab on GPU. Proceedings of the 10th Euro-American Conference on Telematics and Information Systems. Aveiro, Portugal.

Fernandes, D. F., Santos, M. C., Silva, A. C., & Lima, A. M. M. (2024). Comparative study of CUDA-based parallel programming in C and Python for GPU acceleration of the 4th order Runge-Kutta method. Nuclear Engineering and Design, 421, 113050. https://doi.org/10.1016/j.nucengdes.2024.113050

Flor Damiá, J. (2023). Realidad aumentada e Inteligencia Artificial en un entorno de Tactile Internet [Undergraduate thesis, Universitat Politècnica de València]. https://riunet.upv.es/handle/10251/195532

Hijma, P., Heldens, S., Sclocco, A., Van Werkhoven, B., & Bal, H. E. (2023). Optimization techniques for GPU programming. ACM Computing Surveys, 55(11), 1–81. https://dl.acm.org/doi/full/10.1145/3570638

Kim, D., Kim, I., & Kim, J. (2022). Analysis of Sub-Routines in NVIDIA cuBLAS Library for a series of Matrix-Matrix Multiplications in Transformer. 2022 13th International Conference on Information and Communication Technology Convergence (ICTC). Jeju Island, Korea.

Miguel López, S. (2021). Celerity: el futuro de la programación paralela en memoria distribuida [Master's thesis, Universidad de Valladolid].

Moya Jiménez, M. Á. (2021). Soporte de Comunicación Eficiente en Plataforma de Entrenamiento Distribuido de Redes Neuronales [Undergraduate thesis, Universitat Politècnica de València].

Muñoz, F., Asenjo, R., Navarro, A., & Cabaleiro, J. C. (2024). CPU and GPU oriented optimizations for LiDAR data processing. Journal of Computational Science, 79, 102317. https://doi.org/10.1016/j.jocs.2024.102317

Pang, W., Luo, X., Chen, K., Ji, D., Qiao, L., & Yi, W. (2023). Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs. Journal of Systems Architecture, 139, 102888. https://doi.org/10.1016/j.sysarc.2023.102888

Rockenbach, D. A., Araujo, G., Griebler, D., & Fernandes, L. G. (2025). GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism. Computer Standards & Interfaces, 92, 103922. https://doi.org/10.1016/j.csi.2024.103922

Valencia Pérez, T. A. (2020). Implementación de algoritmos de reconstrucción tomográfica mediante programación paralela (CUDA) [Doctoral dissertation, Benemérita Universidad Autónoma de Puebla].

Yanez Soffia, M. A. (2023). Análisis sobre modelos predictores de depresión mediante la interpretación de lenguaje natural a partir de textos usando machine learning [Undergraduate thesis, ETSI Informática, Universidad Politécnica de Madrid].

Yoshida, K., Miwa, S., Yamaki, H., & Honda, H. (2024). Analyzing the impact of CUDA versions on GPU applications. Parallel Computing, 120, 103081. https://doi.org/10.1016/j.parco.2024.103081

Zhuo, Y., Zhang, T., Du, F., & Liu, R. (2023). A parallel particle swarm optimization algorithm based on GPU/CUDA. Applied Soft Computing, 144, 110499. https://doi.org/10.1016/j.asoc.2023.110499

Published

2025-09-20

How to Cite

Molina-Chalacán, L. J., Jalón-Arias, E. J., & Albarracín-Zambrano, L. O. (2025). Evolution of CUDA and new parallel programming paradigms. Revista Metropolitana de Ciencias Aplicadas, 8(4), 87-94. https://doi.org/10.62452/wq0t2e62