Evolution of CUDA and new parallel programming paradigms
DOI: https://doi.org/10.62452/wq0t2e62
Keywords: Parallel programming, high-performance computing, deep learning, scientific simulation, generative artificial intelligence, hybrid architectures
Abstract
This article analyzed the evolution of CUDA (Compute Unified Device Architecture) and its impact on parallel programming paradigms, with the aim of exploring its contributions to high-performance computing and the challenges it faces amid emerging technological trends. The research employed a methodology based on a systematic review of scientific and technical literature, complemented by a comparative analysis of CUDA against other parallel programming models, such as OpenCL and SYCL. Additionally, structured consultations with experts were conducted using the Delphi method, which allowed for the integration of qualitative perspectives on the current and future trends of this technology. The results highlighted that CUDA has been pivotal in areas such as deep learning, scientific simulation, and artificial intelligence, by providing specialized tools that optimize computational performance and enhance efficiency in NVIDIA GPU-based systems. However, significant challenges were identified, including its exclusive reliance on proprietary hardware, the need to improve its portability to heterogeneous platforms, and energy sustainability in large-scale applications. The conclusions emphasized the importance of adapting CUDA to more abstract and automated paradigms, facilitating its integration into hybrid architectures and distributed computing environments. The research provided a novel analysis by highlighting CUDA’s evolution and potential as a key technology in parallel programming, reinforcing its relevance for the development of computational solutions to address complex problems in science and engineering.
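To make the programming model under discussion concrete, the following minimal sketch (illustrative only, not taken from the article) shows how CUDA expresses data parallelism: a __global__ kernel is executed by many GPU threads in parallel, each computing its own index from blockIdx, blockDim, and threadIdx, while the host code allocates device memory, copies data, and launches the kernel. The array size and block size are arbitrary assumptions chosen for the example.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: each thread adds one pair of elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;               // one million elements (arbitrary)
    const size_t bytes = n * sizeof(float);

    // Host data.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device memory and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();

    // Copy the result back and verify one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);       // expected: 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}

Compiled with nvcc, this should print c[0] = 3.0 on any CUDA-capable NVIDIA GPU; equivalent OpenCL or SYCL versions exist but require different host-side setup, which is part of the portability trade-off the abstract refers to.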
Copyright (c) 2025 Luis Javier Molina-Chalacán, Edmundo José Jalón-Arias, Luis Orlando Albarracín-Zambrano (Authors)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.