Evaluation of Novel AI Architectures for Uncertainty Estimation

Pautsch, Erik; Li, John; Rizzi, Silvio; Thiruvathukal, George K.; Pantoja, Maria

doi:10.29375/25392115.5274

dc.contributor.author	Pautsch, Erik
dc.contributor.author	Li, John
dc.contributor.author	Rizzi, Silvio
dc.contributor.author	Thiruvathukal, George K.
dc.contributor.author	Pantoja, Maria
dc.contributor.orcid	Pautsch, Erik [0000-0003-0028-5598]	spa
dc.contributor.orcid	Li, John [0000-0002-3730-3713]	spa
dc.contributor.orcid	Rizzi, Silvio [0000-0002-3804-2471]	spa
dc.contributor.orcid	Thiruvathukal, George K. [0000-0002-0452-5571]	spa
dc.contributor.orcid	Pantoja, Maria [0000-0002-1942-9769]	spa
dc.date.accessioned	2025-02-13T21:03:20Z
dc.date.available	2025-02-13T21:03:20Z
dc.date.issued	2024-06-18
dc.description.abstract	El Aprendizaje Profundo (AP) ha hecho avanzar la visión por ordenador, ofreciendo un rendimiento impresionante en tareas visuales complejas. Sin embargo, persiste la necesidad de estimaciones precisas de la incertidumbre, en particular para las entradas fuera de distribución (OOD, en su acrónimo en inglés). Nuestra investigación evalúa la incertidumbre en Redes Neuronales Convolucionales (CNN, en inglés) y transformadores de visión (ViT, en inglés) utilizando los conjuntos de datos MNIST e ImageNet-1K. Utilizando plataformas de Alto Rendimiento (HPC, en inglés), incluidos el superordenador tradicional Polaris y aceleradores de IA como Cerebras CS-2 y SambaNova DataScale, evaluamos los méritos computacionales y los cuellos de botella de cada plataforma. En este artículo se describen las consideraciones clave para utilizar la HPC en la estimación de la incertidumbre en el AP, y se ofrecen ideas que guían la integración de algoritmos y hardware para aplicaciones de AP robustas, especialmente en visión por ordenador.	spa
dc.description.abstractenglish	Deep Learning (DL) has advanced computer vision, delivering impressive performance on intricate visual tasks. Yet the need for accurate uncertainty estimates, particularly for out-of-distribution (OOD) inputs, persists. Our research evaluates uncertainty in Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) using the MNIST and ImageNet-1K datasets. Using High-Performance Computing (HPC) platforms, including the traditional Polaris supercomputer and AI accelerators such as Cerebras CS-2 and SambaNova DataScale, we assessed the computational merits and bottlenecks of each platform. This paper outlines key considerations for using HPC for uncertainty estimation in DL, offering insights that guide the integration of algorithms and hardware for robust DL applications, especially in computer vision.	eng
dc.format.mimetype	application/pdf	spa
dc.identifier.doi	https://doi.org/10.29375/25392115.5274
dc.identifier.instname	instname:Universidad Autónoma de Bucaramanga UNAB	spa
dc.identifier.issn	1657-2831	spa
dc.identifier.issn	2539-2115
dc.identifier.repourl	repourl:https://repository.unab.edu.co	spa
dc.identifier.uri	http://hdl.handle.net/20.500.12749/28291
dc.language.iso	spa	spa
dc.publisher	Universidad Autónoma de Bucaramanga UNAB	spa
dc.relation	https://revistas.unab.edu.co/index.php/rcc/article/view/5274/4084	spa
dc.relation.uri	https://revistas.unab.edu.co/index.php/rcc/issue/view/303	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.source	Vol. 25 No. 2 (2024): Revista Colombiana de Computación (July-December); 23-34	spa
dc.subject	Incertidumbre	spa
dc.subject	Aprendizaje Profundo	spa
dc.subject	Aprendizaje por conjuntos	spa
dc.subject	Aprendizaje evidencial	spa
dc.subject	Inteligencia Artificial	spa
dc.subject.keywords	Uncertainty	eng
dc.subject.keywords	Deep Learning	eng
dc.subject.keywords	Ensembles	eng
dc.subject.keywords	Evidential Learning	eng
dc.subject.keywords	Artificial intelligence	eng
dc.title	Evaluación de nuevas arquitecturas de IA para la estimación de la incertidumbre	spa
dc.title.translated	Evaluation of Novel AI Architectures for Uncertainty Estimation	eng
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.driver	info:eu-repo/semantics/article
dc.type.hasversion	info:eu-repo/semantics/publishedVersion
dc.type.local	Artículo	spa
dc.type.redcol	http://purl.org/redcol/resource_type/ART

Collections

Revista Colombiana de Computación