A Study of Pipeline Parallelism in Deep Neural Networks

Núñez, Gabriel; Romero Sandí, Hairol; Rojas, Elvis; Meneses, Esteban

doi:https://doi.org/10.29375/25392115.5056

dc.contributor.author	Núñez, Gabriel
dc.contributor.author	Romero Sandí, Hairol
dc.contributor.author	Rojas, Elvis
dc.contributor.author	Meneses, Esteban
dc.contributor.orcid	Núñez, Gabriel [0000-0002-6907-533X]	spa
dc.contributor.orcid	Romero Sandí, Hairol [0000-0002-3199-1244]	spa
dc.contributor.orcid	Rojas, Elvis [0000-0002-4238-0908]	spa
dc.contributor.orcid	Meneses, Esteban [0000-0002-4307-6000]	spa
dc.date.accessioned	2024-09-19T21:46:23Z
dc.date.available	2024-09-19T21:46:23Z
dc.date.issued	2024-06-18
dc.description.abstractenglish	The application of artificial intelligence to complex problems is growing in popularity. The emergence of chatbots based on artificial intelligence or natural language processing has driven the creation of increasingly large and sophisticated neural network models, which underpin current developments in artificial intelligence. These neural networks can comprise billions of parameters, and training them is not feasible without parallelism. This paper studies pipeline parallelism, one of the most important types of parallelism used to train neural network models in deep learning. We review the most important concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We analyze important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally by running parallel training jobs, taking into account aspects such as the number of stages in the training pipeline and the type of balance.	eng
dc.format.mimetype	application/pdf	spa
dc.identifier.doi	https://doi.org/10.29375/25392115.5056
dc.identifier.instname	instname:Universidad Autónoma de Bucaramanga UNAB	spa
dc.identifier.issn	ISSN: 1657-2831	spa
dc.identifier.issn	e-ISSN: 2539-2115	spa
dc.identifier.repourl	repourl:https://repository.unab.edu.co	spa
dc.identifier.uri	http://hdl.handle.net/20.500.12749/26659
dc.language.iso	spa	spa
dc.publisher	Universidad Autónoma de Bucaramanga UNAB	spa
dc.relation	https://revistas.unab.edu.co/index.php/rcc/article/view/5056/3969	spa
dc.relation.uri	https://revistas.unab.edu.co/index.php/rcc/issue/view/297	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.source	Vol. 25 No. 1 (2024): Revista Colombiana de Computación (January-June); 48-59	spa
dc.subject.keywords	Deep learning	eng
dc.subject.keywords	Parallelism	eng
dc.subject.keywords	Artificial neural networks	eng
dc.subject.keywords	Distributed training	eng
dc.title	A Study of Pipeline Parallelism in Deep Neural Networks	eng
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.driver	info:eu-repo/semantics/article
dc.type.hasversion	info:eu-repo/semantics/publishedVersion
dc.type.local	Article	spa
dc.type.redcol	http://purl.org/redcol/resource_type/ART

Collections

Revista Colombiana de Computación