A Study of Pipeline Parallelism in Deep Neural Networks
| dc.contributor.author | Núñez, Gabriel | |
| dc.contributor.author | Romero Sandí, Hairol | |
| dc.contributor.author | Rojas, Elvis | |
| dc.contributor.author | Meneses, Esteban | |
| dc.contributor.orcid | Núñez, Gabriel [0000-0002-6907-533X] | spa |
| dc.contributor.orcid | Romero Sandí, Hairol [0000-0002-3199-1244] | spa |
| dc.contributor.orcid | Rojas, Elvis [0000-0002-4238-0908] | spa |
| dc.contributor.orcid | Meneses, Esteban [0000-0002-4307-6000] | spa |
| dc.date.accessioned | 2024-09-19T21:46:23Z | |
| dc.date.available | 2024-09-19T21:46:23Z | |
| dc.date.issued | 2024-06-18 | |
| dc.description.abstractenglish | Artificial intelligence is increasingly applied to complex problems. The emergence of chatbots based on artificial intelligence or natural language processing has driven the creation of increasingly large and sophisticated neural network models, which underpin current developments in the field. These networks can comprise billions of parameters, and training them is not feasible without parallelism. This paper studies pipeline parallelism, one of the most important types of parallelism used to train neural network models in deep learning. We review the most important concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We analyze important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally by running parallel training jobs and considering aspects such as the number of stages in the training pipeline and the type of balance. | eng |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.doi | https://doi.org/10.29375/25392115.5056 | |
| dc.identifier.instname | instname:Universidad Autónoma de Bucaramanga UNAB | spa |
| dc.identifier.issn | ISSN: 1657-2831 | spa |
| dc.identifier.issn | e-ISSN: 2539-2115 | spa |
| dc.identifier.repourl | repourl:https://repository.unab.edu.co | spa |
| dc.identifier.uri | http://hdl.handle.net/20.500.12749/26659 | |
| dc.language.iso | spa | spa |
| dc.publisher | Universidad Autónoma de Bucaramanga UNAB | spa |
| dc.relation | https://revistas.unab.edu.co/index.php/rcc/article/view/5056/3969 | spa |
| dc.relation.references | Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., . . . Zheng, X. (2016). TensorFlow: A System for Large-Scale Machine Learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). November 2–4 (pp. 264-283). Savannah, GA, USA: USENIX Association. https://doi.org/10.48550/arXiv.1605.08695 | |
| dc.relation.references | Akintoye, S., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. https://doi.org/10.1109/ACCESS.2022.3193690 | |
| dc.relation.references | Alshamrani, R., & Ma, X. (2022). Deep Learning. In C. L. McNeely, & L. A. Schintler (Eds.), Encyclopedia of Big Data (pp. 373-377). Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-32010-6_5 | |
| dc.relation.references | Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). Dallas, TX, USA: IEEE. https://doi.org/10.1109/SC41404.2022.00051 | |
| dc.relation.references | Chatelain, A., Djeghri, A., Hesslow, D., & Launay, J. (2022). Is the Number of Trainable Parameters All That Actually Matters? In M. F. Pradier, A. Schein, S. Hyland, F. J. Ruiz, & J. Z. Forde (Eds.), Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops. 163, pp. 27-32. PMLR. https://proceedings.mlr.press/v163/chatelain22a.html | |
| dc.relation.references | Chen, M. (2023). Analysis of Data Parallelism Methods with Deep Neural Network. EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, October 21-23 (pp. 1857-1861). Xiamen, China: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3573428.3573755 | |
| dc.relation.references | Chen, Z., Xu, C., Qian, W., & Zhou, A. (2023). Elastic Averaging for Efficient Pipelined DNN Training. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP '23 (pp. 380-391). Montreal, QC, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3572848.3577484 | |
| dc.relation.references | Chilimbi, T., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014). Project Adam: Building an Efficient and Scalable Deep Learning Training System. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14). October 6–8 (pp. 570-582). Broomfield, CO: USENIX Association. https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf | |
| dc.relation.references | Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q. V., . . . Ng, A. Y. (2012). Large Scale Distributed Deep Networks. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (NIPS 2012). 25, pp. 1223-1231. Curran Associates. https://proceedings.neurips.cc/paper_files/paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf | |
| dc.relation.references | Deep Learning. (2020). In A. Tatnall (Ed.), Encyclopedia of Education and Information Technologies (First ed., p. 558). Springer Cham. https://doi.org/10.1007/978-3-030-10576-1_300164 | |
| dc.relation.references | Deeplearning4j: Deeplearning4j Suite Overview. (2023, July). https://deeplearning4j.konduit.ai/ | |
| dc.relation.references | DeepSpeed authors: Deepspeed (overview and features). (2023, July). (Microsoft) https://www.deepspeed.ai/ | |
| dc.relation.references | FairScale authors. (2021). Fairscale: A general purpose modular pytorch library for high performance and large scale training. https://github.com/facebookresearch/fairscale | |
| dc.relation.references | Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021). DAPPLE: a pipelined data parallel approach for training large models. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3437801.3441593 | |
| dc.relation.references | Farkas, A., Kertész, G., & Lovas, R. (2020). Parallel and Distributed Training of Deep Neural Networks: A brief overview. 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) (pp. 165-170). Reykjavík, Iceland: IEEE. https://doi.org/10.1109/INES49302.2020.9147123 | |
| dc.relation.references | Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. https://doi.org/10.48550/arXiv.1911.04610 | |
| dc.relation.references | Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 18). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC]. https://doi.org/10.48550/arXiv.1806.03377 | |
| dc.relation.references | Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, July 25). GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv:1811.06965v5 [cs.CV], 1-11. https://doi.org/10.48550/arXiv.1811.06965 | |
| dc.relation.references | Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, June 20). Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv:1408.5093v1 [cs.CV], 1-4. | |
| dc.relation.references | Keras: Keras api references. (2023, July). https://keras.io/api/ | |
| dc.relation.references | Kim, C., Lee, H., Jeong, M., Baek, W., Yoon, B., Kim, I., . . . Kim, S. (2020, April 21). torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models. arXiv:2004.09910v1 [cs.DC], 1-10. https://doi.org/10.48550/arXiv.2004.09910 | |
| dc.relation.references | Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. https://doi.org/10.48550/arXiv.1404.5997 | |
| dc.relation.references | Li, S., & Hoefler, T. (2021). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3458817.3476145 | |
| dc.relation.references | Liang, G., & Alsmadi, I. (2022, February 12). Benchmark Assessment for DeepSpeed Optimization Library. arXiv:2202.12831v1 [cs.LG], 1-8. https://doi.org/10.48550/arXiv.2202.12831 | |
| dc.relation.references | Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00042 | |
| dc.relation.references | Luo, Z., Yi, X., Long, G., Fan, S., Wu, C., Yang, J., & Lin, W. (2022). Efficient Pipeline Planning for Expedited Distributed DNN Training. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 340-349). IEEE. https://doi.org/10.1109/INFOCOM48880.2022.9796787 | |
| dc.relation.references | Mofrad, M. H., Melhem, R., Ahmad, Y., & Hammoud, M. (2020). Studying the Effects of Hashing of Sparse Deep Neural Networks on Data and Model Parallelisms. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-7). Waltham, MA, USA: IEEE. https://doi.org/10.1109/HPEC43674.2020.9286195 | |
| dc.relation.references | MXNet: Mxnet api docs. (2023, July). https://mxnet.apache.org/versions/1.9.1 | |
| dc.relation.references | Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019). PipeDream: generalized pipeline parallelism for DNN training. Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19) (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3341301.3359646 | |
| dc.relation.references | Padua, D. (2011). Pipelining. In D. Padua (Ed.), Encyclopedia of Parallel Computing (pp. 1562–1563). Boston, MA, USA: Springer. https://doi.org/10.1007/978-0-387-09766-4_335 | |
| dc.relation.references | Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. https://www.usenix.org/conference/atc20/presentation/park | |
| dc.relation.references | PlaidML: Plaidml api docs. (2023, July). https://github.com/plaidml/plaidml | |
| dc.relation.references | Pytorch: Pytorch documentation. (2023, July). https://pytorch.org/ | |
| dc.relation.references | Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020, May 13). ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. arXiv:1910.02054v3 [cs.LG], 1-24. https://doi.org/10.48550/arXiv.1910.02054 | |
| dc.relation.references | Rasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event. July 6-10. CA, USA: Association for Computing Machinery. https://doi.org/10.1145/3394486.3406703 | |
| dc.relation.references | Rojas, E., Pérez, D., Calhoun, J. C., Bautista Gomez, L., Jones, T., & Meneses, E. (2021). Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. 2021 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 492-503). Portland, OR, USA: IEEE. https://doi.org/10.1109/Cluster48925.2021.00045 | |
| dc.relation.references | Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. Barrios Hernández, & E. Meneses (Eds.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 8th Latin American Conference, CARLA 2021, October 6–8, 2021, Revised Selected Papers. 1540, pp. 177-192. Guadalajara, Mexico: Springer, Cham. https://doi.org/10.1007/978-3-031-04209-6_13 | |
| dc.relation.references | Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., . . . Fei-Fei, L. (2015, January 30). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575v3 [cs.CV]. https://doi.org/10.48550/arXiv.1409.0575 | |
| dc.relation.references | Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. https://doi.org/10.1109/CANDARW51189.2020.00036 | |
| dc.relation.references | TensorFlow: Overview. (2023, July). https://www.tensorflow.org/ | |
| dc.relation.references | Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. International Conference on Learning Representations. https://openreview.net/forum?id=cw-EmNq5zfD | |
| dc.relation.references | Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2016). Application-Level Optimization of Big Data Transfers through Pipelining, Parallelism and Concurrency. IEEE Transactions on Cloud Computing, 4(1), 63-75. https://doi.org/10.1109/TCC.2015.2415804 | |
| dc.relation.references | Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. https://doi.org/10.1109/DAC18074.2021.9586300 | |
| dc.relation.references | Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. https://doi.org/10.1016/j.future.2023.04.033 | |
| dc.relation.uri | https://revistas.unab.edu.co/index.php/rcc/issue/view/297 | spa |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
| dc.source | Vol. 25 No. 1 (2024): Revista Colombiana de Computación (January-June); 48-59 | spa |
| dc.subject.keywords | Deep learning | eng |
| dc.subject.keywords | Parallelism | eng |
| dc.subject.keywords | Artificial neural networks | eng |
| dc.subject.keywords | Distributed training | eng |
| dc.title | A Study of Pipeline Parallelism in Deep Neural Networks | eng |
| dc.type.coar | http://purl.org/coar/resource_type/c_2df8fbb1 | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
| dc.type.driver | info:eu-repo/semantics/article | |
| dc.type.hasversion | info:eu-repo/semantics/publishedVersion | |
| dc.type.local | Article | spa |
| dc.type.redcol | http://purl.org/redcol/resource_type/ART | |
Files
Original bundle
1 - 1 of 1
- Name:
- Artículo.pdf
- Size:
- 691.75 KB
- Format:
- Adobe Portable Document Format
- Description:
- Article