Volume 15, Issue 2 (3-2023)   itrc 2023, 15(2): 12-18




Farahani A. A. D., Beitollahi H., Fathy M., Barangi R. A Partial Method for Calculating CNN Networks Based On Loop Tiling. itrc 2023; 15(2): 2
URL: http://journal.itrc.ac.ir/article-1-514-en.html
1- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
2- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, Beitollahi@iust.ac.ir
Abstract:

Convolutional Neural Networks (CNNs) have been widely deployed in the fields of artificial intelligence and computer vision. In these applications, the CNN part is the most computationally intensive. When these applications run on an embedded device, the embedded processor can hardly handle the processing load. This paper applies loop tiling to show how to construct a lightweight, low-power, and efficient CNN hardware accelerator for embedded computing devices. The method breaks a large CNN engine into small CNN engines and computes them with limited hardware resources; the results of the small CNN engines are then added and concatenated to construct the output of the large CNN. Using this method, a small accelerator can be configured to run a wide range of large CNNs. A small single-layer accelerator is designed to evaluate the methodology. Our initial investigations show that an accelerator built with this methodology can run a modified version of MobileNetV1 70 times per second.
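The tiling scheme described in the abstract can be illustrated in a few lines of Python. The sketch below is not the paper's implementation; it only shows the general idea under assumed names and tile sizes (small_engine_conv, tiled_conv, tc_out, tc_in are all hypothetical): a convolution layer is split into channel tiles small enough for a small engine, partial results over input-channel tiles are added, and output-channel tiles are concatenated to reassemble the full output.

    # A minimal sketch of the loop-tiling idea from the abstract: a large
    # convolution is split into tiles that fit a small engine; partial results
    # over input-channel tiles are summed, and output-channel tiles are
    # concatenated. Names and tile sizes are illustrative, not from the paper.
    import numpy as np

    def small_engine_conv(x, w):
        """Direct convolution (no padding), small enough for a tiny engine.
        x: (C_in, H, W), w: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)."""
        c_out, c_in, k, _ = w.shape
        h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
        y = np.zeros((c_out, h_out, w_out))
        for co in range(c_out):
            for i in range(h_out):
                for j in range(w_out):
                    y[co, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[co])
        return y

    def tiled_conv(x, w, tc_out=4, tc_in=4):
        """Same convolution via channel tiles of size tc_out/tc_in: partial
        sums over input-channel tiles are added, output-channel tiles are
        concatenated."""
        c_out, c_in = w.shape[0], w.shape[1]
        out_tiles = []
        for co in range(0, c_out, tc_out):      # concatenate over output channels
            acc = None
            for ci in range(0, c_in, tc_in):    # accumulate over input channels
                part = small_engine_conv(x[ci:ci+tc_in],
                                         w[co:co+tc_out, ci:ci+tc_in])
                acc = part if acc is None else acc + part
            out_tiles.append(acc)
        return np.concatenate(out_tiles, axis=0)

    # Quick check that the tiled computation matches the monolithic one.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))
    w = rng.standard_normal((8, 8, 3, 3))
    assert np.allclose(tiled_conv(x, w), small_engine_conv(x, w))

One plausible hardware mapping, consistent with the abstract: each call to small_engine_conv corresponds to one pass of the small accelerator over one tile, while the control logic performs the accumulation and concatenation.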

Type of Study: Research | Subject: Information Technology



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.