Volume 17, Issue 3 (7-2025)                   itrc 2025, 17(3): 19-33

Sadough A, Gharaee H, Amiri P, Maghami M H. CNN Accelerator Adapted to Quasi Structured Pruning and Dense Mode. itrc 2025; 17(3): 19-33
URL: http://ijict.itrc.ac.ir/article-1-730-en.html
1- Department of AI, Donders Center for Cognition, Radboud University, Netherlands
2- ITRC, gharaee@itrc.ac.ir
3- Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran
Abstract:
In recent years, Convolutional Neural Networks (CNNs) have been used extensively in image-related machine learning algorithms due to their exceptional accuracy. The multiply-accumulate (MAC) operations in convolutional layers make these layers computationally expensive; they account for about 90% of the total computation. Several researchers have exploited pruning of weights and activations to overcome the high computation and memory-bandwidth demands. These techniques fall into two categories: 1) unstructured pruning of the weights can achieve heavy pruning, but in the process it unbalances data access and computation; consequently, the compression coding needed to index non-zero data grows, which requires much more memory. 2) Structured pruning removes weights according to a specified pattern and regularizes both computation and memory access, but it does not support pruning ratios as high as unstructured pruning does. In this paper, we propose Quasi Structured Pruning (QSP), which profits from the high pruning ratio of unstructured pruning while also incorporating the load-balancing property of structured pruning. Implementation results of our accelerator running VGG16 on a Xilinx XC7Z100 show 616.94 GOP/s in dense mode and 1437.7 GOP/s in sparse mode at just 7.8 W of power consumption. Experimental results show that the accelerator achieves 1.38×, 1.1×, 2.77×, 2.87×, 1.91×, and 1.18× better DSP efficiency than previous accelerators in dense mode. Likewise, it achieves 1.9×, 2.92×, 1.67×, and 1.11× higher DSP efficiency and 4.52×, 5.31×, 10.38×, and 1.1× better energy efficiency than other state-of-the-art sparse accelerators.
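For context, the reported throughput at 7.8 W corresponds to roughly 79.1 GOP/s/W in dense mode and 184.3 GOP/s/W in sparse mode. The abstract does not specify how QSP selects which weights to keep, so the sketch below only contrasts the two baseline pruning styles it describes; the threshold, block size, and function names are illustrative assumptions, not the paper's method.

```python
# Minimal NumPy sketch contrasting unstructured and structured (block-regular)
# weight pruning. All parameters here (75% sparsity, block of 4, keep 1 per
# block) are hypothetical examples, not values from the paper.
import numpy as np

def unstructured_prune(w, sparsity=0.75):
    """Zero the smallest-magnitude weights individually (irregular pattern)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def structured_prune(w, block=4, keep_per_block=1):
    """Within each group of `block` consecutive weights, keep only the
    `keep_per_block` largest magnitudes (regular, load-balanced pattern).
    Assumes w.size is divisible by `block`."""
    flat = w.reshape(-1, block)
    # Indices of the (block - keep_per_block) smallest magnitudes per group.
    idx = np.argsort(np.abs(flat), axis=1)[:, :block - keep_per_block]
    out = flat.copy()
    np.put_along_axis(out, idx, 0.0, axis=1)
    return out.reshape(w.shape)

w = np.random.randn(8, 8).astype(np.float32)
print("unstructured nonzeros:", np.count_nonzero(unstructured_prune(w)))
print("structured nonzeros:  ", np.count_nonzero(structured_prune(w)))
```

The block-regular pattern guarantees the same number of non-zero weights per group, so in hardware each processing element receives an equal workload; this is the load-balancing property the QSP scheme retains while aiming for the higher pruning ratios of the unstructured style.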
Full-Text [PDF 1114 kb]
Type of Study: Research | Subject: Network

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.