Prabakaran, Bharath Srinivas, Mrazek, Vojtech, Vasicek, Zdenek, Sekanina, Lukas, and Shafique, Muhammad
Subjects
Computer Science - Hardware Architecture
Abstract
Generation and exploration of approximate circuits and accelerators has been a prominent research domain for achieving energy-efficiency and/or performance improvements. This research has predominantly focused on ASICs, and the resulting circuits do not achieve similar gains when deployed in FPGA-based accelerator systems, due to the inherent architectural differences between the two. In this work, we propose a novel framework, Xel-FPGAs, which leverages statistical or machine learning models to effectively explore the architecture space of state-of-the-art ASIC-based approximate circuits and tailor them to FPGA-based systems, given a simple RTL description of the target application. We have also evaluated the scalability of our framework on a multi-stage application using a hierarchical search strategy. The Xel-FPGAs framework reduces exploration time by up to 95% compared to default synthesis, place, and route approaches, while identifying an improved set of Pareto-optimal designs for a given application compared to the state-of-the-art. The complete framework is open-source and available online at https://github.com/ehw-fit/xel-fpgas. Comment: Accepted for publication at the 42nd International Conference on Computer-Aided Design (ICCAD), November 2023, San Francisco, CA, USA
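As a rough illustration of the model-based exploration idea above, the following sketch fits a regression model on a small, fully synthesized subset of a circuit library and predicts FPGA costs for the remaining circuits, so most candidates never need synthesis, place, and route. The feature set, data, and model choice are illustrative assumptions, not the framework's actual pipeline.

```python
# A minimal sketch, assuming hypothetical per-circuit features and costs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical per-circuit features, e.g. [gate_count, logic_depth, mean_abs_error]
features = rng.uniform(size=(500, 3))
# Hypothetical ground-truth FPGA LUT counts for the synthesized circuits.
lut_counts = 100 * features[:, 0] + 20 * features[:, 1] + rng.normal(0, 1, 500)

# Train on a small synthesized subset (here, 10% of the library) ...
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features[:50], lut_counts[:50])

# ... and predict LUT usage for the remaining circuits to rank them without
# place-and-route, which is where the reported time savings come from.
predicted = model.predict(features[50:])
print("predicted LUTs for first 5 unsynthesized circuits:", predicted[:5])
```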
Computer Science - Neural and Evolutionary Computing
Abstract
In recent years, many design automation methods have been developed to routinely create approximate implementations of circuits and programs that show excellent trade-offs between output quality and required resources. This paper deals with evolutionary approximation as one of the most popular approximation methods. It provides the first survey of evolutionary algorithm (EA)-based approaches applied in the context of approximate computing. The survey reveals that EAs are primarily applied as multi-objective optimizers. We propose to divide these approaches into two main classes: (i) parameter optimization, in which the EA optimizes a vector of system parameters, and (ii) synthesis and optimization, in which the EA is responsible for determining both the architecture and the parameters of the resulting system. Evolutionary approximation has been applied at all levels of design abstraction and in many different applications. Neural architecture search, enabling the automated hardware-aware design of approximate deep neural networks, is identified as a newly emerging topic in this area.
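A minimal sketch of class (i), parameter optimization: an EA mutates a vector of system parameters (hypothetical per-stage bit-widths here) and maintains a non-dominated archive over two objectives. Both objective functions are illustrative stand-ins, not taken from any surveyed work.

```python
# A toy (1+1) evolutionary parameter optimizer with a Pareto archive.
import random

random.seed(1)

def objectives(bits):
    error = sum(2.0 ** -b for b in bits)  # finer bit-widths -> lower error
    cost = sum(bits)                      # wider bit-widths -> higher cost
    return error, cost

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

archive = []       # non-dominated (error, cost, bits) triples
parent = [8] * 4   # four hypothetical per-stage bit-widths
for _ in range(200):
    child = [max(1, min(16, b + random.choice((-1, 0, 1)))) for b in parent]
    err, cost = objectives(child)
    if not any(dominates((e, c), (err, cost)) for e, c, _ in archive):
        archive = [(e, c, g) for e, c, g in archive
                   if not dominates((err, cost), (e, c))]
        archive.append((err, cost, child))
        parent = child  # accept non-dominated offspring

for e, c, g in sorted(archive):
    print(f"error={e:.4f} cost={c} bits={g}")
```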
Pinos, Michal, Mrazek, Vojtech, and Sekanina, Lukas
Subjects
Computer Science - Neural and Evolutionary Computing
Abstract
There is a growing interest in automated neural architecture search (NAS) methods, which are employed to routinely deliver high-quality neural network architectures for various challenging data sets while reducing the designer's effort. NAS methods utilizing multi-objective evolutionary algorithms are especially useful when the objective is not only to minimize the network error but also to minimize the number of parameters (weights) or the power consumption of the inference phase. We propose a multi-objective NAS method based on Cartesian genetic programming for evolving convolutional neural networks (CNNs). The method allows approximate operations to be used in CNNs to reduce the power consumption of a target hardware implementation. During the NAS process, a suitable CNN architecture is evolved together with approximate multipliers to deliver the best trade-offs between accuracy, network size, and power consumption. The most suitable approximate multipliers are automatically selected from a library of approximate multipliers. Evolved CNNs are compared with common human-created CNNs of similar complexity on the CIFAR-10 benchmark problem. Comment: Accepted for publication at the 24th European Conference on Genetic Programming (EuroGP)
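The following toy sketch mirrors the joint search described above: each candidate pairs a drastically simplified architecture encoding with an index into an approximate-multiplier library, and an evolutionary loop keeps a Pareto archive over (error, parameters, power). All library entries and the evaluation function are invented; the actual method evolves full CGP-encoded CNNs and evaluates them by training.

```python
# A minimal sketch of co-evolving architecture and multiplier choice.
import random

random.seed(2)

# Hypothetical library: id -> (relative power of multiply, induced error penalty)
MULT_LIB = {0: (1.00, 0.000), 1: (0.71, 0.004), 2: (0.55, 0.011), 3: (0.40, 0.025)}

def evaluate(cand):
    filters, layers, mult = cand
    params = filters * filters * 9 * layers      # crude network-size proxy
    base_error = 0.30 / (1 + 0.001 * params)     # bigger net -> lower error
    power = params * MULT_LIB[mult][0]
    return base_error + MULT_LIB[mult][1], params, power

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def mutate(cand):
    filters, layers, mult = cand
    which = random.randrange(3)
    if which == 0:
        filters = max(8, min(128, filters + random.choice((-8, 8))))
    elif which == 1:
        layers = max(2, min(20, layers + random.choice((-1, 1))))
    else:
        mult = random.choice(list(MULT_LIB))
    return filters, layers, mult

archive = []
pop = [(random.choice((16, 32, 64)), random.randint(2, 10),
        random.choice(list(MULT_LIB))) for _ in range(8)]
for _ in range(100):
    pop = [mutate(c) for c in pop]
    for cand in pop:
        objs = evaluate(cand)
        if not any(dominates(a[0], objs) for a in archive):
            archive = [a for a in archive if not dominates(objs, a[0])]
            archive.append((objs, cand))

for (err, params, power), cand in sorted(archive):
    print(f"error={err:.3f} params={params} power={power:.0f} cand={cand}")
```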
Mrazek, Vojtech, Sekanina, Lukas, and Vasicek, Zdenek
Subjects
Computer Science - Hardware Architecture
Abstract
Approximate circuits have been developed to provide good trade-offs between power consumption and quality of service in error-resilient applications such as hardware accelerators of deep neural networks (DNNs). To accelerate the approximate circuit design process and to support fair benchmarking of circuit approximation methods, libraries of approximate circuits have been introduced. For example, EvoApprox8b contains hundreds of 8-bit approximate adders and multipliers. By means of genetic programming, we generated an extended version of the library containing thousands of 8- to 128-bit approximate arithmetic circuits. These circuits form Pareto fronts with respect to several error metrics, power consumption, and other circuit parameters. In a case study, we show how a large set of approximate multipliers can be used to perform a resilience analysis of a hardware accelerator of a ResNet DNN and to select the most suitable approximate multiplier for a given application. Results are reported for various instances of the ResNet DNN trained on the CIFAR-10 benchmark problem. Comment: To appear at the 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2020)
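A minimal sketch of the resilience-analysis flow: sweep a multiplier library, record (power, accuracy) per multiplier, and keep the Pareto front. The entries below are illustrative placeholders, not actual EvoApprox8b measurements.

```python
# name -> (relative power of multiply, top-1 accuracy with this multiplier);
# all numbers are invented for illustration.
library = {
    "exact":   (1.00, 0.912),
    "mul8_a":  (0.81, 0.910),
    "mul8_b":  (0.63, 0.901),
    "mul8_c":  (0.47, 0.874),
    "mul8_d":  (0.35, 0.712),
    "mul8_e":  (0.50, 0.860),  # dominated by mul8_c: more power, less accuracy
}

def pareto(entries):
    """Keep multipliers not dominated in (lower power, higher accuracy)."""
    front = []
    for name, (p, acc) in entries.items():
        others = (v for k, v in entries.items() if k != name)
        if not any(p2 <= p and a2 >= acc and (p2, a2) != (p, acc)
                   for p2, a2 in others):
            front.append((name, p, acc))
    return sorted(front, key=lambda t: t[1])

for name, p, acc in pareto(library):
    print(f"{name}: power={p:.2f} accuracy={acc:.3f}")
```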
Prabakaran, Bharath Srinivas, Mrazek, Vojtech, Vasicek, Zdenek, Sekanina, Lukas, and Shafique, Muhammad
Subjects
Computer Science - Hardware Architecture
Abstract
There has been abundant research on the development of Approximate Circuits (ACs) for ASICs. However, previous studies have shown that ASIC-based ACs offer asymmetrical gains in FPGA-based accelerators; an AC that is Pareto-optimal for ASICs might therefore not be Pareto-optimal for FPGAs. In this work, we present the ApproxFPGAs methodology, which uses machine learning models to reduce the exploration time for analyzing state-of-the-art ASIC-based ACs and determining the set of Pareto-optimal FPGA-based ACs. We also perform a case study to illustrate the benefits of deploying these Pareto-optimal FPGA-based ACs in a state-of-the-art automation framework to systematically generate Pareto-optimal approximate accelerators that can be deployed in FPGA-based systems to achieve high performance or low power consumption. Comment: Accepted for Publication at the 57th Design Automation Conference (DAC), July 2020, San Francisco, CA, USA
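A small illustration of the paper's key observation, under invented cost numbers: filtering the same set of ACs by ASIC costs and by FPGA costs yields different Pareto sets.

```python
# name: (error, asic_power, fpga_luts) -- all values are illustrative.
circuits = {
    "ac1": (0.00, 10.0, 120),
    "ac2": (0.02,  7.0, 115),
    "ac3": (0.05,  5.0, 118),  # Pareto-optimal for ASIC, not for FPGA
    "ac4": (0.05,  6.0,  90),  # Pareto-optimal for FPGA, not for ASIC
}

def pareto(costs):  # keep points not dominated when minimizing both coordinates
    keep = []
    for name, c in costs.items():
        if not any(all(o <= x for o, x in zip(other, c)) and other != c
                   for k, other in costs.items() if k != name):
            keep.append(name)
    return sorted(keep)

asic = {k: (e, p) for k, (e, p, _) in circuits.items()}
fpga = {k: (e, l) for k, (e, _, l) in circuits.items()}
print("ASIC Pareto set:", pareto(asic))  # ['ac1', 'ac2', 'ac3']
print("FPGA Pareto set:", pareto(fpga))  # ['ac1', 'ac2', 'ac4']
```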
Ceska, Milan, Matyas, Jiri, Mrazek, Vojtech, Sekanina, Lukas, Vasicek, Zdenek, and Vojnar, Tomas
Applied Soft Computing, Volume 95, October 2020, 106466
Subjects
Computer Science - Neural and Evolutionary Computing
Abstract
We present a novel approach for designing complex approximate arithmetic circuits that trade correctness for power consumption and play an important role in many energy-aware applications. Our approach uniquely integrates formal methods, which provide formal guarantees on the approximation error, into an evolutionary circuit optimisation algorithm. The key idea is to employ a novel adaptive search strategy that drives the evolution towards promptly verifiable approximate circuits. As demonstrated in an extensive experimental evaluation covering several structurally different arithmetic circuits and target precisions, the search strategy provides superior scalability and versatility with respect to various approximation scenarios. Our approach significantly improves the capabilities of existing methods and paves the way towards an automated design process for provably-correct circuit approximations.
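A minimal sketch of the verifiability-driven idea: a candidate circuit is accepted only if a bound on its worst-case error can be proved within a budget. For 8-bit operands, an exhaustive sweep serves here as a naive stand-in for the paper's formal verification; the budget and acceptance scheme are illustrative assumptions.

```python
# Prove worst-case-error bounds for candidate approximate adders, within a budget.
TARGET_WCE = 7    # required worst-case error bound
BUDGET = 1 << 16  # max input pairs we are willing to check ("verification budget")

def approx_add(a, b, dropped):
    mask = ~((1 << dropped) - 1) & 0xFF   # zero the `dropped` low bits
    return ((a & mask) + (b & mask)) & 0x1FF

def verified_wce(dropped):
    """Prove a worst-case-error bound by exhaustive check, or give up."""
    wce, checked = 0, 0
    for a in range(256):
        for b in range(256):
            checked += 1
            if checked > BUDGET:
                return None               # verification exceeded its budget
            wce = max(wce, abs(approx_add(a, b, dropped) - (a + b)))
    return wce

# "Evolution" reduced to scanning one mutation axis: keep the cheapest
# candidate whose *proven* error stays within the target.
best = None
for dropped in range(8):
    wce = verified_wce(dropped)
    if wce is not None and wce <= TARGET_WCE:
        best = (dropped, wce)             # more dropped bits -> cheaper circuit
print("cheapest provably-correct candidate (dropped_bits, wce):", best)
```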
Vaverka, Filip, Mrazek, Vojtech, Vasicek, Zdenek, and Sekanina, Lukas
Subjects
Computer Science - Distributed, Parallel, and Cluster Computing and Computer Science - Machine Learning
Abstract
The energy efficiency of hardware accelerators of deep neural networks (DNNs) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by these circuits and avoid expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on a CPU or GPU. However, such emulation is typically two or three orders of magnitude slower than a software DNN implementation running on a CPU or GPU with standard floating-point arithmetic instructions and common DNN libraries, because common CPUs and GPUs lack hardware support for approximate arithmetic operations, which therefore have to be expensively emulated. To address this issue, we propose an efficient emulation method for the approximate circuits utilized in a given DNN accelerator, emulated on a GPU. All relevant approximate circuits are implemented as look-up tables and accessed through the texture memory mechanism of CUDA-capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and, in some GPU architectures, is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator by a factor of approximately 200 with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate. Comment: To appear at the 23rd Design, Automation and Test in Europe (DATE 2020), Grenoble, France
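A CPU/NumPy analogue of the look-up-table technique: the complete 256x256 behaviour of an 8-bit approximate multiplier is precomputed into a table, turning each "multiplication" into a gather (on the GPU, a texture fetch). The truncated multiplier filling the table is just an example circuit, not one from the accelerator.

```python
# LUT-based emulation of an 8x8-bit approximate multiplier.
import numpy as np

def approx_mul(a, b):
    # example approximation: drop the 2 least significant bits of each operand
    return (a & ~3) * (b & ~3)

# Precompute the complete behaviour of the circuit once.
a, b = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
LUT = approx_mul(a, b).astype(np.int32)        # shape (256, 256)

# Emulated inner product: index the table instead of multiplying.
x = np.random.default_rng(0).integers(0, 256, size=10000)
w = np.random.default_rng(1).integers(0, 256, size=10000)
approx_dot = LUT[x, w].sum()
exact_dot = (x * w).sum()
print("exact:", exact_dot, "approx:", approx_dot)
```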
Computer Science - Neural and Evolutionary Computing
Abstract
Automated design methods for convolutional neural networks (CNNs) have recently been developed to increase design productivity. We propose a neuroevolution method capable of evolving and optimizing CNNs with respect to classification error and CNN complexity (expressed as the number of tunable CNN parameters), in which the inference phase can partly be executed using fixed-point operations to further reduce power consumption. Experimental results are obtained with the TinyDNN framework and presented using two common image classification benchmark problems -- MNIST and CIFAR-10. Comment: TPNC 2019, LNCS 11934, pp. 1-13, 2019
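A minimal sketch of the fixed-point execution idea: tensors are mapped to fixed-point integers, multiplied in integer arithmetic, and rescaled. The Q4.4 format below is an arbitrary assumption; the paper's method settles such details during neuroevolution.

```python
# Fixed-point (Q4.4) multiplication of float tensors.
import numpy as np

FRAC_BITS = 4  # Q4.4: 4 integer bits, 4 fractional bits

def to_fixed(x):
    # scale, round, and saturate to the signed 8-bit range
    return np.clip(np.round(x * (1 << FRAC_BITS)), -128, 127).astype(np.int16)

def fixed_mul(a, b):
    return (a.astype(np.int32) * b) >> FRAC_BITS  # rescale the product

w = np.array([0.50, -1.25, 2.00])
x = np.array([1.50,  0.75, -0.50])
approx = fixed_mul(to_fixed(w), to_fixed(x)) / (1 << FRAC_BITS)
print("float:", w * x, "fixed-point:", approx)
```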
Mrazek, Vojtech, Vasicek, Zdenek, Sekanina, Lukas, Hanif, Muhammad Abdullah, and Shafique, Muhammad
Subjects
Computer Science - Neural and Evolutionary Computing and Computer Science - Machine Learning
Abstract
State-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining is completely avoided. Our method, ALWANN, provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations, including the multiplier selection problem, are solved by the multi-objective NSGA-II algorithm. To completely avoid the computationally expensive retraining of the DNN, which is usually employed to improve classification accuracy, we propose a simple weight-updating scheme that compensates for the inaccuracy introduced by the approximate multipliers. The proposed approach is evaluated for two DNN accelerator architectures with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of the energy needed for multiplication in the convolutional layers of ResNet-50, while accuracy degrades by only 0.6%. The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate. Comment: Accepted for 2019 IEEE/ACM International Conference On Computer-Aided Design (ICCAD'19)
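A minimal sketch of the retraining-free weight-update idea: for each 8-bit weight w, pick the stored value w' whose approximate products best match the exact products w*x over the activation distribution. The truncated multiplier and the uniform activation distribution below are placeholders, not ALWANN's actual components.

```python
# Tune stored weights to compensate for an approximate multiplier.
import numpy as np

def approx_mul(w, x):
    return (w & ~7) * x          # example: weight LSBs truncated

xs = np.arange(256)              # assumed (uniform) activation distribution

def tuned_weight(w, search_radius=8):
    """Search near w for the stored value minimizing total product error."""
    cands = range(max(0, w - search_radius), min(255, w + search_radius) + 1)
    return min(cands, key=lambda c: np.abs(approx_mul(c, xs) - w * xs).sum())

for w in (37, 100, 203):
    print(f"w={w} -> w'={tuned_weight(w)}")
```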
We propose an application-tailored, data-driven, fully automated method for the functional approximation of combinational circuits. We demonstrate how an application-level error metric, such as classification accuracy, can be translated to the component-level error metric needed for an efficient and fast search in the space of approximate low-level components used in the application. This is made possible by employing a weighted mean error distance (WMED) metric to steer the circuit approximation process, which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) that determine the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units designed to provide the best trade-offs between classification accuracy and power consumption for two image classifiers based on neural networks. Comment: Accepted for publication at Design, Automation and Test in Europe (DATE 2019), Florence, Italy
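A minimal sketch of the WMED metric: the error of each input vector is weighted by how often that operand value occurs in the application, estimated from a histogram of a measured signal. The circuit and signal trace below are illustrative.

```python
# Weighted mean error distance for a constant-coefficient approximate multiplier.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical measured signal trace (e.g., pixel values feeding the MAC unit).
signal = rng.normal(128, 25, size=100000).astype(int).clip(0, 255)
weights = np.bincount(signal, minlength=256) / signal.size  # p(x) per operand

CONST = 77                                # e.g., a fixed filter coefficient
def exact(x):  return CONST * x
def approx(x): return CONST * (x & ~7)    # example approximate product

x = np.arange(256)
wmed = np.sum(weights * np.abs(approx(x) - exact(x)))   # distribution-weighted
med = np.mean(np.abs(approx(x) - exact(x)))             # uniform weighting
print(f"WMED={wmed:.2f} vs unweighted MED={med:.2f}")
```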
Mrazek, Vojtech, Hanif, Muhammad Abdullah, Vasicek, Zdenek, Sekanina, Lukas, and Shafique, Muhammad
Subjects
Computer Science - Distributed, Parallel, and Cluster Computing and Computer Science - Machine Learning
Abstract
Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. Many libraries of elementary approximate circuits have already been proposed in the literature to simplify the design process of approximate accelerators. Because these libraries contain tens to thousands of approximate implementations of a single arithmetic operation, it is intractable to find an optimal combination of approximate circuits from the library even for an application consisting of a few operations. An open problem is how to effectively combine circuits from these libraries to construct complex approximate accelerators. This paper proposes a novel methodology for searching, selecting, and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models that estimate the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant trade-offs between quality of processing and hardware cost, and identify the corresponding Pareto frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations out of $10^{23}$ possible solutions in a few hours, while an exhaustive search would take four months on a high-end processor. Comment: Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, USA
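A back-of-the-envelope sketch of why full synthesis is intractable here and how surrogate models help: the configuration space is the product of the per-operation library sizes, and only model-scored random samples are ever evaluated. The library sizes and scoring models below are invented, chosen only so that the space size lands near the cited $10^{23}$.

```python
# Size of the configuration space, and surrogate-scored random sampling.
import math
import random

random.seed(3)
# Hypothetical library sizes for the 17 arithmetic operations of the filter.
lib_sizes = [20, 25, 22, 18, 30, 21, 24, 19, 28, 23, 20, 26, 17, 22, 25, 21, 24]
space = math.prod(lib_sizes)                 # one circuit choice per operation
print(f"{space:.2e} possible accelerators")  # ~1e23, matching the scale above

# Placeholder surrogate models standing in for the trained estimators:
def est_quality(cfg):  return sum(cfg) / sum(lib_sizes)
def est_cost(cfg):     return sum(s - c for s, c in zip(lib_sizes, cfg))

# Score random configurations with the models instead of synthesizing them.
samples = [[random.randrange(s) for s in lib_sizes] for _ in range(10000)]
scored = [(est_quality(c), est_cost(c)) for c in samples]
print("best estimated quality among samples:", max(q for q, _ in scored))
```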
Pleskacz, Witold, Sekanina, Lukáš, Renovell, M. (Michel), Bernard, Serge, Kasprowicz, Dominik, and IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems
Subjects
Electronic systems--Design and construction--Congresses, Electronic systems--Testing--Congresses, Electronic circuits--Design and construction--Congresses, Electronic circuits--Testing--Congresses, and Nanoelectronics--Congresses