Note: This is a virtual presentation. Here is the link to where the presentation will take place.
Title: Circuits and Architecture for Bio-Inspired AI Accelerators
Abstract: Technological advances in microelectronics, as envisioned by Moore’s law, have led to more powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advances through technology scaling have come at the cost of significantly higher power consumption, posing challenges for data-processing centers and for computing at scale. Moreover, with the emergence of mobile computing platforms constrained in power and bandwidth for distributed computing, the need for energy-efficient, scalable local processing has become more pressing.
Unconventional Compute-in-Memory (CiM) architectures, such as the analog winner-takes-all associative memory, the Charge-Injection Device (CID) processor, and analog array processing, have been proposed as alternatives. Unconventional charge-based computation has been employed in neural network accelerators, attaining impressive energy efficiency per operation in 1-bit vector-vector multiplications (VMMs) and, in recent work, multi-bit VMMs. A similar approach was used in earlier work, where a charge-injection device array stored binary-coded vectors and computations were carried out in the charge domain with binary or multi-bit inputs; computation proceeds by counting quanta of charge at the thermal noise limit, using packets of about 1,000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge; hence we call them quasi-digital. By amortizing the energy cost of the mixed-signal encoding/decoding over compute vectors with a large number of elements, high energy efficiency can be achieved.
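The quasi-digital idea above can be sketched in a few lines: an analog accumulation is read out not as a continuous voltage but as an integer count of fixed-size charge packets. This is an illustrative model only, not the dissertation's circuit; the binary weight vector, the hypothetical `electrons_per_lsb` scaling, and the function names are assumptions for the sketch, while the ~1,000-electron packet size comes from the abstract.

```python
ELECTRONS_PER_PACKET = 1000  # packet size stated in the abstract (~1000 electrons)

def quasi_digital_dot(weights_binary, inputs, electrons_per_lsb=250):
    """Model a charge-domain dot product read out by counting charge packets.

    weights_binary   : 0/1 values, standing in for binary-coded stored vectors
    inputs           : multi-bit digital inputs
    electrons_per_lsb: hypothetical charge contributed per input LSB
    """
    # Analog accumulation: total charge injected onto the sense node.
    total_electrons = sum(w * x * electrons_per_lsb
                          for w, x in zip(weights_binary, inputs))
    # Mixed-signal readout: count whole packets of charge (quasi-digital output).
    return total_electrons // ELECTRONS_PER_PACKET
```

For example, `quasi_digital_dot([1, 0, 1], [4, 4, 4])` accumulates 2,000 electrons and therefore reads out as 2 packets; sub-packet residue is discarded, which is one way to picture the quantization inherent in counting charge quanta.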
In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. At the device level, two primitive elements are designed and characterized as target storage technologies: (i) a multilevel non-volatile computational cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) computational bit-cell. Experimental results in deep-submicron CMOS processes demonstrate successful operation; behavioral models were subsequently developed and employed in large-scale system simulations and emulations. At the circuit level, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute VMMs at the array level, demonstrating successful experimental results and providing insight into the integration requirements that larger systems may demand. Finally, at the architectural level, two AI accelerator architectures, one for data-center processing and one for edge computing, are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs) in which vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and a flexible compute-versus-memory trade-off. General-purpose Arm/RISC-V co-processors provide bootstrapping and system housekeeping, and a high-speed interface fabric facilitates input/output to main memory.
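The unary-coded VMM mentioned above can be illustrated with a minimal behavioral sketch: a multi-bit input is unrolled into a stream of 1-bit cycles, the crossbar performs a 1-bit VMM each cycle, and the per-cycle results accumulate into the multi-bit output. This is an assumed toy model, not the dissertation's implementation; the deterministic unary encoding (a stochastic stream with the same expected count would work analogously), the binary weight matrix, and the function names are all illustrative choices.

```python
def unary_encode(value, n_cycles):
    """Deterministic unary stream: 'value' ones followed by zeros."""
    return [1 if t < value else 0 for t in range(n_cycles)]

def vmm_unary(weights, inputs, n_cycles=16):
    """Multi-bit VMM built from n_cycles of 1-bit array operations.

    weights : rows of 0/1 weights (one row per output)
    inputs  : multi-bit input values, each < n_cycles
    """
    streams = [unary_encode(x, n_cycles) for x in inputs]
    outputs = [0] * len(weights)
    for t in range(n_cycles):                 # one 1-bit VMM per cycle
        for j, row in enumerate(weights):
            outputs[j] += sum(w * s[t] for w, s in zip(row, streams))
    return outputs
```

For instance, `vmm_unary([[1, 0, 1]], [3, 5, 2], n_cycles=8)` returns `[5]`, matching the exact dot product 1·3 + 0·5 + 1·2; the per-cycle work is purely 1-bit, which is what lets a 1-bit charge-domain array serve multi-bit computations.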
Andreas Andreou, Department of Electrical and Computer Engineering
Ralph Etienne-Cummings, Department of Electrical and Computer Engineering
Philippe Pouliquen, Department of Electrical and Computer Engineering