TY - JOUR
T1 - A 5.3 pJ/op approximate TTA VLIW tailored for machine learning
AU - Teittinen, Jukka
AU - Hiienkari, Markus
AU - Zliobaite, Indre
AU - Hollmen, Jaakko
AU - Berg, Heikki
AU - Heiskala, Juha
AU - Viitanen, Timo
AU - Simonsson, Jesse
AU - Koskinen, Lauri
PY - 2017/3/1
Y1 - 2017/3/1
AB - To achieve energy efficiency in the Internet-of-Things (IoT), more intelligence is required in the wireless IoT nodes. Otherwise, the energy required to transmit raw sensor data wirelessly will severely limit battery lifetime, the backbone of IoT. One option for achieving this intelligence is to run a variety of machine learning algorithms on the IoT sensor instead of in the cloud. Shown here is a sub-milliwatt machine learning accelerator operating at the ultra-low-voltage minimum-energy point. The accelerator is a Transport Triggered Architecture (TTA) Application-Specific Instruction-Set Processor (ASIP) targeted at running various machine learning algorithms. The ASIP is implemented in a 28 nm FDSOI (Fully Depleted Silicon On Insulator) CMOS process with an operating voltage of 0.35 V, and achieves 5.3 pJ/cycle and 1.8 nJ/iteration when performing conventional machine learning algorithms. The ASIP also includes hardware and compiler support for approximate computing. For these machine learning algorithms, approximate computing yields energy savings of up to 4.7%.
KW - Approximate computing
KW - Integrated circuit
KW - Machine learning
KW - Minimum energy point
KW - Processor
KW - Timing error detection
UR - http://www.scopus.com/inward/record.url?scp=85010375365&partnerID=8YFLogxK
U2 - 10.1016/j.mejo.2017.01.007
DO - 10.1016/j.mejo.2017.01.007
M3 - Article
AN - SCOPUS:85010375365
SN - 0000-0000
VL - 61
SP - 106
EP - 113
JO - Microelectronics Journal
JF - Microelectronics Journal
ER -