100% found this document useful (1 vote)
121 views

Tensor Processing Unit

The document discusses Google's Tensor Processing Unit (TPU), a custom ASIC chip designed for machine learning and neural network workloads. It provides a high-level overview of the TPU's history, architecture, and performance advantages compared to CPUs and GPUs. The TPU is optimized for neural network operations through its use of 8-16 bit integers, large on-chip memory, and a matrix multiplication unit containing over 65,000 arithmetic logic units connected in a systolic array configuration. This design enables it to perform tens of trillions of operations per second and powers various Google services involving machine learning.

Uploaded by

Osama Asghar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
121 views

Tensor Processing Unit

The document discusses Google's Tensor Processing Unit (TPU), a custom ASIC chip designed for machine learning and neural network workloads. It provides a high-level overview of the TPU's history, architecture, and performance advantages compared to CPUs and GPUs. The TPU is optimized for neural network operations through its use of 8-16 bit integers, large on-chip memory, and a matrix multiplication unit containing over 65,000 arithmetic logic units connected in a systolic array configuration. This design enables it to perform tens of trillions of operations per second and powers various Google services involving machine learning.

Uploaded by

Osama Asghar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

Tensor Processing Unit

Tensor Proces

By: Lucas Jodon


Yelman Khan
Overview
● History
● Neural Networks
● Architecture
● Performance
● Real-World Uses
● Future Development
History of TPUs
● Google began searching for a way to support neural networking for the development of their services such as
voice recognition
○ Using existing hardware, they would require twice as many data centers
○ Development of a new architecture instead
● Norman Jouppi begins work on a new architecture to support TensorFlow
○ FPGA’s were not power-efficient enough
○ ASIC design was selected for power and performance benefits
○ Device would execute CISC instructions on many networks
○ Device was made to be programmable, but operate on matrices instead of vector/scalar
○ Resulting device was comparable to a GPU or Signal Processor
Neural Networks
● First proposed in 1944 by Warren McCullough and Walter Pitts
○ Modeled loosely on human learning
● Neural nets are a method of machine learning
○ Computer learns to perform a task by analyzing training examples
○ EX: pair several audio files with the text words they mean, the machine will then find patterns between the audio data and the
labels
○ Each incoming pairing is given a weight, which is added to pre-existing node pairings
○ Once node weights pass a predefined threshold, the pairing is considered active
● Google began development on DistBelief in 2011
○ DistBelief became TensorFlow, which officially released version 1.0.0 in February 2017
○ TensorFlow is a software library with significant machine learning support
○ TensorFlow is intended to be a production grade library for dataflow implementation
Quantization in Neural Networks
● Precision of 32-bit/16-bit floating points usually not required
● Accuracy can be maintained with 8-bit integers
● Energy consumption and hardware footprint is reduced
Architecture Overview
● Large, on-chip DRAM required for
accessing pairing weight values.
● It is Possible to simultaneously
store weights and load activations.
○ TPU can do 64,000 of these
accumulates per cycle.
● First generation used 8-bit
operands and quantization
○ Second generation uses 16-bit
● Matrix Multiplication Unit has 256
× 256 (65,536) ALUs
Architecture Overview Continued
● Minimalistic hardware design used to
improve space and power consumption
○ No caches, branch prediction, out-of-order
execution, multiprocessing, speculative
prefetching, address coalescing,
multithreading, context switching, etc.
○ Minimalism is beneficial here because TPU is
required only to run neural network
prediction
● TPU chip is half the size of the other
chips
○ 28 nm process with a die size ≤ 331 mm
○ This is partially due to simplification of
control logic
TPU Stack
● TPU performs the actual neural
network calculation
● Wide range of neural network
models
● TPU stack translates the API
calls into TPU instructions
CPUs & GPUs
● CPUs and GPUs store values in registers
● A program tracks the read/operate/write operations
● A program tells ALUs :
○ Which Register to read from
○ What operation to perform
○ Which Register to write to
Performance
● TPU consists of Matrix Multiplier Unit (MXU)
● MXU performs hundreds of thousands of operations per
clock cycle
● Reads an input value only once
● Inputs are used many times without storing back to register
● Wires connect adjacent ALUs
● Multiplication and addition are performed in specific order
● Short and energy efficient
● Design is known as systolic array
Matrix Multiplication Unit
● Contains 256 x 256 = 65,536 ALUs
● TPU runs at 700 MHz
● Able to compute 46 x 1012
multiply-and-add operations per second
● Equivalent to 92 Teraops per second in
matrix unit
USES
● RankBrain algorithm used by Google search
● Google Photos
● Google Translate
● Google Cloud Platform
Future Development
● Google Cloud TPUs
○ Uses TPU version 2
○ Each TPU include a high-speed network
○ Allows to build machine learning supercomputers called “TPU Pods”
○ Improvement in training times
○ Allows mixing and matching with other hardware which includes Skylake CPUs and NVIDIA
GPUs
Works Cited
● First In-Depth Look at Google's TPU Architecture
https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/
● An in-depth look at Google's first Tensor Processing Unit (TPU) | Google Cloud Big Data and
Machine Learning Blog | Google Cloud Platform
https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-u
nit-tpu
● Google Cloud TPU Details Revealed
Servethehome - https://www.servethehome.com/google-cloud-tpu-details-revealed/
● TensorFlow - Google
● https://research.googleblog.com/2015/11/tensorflow-googles-latest-machine_9.html
● Explained: Neural networks
● Larry Hardesty | MIT News Office -
http://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
● https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/
● Google cloud TPU -
https://www.blog.google/topics/google-cloud/google-cloud-offer-tpus-machine-learning/

You might also like