Xtensa
Overview
Background
Changes in progress from Xtensa to Xtensa LX
Automated Development Process
ISA
TIE Language
Benchmarks
Tensilica
Founded in 1997 in Santa Clara, California by a group of engineers from Intel, SGI, MIPS, and Synopsys to compete with ARC
Goal: To address application-specific microprocessor cores and software development tools by designing the first configurable and extensible processor core
Why?
Embedded application problems with high-cost custom designs or low-performance (inefficient) processors
System on a Chip (SoC) challenge
Traditional
Rapidly increasing number of transistors requires more RTL blocks on chip
Hardcoded RTL blocks are not flexible
Hand-optimized for application-specific purposes
Tensilica's Solution
Xtensa
Focusing on design through the processor, and not through hardwired RTL
Xtensa
First appearing in 1999
32-bit microprocessor core with a graphical configuration interface and integrated tool chain
Designed from the start to be user customizable
Emphasizes instruction-set configurability as its primary feature distinguishing it from other core offerings
Has revolutionized the System on a Chip (SoC) challenge throughout its development
Configurable and Extensible
Xtensa - In a Nutshell
Enables embedded system designers to build better, more highly integrated products in significantly less time
Can add specialized functions or instructions to the processor and have them recognized as "native" by the entire software development tool chain
Move to a higher level of abstraction by designing with processors rather than RTL
Xtensa - Deliverables
Gate count range: 25,000 to 150,000+
Increase in gates as customer adds instructions or optional features
Xtensa - Verification Challenges
To extensively verify the configurable processor to ensure each possible configuration will be bug free
To enable the customer to rapidly integrate the core while limiting support costs
78 instructions
Five-stage pipeline that supports single-cycle execution
Load/store model
32-entry orthogonal register file
32 optional extra registers
Processor Configuration
32 Registers (32-bits)
Extensible via use of TIE instructions
No Floating Point Processor
Zero overhead loops
Xtensa - ISA
ISA Influences
MIPS, IBM
Xtensa III
With Virtual IP Group, developed an MP3 audio decoder for Tensilica's Xtensa configurable microprocessor architecture. The decoder offers hardware extensions and optimized code for accelerating MP3 decoding
32-bit floating point processing
32x32-bit hardware multiplier
First Coprocessor interface
Vectra
DSP enhancements
Xtensa IV
Used white box verification methodology for the original development
Includes 0-In Check and the CheckerWare Library made by Mentor Graphics
Could repartition instructions up until point of manufacturing
Support for multiple processors in ASIC
128-bit wide local memory interface
Xtensa V
350 MHz (synthesized), as small as 18K gates (0.25 mm²)
More flexible interfaces for multiple processors
Write-back and write-through caches
Enhanced Xtensa Local Memory Interface
Shared data memories
More Automation
Xtensa C/C++ Compiler & TIE Language improvements
XT2000 Emulation kit
Xtensa V - Performance Cost Timeline
Xtensa 6
Customize processor from C/C++ based algorithm using XPRES Compiler
30% less power consumption
Advanced security provisions in MMU-enabled configurations
Xtensa LX
Bandwidth, compute parallelism, and low-power optimization equivalent to hand-optimized, non-programmable, RTL-designed hardware blocks
XPRES Compiler and automated process generator
Uses Flexible Length Instruction Xtension (FLIX)
Ideal for:
Xtensa 6 vs Xtensa LX
Processor Comparison
Option                                  Xtensa 6         Xtensa LX
Major ISA Configuration Options
  MAC16                                 Yes              Yes
  MUL16/MUL32                           Yes              Yes
  Floating Point Unit                   Yes              Yes
  Vectra LX SIMD DSP Engine             Not Available    Yes
  Linux MMU                             Yes              Not Available
Pipeline/Architecture Options
  Pipeline Stages                       5 stage          5 stage / 7 stage
  FLIX Technology                       Not Available    3- to 15-wide issue parallel execution
Processor Interface Options
  PIF and XLMI                          Yes              Yes
  Available Load/Store Units            One              One or Two
  Designer-Defined Ports and Queues     Not Available    Yes
Xtensa LX
Strongest selling point is performance
DSP operations can be encapsulated into custom instructions
High performance leads to power savings
Custom instructions target a special application
Xtensa LX vs General Purpose
Xtensa LX - Traditional Limitations
Xtensa LX
Options:
Xtensa LX - Highlights
Lower power usage
I/O throughput at RTL speeds
Outstanding compute performance
XPRES Compiler
Xtensa LX
Automated the insertion of fine-grain clock gating for every functional element of the Xtensa LX processor
This includes functions created by the designer
Direct I/O capability - like RTL
Extensible using FLIX (Flexible Length Instruction Xtensions)
Similar to VLIW - but customizable to fit application code's needs
Instructions formed using FLIX are recognized as native by the entire development system
XPRES Compiler
Process Generator
Also receive:
Preconfigured synthesis scripts, test benches, and software-development tools
C/C++ compiler, linker, debugger, and instruction-set simulator already modified to match the hardware configuration
Create special instructions described and written in TIE
TIE semantics allow system to modify software-development tools
Integrates changes into processor design
Compile with synthesis tool - test - order
Processor Configuration
Power Usage: 76 µW/MHz, 47 µW/MHz (5 and 7 stage pipeline)
Clock Speed: 350 MHz, 400 MHz (5 and 7 stage pipeline)
Cache:
64 general purpose physical registers (32-bits)
6 special purpose registers
Extensible via use of TIE and FLIX instructions
Zero overhead loops
Xtensa LX Architecture
General purpose register file
32-bit program counter
16 optional 1-bit boolean registers
16 optional 32-bit floating point registers
4 optional 32-bit MAC16 data registers
Optional Vectra LX DSP registers
Xtensa LX Architecture
Register File
32 or 64 registers
Instructions have access through a "sliding window" of 16 registers
Window can rotate by 4, 8, or 12 registers
Register window reduces code size by limiting the number of bits for the address and eliminates the need to save and restore register files
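The sliding-window idea above can be sketched in C. This is only an illustrative model, not Tensilica's implementation: the names `WindowModel`, `phys_index`, and `rotate` are hypothetical, and it assumes a 64-entry physical file whose window base is tracked in units of 4 registers and wraps around modulo the file size.

```c
#include <assert.h>

#define NUM_PHYS_REGS 64   /* assumed physical register file size */
#define WINDOW_SIZE   16   /* registers visible to an instruction */
#define ROTATE_UNIT   4    /* assumed granularity of the window base */

/* Hypothetical model: window_base counts groups of ROTATE_UNIT registers. */
typedef struct {
    unsigned window_base;
} WindowModel;

/* Map a visible register number (0..15) to a physical register index. */
static unsigned phys_index(const WindowModel *w, unsigned visible_reg)
{
    assert(visible_reg < WINDOW_SIZE);
    return (w->window_base * ROTATE_UNIT + visible_reg) % NUM_PHYS_REGS;
}

/* Rotate the window forward by 4, 8, or 12 registers (e.g. on a call). */
static void rotate(WindowModel *w, unsigned regs)
{
    assert(regs == 4 || regs == 8 || regs == 12);
    w->window_base = (w->window_base + regs / ROTATE_UNIT)
                     % (NUM_PHYS_REGS / ROTATE_UNIT);
}
```

Because an instruction only ever names one of 16 visible registers, a register field needs 4 bits even though the physical file is larger, which is where the code-size saving comes from. After a rotation the callee sees fresh registers without the caller's values being copied to memory.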
Xtensa LX Architecture
Xtensa LX Pipelining
5 or 7 Stage Pipeline Design
5 stage pipeline has stages: IF, Register Access, Execute, Data-Memory Access, and Register Writeback
5 stage pipeline accesses memory in two stages
7 stage pipeline is an extended version of the 5 stage pipeline with an extra IF and Memory Access stage
Extra stages provide more time for memory access
Designer can run at a higher clock speed while using slower memory to improve performance
ISA consists of 80 core instructions including both 16 and 24 bit instructions
Read Special Register, Write Special Register (RSR, WSR)
Used for saving and restoring context, processing interrupts and exceptions, controlling address translation
Access User Registers (RUR, WUR)
Used for coprocessor registers and registers created with TIE
ISYNC - wait for Instruction Fetch related changes to resolve
RSYNC - wait for Dispatch related changes to resolve
ESYNC/DSYNC - wait for memory/data execution related changes to resolve
MUL32
MUL16 adds a 16x16-bit multiplier
MAC16 adds a 16x16-bit multiplier and a 40-bit accumulator
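To make the multiply-accumulate option concrete, here is a small C model of a 16x16-bit multiply feeding a 40-bit accumulator. It is a sketch under assumptions: the function names are hypothetical, and it assumes the accumulator wraps in two's complement on overflow (real hardware may behave differently, for example by saturating).

```c
#include <stdint.h>

#define ACC_BITS 40

/* Wrap a 64-bit value into the signed 40-bit accumulator range. */
static int64_t wrap40(int64_t v)
{
    const uint64_t mask = (UINT64_C(1) << ACC_BITS) - 1;
    uint64_t u = (uint64_t)v & mask;
    if (u & (UINT64_C(1) << (ACC_BITS - 1)))  /* bit 39 is the sign bit */
        u |= ~mask;                            /* sign-extend to 64 bits */
    return (int64_t)u;
}

/* acc += a * b, where the 16x16 signed product fits in 32 bits. */
static int64_t mac16(int64_t acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return wrap40(acc + product);
}

/* Dot product of two 16-bit vectors using the modeled MAC. */
static int64_t dot16(const int16_t *x, const int16_t *y, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac16(acc, x[i], y[i]);
    return acc;
}
```

The 8 bits of headroom above a 32-bit product let a loop accumulate many products before overflow becomes a concern, which is the usual reason for pairing a 16x16 multiplier with a wider accumulator.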
Bits 0:3 of an Xtensa instruction determine its length and format; the bits have a value of 14 to specify that it is a Vectra LX instruction
Bits 4:27 - contain either an Xtensa LX core instruction or a Vectra LX Load or Store instruction
Bits 28:45 - contain either a MAC instruction or a select instruction
Bits 46:63 - contain either ALU and shift instructions or a load and store instruction for the second Vectra LX load/store unit
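The bit layout above can be illustrated with a short C sketch that splits a 64-bit instruction word into its format field and three slots. The type and function names are hypothetical, and the sketch assumes bit 0 is the least significant bit of the word, which the slides do not state.

```c
#include <stdint.h>

#define VECTRA_LX_FORMAT 14u   /* format value named in the text above */

/* Hypothetical container for the four fields of a 64-bit instruction. */
typedef struct {
    unsigned format;  /* bits 0..3:   length and format            */
    uint32_t slot0;   /* bits 4..27:  core or Vectra LX load/store */
    uint32_t slot1;   /* bits 28..45: MAC or select                */
    uint32_t slot2;   /* bits 46..63: ALU/shift or second load/store */
} Bundle;

/* Extract bits lo..hi (inclusive) of word; assumes bit 0 is the LSB. */
static uint64_t bits(uint64_t word, unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1;   /* at most 24 here, so the shift is safe */
    return (word >> lo) & ((UINT64_C(1) << width) - 1);
}

static Bundle unpack(uint64_t word)
{
    Bundle b;
    b.format = (unsigned)bits(word, 0, 3);
    b.slot0  = (uint32_t)bits(word, 4, 27);
    b.slot1  = (uint32_t)bits(word, 28, 45);
    b.slot2  = (uint32_t)bits(word, 46, 63);
    return b;
}

static int is_vectra_lx(const Bundle *b)
{
    return b->format == VECTRA_LX_FORMAT;
}
```

The field widths add up to the full word: 4 + 24 + 18 + 18 = 64 bits.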
TIE Compiler
Generates file used to configure software development tools so that they recognize TIE Extensions
Estimates hardware size of new instruction
You can modify application code to take advantage of the new instruction and simulate to decide if the speed advantage is worth the hardware cost
TIE
Resembles Verilog
More concise than RTL (it omits all sequential logic, pipeline registers, and initialization sequences)
The custom instructions and registers described in TIE are part of the processor's programming model.
New way to communicate with external devices
Queues: data can be sent or read through queues. A queue is defined in the TIE and the compiler generates the interface signals required for the additional port needed to connect to the queue. Logic is also automatically generated
Import-wire: processor can sample the value of an external signal
Export-state: drive an output based
TIE Fusion
Allows you to combine dependent operations into a single instruction
Consider: computing the average of two arrays
unsigned short *a, *b, *c;
. . .
for (i = 0; i < n; i++)
    c[i] = (a[i] + b[i]) >> 1;
TIE Fusion
Essentially
unsigned short *a, *b, *c;
. . .
for (i = 0; i < n; i++)
    c[i] = AVERAGE(a[i], b[i]);
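For readers who want to run the averaging loop on a host machine, the following C sketch stands in for the fused instruction with an ordinary function. The names `average_op` and `average_arrays` are hypothetical; on the real processor the fused operation would be a single custom instruction rather than a function call.

```c
#include <stddef.h>

/* Software stand-in for the fused operation: add, then shift right by one. */
static unsigned short average_op(unsigned short a, unsigned short b)
{
    /* Widen to unsigned so the 17-bit sum is kept before the shift. */
    return (unsigned short)(((unsigned)a + (unsigned)b) >> 1);
}

/* Average two arrays element by element using the fused operation. */
static void average_arrays(const unsigned short *a, const unsigned short *b,
                           unsigned short *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = average_op(a[i], b[i]);
}
```

The add and the shift are dependent operations, so on an unextended core they take two instructions per element; fusing them removes one instruction from the loop body.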
Fusing instructions into a "vector"
Allows replication of the same operation multiple times in one instruction
The following TIE code computes multiple iterations in a single instruction by combining Fusion and SIMD
regfile VEC 64 8 v
operation VAVERAGE {out VEC res, in VEC input0, in VEC input1} {} {
    wire [67:0] tmp = { input0[63:48] + input1[63:48],
                        input0[47:32] + input1[47:32],
                        input0[31:16] + input1[31:16],
                        input0[15:0] + input1[15:0] };
    assign res = { tmp[67:52], tmp[50:35], tmp[33:18], tmp[16:1] };
}
VEC *a, *b, *c;
for (i = 0; i < n; i += 4) {
    c[i] = VAVERAGE(a[i], b[i]);
}
TIE automatically creates new load, store instructions to move 64-bit vectors between VEC register file and memory
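The intended behavior of the vector average can be modeled in plain C as follows. This is a host-side sketch of the semantics only, assuming each 64-bit vector holds four unsigned 16-bit lanes with lane 0 in the low bits; the function name `vaverage` is hypothetical and is not part of any Tensilica API.

```c
#include <stdint.h>

/* Average four packed 16-bit lanes of two 64-bit vectors. */
static uint64_t vaverage(uint64_t in0, uint64_t in1)
{
    uint64_t res = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (uint32_t)((in0 >> (16 * lane)) & 0xFFFF);
        uint32_t y = (uint32_t)((in1 >> (16 * lane)) & 0xFFFF);
        uint32_t sum17 = x + y;   /* 17-bit intermediate sum per lane */
        /* Keep the upper 16 of the 17 bits, i.e. drop the low bit. */
        res |= (uint64_t)(sum17 >> 1) << (16 * lane);
    }
    return res;
}
```

Each lane keeps a 17-bit intermediate so that the carry out of the addition is not lost before the divide by two, which mirrors the 17-bit slices selected from `tmp` in the TIE description.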
FLIX
Extreme extensibility
Huge performance gains possible
Code size reduction without code bloat
FLIX
FLIX - Usage
Used selectively when parallelism is needed
Avoids code bloat
Used seamlessly and modelessly with standard 16- and 24-bit instructions
XPRES Compiler
Three optimization methods
Returns optimal configurations along with pros and cons (tradeoffs)
XPRES Compiler
Analyzes C/C++ code
Generates possible configurations
Compares performance criteria to silicon size (cost)
Returns possible configurations
Application dependent - operation slots in FLIX
XPRES - 4 Program Test
Programs: Filter, H.264 Deblocking, MPEG4 decoder
Chart labels: 6%, 23%, 63%
Better sound quality of compressed files because of increased precision available for intermediate calculations (24 bits rather than 16)
24-bit audio fully compatible with modern audio standards
Audio packages integrated into an SOC design, so no additional codec development required
Integrated Audio Packages: Dolby Digital AC-3 Decoder, Dolby Digital AC-3 Consumer Encoder, "QSound MicroQ", MP3 Encoder/Decoder, MPEG-4 aacPlus v1 and v2 Encoder/Decoder, MPEG-2/4 AAC LC Encoder/Decoder, WMA Encoder/Decoder, AMR narrowband speech codec, AMR wideband speech codec.
Uses over 300 audio-specific DSP instructions.
Features dual multiply-accumulate for 24x24 and 32x16 bit arithmetic on both units
"delivers noticeably superior sound quality even when decoding prerecorded 16-bit encoded music files."
Speed-up Example
GSM Audio Codec - written in C
Profiling code using unaltered RISC architecture showed that 80% of the processor cycles were devoted to multiplication
Simply by adding a hardware multiplier, the designer can reduce the number of cycles required from 204 million to 28 million
Speed-up Example
Like compression for the data
Consists of 8 logical operations
8 of these operations are used to decode each symbol in the received digital information stream
The designer can add a Viterbi instruction to the Xtensa ISA. The extension can use the 128-bit memory bus to load data for 8 symbols at once. This results in an average execution time of 0.16 cycles per butterfly. An unaugmented Xtensa LX executes Viterbi in 42 cycles.
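The cycle counts quoted in the two speed-up examples translate into speed-up factors with a one-line calculation, sketched here in C. The helper name `speedup` is hypothetical, and the Viterbi figure assumes the 42-cycle number is per butterfly, like the 0.16-cycle number it is compared against.

```c
/* Speed-up factor: how many times fewer cycles the improved design needs. */
static double speedup(double baseline_cycles, double improved_cycles)
{
    return baseline_cycles / improved_cycles;
}
```

With the numbers from the slides, `speedup(204e6, 28e6)` is about 7.3 for the GSM codec with a hardware multiplier, and `speedup(42.0, 0.16)` is 262.5 for the added Viterbi instruction.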
Xtensa LX received the highest benchmark score ever achieved on the Networking version 2 test.
Xtensa LX has a 4x code density advantage and a 100x advantage in both die area and power dissipation
Normalized (per MHz) EEMBC TCPmark
Simulates internet-enabled client-side performance
Processor: Xtensa LX Optimized, PowerPC 760GX, PowerPC MPC7447A, Xtensa LX Out of the Box
Normalized (by MHz) EEMBC IPmark
Simulates performance in network routers, gateways, and switches
Processor: Xtensa LX Optimized, PowerPC 760GX, Xtensa LX Out of the Box, PowerPC MPC7447A
0.1751
Processor: Xtensa LX Optimized, Xtensa LX Out of the Box, PowerPC 760GX, PowerPC MPC7447A
255,764 280,984
were very impressed with Tensilica's automated approach for both the processor extensions and the generation of the associated software tools."
Phone is digital broadcast enabled
Xtensa processor was used because it enabled LG to "cut design time significantly and be first to market with this exciting new technology."
Terrestrial digital-multimedia-broadcast system in Korea
Tensilica's announced licensees include Agilent, ALPS, AMCC (JNI Corporation), Astute Networks, ATI, Avision, Bay Microsystems, Berkeley Wireless Research Center, Broadcom, Cisco Systems, Conexant Systems, Cypress, Crimson Microsystems, ETRI, FUJI FILM Microdevices, Fujitsu Ltd., Hudson Soft, Hughes Network Systems, Ikanos Communications, LG Electronics, Marvell, NEC Laboratories America, NEC Corporation, NetEffect, Neterion, Nippon Telegraph and Telephone (NTT), NVIDIA, Olympus Optical Co. Ltd., sci-worx, Seiko Epson, Solid State Systems, Sony, STMicroelectronics, Stretch, TranSwitch Corporation, and Victor Company of Japan
Questions?