Abstract Future single-board multi-socket systems may be unable to deliver the needed memory bandwidth electrically due to power limitations, which will hurt their ability to drive performance improvements. Energy-efficient off-chip silicon photonics could be used to deliver the needed bandwidth, and it could be extended on-chip to create a relatively flat network topology.
Abstract The work in this paper addresses the need to evaluate the impact of emerging interconnect technologies, such as carbon nanotubes (CNTs), in the context of system applications. The critical properties of CNTs are described in terms of equivalent material parameters such that a general methodology of interconnect sizing can be used.
Abstract Monolithically integrated dense WDM photonic network topologies optimized for loss and power footprint of optical components can achieve up to 4x better energy-efficiency and throughput than electrical interconnects in core-to-core, and 10x in core-to-DRAM networks.
Abstract Silicon photonics is a promising technology for addressing memory bandwidth limitations in future many-core processors. This article first introduces a new monolithic silicon-photonic technology, which uses a standard bulk CMOS process to reduce costs and improve energy efficiency, and then explores the logical and physical implications of leveraging this technology in processor-to-memory networks.
Future single-board multi-socket systems may be unable to deliver the needed memory bandwidth electrically due to power limitations, which will hurt their ability to drive performance improvements. Energy-efficient off-chip silicon photonics could be used to deliver the needed bandwidth, and it could be extended on-chip to create a relatively flat network topology.
Abstract The ever-increasing number of transistors on a chip has resulted in very large scale integration (VLSI) systems whose performance and manufacturing costs are driven by on-chip wiring needs. This paper proposes a low-overhead wave-pipelined two-slot time division multiplexed (WP/2-TDM) routing technique that harnesses the inherent intra-clock period wire idleness to implement wire sharing in combination with wave-pipelined circuit techniques.
Abstract In a power- and area-constrained multicore system, the on-chip communication network needs to be carefully designed to maximize the system performance and programmer productivity while minimizing energy and area. In this paper, we explore the design of energy-efficient low-diameter networks (flattened butterfly and Clos) using equalized on-chip interconnects.
Abstract Future manycore processors will require energy-efficient, high-throughput on-chip networks. Silicon photonics is a promising new interconnect technology that offers lower power, higher bandwidth density, and shorter latencies than electrical interconnects. In this paper we explore using photonics to implement low-diameter non-blocking crossbar and Clos networks.
Abstract In the era of gigascale integration, both interconnect technologists and interconnect circuit designers must work together closely to ensure that the integrated circuit (IC) industry will overcome current and future interconnect limits on system performance, power dissipation, noise, and cost. This paper will review wave-pipelined interconnect circuits that are used to enhance wire performance and density.
Abstract This paper presents an overview of advances in highly-integrated photonic networks for emerging many-core processors. It explores the tight interaction among logical and physical implementations of all-to-all core-to-core and core-to-DRAM networks, and underlying photonic devices.
Abstract Manycore systems require energy-efficient on-chip networks that provide high throughput and low latency. The performance of these on-chip networks affects cache access latency and, consequently, system performance. This paper proposes solutions to address the performance limitations related to the use of a snoop-based cache coherence protocol on a switched network-on-chip (NoC).
Abstract Public-key cryptographic devices are vulnerable to fault-injection attacks. As countermeasures, a number of secure architectures based on linear and nonlinear error detecting codes were proposed. Linear codes provide protection only against primitive adversaries with limited attack capabilities. On the other hand, nonlinear codes provide protection against strong adversaries, but at the price of high area overhead (200%-400%).
Abstract Technology scaling will soon enable high-performance processors with hundreds of cores integrated onto a single die, but the success of such systems could be limited by the corresponding chip-level interconnection networks. There have been many recent proposals for nanophotonic interconnection networks that attempt to provide improved performance and energy-efficiency compared to electrical networks.
Abstract Multi-level cell (MLC) NAND flash memories are popular storage media because of their power efficiency and large storage density. Conventional reliable MLC NAND flash memories based on BCH codes or Reed-Solomon (RS) codes have a large number of undetectable and miscorrected errors. Moreover, standard decoders for BCH and RS codes cannot be easily modified to correct errors beyond their error correcting capability $t = \lfloor (d-1)/2 \rfloor$, where $d$ is the Hamming distance of the code.
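As a small illustration of the bound quoted in the abstract above, the following sketch evaluates t = floor((d - 1) / 2) for a few distances. The function name `correctable_errors` and the sample values of d are assumptions chosen for illustration; they are not taken from the paper.

```python
def correctable_errors(d: int) -> int:
    """Guaranteed error-correcting capability t = floor((d - 1) / 2)
    for a code whose Hamming distance is d."""
    if d < 1:
        raise ValueError("distance d must be a positive integer")
    return (d - 1) // 2


if __name__ == "__main__":
    # Illustrative distances only; not parameters from the paper.
    for d in (3, 5, 7, 8):
        print(f"d = {d}: t = {correctable_errors(d)}")
```

Note that an even distance gives no extra guaranteed correction over the odd distance just below it: under this bound, d = 7 and d = 8 both yield t = 3.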
We propose an efficient technique for the detection of errors in cryptographic circuits introduced by strong adversaries. Previously, a number of linear and nonlinear error detection schemes were proposed. Linear codes provide protection only against primitive adversaries, which no longer represent practice. On the other hand, nonlinear codes provide protection against strong adversaries, but at the price of high area overhead (200–300%).
Abstract The active on-chip network channel width has a direct impact on the cache and memory access latency in manycore processors. A good choice of channel width improves the application performance and energy efficiency. In manycore systems, where workload patterns change significantly over time, setting the network channel width statically for the average or worst-case traffic gives sub-optimal energy efficiency.
Abstract Power consumed by interconnect repeaters is a serious concern for future ICs. Ways to tackle this issue, such as unique optimization of repeater and logic transistor technologies, improved repeater insertion methods, and 3D integration, are discussed. These techniques reduce total power of a 22 nm 1.4 GHz low-power combinational logic block by 55% with negligible performance and area overheads.
Abstract Every new VLSI technology generation has resulted in interconnects increasingly limiting the performance, area, and power dissipation of new processors. Consequently, it is necessary to devise efficient interconnect design techniques to reduce the impact of VLSI interconnects on overall system design. New optimizations of a wave-pipelined multiplexed (WPM) interconnect routing circuit are described in this paper.
Papers by Ajay Joshi