M. Sami

Papers by M. Sami

Software Power Estimation of the LX Core: A Case Study

Power Estimation and Optimization Methodologies for VLIW-Based Embedded Systems

In this chapter we present an application of the introduced energy model to the Lx processor, a commercial 4-issue VLIW core jointly designed by HPLabs and STMicroelectronics. As we said previously, the characterization strategy takes into account several software-level parameters and provides an efficient instruction-level power model based on instruction clustering techniques. The accuracy of the proposed model has been qualified against an industrial gate-level simulation-based power estimation engine. The experimental results show an average error of 1.9% between the instruction-level model and the gate-level model, with a standard deviation of 5.8%. We conclude the chapter by showing how the presented power estimation methodology has been successfully applied to explore the power consumption at the software level by constructing a brand new horizontal execution-scheduling algorithm providing an average energy saving of 12%. Furthermore, we show how the proposed model has been extended to provide early power figures and energy/performance trade-offs of a future multi-clustered architecture of the same target processor.

Exploiting data forwarding to reduce the power budget of VLIW embedded processors

Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001

An instruction-level methodology for power estimation and optimization of embedded VLIW cores

Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition

Low-power data forwarding for VLIW embedded architectures

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2002

In this paper, we propose a low-power approach to the design of embedded very long instruction word (VLIW) processor architectures based on the forwarding (or bypassing) hardware, which provides operands from interstage pipeline registers directly to the inputs of the function units. The power optimization technique exploits the forwarding paths to avoid the power cost of writing/reading short-lived variables to/from the register file (RF). Such optimization is justified by the fact that, in application-specific embedded systems, a significant number of variables are short-lived, that is, their liveness (from first definition to last use) spans only a few instructions. Values of short-lived variables can thus be accessed directly through the forwarding registers, avoiding writeback to the RF by the producer instruction and the subsequent read from the RF by the consumer instruction. The decision concerning the enabling of the RF writeback phase is taken at compile time by the compiler static scheduling algorithm. This approach implies a minimal overhead on the complexity of the processor control logic and, thus, no critical path increase. The application of the proposed solution to a VLIW embedded core has shown an average RF power saving of 7.8% with respect to the unoptimized approach on the given set of target benchmarks (patent pending owned by ST Microelectronics).
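The compile-time writeback decision described in this abstract can be illustrated with a minimal sketch. It assumes a straight-line schedule with one operation per cycle, a hypothetical forwarding window `FORWARD_DEPTH`, and invented names (`Op`, `needs_writeback`, `live_out`); none of these come from the paper, whose scheduler works on full VLIW instructions.

```python
from dataclasses import dataclass

# Assumption: a result can be read from the forwarding registers for up to
# 2 cycles after it is produced. The real depth depends on the pipeline.
FORWARD_DEPTH = 2

@dataclass(frozen=True)
class Op:
    dest: str          # register defined by this operation
    srcs: tuple = ()   # registers read by this operation

def needs_writeback(schedule, live_out=frozenset()):
    """Return one flag per op: True if its result must be written to the RF."""
    flags = []
    for i, op in enumerate(schedule):
        last_use = None
        redefined = False
        for j in range(i + 1, len(schedule)):
            if op.dest in schedule[j].srcs:
                last_use = j          # latest consumer seen so far
            if schedule[j].dest == op.dest:
                redefined = True      # a new definition ends this lifetime
                break
        # A value still visible at the end of the block may be needed later.
        escapes = (not redefined) and op.dest in live_out
        short_lived = (last_use is not None
                       and last_use - i <= FORWARD_DEPTH
                       and not escapes)
        flags.append(not short_lived)
    return flags

schedule = [
    Op("t1", ("a", "b")),
    Op("t2", ("t1", "c")),
    Op("r", ("t2", "t1")),
    Op("s", ("r", "a")),
]
print(needs_writeback(schedule, live_out={"s"}))
```

In this toy schedule only `s` keeps its RF writeback, since every other value is consumed within the assumed forwarding window and is not live at the end of the block.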

Testability analysis and behavioral testing of the Hopfield neural paradigm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1998

Testability analysis and test pattern generation for neural architectures can be performed at a very high abstraction level on the computational paradigm. In this paper, we consider the case of Hopfield's networks, as the simplest example of networks with feedback loops. A behavioral error model based on Finite State Machines is introduced for this neural paradigm, to allow for a good representation of physical faults in widely different implementations. Conditions for controllability, observability and global testability are derived in order to verify whether all modeled errors are excited and propagated to the primary outputs. From the given abstract model, we define the behavioral test pattern generation algorithm which creates the minimum length test sequence for any digital implementation.

Active fault tolerant control for nonlinear systems with simultaneous actuator and sensor faults

International Journal of Control, Automation and Systems, 2013

The goal of this paper is to describe a novel fault tolerant tracking control (FTTC) strategy based on robust fault estimation and compensation of simultaneous actuator and sensor faults. Within the framework of fault tolerant control (FTC) the challenge is to develop an FTTC design strategy for nonlinear systems to tolerate simultaneous actuator and sensor faults that have bounded first time derivatives. The main contribution of this paper is the proposal of a new architecture based on a combination of actuator and sensor Takagi-Sugeno (T-S) proportional state estimators augmented with proportional and integral feedback (PPI) fault estimators together with a T-S dynamic output feedback control (TSDOFC) capable of time-varying reference tracking. Within this architecture the design freedom for each of the T-S estimators and the control system is available separately, with an important consequence on robust $L_2$ norm fault estimation and robust $L_2$ norm closed-loop tracking performance. The FTTC strategy is illustrated using a nonlinear inverted pendulum example with time-varying tracking of a moving linear position reference.

A Digital Front-end Readout Microsystem For Calorimeters At LHC

1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference

The Front-End Read-out MIcrosystem (FERMI) for calorimeters at LHC presented last year has been further developed with the aim of achieving a full silicon implementation in early 1994. Each microsystem will, as before, contain 9 channels with 15-16 bits dynamic range using 10-bit AD converters sampled every 15 ns. The function is accomplished by using a non-linear amplifier in front of the ADC to compress high amplitude signals, reversing the transformation to obtain overall linearity in a look-up table after the digitisation. The direct first-level trigger output for digitally filtered energy data is equipped with pulse recognition capability. The main data flow enters a pipeline memory from which relevant portions are extracted to second and third-level triggers via an adaptive 7-tap digital filter. The module is presently developed as 14 ASICs to be mounted on a silicon substrate (Multi-chip module, MCM). Circuit solutions to all analogue parts have already been implemented, tested and found to conform to specifications. The first digital ASIC will shortly be sent to fabrication. Fault tolerance strategies, i.e. redundancy, reconfigurability, concurrent processing and error corrections have been defined and integrated into the design.
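The compress-then-linearise scheme in this abstract can be sketched as follows. The abstract does not give the amplifier's transfer function, so the logarithmic law below is purely an assumption for illustration, as are the names `compress`, `LUT` and `linearise`; only the 16-bit range and 10-bit ADC width are taken from the text.

```python
import math

IN_BITS, ADC_BITS = 16, 10             # dynamic range and ADC width from the abstract
IN_MAX = (1 << IN_BITS) - 1
ADC_MAX = (1 << ADC_BITS) - 1
SCALE = math.log1p(IN_MAX)             # log of the full-scale input

def compress(x):
    """Assumed non-linear stage: logarithmic map of 0..IN_MAX onto 0..ADC_MAX."""
    return round(ADC_MAX * math.log1p(x) / SCALE)

# Look-up table applied after digitisation: one linearised value per ADC code.
LUT = [round(math.expm1(code * SCALE / ADC_MAX)) for code in range(ADC_MAX + 1)]

def linearise(code):
    """Invert the assumed compression with a single table read."""
    return LUT[code]

for amplitude in (0, 100, 5000, IN_MAX):
    code = compress(amplitude)
    print(amplitude, code, linearise(code))
```

With a law of this kind the quantisation error grows roughly in proportion to the signal amplitude, which is the trade that lets a narrow ADC cover a wide dynamic range.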

A high-level synthesis approach to optimum design of self-checking circuits

Proceedings EURO-DAC '96. European Design Automation Conference with EURO-VHDL '96 and Exhibition

We present an innovative solution to the design of self-checking systems implementing arithmetic algorithms. Rather than substituting self-checking units in a system synthesized independently of self-checking requirements, we introduce self-checking in high-level synthesis as a requirement already for scheduling the data flow graph (DFG). Rules granting error detection allow optimum partitioning of the DFG; minimum-latency, resource-constrained scheduling is performed with the support of such partitioning so as to optimize the number of checkers as well as that of other resources.

Energy estimation and optimization of embedded VLIW processors based on instruction clustering

Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324), 2002

The aim of this paper is to propose a methodology for the definition of an instruction-level energy estimation framework for VLIW (Very Long Instruction Word) processors. The power modeling methodology is the key issue in defining an effective energy-aware software optimisation strategy for state-of-the-art ILP (Instruction Level Parallelism) processors. The methodology is based on an energy model for VLIW processors that exploits instruction clustering to achieve an efficient and fine-grained energy estimation. The approach aims at reducing the complexity of the characterization problem for VLIW processors from exponential, with respect to the number of parallel operations in the same very long instruction, to quadratic, with respect to the number of instruction clusters. Furthermore, the paper proposes a spatial scheduling algorithm based on a low-power reordering of the parallel operations within the same long instruction. Experimental results have been carried out on the Lx processor, a 4-issue VLIW core jointly designed by HPLabs and STMicroelectronics. The results have shown an average error of 1.9% between the cluster-based estimation model and the reference design, with a standard deviation of 5.8%. For the Lx architecture, the spatial instruction scheduling algorithm provides an average energy saving of 12%.

Security Aspects in Networks-on-Chips: Overview and Proposals for Secure Implementations

10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), 2007

Security has gained increasing relevance in the development of embedded devices. Towards the aim of a secure system at each level of the design, in this paper we address the security aspects related to Networks-on-Chips (NoCs) architectures. After presenting the attacks that we believe are more likely to address NoCs, we survey existing academic and industrial secure architectures relevant to our case, focusing in particular on their communication infrastructure. We outline and propose possible solutions to counter some of the attacks described and suggest the use of the NoC as a means to monitor and detect unexpected system behaviors.

Power-aware branch prediction techniques

Proceedings of the 14th ACM Great Lakes symposium on VLSI - GLSVLSI '04, 2004

The main goal of the paper is to introduce a dynamic branch prediction scheme suitable for energy-aware VLIW (Very Long Instruction Word) processors. The proposed technique is based on a compiler hint mechanism to filter the accesses to the branch predictor blocks. Experimental results have been carried out on Lx/ST200, an industrial 4-issue VLIW architecture. We gathered two sets of results. First, by introducing the proposed low-power branch prediction technique in the Lx processor, which features fully static branch prediction, a significant improvement of the energy-delay metric has been observed. Second, we evaluated the filtering efficacy of the proposed method and found that it reduces accesses to the branch prediction unit by 93% with respect to a processor directly derived from Lx, featuring cycle-by-cycle prediction, corresponding to an average 9% energy reduction of the whole processor power budget.

An instruction-level energy model for embedded VLIW architectures

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2002

In this paper, an instruction-level energy model is proposed for the data-path of very long instruction word (VLIW) pipelined processors that can be used to provide accurate power consumption information during either an instruction-level simulation or power-oriented scheduling at compile time. The analytical model takes into account several software-level parameters (such as instruction ordering, pipeline stall probability, and instruction cache miss probability) as well as microarchitectural-level ones (such as pipeline stage power consumption per instruction) providing an efficient pipeline-aware instruction-level power estimation, whose accuracy is very close to that given by RT or gate-level simulations. The problem of instruction-level power characterization of a $K$-issue VLIW processor is $O(N^{2K})$, where $N$ is the number of operations in the ISA and $K$ is the number of parallel instructions composing the very long instruction. One of the advantages of the proposed model consists of reducing the complexity of the characterization problem to $O(K \times N^2)$. The proposed model has been used to characterize a four-issue VLIW core with a six-stage pipeline, and its accuracy and efficiency have been compared with respect to energy estimates derived by gate-level simulation. Experimental results (carried out on a set of embedded DSP benchmarks) have demonstrated an average error in accuracy of 4.8% of the instruction-level estimation engine with respect to the gate-level engine. The average simulation speed-up of the instruction-level power estimation engine with respect to the gate-level engine is approximately four orders of magnitude.
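To give a feel for the size of the characterization problem, the short calculation below compares the two growth rates. It assumes the complexity expressions in this abstract read N^(2K) for the exhaustive approach and K x N^2 for the proposed model; the ISA size of 64 operations is a hypothetical figure chosen for illustration, and the function names are invented.

```python
def exhaustive_cost(n_ops, issue_width):
    """Assumed exhaustive cost: every pair of long instructions, (N^K)^2 = N^(2K)."""
    return n_ops ** (2 * issue_width)

def model_cost(n_ops, issue_width):
    """Assumed reduced cost: K * N^2 characterization points."""
    return issue_width * n_ops ** 2

N_OPS = 64   # hypothetical number of operations in the ISA
K = 4        # four-issue core, as in the abstract

print(exhaustive_cost(N_OPS, K))   # 64**8 = 2**48
print(model_cost(N_OPS, K))        # 4 * 64**2
print(exhaustive_cost(N_OPS, K) // model_cost(N_OPS, K))
```

Under these assumptions the exhaustive count is about 2.8e14 while the reduced count is 16384, which shows why a pairwise characterization of full long instructions is impractical.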

Sensitivity to errors in artificial neural networks: a behavioral approach

IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 1995

The problem of sensitivity to errors in artificial neural networks is discussed here con...

Reducing the complexity of instruction-level power models for VLIW processors

Design Automation for Embedded Systems, 2005

FERMI: a digital Front End and Readout MIcrosystem for high resolution calorimetry

Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 1995

Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach

The paper introduces a dynamic branch prediction scheme suitable for energy-aware Very Long Instruction Word (VLIW) processors. The proposed technique is based on a compiler hint mechanism to filter the accesses to the branch predictor blocks. We define a configurable hint instruction which anticipates some static information about the upcoming branch to reduce the hardware involved in the prediction and thus the energy consumption. To analyze the effectiveness of the proposed low-power branch prediction scheme, we combined it with some well-known dynamic branch prediction techniques suitable for VLIW processors. The analyzed branch predictors are characterized by simple hardware implementations, matching the low-power characteristics of the target VLIW processors. Experimental results have been carried out on Lx, an industrial 4-issue VLIW architecture.

Designing for yield: a defect-tolerant approach to high-level synthesis

by Giacomo Buonanno and M. Sami

Proceedings 1998 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (Cat. No.98EX223), 1998

Defect-tolerant techniques can be effectively applied to regular structures which allow a very simple reconfiguration technique. A typical example is represented by memories, where algorithms for row and column elimination grant very good results with a limited area overhead (namely, a limited number of spare rows and columns). The reconfiguration technologies developed for memories could be applied to other devices

Design of Fault Tolerant Network Interfaces for NoCs

by Laura Micconi and M. Sami

2011 14th Euromicro Conference on Digital System Design, 2011

As the complexity of designs increases and technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the networks-on-chip (NoCs) components increases. In this work, we focus on the study and evaluation of techniques for increasing reliability and resilience of network interfaces (NIs) within NoC-based multiprocessor system-on-chip architectures. NIs act as interfaces between intellectual property cores and the communication infrastructure; the faulty behavior of one of them could affect, therefore, the overall system. In this work, we propose a functional fault model for the NI components by evaluating their susceptibility to faults. We present a two-level fault-tolerant solution that can be employed for mitigating the effects of both permanent and temporary faults in the NI. Experimental simulations show that with a limited overhead, we can obtain an NI reliability comparable to the one obtainable by implementing the system by using standard triple modular redundancy techniques, while saving up to 48 percent in area, as well as obtaining a significant energy reduction.

Human TCR-Binding Affinity is Governed by MHC Class Restriction

The Journal of Immunology, 2007

T cell recognition is initiated by the binding of TCRs to peptide-MHCs (pMHCs), the interaction being characterized by weak affinity and fast kinetics. Previously, only 16 natural TCR/pMHC interactions have been measured by surface plasmon resonance (SPR). Of these, 5 are murine class I, 5 are murine class II, and 6 are human class I-restricted responses. Therefore, a significant gap exists in our understanding of human TCR/pMHC binding due to the limited SPR data currently available for human class I responses and the absence of SPR data for human class II-restricted responses. We have produced a panel of soluble TCR molecules originating from human T cells that respond to naturally occurring disease epitopes and their cognate pMHCs. In this study, we compare the binding affinity and kinetics of eight class-I-specific TCRs (TCR-Is) to pMHC-I with six class-II-specific TCRs (TCR-IIs) to pMHC-II using SPR. Overall, there is a substantial difference in the TCR-binding equilibrium constants for pMHC-I and pMHC-II, which arises from significantly faster on-rates for TCRs binding to pMHC-I. In contrast, the off-rates for all human TCR/pMHC interactions fall within a narrow window regardless of class restriction, thereby providing experimental support for the notion that binding half-life is the principal kinetic feature controlling T cell activation.

New trends in intelligent system design for embedded and measurement applications

IEEE Instrumentation & Measurement Magazine, 1999

Intelligent systems adopt soft-computing techniques (encompassing neural networks, fuzzy logic, genetic algorithms, and expert systems) to solve complex problems by mimicking human reasoning. On the other hand, conventional algorithmic approaches are extremely powerful and efficient in tackling applications for which a procedural solution can be easily defined. By themselves, each of these techniques may be the optimal solution for a subproblem, but not efficient enough to solve the problem as a whole. Composite systems, consisting of conventional and soft-computing components in cooperation, are now more than a promise to face complex application needs. In this article we present recent advances in the design of composite systems, with specific reference to embedded and measurement applications.

Software Power Estimation of the LX Core: A Case Study

Power Estimation and Optimization Methodologies for VLIW-Based Embedded Systems

In this chapter we present an application of the introduced energy model‚ to theLx processor‚ a c... more In this chapter we present an application of the introduced energy model‚ to theLx processor‚ a commercial 4-issue VLIW core jointly designed by HPLabs and STMicroelectronics. As we said previously‚ the characterization strategy takes into account several software-level parameters and provides an efficient instruction-level power model based on instruction clustering techniques. The accuracy of the proposed model has been qualified against a industrial gate-level simulation-based power estimation engine. The experimental results show an average error of 1.9% between the instruction-level model and the gate-level model‚ with a standard deviation of 5.8%. We conclude the chapter by showing how the presented power estimation methodology has been succesfully applied to explore the power consumption at the software level by constructing a brand new horizontal execution-scheduling algorithm providing an average energy saving of 12%. Furthermore‚ we show how the proposed model has been extended to provide early power figures and energy/performance trade-offs of a future multi-clusterd architecture of the same target processor.

Exploiting data forwarding to reduce the power budget of VLIW embedded processors

Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001

An instruction-level methodology for power estimation and optimization of embedded VLIW cores

Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition

Low-power data forwarding for VLIW embedded architectures

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2002

In this paper, we propose a low-power approach to the design of embedded very long instruction wo... more In this paper, we propose a low-power approach to the design of embedded very long instruction word (VLIW) processor architectures based on the forwarding (or bypassing) hardware, which provides operands from interstage pipeline registers directly to the inputs of the function units. The power optimization technique exploits the forwarding paths to avoid the power cost of writing/reading short-lived variables to/from the register file (RF). Such optimization is justified by the fact that, in application-specific embedded systems, a significant number of variables are short-lived, that is, their liveness (from first definition to last use) spans only few instructions. Values of short-lived variables can thus be accessed directly through the forwarding registers, avoiding writeback to the RF by the producer instruction and successive read from the RF by the consumer instruction. The decision concerning the enabling of the RF writeback phase is taken at compile time by the compiler static scheduling algorithm. This approach implies a minimal overhead on the complexity of the processor control logic and, thus, no critical path increase. The application of the proposed solution to a VLIW embedded core has shown an average RF power saving of 7.8% with respect to the unoptimized approach on the given set of target benchmarks (patent pending owned by ST Microelectronics).

Testability analysis and behavioral testing of the Hopfield neural paradigm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1998

Testability analysis and test pattern generation for neural architectures can be performed at a v... more Testability analysis and test pattern generation for neural architectures can be performed at a very high abstraction level on the computational paradigm. In this paper, we consider the case of Hopfield's networks, as the simplest example of networks with feedback loops. A behavioral error model based on Finite State Machines is introduced for this neural paradigm, to allow for a good representation of physical faults in widelydifferent implementations. Conditions for controllability, observability and global testability are derived in order to verify whether all modeled errors are excited and propagated to the primary outputs. From the given abstract model, we define the behavioral test pattern generation algorithm which creates the minimum length test sequence for any digital implementation.

Active fault tolerant control for nonlinear systems with simultaneous actuator and sensor faults

International Journal of Control, Automation and Systems, 2013

The goal of this paper is to describe a novel fault tolerant tracking control (FTTC) strategy bas... more The goal of this paper is to describe a novel fault tolerant tracking control (FTTC) strategy based on robust fault estimation and compensation of simultaneous actuator and sensor faults. Within the framework of fault tolerant control (FTC) the challenge is to develop an FTTC design strategy for nonlinear systems to tolerate simultaneous actuator and sensor faults that have bounded first time derivatives. The main contribution of this paper is the proposal of a new architecture based on a combination of actuator and sensor Takagi-Sugeno (T-S) proportional state estimators augmented with proportional and integral feedback (PPI) fault estimators together with a T-S dynamic output feedback control (TSDOFC) capable of time-varying reference tracking. Within this architecture the design freedom for each of the T-S estimators and the control system are available separately with an important consequence on robust L 2 norm fault estimation and robust L 2 norm closed-loop tracking performance. The FTTC strategy is illustrated using a nonlinear inverted pendulum example with time-varying tracking of a moving linear position reference.

A Digital Front-end Readout Mlcrosystem For Calorimeters At LHC

1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference

The Front-End Read-out MIcrosystem (FERMI) for calorimeters at LHC presented last year has been f... more The Front-End Read-out MIcrosystem (FERMI) for calorimeters at LHC presented last year has been further developed with the aim to achieve a full silicon implementation early 1994. Each microsystem will, as before, contain 9 channels with 15-16 bits dynamic range using IO-bit AD converters sampled every 15 ns. The function is accomplished by using a non-linear amplifier in front of the ADC to compress high amplitude signals, reversing the transformation to obtain overall linearity in a look-up table after the digitisation. The direct first-level trigger output for digitally filtered energy data is equipped with pulse recognition capability. The main data flow enters a pipeline memory from which relevant portions are extracted to second and third-level triggers via an adaptive 7-tap digital filter. The module is presently developed as 14 ASICs to be mounted on a silicon substrate (Multi-chip module, MCM). Circuit solutions to all analogue parts have already been implemented, tested and found to conform to specifications. The first digital ASIC will shortly be sent to fabrication. Fault tolerance strategies, i.e. redundancy, reconfigurability, concurrent processing and error corrections have been defined and integrated into the design.

A high-level synthesis approach to optimum design of self-checking circuits

Proceedings EURO-DAC '96. European Design Automation Conference with EURO-VHDL '96 and Exhibition

We present an innovative solution to design of selfchecking systems implementing arithmetic algor... more We present an innovative solution to design of selfchecking systems implementing arithmetic algorithms. Rather than substituting self-checking units in system synthesized independently of self-checking requirements, we introduce self-checking in high-level synthesis as a requirement already for scheduling the DFG. Rules granting error detection allow optimum partitioning of the DFG; minimum-latency, resource-constrained scheduling is performed with the support of such partitioning so as to optimize the number of checkers as well as that of other resources.

Energy estimation and optimization of embedded VLIW processors based on instruction clustering

Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324), 2002

Aim of this paper is to propose a methodology for the definition of an instruction-level energy e... more Aim of this paper is to propose a methodology for the definition of an instruction-level energy estimation framework for VLIW (Very Long Instruction Word) processors. The power modeling methodology is the key issue to define an effective energy-aware software optimisation strategy for stateof-the-art ILP (Instruction Level Parallelism) processors. The methodology is based on an energy model for VLIW processors that exploits instruction clustering to achieve an efficient and fine grained energy estimation. The approach aims at reducing the complexity of the characterization problem for VLIW processors from exponential, with respect to the number of parallel operations in the same very long instruction, to quadratic, with respect to the number of instruction clusters. Furthermore, the paper proposes a spatial scheduling algorithm based on a low-power reordering of the parallel operations within the same long instruction. Experimental results have been carried out on the Lx processor, a 4-issue VLIW core jointly designed by HPLabs and STMicroelectronics. The results have shown an average error of 1.9% between the cluster-based estimation model and the reference design, with a standard deviation of 5.8%. For the Lx architecture, the spatial instruction scheduling algorithm provides an average energy saving of 12%.

Security Aspects in Networks-on-Chips: Overview and Proposals for Secure Implementations

10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), 2007

Security has gained increasing relevance in the development of embedded devices. Towards the aim ... more Security has gained increasing relevance in the development of embedded devices. Towards the aim of a secure system at each level of the design, in this paper we address the security aspects related to Networks-on-Chips (NoCs) architectures. After presenting the attacks that we believe are more likely to address NoCs, we survey existing academic and industrial secure architectures relevant to our case, focusing in particular on their communication infrastructure. We outline and propose possible solutions to contrast some of the attacks described and suggest the use of the NoC as mean to monitor and detect unexpected system behaviors 1 .

Power-aware branch prediction techniques

Proceedins of the 14th ACM Great Lakes symposium on VLSI - GLSVLSI '04, 2004

Main goal of the paper is introducing a dynamic branch prediction scheme suitable for energy-awar... more Main goal of the paper is introducing a dynamic branch prediction scheme suitable for energy-aware VLIW (Very Long Instruction Word) processors. The proposed technique is based on a compiler hint mechanism to filter the accesses to the branch predictor blocks. Experimental results have been carried out on Lx/ST200, an industrial 4-issue VLIW architecture. We gathered two sets of results: First, by introducing the proposed low-power branch prediction technique in the Lx processor, which features fully static branch prediction, a significant improvement of the energy-delay metric has been observed. Second, we evaluated filtering efficacy of the proposed method and we found that it gets an access reduction to the branch prediction unit of 93% with respect to a processor directly derived from Lx, featuring cycle-by-cycle prediction, corresponding to an average 9% energy reduction of the whole processor power budget.

An instruction-level energy model for embedded VLIW architectures

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2002

In this paper, an instruction-level energy model is proposed for the data-path of very long instruction word (VLIW) pipelined processors that can be used to provide accurate power consumption information during either an instruction-level simulation or power-oriented scheduling at compile time. The analytical model takes into account several software-level parameters (such as instruction ordering, pipeline stall probability, and instruction cache miss probability) as well as microarchitectural-level ones (such as pipeline stage power consumption per instruction), providing an efficient pipeline-aware instruction-level power estimation whose accuracy is very close to that given by RT or gate-level simulations. The problem of instruction-level power characterization of a $K$-issue VLIW processor is $O(N^{2K})$, where $N$ is the number of operations in the ISA and $K$ is the number of parallel instructions composing the very long instruction. One of the advantages of the proposed model consists of reducing the complexity of the characterization problem to $O(K \times N^2)$. The proposed model has been used to characterize a four-issue VLIW core with a six-stage pipeline, and its accuracy and efficiency have been compared with energy estimates derived by gate-level simulation. Experimental results (carried out on a set of embedded DSP benchmarks) have demonstrated an average error of 4.8% for the instruction-level estimation engine with respect to the gate-level engine. The average simulation speed-up of the instruction-level power estimation engine with respect to the gate-level engine is approximately four orders of magnitude.
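To give a sense of scale for the complexity reduction described in the abstract above, the following minimal sketch compares the two growth rates for a four-issue core. It assumes the exhaustive characterization grows as N^(2K) and the reduced one as K × N^2; the ISA size of 64 operations and the function names are hypothetical, chosen only for illustration.

```python
def exhaustive_cost(n_ops: int, issue_width: int) -> int:
    """Assumed cost of exhaustive characterization: N^(2K)."""
    return n_ops ** (2 * issue_width)


def reduced_cost(n_ops: int, issue_width: int) -> int:
    """Assumed cost of the reduced characterization: K * N^2."""
    return issue_width * n_ops ** 2


N = 64  # hypothetical number of operations in the ISA
K = 4   # issue width of the four-issue core named in the abstract

full = exhaustive_cost(N, K)
reduced = reduced_cost(N, K)

print(f"exhaustive: {full} characterization points")
print(f"reduced:    {reduced} characterization points")
print(f"ratio:      {full // reduced}")
```

Under these assumed numbers the exhaustive space is 64^8 = 2^48 points, while the reduced one is 4 × 64^2 = 16384.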

Sensitivity to errors in artificial neural networks: a behavioral approach

IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 1995

The problem of sensitivity to errors in artificial neural networks is discussed here con...

Reducing the complexity of instruction-level power models for VLIW processors

Design Automation for Embedded Systems, 2005

FERMI: a digital Front End and Readout MIcrosystem for high resolution calorimetry

Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 1995

Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach

The paper introduces a dynamic branch prediction scheme suitable for energy-aware Very Long Instruction Word (VLIW) processors. The proposed technique is based on a compiler hint mechanism to filter the accesses to the branch predictor blocks. We define a configurable hint instruction which anticipates some static information about the upcoming branch to reduce the hardware involved in the prediction and thus the energy consumption. To analyze the effectiveness of the proposed low-power branch prediction scheme, we combined it with some well-known dynamic branch prediction techniques suitable for VLIW processors. The analyzed branch predictors are characterized by simple hardware implementations, matching the low-power characteristics of the target VLIW processors. Experiments have been carried out on Lx, an industrial 4-issue VLIW architecture.

Designing for yield: a defect-tolerant approach to high-level synthesis

by Giacomo Buonanno and M. Sami

Proceedings 1998 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (Cat. No.98EX223), 1998

Defect-tolerant techniques can be effectively applied to regular structures which allow a very simple reconfiguration technique. A typical example is represented by memories, where algorithms for row and column elimination grant very good results with a limited area overhead (namely, a limited number of spare rows and columns). The reconfiguration technologies developed for memories could be applied to other devices

Design of Fault Tolerant Network Interfaces for NoCs

by Laura Micconi and M. Sami

2011 14th Euromicro Conference on Digital System Design, 2011

As the complexity of designs increases and technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the networks-on-chip (NoCs) components increases. In this work, we focus on the study and evaluation of techniques for increasing reliability and resilience of network interfaces (NIs) within NoC-based multiprocessor system-on-chip architectures. NIs act as interfaces between intellectual property cores and the communication infrastructure; the faulty behavior of one of them could affect, therefore, the overall system. In this work, we propose a functional fault model for the NI components by evaluating their susceptibility to faults. We present a two-level fault-tolerant solution that can be employed for mitigating the effects of both permanent and temporary faults in the NI. Experimental simulations show that with a limited overhead, we can obtain an NI reliability comparable to the one obtainable by implementing the system by using standard triple modular redundancy techniques, while saving up to 48 percent in area, as well as obtaining a significant energy reduction.
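The triple modular redundancy baseline mentioned in the abstract above can be illustrated with a bitwise majority voter. This is a generic textbook sketch, not the network-interface design from the paper; the function name and the 8-bit example values are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three replica outputs.

    Each result bit is 1 when at least two of the three replicas have a 1
    in that position, so a fault confined to one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


golden = 0b1011_0010            # hypothetical fault-free output
faulty = golden ^ 0b0000_1000   # same value with one bit flipped

# Two correct replicas outvote the faulty one.
assert tmr_vote(golden, faulty, golden) == golden
```

The voter masks any fault confined to a single replica, at the cost of tripling the protected logic, which is the area overhead the abstract compares against.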

Human TCR-Binding Affinity is Governed by MHC Class Restriction

The Journal of Immunology, 2007

T cell recognition is initiated by the binding of TCRs to peptide-MHCs (pMHCs), the interaction being characterized by weak affinity and fast kinetics. Previously, only 16 natural TCR/pMHC interactions have been measured by surface plasmon resonance (SPR). Of these, 5 are murine class I, 5 are murine class II, and 6 are human class I-restricted responses. Therefore, a significant gap exists in our understanding of human TCR/pMHC binding due to the limited SPR data currently available for human class I responses and the absence of SPR data for human class II-restricted responses. We have produced a panel of soluble TCR molecules originating from human T cells that respond to naturally occurring disease epitopes and their cognate pMHCs. In this study, we compare the binding affinity and kinetics of eight class-I-specific TCRs (TCR-Is) to pMHC-I with six class-II-specific TCRs (TCR-IIs) to pMHC-II using SPR. Overall, there is a substantial difference in the TCR-binding equilibrium constants for pMHC-I and pMHC-II, which arises from significantly faster on-rates for TCRs binding to pMHC-I. In contrast, the off-rates for all human TCR/pMHC interactions fall within a narrow window regardless of class restriction, thereby providing experimental support for the notion that binding half-life is the principal kinetic feature controlling T cell activation.

New trends in intelligent system design for embedded and measurement applications

IEEE Instrumentation & Measurement Magazine, 1999

Intelligent systems adopt soft-computing techniques (encompassing neural networks, fuzzy logic, genetic algorithms, and expert systems) to solve complex problems by mimicking human reasoning. On the other hand, conventional algorithmic approaches are extremely powerful and efficient in tackling applications for which a procedural solution can be easily defined. By themselves, each of these techniques may be the optimal solution for a subproblem, but not efficient enough to solve the problem as a whole. Composite systems, consisting of conventional and soft-computing components in cooperation, are now more than a promise for meeting complex application needs. In this article we present recent advances in the design of composite systems, with specific reference to embedded and measurement applications.