Pieter Van Der Wolf

Papers by Pieter Van Der Wolf

Network Calculus Applied to Verification of Memory Access Performance in SoCs

2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2007

... Tomas Henriksson and Pieter van der Wolf, NXP Semiconductors Research, Eindhoven, The Netherlands
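The abstract of this paper is not included here. As general background only, network calculus bounds worst-case latency by combining an arrival curve for a traffic source with a service curve for the resource serving it. For the textbook case of a token-bucket arrival curve α(t) = σ + ρt and a rate-latency service curve β(t) = R·max(0, t − T) with ρ ≤ R, the delay bound is T + σ/R and the backlog bound is σ + ρT. The sketch below only evaluates these two standard formulas; it is not the verification method of the paper, and the memory-client numbers are hypothetical.

```python
def delay_bound(sigma: float, rho: float, rate: float, latency: float) -> float:
    """Worst-case delay of a token-bucket flow (burst sigma, rate rho)
    served by a rate-latency server (rate R, latency T): T + sigma / R."""
    if rho > rate:
        raise ValueError("arrival rate exceeds service rate; no finite bound")
    return latency + sigma / rate


def backlog_bound(sigma: float, rho: float, rate: float, latency: float) -> float:
    """Worst-case backlog for the same pair of curves: sigma + rho * T."""
    if rho > rate:
        raise ValueError("arrival rate exceeds service rate; no finite bound")
    return sigma + rho * latency


# Hypothetical memory client: bursts of 64 bytes at 0.5 bytes/cycle on average,
# served by a memory port guaranteeing 2 bytes/cycle after 10 cycles of latency.
print(delay_bound(64, 0.5, 2.0, 10))    # 42.0 (cycles)
print(backlog_bound(64, 0.5, 2.0, 10))  # 69.0 (bytes)
```

The stability check matters: if the long-term arrival rate exceeds the guaranteed service rate, the backlog grows without bound and neither formula applies.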

Modular SoC Integration with Subsystems: The Audio Subsystem Case

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013

TTL Hardware Interface: A High-Level Interface for Streaming Multiprocessor Architectures

Embedded Systems for Real-Time Multimedia, 2006

Digital chips for multimedia applications use function-specific hardware co-processors to achieve high performance at low power consumption. These co-processors are typically equipped with traditional address-based interfaces. Networks-on-chip (NoCs) are emerging as a scalable interconnect for advanced digital chips. Integrating co-processors with NoCs requires load/store packetizing wrappers on the network interfaces. This leads to unnecessary address generation and address transportation over ...
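The abstract contrasts address-based load/store interfaces with a higher-level streaming interface. As a minimal sketch of that contrast (not the TTL interface itself; the class and method names are hypothetical), a bounded FIFO channel lets a producer and a consumer exchange data tokens while all buffer addressing stays inside the channel, so neither side generates addresses.

```python
class StreamChannel:
    """Hypothetical bounded FIFO channel: producer and consumer exchange
    tokens; circular-buffer addressing is hidden inside the channel."""

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * capacity
        self._capacity = capacity
        self._read = 0   # index of the oldest stored token
        self._count = 0  # number of tokens currently stored

    def room(self) -> int:
        """Free slots the producer may still fill."""
        return self._capacity - self._count

    def data(self) -> int:
        """Tokens available to the consumer."""
        return self._count

    def write(self, token) -> bool:
        """Append a token; returns False (and stores nothing) if full."""
        if self._count == self._capacity:
            return False
        self._buf[(self._read + self._count) % self._capacity] = token
        self._count += 1
        return True

    def read(self):
        """Remove and return the oldest token; raises if the channel is empty."""
        if self._count == 0:
            raise IndexError("read from empty channel")
        token = self._buf[self._read]
        self._read = (self._read + 1) % self._capacity
        self._count -= 1
        return token


ch = StreamChannel(2)
ch.write(1)
ch.write(2)
print(ch.write(3))  # False: channel is full
print(ch.read())    # 1
```

The `room()`/`data()` queries let a hardware or software task check availability before committing to a transfer, which is one common way streaming interfaces avoid blocking mid-operation.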

Transparent Embedded Compression in Systems-on-Chip

by René van der Vleuten and Pieter Van Der Wolf

2006 IEEE Workshop on Signal Processing Systems Design and Implementation, 2006

Bandwidth to off-chip memory is a scarce resource in complex systems-on-chip for embedded media processing. We apply embedded compression to bandwidth-hungry image processing functions in order to alleviate this bandwidth bottleneck. In our solution, embedded compression is implemented as part of the system-on-chip infrastructure and is fully transparent to the hardware and software image processing components. Hence it can be applied without ...

Industrial IP integration flows based on IP-XACT™ standards

by Emmanuel Vaumorin, Pieter Van Der Wolf, and Wolfgang Ecker

Proceedings of the conference on Design, automation and test in Europe - DATE '08, 2008

Effective integration of advanced systems-on-chip (SoC) requires extensive reuse of IP m...

Design and programming of embedded multiprocessors

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04, 2004

We present design technology for the structured design and programming of embedded multi-processor systems. It comprises a task-level interface that can be used both for developing parallel application models and as a platform interface for implementing applications on multi-processor architectures. Associated mapping technology supports refinement of application models towards implementation. By linking application development and implementation aspects, the technology integrates the specification and design phases in the MPSoC design process. Two design cases demonstrate the efficient implementation of the platform interface on different architectures. Industry-wide standardization of a task-level interface can facilitate reuse of function-specific hardware/software modules across companies.

Design space exploration of streaming multiprocessor architectures

IEEE Workshop on Signal Processing Systems, 2002

In this paper, we present a comparison of two design-space exploration approaches. The comparison...

A Methodology to Design Programmable Embedded Systems

Lecture Notes in Computer Science, 2002

Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts, one of which is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology for obtaining architectures that satisfy a set of constraints while offering enough flexibility to support a given set of applications.
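In the Y-chart approach, applications and architecture instances are specified separately, the applications are mapped onto each candidate architecture, and the resulting performance numbers drive the choice of architecture. A single exploration step can be sketched as below. The throughput-only cost model, the `area` metric and all names are hypothetical stand-ins for the mapping and simulation steps the article describes, not its actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    ops_per_cycle: float  # hypothetical aggregate throughput
    area: float           # hypothetical cost metric


@dataclass(frozen=True)
class Application:
    name: str
    ops_per_frame: int
    cycle_budget: int     # cycles available per frame (the constraint)


def estimated_cycles(app: Application, arch: Architecture) -> float:
    """Stand-in for 'map and simulate': a crude throughput model."""
    return app.ops_per_frame / arch.ops_per_cycle


def y_chart_step(apps: list[Application], archs: list[Architecture]) -> list[Architecture]:
    """Keep architectures that meet every application's constraint,
    cheapest first; the next step would refine only these candidates."""
    feasible = [
        arch for arch in archs
        if all(estimated_cycles(app, arch) <= app.cycle_budget for app in apps)
    ]
    return sorted(feasible, key=lambda arch: arch.area)


apps = [Application("decoder", 1000, 500), Application("filter", 600, 400)]
archs = [Architecture("small", 1.0, 1.0),
         Architecture("big", 4.0, 3.0),
         Architecture("mid", 2.5, 2.0)]
print([a.name for a in y_chart_step(apps, archs)])  # ['mid', 'big']
```

Requiring every application to meet its budget reflects the article's point that a programmable architecture must support a whole set of applications rather than a single one.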

An approach for quantitative analysis of application-specific dataflow architectures

Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors, 1997

In this paper we present an approach for quantitative analysis of application-specific dataflow architectures. The approach allows the designer to rate design alternatives quantitatively and therefore supports the search for better-performing architectures. The context of our work is video signal processing algorithms that are mapped onto weakly programmable, coarse-grain dataflow architectures. The algorithms are represented as Kahn graphs whose nodes are coarse-grain functions. We have implemented an architecture simulation environment that permits the definition of dataflow architectures as a composition of architecture elements, such as functional units, buffer elements and communication structures. The abstract, clock-cycle-accurate simulator has been built using a multi-threading package and employs object-oriented principles, which results in a configurable and efficient simulator. Algorithms can subsequently be executed on the architecture model, producing quantitative information for selected performance metrics. Results are presented for the simulation of a realistic application on several dataflow architecture alternatives, showing that many different architectures can be simulated in modest time on a modern workstation.

The construction of a retargetable simulator for an architecture template

Proceedings of the sixth international workshop on Hardware/software codesign - CODES/CASHE '98, 1998

Systems in the domain of high-performance video signal processing are becoming more and more programmable. We suggest an approach to designing such systems that involves measuring, via simulation, the performance of various architectures onto which a set of applications is mapped. This approach requires a retargetable simulator for an architecture template. We describe the retargetable simulator that we constructed for a stream-oriented, application-specific dataflow architecture. For each architecture instance of the architecture template, a specific simulator is derived in three steps: the architecture instance is constructed, an execution model is added, and the executable architecture is instrumented to obtain performance numbers. We used object-oriented principles together with a high-level simulation mechanism to ensure retargetability and high simulation speed. Finally, we explain how a retargetable simulator can be encapsulated within an environment for automated design space exploration.
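The three derivation steps (construct the instance, add an execution model, instrument it) can be illustrated with a toy template. This is a sketch under stated assumptions, not the authors' simulator: the template is assumed to be a linear chain of functional units parameterized only by per-token latency, buffers between units are assumed unbounded, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionalUnit:
    name: str
    latency: int          # cycles to process one token (hypothetical parameter)
    busy_cycles: int = 0  # performance counter filled in by instrumentation


def build_instance(template_params: dict[str, int]) -> list[FunctionalUnit]:
    """Step 1: instantiate the toy template as a linear chain of units."""
    return [FunctionalUnit(name, lat) for name, lat in template_params.items()]


def simulate(units: list[FunctionalUnit], tokens: int) -> int:
    """Steps 2 and 3: a simple execution model (each unit handles one token
    at a time; buffers between units are unbounded) plus busy-cycle counters.
    Returns the cycle at which the last token leaves the chain."""
    free_at = [0] * len(units)  # cycle at which each unit becomes idle
    done = 0
    for _ in range(tokens):
        t = 0  # every input token is assumed available at cycle 0
        for i, unit in enumerate(units):
            start = max(t, free_at[i])
            t = start + unit.latency
            free_at[i] = t
            unit.busy_cycles += unit.latency  # instrumentation
        done = t
    return done


units = build_instance({"vld": 2, "idct": 3})
print(simulate(units, 3))                    # 11
print([u.busy_cycles for u in units])        # [6, 9]
```

Retargeting amounts to calling `build_instance` with different parameters; the execution model and counters are reused unchanged, which is the property the abstract attributes to its object-oriented design.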

TriMedia CPU64 application development environment

Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), 1999

The architecture of the TriMedia CPU64 is based on the TM1000 DSPCPU. The original VLIW architecture has been extended with the concepts of vector processing and superoperations. The new vector operations and superoperations need to be supported by the compiler and simulator to make them accessible to application programmers. Our intention was to support these new features while remaining compliant with the ANSI C standard. This paper describes the mechanisms that were implemented to achieve this goal. Furthermore, optimizing applications requires vectorizing the functions to be implemented. Some general guidelines for producing efficient vectorized code are given.

TriMedia CPU64 architecture

Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), 1999

We present a new VLIW core as a successor to the TriMedia TM1000. The processor is targeted for embedded use in media-processing devices such as DTVs and set-top boxes. Intended as a core, its design must be supplemented with on-chip co-processors to obtain a cost-effective system. Good performance is obtained through a uniform 64-bit, five-issue-slot VLIW design, supporting subword parallelism with an extensive instruction set optimized for media processing. Multi-slot 'super-ops' allow powerful multi-argument and multi-result operations. As an example, an IDCT algorithm shows a very low instruction count in comparison with other processors. To achieve good performance, critical sections in the application program source code need to be rewritten with vector data types and function calls for media operations. Benchmarking with several media applications was used to tune the instruction set and study cache behavior. This resulted in a VLIW architecture with wide data paths and relatively simple CPU control.
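Subword parallelism packs several narrow values into one wide register so that a single operation processes all of them. The sketch below emulates the idea in software with the well-known SWAR (SIMD within a register) masking trick, adding eight 8-bit lanes held in one 64-bit word with per-lane wraparound. It is an illustration of the concept only, not TriMedia's instruction set or its C intrinsics.

```python
LANES = 8
HIGH = 0x8080808080808080  # top bit of every byte lane
LOW = 0x7F7F7F7F7F7F7F7F   # low seven bits of every byte lane


def pack(values: list[int]) -> int:
    """Pack eight 8-bit values into one 64-bit word (lane 0 = lowest byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word: int) -> list[int]:
    """Split a 64-bit word back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def add_u8x8(a: int, b: int) -> int:
    """Lane-wise 8-bit addition modulo 256 in one wide operation.
    Adding only the low seven bits keeps carries inside each lane;
    the top bit of each lane is then fixed up with XOR."""
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)


x = pack([1, 2, 250, 128, 0, 255, 100, 7])
y = pack([1, 3, 10, 128, 0, 1, 100, 9])
print(unpack(add_u8x8(x, y)))  # [2, 5, 4, 0, 0, 0, 200, 16]
```

Real media processors provide such lane-wise operations (often with saturation instead of wraparound) as single instructions, which is why the abstract notes that critical code must be rewritten with vector data types to benefit.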

A Methodology to Design Programmable Embedded Systems - The Y-Chart Approach

Systems, Architectures, Modeling, and Simulation, 2002

Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts, one of which is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology for obtaining architectures that satisfy a set of constraints while offering enough flexibility to support a given set of applications.

YAPI: application modeling for signal processing systems

Proceedings 37th Design Automation Conference, 2000

We present a programming interface called YAPI to model signal processing applications as process networks. The purpose of YAPI is to enable the reuse of signal processing applications and the mapping of signal processing applications onto heterogeneous systems that contain hardware and software components. To this end, YAPI separates the concerns of the application programmer, who determines the functionality of the system, and the system designer, who determines the implementation of the functionality. The proposed model of computation extends the existing model of Kahn process networks with channel selection to support non-deterministic events. We provide an efficient implementation of YAPI in the form of a C++ run-time library to execute the applications on a workstation. Subsequently, the applications are used by the system designer as input for mapping and performance analysis in the design of complex signal processing systems. We evaluate this methodology on the design of a digital video broadcast system-on-chip.
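YAPI itself is a C++ run-time library; the Python sketch below only illustrates the Kahn process network model that it extends. Processes communicate exclusively through unbounded FIFO channels, reads block until data is available, and the result therefore does not depend on how the threads are scheduled. The process functions and the `None` end-of-stream marker are assumptions of the sketch, and YAPI's channel-selection extension for non-deterministic events is not modeled.

```python
import queue
import threading


def producer(out: queue.Queue, n: int) -> None:
    for i in range(1, n + 1):
        out.put(i)   # writes never block on an unbounded channel
    out.put(None)    # end-of-stream marker (convention of this sketch)


def square(inp: queue.Queue, out: queue.Queue) -> None:
    while (x := inp.get()) is not None:  # blocking read, as in a Kahn process
        out.put(x * x)
    out.put(None)


def consumer(inp: queue.Queue, result: list) -> None:
    total = 0
    while (x := inp.get()) is not None:
        total += x
    result.append(total)


def run_network(n: int) -> int:
    """Build producer -> square -> consumer, run all processes, return the sum."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    result: list[int] = []
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=consumer, args=(b, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


print(run_network(4))  # 30 (1 + 4 + 9 + 16)
```

Because each process only reads from and writes to its channels, the same network description could in principle be mapped onto hardware or software implementations of the processes, which is the separation of concerns the abstract describes.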

A methodology for architecture exploration of heterogeneous signal processing systems

1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461), 1999

We present a methodology for the exploration of signal processing architectures at the system level. The methodology, named SPADE, provides a means to quickly build models of architectures at an abstract level, to easily map applications, modeled as Kahn Process Networks, onto these architecture models, and to analyze the performance of the resulting system by simulation. The methodology distinguishes between ...

System level design with SPADE: an M-JPEG case study

IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281), 2001

In this paper we present and evaluate the SPADE (System level Performance Analysis and D...

A trace transformation technique for communication refinement

Proceedings of the ninth international symposium on Hardware/software codesign - CODES '01, 2001

Models of computation like Kahn and dataflow process networks provide convenient means for modeling signal processing applications. This is partly due to the abstract primitives that these models offer for communication between concurrent processes. However, when mapping an application model onto an architecture, these primitives need to be mapped onto architecture-level communication primitives. We present a trace transformation technique ...
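Communication refinement by trace transformation can be pictured as rewriting each abstract application-level event into a sequence of architecture-level primitives. The sketch below shows the simplest one-to-many form of such a rewrite. The event kinds (`R`, `W`, `E`) and the refinement table (check for data or room, transfer, then signal the other side) are illustrative assumptions, not a reproduction of the paper's transformation rules, which are not stated in the abstract above.

```python
# Hypothetical refinement table: each application-level event kind expands
# into a sequence of architecture-level primitives on the same target.
REFINE: dict[str, list[str]] = {
    "R": ["check_data", "load", "signal_room"],   # read from a channel
    "W": ["check_room", "store", "signal_data"],  # write to a channel
    "E": ["execute"],                             # computation: unchanged
}


def refine_trace(trace: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Rewrite an abstract trace of (event kind, target) pairs into a
    refined trace of (primitive, target) pairs, preserving event order."""
    refined: list[tuple[str, str]] = []
    for kind, target in trace:
        refined.extend((prim, target) for prim in REFINE[kind])
    return refined


abstract_trace = [("R", "ch0"), ("E", "idct"), ("W", "ch1")]
for step in refine_trace(abstract_trace):
    print(step)
# ('check_data', 'ch0'), ('load', 'ch0'), ('signal_room', 'ch0'),
# ('execute', 'idct'), ('check_room', 'ch1'), ('store', 'ch1'), ('signal_data', 'ch1')
```

Splitting one abstract read or write into separate synchronization and transfer primitives is what would allow an architecture model to account for their costs separately, or to reorder them in more elaborate transformations.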

An MPEG-2 decoder case study as a driver for a system level design methodology

Proceedings of the seventh international workshop on Hardware/software codesign - CODES '99, 1999

... The Thdr process is aware of the high-level bitstream organization and distributes the retri...

Eclipse: heterogeneous multiprocessor architecture for flexible media processing

Proceedings 16th International Parallel and Distributed Processing Symposium, 2002

Eclipse is a heterogeneous multiprocessor architecture for high-performance media processing, including high-definition MPEG encoding/decoding. The scalable architecture framework concurrently executes media processing kernels in function-specific multi-tasking coprocessors and a media processor, communicating via on-chip memory. Eclipse instances combine application configuration flexibility with the efficiency of function-specific hardware.
