Papers by Pieter Van Der Wolf
2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2007
... Tomas Henriksson and Pieter van der Wolf NXP Semiconductors Research Eindhoven, The Netherlands tomas.henriksson@nxp.com ... A busy period is defined to be a maximal interval of time such that data flows on the output link of the multiplexer at rate Cout throughout the ...
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
Embedded Systems for Real-Time Multimedia, 2006
Digital chips for multimedia applications use function-specific hardware co-processors to achieve high performance at low power consumption. These co-processors are typically equipped with traditional address-based interfaces. Networks-on-chips (NoCs) are emerging as scalable interconnect for advanced digital chips. Integration of co-processors with NoCs requires load/store packetizing wrappers on the network interfaces. This leads to unnecessary address generation and address transportation over ...
2006 IEEE Workshop on Signal Processing Systems Design and Implementation, 2006
Bandwidth to off-chip memory is a scarce resource in complex systems-on-chip for embedded media processing. We apply embedded compression for bandwidth-hungry image processing functions in order to alleviate this bandwidth bottleneck. In our solution embedded compression is implemented as part of the system-on-chip infrastructure, fully transparent for the hardware and software image processing components. Hence it can be applied without ...
Proceedings of the conference on Design, automation and test in Europe - DATE '08, 2008
Effective integration of advanced systems-on-chip (SoC) requires extensive reuse of IP modules as well as automation of the IP integration process, including verification. Key enablers for this are standards to describe and package IP modules. We focus on the IP-...
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04, 2004
We present design technology for the structured design and programming of embedded multi-processor systems. It comprises a task-level interface that can be used both for developing parallel application models and as a platform interface for implementing applications on multi-processor architectures. Associated mapping technology supports refinement of application models towards implementation. By linking application development and implementation aspects, the technology integrates the specification and design phases in the MPSoC design process. Two design cases demonstrate the efficient implementation of the platform interface on different architectures. Industry-wide standardization of a task-level interface can facilitate reuse of function-specific hardware/software modules across companies.
IEEE Workshop on Signal Processing Systems, 2002
In this paper, we present a comparison of two design-space exploration approaches. The comparison is in terms of (1) speed of simulation versus accuracy of performance numbers, and (2) connection to trajectories for detailed design. The two approaches are: The trace driven approach ...
Lecture Notes in Computer Science, 2002
Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.
Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors, 1997
In this paper we present an approach for quantitative analysis of application-specific dataflow architectures. The approach allows the designer to rate design alternatives in a quantitative way and therefore supports him in the design process to find better performing architectures. The context of our work is Video Signal Processing algorithms which are mapped onto weakly-programmable, coarse-grain dataflow architectures. The algorithms are represented as Kahn graphs with the functionality of the nodes being coarse-grain functions. We have implemented an architecture simulation environment that permits the definition of dataflow architectures as a composition of architecture elements, such as functional units, buffer elements and communication structures. The abstract, clock-cycle accurate simulator has been built using a multi-threading package and employs object-oriented principles. This results in a configurable and efficient simulator. Algorithms can subsequently be executed on the architecture model producing quantitative information for selected performance metrics. Results are presented for the simulation of a realistic application on several dataflow architecture alternatives, showing that many different architectures can be simulated in modest time on a modern workstation.
Proceedings of the sixth international workshop on Hardware/software codesign - CODES/CASHE '98, 1998
Systems in the domain of high-performance video signal processing are becoming more and more programmable. We suggest an approach to design such systems that involves measuring, via simulation, the performance of various architectures on which a set of applications are mapped. This approach requires a retargetable simulator for an architecture template. We describe the retargetable simulator that we constructed for a stream-oriented application-specific dataflow architecture. For each architecture instance of the architecture template, a specific simulator is derived in three steps: the architecture instance is constructed, an execution model is added, and the executable architecture is instrumented to obtain performance numbers. We used object-oriented principles together with a high-level simulation mechanism to ensure retargetability and an efficient simulation speed. Finally, we explain how a retargetable simulator can be encapsulated within an environment for automated design space exploration.
Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), 1999
The architecture of the TriMedia CPU64 is based on the TM1000 DSPCPU. The original VLIW architecture has been extended with the concepts of vector processing and superoperations. The new vector operations and superoperations need to be supported by the compiler and simulator to make them accessible to application programmers. It was our intention to support these new features while remaining compliant with the ANSI C standard. This paper describes the mechanisms which were implemented to achieve this goal. Furthermore, the optimization of applications needs to address the vectorization of the functions to be implemented. Some general guidelines for producing efficient vectorized code are given.
Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), 1999
We present a new VLIW core as a successor to the TriMedia TM1000. The processor is targeted for embedded use in media-processing devices like DTVs and set-top boxes. Intended as a core, its design must be supplemented with on-chip co-processors to obtain a cost-effective system. Good performance is obtained through a uniform 64-bit 5 issue-slot VLIW design, supporting subword parallelism with an extensive instruction set optimized with respect to media-processing. Multi-slot 'super-ops' allow powerful multi-argument and multi-result operations. As an example, an IDCT algorithm shows a very low instruction count in comparison with other processors. To achieve good performance, critical sections in the application program source code need to be rewritten with vector data types and function calls for media operations. Benchmarking with several media applications was used to tune the instruction set and study cache behavior. This resulted in a VLIW architecture with wide data paths and relatively simple CPU control.
Systems, Architectures, Modeling, and Simulation, 2002
Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.
Proceedings 37th Design Automation Conference, 2000
We present a programming interface called YAPI to model signal processing applications as process networks. The purpose of YAPI is to enable the reuse of signal processing applications and the mapping of signal processing applications onto heterogeneous systems that contain hardware and software components. To this end, YAPI separates the concerns of the application programmer, who determines the functionality of the system, and the system designer, who determines the implementation of the functionality. The proposed model of computation extends the existing model of Kahn process networks with channel selection to support non-deterministic events. We provide an efficient implementation of YAPI in the form of a C++ run-time library to execute the applications on a workstation. Subsequently, the applications are used by the system designer as input for mapping and performance analysis in the design of complex signal processing systems. We evaluate this methodology on the design of a digital video broadcast system-on-chip.
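For readers unfamiliar with the Kahn process network model that this abstract builds on, the following is a minimal sketch in Python. It is not YAPI and does not reproduce its C++ API or its channel-selection extension; the names `producer`, `scaler`, `run_network`, and the `END` marker are hypothetical and chosen only for illustration. It shows the core idea: concurrent processes that communicate solely through unbounded FIFO channels with blocking reads, so the output does not depend on thread scheduling.

```python
import queue
import threading

# Hypothetical end-of-stream marker; an illustration aid, not part of the Kahn model.
END = object()


def producer(out_fifo, n):
    """Process that writes the tokens 0..n-1 to its output channel."""
    for i in range(n):
        out_fifo.put(i)
    out_fifo.put(END)


def scaler(in_fifo, out_fifo, factor):
    """Process that multiplies every incoming token by a constant factor."""
    while True:
        token = in_fifo.get()  # blocking read: waits until a token is available
        if token is END:
            out_fifo.put(END)
            return
        out_fifo.put(token * factor)


def run_network(n=5, factor=2):
    """Connect producer -> scaler with FIFO channels and collect the output."""
    a = queue.Queue()  # unbounded FIFO channel producer -> scaler
    b = queue.Queue()  # unbounded FIFO channel scaler -> collector
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=scaler, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        token = b.get()
        if token is END:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results
```

Calling `run_network(5, 2)` collects the scaled token stream in order. Because each process only blocks on reads from a single channel, the result is the same regardless of how the threads are interleaved, which is the determinism property that channel selection deliberately relaxes.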
1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461), 1999
We present a methodology for the exploration of signal processing architectures at the system level. The methodology, named SPADE, provides a means to quickly build models of architectures at an abstract level, to easily map applications, modeled as Kahn Process Networks, onto these architecture models, and to analyze the performance of the resulting system by simulation. The methodology distinguishes between ...
IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281), 2001
In this paper we present and evaluate the SPADE (System level Performance Analysis and Design space Exploration) methodology through an illustrative case study. SPADE is a method and tool for architecture exploration of heterogeneous signal processing ...
Proceedings of the ninth international symposium on Hardware/software codesign - CODES '01, 2001
Models of computation like Kahn and dataflow process networks provide convenient means for modeling signal processing applications. This is partly due to the abstract primitives that these models offer for communication between concurrent processes. However, when mapping an application model onto an architecture, these primitives need to be mapped onto architecture level communication primitives. We present a trace transformation technique ...
Proceedings of the seventh international workshop on Hardware/software codesign - CODES '99, 1999
... The Thdr process is aware of the high level bitstream organization and distributes the retrieved sequence and picture properties to other processes. ... The parameterization would then allow us to do sensitivity analysis and some design space exploration for this architecture. ...
Proceedings 16th International Parallel and Distributed Processing Symposium, 2002
Eclipse is a heterogeneous multiprocessor architecture for high-performance media processing, including high-definition MPEG encoding/decoding. The scalable architecture framework concurrently executes media processing kernels in function-specific multi-tasking coprocessors and a media processor, communicating via on-chip memory. Eclipse instances combine application configuration flexibility with the efficiency of function-specific hardware.