Is this you? Create Your Porfile

Tero Kangas

Tampere University of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tero Kangas is active.

Explore More

Publication

Featured researches published by Tero Kangas.

ACM Transactions in Embedded Computing Systems | 2006

UML-based multiprocessor SoC design framework

Tero Kangas; Petri Kukkala; Heikki Orsila; Erno Salminen; Marko Hännikäinen; Timo D. Hämäläinen; Jouni Riihimäki; Kimmo Kuusilinna

This paper describes a complete design flow for multiprocessor systems-on-chips (SoCs) covering the design phases from system-level modeling to FPGA prototyping. The design of complex heterogeneous systems is enabled by raising the abstraction level and providing several system-level design automation tools. The system is modeled in a UML design environment following a new UML profile that specifies the practices for orthogonal application and architecture modeling. The design flow tools are governed in a single framework that combines the subtools into a seamless flow and visualizes the design process. Novel features also include an automated architecture exploration based on the system models in UML, as well as the automatic back and forward annotation of information in the design flow. The architecture exploration is based on the global optimization of systems that are composed of subsystems, which are then locally optimized for their particular purposes. As a result, the design flow produces an optimized component allocation, task mapping, and scheduling for the described application. In addition, it implements the entire system for FPGA prototyping board. As a case study, the design flow is utilized in the integration of state-of-the-art technology approaches, including a wireless terminal architecture, a network-on-chip, and multiprocessing utilizing RTOS in a SoC. In this study, a central part of a WLAN terminal is modeled, verified, optimized, and prototyped with the presented framework.

Journal of Systems Architecture | 2007

Automated memory-aware application distribution for Multi-processor System-on-Chips

Heikki Orsila; Tero Kangas; Erno Salminen; Timo D. Hämäläinen; Marko Hännikäinen

Mapping of applications on a Multi-processor System-on-Chip (MP-SoC) is a crucial step to optimize performance, energy and memory constraints at the same time. The problem is formulated as finding solutions to a cost function of the algorithm performing mapping and scheduling under strict constraints. Our solution is based on simultaneous optimization of execution time and memory consumption whereas traditional methods only concentrate on execution time. Applications are modeled as static acyclic task graphs that are mapped on MP-SoC with customized simulated annealing. The automated mapping in this paper is especially purposed for MP-SoC architecture exploration, which typically requires a large number of trials without human interaction. For this reason, a new parameter selection scheme for simulated annealing is proposed that sets task mapping specific optimization parameters automatically. The scheme bounds optimization iterations to a reasonable limit and defines an annealing schedule that scales up with application and architecture complexity. The presented parameter selection scheme compared to extensive optimization achieves 90% goodness in results with only 5% optimization time, which helps large-scale architecture exploration where optimization time is important. The optimization procedure is analyzed with simulated annealing, group migration and random mapping algorithms using test graphs from the Standard Task Graph Set. Simulated annealing is found better than other algorithms in terms of both optimization time and the result. Simultaneous time and memory optimization method with simulated annealing is shown to speed up execution by 63% without memory buffer size increase. As a comparison, optimizing only execution time yields 112% speedup, but also increases memory buffers by 49%.

signal processing systems | 2006

HIBI Communication Network for System-on-Chip

Erno Salminen; Tero Kangas; Timo D. Hämäläinen; Jouni Riihimäki; Vesa Lahtinen; Kimmo Kuusilinna

This paper presents a communication network targeted for complex system-on-chip (SoC) and network-on-chip (NoC) designs. The Heterogeneous IP Block Interconnection (HIBI) aims at maximum efficiency and minimum energy per transmitted bit combined with quality-of-service (QoS) in transfers. Other features include support for hierarchical topologies with several clock domains, flexible scalability, and runtime reconfiguration of network parameters. HIBI is intended for integrating coarse-grain components such as intellectual property (IP) blocks that have size of thousands of gates.HIBI has been implemented in VHDL and SystemC and synthesized on several CMOS technologies and on FPGA. A 32-bit wrapper requires 5400 gates and runs with 315 MHz on 0.18 μ m technology which shows that only minimal area overhead is paid for the advanced features. The area and frequency results are well comparable to other NoC proposals.Furthermore, data transfers are shown to approach the maximum theoretical performance for protocol efficiency. HIBI network is accompanied with a design framework with tools for optimizing the system through automated design space exploration.

international symposium on system-on-chip | 2006

Parameterizing Simulated Annealing for Distributing Task Graphs on Multiprocessor SoCs

Heikki Orsila; Tero Kangas; Erno Salminen; Timo D. Hämäläinen

Mapping an application on multiprocessor system-on-chip (MPSoC) is a crucial step in architecture exploration. The problem is to minimize optimization effort and application execution time. Simulated annealing is a versatile algorithm for hard optimization problems, such as task distribution on MPSoCs. We propose a new method of automatically selecting parameters for a modified simulated annealing algorithm to save optimization effort. The method determines a proper annealing schedule and transition probabilities for simulated annealing, which makes the algorithm scalable with respect to application and platform size. Applications are modeled as static acyclic task graphs which are mapped to an MPSoC. The parameter selection method is validated by extensive simulations with 50 and 300 node graphs from the standard graph set

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2004

HIBI v.2 Communication Network for System-on-Chip

Erno Salminen; Vesa Lahtinen; Tero Kangas; Jouni Riihimäki; Kimmo Kuusilinna; Timo D. Hämäläinen

This paper presents a communication network targeted for complex system-on-chip (SoC) and network-on-chip (NoC) designs. The Heterogeneous IP Block Interconnection v.2 (HIBI) aims at maximum efficiency and energy saving per transmitted bit combined with guaranteed quality-of-service (QoS) in transfers. Other features include support for arbitrary topologies with several clock domains, flexible scalablility in signalling and run-time reconfiguration of network parameters. HIBI has been implemented in VHDL and SystemC and synthesized in 0.18 CMOS technology with area comparable to other NoC wrappers. HIBI data transfers are shown to approach the maximum theoretical performance for protocol efficiency.

Journal of Systems Architecture | 2007

Benchmarking mesh and hierarchical bus networks in system-on-chip context

Erno Salminen; Tero Kangas; Vesa Lahtinen; Jouni Riihimäki; Kimmo Kuusilinna; Timo D. Hämäläinen

The performance and area of a System-on-Chip depend on the utilized communication method. This paper presents simulation-based comparison of generic, synthesizable single bus, hierarchical bus, and 2-dimensional mesh on-chip networks. Performance of the network depends heavily on the application and therefore six test cases with multiple parameter values are used. Furthermore, two versions of each network topology are compared. The results show that hierarchical bus scales well to large number of agents and offers a good performance and area trade-off although it has smaller aggregate bandwidth and area than mesh. Hierarchical HIBI bus achieves runtimes comparable to 2-dimensional cut-through mesh with about 50% smaller network logic. However, depending on the test case, the runtime can be reduced by 20-50% when wider bus links are utilized.

international symposium on system-on-chip | 2003

Using a communication generator in SoC architecture exploration

Tero Kangas; Jouni Riihimäki; Erno Salminen; Kimmo Kuusilinna; Timo D. Hämäläinen

This paper presents an implementation of a communication generator, named transaction generator. It is utilized in communication-centric SoC architecture exploration where the objective is to find the optimal hardware allocation, task partitioning, and scheduling with a given application model and architectural requirements. An application is abstracted with a process network model of computation and architecture is described with characteristic metrics. In addition to accelerating architecture exploration, transaction generator can be used in development, verification and comparison of on-chip communication networks.

Microprocessors and Microsystems | 2002

Interconnection scheme for continuous-media systems-on-a-chip

Vesa Lahtinen; Kimmo Kuusilinna; Tero Kangas; Timo D. Hämäläinen

Abstract In this paper, an on-chip interconnection scheme called Heterogeneous IP Block Interconnection (HIBI) is presented. HIBI offers a scaleable and easy to use architecture for system-on-a-chip designs. Its most distinguishing feature is the lack of a central arbiter and specialized arbitration signals. The arbitration is distributed among the connected agents that are made aware of each others communication requirements. This enables data transmissions with very low latencies and also minimizes the amount of needed signal lines. These features make the scheme a good fit to continuous-media systems transmitting large amounts of streaming data. With the presented interconnection scheme, bus efficiencies greater than 90% have been achieved in several test cases. With the streaming burst transfers and time slot based accesses of HIBI, throughputs of over one data transmission per clock cycle are possible.

international symposium on system-on-chip | 2005

Hybrid Algorithm for Mapping Static Task Graphs on Multiprocessor SoCs

Heikki Orsila; Tero Kangas; Timo D. Hämäläinen

Mapping of applications on multiprocessor system-on-chip is a crucial step in the system design to optimize the performance, energy and memory constraints at the same time. The problem is formulated as finding solutions to an objective function of the algorithm performing the mapping and scheduling under strict constraints. Our solution is a new hybrid algorithm that distributes the computational tasks modeled as static acyclic task graphs The algorithm uses simulated annealing and group migration algorithms consecutively and it combines a non-greedy global and greedy local optimization techniques to have good properties of both ways. The algorithm begins as coarse grain optimization and moves towards fine grained optimization. As a case study we used ten 50-nodc graphs from the Standard Task Graph Set and averaged results over 100 optimization runs. The hybrid algorithm gives 8% better execution time on a system with four processing elements compared to simulated annealing. In addition, the number of iterations increased only moderately, which justifies the new algorithm in SoC design.

norchip | 2005

Requirements for network-on-chip benchmarking

Erno Salminen; Tero Kangas; Jouni Riihimäki; Timo D. Hämäläinen

This work presents the motivation, basic concepts, and requirements for benchmarking a network-on-chip (NoC). Currently there is practically no benchmark sets for NoC or the presented tools do not meet the requirements. The presented benchmarking method utilizes traffic generator with a dataflow models of the applications. Combined with transaction-level NoC, the abstract application model allows approximately 200/spl times/ speedup and on average 10% error in estimated runtime w.r.t. cycle-accurate HW/SW cosimulation without exposing the exact internal functionality of the application.

Explore More