Pedro P. Carballo

University of Las Palmas de Gran Canaria

Publication

Featured research published by Pedro P. Carballo.

digital systems design | 2004

CASSE: a system-level modeling and design-space exploration tool for multiprocessor systems-on-chip

Victor Reyes; Tomás Bautista; Gustavo Marrero; Pedro P. Carballo; Wido Kruijtzer

As SoC complexity grows, new methodologies and tools for system design and time-effective design-space exploration are required. In this paper we introduce a tool called CASSE, which stands for Camellia system-on-chip simulation environment. CASSE is a fast, flexible, and modular SystemC-based simulation environment that aims to be useful for design-space exploration and system-level design at different abstraction levels. The tool uses transaction-level modeling techniques for fast simulation and easy architectural modeling, and bridges the gap to system implementation through a progressive refinement approach. CASSE is being used in the European IST-2001-34410 CAMELLIA project, which focuses on the mapping of innovative smart imaging applications onto an existing video encoding architecture.

Microprocessing and Microprogramming | 1991

Speed-area-power optimization for DCFL and SDCFL class of logic using ring notation

K. Eshraghian; Roberto Sarmiento; Pedro P. Carballo; Antonio Núñez

Advances in the development of digital GaAs integrated circuits have progressed to the point that designers of signal and data processors can discern the system applications for which GaAs is best suited. Basic computation primitives in DSP and image processing systems are usually adders, multipliers and delay elements. In this paper we present the results of a systematic study conducted to evaluate the influence of layout and design methodologies, both conventional and innovative ones, on the performance of those DSP computation primitives.

Proceedings VHDL International Users' Forum. Fall Conference | 1997

Rapid-prototyping of high-performance RISC cores with VHDL

Tomás Bautista; Gustavo Marrero; Pedro P. Carballo; Antonio Núñez

The authors present some experiences they have obtained in the conception and description of a SPARC v8 IU core to be embedded in custom applications. Its design has been carried out using VHDL-based tools such as Synopsys for debugging and synthesis, and Cascades Epoch for the final implementation stage. These experiences have been gathered into a proposed methodology for the rapid design of high-performance embeddable cores.

digital systems design | 2013

Scalable Video Coding Deblocking Filter FPGA and ASIC Implementation Using High-Level Synthesis Methodology

Pedro P. Carballo; Omar Espino; Romén Neris; Pedro Hernández-Fernández; Tomasz Szydzik; Antonio Núñez

This paper describes key concepts in the design and implementation of a deblocking filter (DF) for an H.264/SVC video decoder. The DF supports QCIF and CIF video formats with temporal and spatial scalability. The design flow starts from a SystemC functional model, which is refined to an RTL microarchitecture using a high-level synthesis methodology. The process is guided by performance measurements (latency, cycle time, power, resource utilization) in order to assure the quality of results of the final system. The functional model of the DF is created incrementally from the AVC DF model, using the OpenSVC source code as a reference. The design flow continues with logic synthesis and implementation on the FPGA using various strategies. The FPGA implementation is capable of running at 100 MHz, and macroblocks are processed in 6,500 clock cycles, for a throughput of 130 fps for QCIF format and 37 fps for CIF format. A validation platform has been developed using the embedded PowerPC processor in the FPGA, composing a SoC that integrates the tasks for frame generation and visualization on a TFT screen. The FPGA implements both the DF core and a General Purpose Memory Controller (GPMC) slave core. Both cores are connected to the PowerPC440 embedded processor using LocalLink interfaces. The FPGA also contains a local memory capable of storing the information necessary to filter a complete frame and to store a decoded picture frame. The complete system is implemented in a Virtex5 FX70T device. An ASIC implementation of the deblocking filter has been done using UMC CMOS 65 nm technology. The ASIC implementation runs at 181.8 MHz and occupies an area of 596,392.4 μm².
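
The relationship between clock rate, cycles per macroblock, and frame rate in the abstract above can be illustrated with a back-of-the-envelope calculation. This is only a sketch under stated assumptions, not the authors' measurement method: it assumes the 6,500 cycles apply to each 16x16 macroblock, that macroblocks are filtered one after another, and that there is no memory or bus overhead. The helper names are hypothetical.

```python
# Ideal frame-rate bound for a macroblock-serial deblocking filter.
# Assumption (not from the paper): fps = clock / (cycles_per_mb * mbs_per_frame),
# ignoring memory transfers, bus latency, and any per-frame setup cost.

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma samples


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks in a frame of the given luma size."""
    return (width // MB_SIZE) * (height // MB_SIZE)


def ideal_fps(clock_hz, cycles_per_mb, width, height):
    """Upper bound on frames per second when macroblocks are processed serially."""
    return clock_hz / (cycles_per_mb * macroblocks_per_frame(width, height))


# QCIF is 176x144 (99 macroblocks); CIF is 352x288 (396 macroblocks).
qcif_fps = ideal_fps(100e6, 6500, 176, 144)
cif_fps = ideal_fps(100e6, 6500, 352, 288)
print(f"QCIF bound: {qcif_fps:.1f} fps, CIF bound: {cif_fps:.1f} fps")
```

Under these assumptions the bound comes out at roughly 155 fps for QCIF and 39 fps for CIF, somewhat above the reported 130 fps and 37 fps. A gap in that direction would be expected if the real system incurs overheads that this sketch leaves out.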

VLSI Circuits and Systems VI | 2013

Implementation of scalable video coding deblocking filter from high-level SystemC description

Pedro P. Carballo; Omar Espino; Romén Neris; Pedro Hernández-Fernández; Tomasz Szydzik; Antonio Núñez

This paper describes key concepts in the design and implementation of a deblocking filter (DF) for an H.264/SVC video decoder. The DF supports QCIF and CIF video formats with temporal and spatial scalability. The design flow starts from a SystemC functional model, which is refined to an RTL microarchitecture using a high-level synthesis methodology. The process is guided by performance measurements (latency, cycle time, power, resource utilization) in order to assure the quality of results of the final system. The functional model of the DF is created incrementally from the AVC DF model, using the OpenSVC source code as a reference. The design flow continues with logic synthesis and implementation on the FPGA using various strategies. The final implementation is chosen among the implementations that meet the timing constraints. The DF is capable of running at 100 MHz, and macroblocks are processed in 6,500 clock cycles, for a throughput of 130 fps for QCIF format and 37 fps for CIF format. The proposed architecture for the complete H.264/SVC decoder is composed of an OMAP 3530 SoC (ARM Cortex-A8 GPP + DSP) and a Virtex-5 FPGA acting as a coprocessor for the DF implementation. The DF is connected to the OMAP SoC using the GPMC interface. A validation platform has been developed using the embedded PowerPC processor in the FPGA, composing a SoC that integrates frame generation and visualization on a TFT screen. The FPGA implements both the DF core and a GPMC slave core. Both cores are connected to the PowerPC440 embedded processor using LocalLink interfaces. The FPGA also contains a local memory capable of storing the information necessary to filter a complete frame and to store a decoded picture frame. The complete system is implemented in a Virtex5 FX70T device.

Proceedings of SPIE | 2007

Exploring system interconnection architectures with VIPACES: from direct connections to NOCs

Armando Sánchez-Peña; Pedro P. Carballo; Antonio Núñez

This paper presents VIPACES (Verification Interface Primitives for the development of AXI Compliant Elements and Systems), a simple environment for the verification of AMBA 3 AXI systems in Verification IP (VIP) production. These primitives are presented as a non-compiled library written in SystemC, where interfaces are the core of the library. Defining interfaces instead of generic modules lets users construct custom modules, making better use of the resources spent during the verification phase and easily adapting their modules to the AMBA 3 AXI protocol. This topic is the main discussion in the VIPACES library. The paper focuses on comparing and contrasting the main interconnection schemes for AMBA 3 AXI as modeled by VIPACES. To assess these results we propose a validation scenario with a particular architecture from the domain of MPEG4 video decoding, which is composed of an AXI bus connecting an IDCT and other processing resources.

Proceedings of SPIE | 2007

Accelerating a MPEG-4 video decoder through custom software/hardware co-design

Jorge L. Díaz; D. Barreto; Luz García; Gustavo Marrero; Pedro P. Carballo; Antonio Núñez

In this paper we present a novel methodology to accelerate an MPEG-4 video decoder using software/hardware co-design for wireless DAB/DMB networks. Software support includes the services provided by the embedded kernel μC/OS-II and the application tasks mapped to software. Hardware support includes several custom co-processors and a communication architecture with bridges to the main system bus and with a dual-port SRAM. Synchronization among tasks is achieved at two levels, by a hardware protocol and by kernel-level scheduling services. Our reference application is an MPEG-4 video decoder composed of several software functions and written using a special C++ library named CASSE. Profiling and space exploration techniques were previously applied to the Advanced Simple Profile (ASP) MPEG-4 decoder to determine the best HW/SW partition, which is developed here. This research is part of the ARTEMI project, whose main goal is to establish methodologies for the design of complex real-time digital systems, with Programmable Logic Devices with embedded microprocessors as the target technology and the design of multimedia systems for broadcasting networks as the reference application.

digital systems design | 2006

VIPACES, Verification Interface Primitives for the Development of AXI Compliant Elements and Systems

Armando Sánchez-Peña; Pedro P. Carballo; Luz García; Antonio Núñez

This paper presents VIPACES (verification interface primitives for the development of AXI compliant elements and systems), a simple environment for the verification of AMBA 3 AXI systems in verification IP (VIP) production. The elements come from the necessity of creating generic modules, in the verification phase, for this widely used protocol. These primitives are presented as a non-compiled library written in SystemC, where interfaces are the core of the library. Defining interfaces instead of generic modules lets users construct custom modules, making better use of the resources spent during the verification phase and easily adapting their own modules to the AMBA 3 AXI protocol. As a validation scenario, results obtained for an AXI bus connecting an IDCT and other processing resources for MPEG4 video decoding are presented.

Proceedings of SPIE | 2003

Method of generating trustworthy performance estimations for soft-IPs

Margarita Marrero; Pedro P. Carballo; Antonio Núñez

At 0.25 μm, 0.18 μm processes and beyond, important process variations occur not only from one fab to another but also among batches. Moreover, as we approach the realm of deep-submicron design, process variations even across a single die are predicted to become a major source of spread. Reduced signal levels, noise margins and timing windows all contribute to making previously minor variations in geometry and technological parameters a big issue for circuit design. Worse still, new mechanisms appear that cause important variations not only in transistors but also in interconnect. Some of those mechanisms show greater variation across a single die than across similar structures on different dice from a wafer. Thus the chip designer must expect significant and not necessarily predictable differences between transistors and between interconnect resistances on a single die. Given this scenario, widely recognised by process engineers, and given the additional spread built into the process of mapping from a soft IP design to a hard IP block, if the designer could know certain performance parameters of the final hard cores without doing successive syntheses, integration of the blocks in the system would be easier, more predictable and more accurate. In this sense, pre-characterised trustworthy soft-IP blocks would be preferred candidates. We have explored ways of quantifying and analysing the synthesis-to-layout spread so that, instead of modelling the spread in devices and interconnects, we model and quantify at a higher abstraction level the technology mapping process as a whole, for a set of seed designs that give bounds and guidelines for the behaviour of other designs when they are mapped to the same technology. For that purpose, only the best-, typical- and worst-case and other process variation corners need to be known. The analysis is based on the actual measured spread of reference seed designs as they pass from soft to hard designs.

Proceedings of SPIE | 2003

Some experiences using system-on-chip buses

Pedro P. Carballo; Pablo Santos; Margarita Marrero; Antonio Núñez

Advances in fabrication and design technologies have made it possible to integrate a complete system on a chip. A system-on-chip (SoC) is generally composed of a microprocessor core, on-chip memory and one or more specific coprocessor IPs. One of the major drawbacks of this approach is the difference in the interfaces that each virtual component (VC) of the SoC presents. The idea of a common bus infrastructure smooths system integration and has been considered as a design solution for SoC architectures. This paper presents a review of different alternatives for SoC buses and summarizes some experiences of their use. ARM has proposed AMBA (Advanced Microcontroller Bus Architecture) as an open specification that serves as a framework for SoC design. AMBA is a multilayer bus architecture for high-performance SoC designs. AMBA supports multi-master configurations, in which a bus arbiter must be included. AMBA-Lite is a simpler alternative when only one master is used. IBM uses the CoreConnect bus architecture as its SoC bus solution. CoreConnect shares some similarities with AMBA because both use a multilayer bus to accommodate different speeds in the system: AHB and PLB can be compared, and the same holds for APB and OPB. Other alternatives can be found. Wishbone is an open bus specification from opencores.org that tries to solve the problem of IP integration. The idea is to specify a common interface between cores to accelerate the development of virtual components. VSIA has proposed the Virtual Component Interface (VCI) as a solution to the problem of virtual component integration. VCI specifies three types of protocols depending on the level of complexity: Peripheral, Basic and Advanced VCI. Developing IPs compatible with any of the SoC buses presented above is a complex problem. One solution is the use of wrappers that adapt the interface of the virtual component to the protocol supported by the SoC bus. The two main requirements for these wrappers are that the increases in latency and in area be as low as possible. The second solution is to design the IP with the final environment in mind.
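
The wrapper approach mentioned in the abstract can be illustrated with a minimal software sketch. Everything here is hypothetical: `SimpleCore`, `BusWrapper`, and their methods are invented for illustration and do not model AMBA, CoreConnect, Wishbone, VCI, or any code from the paper. The sketch only shows the idea of translating an address-based bus protocol into a component's native interface.

```python
# Hypothetical interfaces: neither class models a real bus specification.


class SimpleCore:
    """A virtual component with its own native, index-based register interface."""

    def __init__(self):
        self._regs = {}

    def poke(self, index, value):
        self._regs[index] = value

    def peek(self, index):
        # Registers that were never written read back as 0.
        return self._regs.get(index, 0)


class BusWrapper:
    """Adapts SimpleCore to an address-based read/write bus protocol."""

    def __init__(self, core, base_addr, word_bytes=4):
        self.core = core
        self.base_addr = base_addr
        self.word_bytes = word_bytes

    def _index(self, addr):
        # Translate a byte address on the bus into the core's register index.
        offset = addr - self.base_addr
        if offset < 0 or offset % self.word_bytes:
            raise ValueError(f"unmapped or misaligned address: {addr:#x}")
        return offset // self.word_bytes

    def write(self, addr, data):
        self.core.poke(self._index(addr), data)

    def read(self, addr):
        return self.core.peek(self._index(addr))


wrapper = BusWrapper(SimpleCore(), base_addr=0x1000)
wrapper.write(0x1008, 42)       # bus-side access by byte address
print(wrapper.read(0x1008))     # same register, read back through the wrapper
```

In hardware, the address translation done in `_index` corresponds to extra logic and possibly extra cycles, which is why the abstract singles out latency and area as the costs a wrapper should keep low.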
