Mark Debbage | Researchain

Publication

Featured research published by Mark Debbage.

high performance interconnects | 2015

Intel® Omni-Path Architecture: Enabling Scalable, High Performance Fabrics

Mark S. Birrittella; Mark Debbage; Ram Huggahalli; James A. Kunz; Tom Lovett; Todd M. Rimmer; Keith D. Underwood; Robert C. Zak

The Intel® Omni-Path Architecture (Intel® OPA) is designed to enable a broad class of computations requiring scalable, tightly coupled CPU, memory, and storage resources. Integration between devices in the Intel® OPA family and Intel® CPUs enables improvements in system-level packaging and network efficiency. When coupled with the new user-focused open standard APIs developed by the OpenFabrics Alliance (OFA) Open Fabrics Initiative (OFI), host fabric interfaces (HFIs) and switches in the Intel® OPA family are optimized to provide low latency, high bandwidth, and high message rate. Intel® OPA provides important innovations to enable a multi-generation, scalable fabric, including link layer reliability, extended fabric addressing, and optimizations for high core count CPUs. Datacenter needs are also a core focus for Intel® OPA, which includes link-level traffic flow optimization to minimize datacenter jitter for high-priority packets, robust partitioning support, quality of service support, and a centralized fabric management system. Basic performance metrics from first-generation HFI and switch implementations demonstrate the potential of the new fabric architecture.

IEEE Micro | 2000

SH-5: the 64 bit SuperH architecture

Prasenjit Biswas; Atsushi Hasegawa; Srinivas Mandaville; Mark Debbage; Andy Sturges; Fumio Arakawa; Yasuhiko Saito; Kunio Uchiyama

A collaborative effort of Hitachi and STMicroelectronics, the SH-5 is the latest member of the SuperH microprocessor series. Its CPU core is the first implementation of a new instruction set architecture consisting of 32-bit instructions, 64-bit registers, SIMD (single-instruction, multiple-data) instructions for multimedia applications, and a compatibility mode supporting the 16-bit SuperH instruction set. Embodying an emerging philosophy of embedded-core design, the SH-5 provides a platform for a wide range of applications: set-top cable boxes, digital TV, voice over IP (Internet telephony), network processing, PDAs (personal digital assistants), Internet appliances, in-car information systems, game machines, and so on. A single cost-effective, optimum design that caters to the requirements of such a wide range of applications is not feasible, so the SH-5 core supports a carefully selected set of functions critical to meeting the performance, power, and code-size requirements of these applications. At the same time, it provides features that ease integration into a system on chip (SOC) that uses application-specific hardware modules to cater to specific requirements.

IEEE Micro | 2016

Enabling Scalable High-Performance Systems with the Intel Omni-Path Architecture

Mark S. Birrittella; Mark Debbage; Ram Huggahalli; James A. Kunz; Tom Lovett; Todd M. Rimmer; Keith D. Underwood; Robert C. Zak

The Intel Omni-Path Architecture (Intel OPA) is designed to enable a broad class of computations requiring scalable, tightly coupled CPU, memory, and storage resources. Integration between the Intel OPA family and Intel CPUs enables improvements in system-level packaging and network efficiency. When coupled with the new open standard APIs developed by the OpenFabrics Alliance (OFA) Open Fabrics Initiative (OFI), the Intel OPA family is optimized to provide low latency, high bandwidth, and a high message rate. Intel OPA enables a multigeneration, scalable fabric through innovations including link layer reliability, extended fabric addressing, and optimizations for high-core-count CPUs. Intel OPA also provides optimizations to address datacenter needs, including link-level traffic flow optimization to minimize jitter for high-priority packets, partitioning support, quality-of-service support, and a centralized fabric management system. Basic performance metrics from first-generation host fabric interface and switch implementations demonstrate the new fabric architecture's potential.

high performance interconnects | 2017

Host Software Stack Optimizations to Maximize Aggregate Fabric Throughput

Vignesh T. Ravi; James Erwin; Pradeep Sivakumar; Cq Tang; Jianxin Xiong; Ravindra Babu Ganapathi; Mark Debbage

Scientific HPC applications, along with the emerging class of Big Data and Machine Learning workloads, are rapidly driving fabric scale both on premises and in the cloud. Achieving high aggregate fabric throughput is paramount to the overall performance of the application. However, achieving high fabric throughput at scale can be challenging: the application communication pattern needs to map well onto the target fabric architecture, and the multi-layered host software stack in the middle needs to orchestrate that mapping optimally to unleash the full performance. In this paper, we investigate low-level optimizations to the host software stack with the goal of improving the aggregate fabric throughput and, hence, application performance. We develop and present a number of optimization and tuning techniques that are key driving factors for fabric performance at scale, such as fine-grained interleaving, improved pipelining, and careful resource utilization and management. We believe that these low-level optimizations can be commonly leveraged by several programming models and their runtime implementations, making these optimizations broadly applicable. Using a set of well-known MPI-based scientific applications, we demonstrate that these optimizations can significantly improve the overall fabric throughput and the application performance. Interestingly, we also observe that some of these optimizations are inter-related and can additively contribute to the overall performance.

Archive | 2016

Sending packets using optimized PIO write sequences without sfences

Mark Debbage; Yatin M. Mutha

Archive | 2015

Optimized credit return mechanism for packet sends

Mark Debbage; Yatin M. Mutha

IEICE Transactions on Electronics | 2001

Embedded Processor Core with 64-Bit Architecture and Its System-On-Chip Integration for Digital Consumer Products

Kunio Uchiyama; Fumio Arakawa; Yasuhiko Saito; Koki Noguchi; Atsushi Hasegawa; Shinichi Yoshioka; Naohiko Irie; Takeshi Kitahara; Mark Debbage; Andy Sturges

IEICE Transactions on Electronics | 2002