Anthony S. Fong | Researchain

Publication

Featured research published by Anthony S. Fong.

ACM SIGARCH Computer Architecture News | 2003

Method manipulation in an object-oriented processor

Mok Pak Lun; Richard C. Y. Li; Anthony S. Fong

In Object-Oriented Programming (OOP), programmers divide software applications into small objects, each responsible for performing part of the work. Objects communicate with one another through method invocation, so method manipulation plays a major role in an OOP system, and the performance of method invocation greatly affects the performance of the whole system. To speed up method communication, object-oriented computer architectures have been proposed at the hardware level. The High Level Instruction Set Computer (HISC) is one such architecture: it uses hardware-readable data structures to map OOP features directly at the architectural level. jHISC v3, currently under development, is an architectural design based on the HISC concept that aims to support Java at the architectural level. In this paper, we discuss the method manipulation procedures in the jHISC v3 system. By defining hardware-readable data structures for method context, we show how the jHISC v3 processor can use this information to provide a secure method manipulation mechanism for OOP.
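
To make the idea of a hardware-readable method context more concrete, here is a minimal Python sketch under stated assumptions. The record layouts (`MethodDescriptor`, `ClassRecord`, `Frame`) and the private-method access check are hypothetical illustrations of what such structures might hold; the abstract does not specify the actual jHISC v3 formats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MethodDescriptor:
    # Hypothetical method record a processor could read directly
    name: str
    owner: str          # class that declares the method
    entry_pc: int       # address of the method body
    max_locals: int     # size of the local-variable area in a new frame
    is_private: bool = False


@dataclass
class ClassRecord:
    name: str
    superclass: "ClassRecord | None"
    methods: dict = field(default_factory=dict)  # name -> MethodDescriptor

    def lookup(self, name):
        # Walk up the superclass chain, as virtual dispatch would
        cls = self
        while cls is not None:
            if name in cls.methods:
                return cls.methods[name]
            cls = cls.superclass
        raise LookupError(f"no method {name!r}")


@dataclass
class Frame:
    method: MethodDescriptor
    return_pc: int
    locals: list


def invoke(call_stack, receiver_cls, name, caller_cls, return_pc):
    """Resolve a method, check access, push a new context, return the entry PC."""
    m = receiver_cls.lookup(name)
    # Hypothetical security check: private methods only callable from their own class
    if m.is_private and caller_cls.name != m.owner:
        raise PermissionError(f"{caller_cls.name} may not call {m.owner}.{name}")
    call_stack.append(Frame(m, return_pc, [None] * m.max_locals))
    return m.entry_pc
```

Because the descriptor carries the frame size and owner, a single lookup gives everything needed to build the new context and enforce access rules without a software trap.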

Microelectronics Journal | 2013

Parallel architecture for DNA sequence inexact matching with Burrows-Wheeler Transform

Yao Xin; Benben Liu; Biao Min; Will X. Y. Li; Ray C. C. Cheung; Anthony S. Fong; Ting Fung Chan

Methods based on the Burrows-Wheeler Transform (BWT) are well suited to DNA sequence alignment because of their high speed and low space complexity. Although BWT is efficient for exact matching, applying it to inexact matching remains problematic because of excessive backtracking. This paper presents a hardware architecture for the BWT-based inexact sequence mapping algorithm using a Field Programmable Gate Array (FPGA). The proposed design can handle up to two errors, including mismatches and gaps. The original recursive algorithm is restructured using hierarchical tables and then extensively parallelized through a dual-base extension method. Extensive performance evaluations of the proposed architecture have been conducted on both Virtex-6 and Virtex-7 FPGAs. The design is considerably faster than a direct implementation. Compared with the popular software alignment tool BWA, our architecture achieves the same match quality when tolerating up to two errors. In execution speed, our design outperforms the multithreaded BWA aln process on a range of CPU platforms under the same configuration conditions.
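
The core of BWT-based inexact matching is backward search over the BWT with backtracking: at each step every base is tried, and a substitution costs one unit of an error budget. The Python sketch below is a software model of that recursion, not the paper's FPGA design. It assumes substitutions only (the paper also handles gaps), omits BWA's lower-bound pruning, uses a naive suffix-array construction that is only suitable for short texts, and its function names are hypothetical.

```python
def build_fm_index(text):
    """Return (suffix array, C table, Occ table) for a text ending in '$'."""
    # Naive suffix array: fine for a demo, far too slow for a genome
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)   # i == 0 wraps to the final '$'
    alphabet = sorted(set(text))
    C, total = {}, 0
    for c in alphabet:                       # C[c]: number of symbols smaller than c
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):             # occ[c][i]: occurrences of c in bwt[:i]
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def inexact_match(pattern, text, max_mismatches):
    """Text positions where pattern occurs with at most max_mismatches substitutions."""
    sa, C, occ = build_fm_index(text)
    hits = set()

    def extend(i, lo, hi, budget):
        # [lo, hi) is the suffix-array interval for pattern[i+1:] (with errors)
        if budget < 0 or lo >= hi:
            return
        if i < 0:
            hits.update(sa[lo:hi])
            return
        for base in "ACGT":
            if base not in C:
                continue
            new_lo = C[base] + occ[base][lo]
            new_hi = C[base] + occ[base][hi]
            # Backtracking step: try every base, charging one error on a mismatch
            extend(i - 1, new_lo, new_hi, budget - (base != pattern[i]))

    extend(len(pattern) - 1, 0, len(sa), max_mismatches)
    return hits
```

For example, `inexact_match("ACGT", "ACGTTGCAACGTAGCT$", 0)` returns `{0, 8}`. The number of recursive branches grows rapidly with the error budget, which is the backtracking cost that motivates a parallel hardware implementation.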

Embedded and Ubiquitous Computing | 2005

Hardware concurrent garbage collection for short-lived objects in mobile java devices

Chi Hang Yau; Yi Yu Tan; Anthony S. Fong; Wing Shing Yu

jHISC is an object-oriented processor for embedded systems that aims to accelerate Java execution in hardware. Garbage collection is one of the critical tasks in a Java Virtual Machine. In this paper, we study the dynamic object allocation and garbage collection behavior of Java programs based on the SPECjvm98 benchmark suite and MIDP applications for mobile phones. We measure the lifetime, size, and reference-count distributions of Java objects, and find that most Java objects die very young, are small, and have small reference counts. We propose a reference-counting object cache with a hardware write barrier and object allocator to provide concurrent hardware garbage collection for small objects in jHISC. Hardware support for the write barrier greatly reduces the overhead of reference-count updates. The reference-counting collector reclaims the memory occupied by an object immediately after the object becomes garbage, and the hardware allocator provides constant-time object allocation. Our investigation shows that over half of Java objects can be garbage collected within the object cache, so these objects never need to be copied to main memory.
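
The following Python sketch models the reference-counting part of this scheme in software: a write barrier adjusts counts on every pointer store, and an object is reclaimed as soon as its count drops to zero. The class and method names are hypothetical, and the sketch does not model the object cache, hardware concurrency, or the constant-time allocator described above.

```python
class RefCountHeap:
    """Toy heap whose reference counts are maintained by a write barrier."""

    def __init__(self):
        self.objects = {}   # object id -> {"rc": count, "fields": [refs]}
        self.next_id = 1
        self.freed = []     # reclamation order, for inspection

    def allocate(self, nfields):
        # The creator (e.g. an operand-stack slot) holds the first reference
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = {"rc": 1, "fields": [None] * nfields}
        return oid

    def release(self, oid):
        """Drop a reference held outside the heap (e.g. a dead local variable)."""
        self._dec(oid)

    def write_ref(self, obj, slot, new):
        """Write barrier: obj.fields[slot] = new, with reference-count updates."""
        if new is not None:
            self.objects[new]["rc"] += 1     # increment first: safe for self-assignment
        old = self.objects[obj]["fields"][slot]
        self.objects[obj]["fields"][slot] = new
        if old is not None:
            self._dec(old)

    def _dec(self, oid):
        worklist = [oid]
        while worklist:
            cur = worklist.pop()
            obj = self.objects[cur]
            obj["rc"] -= 1
            if obj["rc"] == 0:
                # Reclaim immediately and release everything it referenced
                worklist.extend(c for c in obj["fields"] if c is not None)
                del self.objects[cur]
                self.freed.append(cur)
```

Like any plain reference-counting scheme, this sketch cannot reclaim cyclic garbage on its own.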

ACM SIGARCH Computer Architecture News | 2003

A computer architecture with access control and cache option tags on individual instruction operands

Anthony S. Fong

Access control on data is usually applied on a per-page basis and implemented in the Translation Lookaside Buffer (TLB) via page tables managed by the operating system's memory management. In addition, to optimize memory reference performance, it is desirable to specify whether a page should be cached, so that unnecessary and undesirable caching of data is avoided. It is also desirable to be able to specify whether data coherency must be maintained in a multiprocessor system. Such maintenance demands extensive checking, which creates performance bottlenecks and adds system complexity, yet only a small percentage of data requires absolute coherency. In all these cases, pages are not the logical units for assigning such attributes. Operands are a better choice, because they map directly to the variables in a program. In this paper, we propose an architecture that supports access control, data-coherency requirements, and optional caching as system attributes on individual operands.
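
One way to picture per-operand attributes is as a small tag set carried with each operand and checked on every access. The Python sketch below is a hypothetical software model: the `Attr` flags, the `Operand` record, and the coherence counter are illustrative assumptions, not the encoding proposed in the paper.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Attr(Flag):
    READ = auto()
    WRITE = auto()
    CACHEABLE = auto()   # may be placed in the data cache
    COHERENT = auto()    # needs coherency maintenance in a multiprocessor


@dataclass(frozen=True)
class Operand:
    address: int
    attrs: Attr          # tags travel with the operand, not with its page


class Memory:
    def __init__(self):
        self.ram = {}
        self.cache = {}
        self.coherence_checks = 0   # accesses that would trigger coherency work

    def _check(self, op, needed):
        if needed not in op.attrs:
            raise PermissionError(f"{needed.name} access to {op.address:#x} denied")
        if Attr.COHERENT in op.attrs:
            self.coherence_checks += 1   # only tagged operands pay this cost

    def store(self, op, value):
        self._check(op, Attr.WRITE)
        self.ram[op.address] = value
        if Attr.CACHEABLE in op.attrs:
            self.cache[op.address] = value
        else:
            self.cache.pop(op.address, None)   # never keep uncacheable data

    def load(self, op):
        self._check(op, Attr.READ)
        if Attr.CACHEABLE in op.attrs:
            if op.address not in self.cache:
                self.cache[op.address] = self.ram.get(op.address, 0)
            return self.cache[op.address]
        return self.ram.get(op.address, 0)    # bypass the cache entirely
```

In this model a shared counter can be tagged coherent while streaming data is neither cached nor checked, even if both live in the same page.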

International Conference on Information Technology: Coding and Computing | 2000

Object-relational database management system (ORDBMS) using frame model approach

Hing Kwok Wong; Anthony S. Fong

This paper describes a methodology and development approach for implementing an object-relational database management system (ORDBMS) with the frame model (Fong, 1997). The frame model extends relational database management systems (RDBMS) to handle complex data such as objects. Objects can be represented in a frame model schema as classes, constraints and methods. Schema translation is the first step in implementing the ORDBMS, and this paper investigates a stepwise methodology for translating a relational schema into a frame model schema. The second step implements the ORDBMS itself, incorporating an RDBMS as its kernel, interactive SQL as the application program interface (API), and a method command interpreter. Methods are pre-compiled and stored in the RDBMS as stored procedures. They can be invoked either directly through the API or through a constraint class defined in the frame model schema when the associated interactive SQL statement is issued from the API.

International Conference on Electronics, Circuits and Systems | 1998

Integrated partition integer execution unit for multimedia and conventional applications

K.C. Tang; Angus Wu; Anthony S. Fong; Derek Chi-Wai Pao

Multimedia instruction set extensions such as the Visual Instruction Set from Sun Microsystems and MMX technology from Intel Corporation and AMD adopt the SIMD approach to significantly improve performance for multimedia applications. These existing solutions implement the multimedia unit either within the floating-point pipeline or by sharing control with the integer execution unit, and they do not support conventional applications. This paper demonstrates that the multimedia unit can be integrated into the integer execution unit to deliver benefits for both multimedia and conventional applications.
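
The key idea behind a partitioned integer unit is that one wide adder can serve both an ordinary full-width add and several independent narrow SIMD adds, simply by blocking carries at lane boundaries. The Python sketch below reproduces that behavior with the well-known SWAR masking trick. It models the arithmetic result only, not the circuit in the paper, and the 64-bit word width is an assumption.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def lane_top_bits(lane_bits):
    """Mask with the most significant bit of every lane set."""
    mask = 0
    for lane in range(WORD_BITS // lane_bits):
        mask |= 1 << (lane * lane_bits + lane_bits - 1)
    return mask


def partitioned_add(a, b, lane_bits):
    """Add two words as independent wrap-around lanes of lane_bits each.

    lane_bits == WORD_BITS is a conventional integer add; smaller widths
    give SIMD-style packed addition from the same adder.
    """
    if WORD_BITS % lane_bits:
        raise ValueError("lane width must divide the word width")
    top = lane_top_bits(lane_bits)
    low = WORD_MASK & ~top
    # Add everything except each lane's top bit: a carry out of the low bits
    # lands in the (zeroed) top-bit position and cannot reach the next lane.
    partial = (a & low) + (b & low)
    # Fold the operands' top bits back in with XOR; carries out of lanes are dropped.
    return (partial ^ ((a ^ b) & top)) & WORD_MASK
```

With 8-bit lanes, `partitioned_add(0x00FF, 0x0001, 8)` wraps within the low byte and yields 0, while the same operands with 64-bit lanes give the ordinary sum 0x100.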

Network and Parallel Computing | 2007

An instruction folding solution to a Java processor

Tan Yiyu; Anthony S. Fong; Yang Xiaojian

Java is widely used in embedded devices. Java programs are compiled into Java bytecodes, which are executed by the Java virtual machine. The Java virtual machine is a stack machine, and instruction folding is a technique for eliminating redundant stack operations. In this paper, a simple instruction folding algorithm is proposed for a Java processor named jHISC, in which bytecodes are classified into five categories and the results of incomplete folding groups are held for further folding. On the JVM98 benchmarks, relative to all stack operations, the eliminated P- and C-type instructions range from 87% to 98%, with an average of about 93%. The eliminated instructions account for between 37% and 50% of all operations, with an average of 44%.
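
A simplified folding pass can be sketched as follows. It assumes four hypothetical categories (producers `P`, operators `O`, consumers `C`, and non-foldable `N`) rather than the paper's five, and emits register-style pseudo-instructions. As in the scheme described above, the result of an incomplete group (an operator with no consumer yet) is held rather than pushed, so a later instruction can still absorb it.

```python
# Hypothetical, simplified categories; the paper's five categories are not listed
CATEGORY = {
    "iload": "P", "iconst": "P",             # producers push a value
    "iadd": "O", "isub": "O", "imul": "O",   # binary operators
    "istore": "C",                           # consumers pop a value
}


def fold(bytecodes):
    out, pending, temps = [], [], [0]

    def materialize(v):
        # Turn a held (not yet emitted) operation into a temporary register
        if isinstance(v, tuple):
            op, a, b = v
            t = f"T{temps[0]}"
            temps[0] += 1
            out.append(f"{op} {t}, {a}, {b}")
            return t
        return v

    def flush():
        # Spill everything still pending onto the real operand stack
        for v in pending:
            out.append(f"push {materialize(v)}")
        pending.clear()

    for op, *args in bytecodes:
        kind = CATEGORY.get(op, "N")
        if kind == "P":
            pending.append(f"L{args[0]}" if op == "iload" else f"#{args[0]}")
        elif kind == "O" and len(pending) >= 2:
            b = pending.pop()
            a = pending.pop()
            # Incomplete group: hold the result for possible further folding
            pending.append((op, materialize(a), materialize(b)))
        elif kind == "C" and pending:
            v = pending.pop()
            dst = f"L{args[0]}"
            if isinstance(v, tuple):
                out.append(f"{v[0]} {dst}, {v[1]}, {v[2]}")   # P P O C in one op
            else:
                out.append(f"mov {dst}, {v}")
        else:
            flush()                                  # operands are on the real stack
            out.append(" ".join([op, *map(str, args)]))
    flush()
    return out
```

For `c = a + b * 2` with locals 1, 2 and 3, the six bytecodes `iload 1; iload 2; iconst 2; imul; iadd; istore 3` fold into two operations: `imul T0, L2, #2` and `iadd L3, L1, T0`.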

Application-Specific Systems, Architectures, and Processors | 2006

Architectural Support on Object-Oriented Programming in a JAVA Processor

Tan Yiyu; Yau Chihang; Anthony S. Fong

Java is widely used in mobile devices and network applications because of its object-oriented features and associated advantages such as security, robustness, and platform independence. However, almost all current Java processors provide insufficient hardware support for object-oriented programming, so object-oriented operations are performed by software traps or microcode. As a result, they execute Java programs poorly, especially on memory-constrained mobile devices. In this paper, a Java processor architecture named jHISC is proposed, which implements object-oriented programming features directly in hardware and executes object-oriented instructions much faster.

IEEE Region 10 Conference | 2002

Dynamic memory allocation/deallocation behavior in Java programs

Anthony S. Fong; R.C.L. Li

The object-oriented paradigm has become the mainstream approach to software development because it can effectively divide a complex software problem into independent modules. One object-oriented language, Java, has become popular in recent years because of its wide use in rapidly growing Internet computing. However, like other object-oriented languages, Java suffers from poor performance. One of the main causes is the extensive dynamic memory allocation and deallocation that accompany object and array creation and destruction. Even a simple Othello applet easily requires half a million memory allocations for a single game. Our analysis of the memory allocation behavior of Java programs shows that about 99% of allocations are smaller than 1024 bytes. It also shows that most small allocated chunks survive only a very short time, and once they are garbage collected, their memory can be reused in the very near future. This implies that memory allocation and deallocation in Java exhibit a certain kind of locality. The information presented in this paper serves as a reference for designing an efficient hardware memory allocation/deallocation unit.
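
The allocation pattern reported above (mostly small, short-lived chunks whose memory is reused soon) favors allocators that keep small blocks in per-size-class free lists and hand back the most recently freed block first. The Python sketch below is one such design, offered as an illustration rather than the hardware unit the paper motivates; the 16-byte granule, the class name, and handing larger requests to a separate allocator are assumptions.

```python
class SegregatedAllocator:
    """Toy allocator: small requests served from per-size-class LIFO free lists."""

    SMALL_LIMIT = 1024   # the abstract reports ~99% of requests are below this
    GRANULE = 16         # assumed size-class granularity in bytes

    def __init__(self):
        self.free_lists = {}   # size class -> stack of free block addresses
        self.size_of = {}      # live block address -> size class
        self.bump = 0          # next never-used address

    def _size_class(self, n):
        n = max(n, 1)
        return -(-n // self.GRANULE) * self.GRANULE   # round up to the granule

    def malloc(self, n):
        if n > self.SMALL_LIMIT:
            raise ValueError("large objects go to a general-purpose allocator")
        sc = self._size_class(n)
        fl = self.free_lists.get(sc)
        if fl:
            addr = fl.pop()    # most recently freed block of this class (LIFO)
        else:
            addr = self.bump   # no free block: carve a new one
            self.bump += sc
        self.size_of[addr] = sc
        return addr

    def free(self, addr):
        sc = self.size_of.pop(addr)
        self.free_lists.setdefault(sc, []).append(addr)
```

Both `malloc` and `free` take constant time per request, and LIFO reuse returns the block most likely to still be cached.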

Annual ACIS International Conference on Computer and Information Science | 2007

Combining Local and Global History Hashing in Perceptron Branch Prediction

C. Y. Ho; Anthony S. Fong

As instruction issue rates and pipeline depths increase, branch prediction becomes a performance hurdle for modern processors, and extremely high prediction accuracy is essential to realize their potential performance. In recent years, many perceptron branch predictors have been investigated to improve dynamic branch prediction. This paper combines local history hashing and global history hashing in perceptron branch prediction. The proposed perceptron predictor uses a branch's own (local) history as well as global history to index the different weights of a perceptron. Simulation results show that the proposed predictor is more accurate than predictors using either global history hashing or local history hashing alone. It achieves a misprediction rate of 4.13%, and as low as 0.45% in some cases. It also improves by 9.21% over global history hashing alone, the mapping scheme proposed by Tarjan and Skadron.
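
The Python sketch below shows the general shape of a hashed perceptron that draws weights from both global and local history. Each history is split into segments, each segment is hashed together with the branch address to select a weight, and the prediction is the sign of the sum. The table sizes, hash functions, threshold, and 8-bit weight saturation are illustrative assumptions, not the configuration evaluated in the paper.

```python
def _clip(w, lo=-128, hi=127):
    # Saturate weights as if stored in 8-bit counters
    return max(lo, min(hi, w))


class HashedPerceptron:
    def __init__(self, table_size=1024, segments=4, seg_len=8,
                 local_entries=256, theta=40):
        self.n = table_size
        self.segments = segments
        self.seg_len = seg_len
        self.hist_mask = (1 << (segments * seg_len)) - 1
        self.bias = [0] * table_size
        self.gw = [[0] * table_size for _ in range(segments)]  # global-history weights
        self.lw = [[0] * table_size for _ in range(segments)]  # local-history weights
        self.ghist = 0                        # global history register
        self.lhist = [0] * local_entries      # per-branch local histories
        self.theta = theta                    # training threshold

    def _indices(self, pc):
        seg_mask = (1 << self.seg_len) - 1
        local = self.lhist[pc % len(self.lhist)]
        for s in range(self.segments):
            gseg = (self.ghist >> (s * self.seg_len)) & seg_mask
            lseg = (local >> (s * self.seg_len)) & seg_mask
            # Hash each history segment with the branch address to pick a weight
            gi = (pc ^ (gseg * 0x9E37) ^ s) % self.n
            li = (pc ^ (lseg * 0x7F4B) ^ (s << 4)) % self.n
            yield s, gi, li

    def _output(self, pc):
        y = self.bias[pc % self.n]
        for s, gi, li in self._indices(pc):
            y += self.gw[s][gi] + self.lw[s][li]
        return y

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is below the threshold
        if (y >= 0) != taken or abs(y) <= self.theta:
            b = pc % self.n
            self.bias[b] = _clip(self.bias[b] + t)
            for s, gi, li in self._indices(pc):
                self.gw[s][gi] = _clip(self.gw[s][gi] + t)
                self.lw[s][li] = _clip(self.lw[s][li] + t)
        # Shift the outcome into both the global and this branch's local history
        bit = int(taken)
        self.ghist = ((self.ghist << 1) | bit) & self.hist_mask
        j = pc % len(self.lhist)
        self.lhist[j] = ((self.lhist[j] << 1) | bit) & self.hist_mask
```

Because history bits select weights through hashing instead of being multiplied by them, training simply adds the outcome (+1 or -1) to each selected weight.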