Publication


Featured research published by Chia-Tien Dan Lo.


IEEE Transactions on Mobile Computing | 2003

Active memory processor: a hardware garbage collector for real-time Java embedded devices

Witawas Srisa-an; Chia-Tien Dan Lo; Ji-en Morris Chang

Java possesses many advantages for embedded system development, including fast product deployment, portability, security, and a small memory footprint. As Java makes inroads into the market for embedded systems, much effort is being invested in designing real-time garbage collectors. This paper introduces a garbage-collected memory module, a bitmap-based processor with standard DRAM cells, to improve the performance and predictability of dynamic memory management functions, including allocation, reference counting, and garbage collection. As a result, memory allocation can be done in constant time and sweeping can be performed in parallel by multiple modules, so constant-time sweeping is achieved regardless of heap size. This is a major departure from software counterparts, where sweeping time depends largely on the size of the heap. In addition, the proposed design supports limited-field reference counting, which has the advantage of distributing the processing cost throughout the execution; in software, however, that cost can be quite large and results in higher power consumption due to frequent memory accesses and added load on the main processor. By performing the reference-counting operations in a coprocessor, this processing is moved off the main processor. Moreover, the hardware cost of the proposed design is very modest (about 8,000 gates). Our study has shown that 3-bit reference counting can eliminate the need to invoke the garbage collector in all tested applications, while also reducing memory usage by 77 percent.
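
A rough software sketch of the limited-field reference counting described above may help make the idea concrete (the paper implements it in hardware as part of the memory module; the struct layout and function names below are illustrative assumptions). Each object carries a small saturating counter; once the counter sticks at its maximum value, the object can no longer be reclaimed by reference counting alone and is left to the mark-sweep collector.

#include <stdbool.h>
#include <stdint.h>

#define RC_BITS 3
#define RC_MAX  ((1u << RC_BITS) - 1)   /* 7: the sticky value */

typedef struct object {
    uint8_t rc;                         /* limited-field reference count */
    /* ... payload ... */
} object;

/* Increment: saturate at RC_MAX; a saturated count never changes again. */
static void rc_inc(object *o) {
    if (o->rc < RC_MAX)
        o->rc++;
}

/* Decrement: returns true when the object is provably dead and can be
 * reclaimed immediately, without invoking the tracing collector. */
static bool rc_dec(object *o) {
    if (o->rc == RC_MAX)                /* sticky: leave it for mark-sweep */
        return false;
    return --o->rc == 0;
}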


computer software and applications conference | 2010

Green Computing Methodology for Next Generation Computing Scientists

Chia-Tien Dan Lo; Kai Qian

Green computing has been an active research area that studies the efficient use of computing resources. It is a subject of growing importance, creating an urgent need to train the next generation of computer scientists and practitioners to think “green.” However, green computing has not yet been well taught in computer science (CS) or computer engineering (CE) programs, partly due to the lack of room to add a new course to those programs. Presented in this paper is an effort to reform core CS/CE courses to inculcate green computing in subjects such as algorithms and operating systems.


international conference on computer design | 2002

Performance enhancements to the Active Memory System

Witawas Srisa-an; Chia-Tien Dan Lo; J.M. Chang

The Active Memory System, a garbage-collected memory module, was introduced as a way to provide hardware support for garbage collection in embedded systems. The major component in the design was the Active Memory Processor (AMP), which utilized a set of bit-maps and a combinational circuit to perform mark-sweep garbage collection. The design can achieve constant time for both allocation and sweeping. In this paper, two enhancements are made to the design of the AMP so that it can perform one-bit reference counting, which postpones the need to perform garbage collection. Moreover, a caching mechanism is introduced to reduce the hardware cost of the design. The experimental results show that the proposed modifications can reduce the number of garbage collection invocations by 76%, and the speed-up in marking time can be as much as 5.81. With the caching mechanism, the hardware cost can be as small as 27 K gates and 6 KB of SRAM.
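
The one-bit variant mentioned above can be pictured in software as a single "shared" flag per object (the AMP realizes this in hardware; the field and function names here are assumptions): an object that has never had more than one reference can be reclaimed the moment that reference is dropped, postponing full collections.

#include <stdbool.h>

typedef struct obj {
    bool shared;        /* one-bit count: false = at most one reference so far */
    /* ... payload ... */
} obj;

/* Called whenever a second reference to o is created. */
static void on_copy_reference(obj *o) {
    o->shared = true;   /* once shared, always treated as shared */
}

/* Called when a reference to o is dropped.  Returns true if o can be
 * reclaimed immediately; shared objects are left to mark-sweep. */
static bool on_drop_reference(obj *o) {
    return !o->shared;
}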


field-programmable logic and applications | 2006

Shift-Or Circuit for Efficient Network Intrusion Detection Pattern Matching

Huang-Chun Roan; Wen Jyi Hwang; Chia-Tien Dan Lo

This paper introduces a novel FPGA-based signature-match co-processor architecture serving as the core of a hardware-based network intrusion detection system (NIDS). The architecture is based on the shift-or algorithm and is composed of simple shift registers, OR gates, and ROMs in which the patterns are stored. Compared with related work, experimental results show that the proposed design achieves higher throughput with lower hardware resource usage in FPGA implementations of NIDS.
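
For reference, the software form of the shift-or (bitap) algorithm that the circuit parallelizes looks roughly like the sketch below; the FPGA design replaces the inner loop with shift registers and OR gates fed from pattern ROMs. The 64-character pattern limit is an artifact of using one machine word here, not of the hardware.

#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Shift-or exact matching for patterns of up to 64 characters.
 * Returns the index of the first match of pat in text, or -1. */
long shift_or_search(const char *text, long n, const char *pat) {
    size_t m = strlen(pat);
    if (m == 0 || m > 64)
        return -1;

    uint64_t mask[UCHAR_MAX + 1];
    for (int c = 0; c <= UCHAR_MAX; c++)
        mask[c] = ~0ULL;                        /* a 1 bit means mismatch */
    for (size_t i = 0; i < m; i++)
        mask[(unsigned char)pat[i]] &= ~(1ULL << i);

    uint64_t state = ~0ULL;
    for (long i = 0; i < n; i++) {
        state = (state << 1) | mask[(unsigned char)text[i]];
        if ((state & (1ULL << (m - 1))) == 0)   /* bit m-1 clear: match */
            return i - (long)m + 1;
    }
    return -1;
}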


Journal of Systems and Software | 2002

DMMX: dynamic memory management extensions

J. Morris Chang; Witawas Srisa-an; Chia-Tien Dan Lo; Edward F. Gehringer

Dynamic memory management allows programmers to be more productive and increases system reliability and functionality. However, software algorithms for memory management are slow and non-deterministic. It is well known that object-oriented applications tend to be dynamic-memory intensive, which has led programmers to eschew dynamic memory allocation in many real-time and embedded systems: even when using Java or C++ as the development language, they frequently allocate memory statically rather than dynamically. In this paper, we present the design of a bitmap-based memory allocator implemented primarily in combinational logic to allocate memory in a small, predictable amount of time. It works in conjunction with an application-specific instruction-set extension called the dynamic memory management extension (DMMX). Allocation is done through a complete binary tree of combinational logic, which allows constant-time object creation. The garbage collection algorithm is mark-sweep, where the sweeping phase can be accomplished in constant time. This hardware scheme can greatly improve the speed and predictability of dynamic memory management. The proposed DMMX is an add-on approach that allows easy integration into any CPU, hardware-implemented Java virtual machine, or processor-in-memory.
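
A minimal software model of the bitmap-plus-binary-tree search may clarify how constant-time allocation is possible, assuming one bit per fixed-size block: each internal node of a complete binary tree records whether any block in its subtree is free, so a free block is found by a short descent from the root, which the DMMX hardware flattens into a single combinational step. This is an illustrative sketch, not the paper's circuit; multi-block objects and the mark-sweep machinery are omitted.

#include <stdbool.h>

#define NBLOCKS 1024                   /* leaves: one bit per heap block */

/* tree[1] is the root; tree[i] is true if any block below node i is free.
 * Leaves occupy indices NBLOCKS .. 2*NBLOCKS-1. */
static bool tree[2 * NBLOCKS];

static void heap_init(void) {
    for (int i = 1; i < 2 * NBLOCKS; i++)
        tree[i] = true;                /* everything starts free */
}

/* Descend from the root toward a free leaf, mark it used, and update
 * the OR values on the path back to the root. */
static int alloc_block(void) {
    if (!tree[1])
        return -1;                     /* heap full */
    int i = 1;
    while (i < NBLOCKS)
        i = tree[2 * i] ? 2 * i : 2 * i + 1;
    tree[i] = false;
    for (int p = i / 2; p >= 1; p /= 2)
        tree[p] = tree[2 * p] || tree[2 * p + 1];
    return i - NBLOCKS;                /* block index */
}

static void free_block(int b) {
    tree[NBLOCKS + b] = true;
    for (int p = (NBLOCKS + b) / 2; p >= 1; p /= 2)
        tree[p] = tree[2 * p] || tree[2 * p + 1];
}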


international conference on computer design | 2000

Architectural support for dynamic memory management

J. Morris Chang; Witawas Srisa-an; Chia-Tien Dan Lo

Recent advances in software engineering, such as graphical user interfaces and object-oriented programming, have caused applications to become more memory intensive. These applications tend to allocate dynamic memory prolifically. Moreover, automatic dynamic memory reclamation (garbage collection, GC) has become a popular feature in modern programming languages. As a result, the time consumed by dynamic storage management can be up to one-third of the program execution time. This illustrates the need for a high-performance memory management scheme. This paper presents a top-level design and evaluation of the proposed instruction extensions to facilitate heap management.


Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future | 2000

Scalable hardware-algorithm for mark-sweep garbage collection

Witawas Srisa-an; Chia-Tien Dan Lo; J.M. Chang

The memory-intensive nature of object-oriented languages such as C++ and Java has created the need for high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region, so a high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes increasingly attractive to map software algorithms such as malloc(), free(), and garbage collection into hardware. This paper presents a hardware design of a sweeping function (for mark-and-sweep garbage collection) that fully utilizes the advantages of combinational logic. In our scheme, the bit-sweeper can detect and sweep garbage in constant time. Bit-map marking in software can improve cache performance and reduce the number of page faults; however, it often requires several instructions to perform a single mark, whereas our scheme requires only one hardware instruction per mark. Moreover, since the complexity of the sweeping phase is often higher than that of the marking phase, the garbage collection time may be substantially improved. The hardware complexity of the proposed scheme (bit-sweeper) is O(n), where n represents the size of the bit map.
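
Expressed in software, the sweep that the bit-sweeper evaluates in combinational logic amounts to a few bitwise operations per machine word over the allocation and mark bit maps; the hardware evaluates every word at once, which is where the constant-time claim comes from. The bitmap layout below is an assumption for illustration.

#include <stddef.h>
#include <stdint.h>

/* One bit per heap block: alloc_map records allocated blocks,
 * mark_map is filled in by the marking phase. */
void bit_sweep(uint64_t *alloc_map, const uint64_t *mark_map, size_t words) {
    for (size_t w = 0; w < words; w++) {
        uint64_t garbage = alloc_map[w] & ~mark_map[w];  /* allocated but unmarked */
        alloc_map[w] &= ~garbage;                        /* free them: clear their bits */
    }
}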


IEEE Transactions on Parallel and Distributed Systems | 2012

Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction

Yi-Gang Tai; Chia-Tien Dan Lo; Kleanthis Psarris

Many scientific or engineering applications involve matrix operations, in which reduction of vectors is a common operation. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. The performance of the proposed design is evaluated for both single data set and multiple data set scenarios. Further, QR decomposition is used to demonstrate how the proposed method can accelerate its execution. We implement the design on an FPGA and compare its results to other methods.
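
A common baseline for the problem the paper addresses (its proposed schedule is more sophisticated) is to keep a pipelined adder of depth L busy by rotating through L independent partial sums and folding them at the end. A scalar C model of that baseline, with L as an assumed pipeline depth:

#include <stddef.h>

#define L 8   /* assumed adder pipeline depth */

/* Reduce v[0..n-1] using L independent accumulators so that, on real
 * hardware, consecutive additions never depend on one another. */
double pipelined_reduce(const double *v, size_t n) {
    double partial[L] = { 0.0 };
    for (size_t i = 0; i < n; i++)
        partial[i % L] += v[i];        /* rotate among accumulators */
    double sum = 0.0;
    for (int j = 0; j < L; j++)        /* final fold of the L partials */
        sum += partial[j];
    return sum;
}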


Journal of Systems and Software | 2004

The design and analysis of a quantitative simulator for dynamic memory management

Chia-Tien Dan Lo; Witawas Srisa-an; J. Morris Chang

The use of object-oriented programming in software development allows software systems to be more robust and more maintainable, while also reducing development time and expense. To achieve these benefits, object-oriented applications use dynamic memory management (DMM) to create generic objects that can be reused; consequently, these applications are often highly dynamic-memory intensive. Over the last three decades, several DMM schemes have been proposed, including first fit, best fit, segregated fit, and buddy systems. Because the performance (e.g., speed, memory utilization) of each scheme differs, it is difficult to select the most suitable approach for an application and to decide which parameters (e.g., block size) should be adopted. In this paper, a DMM simulation tool and its usage are presented. The tool receives DMM traces of C/C++ or Java programs and performs simulation according to the scheme (first fit, best fit, buddy system, or segregated fit) defined by the user. Techniques required to obtain memory traces are presented. At the end of each simulation run, a variety of performance metrics are reported to the user. Using this tool, software engineers can evaluate system performance and decide which algorithm is the most suitable, and hardware engineers can perform a system analysis before hardware (e.g., a modified buddy system or first fit) is fabricated.
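
To give a flavor of what such a simulator models internally, below is a minimal first-fit free-list sketch that could be driven by a malloc/free trace; the actual tool additionally implements best fit, segregated fit, and buddy systems and reports detailed metrics. All names, and the omission of coalescing, are illustrative simplifications.

#include <stddef.h>
#include <stdlib.h>

typedef struct block {
    size_t offset, size;
    int free;
    struct block *next;
} block;

static block *heap;                    /* address-ordered list of blocks */

void heap_setup(size_t heap_size) {
    heap = malloc(sizeof *heap);
    *heap = (block){ 0, heap_size, 1, NULL };
}

/* First fit: return the offset where the object is placed, or (size_t)-1. */
size_t ff_alloc(size_t size) {
    for (block *b = heap; b; b = b->next) {
        if (!b->free || b->size < size)
            continue;
        if (b->size > size) {          /* split off the remainder */
            block *rest = malloc(sizeof *rest);
            *rest = (block){ b->offset + size, b->size - size, 1, b->next };
            b->next = rest;
        }
        b->size = size;
        b->free = 0;
        return b->offset;
    }
    return (size_t)-1;                 /* allocation failure */
}

void ff_free(size_t offset) {
    for (block *b = heap; b; b = b->next)
        if (b->offset == offset) { b->free = 1; return; }
    /* coalescing of adjacent free blocks omitted for brevity */
}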


international symposium on performance analysis of systems and software | 2000

Do generational schemes improve the garbage collection efficiency?

Witawas Srisa-an; J.M. Chang; Chia-Tien Dan Lo

Recently, most research efforts on garbage collection have concentrated on reducing pause times. However, very little effort has been spent on the study of garbage collection efficiency, especially for generational garbage collection, which was introduced as a way to reduce garbage collection pause times. In this paper, a detailed study of garbage collection efficiency in generational schemes is presented. The study provides a mathematical model for the efficiency of generational garbage collection. Additionally, important issues such as write-barrier overhead, pause times, residency, and heap size are also addressed. We find that generational garbage collection often has lower garbage collection efficiency than other approaches (e.g., mark-sweep, copying) due to a smaller collected area and write-barrier overhead.
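
In such studies, collector efficiency is commonly defined as memory reclaimed per unit of collection work, with generational schemes tracing less per collection but paying a write-barrier cost; the paper's exact model may differ, so the function below is only a generic sketch of that idea.

/* Generic efficiency measure: reclaimed memory per unit of work.  A
 * generational collector traces only the young region (smaller
 * bytes_traced) but adds write-barrier work and may reclaim less. */
double gc_efficiency(double bytes_reclaimed,
                     double bytes_traced,
                     double write_barrier_cost) {
    return bytes_reclaimed / (bytes_traced + write_barrier_cost);
}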

Collaboration


Dive into Chia-Tien Dan Lo's collaboration.

Top Co-Authors

Witawas Srisa-an, University of Nebraska–Lincoln
Kai Qian, Kennesaw State University
Yi-Gang Tai, University of Texas at San Antonio
Kleanthis Psarris, University of Texas at San Antonio
J. Morris Chang, University of South Florida
J.M. Chang, Illinois Institute of Technology
Mayumi Kato, University of Texas at San Antonio
Wen Jyi Hwang, National Taiwan Normal University
Li Yang, University of Tennessee at Chattanooga
Minzhe Guo, University of Cincinnati