J.M. Chang
Illinois Institute of Technology
Publication
Featured research published by J.M. Chang.
International Conference on Computer Design | 2002
Witawas Srisa-an; Chia-Tien Dan Lo; J.M. Chang
The Active Memory System, a garbage-collected memory module, was introduced as a way to provide hardware support for garbage collection in embedded systems. The major component in the design was the Active Memory Processor (AMP), which utilized a set of bit-maps and a combinational circuit to perform mark-sweep garbage collection. The design achieves constant time for both allocation and sweeping. In this paper, two enhancements are made to the design of the AMP so that it can perform one-bit reference counting, which postpones the need to perform garbage collection. Moreover, a caching mechanism is introduced to reduce the hardware cost of the design. The experimental results show that the proposed modification can reduce the number of garbage collection invocations by 76%. The speed-up in marking time can be as much as 5.81. With the caching mechanism, the hardware cost can be as small as 27 K gates and 6 KB of SRAM.
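The one-bit reference-counting idea can be illustrated with a small software sketch (our own model, not the AMP hardware; the block count, bit-map layout, and method names are illustrative):

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical sketch of one-bit reference counting over a bit-mapped heap,
// loosely following the AMP idea: one bit per block records whether the block
// has ever had more than one reference ("shared"). Uniquely referenced blocks
// can be reclaimed immediately when their single pointer dies, postponing a
// full mark-sweep collection.
constexpr std::size_t kBlocks = 64; // illustrative heap size

struct OneBitHeap {
    std::bitset<kBlocks> allocated; // allocation bit-map
    std::bitset<kBlocks> shared;    // one-bit "reference count": 0 = unique, 1 = shared

    std::size_t alloc() {
        for (std::size_t i = 0; i < kBlocks; ++i)
            if (!allocated[i]) { allocated[i] = true; shared[i] = false; return i; }
        return kBlocks; // heap full; a real design would trigger mark-sweep here
    }
    void addReference(std::size_t b) { shared[b] = true; } // a second reference was created
    // Called when a pointer to block b dies. Returns true if reclaimed immediately.
    bool dropReference(std::size_t b) {
        if (!shared[b]) { allocated[b] = false; return true; } // unique: free at once
        return false; // shared: leave for the next mark-sweep cycle
    }
};
```

Blocks that were only ever referenced once never reach the collector at all, which is where the reduction in GC invocations comes from.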
Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future | 2000
Witawas Srisa-an; Chia-Tien Dan Lo; J.M. Chang
The memory-intensive nature of object-oriented languages such as C++ and Java has created the need for high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, a high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes increasingly attractive to map software algorithms such as malloc(), free() and garbage collection into hardware. This paper presents a hardware design of a sweeping function (for mark-and-sweep garbage collection) that fully utilizes the advantages of combinational logic. In our scheme, the bit-sweeper can detect and sweep the garbage in constant time. Bit-map marking in software can improve cache performance and reduce the number of page faults; however, it often requires several instructions to perform a single mark. In our scheme, only one hardware instruction is required per mark. Moreover, since the complexity of the sweeping phase is often higher than that of the marking phase, the garbage collection time may be substantially improved. The hardware complexity of the proposed scheme (bit-sweeper) is O(n), where n represents the size of the bit-map.
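A software analog of the bit-sweeper makes the idea concrete (a sketch under our own assumptions; the paper's circuit evaluates the whole bit-map combinationally, while software does it one machine word at a time):

```cpp
#include <cstdint>

// With an allocation bit-map and a mark bit-map, the sweep reduces to a
// bitwise step per word: garbage = allocated & ~marked. The 64-block word
// width here is illustrative.
struct BitmapHeap {
    uint64_t allocated = 0; // 1 = block in use
    uint64_t marked = 0;    // 1 = block reached during the mark phase

    void mark(unsigned block) { marked |= (uint64_t{1} << block); } // one operation per mark
    uint64_t sweep() {
        uint64_t garbage = allocated & ~marked; // unmarked in-use blocks are garbage
        allocated &= marked;                    // reclaim them in the allocation map
        marked = 0;                             // reset for the next collection cycle
        return garbage;
    }
};
```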
IEEE International Conference on High Performance Computing, Data, and Analytics | 2000
Woo Hyong Lee; J.M. Chang; Y. Hasan
The importance of dynamic memory management has increased significantly with the growing number of developments in object-oriented programs. Many studies show that dynamic memory management is one of the most expensive components in many software systems. C++ programs in particular tend to exhibit prolific object creation and deletion, and these objects tend to have short life-spans. This paper presents a dynamic memory allocation strategy that reuses these objects to speed up object management. The object reuse scheme is implemented by overloading the C++ operators new and delete. C++ allocation patterns are studied thoroughly in this paper: over 90% of objects are no bigger than 512 bytes and are allocated prolifically. The proposed scheme is made feasible by reusing these small objects. Our allocation scheme is simple and fast because it requires no splitting and no coalescing, and it reduces the number of malloc() calls. It maintains its own free-list, which is used for object reuse. The experimental results, based on the proposed allocation scheme, show that allocation speed is increased up to four times compared to other well-known algorithms. Our scheme is purely source-code oriented and built on top of malloc. Therefore, this approach is portable to existing code and safe to use with different mallocs.
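The overloaded new/delete free-list scheme can be sketched as follows (a minimal illustration of the technique; the class and its fields are ours, and a real implementation must ensure each object is at least pointer-sized so the free-list link fits):

```cpp
#include <cstdlib>

// A class-specific free list fed by overloaded operator new/delete, so
// frequently created small objects are recycled without splitting,
// coalescing, or repeated malloc() calls.
class Node {
public:
    double value = 0; // 8 bytes, large enough to hold the free-list link

    static void* operator new(std::size_t size) {
        if (freeList) {                  // reuse a previously deleted object
            void* p = freeList;
            freeList = freeList->next;
            return p;
        }
        return std::malloc(size);        // fall back to the general allocator
    }
    static void operator delete(void* p, std::size_t) {
        auto* f = static_cast<FreeNode*>(p); // push onto the free list instead of freeing
        f->next = freeList;
        freeList = f;
    }
private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* freeList;
};
Node::FreeNode* Node::freeList = nullptr;
```

Because a deleted object goes straight onto the list, the very next allocation of the same class reuses its memory with no call into malloc.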
International Symposium on Performance Analysis of Systems and Software | 2000
Witawas Srisa-an; J.M. Chang; Chia-Tien Dan Lo
Recently, most research efforts on garbage collection have concentrated on reducing pause times. However, very little effort has been spent on the study of garbage collection efficiency, especially for generational garbage collection, which was introduced as a way to reduce garbage collection pause times. In this paper, a detailed study of garbage collection efficiency in generational schemes is presented. The study provides a mathematical model for the efficiency of generational garbage collection. Additionally, important issues such as write-barrier overhead, pause times, residency, and heap size are also addressed. We find that generational garbage collection often has lower garbage collection efficiency than other approaches (e.g. mark-sweep, copying) due to a smaller collected area and write-barrier overhead.
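An illustrative efficiency model in the spirit of this analysis (the formula below is our simplification, not the paper's model): treat efficiency as bytes reclaimed per unit of collection work, where generational work includes the write-barrier cost paid between collections.

```cpp
// Bytes reclaimed per unit of work. For a generational collector,
// barrierCost is the write-barrier work accumulated since the last
// collection; for a full-heap mark-sweep it is zero.
double gcEfficiency(double reclaimedBytes, double tracedBytes, double barrierCost) {
    return reclaimedBytes / (tracedBytes + barrierCost);
}
```

For example, a full-heap collection reclaiming 60 MB while tracing 100 MB scores 0.6, whereas a nursery collection reclaiming 7 MB while tracing 8 MB plus 4 MB-equivalent barrier work scores about 0.58 — lower efficiency despite far shorter pauses, matching the paper's finding.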
International Performance Computing and Communications Conference | 2002
Qian Yang; Witawas Srisa-an; T. Skotiniotis; J.M. Chang
The performance issues of garbage collection (GC) have been studied for over three decades. This paper uses a new cycle-accurate timing tool to measure GC metrics such as allocation latencies, component elapsed time (mark, sweep, and compact), and object life span. The data are then used to derive runtime heap residency and overall GC time. In the past, researchers studied object life span through a space-based approach, where the amount of allocated memory determines GC invocations. We propose a time-based methodology as a complement. Time plays an important role in server environments, where allocations can come in bursts. The experimental results indicate that a time-based approach yields significantly fewer GC calls while maintaining almost the same heap residency as the space-based approach. This translates to a more efficient way to collect garbage.
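The two trigger policies being compared can be sketched side by side (thresholds, units, and names are illustrative, not the paper's):

```cpp
// A space-based collector invokes GC after a fixed amount of allocation,
// so an allocation burst forces many collections. A time-based collector
// invokes GC on a fixed period, so the same burst is absorbed into one cycle.
struct SpaceTrigger {
    long bytesSinceGC = 0, threshold;
    explicit SpaceTrigger(long t) : threshold(t) {}
    bool onAlloc(long bytes) {               // returns true when GC should run
        bytesSinceGC += bytes;
        if (bytesSinceGC >= threshold) { bytesSinceGC = 0; return true; }
        return false;
    }
};
struct TimeTrigger {
    long lastGC = 0, period;
    explicit TimeTrigger(long p) : period(p) {}
    bool onTick(long now) {                  // driven by a clock, not by allocation volume
        if (now - lastGC >= period) { lastGC = now; return true; }
        return false;
    }
};
```

Under a burst of 5000 bytes within one period, the space trigger (threshold 1000) fires five times while the time trigger fires once — the effect behind the "significantly fewer GC calls" result.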
IEEE International Conference on High Performance Computing, Data, and Analytics | 2000
J.M. Chang; Y. Hasan; Woo Hyong Lee
Dynamic memory management (DMM) has been a high-cost component in many software systems. In particular, the use of object orientation often results in intensive use of dynamic memory, making the dynamic memory performance problem worse. This paper presents a profile-based strategy to improve the performance of DMM. The performance improvement comes from a segregated strategy without splitting and coalescing costs. This modification is made feasible by preallocating the free-list based on profile data of heap memory usage. In this research, the empirical study shows that the maximum number of live objects of each size is independent of the input; this data provides a profile and estimate of the amount of memory the application will need, which can be preallocated to give a great improvement in performance. Compared to the average performance of well-known algorithms, the profile-based approach is about 3.9 to 6.5 times faster.
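The preallocated segregated free-list idea can be sketched as follows (our own model of the strategy; size classes, counts, and names are illustrative, and a real system would fall back to malloc when the profile is exceeded):

```cpp
#include <cstdlib>
#include <vector>

// A prior profiling run records the maximum number of simultaneously live
// objects per size class; the allocator preallocates that many blocks per
// class up front, so every allocation is a constant-time free-list pop with
// no splitting or coalescing.
class ProfiledAllocator {
public:
    // profile[c] = max simultaneous live objects observed for size class c
    ProfiledAllocator(const std::vector<std::size_t>& classSizes,
                      const std::vector<std::size_t>& profile)
        : lists(classSizes.size()) {
        for (std::size_t c = 0; c < classSizes.size(); ++c)
            for (std::size_t i = 0; i < profile[c]; ++i)
                lists[c].push_back(std::malloc(classSizes[c])); // one-time preallocation
    }
    void* alloc(std::size_t c) {              // constant-time pop from class c
        if (lists[c].empty()) return nullptr; // profile exceeded; real system falls back
        void* p = lists[c].back();
        lists[c].pop_back();
        return p;
    }
    void free(std::size_t c, void* p) { lists[c].push_back(p); } // constant-time push
private:
    std::vector<std::vector<void*>> lists; // one free list per size class
};
```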
International Parallel and Distributed Processing Symposium | 2002
Chia-Tien Dan Lo; Witawas Srisa-an; J.M. Chang
Parallel, multithreaded Java applications such as Web servers, database servers, and scientific applications are becoming increasingly prevalent. Most of them have high object instantiation rates through the new bytecode, which is typically implemented in a garbage collection subsystem. For the aforementioned applications, traditional garbage collectors are often the bottleneck that limits program performance and processor utilization on multiprocessor systems. They suffer from long garbage collection pauses (stop-the-world mark-sweep algorithms) or an inability to collect cyclic garbage (reference counting approaches). Generational garbage collection, however, is based only on the weak generational hypothesis that most objects die young. In this paper, a new multithreaded concurrent generational garbage collector (MCGC) based on mark-sweep with the assistance of reference counting is proposed. The MCGC can take advantage of multiple CPUs in an SMP system and the merits of lightweight processes. Furthermore, the long garbage collection pause can be reduced and garbage collection efficiency can be enhanced.
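The core multithreaded-marking idea can be sketched minimally (our own illustration, not the MCGC implementation; the object graph and thread count are invented, and the real collector also sweeps concurrently and uses reference counting to assist):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several marker threads claim objects with an atomic test-and-set on the
// mark flag, so each object is traced exactly once even when threads race
// on a shared region of the heap.
struct Obj {
    std::atomic<bool> marked{false};
    std::vector<int> children;              // indices of referenced objects
};

void markFrom(std::vector<Obj>& heap, int root) {
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int i = stack.back();
        stack.pop_back();
        if (heap[i].marked.exchange(true))  // already claimed by some thread
            continue;
        for (int c : heap[i].children)
            stack.push_back(c);
    }
}
```

Running one marker thread per root set exploits the extra CPUs of an SMP machine; unmarked objects are garbage for the sweep phase.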
IEEE Transactions on Very Large Scale Integration Systems | 1999
Witawas Srisa-an; Chia-Tien Dan Lo; J.M. Chang
The memory-intensive nature of object-oriented languages such as C++ and Java has created the need for high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, a high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes more and more attractive to map basic software algorithms such as malloc(), free(), and realloc() into hardware. This paper presents a hardware design of the realloc function that fully utilizes the advantages of combinational logic. Two steps are needed to complete a reallocation: (a) try to reallocate the original memory block in place, and (b) if (a) fails, allocate another memory block and copy the contents of the original block to this new location. In our scheme, (a) can be done in constant time. For (b), the allocation of the new memory block and the deallocation of the original block are done in constant time. The hardware complexity of the proposed scheme (i.e. X-unit, RS-unit, and ESG-unit) is O(n), where n represents the size of the bit-map.
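The two-step reallocation logic can be modeled in software (a sketch under our own assumptions: a boolean flag stands in for the bit-map check that the X/RS/ESG units perform in constant time, and all names are illustrative):

```cpp
#include <cstdlib>
#include <cstring>

// (a) try to extend the block in place; (b) otherwise allocate a new block,
// copy the old contents, and free the original block.
void* reallocTwoStep(void* old, std::size_t oldSize, std::size_t newSize,
                     bool canGrowInPlace) {
    if (canGrowInPlace)
        return old;                     // (a) succeeded: same address, now extended
    void* fresh = std::malloc(newSize); // (b) allocate elsewhere...
    std::memcpy(fresh, old, oldSize);   // ...copy the contents...
    std::free(old);                     // ...and release the original block
    return fresh;
}
```

In the hardware design, step (a) and both the allocation and deallocation in step (b) are constant-time; only the copy remains proportional to the block size.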
International Symposium on Performance Analysis of Systems and Software | 2001
Yang Qian; Witawas Srisa-an; T. Skotiniotis; J.M. Chang
Due to the increasing popularity of Java in client/server environments, most of today's server applications are multithreaded. Thus, research focusing on the performance analysis of multithreaded environments has become increasingly important. Since per-thread information can be crucial in such analysis, measuring tools are needed to provide per-thread information that may include cycle-based timers and filters to eliminate tracing overhead. In this paper, a Cycle Accurate Thread Timing for Linux Environment (CATTLE) tool is presented. This approach provides a cycle-accurate timer with functions to filter out tracing overhead by coordinating efforts from both the kernel and user applications. In this scheme, the kernel keeps track of accurate thread timing, while applications inform the kernel which part of the execution is to be measured. To demonstrate the tool's functionality, two case studies are provided, which include measuring latencies incurred by malloc calls and monitoring potential memory heap contention in multithreaded-multiprocessor environments.
IT Professional | 2003
Chia-Tien Dan Lo; Witawas Srisa-an; J.M. Chang
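The overhead-filtering idea behind such a cycle-accurate thread timer can be sketched in user space (our own model with a deterministic fake clock; the real tool reads per-thread cycle counters maintained by the kernel, and all names here are illustrative):

```cpp
#include <cstdint>

// The timer calibrates the cost of reading the clock itself and subtracts it
// from every measured region, so instrumentation does not inflate the timings.
struct FilteredTimer {
    uint64_t (*readCycles)(); // clock source, e.g. a cycle-counter reader
    uint64_t overhead;        // measured cost of one readCycles() pair

    void calibrate() {
        uint64_t t0 = readCycles();
        uint64_t t1 = readCycles();
        overhead = t1 - t0;   // back-to-back reads measure pure tracing cost
    }
    uint64_t measure(void (*region)()) {
        uint64_t t0 = readCycles();
        region();
        uint64_t t1 = readCycles();
        uint64_t raw = t1 - t0;
        return raw > overhead ? raw - overhead : 0; // filter the overhead out
    }
};

// Deterministic fake clock for demonstration: each read costs 5 "cycles",
// and the demo region "runs" for 100 cycles.
static uint64_t fakeNow = 0;
uint64_t fakeRead() { fakeNow += 5; return fakeNow; }
void fakeWork() { fakeNow += 100; }
```

With the fake clock, the raw measurement of the 100-cycle region is 105 cycles; subtracting the calibrated 5-cycle read cost recovers the true region time.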