Publication


Featured research published by Stephen Charles Hsu.


Workshop on Applications of Computer Vision | 1994

A system for automated iris recognition

Richard P. Wildes; Jane C. Asmuth; Gilbert L. Green; Stephen Charles Hsu; Raymond J. Kolczynski; James R. Matey; Sterling E. McBride

This paper describes a prototype system for personnel verification based on automated iris recognition. The motivation for this endeavour stems from the observation that the human iris provides a particularly interesting structure on which to base a technology for noninvasive biometric measurement. In particular, it is known in the biomedical community that irises are as distinct as fingerprints or patterns of retinal blood vessels. Further, since the iris is an overt body, its appearance is amenable to remote examination with the aid of a computer vision system. The body of this paper details the design and operation of such a system. Also presented are the results of an empirical study in which the system exhibits flawless performance in the evaluation of 520 iris images.
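
The abstract does not spell out the matching machinery. As a hedged illustration only, here is a minimal Python sketch of one plausible scoring step: comparing two pre-registered iris images by zero-mean normalized correlation. The `verify_iris` threshold and the assumption that images are already aligned and cropped are ours, not the paper's.

```python
import numpy as np

def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-size image blocks."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def verify_iris(probe: np.ndarray, gallery: np.ndarray, threshold: float = 0.6) -> bool:
    """Declare a match when the correlation score exceeds the threshold.
    The threshold value is illustrative, not taken from the paper."""
    return normalized_correlation(probe, gallery) >= threshold
```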


Computer Vision and Pattern Recognition | 2006

3D Building Detection and Modeling from Aerial LIDAR Data

Vivek Verma; Rakesh Kumar; Stephen Charles Hsu

This paper presents a method to detect and construct a 3D geometric model of an urban area with complex buildings using aerial LIDAR (Light Detection and Ranging) data. The LIDAR data, collected from a nadir direction, is a point cloud containing surface samples of not only the building roofs and terrain but also undesirable clutter from trees, cars, etc. The main contribution of this work is the automatic recognition and estimation of simple parametric shapes that can be combined to model very complex buildings from aerial LIDAR data. The main components of the detection and modeling algorithms are: (i) segmentation of roof and terrain points; (ii) roof topology inference, where we introduce the concept of a roof-topology graph to represent the relationships between the various planar patches of a complex roof structure; (iii) parametric roof composition, in which simple parametric roof shapes that can be combined to create a complex roof structure are recognized by searching for sub-graphs in the building's roof-topology graph; and (iv) terrain modeling, where the terrain is identified and modeled as a triangulated mesh. Finally, we provide experimental results that demonstrate the validity of our approach for rapid and automatic building detection and geometric modeling with real LIDAR data. We are able to model cities and other urban areas at a rate of about 10 minutes per square mile on a low-end PC.
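
A minimal sketch of the roof-topology-graph idea in Python. The `RoofPatch` record, the adjacency input, and the gable test (two adjacent tilted patches with opposed horizontal normals) are illustrative assumptions; the paper's actual shape recognition and sub-graph search are richer.

```python
import numpy as np
from itertools import combinations

class RoofPatch:
    """Hypothetical patch record: a fitted plane (unit normal, offset)."""
    def __init__(self, normal, offset):
        self.normal = np.asarray(normal, dtype=float)
        self.offset = float(offset)

def roof_topology_graph(patches, adjacency):
    """Nodes are planar patches; an edge links two patches that share a
    boundary. `adjacency` is a set of index pairs from the segmentation."""
    graph = {i: set() for i in range(len(patches))}
    for i, j in adjacency:
        graph[i].add(j)
        graph[j].add(i)
    return graph

def find_gables(patches, graph, min_tilt=0.2):
    """Recognize one simple parametric shape: two adjacent tilted patches
    whose normals oppose in the horizontal plane suggest a gable roof."""
    gables = []
    for i, j in combinations(graph, 2):
        if j not in graph[i]:
            continue
        ni, nj = patches[i].normal, patches[j].normal
        tilted = abs(ni[2]) < 1.0 - min_tilt and abs(nj[2]) < 1.0 - min_tilt
        opposed = np.dot(ni[:2], nj[:2]) < 0
        if tilted and opposed:
            gables.append((i, j))
    return gables
```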


IS&T/SPIE's Symposium on Electronic Imaging: Science & Technology | 1995

Mosaic-based video compression

Michal Irani; Stephen Charles Hsu; P. Anandan

We describe a technique for video compression, based on a mosaic image representation obtained from all frames in a scene sequence, giving a panoramic view of the scene. We describe two types of mosaics, static and dynamic, which are suited for storage and transmission applications, respectively. In each case, the mosaic construction process aligns the images using a global parametric motion transformation, usually canceling the effect of camera motion on the dominant portion of the scene. The residual motions that are not compensated by the parametric motion are then analyzed for their significance and coded. The mosaic representation exploits large scale spatial and temporal correlations in image sequences. In many applications where there is significant camera motion (e.g., remote surveillance), it performs substantially better than traditional interframe compression methods, and offers the potential for very low bitrate transmission. In storage applications, such as digital libraries and video editing environments, it has the additional benefit of enabling direct access and retrieval of single frames at a time.
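
A hedged sketch of static-mosaic construction, assuming frames have already been warped into a common mosaic coordinate system by the estimated parametric motion; the pixelwise median reducer is one common choice for suppressing transient foreground motion, not necessarily the paper's.

```python
import numpy as np

def static_mosaic(aligned_frames):
    """Form a static mosaic as the pixelwise temporal median of frames
    already warped into a common mosaic coordinate system."""
    stack = np.stack(aligned_frames, axis=0).astype(np.float32)
    return np.median(stack, axis=0)

def residuals(aligned_frames, mosaic):
    """Per-frame residual images: only these (plus the motion parameters
    and the mosaic itself) would need to be coded."""
    return [f.astype(np.float32) - mosaic for f in aligned_frames]
```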


Computer Vision and Pattern Recognition | 2000

Pose estimation, model refinement, and enhanced visualization using video

Stephen Charles Hsu; Supun Samarasekera; Rakesh Kumar; Harpreet S. Sawhney

In this paper we present methods for exploitation and enhanced visualization of video, given a prior coarse untextured polyhedral model of a scene. Since it is necessary to estimate the 3D poses of the moving camera, we develop an algorithm in which tracked features are used to predict the pose between frames, and the predicted poses are refined by a coarse-to-fine process of aligning projected 3D model line segments to oriented image gradient energy pyramids. The estimated poses can be used to update the model with information derived from video, and to re-project and visualize the video from different points of view within a larger scene context. Via image registration, we update the placement of objects in the model and the 3D shape of new or erroneously modeled objects, then map video texture onto the model. Experimental results are presented for long aerial and ground-level videos of a large-scale urban scene.
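
As a hedged illustration of the model-to-image alignment score, here is a sketch that samples gradient energy along a projected model line segment at one pyramid level; pose refinement would maximize the summed score over all segments, coarse to fine. The sampling density and nearest-pixel lookup are our simplifications.

```python
import numpy as np

def segment_alignment_score(energy_level, segment, n_samples=32):
    """Sum image gradient energy sampled along a projected model line
    segment. `energy_level` is a 2D array of gradient energy for one
    pyramid level; `segment` is ((x0, y0), (x1, y1)) in its pixels."""
    (x0, y0), (x1, y1) = segment
    t = np.linspace(0.0, 1.0, n_samples)
    xs = np.clip((x0 + t * (x1 - x0)).astype(int), 0, energy_level.shape[1] - 1)
    ys = np.clip((y0 + t * (y1 - y0)).astype(int), 0, energy_level.shape[0] - 1)
    return float(energy_level[ys, xs].sum())
```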


Laser Radar Technology and Applications VIII | 2003

Automatic registration and visualization of occluded targets using ladar data

Stephen Charles Hsu; Supun Samarasekera; Rakesh Kumar

High-resolution 3D imaging ladar systems can penetrate foliage and camouflage to sample fragments of concealed surfaces of interest. Samples collected while the ladar moves can be integrated into a coherent object shape, provided that sensor poses are known. We detail a system for automatic data-driven registration of ladar frames, consisting of a coarse search stage, a pairwise fine registration stage using an iterated closest points algorithm, and a multi-view registration strategy. We evaluate this approach using simulated and field-collected ladar imagery of foliage-occluded objects. Even after alignment and aggregation, it is often difficult for human observers to find, assess, and recognize objects from a point cloud display. We survey and demonstrate basic display manipulations, surface fitting techniques, and clutter suppression to enhance visual exploitation of 3D imaging ladar data.
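
A minimal iterated-closest-points sketch for the pairwise fine registration stage, assuming two roughly pre-aligned point clouds (the paper's coarse search stage would supply that initialization). The closed-form SVD/Kabsch solve is the textbook version; the paper's variant may differ in matching and outlier handling.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=20):
    """Iterated closest points: pair each source point with its nearest
    target point, then solve the best rigid transform in closed form."""
    src = np.asarray(source, dtype=float).copy()
    tree = cKDTree(np.asarray(target, dtype=float))
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)
        matched = tree.data[idx]
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```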


Workshop on Applications of Computer Vision | 1998

Influence of global constraints and lens distortion on pose and appearance recovery from a purely rotating camera

Stephen Charles Hsu; Harpreet S. Sawhney

Given a video sequence acquired by an uncalibrated camera rotating about a fixed center of projection, it is desired to estimate the appearance of the scene in all possible directions, recover the camera orientations, and determine the constant focal length and radial lens distortion of the camera, without recourse to physical scene measurements. While parts of this problem have been studied before, there has not yet been a comprehensive algorithmic solution that accommodates a variety of trajectories, nor a characterization of the prerequisites for successful reconstruction. This paper studies a computationally efficient algorithm that takes maximum advantage of all information available locally among pairs or small groups of frames, as well as global closed-cycle constraints. It works with any rotational trajectory, not just panning about a fixed axis. Analyzing the algorithm's error surface predicts the sensitivity of estimation to lens distortion and global constraints. Both the analysis and experiments with natural images demonstrate that it is crucial to use either lens distortion compensation or closed-cycle constraints to obtain accurate parameter estimates and well-aligned mosaic images.
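
A hedged sketch of the closed-cycle constraint: composing estimated frame-to-frame rotations around a loop should yield the identity for a purely rotating camera, so the angle of the residual rotation measures accumulated drift that a global adjustment (or lens-distortion compensation) must absorb. The function assumes 3x3 rotation matrices ordered along the cycle.

```python
import numpy as np

def cycle_residual(pairwise_rotations):
    """Compose frame-to-frame rotations around a closed cycle and return
    the angle (degrees) of the residual rotation, which should be zero
    for perfect estimates of a purely rotating camera."""
    R = np.eye(3)
    for R_ij in pairwise_rotations:
        R = R_ij @ R
    # Rotation angle of the residual, from the trace identity.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```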


Computer Vision and Pattern Recognition | 2001

Geocoded terrestrial mosaics using pose sensors and video registration

Stephen Charles Hsu

The paper presents a complete algorithm for building geocoded terrestrial mosaics from aerial video accompanied by GPS/INS readings, without relying on ground survey points or reference imagery to provide geographic control. The 2D mosaic-to-video frame mappings are jointly estimated by bundle adjustment of constraints from pose sensor data and interframe registrations. Multiple-swath video collections are handled by automatically registering spatially adjacent frames across swaths. The proposed approach optimally combines the pose and interframe constraints for geocoding, unlike existing 2D mosaic techniques, while avoiding the complexity of 3D reconstruction. The method was validated on two highly dissimilar operating scenarios, with quantitative evaluation against ground truth supplied by known geocoded reference imagery. One test over hilly terrain found median mosaic continuity and geocoding errors of 1.6 m and 3.1 m, respectively, which is acceptable for many tasks. Characterization of such errors is essential for acceptance of this video mosaic process in critical geospatial applications.
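
A toy sketch of the joint adjustment, reduced to 2D frame-to-mosaic translations: absolute constraints stand in for GPS/INS pose data and relative constraints for interframe registrations, combined in one weighted linear least squares solve. The translation-only parameterization and the weights are illustrative; the paper's bundle adjustment estimates full 2D mappings.

```python
import numpy as np

def adjust_translations(n_frames, absolute, relative, w_abs=1.0, w_rel=10.0):
    """Solve for per-frame 2D translations t_i minimizing weighted residuals
    of absolute constraints t_i ~ g_i (pose sensor) and relative constraints
    t_j - t_i ~ d_ij (interframe registration)."""
    rows, rhs = [], []
    for i, g in absolute.items():              # frame -> measured position
        for d in range(2):
            r = np.zeros(2 * n_frames)
            r[2 * i + d] = w_abs
            rows.append(r)
            rhs.append(w_abs * g[d])
    for (i, j), off in relative.items():       # (i, j) -> measured offset
        for d in range(2):
            r = np.zeros(2 * n_frames)
            r[2 * j + d], r[2 * i + d] = w_rel, -w_rel
            rows.append(r)
            rhs.append(w_rel * off[d])
    sol, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return sol.reshape(n_frames, 2)
```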


Proceedings of SPIE, the International Society for Optical Engineering | 2005

Wide-area terrain mapping by registration of flash LIDAR imagery

Barbara Hanna; Bing-Bing Chai; Stephen Charles Hsu

Three-dimensional flash LIDAR coupled with a 2D RGB camera on an aerial platform is an efficient data collection method for mapping wide-area terrain and urban sites with imagery draped over a 3D model. In order to assemble a seamless and geographically accurate mosaic product despite GPS/INS errors, frames of imagery require data-driven registration. In the approach described in this paper, all spatially overlapping frame pairs are registered, be they adjacent in time, within the same flight line, or across flight lines, and the alignment model accounts for parallax due to 3D structure. All pairwise registration constraints, along with GPS/INS measurements, are combined by least squares adjustment to estimate the pose of each frame. Registered LIDAR frames are then combined and regridded to a uniformly sampled DEM, which is then used to orthorectify and mosaic the RGB frames. Furthermore, in order to process and store hours of data efficiently, a control strategy partitions the entire terrain into moderate size tiles, within which the pairwise registration, least squares adjustment, and resampling are performed. In a flash LIDAR system designed to map 360 sq. km per hour at 1m resolution, the software will achieve near real-time throughput on a commercial PC.
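
A hedged sketch of the regridding step, collapsing registered LIDAR points onto a uniformly sampled DEM; the 1 m cell size echoes the resolution quoted above, while the keep-the-highest-sample reducer is our simplification.

```python
import numpy as np

def regrid_to_dem(points, cell=1.0):
    """Regrid registered LIDAR points (N x 3, ground coordinates) onto a
    uniform grid, keeping the highest sample per cell; empty cells are NaN."""
    points = np.asarray(points, dtype=float)
    xy = np.floor(points[:, :2] / cell).astype(int)
    xy -= xy.min(axis=0)
    nx, ny = xy.max(axis=0) + 1
    dem = np.full((ny, nx), np.nan, dtype=float)
    for (cx, cy), z in zip(xy, points[:, 2]):
        if np.isnan(dem[cy, cx]) or z > dem[cy, cx]:
            dem[cy, cx] = z
    return dem
```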


Battlespace Digitization and Network-Centric Warfare Conference | 2002

Immersive remote monitoring of urban sites

Rakesh Kumar; Harpreet S. Sawhney; Aydin Arpa; Supun Samarasekera; Manoj Aggrawal; Stephen Charles Hsu; David Nistér; Keith J. Hanna

In a typical security and monitoring system, a large number of networked cameras are installed at fixed positions around a site under surveillance. There is generally no global view or map that shows the guard how the views of different cameras relate to one another. Individual cameras may be equipped with pan, tilt, and zoom capabilities, and the guard may be able to follow an intruder with one camera and then pick him up with another, but such tracking can be difficult and hand-off between cameras disorienting. The guard does not have the ability to continually shift his viewpoint. Moreover, current systems do not scale: they become more unwieldy as cameras are added. In this paper, we present the system and key algorithms for remote immersive monitoring of an urban site using a blanket of video cameras. The guard monitors the world using a live 3D model, which is constantly being updated from different directions using the multiple video streams. The world can be monitored remotely from any virtual viewpoint: the observer can see the entire scene from afar and get a bird's-eye view, or can fly/zoom in and see activity of interest up close. A 3D site model is constructed of the urban site and used as glue for combining the multiple video streams. Each of the video cameras has smart image processing associated with it, which allows it to detect moving and new objects in the scene and recover their 3D geometry and the pose of the camera with respect to the world model. Each video stream is overlaid on top of the 3D model using the recovered pose. Virtual views of the scene are generated by combining the various video streams, the background 3D model, and the recovered 3D geometry of foreground objects. These moving objects are highlighted on the 3D model and used as a cue by the operator to direct his viewpoint.
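
The core geometric operation behind draping a video stream over the site model is ordinary pinhole projection with the recovered camera pose; a minimal sketch follows, where the intrinsics/pose parameterization is the textbook one, not necessarily this system's.

```python
import numpy as np

def project_to_camera(X, K, R, t):
    """Pinhole projection of world points (N x 3) into a camera with
    intrinsics K and world-to-camera pose (R, t): the basic operation
    behind overlaying a video stream on the 3D site model."""
    Xc = X @ R.T + t              # world -> camera coordinates
    x = Xc @ K.T                  # apply intrinsics
    return x[:, :2] / x[:, 2:3]   # perspective divide -> pixel coordinates
```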


Archive | 1995

Automated, non-invasive iris recognition system and method

Richard P. Wildes; Jane C. Asmuth; Keith J. Hanna; Stephen Charles Hsu; Raymond J. Kolczynski; James R. Matey; Sterling E. McBride

Collaboration


Dive into Stephen Charles Hsu's collaborations.

Top Co-Authors

Michal Irani

Weizmann Institute of Science
