Frank M. Ciaramello
Cornell University
Publications
Featured research published by Frank M. Ciaramello.
IEEE Transactions on Image Processing | 2011
Frank M. Ciaramello; Sheila S. Hemami
Real-time, two-way transmission of American Sign Language (ASL) video over cellular networks provides natural communication among members of the Deaf community. As a communication tool, compressed ASL video must be evaluated according to the intelligibility of the conversation, not according to conventional definitions of video quality. Guided by linguistic principles and human perception of ASL, this paper proposes a full-reference computational model of intelligibility for ASL (CIM-ASL) that is suitable for evaluating compressed ASL video. The CIM-ASL measures distortions only in regions relevant for ASL communication, using spatial and temporal pooling mechanisms that vary the contribution of distortions according to their relative impact on the intelligibility of the compressed video. The model is trained and evaluated using ground truth experimental data collected in three separate studies. The CIM-ASL provides accurate estimates of subjective intelligibility and demonstrates statistically significant improvements over computational models traditionally used to estimate video quality. The CIM-ASL is incorporated into an H.264-compliant video coding framework, creating a closed-loop encoding system optimized explicitly for ASL intelligibility. The ASL-optimized encoder achieves bitrate reductions between 10% and 42%, without reducing intelligibility, when compared to a general-purpose H.264 encoder.
International Conference on Image Processing | 2011
Frank M. Ciaramello; Amy R. Reibman
We present a methodology to systematically stress objective image quality estimators (QEs). Using computational results instead of expensive subjective tests, we obtain rigorous information about a QE's performance on a constrained but comprehensive set of degraded images. Our process quantifies many of a QE's potential vulnerabilities. Knowledge of these weaknesses can be used to improve a QE during its design process, to assist in selecting which QE to deploy in a real system, and to interpret the results of a chosen QE once deployed.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Mehmet E. Yildiz; Frank M. Ciaramello; Anna Scaglione
Given a network of $N$ nodes with the $i$-th sensor's observation $x_i \in \mathbb{R}^M$, the matrix containing all Euclidean distances among measurements, $\|x_i - x_j\|$ for all $i, j \in \{1, \dots, N\}$, is a useful description of the data. While reconstructing a distance matrix has a wide range of applications, we are particularly interested in manifold reconstruction and its dimensionality reduction for data fusion and query. To make this map available to all of the nodes in the network, we propose a fully decentralized consensus gossiping algorithm that is based on local neighbor communications and does not require the existence of a central entity. The main advantages of our solution are that it is insensitive to changes in the network topology and that it is fully decentralized. We describe the proposed algorithm in detail, study its complexity in terms of the number of inter-node radio transmissions, and showcase its performance numerically.
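To give a feel for gossip-based consensus in general, the sketch below runs plain randomized pairwise averaging in Python. It is not the algorithm proposed in the paper, which targets the distance matrix. The ring topology, the function name `gossip_average`, the sample values, and the round count are all assumptions made for this illustration.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip: each round, the two endpoints of one
    randomly chosen edge replace their states with their mutual average."""
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        mean = (state[i] + state[j]) / 2.0
        state[i] = mean
        state[j] = mean
    return state


# Hypothetical ring of 6 nodes, each holding one scalar measurement.
values = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
edges = [(k, (k + 1) % len(values)) for k in range(len(values))]

state = gossip_average(values, edges, rounds=2000)
target = sum(values) / len(values)  # the network-wide average
```

Each exchange involves only two neighbors and preserves the sum of all states, so every node drifts toward the global average without any central coordinator.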
Proceedings of SPIE | 2011
Frank M. Ciaramello; Amy R. Reibman
The subjective tests used to evaluate image and video quality estimators (QEs) are expensive and time-consuming. More problematic, the majority of subjective testing is not designed to find systematic weaknesses in the evaluated QEs. As a result, a motivated attacker can take advantage of these systematic weaknesses to gain unfair monetary advantage. In this paper, we draw on some lessons of software testing to propose additional testing procedures that target a specific QE under test. These procedures supplement, but do not replace, the traditional subjective testing procedures that are currently used. The goal is to motivate the design of objective QEs that are better able to accurately characterize human quality assessment.
Proceedings of SPIE | 2011
Frank M. Ciaramello; Sheila S. Hemami
Real-time videoconferencing using cellular devices provides natural communication to the Deaf community. For this application, compressed American Sign Language (ASL) video must be evaluated in terms of the intelligibility of the conversation and not in terms of the overall aesthetic quality of the video. This work conducts an experiment to determine the subjective preferences of ASL users in terms of the trade-off between intelligibility and quality when varying the proportion of the bitrate allocated explicitly to the regions of the video containing the signer. A rate-distortion optimization technique, which jointly optimizes for quality and intelligibility according to a user-specified parameter, generates test video pairs for the subjective experiment. Preliminary results suggest that at high bitrates, users prefer videos in which the non-signer regions in the video are encoded with some nominal rate. As the total encoding bitrate decreases, users prefer video in which a greater proportion of the rate is allocated to the signer.
International Conference on Image Processing | 2010
Frank M. Ciaramello; Sheila S. Hemami
Objective estimators for video are expected to accurately estimate subjective ratings provided by humans. This work presents a subjective experiment designed to acquire intelligibility ratings for a collection of compressed ASL videos. The distortions present in the experimental database are analyzed in terms of their impact on the performance of objective estimators. Distortions that do not significantly vary across space or time cannot adequately challenge traditional objective estimators, such as PSNR and RMS distortion contrast, and an objective intelligibility measure designed specifically for ASL video provides negligible improvements in prediction accuracy. Distortions that vary across space and time, affecting only localized regions in the video, are considered spatially and temporally diverse. When the distortions present in the experimental database are sufficiently diverse, the objective intelligibility measure estimates subjective ratings more accurately than PSNR and RMS distortion contrast.
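For readers unfamiliar with PSNR, the traditional estimator named above, the following minimal Python sketch computes it from the mean squared error under the standard definition. The pixel values are made up, and the 8-bit peak of 255 is an assumption. The sketch does not reproduce the paper's intelligibility measure or its RMS distortion contrast estimator.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB; returns infinity when the two signals are identical."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Made-up 8-bit pixel values for illustration only.
ref = [52, 55, 61, 66, 70, 61, 64, 73]
dist = [54, 55, 60, 67, 69, 62, 64, 70]
value = psnr(ref, dist)
```

Because every pixel contributes equally to the error, PSNR cannot tell whether a distortion falls on the signer or on the background, which is why spatially localized distortions separate it from a region-aware measure.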
Conference on Information Sciences and Systems | 2010
Frank M. Ciaramello; Jung Ko; Sheila S. Hemami
Real-time videoconferencing using cellular devices provides natural communication to the Deaf community. Compressed American Sign Language video must be evaluated in terms of the intelligibility of the conversation and not in terms of the overall aesthetic quality of the video. This work studies the trade-offs between intelligibility and quality when varying the proportion of the rate allocated explicitly to the signer. An intelligibility distortion measure and a quality measure (PSNR) are applied in a rate-distortion optimization framework and a novel encoding technique controls the degree to which intelligibility is emphasized over quality. Understanding the relationship between intelligibility and quality allows the encoder to identify operating points that maximize PSNR while maintaining a minimal level of intelligibility. At fixed bitrates, PSNR can be increased on average by 5 dB with little penalty in intelligibility by providing a nominal amount of rate to the background region. Further increases in PSNR can be achieved at the price of reduced intelligibility.
International Conference on Image Processing | 2012
Sheila S. Hemami; Frank M. Ciaramello; Sean S. Chen; Nathan G. Drenkow; Dae Yeol Lee; Seonwoo Lee; Evan Levine; Adam J. McCann
User experiences in 2D and 3D videoconferencing are evaluated and compared. An experimental system is designed that uses video direct-feed in 2D or 3D, providing nearly life-sized across-the-table videoconferencing to two participants without compression or transmission artifacts. 3D is achieved via polarization, selected because of its high resolution and high potential for eye contact. User experience is evaluated via a subjective test with two interactive tasks. The experiment is completed by three groups, who interact in 3D, in 2D (without polarizing glasses), and in 2D while wearing glasses, serving as a control for the use of glasses. Users of the system in 3D reported an increased ability to perceive depth, but otherwise reported similar user experiences to 2D users relating to quality of interaction. Wearing 3D glasses did not adversely impact user experience.
2010 Western New York Image Processing Workshop | 2010
Frank M. Ciaramello; Jung Ko; Sheila S. Hemami
Real-time videoconferencing using cellular devices provides natural communication to the Deaf community. For this application, compressed American Sign Language (ASL) video must be evaluated in terms of the intelligibility of the conversation and not in terms of the overall aesthetic quality of the video. This work conducts an experiment to determine the subjective preferences of ASL users in terms of the trade-off between intelligibility and quality when varying the proportion of the bitrate allocated explicitly to the regions of the video containing the signer. A rate-distortion optimization technique, which jointly optimizes for quality and intelligibility according to a user-specified parameter, generates test video pairs for the subjective experiment. Preliminary results suggest that at high bitrates, users prefer videos in which the non-signer regions in the video are encoded with some nominal rate. As the total encoding bitrate decreases, users prefer video in which a greater proportion of the rate is allocated to the signer.
Visual Communications and Image Processing | 2008
Frank M. Ciaramello; Sheila S. Hemami
Sign language users are eager for the freedom and convenience of video communication over cellular devices. Compression of sign language video in this setting offers unique challenges. The low bitrates available make encoding decisions extremely important, while the power constraints of the device limit the encoder complexity. The ultimate goal is to maximize the intelligibility of the conversation given the rate-constrained cellular channel and power-constrained encoding device. This paper uses an objective measure of intelligibility, based on subjective testing with members of the Deaf community, for rate-distortion optimization of sign language video within the H.264 framework. Performance bounds are established by using the intelligibility metric in a Lagrangian cost function along with a trellis search to make optimal mode and quantizer decisions for each macroblock. The optimal QP values are analyzed and the unique structure of sign language is exploited in order to reduce complexity by three orders of magnitude relative to the trellis search technique with no loss in rate-distortion performance. Further reductions in complexity are made by eliminating rarely occurring modes in the encoding process. The low-complexity SL optimization technique increases the measured intelligibility up to 3.5 dB, at fixed rates, and reduces rate by as much as 60% at fixed levels of intelligibility with respect to a rate control algorithm designed for aesthetic distortion as measured by MSE.
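The Lagrangian cost function mentioned above can be illustrated with a small Python sketch that picks, for a single macroblock, the option minimizing J = D + λR. This is a simplified, assumed form: it treats the macroblock independently and omits the trellis search the paper uses. The candidate list, its distortion and rate numbers, and the helper name `best_choice` are hypothetical.

```python
def best_choice(candidates, lam):
    """Return the candidate minimizing the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


# Hypothetical (mode, QP) options for one macroblock; all numbers are made up.
candidates = [
    {"mode": "SKIP", "qp": 30, "distortion": 40.0, "rate": 1.0},
    {"mode": "INTER16x16", "qp": 30, "distortion": 12.0, "rate": 20.0},
    {"mode": "INTRA4x4", "qp": 26, "distortion": 5.0, "rate": 60.0},
]

low_lambda = best_choice(candidates, lam=0.05)  # rate is cheap
mid_lambda = best_choice(candidates, lam=0.5)   # balanced
high_lambda = best_choice(candidates, lam=2.0)  # rate is expensive
```

A larger λ penalizes rate more heavily and pushes the decision toward cheaper modes. Substituting an intelligibility-based distortion for D, as the abstract describes, changes which regions of the frame are worth spending bits on.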