Mat C. Hans | Researchain

Publication

Featured research published by Mat C. Hans.

IEEE Signal Processing Magazine | 2001

Lossless compression of digital audio

Mat C. Hans; Ronald W. Schafer

Lossless audio compression is likely to play an important part in music distribution over the Internet, DVD audio, digital audio archiving, and mixing. The article is a survey and a classification of the current state-of-the-art lossless audio compression algorithms. This study finds that lossless audio coders have reached a limit in what can be achieved for lossless compression of audio. It also describes a new lossless audio coder called AudioPaK, which has low algorithmic complexity and performs as well as or even better than most of the lossless audio coders that have been described in the literature.
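The abstract does not spell out how any particular coder works, so the following is only a minimal, generic sketch of the predict-then-entropy-code structure that lossless audio coders commonly share. It is not AudioPaK's actual algorithm. The first-order difference predictor, the Rice parameter `k`, and all function names are assumptions chosen for illustration.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map a signed residual to a non-negative integer (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values: List[int], k: int) -> List[int]:
    """Rice code: quotient in unary (ones, then a zero), remainder in k bits."""
    bits: List[int] = []
    for u in values:
        q, r = u >> k, u & ((1 << k) - 1)
        bits.extend([1] * q + [0])
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))
    return bits


def rice_decode(bits: List[int], k: int, count: int) -> List[int]:
    """Read back `count` Rice-coded non-negative integers."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1  # skip the terminating zero of the unary part
        r = 0
        for _ in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        values.append((q << k) | r)
    return values


def encode(samples: List[int], k: int = 2) -> List[int]:
    """Predict each sample by the previous one and Rice-code the residuals."""
    prev, residuals = 0, []
    for x in samples:
        residuals.append(x - prev)
        prev = x
    return rice_encode([zigzag(e) for e in residuals], k)


def decode(bits: List[int], k: int, count: int) -> List[int]:
    """Invert encode(): rebuild samples by accumulating the residuals."""
    prev, out = 0, []
    for u in rice_decode(bits, k, count):
        prev += unzigzag(u)
        out.append(prev)
    return out


samples = [0, 3, 5, 6, 6, 4, 1, -2]
bits = encode(samples, k=2)
restored = decode(bits, k=2, count=len(samples))
```

Because neighboring audio samples tend to be close in value, the residuals are small and cost few bits each, while decoding reproduces the input exactly, which is what makes the scheme lossless.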

ACM Multimedia | 2003

Interacting with audio streams for entertainment and communication

Mat C. Hans; Mark T. Smith

We present a new model of interactive audio for entertainment and communication. A new device called the DJammer and its associated technologies are described. The DJammer introduces the idea of provisioning mobile users to interact cooperatively with digital audio streams. Users can augment the audio in real time and communicate the result in several ways, resulting in a new form of multimedia communication across diverse devices and multiple networks. This paper describes the technologies incorporated into the DJammer and discusses the actual implementation of the prototype DJammer. Future enhancements are also described.

International Conference on Acoustics, Speech, and Signal Processing | 2002

A low-power, fixed-point, front-end feature extraction for a distributed speech recognition system

Brian Delaney; Nikil Jayant; Mat C. Hans; Tajana Simunic; Andrea Acquaviva

This work describes the optimization of a signal processing front-end for a distributed speech recognition system with the goal of reducing power consumption. Two categories of source code optimizations were used: architectural and algorithmic. Architectural optimizations reduce the power consumption for a particular system, in this case the HP Labs Smartbadge IV prototype portable system. Algorithmic optimizations are more general and involve changes in the algorithmic implementation of the source code to run faster and consume less power. A cycle-accurate energy simulation shows a reduction in power usage of 83.5% with these optimizations. The optimized source code runs 34 times faster than the original code; therefore, it can run at lower processor clock speeds and voltages for further reductions in power consumption. This technique, known as dynamic voltage scaling, was implemented on the Smartbadge IV hardware for an overall reduction in power usage of 89.2%.
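The two percentages in the abstract can be related with a short calculation. The sketch below assumes that both figures (83.5% and 89.2%) are measured against the same unoptimized baseline, which the abstract does not state explicitly. The function name `additional_reduction` is hypothetical.

```python
def additional_reduction(first: float, overall: float) -> float:
    """Further fractional reduction needed to go from `first` to `overall`.

    Both arguments are fractional reductions relative to the same baseline
    (an assumption, not something the abstract confirms).
    """
    remaining_after_first = 1.0 - first   # share of baseline power left after step one
    remaining_overall = 1.0 - overall     # share of baseline power left after both steps
    return 1.0 - remaining_overall / remaining_after_first


# Source-code optimizations alone: 83.5%; with dynamic voltage scaling: 89.2%.
dvs_gain = additional_reduction(0.835, 0.892)
print(f"Extra reduction from voltage scaling: {dvs_gain:.1%}")
```

Under that assumption, dynamic voltage scaling removes roughly a third of the power that remained after the source-code optimizations.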

Real-time Imaging | 2003

Image-based photo hulls for fast and photo-realistic new view synthesis

Gregory G. Slabaugh; Ronald W. Schafer; Mat C. Hans

We present an efficient image-based rendering algorithm that generates views of a scene's photo hull. The photo hull is the largest 3D shape that is photo-consistent with photographs taken of the scene from multiple viewpoints. Our algorithm, image-based photo hulls (IBPH), like the image-based visual hulls (IBVH) algorithm from Matusik et al. on which it is based, takes advantage of epipolar geometry to efficiently reconstruct the geometry and visibility of a scene. Our IBPH algorithm differs from IBVH in that it utilizes the color information of the images to identify scene geometry. These additional color constraints result in more accurately reconstructed geometry, which often projects to better synthesized virtual views of the scene. We demonstrate our algorithm running in a real-time 3D telepresence application using video data acquired from multiple viewpoints.

International Conference on Image Processing | 2002

Multi-resolution space carving using level set methods

Gregory G. Slabaugh; Ronald W. Schafer; Mat C. Hans

We present a multi-resolution space carving algorithm that reconstructs a 3D model of a visual scene photographed by a calibrated digital camera placed at multiple viewpoints. Our approach employs a level set framework for reconstructing the scene. Unlike most standard space carving approaches, our level set approach produces a smooth reconstruction composed of manifold surfaces. Our method outputs a polygonal model, instead of a collection of voxels. We texture-map the reconstructed geometry using the photographs, and then render the model to produce photo-realistic new views of the scene.

International Symposium on Wearable Computers | 2003

A wearable networked MP3 player and "turntable" for collaborative scratching

Mat C. Hans; Mark T. Smith

We present a new type of wearable musical instrument called the DJammer. The DJammer brings turntable-like creative functionality to the MP3 world while offering a collaborative platform for sharing the creative process over the network. It not only provides a new tool to DJs but also empowers each listener to a new type of community experience by giving them the ability to actively participate in any jam session. In essence, the DJammer changes the 1-to-n creative process to a truly n-to-n collaborative experience. This paper describes the technologies incorporated into the DJammer, and discusses the implementation of the prototype DJammer. Usage models and future enhancements are also described.

Second International Conference on Web Delivering of Music, 2002. WEDELMUSIC 2002. Proceedings. | 2002

Lossless audio coding with MPEG-4 structured audio

Nikolaos Vasiloglou; Ronald W. Schafer; Mat C. Hans

MPEG-4 structured audio (SA) has been proposed as a flexible standard for generalized audio coding. Originating from the Netsound software developed at MIT, SA is based on MIDI synthesis of sound, but it is enriched with DSP algorithms so as to allow emulation of other types of coders designed for speech and audio signals. We have investigated the use of structured audio for lossless coding of audio signals, and have found that certain limitations of structured audio make implementations of lossless coders less straightforward than might be desired. In particular, we have used SA to implement an MPEG-4 compliant version of the lossless audio coder AudioPaK. To implement and validate our new coder, we used the software system Sfront, which translates MPEG-4 SA files into efficient C programs that render the audio signal.

International Conference on Acoustics, Speech, and Signal Processing | 2006

System Identification with Unbounded Loss Functions Under Algorithmic Deficiency

Majid Fozunbal; Mat C. Hans; Ronald W. Schafer

We describe and analyze a comprehensive learning model to address issues such as consistency, convergence rate, and sample complexity in the general context of system identification. The learning model is based on unbounded loss functions, and it incorporates a measure of algorithmic deficiency. We define and use a novel formulation of algorithmic solution that extends the empirical risk minimization method in the sense that it uses a generic notion of side information as opposed to the commonly used input/output observation of a system. Sufficient conditions for consistency, as well as closed-form expressions for the exponential convergence rate and sample complexity of the identification algorithm, are derived.

Asilomar Conference on Signals, Systems and Computers | 2004

Isolated word, speaker dependent recognition under the presence of noise, based on an audio retrieval algorithm

Nikolaos Vasiloglou; Ronald W. Schafer; Mat C. Hans

With rapidly increasing storage and computational capacity, a common PC can store and index hundreds of hours of speech. This suggests that new approaches based on database techniques might be useful in speech recognition and speech indexing. This paper presents a first step in such a direction. The algorithm developed relies on an indexed single-speaker database. The database consists of spoken utterances transcribed into text. The waveforms of these utterances are converted off-line to binary symbols called fingerprints through a nonlinear frequency-domain transform. The fingerprints are associated with the transcribed text. Given the fingerprint of a new waveform, the best word match from the database can be retrieved. A 3255-word database is used as a test bed. All the words from this database are mixed with white noise and time-scale modified to provide test data. The database is queried with the fingerprints of the test words and the best match is retrieved. The results of the experiments conducted are promising, showing a 99.5% recognition rate for a 20 dB signal-to-noise ratio (SNR).
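The retrieval step described above can be illustrated with a small sketch. The abstract does not say how fingerprints are compared, so the Hamming distance used here is an assumption, as are the 8-bit fingerprints, the toy index contents, and the function names. The nonlinear frequency-domain transform that produces the fingerprints is not modeled.

```python
from typing import Dict


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary fingerprints differ."""
    return bin(a ^ b).count("1")


def best_match(query: int, index: Dict[int, str]) -> str:
    """Return the transcription whose stored fingerprint is closest to the query."""
    closest = min(index, key=lambda fingerprint: hamming(fingerprint, query))
    return index[closest]


# Toy index mapping hypothetical fingerprints to transcribed words.
index = {
    0b10110010: "yes",
    0b01001101: "no",
    0b11110000: "stop",
}

# A noisy query: the fingerprint for "yes" with one bit flipped.
noisy_query = 0b10110110
word = best_match(noisy_query, index)
```

A linear scan like this only suits a toy index. A database of thousands of words would need an indexing structure, which the abstract mentions but does not detail.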

Journal of The Audio Engineering Society | 1998