Publication


Featured research published by Bowon Lee.


Multimedia Signal Processing | 2009

ConnectBoard: A remote collaboration system that supports gaze-aware interaction and sharing

Kar-Han Tan; Ian N. Robinson; Ramin Samadani; Bowon Lee; Dan Gelb; Alex Vorbau; W. Bruce Culbertson; John G. Apostolopoulos

We present ConnectBoard, a new system for remote collaboration where users experience natural interaction with one another, seemingly separated only by a vertical, transparent sheet of glass. It overcomes two key shortcomings of conventional video communication systems: the inability to seamlessly capture natural user interactions, like using hands to point and gesture at parts of shared documents, and the inability of users to look into the camera lens without taking their eyes off the display. We solve these problems by placing the camera behind the screen, where the remote user is virtually located. The camera sees through the display to capture images of the user. As a result, our setup captures natural, frontal views of users as they point and gesture at shared media displayed on the screen between them. Users also never have to take their eyes off their screens to look into the camera lens. Our novel optical solution based on wavelength multiplexing can be easily built with off-the-shelf components and does not require custom electronics for projector-camera synchronization.


IEEE Journal of Selected Topics in Signal Processing | 2013

Audiovisual Voice Activity Detection Based on Microphone Arrays and Color Information

Vicente Peruffo Minotto; Carlos B. O. Lopes; Jacob Scharcanski; Cláudio Rosito Jung; Bowon Lee

Audiovisual voice activity detection is a necessary stage in several problems, such as advanced teleconferencing, speech recognition, and human-computer interaction. Lip motion and audio analysis provide a large amount of information that can be integrated to produce more robust audiovisual voice activity detection (VAD) schemes, as we discuss in this paper. Lip motion is very useful for detecting the active speaker, and in this paper we introduce a new approach to lip segmentation and visual VAD. First, the algorithm performs skin segmentation to reduce the search area for lip extraction, and the most likely lip and non-lip regions are detected using a Bayesian approach within the delimited area. Lip motion is then detected using Hidden Markov Models (HMMs) that estimate the likelihood of active speech within a temporal window. Audio information is captured by an array of microphones, and the sound-based VAD amounts to finding spatio-temporally coherent sound sources through another set of HMMs. To increase the robustness of the proposed system, a late fusion approach is employed to combine the results of the two modalities (audio and video). Our experimental results indicate that the proposed audiovisual approach outperforms existing VAD algorithms.
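The late-fusion idea described above can be illustrated with a small example: per-frame log-likelihoods from the audio and video modalities are combined by a weighted sum and then smoothed by a two-state (silence/speech) Viterbi decoding. This is only a minimal sketch; the function names, the fusion weight, and the toy transition probabilities are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def late_fusion(audio_ll, video_ll, w=0.5):
    """Weighted late fusion of per-modality, per-frame log-likelihoods."""
    return w * audio_ll + (1 - w) * video_ll

def viterbi_vad(loglik, log_trans, log_prior):
    """Two-state (0 = silence, 1 = speech) Viterbi smoothing of fused scores."""
    T, S = loglik.shape
    dp = np.zeros((T, S))
    ptr = np.zeros((T, S), dtype=int)
    dp[0] = log_prior + loglik[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans   # scores[i, j]: from i to j
        ptr[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + loglik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = dp[-1].argmax()
    for t in range(T - 2, -1, -1):                # backtrack the best path
        path[t] = ptr[t + 1, path[t + 1]]
    return path
```

With sticky transition probabilities, isolated one-frame likelihood glitches are absorbed rather than producing spurious speech/silence toggles.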


International Conference on Embedded Networked Sensor Systems | 2013

The sound of silence

Wai-Tian Tan; Mary Baker; Bowon Lee; Ramin Samadani

A list of the dynamically changing group membership of a meeting supports a variety of meeting-related activities. Effortless content sharing might be the most important application, but we can also use it to provide business card information for attendees, feed information into calendar applications to simplify scheduling of follow-up meetings, populate the membership of collaborative editing applications, mailing lists, and social networks, and perform many other tasks. We have developed a system that uses audio sensing to maintain meeting membership automatically. We choose audio since hearing the same conversation provides a human-centric notion of attending the same gathering. It takes into account walls and other sound barriers between otherwise closely situated people. It can sense participants attending remotely by teleconference. It does not require attendees to perform any explicit action when participants leave a meeting for which they should no longer have access to associated content. It works indoors and outdoors and does not require pre-populating databases with mapping information. For sensors, we require only the commonly available microphones on mobile devices. Our system exploits a new technique for matching sensed patterns of relative audio silence, or silence signatures, from mobile devices (mobile phones, tablets, laptops) to determine device co-location. A signature based on simple silence patterns rather than a detailed audio characterization reveals less information about the content of potentially private conversations and can also be compared more robustly across devices that are not clock-synchronized. We evaluate our method in formal indoor meetings and teleconferences, and in ad hoc gatherings outdoors and in a noisy cafeteria. Across all our tests so far, our approach determines audio co-location with a worst-case accuracy of 96%, and recovery from errors takes only a few seconds. We also describe a content sharing application supported by silence signature matching, the limitations of our approach, current status, and future plans.
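The silence-signature idea can be sketched as follows: each device thresholds its frame energies into a binary silence pattern, and two devices are declared co-located when their patterns agree under some small time shift (tolerating unsynchronized clocks). The frame size, energy threshold, and agreement level below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def silence_signature(x, frame=160, thresh_db=-40.0):
    """Binary per-frame silence pattern: 1 where frame energy is below threshold."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return (energy_db < thresh_db).astype(np.uint8)

def co_located(sig_a, sig_b, max_shift=3, min_agree=0.9):
    """Declare co-location if the signatures agree under some small shift."""
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = sig_a[s:], sig_b[:len(sig_b) - s]
        else:
            a, b = sig_a[:len(sig_a) + s], sig_b[-s:]
        n = min(len(a), len(b))
        if n:
            best = max(best, float(np.mean(a[:n] == b[:n])))
    return best >= min_agree, best
```

Because only coarse on/off silence patterns are exchanged, the comparison leaks far less about conversation content than raw audio features would.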


International Symposium on Circuits and Systems | 2008

On the quality assessment of sound signals

A. A. de Lima; Fabio P. Freeland; R. A. de Jesus; Bruno C. Bispo; Luiz W. P. Biscainho; Sergio L. Netto; Amir Said; Antonius Kalker; Ronald W. Schafer; Bowon Lee; M. Jam

This paper constitutes an introduction to the field of quality evaluation of sound (speech and audio) signals. The need for such an assessment is inherent to modern communications: VoIP, mobile phone, or teleconference systems require meaningful measures of performance, which may ultimately assure good service or profitable business. A brief survey on subjective and objective evaluation methods is provided. Recent developments as well as new topics to be investigated are also addressed. Experiments are conducted to illustrate how to validate quality assessment methods.


Journal of the Acoustical Society of America | 2012

A blind algorithm for reverberation-time estimation using subband decomposition of speech signals

Thiago de M. Prego; Amaro A. de Lima; Sergio L. Netto; Bowon Lee; Amir Said; Ronald W. Schafer; Ton Kalker

An algorithm for blind estimation of reverberation time (RT) in speech signals is proposed. Analysis is restricted to the free-decaying regions of the signal, where the reverberation effect dominates, yielding a more accurate RT estimate at a reduced computational cost. A spectral decomposition is performed on the reverberant signal and partial RT estimates are determined in all signal subbands, providing more data to the statistical-analysis stage of the algorithm, which yields the final RT estimate. Algorithm performance is assessed using two distinct speech databases, achieving 91% and 97% correlation with the RTs measured by a standard nonblind method, indicating that the proposed method blindly estimates the RT in a reliable and consistent manner.
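A minimal sketch of blind RT estimation from a single free-decay region, assuming the decay has already been isolated: backward (Schroeder) integration gives the energy decay curve, whose log-domain slope over a fixed dB span is extrapolated to the 60 dB point. The fit range and names are illustrative, and the paper's subband decomposition and statistical-analysis stage are omitted.

```python
import numpy as np

def rt60_from_decay(x, fs):
    """Estimate RT60 from one free-decay segment via Schroeder backward
    integration and a linear fit to the log-energy decay curve."""
    edc = np.cumsum(x[::-1] ** 2)[::-1]                # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)       # normalized, in dB
    t = np.arange(len(x)) / fs
    # fit the -5 dB to -25 dB span, then extrapolate to the -60 dB point
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)    # dB per second
    return -60.0 / slope
```

Restricting the fit to an early portion of the curve avoids the noisy tail of the integrated energy, mirroring the paper's focus on regions where the reverberation effect dominates.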


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

GPU-based approaches for real-time sound source localization using the SRP-PHAT algorithm

Vicente Peruffo Minotto; Cláudio Rosito Jung; Luiz Gonzaga da Silveira; Bowon Lee

The aim of most microphone array applications is to localize sound sources in a noisy and reverberant environment. For that purpose, many different sound source localization (SSL) algorithms have been proposed, where the SRP-PHAT (steered response power using the phase transform) has been known as one of the state-of-the-art methods. Its original formulation allows two different practical implementations, one that is computed in the frequency domain (FDSP) and another in the time domain (TDSP), which can be enhanced by interpolation. However, the main problem of this algorithm is its high computational cost due to intensive grid scan in search for the sound source. Considering the power of graphics processing units (GPUs) for working with massively parallelizable compute-intensive algorithms, we present two highly scalable GPU-based versions of the SRP-PHAT, one for each formulation, and also an implementation of cubic-spline interpolation in the GPU. These approaches exploit the parallel aspects of the SRP-PHAT, allowing real-time execution for large search grids. Comparing our GPU approaches against traditional multithreaded CPU approaches, results show speedups of 275× for the FDSP and 70× for the TDSP with interpolation, when comparing high-end GPUs with high-end CPUs.
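The frequency-domain (FDSP) formulation can be sketched in a few lines of vectorized NumPy; a GPU version parallelizes the same grid-by-frequency computation. Everything here (names, the toy grid, summing all channels rather than microphone pairs) is an illustrative simplification of SRP-PHAT, not the paper's CUDA code.

```python
import numpy as np

def srp_phat(frames, mic_pos, grid, fs, c=343.0):
    """Frequency-domain SRP-PHAT: score every candidate grid point by the
    phase-transform-weighted steered response power.
    frames: (M, N) one frame per microphone; grid: (G, 3) candidate points."""
    M, N = frames.shape
    X = np.fft.rfft(frames, axis=1)
    X /= np.abs(X) + 1e-12                       # PHAT whitening: keep phase only
    freqs = np.fft.rfftfreq(N, 1 / fs)           # (F,)
    # time of flight from each grid point to each microphone: (G, M)
    tof = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=2) / c
    # steer each channel toward each candidate point, sum, take power
    steer = np.exp(2j * np.pi * freqs[None, None, :] * tof[:, :, None])  # (G, M, F)
    beam = np.abs((X[None, :, :] * steer).sum(axis=1)) ** 2              # (G, F)
    return beam.sum(axis=1)                      # SRP score per grid point

# the source location estimate is the grid point with the maximum score
```

The intensive part is exactly the (grid point, frequency) loop nest that the paper maps onto GPU threads.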


Pattern Recognition Letters | 2012

Voice activity detection and speaker localization using audiovisual cues

Dante A. Blauth; Vicente Peruffo Minotto; Cláudio Rosito Jung; Bowon Lee; Ton Kalker

This paper proposes a multimodal approach to distinguish silence from speech situations, and to identify the location of the active speaker in the latter case. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) using the Steered Response Power with the phase transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMM improves the results over the unimodal SRP-PHAT, and the inclusion of video cues provides even further improvements.
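The SRP-PHAT front end used above builds on the generalized cross-correlation with phase transform (GCC-PHAT); a minimal single-microphone-pair time-difference-of-arrival estimate can be sketched as follows (the function name and signal lengths are illustrative):

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the TDOA of x relative to y (positive when x lags y)
    via PHAT-weighted generalized cross-correlation."""
    n = len(x) + len(y)                            # zero-pad to avoid wrap-around
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)  # PHAT: unit-magnitude spectrum
    max_shift = n // 2 if max_tau is None else min(int(max_tau * fs), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

Whitening the cross-spectrum sharpens the correlation peak, which is what makes the method comparatively robust to reverberation.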


IEEE Transactions on Mobile Computing | 2016

Robust Acoustic Self-Localization of Mobile Devices

Diego B. Haddad; Wallace Alves Martins; Maurício V. M. Costa; Luiz W. P. Biscainho; Leonardo O. Nunes; Bowon Lee

Self-localization of smart portable devices serves as foundation for several novel applications. This work proposes a set of algorithms that enable a mobile device to passively determine its position relative to a known reference with centimeter precision, based exclusively on the capture of acoustic signals emitted by controlled sources around it. The proposed techniques tackle typical practical issues such as reverberation, unknown speed of sound, line-of-sight obstruction, clock skew, and the need for asynchronous operation. After their theoretical developments and off-line simulations, the methods are assessed as real-time applications embedded into off-the-shelf mobile devices operating in real scenarios. When line of sight is available, position estimation errors are at most 4 cm using recorded signals.
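One common building block for this kind of acoustic self-localization is multilateration: once ranges to the known sources are derived from acoustic times of flight, the device position follows from a linearized least-squares solve. The sketch below assumes ideal range measurements and omits the paper's treatment of reverberation, clock skew, and line-of-sight obstruction.

```python
import numpy as np

def multilaterate(anchors, ranges):
    """Least-squares position from known anchor (source) positions and
    measured distances, linearized by subtracting the first range equation:
        2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - r_i^2 + r_0^2"""
    a0, r0 = anchors[0], ranges[0]
    A = 2 * (anchors[1:] - a0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2)
         - ranges[1:] ** 2 + r0 ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```

With four or more non-coplanar sources the 3-D position is determined uniquely, and extra sources simply over-determine the least-squares fit.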


International Conference on Acoustics, Speech, and Signal Processing | 2011

A system approach to residual echo suppression in robust hands-free teleconferencing

Jason Wung; Ted S. Wada; Biing-Hwang Juang; Bowon Lee; Ton Kalker; Ronald W. Schafer

This paper presents a system approach to the residual echo suppression (RES) problem in a noisy acoustic environment. We propose a method that takes advantage of our existing robust acoustic echo cancellation system in order to obtain a residual echo estimate that closely resembles the true, noise-free residual echo. To achieve improved RES during strong near-end interference (e.g., double talk), a psychoacoustic postfilter is also used. The simulation results show that our RES based on the system approach outperforms a conventional estimation method. Comparing the postfiltered output to the unprocessed one indicates that our proposed RES approach can raise the PESQ score by more than half a point.
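Residual echo suppression is typically realized as a spectral gain applied to the acoustic echo canceller's error signal; a minimal Wiener-like sketch (without the paper's psychoacoustic postfilter, and taking the residual-echo power spectrum as a given estimate) might look like:

```python
import numpy as np

def suppress_residual_echo(err_frame, res_echo_psd, floor=0.1):
    """Attenuate frequency bins where the estimated residual-echo power
    dominates the AEC error signal; a gain floor limits musical noise."""
    E = np.fft.rfft(err_frame)
    gain = np.maximum(1.0 - res_echo_psd / (np.abs(E) ** 2 + 1e-12), floor)
    return np.fft.irfft(gain * E, len(err_frame))
```

Bins carrying only near-end speech keep a gain near one, which is how such a postfilter preserves double-talk while suppressing the echo remainder.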


Speech Communication | 2012

On the quality-assessment of reverberated speech

Amaro A. de Lima; Thiago de M. Prego; Sergio L. Netto; Bowon Lee; Amir Said; Ronald W. Schafer; Ton Kalker; Majid Fozunbal

This paper addresses the problem of quantifying the reverberation effect in speech signals. The perception of reverberation is assessed based on a new measure combining the characteristics of reverberation time, room spectral variance, and direct-to-reverberant energy ratio, which are estimated from the associated room impulse response (RIR). The practical aspects behind a robust RIR estimation are underlined, allowing an effective feature extraction for reverberation evaluation. The resulting objective metric achieves a correlation factor of about 90% with the subjective scores of two distinct speech databases, illustrating the system's ability to assess the reverberation effect in a reliable manner.
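One of the features named above, the direct-to-reverberant energy ratio, can be sketched directly from an RIR by splitting its energy around the main peak; the 2.5 ms direct-path window below is an illustrative assumption, not the paper's parameter.

```python
import numpy as np

def drr_db(rir, fs, direct_ms=2.5):
    """Direct-to-reverberant energy ratio: energy in a short window around
    the strongest RIR tap versus the energy of the remaining tail."""
    k = np.argmax(np.abs(rir))                 # direct-path arrival
    w = int(direct_ms * 1e-3 * fs)
    lo, hi = max(0, k - w), k + w
    direct = np.sum(rir[lo:hi] ** 2)
    tail = np.sum(rir[hi:] ** 2)               # reverberant tail
    return 10 * np.log10(direct / (tail + 1e-12))
```

A strong direct path relative to the tail yields a high DRR, which correlates with low perceived reverberation.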

Collaboration


Dive into Bowon Lee's collaboration.

Top Co-Authors

Luiz W. P. Biscainho, Federal University of Rio de Janeiro
Leonardo O. Nunes, Federal University of Rio de Janeiro
Sergio L. Netto, Federal University of Rio de Janeiro
Amaro A. de Lima, Centro Federal de Educação Tecnológica de Minas Gerais
Fabio P. Freeland, Federal University of Rio de Janeiro
Bruno C. Bispo, Federal University of Rio de Janeiro