Howard Lei

International Computer Science Institute

Publication

Featured research published by Howard Lei.

Proceedings of the 3rd ACM SIGMM international workshop on Social media | 2011

Multimodal location estimation on Flickr videos

Gerald Friedland; Jaeyoung Choi; Howard Lei; Adam Janin

The following article describes an approach to determine the geo-coordinates of the recording place of Flickr videos based on both textual metadata and visual cues. The system is tested on the MediaEval 2010 Placing Task evaluation data, which consists of 5091 unfiltered test videos. The system presented in this article is less complex, uses less training data, and is at the same time more accurate than the best system presented in the evaluation in August 2010. The performance peaks at being able to classify 14% of the videos with less than 10 m accuracy. The article describes the realization of the system and analyzes the different uses of multimodal cues and gazetteer information.

International Conference on Acoustics, Speech, and Signal Processing | 2007

Word-Conditioned Phone N-Grams for Speaker Recognition

Howard Lei; Nikki Mirghafori

We extend the state-of-the-art by applying word-conditioning to constrain phone N-gram features used in speaker recognition. Feature-level combination of 52 word unigrams constraining phone N-grams of order 1, 2, and 3 proved to be the best approach. Our system achieves 18% and 27% improvements compared to a non word-conditioned phone N-grams system on SRE05 and SRE06, respectively. Furthermore, the system achieves 18% and 37% improvements compared to the non word-conditioned phone N-grams system when each system is combined with a GMM-based system on SRE05 and SRE06, suggesting that the word-conditioned features are more complementary. On both corpora, this approach achieves a 4.7% EER standalone, and a 3.3% EER in combination with the non word-conditioned phone N-grams and GMM-based systems. Note that the word-conditioning approach utilizes only 43% of SRE05 data.

International Conference on Acoustics, Speech, and Signal Processing | 2012

Spectro-temporal Gabor features for speaker recognition

Howard Lei; Bernd T. Meyer; Nikki Mirghafori

In this work, we have investigated the performance of 2D Gabor features (known as spectro-temporal features) for speaker recognition. Gabor features have been used mainly for automatic speech recognition (ASR), where they have yielded improvements. We explored different Gabor feature implementations, along with different speaker recognition approaches, on ROSSI [1] and NIST SRE08 databases. Using the noisy ROSSI database, the Gabor features performed as well as the MFCC features standalone, and score-level combination of Gabor and MFCC features resulted in an 8% relative EER improvement over MFCC features standalone. These results demonstrated the value of both spectral and temporal information for feature extraction, and the complementarity of Gabor features to MFCC features.

ACM Multimedia | 2012

Name that room: room identification using acoustic features in a recording

Nils Peters; Howard Lei; Gerald Friedland

This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13440 reverberant audio samples. With no common content between the training and testing data, an accuracy of 61% for musical signals and 85% for speech signals was achieved. This approach could be applied in a variety of scenarios where knowledge about the acoustical environment is desired, such as location estimation, music recommendation, or emergency response systems.

IEEE Sensors | 2005

Pulse and DC Electropolishing of Stainless Steel for Stents and Other Devices

Anshuman Bhuyan; Brandon J. Gregory; Howard Lei; Seow Yuen Yee; Yogesh B. Gianchandani

This paper describes optimized conditions for the electropolishing of austenitic type 304 and 316L stainless steels in commercially available EPS 4000 solution (based on a mixture of phosphoric and sulfuric acids) for use in cardiac stenting applications. Electropolishing parameters such as electrolyte temperature and concentration, current density, polishing duration, use of pulsed current, and ultrasonic agitation have been explored, and optimal conditions have been found. The quality of the polishing was assessed by the average surface roughness, the amount of thickness reduction, and the overall surface appearance. Samples polished in an ultrasonic bath at 60 °C with pulsed currents of 50 Hz achieved the lowest surface roughness, with little or no evidence of the surface defects that were present with other recipes. Similar results were seen in both type 304 and type 316L stainless steels.

International Symposium on Multimedia | 2013

An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content

Benjamin Elizalde; Howard Lei; Gerald Friedland

Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping, or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in speaker verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system by demonstrating a slight improvement in combination over the standalone systems.

International Conference on Acoustics, Speech, and Signal Processing | 2011

User verification: Matching the uploaders of videos across accounts

Howard Lei; Jaeyoung Choi; Adam Janin; Gerald Friedland

This article presents an attempt to link the uploaders of videos based on the audio track of the videos. Using a subset of the MediaEval [10] Placing Task's Flickr video set, which is labeled with the uploader's name, we conducted an experiment with a setup similar to that of a typical NIST speaker recognition evaluation run. Based on the assumption that the audio might be matched in various ways (speaker, channel, environmental noise, etc.), we trained one of ICSI's simplified speaker recognition systems on the audio tracks of the Flickr videos. Note that since the selection of videos is essentially random, the audio track can contain any sounds. We obtain an equal error rate of 36.7% on 312 videos with 11,550 trials. The result has implications for audio research and security applications, and raises privacy concerns.
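
Several of the abstracts listed here report results as an equal error rate (EER): the operating point at which the false-accept rate and the false-reject rate of a verification system are equal. The following is a minimal sketch of how an EER can be estimated from trial scores. It is an illustration only, not the scoring code used in any of these papers; the function name `equal_error_rate` and the toy scores are assumptions made for this example.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration: returns (eer, threshold) at the
    threshold where the false-accept and false-reject rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False reject: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many trials the two rates rarely match exactly,
            # so report their average at the closest point.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up for this sketch): higher means "more likely same source".
targets = [0.9, 0.8, 0.7, 0.3]
impostors = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluations the sweep would run over thousands of trials, and production toolkits typically interpolate between thresholds rather than picking the nearest one.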

International Conference on Biometrics | 2009

Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for Speaker Recognition

Howard Lei

We developed measures relating feature vector distributions to speaker recognition (SR) performance, for performance prediction and potential arbitrary data selection for SR. We examined the measures of mutual information, kurtosis, correlation, and measures pertaining to intra- and inter-speaker variability. We applied the measures to feature vectors of phones to determine which measures gave good SR performance prediction of phones standalone and in combination. We found that mutual information had a -83.5% correlation with the Equal Error Rates (EERs) of each phone. Also, Pearson's correlation between the feature vectors of two phones had a -48.6% correlation with the relative EER improvement of the score-level combination of the phones. When implemented in our new data-selection scheme (which does not require an SR system to be run), the measures allowed us to select data with 2.13% overall EER improvement (on SRE08) over data selected via a brute-force approach, at a fifth of the computational cost.

Multimedia Data Mining and Analytics | 2015

Content-Based Privacy for Consumer-Produced Multimedia

Gerald Friedland; Adam Janin; Howard Lei; Jaeyoung Choi; Robin Sommer

We contend that current and future advances in Internet-scale multimedia analytics, global inference, and linking can circumvent traditional security and privacy barriers. We therefore are in dire need of a new research field to address this issue and come up with new solutions. We present the privacy risks, attack vectors, and details of a preliminary experiment on account linking, and describe mitigation and educational techniques that will help address the issues.

Thrombosis and Haemostasis | 2002