Sara H. Basson
IBM
Publications
Featured research published by Sara H. Basson.
International Conference on Acoustics, Speech, and Signal Processing | 1990
Charles Jankowski; Ashok Kalyanswamy; Sara H. Basson; Judith Spitz
The creation of the network TIMIT (NTIMIT) database, which is the result of transmitting the TIMIT database over the telephone network, is described. A brief description of the TIMIT database is given, including characteristics useful for speech analysis and recognition. The hardware and software required to transmit the database are described, along with the geographic distribution of the TIMIT utterances and the calibration signals used to readily determine the effects of the transmission setup and the distortions introduced by the telephone network.
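To make the calibration idea concrete, here is a minimal sketch, assuming nothing about the actual NTIMIT tooling: comparing a known calibration signal with its received, channel-distorted version gives a per-frequency estimate of what the telephone channel did to it. The signals and numbers below are synthetic stand-ins.

```python
# Illustrative sketch (not the NTIMIT tooling): estimate a telephone
# channel's frequency response by comparing a known calibration signal
# with its received version. All signals here are synthetic.
import numpy as np

def channel_response_db(sent: np.ndarray, received: np.ndarray, n_fft: int = 4096):
    """Return per-bin channel gain in dB, received relative to sent."""
    sent_spec = np.abs(np.fft.rfft(sent, n_fft))
    recv_spec = np.abs(np.fft.rfft(received, n_fft))
    eps = 1e-12  # avoid log(0) in silent bins
    return 20.0 * np.log10((recv_spec + eps) / (sent_spec + eps))

# Synthetic 1 kHz calibration tone at 8 kHz sampling, "transmitted"
# through a toy attenuating channel purely for demonstration.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
received = 0.5 * tone  # stand-in for real line attenuation
gain = channel_response_db(tone, received)
peak_bin = np.argmax(np.abs(np.fft.rfft(tone, 4096)))
print(f"Gain near 1 kHz: {gain[peak_bin]:.1f} dB")  # about -6 dB here
```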
IBM Systems Journal | 2005
Keith Bain; Sara H. Basson; Alexander Faisman; Dimitri Kanevsky
Accessibility in the workplace and in academic settings has increased dramatically for users with disabilities, driven by greater awareness, legislative mandate, and technological improvements. Gaps, however, remain. For persons who are deaf and hard of hearing in particular, full participation requires complete access to audio materials, both for live settings and for prerecorded audio and visual information. Even for users with adequate hearing, captioned or transcribed materials offer another modality for information access, one that can be particularly useful in certain situations, such as listening in noisy environments, interpreting speakers with strong accents, or searching audio media for specific information. Providing this level of access through fully automated means is currently beyond the state of the art. This paper details a number of key advances in audio access that have occurred over the last five years. We describe the Liberated Learning Project, a consortium of universities worldwide, which is piloting technologies to create real-time access for students who are deaf and hard of hearing, without intermediary assistance. In support of this project, IBM Research has created the ViaScribe™ tool that converts speech recognition output to a viable captioning interface. Additional inventions and incremental improvements to speech recognition for captioning are described, as well as future directions.
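As a rough illustration of one step such a tool must perform (this is not the ViaScribe implementation, and the word/timestamp format is assumed), the sketch below groups timestamped recognizer output into short caption lines, breaking on length or on long pauses:

```python
# A minimal sketch of one idea behind real-time captioning from speech
# recognition output: grouping timestamped words into caption lines.
# Illustrative only; not the ViaScribe implementation.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def to_captions(words: list[Word], max_chars: int = 40, max_gap: float = 1.0):
    """Break recognized words into caption segments on length or long pauses."""
    captions, line, line_start = [], [], None
    for w in words:
        pause = line and (w.start - line[-1].end) > max_gap
        too_long = line and len(" ".join(x.text for x in line) + " " + w.text) > max_chars
        if pause or too_long:
            captions.append((line_start, line[-1].end, " ".join(x.text for x in line)))
            line, line_start = [], None
        if line_start is None:
            line_start = w.start
        line.append(w)
    if line:
        captions.append((line_start, line[-1].end, " ".join(x.text for x in line)))
    return captions

demo = [Word("access", 0.0, 0.4), Word("for", 0.5, 0.6), Word("everyone", 0.7, 1.2),
        Word("matters", 2.8, 3.3)]  # long pause forces a new caption
for start, end, text in to_captions(demo):
    print(f"[{start:5.2f}-{end:5.2f}] {text}")
```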
Conference on Computers and Accessibility | 2002
Keith Bain; Sara H. Basson; Mike Wald
The Liberated Learning Project (LLP) is an applied research project studying two core questions: (1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? (2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these questions and explores the underlying complex relationship between speech recognition technology, university educational environments, and disability issues.
Interactions | 2007
Sara H. Basson; Peter G. Fairweather; Vicki L. Hanson
Many people first glimpsed the power and potential of speech technology after observing HAL's conversations with Dave Bowman in 2001: A Space Odyssey. Progress in the accuracy of speech recognition and the clarity of speech synthesis coaxed many to believe that tasks performed with technology, especially the computer, would be accomplished more quickly, more easily, and, perhaps, less painfully through speech-enabled interfaces. Technologies diffuse at different rates: instantly, like the ATM, or ploddingly, like the fax (Scottish inventor Alexander Bain built the first one in 1843!). No one knows how long it will take for speech-enabled devices to reach everyday use. The vision of controlling electrical appliances, managing the home or workplace environment, or engaging the mounting complexity of modern automobiles should be embraced by mainstream populations and could be particularly beneficial to older adults. Reduced sensory, motor, and cognitive abilities are normal correlates of aging, and they can impair older adults' ability to interact successfully with their environment, especially with new forms of technology. Service and product designers might respond to older adults' declining abilities with interfaces made simpler or more redundant through the use of alternate modalities such as speech recognition and synthesis. While text-to-speech and speech recognition engines are built upon 40 years of research and development, our understanding of how to use these technologies is still in its infancy. With our aging population, the opportunity to experiment with speech-enabled interfaces becomes an interesting case for how these technologies can be harnessed for users who have difficulties with traditional interfaces. Such need notwithstanding, we must be mindful that effective design with speech is not obvious: measurement of interaction patterns and performance involving speech has been a litany of surprises.
Archive | 1999
John C. Thomas; Sara H. Basson; Daryle Gardner-Bonneau
Speech technologies have been a blessing to many people with disabilities. They have allowed people with severe physical impairments to do meaningful work, blind people to access computer technology, and people with speech impairments to communicate, for example. This chapter champions the concept of universal access — employing technologies in designs that serve both those with disabilities and those without. It also discusses the ways in which speech technologies are currently being used in assistive devices, and the problems associated with the current technology. Finally, the authors describe how methodologies and techniques from the disciplines of human-computer interaction (a.k.a. user interface design, usability engineering, and human factors engineering) can be used to better design applications to serve people with disabilities and the population at large.
Conference on Computer Supported Cooperative Work | 2013
Mahelaqua; Sara H. Basson; Nitendra Rajput; Kundan Shrivastava; Saurabh Srivastava; John C. Thomas
Heavy penetration of mobile devices in rural areas enables access to information services for low-literate users. While connectivity solves a major issue, concepts such as browsing and searching are not intuitive to such users. This paper presents the design of a multimodal (speech + icons) voice browser for low-literate users in rural India. We conducted a field study to determine their current communication styles and preferences. Based on this, we designed three browsers, each influenced by a specific way of communicating: persona-based roles, storytelling, and direct interaction. The prototypes were evaluated in a task-based study with 62 low-literate users and were compared with a baseline Interactive Voice Response (IVR) browser. Our results suggest that there is clear acceptance and understanding of the necessary concepts when the browser is designed using the constructs of persona-based roles and storytelling.
Archive | 2008
John C. Thomas; Sara H. Basson; Daryle J. Gardner-Bonneau
Speech technologies have been a blessing to many people with disabilities. They have allowed people with severe physical impairments to do meaningful work, blind people to access computer technology, and people with speech impairments to communicate, for example. This chapter champions the concept of universal access - employing technologies in designs that serve both those with disabilities and those without. It also discusses the ways in which speech technologies are currently being used in assistive devices, and problems associated with current technology. Additionally, the authors describe how methodologies and techniques from the disciplines of human-computer interaction (a.k.a. user interface design, usability engineering, and human factors engineering) can be used to better design applications to serve people with disabilities and the population at large. Finally, the role of technical standards in facilitating accessibility is discussed, and the status of current standards development efforts is described.
Journal of the Acoustical Society of America | 1990
Sara H. Basson; Benjamin Chigier; Charles Jankowski; Judith Spitz; Dina Yashchin
The NYNEX artificial intelligence speech technology group is pursuing automation of operator services using speech technology. In order to automate portions of the directory assistance (DA) transaction, speaker‐independent recognition of isolated city names is currently under development. In the service of this project, a large database of New England city names has been collected. The city name responses over the telephone network (CITRON) database were collected to reflect speaker productions in a goal‐oriented man‐machine interaction. To collect this database, real customers were presented with digitized prompts requesting the target city name when they called DA. A total of 27,467 calls were collected over a 1‐month period. The CITRON database will be characterized along the following dimensions: total number of tokens, duration of tokens, number of phones and syllables per token, stress characteristics of each token, phonetic distribution of tokens, and derivable speaker characteristics. These result...
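As a hedged illustration of this kind of database characterization (the manifest records below are invented, not CITRON data), computing token counts and duration statistics from a list of labeled tokens might look like:

```python
# Sketch of simple corpus characterization: token counts and duration
# statistics per city name. The manifest format and values are invented.
import statistics

# (city_name, duration_seconds) records standing in for real entries
manifest = [("boston", 0.71), ("worcester", 0.93), ("boston", 0.64),
            ("nashua", 0.82), ("worcester", 0.88)]

tokens_per_city = {}
for city, dur in manifest:
    tokens_per_city.setdefault(city, []).append(dur)

print(f"total tokens: {len(manifest)}")
for city, durs in sorted(tokens_per_city.items()):
    print(f"{city:>10}: n={len(durs)}, mean dur={statistics.mean(durs):.2f}s")
```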
Journal of the Acoustical Society of America | 1989
Sara H. Basson; Judith Spitz; Charles Jankowski
Intonation of naturally produced telephone digit strings appears to conform to particular patterns. Concatenating strings of digits produced with “default” intonation highlights the importance of maintaining appropriate intonation patterns for intelligible, natural‐sounding speech. The purpose of this experiment was to quantify the acoustic events resulting in natural‐sounding telephone number digit strings. Acoustic features of approximately 1000 digits produced in the context of telephone digit strings were measured. The data were gathered at MIT by presenting volunteer speakers with lists of seven‐digit numbers to read. Shifts in fundamental frequency of the vowel, changes in overall energy, and digit duration as a function of position in the digit string were calculated. These results can prove useful for speech recognition and generation. For automatic speech recognition of digit strings, prosodic contours provide cues about word boundary locations and serve as a source of information for error detec...
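A minimal sketch of the per-position analysis described above, with made-up measurements: averaging the fundamental frequency and duration of digit tokens grouped by their position in the seven-digit string.

```python
# Illustrative per-position prosody summary. The token measurements are
# invented (they merely mimic F0 declination and phrase-final lengthening).
import statistics
from collections import defaultdict

# (position_in_string, mean_F0_hz, duration_ms) for individual digit tokens
tokens = [(1, 142, 310), (1, 138, 305), (4, 131, 280),
          (4, 129, 276), (7, 112, 390), (7, 108, 402)]

by_position = defaultdict(list)
for pos, f0, dur in tokens:
    by_position[pos].append((f0, dur))

for pos in sorted(by_position):
    f0s = [f0 for f0, _ in by_position[pos]]
    durs = [d for _, d in by_position[pos]]
    print(f"position {pos}: mean F0 = {statistics.mean(f0s):.0f} Hz, "
          f"mean duration = {statistics.mean(durs):.0f} ms")
```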
Journal of the Acoustical Society of America | 1985
Judith L. Klavens; Maria X. Edelstein; Sara H. Basson
Intonation contouring of synthetic speech improves both intelligibility and comprehension [Slowiaczek and Nusbaum, to appear], while flattening appears to interfere with speech perception [Larkey and Danley (1983)]. Given the proven importance of F0, several problems remain in defining the domain of prosodic contour. Among the issues are pauses and intonational domains. These are particularly critical in an unlimited text‐to‐speech system where input is often unpunctuated long complex sentences. This paper reports on current work to determine pause structure of synthetic speech as a component of specifying the domain of prosodic contouring. The system uses a deterministic bottom‐up parser to give a syntactic analysis of a sentence. Based on this and other information, a pause structure is computed algorithmically. The pause‐parsed structure then serves as input for later stages in the application of F0. Syntactically based pause insertion is compared with a simple function/content word‐based pause‐inserti...
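For contrast with the syntactic approach, a simple function/content-word pause-insertion baseline of the kind the abstract compares against might look like the following sketch (the function-word list is abbreviated and illustrative; the paper's own system computes pauses from a parse instead):

```python
# A minimal function/content-word pause-insertion baseline: insert a
# short pause before a content word that follows another content word.
# Word lists and the pause marker are illustrative assumptions.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "that", "is", "on"}

def insert_pauses(sentence: str, pause: str = "<pause>") -> str:
    out, prev_content = [], False
    for word in sentence.lower().split():
        is_content = word not in FUNCTION_WORDS
        if is_content and prev_content:
            out.append(pause)  # crude boundary guess between content words
        out.append(word)
        prev_content = is_content
    return " ".join(out)

print(insert_pauses("the parser computes pause structure of long unpunctuated sentences"))
```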