Mark Cartwright
Northwestern University
Publications
Featured research published by Mark Cartwright.
Human Factors in Computing Systems | 2015
Mark Cartwright; Bryan Pardo
A natural way of communicating an audio concept is to imitate it with one's voice. This creates an approximation of the imagined sound (e.g., a particular owl's hoot), much like how a visual sketch approximates a visual concept (e.g., a drawing of the owl). If a machine could understand vocal imitations, users could communicate with software in this natural way, enabling new interactions (e.g., programming a music synthesizer by imitating the desired sound with one's voice). In this work, we collect thousands of crowdsourced vocal imitations of a large set of diverse sounds, along with data on the crowd's ability to correctly label these vocal imitations. The resulting data set will help the research community understand which audio concepts can be effectively communicated with this approach. We have released the data set so the community can study the related issues and build systems that leverage vocal imitation as an interaction modality.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Mark Cartwright; Bryan Pardo; Gautham J. Mysore; Matthew D. Hoffman
Automated objective methods of audio evaluation are fast, cheap, and require little effort by the investigator. However, objective evaluation methods do not exist for the output of all audio processing algorithms, often produce output that correlates poorly with human quality assessments, and require ground truth data in their calculation. Subjective human ratings of audio quality are the gold standard for many tasks, but are expensive, slow, and require a great deal of effort to recruit subjects and run listening tests. Moving listening tests from the lab to the micro-task labor market of Amazon Mechanical Turk speeds data collection and reduces investigator effort. However, it also reduces the amount of control investigators have over the testing environment, adding new variability and potential biases to the data. In this work, we compare multiple-stimulus listening tests performed in a lab environment to multiple-stimulus listening tests performed in a web environment on a population drawn from Mechanical Turk.
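As an illustration of how lab and web results might be compared, the following is a minimal sketch assuming hypothetical per-stimulus mean ratings; it is not the paper's analysis. It checks whether the two environments rank the stimuli similarly and whether the rating distributions differ systematically.

```python
# A minimal sketch (not from the paper) of one way to compare lab and
# web-based multiple-stimulus ratings for the same audio conditions.
# The rating arrays below are hypothetical placeholders.
import numpy as np
from scipy import stats

# Mean quality rating per stimulus (0-100 scale), averaged over
# listeners in each environment.
lab_ratings = np.array([82.1, 64.3, 45.0, 71.8, 30.2])
web_ratings = np.array([78.9, 60.5, 49.7, 69.4, 35.1])

# Do the two environments rank the stimuli similarly?
rho, rho_p = stats.spearmanr(lab_ratings, web_ratings)

# Is there a systematic shift between the rating distributions?
u_stat, u_p = stats.mannwhitneyu(lab_ratings, web_ratings, alternative='two-sided')

print(f"Spearman rank correlation: {rho:.2f} (p={rho_p:.3f})")
print(f"Mann-Whitney U: {u_stat:.1f} (p={u_p:.3f})")
```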
Intelligent User Interfaces | 2014
Mark Cartwright; Bryan Pardo; Josh Reiss
A typical audio mixer interface consists of faders and knobs that control the amplitude level as well as processing (e.g. equalization, compression and reverberation) parameters of individual tracks. This interface, while widely used and effective for optimizing a mix, may not be the best interface to facilitate exploration of different mixing options. In this work, we rethink the mixer interface, describing an alternative interface for exploring the space of possible mixes of four audio tracks. In a user study with 24 participants, we compared the effectiveness of this interface to the traditional paradigm for exploring alternative mixes. In the study, users responded that the proposed alternative interface facilitated exploration and that they considered the process of rating mixes to be beneficial.
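To make the notion of a "space of possible mixes" concrete, here is a minimal sketch in which a mix of four tracks is simply a vector of per-track gains; the file names and rendering function are hypothetical and are not the interface described in the paper.

```python
# A minimal sketch (not the paper's system): a candidate mix of four
# time-aligned tracks is a point in a per-track gain space.
import numpy as np
import soundfile as sf

def render_mix(track_files, gains):
    """Sum time-aligned tracks after applying per-track linear gains."""
    tracks, sr = [], None
    for path in track_files:
        audio, sr = sf.read(path)
        tracks.append(audio)
    n = min(len(t) for t in tracks)
    mix = sum(g * t[:n] for g, t in zip(gains, tracks))
    # Normalize to avoid clipping when the gains sum above unity.
    return mix / max(1.0, np.max(np.abs(mix))), sr

# One candidate mix; exploring mixes amounts to exploring this gain vector.
mix, sr = render_mix(["vocals.wav", "guitar.wav", "bass.wav", "drums.wav"],
                     gains=[1.0, 0.7, 0.8, 0.6])
sf.write("candidate_mix.wav", mix, sr)
```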
ACM Multimedia | 2014
Mark Cartwright; Bryan Pardo
While programming an audio synthesizer can be difficult, if a user has a general idea of the sound they are trying to program, they may be able to imitate it with their voice. In this technical demonstration, we demonstrate SynthAssist, a system that allows the user to program an audio synthesizer using vocal imitation and interactive feedback. This system treats synthesizer programming as an audio information retrieval task. To account for the limitations of the human voice, it compares vocal imitations to synthesizer sounds by using both absolute and relative temporal shapes of relevant audio features, and it refines the query and feature weights using relevance feedback.
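The following is a minimal sketch of the general idea of ranking candidate synthesizer sounds against a vocal imitation with a per-feature weighted distance and refining the weights from user feedback; it is an illustrative stand-in, not SynthAssist's actual implementation, and the feature data are random placeholders.

```python
# A minimal sketch of query-by-vocal-imitation with relevance feedback:
# candidates are ranked by a weighted distance over feature trajectories,
# and the feature weights are adjusted from user ratings.
import numpy as np

def weighted_distance(query_feats, candidate_feats, weights):
    """Distance between two (n_features, n_frames) feature trajectories,
    weighting each feature dimension separately (assumes equal frame counts)."""
    diff = query_feats - candidate_feats
    per_feature = np.sqrt((diff ** 2).mean(axis=1))
    return float(np.dot(weights, per_feature))

def update_weights(weights, query_feats, rated_feats, ratings, lr=0.1):
    """Simple heuristic: features on which highly rated candidates resemble
    the query keep more weight after renormalization."""
    weights = weights.copy()
    for feats, rating in zip(rated_feats, ratings):   # rating in [-1, 1]
        per_feature = np.sqrt(((query_feats - feats) ** 2).mean(axis=1))
        weights -= lr * rating * per_feature
    weights = np.clip(weights, 1e-3, None)
    return weights / weights.sum()

rng = np.random.default_rng(0)
query = rng.standard_normal((8, 100))                 # 8 features, 100 frames
candidates = [rng.standard_normal((8, 100)) for _ in range(5)]
w = np.full(8, 1 / 8)

ranked = sorted(candidates, key=lambda c: weighted_distance(query, c, w))
w = update_weights(w, query, ranked[:2], ratings=[1.0, 0.5])   # re-rank with new w
```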
Workshop on Applications of Signal Processing to Audio and Acoustics | 2017
Justin Salamon; Duncan MacConnell; Mark Cartwright; Peter Li; Juan Pablo Bello
Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioacoustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset, and source of every event. Generating such precise annotations manually is very time consuming, and as a result existing datasets for SED with strong labels are scarce and limited in size. To address this issue, we present Scaper, an open-source library for soundscape synthesis and augmentation. Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, “specification”. To increase the variability of the output, Scaper supports the application of audio transformations such as pitch shifting and time stretching individually to every event. To illustrate the potential of the library, we generate a dataset of 10,000 soundscapes and use it to compare the performance of two state-of-the-art algorithms, including a breakdown by soundscape characteristics. We also describe how Scaper was used to generate audio stimuli for an audio labeling crowdsourcing experiment, and conclude with a discussion of Scaper's limitations and potential applications.
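Scaper is distributed as an open-source Python package; the sketch below shows roughly how one soundscape could be generated from a probabilistic specification. The foreground/background folder paths are placeholders, and parameter names or defaults may differ slightly across scaper versions.

```python
# A minimal sketch of generating one soundscape with the scaper package
# (pip install scaper). 'foreground' and 'background' are placeholder paths
# to folders of isolated event recordings organized by label.
import scaper

sc = scaper.Scaper(duration=10.0, fg_path='foreground', bg_path='background')
sc.ref_db = -20  # reference loudness for the background

# Background chosen at random from the labels found in bg_path.
sc.add_background(label=('choose', []),
                  source_file=('choose', []),
                  source_time=('const', 0))

# One probabilistically specified foreground event: label, timing, SNR,
# and per-event pitch-shift / time-stretch transformations.
sc.add_event(label=('choose', []),
             source_file=('choose', []),
             source_time=('const', 0),
             event_time=('uniform', 0, 8),
             event_duration=('const', 2),
             snr=('normal', 10, 3),
             pitch_shift=('uniform', -2, 2),
             time_stretch=('uniform', 0.8, 1.2))

# Each call to generate() samples a concrete soundscape from the specification
# and writes the audio plus an annotation file with strong (onset/offset) labels.
sc.generate('soundscape.wav', 'soundscape.jams')
```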
Proceedings of the ACM on Human-Computer Interaction | 2017
Mark Cartwright; Ayanna Seals; Justin Salamon; Alex C. Williams; Stefanie Mikloska; Duncan MacConnell; Edith Law; Juan Pablo Bello; Oded Nov
Audio annotation is key to developing machine-listening systems; yet effective ways to accurately and rapidly obtain crowdsourced audio annotations are understudied. In this work, we seek to quantify the reliability/redundancy trade-off in crowdsourced soundscape annotation, investigate how visualizations affect accuracy and efficiency, and characterize how performance varies as a function of audio characteristics. Using a controlled experiment, we varied sound visualizations and the complexity of soundscapes presented to human annotators. Results show that more complex audio scenes result in lower annotator agreement, and that spectrogram visualizations are superior in producing higher quality annotations at lower cost of time and human labor. We also found that recall is more affected than precision by soundscape complexity, and that mistakes can often be attributed to certain sound event characteristics. These findings have implications not only for how we should design annotation tasks and interfaces for audio data, but also for how we train and evaluate machine-listening systems.
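For intuition about the precision/recall framing, the sketch below scores one hypothetical set of crowdsourced event annotations against a reference using event-level precision and recall with an onset tolerance; it is not the paper's exact evaluation protocol.

```python
# A minimal sketch (not the paper's evaluation) of scoring one annotator's
# sound-event annotations against a reference. Events are
# (label, onset, offset) tuples; the 0.5 s onset tolerance is hypothetical.
def match(ref_event, est_event, onset_tol=0.5):
    ref_label, ref_on, _ = ref_event
    est_label, est_on, _ = est_event
    return ref_label == est_label and abs(ref_on - est_on) <= onset_tol

def precision_recall(reference, estimated):
    matched, tp = set(), 0
    for est in estimated:
        for i, ref in enumerate(reference):
            if i not in matched and match(ref, est):
                matched.add(i)
                tp += 1
                break
    precision = tp / len(estimated) if estimated else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

reference = [("siren", 1.2, 4.0), ("dog_bark", 5.5, 6.0), ("jackhammer", 2.0, 9.0)]
estimated = [("siren", 1.4, 3.8), ("jackhammer", 2.3, 8.5)]
print(precision_recall(reference, estimated))   # -> (1.0, 0.666...): missed events lower recall
```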
Conference on Computers and Accessibility | 2016
Robin Brewer; Mark Cartwright; Aaron Karp; Bryan Pardo; Anne Marie Piper
Older adults and people with vision impairments are increasingly using phones to receive audio-based information and want to publish content online but must use complex audio recording/editing tools that often rely on inaccessible graphical interfaces. This poster describes the design of an accessible audio-based interface for post-processing audio content created by visually impaired seniors. We conducted a diary study with five older adults with vision impairments to understand how to design a system that would allow them to edit content they record using an audio-only interface. Our findings can help inform the development of accessible audio-editing interfaces for people with vision impairments more broadly.
Journal of the Acoustical Society of America | 2011
Mark Cartwright; Bryan Pardo
Commercial software synthesizer programming interfaces are usually complex and tedious. They discourage novice users from exploring timbres outside the confines of “factory presets,” and they take expert users out of their flow state. We present a new interface for programming synthesizers that enables both novices and experts to quickly find their target sound in the large, generative timbre spaces of synthesizers. The interface does not utilize knobs and sliders that directly control synthesis parameters. It instead utilizes a query-by-example based approach combined with relevance feedback and active learning to create an interactive, personalized search that allows for exploration. We analyze the effectiveness of this new interface through a user study with four populations of varying types of experience in which we compare this approach to a traditional synthesizer interface. [Work supported by the National Science Foundation Graduate Research Fellowship.]
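As a rough illustration of the relevance-feedback idea described above, the sketch below uses a Rocchio-style update that moves a query point in feature space toward synthesizer sounds the user rated as relevant and away from those rated irrelevant; this is a generic, assumed technique, not necessarily the paper's algorithm, and the feature vectors are random placeholders.

```python
# A minimal sketch of relevance feedback for query-by-example search over
# synthesizer sounds, using a Rocchio-style query update.
import numpy as np

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector in feature space."""
    new_query = alpha * query
    if len(relevant):
        new_query += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        new_query -= gamma * np.mean(irrelevant, axis=0)
    return new_query

rng = np.random.default_rng(1)
query = rng.standard_normal(16)                 # features of the user's example
presets = rng.standard_normal((50, 16))         # candidate synthesizer sounds

# Rank presets, gather (hypothetical) feedback on the top results,
# then refine the query and re-rank.
order = np.argsort(np.linalg.norm(presets - query, axis=1))
relevant, irrelevant = presets[order[:3]], presets[order[3:6]]
query = rocchio_update(query, relevant, irrelevant)
new_order = np.argsort(np.linalg.norm(presets - query, axis=1))
```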
International Society for Music Information Retrieval Conference | 2013
Mark Cartwright; Bryan Pardo
7th Sound and Music Computing Conference, SMC 2010 | 2010
Arefin Huq; Mark Cartwright; Bryan Pardo