New NSF Grant Will Fund Research for AI-Supported Audio Captioning
Written by: Michael Giorgio
Published: Monday, July 28, 2025
Words have meaning – and so do sounds. They can signal danger, establish a setting or create a mood. But what if you have complete or partial hearing loss?
Accessibility technology has made great strides in captioning spoken words in video content, but it largely overlooks environmental sounds, music and speaking style. And even when these subtle audio cues are conveyed, how best to communicate them depends on the preferences and needs of the individual viewer.
Now, through a National Science Foundation (NSF) research grant, Assistant Professors Mark Cartwright and Sooyeon Lee of the Ying Wu College of Computing’s Department of Informatics, in collaboration with Magdalena Fuentes of NYU’s Tandon School of Engineering, aim to tackle the complex challenge of translating rich hearing experiences into accessible formats while respecting the different ways that deaf and hard-of-hearing (DHH) individuals prefer to receive information.
The $799,997 grant award will support the development of adaptive artificial intelligence systems that can determine which non-speech sounds are important for understanding video content and present them in ways tailored to individual viewer needs and preferences.
The project will combine human-computer interaction and machine learning research to understand stakeholders' needs, create novel captioning datasets and models, and ultimately develop a steerable, adaptable AI-powered audio captioning system. That system will prioritize non-speech sounds and render them as text and visuals, supporting the needs and preferences of millions of people who are DHH or experience declining hearing.
Cartwright stated, “This work will create publicly available tools that will promote equality and full civic engagement in digital education, entertainment and overall quality of life.”