Developers from Google’s research laboratory presented a paper (PDF) about interactive television applications that leverage ambient audio analysis at the Euro Interactive Television conference last week. The paper, which describes a system for providing contextually relevant web content to television viewers, received the best paper award. The system uses a computer’s microphone to analyze the audio emitted by a television and determines from that data what show the user is watching. It can then provide the user with web content that relates to the show:
We introduce four applications for mass personalization: personalized content layers, ad hoc social communities, real-time popularity ratings and virtual media library services. Using the ambient audio originating from the television, the four applications are available with no more effort than simple television channel surfing.
Designed to maximize user privacy while minimizing dependence on specialized hardware, the system described in the paper seems both interesting and feasible. To protect user privacy, the software transmits “summary statistics” automatically generated from the ambient audio rather than an actual recording. The original audio cannot be reconstructed from the summary statistic data, so the system doesn’t “overhear” or transmit user conversations.
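The paper doesn’t spell out its exact descriptor here, so the sketch below uses a common audio-fingerprinting technique (a Haitsma–Kalker-style band-energy comparison, with hypothetical frame and band sizes) to illustrate why such summary statistics are one-way: a frame of hundreds of samples is reduced to a single 32-bit value, far too little information to recover speech from.

```python
import numpy as np

def frame_fingerprint(frame, n_bands=33):
    """Reduce one short audio frame to a 32-bit summary statistic.

    Hypothetical sketch, not the paper's actual descriptor: derive one
    bit from the energy difference of each pair of adjacent frequency
    bands (33 bands -> 32 bits). The reduction is lossy and one-way,
    so the waveform cannot be reconstructed from the fingerprint.
    """
    spectrum = np.abs(np.fft.rfft(frame))
    # Split the spectrum into contiguous bands and sum the energy in each.
    bands = np.array_split(spectrum, n_bands)
    energy = np.array([band.sum() for band in bands])
    # One bit per adjacent-band comparison.
    bits = energy[1:] > energy[:-1]
    fp = 0
    for bit in bits:
        fp = (fp << 1) | int(bit)
    return fp

# A 512-sample frame of noise collapses to a single 32-bit integer.
rng = np.random.default_rng(0)
print(hex(frame_fingerprint(rng.standard_normal(512))))
```

Because only these compact statistics leave the machine, a server can recognize broadcast audio it has already indexed without ever receiving anything intelligible from the living room.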
The paper also describes several services that could be provided to users based on their current television viewing. Video bookmarking, for instance, would allow a user to preserve and archive the summary statistics generated from the current ambient audio; these bookmarked statistics could later be used to retrieve the particular show and jump to a specific time index. An ad hoc peer community feature could enable individuals who are watching the same show to communicate and interact with each other. The system could also provide viewers with personalized information about the show and generate real-time television viewing statistics.
The paper also examines feasibility. The system generates 32-bit “fingerprints” from 12ms audio frames within short query clips and compares them against matching statistics stored in a remote database. According to the paper, the system can accurately match a show from ambient audio even in the presence of additional noise, and the database can store a full year of broadcast fingerprints in only one gigabyte of space.
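To make the matching step concrete, here is a minimal sketch (with invented show names and fingerprint counts; the paper’s actual search structure is more sophisticated) of how a sequence of noisy 32-bit query fingerprints might be matched against stored broadcasts by Hamming distance:

```python
import random

def hamming(a, b):
    """Number of differing bits between two 32-bit fingerprints."""
    return bin(a ^ b).count("1")

def best_match(query, database):
    """Pick the broadcast whose fingerprint sequence is closest to the
    query clip, scored by summed per-frame Hamming distance."""
    return min(database, key=lambda show: sum(
        hamming(q, r) for q, r in zip(query, show[1])))

# Hypothetical database: (show name, per-frame 32-bit fingerprints).
random.seed(0)
database = [(f"show-{i}", [random.getrandbits(32) for _ in range(20)])
            for i in range(5)]

# Simulate a noisy living-room capture of show-2: ambient noise flips
# a few bits in each frame's fingerprint.
query = []
for fp in database[2][1]:
    noisy = fp
    for _ in range(3):
        noisy ^= 1 << random.randrange(32)
    query.append(noisy)

print(best_match(query, database)[0])  # prints "show-2"
```

Unrelated 32-bit fingerprints differ in roughly 16 bits on average, so a handful of noise-induced bit flips per frame still leaves the true broadcast as the clear nearest neighbor — which is why matching can tolerate background noise.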
Although the audio query system has a creepy 1984 vibe that may put some people off, I think that a service built on this technology could potentially be very popular. The paper does a nice job addressing privacy concerns, illustrating the tangible benefits of the system, and illuminating ways that it could leverage other Google technologies, but it is important to remember that this is still just research, not necessarily the makings of a future Google service.