BubbleSynth: Soap to Sound

ABSTRACT
BubbleSynth translates motion tracking and blob detection data
collected from floating soap bubbles into sound, using the size,
position, age, and velocity of the bubbles to adjust the
parameters of a harmonic sine wave via Open Sound Control
(OSC). Audio samples, provided by SuperCollider, may be
triggered in real time either by rule sets that link bubble
tracking data to specific sounds or by physical interaction with
the bubbles. The system is designed as a framework into which
any motion tracking data can be ported, while the SuperCollider
backend can be changed merely by adding new SynthDefs.

Keywords
OpenCV, SuperCollider, blob detection, openFrameworks,
bubbles

1. INTRODUCTION
BubbleSynth uses open source software and commodity
hardware to track the position, age, velocity, and size of soap
bubbles produced by a bubble machine or bubble wand, and
translate the information it collects into sound.
The system comprises a bubble emission source, a black
backdrop, a webcam, a light source, a projector, and a Mac
Mini. Bubbles are tracked using OpenCV inside
openFrameworks, and the tracking information is provided to
SuperCollider with ofxSuperCollider, an openFrameworks
addon for communicating with the SuperCollider server via
Open Sound Control (OSC). Rule sets determine how each
bubble will affect the harmonic sine wave and other samples.
To the authors' knowledge, the system is the first of its kind.

2. MOTION TRACKING AND BLOB
DETECTION
BubbleSynth’s motion tracking relies on background
subtraction and thresholding. It begins with a reference image
of the system’s black backdrop. As each bubble emerges
from the bubble machine or a user’s bubble wand, it is
illuminated by a bank of upturned stage lights (known
colloquially as “shinbusters”), which allows OpenCV to detect
elements that are brighter than a set threshold and differ from
the reference image, and to subtract the background. The
resulting difference image isolates the foreground elements,
called “blobs”; contour detection then calculates each blob’s
area and determines where its center lies.
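To make the pipeline concrete, the following is a minimal
sketch of this stage using the ofxOpenCv addon inside
openFrameworks. The member names (grayImage, grayBg,
grayDiff, contourFinder) and the area limits follow the standard
ofxOpenCv example rather than BubbleSynth's actual source:

    #include "ofxOpenCv.h"

    // Illustrative members: the current camera frame, the stored shot of the
    // black backdrop, their difference, and the contour finder that extracts blobs.
    ofxCvGrayscaleImage grayImage, grayBg, grayDiff;
    ofxCvContourFinder  contourFinder;

    // Called once per camera frame, after grayImage has been updated.
    void trackBubbles(int brightnessThreshold) {
        grayDiff.absDiff(grayBg, grayImage);      // subtract the reference backdrop
        grayDiff.threshold(brightnessThreshold);  // keep only pixels above the cutoff
        // keep blobs between 20 px and a third of the frame, at most 10 of them, no holes
        contourFinder.findContours(grayDiff, 20,
                                   (grayDiff.width * grayDiff.height) / 3, 10, false);
        for (int i = 0; i < contourFinder.nBlobs; i++) {
            float   area = contourFinder.blobs[i].area;      // blob size
            ofPoint c    = contourFinder.blobs[i].centroid;  // blob center
            // area and c feed the synthesis parameters described below and in Section 3
        }
    }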
Once the blobs have been located, the system determines
the XY position of each one’s centroid. The X coordinate
is mapped to frequency: the X position plus 440 Hz is sent to
SuperCollider via OSC as the new frequency of the sine
wave. The Y coordinate is mapped to amplitude: bubbles at the
top of the screen signal full amplitude, and the amplitude
decreases as they fall.
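One way to realize this mapping is sketched below for a single
tracked blob, sending raw server OSC with the ofxOsc addon
rather than through ofxSuperCollider. The node ID 1000, the
control names freq and amp, and the sender setup are
illustrative assumptions, not the project's actual values:

    // Assumes: ofxOscSender sender; sender.setup("127.0.0.1", 57110); and a sine
    // synth already running on the SuperCollider server as node 1000.
    ofPoint c    = contourFinder.blobs[0].centroid;
    float   freq = c.x + 440.0f;                                     // X position plus 440 Hz
    float   amp  = ofMap(c.y, 0, grayImage.getHeight(), 1.0f, 0.0f); // top of screen = full amplitude

    ofxOscMessage m;
    m.setAddress("/n_set");                 // set controls on a running synth node
    m.addIntArg(1000);                      // node ID of the sine synth
    m.addStringArg("freq"); m.addFloatArg(freq);
    m.addStringArg("amp");  m.addFloatArg(amp);
    sender.sendMessage(m);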
The number of blobs allowed to adjust the sine wave can be
controlled dynamically, as can the number of blobs tracked.
This makes it easier to trigger specific audio samples, and
makes the triggers more clearly related to the direct action of
the user, as noted in Section 4.2. If no blobs are currently
detected, the amplitude is adjusted to zero.
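A sketch of that behavior, continuing the illustrative names
above (maxActiveBlobs is assumed to be an adjustable member,
and node 1000 the running sine synth):

    // Cap how many blobs may influence the synth, and mute when the field is empty.
    int active = std::min(contourFinder.nBlobs, maxActiveBlobs);
    for (int i = 0; i < active; i++) {
        // map blobs[i] to synthesis parameters as in the previous sketch
    }
    if (active == 0) {
        ofxOscMessage mute;
        mute.setAddress("/n_set");
        mute.addIntArg(1000);
        mute.addStringArg("amp");
        mute.addFloatArg(0.0f);   // no bubbles detected: amplitude goes to zero
        sender.sendMessage(mute);
    }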
The number of blobs that can be tracked is limited only by
available computational power, which could be greatly
increased by moving the computation to the GPU.

2.1 Visible Light vs. Infrared or Ultraviolet
The decision to use visible light to illuminate the bubbles past
the threshold value came after experiments with an infrared
(IR) camera and an IR-fluorescent dye showed that the amount
of dye and IR energy needed to track a thin-film bubble was
prohibitive. Additional attempts were made to use ultraviolet
(UV) dye and commercially available blacklights, but the
amount of blacklight needed was beyond the scope and budget
of the project.

3. SUPERCOLLIDER BACKEND
As noted in Section 2, the motion tracking data collected by
OpenCV is sent to SuperCollider through OSC. The
SuperCollider server uses a set of SynthDefs to produce the
desired audio. The system has been designed as a framework,
so that any source of tracking data can be ported in, while the
backend can be changed merely by providing new SynthDefs.
One SynthDef is the basic harmonic sine wave, adjusted by
variables drawn from the position, velocity, size, and age of the
tracked bubbles. The output of each SynthDef also passes
through several effects, including reverb and panning, and a
decay stage that produces different sounds depending on how
long a bubble lasts: a bubble that pops immediately produces a
different sound than one that persists. Specific samples created
by individual SynthDefs may also be triggered by user actions
communicated to SuperCollider via OSC.
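From the tracking side, triggering such a one-shot sample can
be sketched with the raw /s_new server command; the
SynthDef name "dunePhrase" and its amp control are
hypothetical placeholders for the project's actual SynthDefs:

    // Instantiate a one-shot sample SynthDef on the SuperCollider server.
    ofxOscMessage trig;
    trig.setAddress("/s_new");
    trig.addStringArg("dunePhrase"); // SynthDef to instantiate (hypothetical name)
    trig.addIntArg(-1);              // -1 lets the server assign a node ID
    trig.addIntArg(0);               // add action 0: add to head...
    trig.addIntArg(1);               // ...of the default group (node 1)
    trig.addStringArg("amp");
    trig.addFloatArg(0.8f);          // initial control value
    sender.sendMessage(trig);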
The samples for BubbleSynth were carefully chosen to
produce a classic sci-fi atmosphere. They include some
percussive SynthDefs, some chords, and some verbal samples
from the 1984 film Dune, directed by David Lynch and based
on the novel by Frank Herbert. The deep, ambient throb of a
spaceship engine weaves the samples into a cohesive whole.

4. MODES
BubbleSynth includes three different modes: generative, direct
control, and group.

4.1 Generative Mode
In generative mode, the system runs independently, without
direct user interaction. Sound samples are triggered by the size,
position, and velocity of the tracked bubbles.
A rule set determines which samples to trigger based on the
“death” of bubbles, whether they pop while still in the tracked
field of view or drift out of tracking range.
Because OpenCV renumbers the blobs by size at the beginning
of each frame, the identity of a given bubble can change from
frame to frame.
The largest blob, or Queen Blob, is Blob 0. The position of the
Queen Blob governs the rule set. If the number of blobs on the
current frame is less than the number of blobs on the previous
frame, and the Queen Blob is in the top third of the screen, then
BubbleSynth plays the chord sample. If the Queen Blob is
positioned below the top third, but above the lowest tenth of the
screen (between 33.3 percent and 90 percent), BubbleSynth
plays its percussion sample. On the rare occasion that the
Queen Blob is in the bottom 10 percent of the screen, it triggers
a spoken word sample, “The sleeper must awaken.” This
mapping produces a balance between chords and percussion,
without overloading the composition with spoken words.
When the number of blobs goes from one to zero, the final
spoken word sample is triggered, so when the space is empty,
listeners hear, “The spice must flow.”
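Expressed as a per-frame sketch on the openFrameworks side
(prevBlobCount is an assumed member holding last frame's
count, and playChord(), playPercussion(), playSleeper(), and
playSpice() stand in for the sample triggers described above):

    int n = contourFinder.nBlobs;
    if (n < prevBlobCount && n > 0) {                 // at least one bubble just died
        float y = contourFinder.blobs[0].centroid.y;  // Blob 0 is the Queen Blob
        float h = grayImage.getHeight();
        if      (y < h * 0.333f) playChord();         // Queen Blob in the top third
        else if (y < h * 0.90f)  playPercussion();    // between 33.3% and 90%
        else                     playSleeper();       // bottom tenth: "The sleeper must awaken."
    }
    if (n == 0 && prevBlobCount == 1) {
        playSpice();                                  // space is empty: "The spice must flow."
    }
    prevBlobCount = n;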

4.2 Direct Control Mode
Direct control mode encourages users to physically pop
bubbles, playing BubbleSynth like a traditional musical
instrument. A tracked projection colors the bubbles to tell
BubbleSynth which bubbles are available for popping. The
colored projection appears only as specular highlights on the
bubbles, making it easy for users to differentiate tracked
bubbles from non-tracked bubbles. The projection on the wall
behind the bubbles is below the system’s brightness threshold,
so it does not create a feedback situation.
Direct control mode uses the same rule sets as generative
mode, but fewer bubbles are emitted, so the user can see a more
direct relationship between popping a bubble and hearing a
sample.
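A sketch of the highlighting pass, assuming the projector and
camera share a calibrated coordinate space; the color values and
maxActiveBlobs are illustrative:

    // Draw a dim colored disc at each tracked centroid; its specular highlight on the
    // bubble marks it as poppable, while the spill on the wall stays below the
    // brightness threshold so it is not re-detected as a blob.
    ofSetColor(0, 80, 120);   // dim cyan, below the tracking cutoff
    for (int i = 0; i < std::min(contourFinder.nBlobs, maxActiveBlobs); i++) {
        ofPoint c = contourFinder.blobs[i].centroid;
        float   r = sqrtf(contourFinder.blobs[i].area / PI);  // radius recovered from blob area
        ofDrawCircle(c.x, c.y, r);
    }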

4.3 Group Mode
In group mode, multiple users try to keep a large bubble alive
within the tracking area, listening to the resulting change in the
music. Alternatively, users may create their own bubbles while
others manipulate the bubble field with custom bubble wands.

5. FUTURE DIRECTIONS
In its current iteration, BubbleSynth uses a webcam to collect
motion-tracking data. This limits the system to 30 frames per
second and can lead to motion blur artifacts, which reduce the
reliability of the motion tracking. The limited field of view may
also result in some bubbles “falling off” the top or bottom of
the screen. Future builds may include cameras with a higher
frame rate and an expanded viewing area.
Future builds may also calculate the Z coordinates of tracked
bubbles, which would eliminate the need for the black
backdrop. The most likely tool for accomplishing this is a
Kinect camera, but it remains to be seen whether the Kinect can
track a bubble or whether its infrared light will pass straight
through the thin film.