Audio support in Minart 0.5.0
Today I finally released version 0.5.0 of Minart, a simple cross-platform Scala library that I've been working on for procedural art generation and interactive applications.
The big new feature in this release (and the reason it took so long) is audio support.
Now, most libraries would probably just use the high-level APIs provided by the language and call it a day, but unfortunately, since I want the code to be cross-platform (JVM, JS and Native) and to allow for procedural audio manipulation, I ended up having to implement everything on top of the low-level APIs.
And, since I'm not an audio expert, it was quite hard to figure out what abstractions I needed to build on top of that. I came up with some solutions (which I'll detail in this post), but I'm still not entirely happy with them, so a lot might change in the future. For example, the current version is limited to mono audio.
Without further ado, here's the current design:
Audio Waves
The simplest unit of audio is the `AudioWave`. It's basically a `Double => Double` function that represents an infinite audio wave.
This concept is analogous to the `Plane` abstraction in Minart (which is an infinite image).
Since this is basically just a function, it comes with some of the operations one would expect, like `map`, `zipWith` and `flatMap`.
Note, however, that `AudioWave` is not an endofunctor (and, by extension, not a monad), so the signatures are a bit different from the usual ones.
Still, this is enough to manipulate audio in a functional way.
On top of those, there's also an `iterator(sampleRate: Double): Iterator[Double]` operation, which returns an iterator of audio samples, and a `clip(start: Double, end: Double): AudioClip` operation, which returns a finite clip from this wave.
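For illustration, here's a small sketch of how these operations compose. Note that I'm assuming `map` takes a `Double => Double` over the output samples; only `iterator` and `clip` have the exact signatures shown above.

```scala
// An infinite 440 Hz sine wave, defined directly as a time => amplitude function
val sine: AudioWave = AudioWave((t: Double) => math.sin(2 * math.Pi * 440 * t))

// Halve the volume by transforming each sample
val quiet: AudioWave = sine.map(_ * 0.5)

// Sample the wave at 44100 Hz...
val samples: Iterator[Double] = quiet.iterator(44100)

// ...or cut out the first two seconds as a finite clip
val twoSeconds: AudioClip = quiet.clip(0.0, 2.0)
```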
Speaking of clips:
Audio Clips
An `AudioClip` is just an audio wave with an associated duration. This concept is analogous to the `SurfaceView` abstraction in Minart.
This is probably the most common type used by an application, as it is the result of loading an audio file.
Again, you can manipulate audio clips with the usual `map`, `zipWith` and `flatMap` operations. You can also convert an audio clip into a wave with the `repeating` operation.
It's possible to convert a sequence of samples into an audio clip with `AudioClip.fromIndexedSeq(data: IndexedSeq[Double], sampleRate: Double)`.
The implementation is pretty naive: for time `t`, I just pick the sample at index `floor(t * sampleRate)`.
I'm not sure if I should do some interpolation, but this seems good enough, so it's probably not worth wasting CPU cycles on that.
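As a quick sketch of those two operations (I'm assuming `repeating` is parameterless, per the description above):

```scala
val sampleRate = 44100.0

// One second of a 440 Hz sine tone as raw samples
val data: IndexedSeq[Double] =
  (0 until sampleRate.toInt).map(i => math.sin(2 * math.Pi * 440 * i / sampleRate))

// Wrap the samples in a clip; at time t, this reads data(floor(t * sampleRate))
val tone: AudioClip = AudioClip.fromIndexedSeq(data, sampleRate)

// Loop the clip forever, turning it back into an infinite wave
val toneWave: AudioWave = tone.repeating
```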
Oscillator
Another abstraction is the `Oscillator`.
Oscillators are basically wave factories: you give them a frequency and an amplitude and you get back an `AudioWave`.
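For example, using `generateClip` (which also shows up in the longer example below) to get a finite clip directly:

```scala
// One second of a 440 Hz sine tone at full volume
val a4: AudioClip =
  Oscillator.sin.generateClip(duration = 1.0, frequency = 440, amplitude = 1.0)
```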
Audio Player and Audio Queues
The `AudioPlayer` allows a user to play audio clips and waves, similarly to how a `Canvas` allows them to draw graphics.
Internally, an `AudioPlayer` has a `MultiChannelAudioQueue` (which is just a collection of `SingleChannelAudioQueue`s with some mixing logic).
Audio is enqueued into a channel with the `play` operation, and channels can be stopped with the `stop` operation. Pretty simple.
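As a small sketch (the exact channel-selection overloads here are my assumption, not a documented API):

```scala
val player = AudioPlayer.create(AudioPlayer.Settings())
val beep   = Oscillator.sin.generateClip(duration = 0.2, frequency = 880, amplitude = 1.0)

player.play(beep)    // enqueue a clip on the default channel
player.play(beep, 1) // enqueue a clip on channel 1 (assumed overload)
player.stop(1)       // stop channel 1 (assumed overload)
player.stop()        // stop all channels
```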
Some examples
So, here's a simple example that uses all of these concepts:
```scala
// First, an example with audio waves.
// Here we define a function from a time in seconds to a frequency.
val song = (t: Double) => {
  val note =
    if (t < 0.3) 0       // A
    else if (t < 0.5) 4  // C#
    else if (t < 0.7) 7  // E
    else 12              // A
  math.pow(2, note / 12.0) * 440 // Convert the notes to frequencies (equal temperament)
}

// Here we generate a sine wave with the frequencies from our song and clip the first second.
val arpeggio: AudioClip =
  AudioWave((t: Double) => math.sin(song(t) * 6.28 * t)).take(1.0)

// Now, here's a similar example, but using the provided oscillators.
val bass =
  Oscillator.sin
    .generateClip(duration = 0.5, frequency = 220, amplitude = 1.0)
    .append(Oscillator.sin.generateClip(duration = 0.5, frequency = 330, amplitude = 1.0))
    .append(Oscillator.sin.generateClip(duration = 1.0, frequency = 220, amplitude = 1.0))

// Here we mix our arpeggio (going up and down) with a low root note.
val testSample =
  arpeggio
    .append(arpeggio.reverse)
    .zipWith(bass, (high, low) => high * 0.7 + low * 0.3)

// Finally, we play the song.
val audioPlayer = AudioPlayer.create(AudioPlayer.Settings())
audioPlayer.play(testSample)
```
Limitations and Future Improvements
Stereo Support
One of the current main limitations is that only mono audio is supported.
I want to at least support stereo, but I'm not sure where the split should happen.
- Should the split be in the `AudioWave` or in the `AudioClip`?
- Should there be a `MonoAudioWave`/`StereoAudioWave` or just a `StereoAudioWave` (and the mono one simply has the same data for left and right)?
- What does sampling a `StereoAudioWave` return? An `Iterator[(Double, Double)]`?
- What do operations like `map` look like? Should there be a `mapAllChannels(f: Double => Double)`?
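To make the second option concrete, here's a hypothetical sketch; none of these types exist in Minart today:

```scala
// Hypothetical: a single stereo type, where "mono" just means
// the left and right channels share the same data.
final case class StereoAudioWave(left: Double => Double, right: Double => Double) {
  // Apply the same sample transformation to both channels
  def mapAllChannels(f: Double => Double): StereoAudioWave =
    StereoAudioWave(left.andThen(f), right.andThen(f))

  // Sampling returns (left, right) pairs
  def iterator(sampleRate: Double): Iterator[(Double, Double)] =
    Iterator.from(0).map { i =>
      val t = i / sampleRate
      (left(t), right(t))
    }
}

object StereoAudioWave {
  def mono(wave: Double => Double): StereoAudioWave = StereoAudioWave(wave, wave)
}
```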
Audio Mixing
Another limitation, which is kind of related, is that there is no control over the audio mix.
Sure, you can `map` the wave to reduce the volume, but once it's playing, it's over.
It shouldn't be too hard to add some mixing operations to the audio player (e.g. `setVolume(channel: Int, volume: Double)`), but if I'm making it a mixer, maybe one should also be able to pan audio.
And if that's the case, maybe all abstractions could be mono, with the `AudioPlayer` having a helpful `playStereo(left: AudioClip, right: AudioClip, leftChannel: Int, rightChannel: Int)` operation which would just play both clips and override the pan?
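In usage, that first idea could look something like this (purely hypothetical API, including the channel argument to `play`):

```scala
val melody = Oscillator.sin.generateClip(duration = 2.0, frequency = 440, amplitude = 1.0)
val root   = Oscillator.sin.generateClip(duration = 2.0, frequency = 110, amplitude = 1.0)

audioPlayer.play(melody, 0)                      // melody on channel 0
audioPlayer.play(root, 1)                        // root note on channel 1
audioPlayer.setVolume(channel = 1, volume = 0.5) // hypothetical: duck channel 1 mid-playback
```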
Oscillators
Finally, while the oscillator abstraction is nice in theory, it has a lot of practical problems.
For one, it's not very easy to go cleanly from one frequency to another. Maybe the oscillator could also take a `time => frequency` function to generate a wave, but that seems to be much more complicated to implement.
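To illustrate the complication: naively computing `math.sin(freq(t) * 2 * math.Pi * t)` clicks whenever the frequency changes, because the phase jumps. A correct implementation has to integrate the frequency to get a continuous phase, and a pure `Double => Double` wave has nowhere to cheaply accumulate that state. A rough sketch of my own (not Minart code):

```scala
// Hypothetical frequency-modulated sine: phase(t) = 2*Pi * (integral of freq over [0, t]).
// The per-sample summation makes every lookup O(t), which shows why this
// doesn't fit the pure-function representation well.
def sweep(freq: Double => Double, sampleRate: Double = 44100.0): AudioWave =
  AudioWave { (t: Double) =>
    val dt    = 1.0 / sampleRate
    val steps = (t * sampleRate).toInt
    val phase = 2 * math.Pi * (0 until steps).map(i => freq(i * dt)).sum * dt
    math.sin(phase)
  }
```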
Also, I feel like I should be able to build an oscillator from audio samples, but right now that's not easy at all. And, on top of that, I think I might want to use 3 samples (attack, sustain and release), which makes things even more complicated.
In general, I still think having an oscillator abstraction is useful (other procedural audio toolkits have that, which is a good indicator), but I might need to rethink the API.