Audio support in Minart 0.5.0
Today I finally released version 0.5.0 of Minart, a simple cross-platform Scala library that I've been working on for procedural art generation and interactive applications.
The big new feature in this release (and the reason it took so long) is audio support.
Now, most libraries would probably just use the high-level APIs provided by the language and call it a day, but unfortunately, since I want the code to be cross-platform (JVM, JS and Native) and to allow for procedural audio manipulation, I ended up having to implement everything on top of the low-level APIs.
And, since I'm not an audio expert, it was quite hard to figure out what abstractions I needed to build on top of that. I came up with some solutions (which I'll detail in this post), but I'm still not entirely happy with them, so a lot might change in the future. For example, the current version is limited to mono audio.
Without further ado, here's the current design:
Audio Waves
The simplest unit of audio is the `AudioWave`. It's basically a `Double => Double` function that represents an infinite audio wave.
This concept is analogous to the `Plane` abstraction in Minart (which is an infinite image).
Since this is basically just a function, it comes with some of the operations one would expect, like `map`, `zipWith` and `flatMap`.
Note, however, that `AudioWave` is not an endofunctor (and, by extension, not a monad), so the signatures are a bit different from the usual ones.
Still, this is enough to manipulate audio in a functional way.
On top of those, there's also an `iterator(sampleRate: Double): Iterator[Double]` operation, which returns an iterator of audio samples, and a `clip(start: Double, end: Double): AudioClip` operation, which returns a finite clip from this wave.
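For illustration, here's a small sketch of how these operations compose. Note that I'm assuming `map` takes a `Double => Double` over the output samples; only `iterator` and `clip` have the exact signatures shown above.

```scala
// An infinite 440 Hz sine wave, defined directly as a time => amplitude function
val sine: AudioWave = AudioWave((t: Double) => math.sin(2 * math.Pi * 440 * t))

// Halve the volume by transforming each sample
val quiet: AudioWave = sine.map(_ * 0.5)

// Sample the wave at 44100 Hz...
val samples: Iterator[Double] = quiet.iterator(44100)

// ...or cut out the first two seconds as a finite clip
val twoSeconds: AudioClip = quiet.clip(0.0, 2.0)
```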
Speaking of clips:
Audio Clips
An `AudioClip` is just an audio wave with an associated duration. This concept is analogous to the `SurfaceView` abstraction in Minart.
This is probably the most common type used by an application, as it is the result of loading an audio file.
Again, you can manipulate audio clips with the usual `map`, `zipWith` and `flatMap` operations. You can also convert an audio clip into a wave with the `repeating` operation.
It's possible to convert a sequence of samples into an audio clip with `AudioClip.fromIndexedSeq(data: IndexedSeq[Double], sampleRate: Double)`.
The implementation is pretty naive: for time `t`, I just pick the sample at index `floor(t * sampleRate)`.
I'm not sure if I should do some interpolation, but this seems good enough, so it's probably not worth wasting CPU cycles on that.
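As a quick sketch of those two operations (I'm assuming `repeating` is parameterless, per the description above):

```scala
val sampleRate = 44100.0

// One second of a 440 Hz sine tone as raw samples
val data: IndexedSeq[Double] =
  (0 until sampleRate.toInt).map(i => math.sin(2 * math.Pi * 440 * i / sampleRate))

// Wrap the samples in a clip; at time t, this reads data(floor(t * sampleRate))
val tone: AudioClip = AudioClip.fromIndexedSeq(data, sampleRate)

// Loop the clip forever, turning it back into an infinite wave
val toneWave: AudioWave = tone.repeating
```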
Oscillator
Another abstraction is the `Oscillator`.
Oscillators are basically wave factories: you give them a frequency and an amplitude and you get back an `AudioWave`.
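For example, using `generateClip` (which also shows up in the longer example below) to get a finite clip directly:

```scala
// One second of a 440 Hz sine tone at full volume
val a4: AudioClip =
  Oscillator.sin.generateClip(duration = 1.0, frequency = 440, amplitude = 1.0)
```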
Audio Player and Audio Queues
The `AudioPlayer` allows a user to play audio clips and waves, similarly to how a `Canvas` allows them to draw graphics.
Internally, an `AudioPlayer` has a `MultiChannelAudioQueue` (which is just a collection of `SingleChannelAudioQueue`s with some mixing logic).
Audio is enqueued into a channel with the `play` operation, and channels can be stopped with the `stop` operation. Pretty simple.
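As a small sketch (the exact channel-selection overloads here are my assumption, not a documented API):

```scala
val player = AudioPlayer.create(AudioPlayer.Settings())
val beep   = Oscillator.sin.generateClip(duration = 0.2, frequency = 880, amplitude = 1.0)

player.play(beep)    // enqueue a clip on the default channel
player.play(beep, 1) // enqueue a clip on channel 1 (assumed overload)
player.stop(1)       // stop channel 1 (assumed overload)
player.stop()        // stop all channels
```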
Some examples
So, here's a simple example that uses all of these concepts:
```scala
// First, an example with audio waves.
// Here we define a function from a time in seconds to a frequency.
val song = (t: Double) => {
  val note =
    if (t < 0.3) 0       // A
    else if (t < 0.5) 4  // C#
    else if (t < 0.7) 7  // E
    else 12              // A
  math.pow(2, note / 12.0) * 440 // Convert the notes to frequencies (equal temperament)
}

// Here we generate a sine wave with the frequencies from our song and clip the first second.
val arpeggio: AudioClip =
  AudioWave((t: Double) => math.sin(song(t) * 6.28 * t)).take(1.0)

// Now, here's a similar example, but using the provided oscillators.
val bass =
  Oscillator.sin
    .generateClip(duration = 0.5, frequency = 220, amplitude = 1.0)
    .append(Oscillator.sin.generateClip(duration = 0.5, frequency = 330, amplitude = 1.0))
    .append(Oscillator.sin.generateClip(duration = 1.0, frequency = 220, amplitude = 1.0))

// Here we mix our arpeggio (going up and down) with a low root note.
val testSample =
  arpeggio
    .append(arpeggio.reverse)
    .zipWith(bass, (high, low) => high * 0.7 + low * 0.3)

// Finally, we play the song.
val audioPlayer = AudioPlayer.create(AudioPlayer.Settings())
audioPlayer.play(testSample)
```
Limitations and Future Improvements
Stereo Support
One of the current main limitations is that only mono audio is supported.
I want to at least support stereo, but I'm not sure where the split should happen.
- Should the split be in the `AudioWave` or in the `AudioClip`?
- Should there be a `MonoAudioWave`/`StereoAudioWave` or just a `StereoAudioWave` (and the mono one simply has the same data for left and right)?
- What does sampling a `StereoAudioWave` return? An `Iterator[(Double, Double)]`?
- What do operations like `map` look like? Should there be a `mapAllChannels(f: Double => Double)`?
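To make the second option concrete, here's a hypothetical sketch; none of these types exist in Minart today:

```scala
// Hypothetical: a single stereo type, where "mono" just means
// the left and right channels share the same data.
final case class StereoAudioWave(left: Double => Double, right: Double => Double) {
  // Apply the same sample transformation to both channels
  def mapAllChannels(f: Double => Double): StereoAudioWave =
    StereoAudioWave(left.andThen(f), right.andThen(f))

  // Sampling returns (left, right) pairs
  def iterator(sampleRate: Double): Iterator[(Double, Double)] =
    Iterator.from(0).map { i =>
      val t = i / sampleRate
      (left(t), right(t))
    }
}

object StereoAudioWave {
  def mono(wave: Double => Double): StereoAudioWave = StereoAudioWave(wave, wave)
}
```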
Audio Mixing
Another limitation, which is kind of related, is that there is no control over the audio mix.
Sure, you can `map` the wave to reduce the volume, but once it's playing, it's over.
It shouldn't be too hard to add some mixing operations to the audio player (e.g. `setVolume(channel: Int, volume: Double)`), but if I'm making it a mixer, maybe one should also be able to pan audio.
And if that's the case, maybe all abstractions could be mono, with the `AudioPlayer` having a helpful `playStereo(left: AudioClip, right: AudioClip, leftChannel: Int, rightChannel: Int)` operation which would just play both clips and override the pan?
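In usage, that first idea could look something like this (purely hypothetical API, including the channel argument to `play`):

```scala
val melody = Oscillator.sin.generateClip(duration = 2.0, frequency = 440, amplitude = 1.0)
val root   = Oscillator.sin.generateClip(duration = 2.0, frequency = 110, amplitude = 1.0)

audioPlayer.play(melody, 0)                      // melody on channel 0
audioPlayer.play(root, 1)                        // root note on channel 1
audioPlayer.setVolume(channel = 1, volume = 0.5) // hypothetical: duck channel 1 mid-playback
```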
Oscillators
Finally, while the oscillator abstraction is nice in theory, it has a lot of practical problems.
For one, it's not very easy to go cleanly from one frequency to another. Maybe the oscillator could also take a `time => frequency` function to generate a wave, but that seems to be much more complicated to implement.
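To illustrate the complication: naively computing `math.sin(freq(t) * 2 * math.Pi * t)` clicks whenever the frequency changes, because the phase jumps. A correct implementation has to integrate the frequency to get a continuous phase, and a pure `Double => Double` wave has nowhere to cheaply accumulate that state. A rough sketch of my own (not Minart code):

```scala
// Hypothetical frequency-modulated sine: phase(t) = 2*Pi * (integral of freq over [0, t]).
// The per-sample summation makes every lookup O(t), which shows why this
// doesn't fit the pure-function representation well.
def sweep(freq: Double => Double, sampleRate: Double = 44100.0): AudioWave =
  AudioWave { (t: Double) =>
    val dt    = 1.0 / sampleRate
    val steps = (t * sampleRate).toInt
    val phase = 2 * math.Pi * (0 until steps).map(i => freq(i * dt)).sum * dt
    math.sin(phase)
  }
```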
Also, I feel like I should be able to build an oscillator from audio samples, but right now that's not easy at all. And, on top of that, I think I might want to use 3 samples (attack, sustain and release), which makes things even more complicated.
In general, I still think having an oscillator abstraction is useful (other procedural audio toolkits have that, which is a good indicator), but I might need to rethink the API.