Audio basics
Play sounds and music with reliable runtime controls.
ExoJS audio is built on two media types: Sound for short, pooled, decoded-buffer playback and Music for long, streamed, seekable tracks. Both extend AbstractMedia and share the same volume, loop, playback rate, mute, bus, and fade controls. The difference is how the browser handles the underlying data.
Sound vs. Music
| | Sound | Music |
|---|---|---|
| Backing | Decoded AudioBuffer | HTMLAudioElement (streamed) |
| Best for | Short SFX, UI sounds, footstep pools | Background tracks, ambient loops, long audio |
| Seekable | No — getTime() always returns 0 | Yes — setTime() seeks to a position |
| Pooled | Yes — poolSize controls concurrent sources | No — one source, one stream |
| Default bus | app.audio.sound | app.audio.music |
Use Sound when you need many overlapping instances of the same short clip. Use Music for long, seekable content that you want the browser to stream from disk rather than decode entirely into memory.
Loading and playing
Both types are loaded through the Loader, like textures:
import { Music, Scene, Sound } from '@codexo/exojs';

class AudioScene extends Scene {
  async load(loader) {
    await loader.load(Sound, { laser: 'audio/laser.ogg' });
    await loader.load(Music, { theme: 'audio/theme.ogg' });
  }

  init(loader) {
    this.laser = loader.get(Sound, 'laser');
    this.theme = loader.get(Music, 'theme');
    this.theme.setLoop(true).setVolume(0.6).play();
  }
}
A Sound instance is a single object that pools multiple underlying AudioBufferSourceNode instances internally. Call play() as many times as you need — the pool handles concurrent voices. A Music instance has one HTMLAudioElement; play() and pause() toggle that single stream.
AudioContext auto-unlock
Browsers require a user gesture before Web Audio starts. ExoJS handles this automatically: the AudioContext is created on first use and resumed (un-suspended) on the first mousedown, touchstart, or touchend event observed on document.
For Sound, play() requests queue until the context is ready. Music uses HTMLAudioElement playback and is still subject to browser autoplay policy, so starting music from a user gesture remains the safest pattern.
If you need to know when the context is ready, import isAudioContextReady from @codexo/exojs and poll it.
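One way to poll is a small promise-based helper. This is a sketch, not an ExoJS API: `waitUntil` is a hypothetical name, and the usage comment assumes `isAudioContextReady` is a callable that returns a boolean, which the text above does not confirm.

```javascript
// Generic polling helper: resolves once `isReady()` returns true,
// rejects if `timeoutMs` elapses first.
// `isReady` is any zero-argument predicate supplied by the caller.
function waitUntil(isReady, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const check = () => {
      if (isReady()) {
        resolve();
      } else if (Date.now() - startedAt >= timeoutMs) {
        reject(new Error('Timed out waiting for readiness'));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Hypothetical usage (assumes isAudioContextReady is a function):
//   import { isAudioContextReady } from '@codexo/exojs';
//   await waitUntil(() => isAudioContextReady());
//   this.laser.play();
```

The timeout keeps a page that never receives a gesture from polling forever.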
Audio needs a user gesture
Browsers won’t start audio until the user interacts with the page. Trigger the first sound or music from a click, tap, or key press — the embedded examples below ask for a click for exactly this reason.
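A minimal sketch of the gesture pattern follows. `onFirstGesture` is a hypothetical helper, not part of ExoJS, and the default event list (`click`, `keydown`) is an assumption chosen to match "click, tap, or key press"; adjust it for your input setup.

```javascript
// Runs `start` once, on the first matching event seen on `target`,
// then removes its listeners. Returns a function that cancels the wait.
function onFirstGesture(target, start, events = ['click', 'keydown']) {
  let done = false;
  const cleanup = () => {
    for (const type of events) target.removeEventListener(type, handler);
  };
  const handler = () => {
    if (done) return;
    done = true;
    cleanup();
    start();
  };
  for (const type of events) target.addEventListener(type, handler);
  return cleanup;
}

// Hypothetical usage inside a scene's init():
//   onFirstGesture(document, () => this.theme.play());
```

Because `target` is any EventTarget, the same helper works on `document`, a canvas, or a start button.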
Playback controls
All media share the same control surface:
sound.setVolume(0.8); // 0..2, default 1
sound.setLoop(true); // loop the clip
sound.setPlaybackRate(1.5); // 0.1..20, 2 = double speed / one octave up
sound.setMuted(true); // mute without forgetting volume
sound.play();
sound.pause();
sound.stop(); // stop + reset time
sound.toggle(); // play if paused, pause if playing
All setters return this for chaining. play() accepts optional per-call overrides:
sound.play({ volume: 0.5, loop: true, playbackRate: 1.2 });
Sound.play() also accepts a replace flag ({ replace: true }) that stops all other pooled sources before playing this one — useful for “one voice at a time” scenarios like voice-over lines.
Volume and fading
The engine exposes linear gain (0..2, where 1 is “as authored”). dB conversion is up to you — the raw gain value is what the GainNode receives.
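If you prefer to think in decibels, the standard amplitude conversion is gain = 10^(dB/20). The helpers below are a sketch with hypothetical names (`dbToGain`, `gainToDb`); the clamp to 2 simply mirrors the 0..2 range stated above.

```javascript
const MAX_GAIN = 2; // upper bound of the linear range described above

// Converts decibels to linear gain, clamped to 0..MAX_GAIN.
function dbToGain(db) {
  const gain = Math.pow(10, db / 20);
  return Math.min(MAX_GAIN, Math.max(0, gain));
}

// Converts linear gain to decibels; silence maps to -Infinity.
function gainToDb(gain) {
  return gain <= 0 ? -Infinity : 20 * Math.log10(gain);
}

// Hypothetical usage: play roughly 6 dB below the authored level.
//   sound.setVolume(dbToGain(-6));
```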
fadeIn(ms) and fadeOut(ms) ramp the output gain over a duration in milliseconds:
// Fade out over 800ms, then stop
music.fadeOut(800, { stopAfter: true });
// Fade in over 500ms
music.fadeIn(500);
The crossFade utility fades from one media instance to another in parallel:
import { crossFade } from '@codexo/exojs';
await crossFade(this.trackA, this.trackB, 2000);
// trackA is now silent (and paused by default), trackB is at full volume
The sound pool
Sound instances are pooled. poolSize (default 8) controls the maximum number of simultaneous AudioBufferSourceNode instances. When the pool is full and you call play() again, one active source is evicted according to poolStrategy:
- 'fifo' (default): first-in, first-out, so the oldest active source is evicted. Suits steady-state playback.
- 'lru': evicts the source closest to its natural end.
- 'priority': uses Sound.priority (current single-sound behavior is equivalent to FIFO).
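To make FIFO eviction concrete, here is a toy model of a fixed-size voice pool. It is an illustration only: `VoicePoolModel` is a hypothetical class, it does not reflect ExoJS internals, and it ignores voices that finish on their own.

```javascript
// Toy model of a fixed-size voice pool with FIFO eviction.
class VoicePoolModel {
  constructor(poolSize) {
    this.poolSize = poolSize;
    this.active = []; // ids of playing voices, oldest first
    this.nextId = 1;
  }

  // Starts a new voice. Returns the id of the evicted voice,
  // or null when the pool still had room.
  play() {
    let evicted = null;
    if (this.active.length >= this.poolSize) {
      evicted = this.active.shift(); // drop the oldest voice
    }
    this.active.push(this.nextId++);
    return evicted;
  }
}
```

With a pool of two, the third `play()` call evicts the first voice, the fourth evicts the second, and so on.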
For rapid-fire SFX (gunshots, footsteps, UI clicks), set poolSize higher and use the default FIFO strategy:
const gunshot = loader.get(Sound, 'gunshot');
gunshot.poolSize = 24;
// ... hold spacebar to fire rapidly ...
gunshot.play(); // once the pool is full, each call evicts the oldest active source
Pitch variation for a richer sound is one line — randomise playbackRate:
const cents = Math.random() * 300 - 150; // -150 to +150 cents
sound.play({ playbackRate: Math.pow(2, cents / 1200) });
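If several sounds share this pattern, the conversion can be factored into a helper. This is a sketch with hypothetical names (`centsToRate`, `randomDetune`); the random source is injectable so the result can be made deterministic in tests.

```javascript
// Converts a pitch offset in cents to a playback-rate multiplier.
// 1200 cents = one octave = rate 2.
function centsToRate(cents) {
  return Math.pow(2, cents / 1200);
}

// Returns a rate detuned by a random amount within +/- maxCents.
// `random` must return a number in [0, 1), like Math.random.
function randomDetune(maxCents, random = Math.random) {
  const cents = (random() * 2 - 1) * maxCents;
  return centsToRate(cents);
}

// Hypothetical usage:
//   sound.play({ playbackRate: randomDetune(150) });
```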
Buses
The audio manager exposes three built-in buses: app.audio.master, app.audio.music, and app.audio.sound. Each media instance routes to its default bus (music for Music, sound for Sound). Buses form a tree — music and sound are children of master, and master connects to the audio destination.
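Because gains in series multiply, the level a listener hears is the product of every volume on the path from a bus up to master. The sketch below models that with plain objects; `makeBus` and `effectiveGain` are hypothetical names, not the ExoJS AudioBus API, and treating a muted ancestor as silencing its children is an assumption.

```javascript
// Plain-object model of a bus tree (not the ExoJS AudioBus class).
function makeBus(name, { parent = null, volume = 1, muted = false } = {}) {
  return { name, parent, volume, muted };
}

// Multiplies volumes from `bus` up to the root. A muted bus anywhere
// on the path yields 0.
function effectiveGain(bus) {
  let gain = 1;
  for (let node = bus; node !== null; node = node.parent) {
    if (node.muted) return 0;
    gain *= node.volume;
  }
  return gain;
}

const master = makeBus('master', { volume: 0.5 });
const music = makeBus('music', { parent: master, volume: 0.5 });
console.log(effectiveGain(music)); // 0.25
```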
Set a media instance’s bus explicitly:
import { AudioBus } from '@codexo/exojs';
const voiceBus = new AudioBus('voice-over', { parent: app.audio.master });
app.audio.registerBus(voiceBus);
sound.bus = voiceBus;
Buses have independent volume, muted, and pan controls, plus a filter chain. The Audio effects chapter covers bus filters in detail.
Audio sprites
A Sound can define named sub-regions (“sprites”) for one-shot playback from specific offsets:
sound.defineSprite('impact', { start: 0.5, end: 0.8 });
sound.defineSprite('whoosh', { start: 1.2, end: 1.6 });
sound.playSprite('impact');
sound.playSprite('whoosh', { volume: 0.7 });
This is useful when you bake multiple sound effects into a single file and address them by name rather than by offset.
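When clips are packed back to back, the regions can be computed rather than typed by hand. This is a sketch under assumptions: `layoutSprites` is a hypothetical helper, and it presumes you know each clip's duration in seconds and the silent gap your export tool placed between clips.

```javascript
// Computes { start, end } regions (in seconds) for clips laid out in order
// in a single file, separated by `gap` seconds of silence.
function layoutSprites(clips, gap = 0.1) {
  const regions = {};
  let cursor = 0;
  for (const { name, duration } of clips) {
    regions[name] = { start: cursor, end: cursor + duration };
    cursor += duration + gap;
  }
  return regions;
}

// Hypothetical usage with the defineSprite call shown above:
//   const regions = layoutSprites([
//     { name: 'impact', duration: 0.3 },
//     { name: 'whoosh', duration: 0.4 },
//   ]);
//   for (const [name, region] of Object.entries(regions)) {
//     sound.defineSprite(name, region);
//   }
```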
Examples
Click the canvas to play a loaded Sound — the minimal audio example.
Two looping music tracks crossfading back and forth with crossFade().
Where to go next
The next chapter, Spatial audio, covers 2D positional audio — how to place sounds in world space so they pan and attenuate based on the listener’s position.