WebSpeech API — Human-Machine Relations 101

Tomasz Buga
6 min read · Jan 5, 2023

This is a short introduction to sound processing, speech recognition… and a lot of digressions about game development and my own life.

Digital sine waves

I’m trained as a sound engineer. Sound has always fascinated me, as have computers. Still, I’ve never had enough time to study digital signal processing or to learn C++ and code some VSTs* for my DAW**.

Nevertheless, I’ve had plenty of time to work with web code professionally, as part of my career development. Along the way, I’ve built a lot of side projects.

One of them was a platformer game driven by real-time analysis of an audio input (e.g., a music track). The gameplay was simple: the player jumps from platform to platform to reach the last one, while the platforms bounce along with the sound spectrum.

To achieve that, I used Unity’s built-in FFT*** solution and an implementation tutorial from YouTube. Below, you can find part of the code responsible for gathering data on the audio spectrum.

*Virtual Studio Technology, a.k.a. audio plug-ins
**Digital Audio Workstation, e.g. Ableton Live, Avid Pro Tools
***Fast Fourier transform

More on the topic of Unity’s AudioListener: https://docs.unity3d.com/ScriptReference/AudioListener.html
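In Unity, the spectrum analysis happens inside `AudioListener.GetSpectrumData`. To show roughly what goes on under the hood, here is a minimal TypeScript sketch, not the game’s actual code: a textbook recursive radix-2 FFT turns a window of samples into per-bin magnitudes, and a hypothetical `platformHeights` helper averages those bins into one height per platform. The function names, the equal-width band split and the `scale` parameter are assumptions for illustration, and unlike Unity the sketch applies no window function.

```typescript
// Radix-2 Cooley-Tukey FFT (recursive). The input length must be a power of two.
// Returns the real and imaginary parts of the spectrum.
function fft(re: number[], im: number[]): [number[], number[]] {
  const n = re.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("FFT length must be a non-zero power of two");
  }
  if (n === 1) return [[re[0]], [im[0]]];

  // Split into even- and odd-indexed samples and transform each half.
  const half = n / 2;
  const evenRe: number[] = [];
  const evenIm: number[] = [];
  const oddRe: number[] = [];
  const oddIm: number[] = [];
  for (let i = 0; i < half; i++) {
    evenRe.push(re[2 * i]);
    evenIm.push(im[2 * i]);
    oddRe.push(re[2 * i + 1]);
    oddIm.push(im[2 * i + 1]);
  }
  const [eRe, eIm] = fft(evenRe, evenIm);
  const [oRe, oIm] = fft(oddRe, oddIm);

  // Butterfly step: combine both halves using the twiddle factor e^(-2*pi*i*k/n).
  const outRe = new Array<number>(n);
  const outIm = new Array<number>(n);
  for (let k = 0; k < half; k++) {
    const angle = (-2 * Math.PI * k) / n;
    const tRe = Math.cos(angle) * oRe[k] - Math.sin(angle) * oIm[k];
    const tIm = Math.cos(angle) * oIm[k] + Math.sin(angle) * oRe[k];
    outRe[k] = eRe[k] + tRe;
    outIm[k] = eIm[k] + tIm;
    outRe[k + half] = eRe[k] - tRe;
    outIm[k + half] = eIm[k] - tIm;
  }
  return [outRe, outIm];
}

// Normalized magnitude of each frequency bin for a real-valued window of samples.
// For real input the upper half mirrors the lower half, so only n/2 bins are kept.
// No window function is applied (Unity lets you choose one via FFTWindow).
function magnitudeSpectrum(samples: number[]): number[] {
  const [re, im] = fft(samples, new Array<number>(samples.length).fill(0));
  const mags: number[] = [];
  for (let k = 0; k < samples.length / 2; k++) {
    mags.push(Math.hypot(re[k], im[k]) / samples.length);
  }
  return mags;
}

// Hypothetical game mapping: split the spectrum into equal-width bands,
// one per platform, and scale each band's average magnitude into a height.
function platformHeights(spectrum: number[], platformCount: number, scale: number): number[] {
  const bandSize = Math.floor(spectrum.length / platformCount);
  if (platformCount < 1 || bandSize < 1) {
    throw new Error("need at least one spectrum bin per platform");
  }
  const heights: number[] = [];
  for (let p = 0; p < platformCount; p++) {
    let sum = 0;
    for (let k = p * bandSize; k < (p + 1) * bandSize; k++) sum += spectrum[k];
    heights.push((sum / bandSize) * scale);
  }
  return heights;
}

// Example: a pure tone that completes exactly 8 cycles in a 64-sample window.
const N = 64;
const tone = Array.from({ length: N }, (_, i) => Math.sin((2 * Math.PI * 8 * i) / N));
const spectrum = magnitudeSpectrum(tone);
const heights = platformHeights(spectrum, 4, 10);
console.log(heights.map((h) => h.toFixed(3)));
```

Because the test tone fits exactly 8 cycles into the window, nearly all of its energy lands in bin 8, so only the second of the four platforms (bins 8 to 15) rises. A game would typically recompute this on each frame from the latest audio window, which is the part Unity’s `GetSpectrumData` takes care of.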

I still felt there was so much more to be done with sound.

Something useful.

Do you even WebSpeech API?

Here’s an example of something useful.

My nephew had to learn English, and his pronunciation was not quite there yet. That’s how he became the Product Owner of my new app, Learn To Speak.

Once I knew I wasn’t willing to sacrifice my evenings correcting my nephew’s pronunciation, I had to come up with a simple, programmable speaking challenge.

And — implement it.
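A speaking challenge like this could be built on the Web Speech API’s `SpeechRecognition` interface, which Chrome exposes as `webkitSpeechRecognition`. Here is a minimal TypeScript sketch, not the actual Learn To Speak code: `listenOnce` captures a single utterance, and a hypothetical `challengeScore` compares the transcript with the target phrase word by word. Using the recognizer’s transcript as a proxy for pronunciation is an assumption of this sketch; the API reports which words it heard, not how well they were pronounced.

```typescript
// Lowercase, drop punctuation (keeping apostrophes), and split into words.
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, "")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Hypothetical scoring rule: share of the target phrase's words that were heard
// in order (longest common subsequence of words / number of target words), 0 to 1.
function challengeScore(target: string, heard: string): number {
  const expected = normalizeWords(target);
  const actual = normalizeWords(heard);
  if (expected.length === 0) return 0;
  // lcs[i][j] = LCS length of expected[0..i) and actual[0..j)
  const lcs: number[][] = [];
  for (let i = 0; i <= expected.length; i++) {
    lcs.push(new Array<number>(actual.length + 1).fill(0));
  }
  for (let i = 1; i <= expected.length; i++) {
    for (let j = 1; j <= actual.length; j++) {
      lcs[i][j] =
        expected[i - 1] === actual[j - 1]
          ? lcs[i - 1][j - 1] + 1
          : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
    }
  }
  return lcs[expected.length][actual.length] / expected.length;
}

// Listen for one utterance and resolve with the top transcript.
// `any` is used so the sketch does not depend on DOM type definitions.
function listenOnce(lang: string): Promise<string> {
  const g = globalThis as any;
  const Recognition = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Recognition) {
    return Promise.reject(new Error("SpeechRecognition is not available in this environment"));
  }
  return new Promise<string>((resolve, reject) => {
    const recognition = new Recognition();
    recognition.lang = lang;
    recognition.interimResults = false; // only deliver final results
    recognition.maxAlternatives = 1;
    recognition.onresult = (event: any) => resolve(event.results[0][0].transcript);
    recognition.onerror = (event: any) => reject(new Error(event.error));
    // If recognition ends without a result, fail; after a resolve this reject is a no-op.
    recognition.onend = () => reject(new Error("no speech was recognized"));
    recognition.start();
  });
}
```

In a browser, one round of the challenge could then be `listenOnce("en-US").then((heard) => challengeScore("The quick brown fox", heard))`, with the app deciding which score counts as a pass. The word-level comparison keeps the sketch simple; a stricter app might compare individual sounds instead, which the Web Speech API does not provide.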



Tomasz Buga

Software Development Engineer in Test. Passionate about programming. Experienced former employee of the insurance industry. Graphic designer by choice.