Latest news from Hitomi

Keeping track of multi-channel audio

The challenge

Perhaps the most irritating defect in video systems – from the viewer’s perspective – is when the audio is out of synchronisation: the mouth is moving, but its movements bear little resemblance to the words being heard. So, since the beginning of the talkies a century ago, we have needed a means of synchronising pictures and sound.

The original solution was the clapperboard: something that produces a distinctive image and a very sharp sound, which can then be synchronised at any point in post production. For single-camera productions, it is still an adequate solution.

The challenge today is maintaining synchronisation in multi-camera productions recording multi-channel sound. Not only is there more to synchronise, but you also have to track which audio channel goes where.


Electronic equivalents of the clapperboard have existed for some time: certainly as far back as the Valid and Valid8 system from Vistek/Pro-bel. GLITS – Graham’s Line Identification Tones System, developed by Graham Haines of the BBC – gave engineers the ability to detect left and right channels in a stereo pair through the pattern of silences in a 1kHz tone. Valid8 extended the GLITS principle with more frequencies, to identify eight sets of stereo pairs.

GLITS – an early example is EBU R49 (1999), where short gaps in the first (left) channel indicate to the sound engineer that there should also be a second (right) channel. The GLITS tone expanded this concept to one gap in the first channel and two in the second, identifying a stereo pair.
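The gap-counting idea can be sketched in a few lines of code. The timings below (a 1 kHz tone, 250 ms gaps, a repeating cycle) follow common GLITS practice but are illustrative rather than taken from the formal specification:

```python
import numpy as np

FS = 48_000           # sample rate in Hz (assumed)
TONE_HZ = 1_000       # GLITS is based on a 1 kHz line-up tone
GAP = int(0.25 * FS)  # 250 ms gap (illustrative timing)
CYCLE = int(4 * FS)   # repeat the ident pattern every 4 s (illustrative)

def glits_pair():
    """One cycle of a GLITS-style stereo ident.

    The left channel carries one gap and the right channel two, so the
    channels can be told apart simply by counting the silences.
    """
    t = np.arange(CYCLE) / FS
    tone = np.sin(2 * np.pi * TONE_HZ * t)
    left, right = tone.copy(), tone.copy()
    left[:GAP] = 0.0               # single gap identifies the left channel
    right[:GAP] = 0.0              # first of two gaps on the right
    right[2 * GAP:3 * GAP] = 0.0   # second gap identifies the right channel
    return left, right

left, right = glits_pair()
```

A detector works the same pattern in reverse: find the silences in each channel, count them, and the channel order falls out automatically.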

BLITS – Black and Lane’s Ident Tones for Surround, by Martin Black and Keith Lane of Sky – developed the idea further to accommodate 5.1-channel surround sound, using different frequencies as well as different mark/space ratios.

BLITS with identifiers – staggered, voice-synthesised idents can be carried in each audio channel as part of the BLITS or GLITS sequence, giving an easy method of channel identification without requiring training in recognising these ‘technical’ tone sequences.

For its MatchBox synchronisation system, Hitomi has based its audio identification on BLITS, while still supporting GLITS and legacy implementations of Valid and Valid8.

MatchBox depends upon intelligent sound analysis to identify each audio channel automatically, showing the results graphically. The engineer no longer has to cycle through all the channels and identify the frequencies, gaps and sequences. MatchBox also uses voice-synthesised idents as part of the BLITS or GLITS sequence to help the engineer recognise channels quickly.

Lip sync

Because GLITS and BLITS use tone sequences with gaps at precisely fixed offsets, MatchBox can use them to identify exact moments in the audio. For even greater precision, MatchBox also uses phase and frequency information to measure coherence and sub-sample phase offsets between all the channels.
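Sub-sample offset measurement of this kind can be illustrated with a standard signal-processing technique: cross-correlation for the coarse lag, then a parabolic fit over the correlation peak for the fractional part. This is a sketch of the general method only, not a description of MatchBox’s internals; the function name and test signal are invented for illustration:

```python
import numpy as np

def subsample_delay(ref, sig):
    """Estimate the delay of sig relative to ref, in samples.

    Cross-correlation gives the nearest-sample lag; a parabolic fit
    over the peak and its two neighbours recovers the fractional part.
    """
    corr = np.correlate(sig, ref, mode="full")
    k = int(np.argmax(corr))
    lag = float(k - (len(ref) - 1))   # index len(ref)-1 is zero lag
    if 0 < k < len(corr) - 1:
        y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
        denom = y0 - 2 * y1 + y2
        if denom != 0:
            lag += 0.5 * (y0 - y2) / denom
    return lag

# A windowed tone burst and a copy delayed by exactly 3 samples.
t = np.arange(1024) / 48_000
burst = np.sin(2 * np.pi * 997 * t) * np.hanning(1024)
delayed = np.roll(burst, 3)
```

With a known tone as the reference, the same fit can resolve offsets well below one sample, which is what makes measurements on the order of hundredths of a sample plausible.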

While MatchBox is fully compatible with Valid8 and GLITS, the preferred line-up tone is BLITS. This follows general practice in broadcast today, with BLITS specified in many technical delivery standards.

But there is also an underlying technical reason. To extend GLITS to multi-channel audio, Valid8 introduced additional frequencies, some below the 1kHz of the original. BLITS uses higher frequencies, whose shorter periods allow time markers to be detected with considerably more precision.

This is how MatchBox achieves audio/video synchronisation measurement to better than 1ms, and phase measurement between audio channels to 0.01 of an audio sample. Audio phase is critical to quality: comb-filter effects are distracting to the average viewer and listener, and even a small offset between channels can ruin the audio experience.

Some technical specifications state a phase coherency limit of 0.2 of a sample; Hitomi recommends 0.1 of a sample. Being able to detect errors an order of magnitude smaller still is clearly beneficial.
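A quick calculation shows why such small offsets matter. When two identical channels offset by Δ seconds are summed (in a mono down-mix, for example), each frequency f is scaled by |2 cos(π·f·Δ)| – the comb filter. The helper below is an illustrative sketch, not a Hitomi tool; it reports the level change relative to a correctly in-phase sum:

```python
import math

FS = 48_000  # sample rate in Hz (assumed)

def mono_sum_gain_db(freq_hz, offset_samples, fs=FS):
    """Level change in dB at freq_hz when two identical channels,
    offset by offset_samples, are summed, relative to an in-phase sum.

    sin(2*pi*f*t) + sin(2*pi*f*(t - delta)) has amplitude
    2*|cos(pi*f*delta)|, so the change against the in-phase amplitude
    of 2 is 20*log10(|cos(pi*f*delta)|).
    """
    delta = offset_samples / fs
    g = abs(math.cos(math.pi * freq_hz * delta))
    return 20 * math.log10(g) if g > 0 else float("-inf")

# Even a half-sample offset barely touches 1 kHz but already dips the
# top of the audio band, and the loss grows quickly with the offset.
for f in (1_000, 10_000, 20_000):
    print(f"{f} Hz: {mono_sum_gain_db(f, 0.5):.2f} dB")
```

Because the attenuation climbs with frequency, an offset that passes a 0.2-sample limit can still shave the top octave; holding channels to 0.1 of a sample keeps the comb filter’s first notch safely above the audio band.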

Keeping track

7.1 channel surround sound is now commonplace, and the Ultra HD standards allow for as many as 64 audio channels to be embedded. Identifying and aligning audio by listening to a succession of tone sequences does not seem like a practical way forward.

The Hitomi MatchBox system can identify 16 channels in a single video source and present a visualisation of the channel order. It also incorporates voice synthesis to assist with human channel identification.

But as audio becomes more complex, line-up becomes more than just the validation of audio channels. There is so much more one might want to know about an audio feed. Where was it produced? What is its intended destination? What is the status of the channel?

Hitomi MatchBox Glass introduced a data channel within the audio that will survive most downstream processing and conversion. This data channel can then be decoded by a MatchBox Analyser and the information used downstream in production and post production workflows.

At Hitomi, we are working hard to create solutions that integrate easily into modern workflows and simplify audio/video timing measurement. While integrating with legacy Valid and Valid8 timing tools, Hitomi is taking the concept of the electronic clapperboard to the next level with MatchBox Glass.

This means adopting the best standards for audio source marking and identification, and where those standards do not exist creating innovative approaches to solve real broadcast issues.



