Precise Sound Separation
based on Sound Objects
Our technology is a new form of sound decomposition. Among its advantages is the ability to represent an audio signal precisely, sample by sample, in a way that enables free manipulation of all its internal components. The Sound Objects that describe the signal carry its amplitude, frequency and phase at every moment in time. This decomposition allows accurate representation and analysis of the signal, as well as separation and synthesis of its internal components, even when signals overlap in frequency and time.
Modelled on the human hearing organ, our decomposition and analysis of the sound signal employs a specially designed bank of filters and our patented system for tracking and distinguishing Sound Objects.
Unlike traditional sound processing methods such as the fast Fourier transform (FFT) and related transforms (DCT, CQT), our technology uses a uniquely designed bank of filters to obtain an unusually high accuracy of signal representation.
Application of the filters
In our solution, a digital acoustic signal is sent to the Bank of Filters, which functions much like the cochlea of the inner ear. The filters are selectively tuned to frequencies from 16.35 Hz (C1) to 22,030 Hz (f7), in line with the equal temperament tuning system, in which pitch is perceived as the logarithm of frequency. With 500 filters across the full frequency range, we obtain a resolution of 4 filters per semitone.
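The spacing of such a filter bank can be sketched in a few lines. This is only an illustration of logarithmic (equal temperament) spacing under the figures quoted above, not the actual filter design: the function names and the assumption that each filter sits exactly a quarter of a semitone above the previous one are ours.

```python
import math

F_MIN_HZ = 16.35          # lowest filter frequency stated in the text
NUM_FILTERS = 500         # number of filters stated in the text
FILTERS_PER_SEMITONE = 4  # resolution stated in the text
STEPS_PER_OCTAVE = 12 * FILTERS_PER_SEMITONE  # 48 filters per octave


def center_frequencies(f_min=F_MIN_HZ, count=NUM_FILTERS,
                       steps_per_octave=STEPS_PER_OCTAVE):
    """Return logarithmically spaced center frequencies in Hz (assumed layout)."""
    return [f_min * 2.0 ** (k / steps_per_octave) for k in range(count)]


def nearest_filter(freq_hz, f_min=F_MIN_HZ, steps_per_octave=STEPS_PER_OCTAVE):
    """Index of the filter whose center frequency is closest on a log scale."""
    return round(steps_per_octave * math.log2(freq_hz / f_min))


freqs = center_frequencies()
print(f"lowest filter:  {freqs[0]:.2f} Hz")
print(f"highest filter: {freqs[-1]:.0f} Hz")
print(f"ratio between neighbouring filters: {freqs[1] / freqs[0]:.4f}")
print(f"filter index closest to 440 Hz: {nearest_filter(440.0)}")
```

Under these assumptions the 500th filter lands at roughly 22,030 Hz, which matches the upper limit quoted above, and every filter is about 1.5% higher in frequency than its neighbour.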
The process of Sound Object extraction is more than computing a spectrogram. The new spectrum created by signal vectorization shows how the frequency relations of Sound Objects change in time while preserving phase continuity. The new spectrogram allows precise separation of frequencies that overlap in time and differ by barely more than 4% (1 semitone). Moreover, its resolution is constant across all 500 logarithmic frequency bands.
The bank of filters described above produces a new sound spectrum that depicts the amplitude of the sound components across time and frequency. In contrast to the familiar “blurred” spectrograms, the solution allows precise extraction and identification of particular Sound Objects coming from different sound sources. In a signal decomposed this way, anywhere from several to tens of main components and their partials coincide. These elements often have similar or even overlapping frequencies. To distinguish between them, the system has to “listen closely” and assign each one to a particular group of objects.
Using our algorithms, we separate particular Sound Objects from the high-resolution spectrogram obtained earlier and begin to group them according to their source.
Vectorized sound object formation
Thanks to their vector form, Sound Objects can be freely modified (parametrized), both at the level of a single object and at the level of a whole group representing a particular sound source (e.g. the harmonics of an instrument). Sound files saved in the Sound Object format share the characteristics of both the original audio (high sound quality and original timbre) and MIDI files (unrestricted manipulation and editing of components). Simple modification of Sound Object parameters makes it possible to change the timbre of instruments or voices, or to alter the intonation of speech or its phonemes and the way they are linked, which can be applied in contemporary Human Computer Interfaces (HCI).
What is a Sound Object?
Sound Objects are obtained by decomposing a sound signal into its sinusoidal components. Each of the derived sine waves is characterized by a variable frequency, a variable amplitude and strict phase continuity. Sound Objects allow precise information extraction: identifying particular sound sources (speakers and musical instruments), particular words, tones or ambiences, and deciphering the characteristic features and emotional states of a speaker.
Converting an audio signal into the parametric form of Sound Objects allows redundant objects to be selected and eliminated automatically with no impact on the quality of the recording, which significantly reduces the file size.
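To make the idea of a sine wave with variable frequency, variable amplitude and continuous phase concrete, here is a minimal sketch. The `SoundObject` class, its field names and the 44,100 Hz sample rate are hypothetical choices for illustration; the actual Sound Object format is not described in this text.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 44100  # Hz; assumed for illustration


@dataclass
class SoundObject:
    """Hypothetical container: one sinusoidal component with per-sample tracks."""
    start_sample: int          # position of the first sample in the signal
    start_phase: float         # phase in radians at the first sample
    frequency_hz: List[float]  # instantaneous frequency for every sample
    amplitude: List[float]     # instantaneous amplitude for every sample


def synthesize(obj: SoundObject, sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Render the object; the phase is accumulated, so it never jumps."""
    phase = obj.start_phase
    samples = []
    for freq, amp in zip(obj.frequency_hz, obj.amplitude):
        samples.append(amp * math.sin(phase))
        # Advance the phase by the angle covered during one sample period.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


# A 0.1 s glide from 440 Hz up to 466.16 Hz with a linear fade-out.
n = SAMPLE_RATE // 10
glide = SoundObject(
    start_sample=0,
    start_phase=0.0,
    frequency_hz=[440.0 + (466.16 - 440.0) * i / (n - 1) for i in range(n)],
    amplitude=[1.0 - i / (n - 1) for i in range(n)],
)
audio = synthesize(glide)
print(len(audio), "samples, peak", max(abs(s) for s in audio))
```

Because the phase is summed from sample to sample rather than recomputed from the current frequency, the waveform stays smooth even while the frequency changes.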
Aim of compression
The main goal of our research and development work is to improve the quality and expand the range of possibilities of sound processing systems. One side effect of the transformation is data compression. However, in contrast to the lossy formats that dominate the market, such as MP3, the resulting compression is variable and depends on the number of Sound Objects that make up the signal. One second of audio in mono 16-bit 44,100 Hz WAV format occupies 88.2 kB, relative to which the Sound Object representation provides a compression ratio of 42. The compression ratio of the MP3 format is 15-25.
Decomposing an audio signal into Sound Objects allows them to be grouped automatically by known sound source, such as musical instruments, vocals, hums, hisses and other noises.
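The arithmetic behind these figures can be checked with a short sketch. The per-second sizes of the compressed forms are derived here from the quoted ratios purely for illustration; they are not measurements, and the function names are ours.

```python
SAMPLE_RATE_HZ = 44100  # samples per second
BYTES_PER_SAMPLE = 2    # 16 bits
CHANNELS = 1            # mono


def pcm_bytes_per_second(rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
                         channels=CHANNELS):
    """Size of one second of uncompressed PCM audio in bytes."""
    return rate * width * channels


def compressed_bytes_per_second(raw_bytes, ratio):
    """Size implied by a given compression ratio (derived, not measured)."""
    return raw_bytes / ratio


raw = pcm_bytes_per_second()
print(f"uncompressed: {raw} bytes per second ({raw / 1000} kB)")
for label, ratio in [("Sound Objects, ratio 42", 42),
                     ("MP3, ratio 15", 15),
                     ("MP3, ratio 25", 25)]:
    size = compressed_bytes_per_second(raw, ratio)
    print(f"{label}: {size:.0f} bytes per second")
```

At a ratio of 42, one second of audio would take about 2.1 kB, compared with roughly 3.5 to 5.9 kB for the quoted MP3 ratios.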
Aim of separation
Precise parametrization of Sound Objects in terms of frequency, phase, amplitude and exact position in time allows them to be grouped together in order to establish their source.
The relations between fundamentals and the harmonics that define their timbre, together with the moments in time at which sounds emerge, let us determine the kind and number of sources (whether a source is an instrument, noise or speech, and whether speech comes from one speaker or more).
Such precise separation of Sound Objects into undistorted groups makes further analysis of each separated group possible for systems that can process a clean signal but cannot analyze a complex one.
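A toy sketch can illustrate grouping by the relation between fundamentals and harmonics. It is not the patented tracking and distinguishing system: it uses frequency ratios only, ignores phase, amplitude and onset time, and the function name, the 2% tolerance and the example frequencies are all assumptions made for illustration.

```python
from typing import Dict, List


def group_by_harmonics(partials_hz: List[float], fundamentals_hz: List[float],
                       tolerance: float = 0.02) -> Dict[float, List[float]]:
    """Assign each partial to the fundamental whose integer multiple is closest.

    Partials further than `tolerance` (relative error) from every harmonic
    series are left unassigned.
    """
    groups: Dict[float, List[float]] = {f0: [] for f0 in fundamentals_hz}
    for partial in partials_hz:
        best_f0, best_err = None, tolerance
        for f0 in fundamentals_hz:
            harmonic = max(1, round(partial / f0))  # nearest harmonic number
            expected = harmonic * f0
            err = abs(partial - expected) / expected
            if err < best_err:
                best_f0, best_err = f0, err
        if best_f0 is not None:
            groups[best_f0].append(partial)
    return groups


# Two hypothetical sources sounding together, plus one unrelated component.
fundamentals = [220.0, 277.18]
partials = [220.0, 277.18, 440.0, 554.36, 660.0, 831.54, 880.0, 1000.0, 1108.72]
groups = group_by_harmonics(partials, fundamentals)
for f0, members in groups.items():
    print(f"source at {f0} Hz: {members}")
```

In this example the 1108.72 Hz partial lies within 1% of the fifth harmonic of 220 Hz as well as on the fourth harmonic of 277.18 Hz. Cases like this are why frequency alone is not enough and why the time of emergence and the other parameters matter for real signals.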
Synthesis and editing
The vector form of Sound Objects enables unlimited synthesis and editing. The possibilities this representation offers over a regular spectrogram can be compared to the versatility of vector graphics over bitmap graphics.
An acoustic signal built from Sound Objects combines the advantages of three formats:
WAV, for its high quality of playback
MP3, for its high level of compression
MIDI, for the ease with which its components can be modified
Encoding sound in the vector form of Sound Objects allows their parameters to be changed freely and, moreover, allows sound signal components to be cut, pasted, divided, separated and combined into another audio file.
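The kind of editing described here can be sketched as simple operations on parameter tracks. The `SoundObject` record, its fields and the helper functions below are hypothetical and serve only to show why a vector representation is easy to edit; they do not describe the actual file format.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class SoundObject:
    """Hypothetical vector record for one sinusoidal component."""
    source: str                      # label of the sound source it belongs to
    start_s: float                   # onset time in seconds
    frequency_hz: Tuple[float, ...]  # frequency track
    amplitude: Tuple[float, ...]     # amplitude track


def transpose(obj: SoundObject, semitones: float) -> SoundObject:
    """Shift pitch by scaling every frequency value."""
    factor = 2.0 ** (semitones / 12.0)
    return replace(obj, frequency_hz=tuple(f * factor for f in obj.frequency_hz))


def scale(obj: SoundObject, gain: float) -> SoundObject:
    """Change loudness by scaling every amplitude value."""
    return replace(obj, amplitude=tuple(a * gain for a in obj.amplitude))


def shift(obj: SoundObject, seconds: float) -> SoundObject:
    """Move the object in time without touching its content."""
    return replace(obj, start_s=obj.start_s + seconds)


def select(objects: List[SoundObject], source: str) -> List[SoundObject]:
    """Cut out all objects that belong to one source."""
    return [o for o in objects if o.source == source]


recording_a = [
    SoundObject("voice", 0.0, (220.0, 222.0), (0.8, 0.7)),
    SoundObject("guitar", 0.0, (110.0, 110.0), (0.5, 0.5)),
]
recording_b = [SoundObject("piano", 0.0, (261.63, 261.63), (0.6, 0.4))]

# Take the voice from A an octave up at half volume, and paste B's piano 0.5 s later.
new_mix = [scale(transpose(o, 12), 0.5) for o in select(recording_a, "voice")]
new_mix += [shift(o, 0.5) for o in recording_b]
for o in new_mix:
    print(o)
```

Each edit touches only the parameters it needs, and the original objects stay unchanged, much as moving or scaling a shape in vector graphics leaves the other shapes intact.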