Recently I have been conducting some rudimentary tests, not really
having a firm idea where I was going with them, but sometimes good
things pop out of the woodwork while you’re actually looking for
something else. Unfortunately, nothing useful popped out of the
woodwork on this occasion, but if nothing else, I thought I’d get a post
out of it… :)

As those of you who read these posts regularly
will know, we are on the brink of releasing a brand new product whose
core competence is to produce high quality PCM versions of DSD files.
The product is done - it’s just the nuts and bolts of the product launch
that we are fiddling with. In developing this product, part of its
optimization has been achieved purely on a mathematical basis, relying
on detailed measurements and models. But, as always, the final product
is fine-tuned based upon what we hear, regardless of how it measures.
One thing that we consistently hear is that DSD versions of a track
usually sound better than their PCM counterparts. Occasionally, there
is not much to choose between them, but we rarely seem to get PCM that
sounds better than its DSD counterpart, and I have spent a long time
poring over reasons why that might be. I have some arm-waving ideas,
but they are not yet well-developed enough for prime time, and will need
major development at our end if they are ever going to get there. In
the meantime, I confine myself to looking at the things I am able to
look at, and seeing if something interesting just happens to jump out.
Not much has, as yet.

Surprisingly, one of the things I have
not done yet is the null-transform test. This is where you take a
waveform A and transform it into another waveform B. Then, you
transform B back into a copy of A, which I will call A*. It is easy to
show whether A and A* are identical. You simply invert one of them and
add the two together. If the two waveforms are identical, then the
result will be absolute digital silence (a “null transform”). If the
two are not identical, then the resultant waveform will comprise the
differences between the two. Examining, or even just listening to, this
difference waveform can often tell you a lot about the nature of the
differences. If A is a WAV file and B is a FLAC file, then the result
should be a true null transform, where A and A* are identical. But if B
is an MP3 file, then the result will most certainly not be null. I set
about initiating a transform where B is a DSD file, just to see what
gives.
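
To make the invert-and-add step concrete, here is a minimal sketch. It assumes both waveforms have already been decoded into equal-length lists of float samples, and the function names are purely illustrative; they do not belong to any of the tools mentioned in this post.

```python
import math

def null_residual(a, a_star):
    """Invert a_star and add it to a, sample by sample."""
    if len(a) != len(a_star):
        raise ValueError("waveforms must have the same length")
    return [x + (-y) for x, y in zip(a, a_star)]

def peak_db(signal, full_scale=1.0):
    """Peak level in dB relative to full scale; -inf means digital silence."""
    peak = max((abs(s) for s in signal), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak / full_scale)

a = [0.0, 0.5, -0.25, 0.125]
lossless_copy = list(a)                  # stands in for WAV -> FLAC -> WAV
altered_copy = [0.0, 0.5, -0.25, 0.126]  # one sample slightly off

print(peak_db(null_residual(a, lossless_copy)))  # a true null
print(peak_db(null_residual(a, altered_copy)))   # a non-zero residual
```

A true null shows up as a residual whose peak is exactly zero; anything else leaves a difference signal that can be measured or listened to.
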
I decided to start off with a high-quality 24-bit
44.1kHz PCM file. I then up-sampled it to 24-bit 176.4kHz PCM using the
SoX Linear SRC engine. I did that to ensure that there is no part of
the signal within the frequency range where DSD’s shaped noise floor
begins to rise sharply. That became my reference A file. I ran it
through our ultra-high resolution (-300dB noise floor) FFT analyzer to
make sure it contained no rogue frequency peaks, and sure enough it did
not. I next used Korg Audiogate to create a DSD128 (5.6MHz) B
version. Finally, I used DSD Master to create a 24-bit 176.4kHz PCM A*
copy.

Listening to all three copies, A, B, and A*, I was
struck by how similar to one another they all sounded. Frankly, I did
wonder whether I would be able to tell them apart in a blind test,
although some differences did emerge. I felt A had a touch more
‘sparkle’ to it. But the interesting part would be to do the null test
by inverting A* and adding it to A. This presented some tricky
problems. First, if there is any net gain (or loss) in the
transformation, then this will show up massively as a difference in the
null test and we don’t want that. Unfortunately, although Korg’s
Audiogate does have a “gain” setting, this does not (for various
entirely legitimate reasons) necessarily translate to an absolute peak
signal value in the DSD file. And since I cannot be sure what the
signal reference level is in the DSD “B” version, I can’t set a correct
gain setting in DSD Master. So I left that on “Normalize”, which
produces an A* PCM file normalized to 0dBFS - in other words with the
maximum possible resolution.
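
To give a feel for why level matching matters so much, here is a small sketch of how deep a null can get when the only difference between two otherwise identical signals is a gain error. The function name is my own illustration, and the mismatch values are hypothetical, not measured from the files discussed here.

```python
import math

def null_depth_db(gain_error_db):
    """Residual level, in dB relative to the signal, left by a pure gain mismatch."""
    ratio = 10.0 ** (gain_error_db / 20.0)  # linear gain of the mismatched copy
    if ratio == 1.0:
        return -math.inf                    # perfectly matched: a true null
    return 20.0 * math.log10(abs(1.0 - ratio))

for err in (1.0, 0.1, 0.01):
    print(f"{err} dB mismatch -> residual at {null_depth_db(err):.1f} dB")
```

Under these assumptions, a mismatch of a tenth of a dB is on its own enough to leave a residual only about 39dB below the signal, even if the two files are otherwise sample-for-sample identical.
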
Loading the A and A* files into Audacity to do the null transform, it
was a simple matter to invert A*. Then I needed to “normalize” the A
file. Next, I needed to time-align the two files. For various reasons,
the transforms had left
the files non-aligned temporally. By looking for sharp peaks in the
music waveform (fortunately I was able to find one quite easily), I was
able to use Audacity’s drag tool to visually align the signals to the
nearest sample. Finally, I nulled the two files together. Immediately,
Audacity showed me that there was a very substantial residual. Looking
at Audacity’s FFT of the null signal, I could see that its 20Hz-20kHz
band had essentially the same shape as the spectrum of the original A
file, but was depressed by about 40-60dB. There was nothing in the
spectrum of the null signal that seemed to be telling me anything
obviously useful.
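
The peak-based alignment described above can be sketched in a few lines. This is only an illustration of the idea, assuming the samples are held in plain Python lists and that both files share one dominant peak; the function names are hypothetical, and this is not how Audacity itself works internally.

```python
def peak_index(signal):
    """Index of the sample with the largest absolute value."""
    return max(range(len(signal)), key=lambda i: abs(signal[i]))

def align_on_peak(a, a_star):
    """Trim both signals so that their largest peaks coincide.

    Returns (a, a_star, lag), where lag > 0 means a_star started late.
    """
    lag = peak_index(a_star) - peak_index(a)
    if lag > 0:
        a_star = a_star[lag:]   # drop the extra leading samples of a_star
    else:
        a = a[-lag:]            # drop the extra leading samples of a
    n = min(len(a), len(a_star))
    return a[:n], a_star[:n], lag
```

Like dragging a track by hand, this can only align the files to the nearest whole sample.
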
Playing the null signal using Audacity’s
built-in player, it sounded just like a scratchy version of the original
with the volume turned down, and with the bass largely missing. This
impression was validated playing it on my reference system. These
observations confirmed that the differences were not just in the
ultrasonic noise spectrum added by the conversion stage to DSD, but were
substantially within the audio frequency range, and furthermore were
less evident in the deep bass than elsewhere in the audio spectrum. It
is far too early to say, but these observations were suggestive (to me
at any rate) of an accumulation of phase errors.
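
As an illustration only, and not a diagnosis of these particular files, the simplest kind of phase error is a small fixed time offset between A and A*. For a pure tone of frequency f nulled against a copy delayed by t, the residual amplitude relative to the tone is 2|sin(pi f t)|, which is tiny in the bass and grows with frequency. The half-sample offset used below is a hypothetical figure.

```python
import math

def delay_residual_db(freq_hz, delay_s):
    """Null residual, in dB relative to the tone, for a sine and a delayed copy."""
    amplitude = 2.0 * abs(math.sin(math.pi * freq_hz * delay_s))
    return -math.inf if amplitude == 0.0 else 20.0 * math.log10(amplitude)

half_sample = 0.5 / 176400.0  # hypothetical leftover misalignment at 176.4kHz

for freq in (50, 500, 5000):
    print(freq, "Hz:", round(delay_residual_db(freq, half_sample), 1), "dB")
```

With an offset this small the residual rises by about 20dB for every tenfold increase in frequency, so a null signal with weak bass is at least consistent with this kind of error.
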
I must confess that, given the qualitative data, I expected a
better result. The
listening tests showed that A and A* sounded very close to one
another indeed, yet the null signal showed the presence of quite a
substantial difference signal.

We are going to have to repeat
this sometime using our own software. Audacity is great, but we didn’t
write it, and I don’t even know if it is doing exactly what I am
assuming it is at any point in time. Plus, I want to do a more accurate
job of level matching and time alignment before nulling, a job which is
best done by fine tuning the levels and timings to minimize the
magnitude of the null signal. All quite processor-intensive. Also, I
would like to use our own SDM to produce the DSD “B” files so that we
are at least in control of the choice and characteristics of the
filters, but since we don’t yet have our own SDM that aspect remains
tricky.
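
For what it is worth, the "tune the level and timing to minimize the null" idea can be sketched as follows. This is a rough illustration under simplifying assumptions, not the software we intend to write: it searches whole-sample lags only, uses the closed-form least-squares gain for each lag, and the name fit_null is hypothetical.

```python
def fit_null(a, b, max_lag):
    """Find the whole-sample lag and gain that minimise the null residual.

    Returns (lag, gain, energy) such that a[n] is approximated by
    gain * b[n + lag], with energy the summed squared residual.
    """
    start, stop = max_lag, min(len(a), len(b)) - max_lag
    if stop <= start:
        raise ValueError("signals too short for this max_lag")
    ref = a[start:stop]  # same window for every lag, so energies compare fairly
    best = None
    for lag in range(-max_lag, max_lag + 1):
        seg = b[start + lag:stop + lag]
        dot_bb = sum(y * y for y in seg)
        # Least-squares gain for this lag: <ref, seg> / <seg, seg>
        gain = sum(x * y for x, y in zip(ref, seg)) / dot_bb if dot_bb else 0.0
        energy = sum((x - gain * y) ** 2 for x, y in zip(ref, seg))
        if best is None or energy < best[2]:
            best = (lag, gain, energy)
    return best

# Synthetic check: b is a delayed by 3 samples and made louder by 1/0.8.
a = [((n * 37) % 101) / 50.0 - 1.0 for n in range(200)]
b = [0.0] * 3 + [x / 0.8 for x in a[:197]]
print(fit_null(a, b, max_lag=5))
```

Even this brute-force version does work proportional to the number of lags times the number of samples, which hints at why the full job, with sub-sample timing as well, gets processor-intensive.
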
All this to say that the net result of my null test was
a null result. But I thought it was at least an interesting peg in the
ground. If there is any interest, I may make the resultant files
available for download next time (I couldn’t do that this time because I
don’t have the rights to distribute the files I was using).