USB and Audio
USB is a very practical channel for transferring audio from a computer to a DAC or other hi-fi component. USB ports are available on every computer one can buy nowadays, and USB is a well-established and robust standard. Alas, the widespread feeling is that the quality of audio transferred over USB is quite poor compared to even the worst S/PDIF connection. Many users also find that their DAC performs worse through its USB input than through the S/PDIF inputs of the same equipment. This is very often true, but it doesn’t depend on the USB standard itself; rather, it depends on the way audio is transferred on a USB bus. To better understand what happens, a little background knowledge is useful.

The USB standard allows for various data transfer modes between the host (that is, the computer) and the device (that is, the DAC or D/D interface). Some of them are not practically usable for audio, while others are.

The simplest mode is the interrupt mode, used by devices which need to send small amounts of data to the host from time to time. Think of a mouse, for example: this device only needs to send a few bytes every 50-100ms, with very long idle periods. So there’s no need for this device to reserve much bandwidth, nor to use a very high transfer speed. Despite the name, the device does not actually interrupt the host: the host polls the device at regular intervals, and the mouse answers with its bytes whenever it has something to send. That’s all. This mode is of course not suitable for audio, because of the long latency and the low effective bandwidth available.

Things get better with isochronous mode. This mode was designed for applications in which a steady stream of data must be sent from the host to the device. The USB controller allocates a suitable bandwidth to the device which receives the data, ensuring that the average data rate matches what the application needs.
The advantages of this transfer mode are numerous: guaranteed bandwidth, low latency and a CRC mechanism for error detection. Also, a standard for audio transfer in isochronous mode is available, called USB Audio 1.1, which relies on standard drivers included with all operating systems. This means plug’n’play capability for audio devices which comply with this standard.

Alas, its limits are also significant. First of all, even though isochronous mode works at both full speed and high speed, the device ICs specifically designed for audio streaming only work at full speed (12Mbps), allowing for no more than a 96kHz sampling rate. Moreover, the CRC error detection mechanism is hardly useful, as the device has no way to ask the host to re-transmit corrupted data. But the worst limit of isochronous mode is the short-term latency instability, which leads to jitter in the transferred signal. This may be greatly reduced by using adaptive transfer control, but the transfer speed limits remain.

An alternative to isochronous mode is bulk mode, in which the host may use almost all the available bandwidth to transfer data to the device. This means up to nearly 480Mbps on a USB 2.0 compliant port, which is more than enough for high resolution audio, far beyond the 192kHz limit. Bulk mode is difficult to use, as there’s no USB audio interface IC which supports it. The only way is to take a generic USB 2.0 device interface IC and program it to handle a streaming connection in bulk mode. This also requires specific drivers. The development effort is considerable and building a device based on bulk transfer is not easy, but the results are outstanding.

In bulk mode, the host exerts no timing control: it simply sends data when it wants or when the device requests them. This leaves the developer free to handle the transfer timing on the device side, rather than on the host side as happens with isochronous mode.
The device can be provided with very precise clocks and, being the master in the transfer, virtually isolates the output timing from the latency variability. An interface using bulk mode with a precise local clock may achieve extremely low jitter.

Another problem related to USB is that the device is often tied to the host’s ground potential. It also often uses the USB 5V supply, which is very poorly regulated and carries a lot of digital noise. One thing must be understood, anyway: the bad quality of the “USB sound” cannot be blamed on the poor quality of the USB 5V supply itself. Some say that the jitter on a USB transfer is due to the noise on the bus and that, for this reason, a USB audio device will never give high performance. That’s not true. We can transfer audio data on a USB bus in bulk mode and obtain very low jitter, regardless of the quality of the USB port supply. The ground noise is another story: decoupling the USB interface ground from the device ground (by means of magnetic or optical isolators) is good practice.

M2Tech devices all work in bulk mode and all have outputs which are galvanically isolated from the host ground. Bulk mode allows for sampling rates up to 192kHz (hiFace and hiFace Evo) or even 384kHz (Young).
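The 96kHz ceiling quoted above for full-speed isochronous devices can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration, not a description of any particular interface IC. It assumes uncompressed stereo PCM, one isochronous packet per 1ms USB frame, and the 1023-byte maximum isochronous packet size that the USB specification sets for full speed. The function names are ours.

```python
# Maximum payload of one full-speed isochronous packet (USB specification).
FULL_SPEED_MAX_ISO_PACKET = 1023  # bytes, one packet per 1 ms frame (assumed)


def bytes_per_frame(sample_rate_hz, bits_per_sample, channels=2):
    """Bytes of PCM audio that must be delivered in each 1 ms USB frame."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels / 1000


def fits_full_speed(sample_rate_hz, bits_per_sample, channels=2):
    """True if the stream fits in a single full-speed isochronous packet."""
    needed = bytes_per_frame(sample_rate_hz, bits_per_sample, channels)
    return needed <= FULL_SPEED_MAX_ISO_PACKET
```

Under these assumptions, 96kHz/24bit stereo needs 576 bytes per frame and fits, while 192kHz/24bit stereo would need 1152 bytes and does not.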
A player is a program which runs on a computer and plays back music files through an audio device. Several different file formats exist, and some players cannot play all of them. Players also differ in performance. Some players are free, others must be purchased. Thus, users should pay attention to various features when choosing a player. Following is a list of players which can be used under Windows, MacOS and Linux. The list is not exhaustive, nor is the order of listing related to the players’ quality in any way. We’ll appreciate feedback from users and developers to extend the list and correct any mistakes we may have made in the descriptions.
Windows Media Player
This is the standard player in Windows. It reads various formats and is very easy to use, but can only work in direct sound mode, which is not the best for quality. It comes for free with the operating system.
Foobar2000
It’s a free player for Windows. It can operate in direct sound mode and in kernel streaming mode; moreover, it is compatible with ASIO and WASAPI. It’s very easy to use. It supports 384kHz in kernel streaming mode. www.foobar2000.org.
Winamp
Another widespread free player. It can operate both in direct sound mode and, after installing a plug-in written by Steve Monks (http://www.stevemonks.com/ksplugin/), in kernel streaming mode. It supports 192kHz. It can be downloaded from www.winamp.com.
JRiver Media Center
This is a non-free player which is highly appreciated by Windows users. It has been developed for the best quality, but it also includes a very user-friendly interface for content management. It supports 384kHz. www.jriver.com.
XXHighEnd
This player has been explicitly developed in the Netherlands for the highest audio performance. www.xxhighend.nl.
MediaMonkey
Another free player which is actually built upon Winamp, with a different skin. They are so similar that the same Steve Monks plug-in for kernel streaming can be used with both players. www.mediamonkey.com.
VLC
This is a very basic open source player which is available for Windows, MacOS and Linux. www.videolan.org/vlc/.
iTunes
It is the built-in player for Mac. Its operation is restricted to Mac audio formats like AAC. Its interface is very user-friendly and many users choose it for its ease of use. A version for Windows also exists. www.apple.com/itunes.
Amarra
The most famous player for MacOS. It is available in three different versions; the full-feature package supports 384kHz. It requires an iLok key to operate. Its interesting features include automatic switching of the output sampling frequency. It may be purchased at www.sonicstudio.com/amarra/index.html.
Pure Music
This is a plug-in for iTunes which supports 384kHz. Less expensive than Amarra, it also features automatic sampling frequency switching. Go to www.channld.com/puremusic/ to purchase it.
Decibel
Another player for Mac with automatic sampling frequency switching. It may be used as a plug-in for iTunes, keeping the latter’s user interface while playing files with the former. It may be purchased at sbooth.org/Decibel/#db_extras.
Cog
It’s a free open source player for MacOS. It supports a long list of formats, so it’s very useful for Mac users who can’t rely on iTunes for decoding non-Apple formats. cogx.org.
Songbird
It’s a player/content manager similar to iTunes, available for both Windows and Mac, with higher sound quality and versatility. The player is free and the company also offers a very complete download platform for music. www.getsongbird.com.
Both Windows and MacOS allow for different playback modes for music files. We can broadly classify the various playback modes in two categories: “user friendly” modes and “high quality” modes. The former category covers modes conceived to simplify the user’s life. These modes rely heavily on the operating system’s kernel structures to virtualize the relationship between the source (that is, the player) and the recipient (the audio card) of the audio samples.
The latter relates to modes which try to keep the audio samples as pure as possible, in order to preserve the sound quality of the original file.
Let’s take a look at the various modes in both the Microsoft and Macintosh environments.
Windows: direct sound
It’s the basic playback mode for Windows. It relies on the kernel mixer, a piece of the OS kernel which acts as an interface between the various sound sources in a PC (PC sound, MIDI, players, etc.) and the sound card’s driver. As the various sources can produce sounds at different sampling frequencies, the kernel mixer is programmed to resample the various audio streams to one common sampling frequency, so that they can be mixed together before being sent to the audio card. Alas, this common sampling frequency is generally a low one, typically 44.1kHz. Moreover, the kernel mixer allows for various kinds of processing (dynamic compression, tone control, loudness control, spatializing…), not all of which are clearly visible to the user or can be disabled.
For this reason, direct sound mode should be avoided when searching for the best sound quality: in fact, it’s not bit-perfect.
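To see why resampling alone is enough to lose bit-perfection, consider the minimal sketch below. It is not the algorithm used by the Windows kernel mixer, whose internals are not described here: it uses plain linear interpolation, chosen only because it is short, and the function names are ours. The point is simply that once a stream has been converted from 48kHz to 44.1kHz, the samples reaching the audio card are no longer those stored in the file.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample integer PCM samples by linear interpolation (illustration only)."""
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source stream
        k = int(pos)
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(round(samples[k] * (1 - frac) + nxt * frac))
    return out


def is_bit_perfect(original, delivered):
    """True only if every delivered sample equals the original one."""
    return list(original) == list(delivered)
```

Feeding a 48kHz stream through `resample_linear(stream, 48000, 44100)` returns a shorter list with different values, so `is_bit_perfect` reports False. When source and destination rates are equal, the samples pass through unchanged.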
Windows: kernel streaming
Kernel streaming is the most “audiophile” playback mode available in Windows. When a kernel streaming compatible player deals with a kernel streaming compatible device (and driver), data go from the player directly to the driver, by means of a memory buffer which is written by the player and read by the driver. This way, the kernel mixer is totally bypassed and no processing is performed on the data, except that done by the player itself. Kernel streaming has another advantage: it requires very little CPU time, so it allows any PC, even the least powerful one, to handle high resolution files without hiccups. Kernel streaming is inherently bit-perfect.
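The memory buffer written by the player and read by the driver can be pictured as a ring buffer. The sketch below is a conceptual model only: it is not the Windows kernel streaming API, and the class and method names are hypothetical. It shows the one property that matters here, namely that samples come out exactly as they went in, with nothing in between to modify them.

```python
class RingBuffer:
    """Fixed-size FIFO shared by a writer (player) and a reader (driver)."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.read_pos = 0
        self.write_pos = 0
        self.count = 0          # samples currently stored

    def write(self, samples):
        """Player side: store as many samples as fit, return how many."""
        written = 0
        for s in samples:
            if self.count == self.capacity:
                break           # buffer full, the player must wait
            self.buf[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count += 1
            written += 1
        return written

    def read(self, n):
        """Driver side: fetch up to n samples, unmodified, in order."""
        out = []
        while n > 0 and self.count > 0:
            out.append(self.buf[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.capacity
            self.count -= 1
            n -= 1
        return out
```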
Windows: WASAPI
WASAPI (Windows Audio Session API) is an application programming interface developed for Windows Vista and Seven (it doesn’t exist for Windows XP) which allows for bypassing the kernel mixer without the need for a kernel streaming player or driver. The sound quality that can be obtained using WASAPI is comparable to that obtained in kernel streaming mode. One difference is that WASAPI operates in floating point format, so two conversions are needed (integer-to-floating from the player to WASAPI and floating-to-integer from WASAPI to the driver). This means that WASAPI loads the CPU a little more than plain kernel streaming. On the other hand, WASAPI allows for a wider choice of players. WASAPI is bit-perfect, provided the conversions are correctly done.
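The condition “provided the conversions are correctly done” can be made concrete. A single-precision float has a 24-bit significand, so any 24-bit integer sample scaled by a power of two survives the integer-to-float-to-integer round trip exactly. The sketch below checks this. It is an illustration of the arithmetic only, not of WASAPI itself: the function names are ours, and the choice of single precision as the intermediate format is an assumption.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def int_to_float(sample, bits):
    """Scale a signed integer sample to the range [-1.0, 1.0) as a float32."""
    return to_float32(sample / float(1 << (bits - 1)))


def float_to_int(value, bits):
    """Scale a float sample back to a signed integer of the given width."""
    return int(round(value * (1 << (bits - 1))))


def survives_round_trip(sample, bits):
    """True if integer -> float32 -> integer returns the original sample."""
    return float_to_int(int_to_float(sample, bits), bits) == sample
```

Every 16-bit or 24-bit sample passes this check. A full-scale 32-bit sample does not, because it needs more significant bits than a single-precision float can hold.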
Windows: ASIO and ASIO4ALL
ASIO stands for Audio Stream Input/Output. Like WASAPI, it is a protocol which allows for bypassing the kernel mixer. Even though it is available for all recent Windows releases (XP to Seven), it was mainly conceived for Windows XP. ASIO is particularly useful to overcome the kernel mixer limitations with USB Audio 1.1 devices. In fact, these devices can only operate up to 48kHz/16bit in direct sound mode, while they can generally reach 96kHz/24bit in ASIO mode. To use ASIO, the player needs to be compatible with this standard. ASIO is bit-perfect.
ASIO4ALL is a software component which provides bit-perfect playback with generic WDM devices, that is, audio cards provided with a WDM driver. It may be thought of as a universal ASIO driver for WDM devices. It provides ASIO operability to devices which are not ASIO-compatible.
MacOS: Core Audio
Conceptually, Core Audio is the Mac equivalent of the Windows kernel mixer, even if it’s actually closer to WASAPI in the way it works. The main difference is that it may be bit-perfect, whereas the kernel mixer generally changes the data somehow. The main feature of Core Audio is that it works in floating point format, so data format conversions are necessary on both the player and the driver side.
MacOS: “hog” mode
One of the main drawbacks of normal Core Audio operation is that it allows more than one client (player or sound generator) to access a device driver at the same time. This comes at the cost of sound quality. Moreover, in this way only one audio card at a time can be used, no matter how many players or sources are running. To avoid this, an exclusive driver access mode, called “hog” mode, has been developed. When a player accesses a driver in hog mode, the driver refuses service to other players or sound generators, thus devoting all its time to the first player which gained access to the device. This way, two or more devices can operate at the same time with different players, provided all of them (except one) are operating in hog mode.
Digital Audio Connections
Digital audio equipment is provided with digital inputs and/or outputs to transfer data, the way an amplifier is provided with analog inputs and/or outputs.
Digital connections are very different from analog ones and cannot be mixed. That is, an analog input cannot receive the signal from a digital output and vice versa. D/A and A/D converters are the means to connect, respectively, a digital source to an analog amplifier or an analog source to a digital amplifier.
A digital connection usually carries the information (samples) for two audio channels on one single wire. This is obtained by time-multiplexing the digital stream: first one sample for the left channel, then one sample for the right channel and so on. This explains why we only need one cable to carry two digital channels (there is an exception we’ll discuss below), while we need two cables to carry two analog channels.
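Time-multiplexing two channels onto one stream amounts to interleaving their samples. The few lines below sketch the idea with plain lists of integers. The function names are ours, and a real link of course adds framing and service bits around each sample.

```python
def interleave(left, right):
    """Merge two channels into one stream: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    stream = []
    for l_sample, r_sample in zip(left, right):
        stream.append(l_sample)
        stream.append(r_sample)
    return stream


def deinterleave(stream):
    """Split an interleaved stream back into (left, right)."""
    return stream[0::2], stream[1::2]
```

For example, `interleave([1, 2, 3], [10, 20, 30])` gives `[1, 10, 2, 20, 3, 30]`, and `deinterleave` recovers the two original channels.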
One thing must be understood. A digital connection is classified by the physical format of the data being transferred and by the logical format of the same data. In the ISO/OSI stack, the former is often indicated as the “physical layer”, while the latter is the “data link layer”. As an example from our daily life, the physical layer stands to our voice as the data link layer stands to the language we speak.
Unlike human beings, who use a single physical layer (the voice) and many data link layers (the languages), digital connections use several different (though often similar) physical layers and essentially only two data link layers (which we can describe as dialects of the same base language). The “language” is IEC60958; its “dialects” are the consumer mode and the professional mode, which differ in some details of the ancillary information.
The “voices” are S/PDIF, AES/EBU, Toslink, and ST.
Let’s talk about IEC60958 first, then about the various connection types on which IEC60958 streams travel.
The logical format
IEC60958 describes a time-multiplexed logical channel in which several audio channels are transferred one after the other, in time slots which fit inside one sampling period. That is, if we have two channels (as is generally the case for stereo audio), every channel has a slot which lasts one half of the sampling period. This way, in every sampling period we have one sample for the left channel and one sample for the right channel.
32 bits are transferred per channel. The time slot for one channel is divided into fields. 20 bits are for the audio data. 4 more bits can be used to extend the data field up to 24 bits. One thing must be clearly understood: it is not possible to transfer more than 24 audio bits per channel with this standard. D/A converters which claim to convert 32 bits while offering S/PDIF connections only are not true 32bit devices: they probably use 32 bits internally, for computation, but the real resolution is limited to that of the incoming data, that is 24 bits.
What about the remaining 8 bits per channel? 4 bits are the frame start markers, called “preambles”. Preambles are often used by S/PDIF receivers to synchronize their PLL and recover the clocks from the S/PDIF stream. The remaining 4 are “service” bits: among them are a parity bit to detect transmission errors, a user bit which can be used to carry auxiliary information (for example, a narrowband voice channel for communications) and the so-called channel status bit. Channel status bits are collected over successive frames into blocks of 24 bytes. The content of these 24 bytes gives information about the audio samples being transferred: for example, the resolution (that is, the number of meaningful bits in the audio and aux fields), the sampling frequency, the source type (CD, recorder, radio, mixer, etc.) and so on. It is important to know that the difference between the consumer mode and the professional mode only relates to the meaning of the channel status bytes, not to the physical format chosen for a given transmission.
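The 32-bit slot described above can be sketched as a packed integer. The layout used here follows our reading of IEC60958 and should be taken as an assumption: bits 0-3 stand in for the preamble (which on the wire is a special pattern, not ordinary data bits, so it is left at zero), bits 4-27 hold the 24-bit sample with the 4 aux bits as its least significant bits, and bits 28-31 are the validity, user, channel status and parity bits. The validity bit is the fourth service bit, which the text does not name. Parity is even over bits 4-31. The function names are ours.

```python
def build_subframe(sample, validity=0, user=0, status=0):
    """Pack one 24-bit sample and its service bits into a 32-bit word."""
    word = (sample & 0xFFFFFF) << 4      # aux + audio fields, bits 4..27
    word |= (validity & 1) << 28
    word |= (user & 1) << 29
    word |= (status & 1) << 30
    ones = bin(word).count("1")          # preamble placeholder bits are zero
    word |= (ones & 1) << 31             # make the count of ones even
    return word


def extract_sample(word):
    """Recover the signed 24-bit sample from a subframe word."""
    raw = (word >> 4) & 0xFFFFFF
    return raw - (1 << 24) if raw & 0x800000 else raw


def parity_ok(word):
    """True if bits 4..31 contain an even number of ones."""
    return bin(word >> 4).count("1") % 2 == 0
```

Whatever the sample value, only 24 bits of it fit in the word, which is the reason why a 32-bit sample cannot travel over this link.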
The physical formats
Now, let’s talk about the various physical standards. The most used in consumer audio is probably S/PDIF (Sony/Philips Digital Interface Format). The data bits are used to modulate the bit clock using biphase-mark modulation. The sampling frequency is used to locate the preambles in the stream. The signal has a voltage of 500mVpp on a loaded line and the line impedance is 75 Ohms. Please note that when measuring the open output voltage, 1Vpp is generally measured, as is typical for matched transmission lines. Two connector types are used: RCA and BNC, both on coaxial cable. One thing must be noted: apart from one WBT model (NextGen), no 75 Ohm RCA connectors exist, so it’s not possible to obtain a true 75 Ohm matched connection with RCAs. BNCs, on the contrary, come with certified 75 Ohm impedance, so a true matched connection is possible using them. S/PDIF generally works up to 192kHz. The allowed distance is up to 5m.
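Biphase-mark modulation is easy to sketch. Each data bit occupies two half-bit cells: the line level always toggles at the start of a bit, and toggles again in the middle of the bit only when the bit is a 1. This guarantees frequent transitions from which the receiver can recover the clock, and makes the code insensitive to polarity. The functions below model line levels as 0 and 1. They are an illustration with names of our choosing and ignore the preambles, which deliberately violate this rule.

```python
def biphase_mark_encode(bits, level=0):
    """Encode data bits into half-bit line levels (two cells per bit)."""
    cells = []
    for bit in bits:
        level ^= 1              # transition at every bit boundary
        cells.append(level)
        if bit:
            level ^= 1          # extra mid-bit transition encodes a 1
        cells.append(level)
    return cells


def biphase_mark_decode(cells):
    """Decode half-bit cells: a bit is 1 when its two halves differ."""
    return [1 if cells[i] != cells[i + 1] else 0
            for i in range(0, len(cells), 2)]
```

For instance, the bits 1, 0, 1, 1 become the cells 1 0, 1 1, 0 1, 0 1 when the line starts low, and decoding returns the original bits whatever the starting level.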
S/PDIF actually descends from a professional connection standard called AES/EBU. AES/EBU requires a 110 Ohm balanced line with XLR connectors and at least 2.0Vpp on a loaded line. As said before, the voltage measured on an open line will be twice the standard value. Many think that, since AES/EBU is mainly used in professional setups (studios, live stages), it always transports professional mode streams. That’s not a rule: many pieces of consumer equipment have AES/EBU ports which source or receive consumer mode data streams.
Another flavour of AES/EBU uses a 75 Ohm BNC connector instead of a 110 Ohm XLR. As said before, in the digital domain two channels are carried on a single wire. Nevertheless, AES/EBU also gives the opportunity to carry one single channel on one wire, thus using two wires to carry a stereo stream. This is generally referred to as dual AES (or dual-wire AES). This choice halves the data throughput on each wire, thus using, for example, two 96kHz connections in place of a 192kHz one. AES/EBU, like S/PDIF, generally works up to 192kHz. Thanks to the balanced line and the high output level, an AES/EBU output port can support long cable runs (several meters).
Toslink (Toshiba Link) is an optical interface based on optical fiber. Its main advantage is that it provides galvanic isolation between source and destination. Its operation is generally limited to 96kHz, even if there are a few units around with high-performance Toslink interfaces able to operate up to 192kHz. The main drawback of Toslink is its high inherent jitter compared to the best S/PDIF connections. For this reason, most users try not to use it, unless the source or the receiver only has Toslink ports. The standard allows for 5m fiber length.
ST, also known as AT&T, is a very high performance optical connection mainly used for telecommunications. Some high-end equipment has ST ports, but there is no real standard for this connection (the signal levels and bandwidth depend on the driver and receiver circuits). The good thing is that the bandwidth largely exceeds what 192kHz requires, and distances up to 1600m can be driven. Thanks to the use of a high quality optical fiber and very high performance transmitters and receivers, the jitter on an ST connection may be very low, better than that of an S/PDIF connection.
HDMI is being used to transport audio data even on audio-only devices. The performance of an HDMI connection is very high and exceeds the needs of a two-channel link.
I2S is an interface standard which was conceived to connect integrated circuits on a PCB inside a piece of equipment. Because data and clocks are transferred on separate wires, the intrinsic jitter of an I2S connection is very low. That’s why some manufacturers and many DIY’ers are using it to connect a digital source to a DAC. Of course, we’re talking here about a logical format (a “language”), so designers are free to choose the physical format they consider most suitable to carry it. Some use LVDS, others unbalanced lines… It must be understood that no real physical standard exists for I2S.
Which is the best?
Hard to tell, because the quality of a connection depends on the transmitter and receiver quality, and also on the cable being used.
For sure, we can say that Toslink is the least suited to high-end performance, because of its frequency limits and high jitter.
AES/EBU is good when long cable runs need to be driven, but the physical characteristics of the balanced cable (narrow band, large group delay) make it less than perfect for a low-jitter, wideband transmission.
S/PDIF, with its coaxial cable, particularly with BNC connector, is THE choice for the short distances typical of consumer hi-fi.
ST is the best for sure, as long as both source and destination have ST ports with the same specifications.
HDMI and I2S are alternatives to each other and should be chosen wherever available. It must be noted, however, that a straight comparison with the other standards is not fair, as HDMI and I2S use separate wires for data and clocks, whereas the others carry data and clocks together on one single wire.
Drivers are low level programs which put a hardware device in communication with the software which uses it through the operating system.
Drivers generally rely heavily on operating system kernel structures which are conceived to simplify the programmer’s job and to ensure that the standards’ requirements are met. This has two consequences: the load on the CPU is sometimes heavy, so the demand on the hardware’s computational power is significant, and the programmer is not free to develop the driver in a way that maximizes a particular aspect of performance. On the other hand, devices using standard drivers are often plug and play: users don’t need to install any specific driver, as the operating system of their computer already includes the standard one.
The case of USB audio is exemplary. The standard USB Audio drivers rely on the kernel mixer in Windows and on Core Audio in MacOS. Both are pieces of the kernel which manage the various audio streams from different sources: the MIDI synthesizer, players, external inputs. To harmonize the various streams, sampling rate conversions and resolution truncations are often performed, to the detriment of sound quality.
Moreover, compliance with a given standard keeps the programmer from overcoming the standard’s limits (in terms of sampling frequency, resolution and preservation of the data contents), so the sound coming out of a device using standard drivers is often below the user’s expectations. On the plus side, users needn’t bother with driver installation, as the USB Audio 1.1 standard drivers are already included in the OS package and installed with the operating system itself.
M2Tech has developed devices and proprietary drivers which purposely do not comply with any standard, thus giving the programmer total freedom. The result is a driver which operates with minimal CPU load, ensuring continuous playback even on not-so-powerful computers and giving higher audio performance than standard drivers (up to 384kHz/32bit with the Young).
In Italy we say: if nice you want to appear, a little you must suffer. In this case, high sound quality comes at the cost of the need for custom driver installation. This is not uncommon: many printers, video cards and other devices also require installation.
Installation is not difficult for any user who is acquainted with his/her computer. The procedure is similar for Windows-based computers and Macintosh ones. First, the driver package must be downloaded from the M2Tech website. The drivers can be found on the page of the product being used, as well as in the Download/All section. The package is contained in a compressed (zip or dmg) file. The first operation is to extract the files from the zip or from the dmg (this is slightly different depending on the operating system). Windows users should pay attention: double-clicking on the zip folder doesn’t extract the files; the window which opens is just a peek at the zip file contents. You may launch the setup program from it, but it won’t work, as the files it looks for are still compressed. The files need to be extracted to a temporary folder first. Then the setup program can be launched.
After launching the setup program (two different setup programs are provided for the 32 and 64 bit versions of Windows), the procedure is quite simple and the installation takes a couple of minutes. Once installation is complete, the device can be connected to the computer for the first configuration.
Some things must be kept in mind. First, every operating system has its own driver: it’s not possible to use the Windows 7 32 bit driver on a PC running Windows XP, nor is it possible to install a 10.6 driver on a Mac running Tiger or Leopard.
Second, devices made to use proprietary drivers cannot operate without a driver, nor can they work with a standard driver: during USB negotiation, the device tells the computer which driver to use. If that specific driver is not found, the computer “gives up” the negotiation and notifies the user of an unsuccessful USB installation.
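The negotiation just described can be pictured with a toy model. When a USB device is plugged in, it reports descriptors which include a class code and vendor/product identifiers. The class code 0x01 denotes the standard audio class and 0xFF denotes a vendor-specific device. The sketch below is a simplification written for illustration only: real operating systems use a more elaborate matching procedure, the function name is ours, the identifiers 0x1234 and 0x5678 are made up, and it assumes that a device using a proprietary driver reports itself as vendor-specific.

```python
USB_CLASS_AUDIO = 0x01             # standard USB audio class code
USB_CLASS_VENDOR_SPECIFIC = 0xFF   # device needs a vendor-supplied driver


def select_driver(interface_class, vendor_id, product_id, installed_drivers):
    """Return the driver to load, or None if installation must fail.

    installed_drivers maps (vendor_id, product_id) to a driver name.
    """
    if interface_class == USB_CLASS_AUDIO:
        return "standard USB audio driver"
    # Vendor-specific device: only a matching proprietary driver will do.
    return installed_drivers.get((vendor_id, product_id))


# Hypothetical identifiers, for illustration only.
installed = {(0x1234, 0x5678): "proprietary driver"}
```

With the driver installed, `select_driver(0xFF, 0x1234, 0x5678, installed)` finds it. With an empty table the function returns None, which corresponds to the unsuccessful installation message.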
Considering the computational power of today’s computers and the availability of solutions like WASAPI and ASIO which allow for bypassing the kernel mixer, why bother with a proprietary driver?
It’s a matter of performance. Our proprietary driver can operate in kernel streaming mode under Windows, a mode in which the kernel mixer is natively bypassed and the CPU load is kept very low. Think of a small netbook or a MacMini playing 384kHz files to our Young DAC… Even more powerful computers cannot operate at 384kHz with standard drivers (so far, at least…).
Another reason is that standard drivers are outside our programmers’ control. With proprietary drivers, we can continuously improve their performance.
M2Tech drivers for hiFace and hiFace Evo are presently available for the following operating systems: Windows XP, Windows Vista 32 & 64 bits, Windows 7 32 & 64 bits, Tiger, Leopard, Snow Leopard.
M2Tech drivers for Young are presently available for the following operating systems: Windows XP, Windows Vista 32 & 64 bits, Windows 7 32 & 64 bits, Snow Leopard and GNU/Linux.