The goal, in a nutshell, is to be able to run a couple of open live microphones and instrument inputs, all through plug-ins, do a full mix that can be recorded while performing, and also send that stream to Skype. What this gives me is the ability to call into a live streaming radio show, processing my voice and instruments as I see fit for the live audio, while also getting a high-quality multi-track recording that I can play with later for post-production. Beyond Skype, I'd also like to use a similar setup for deejaying a live show using a controller -- for Traktor, maybe, or Itch, although I have not worked that all out yet. I could stream the results to UStream or some other streaming service -- or up to Skype, although Skype's half-duplex nature and lag make it impossible to use for live jams.
Now here's the basic difficulty I have with a lot of audio applications on Mac OS X. CoreAudio is fantastic, and gives me an incredible amount of control -- but that control stops at the boundary of the individual applications I want to route together. There is no "standard" way to choose audio routing between applications, and no standard Apple-provided way to patch their audio together. My Ensemble has a lot of outputs and inputs, but Skype doesn't. The settings for choosing Skype's audio input look like this:
So, I've got to send my processed live audio back into my audio interface's channel 1 input. See this highly sophisticated computer rendering which explains the routing:
This is how I record the Bloodthirsty Vegetarians podcast with Rich Wielgosz. I'm using a condenser microphone plugged into one of the microphone inputs of the Ensemble, with phantom power on. It can't be just any of the inputs though -- it can't be the first input, since I'm going to use that to send the mixed audio to Skype. That microphone is going into a channel strip in Logic Pro. On that channel strip I've got an instance of Izotope Alloy set up using a preset that gives me an "NPR" sound -- a smile curve and some mild compression and gating. What Logic is recording, though, is the raw audio from the microphone. After we're done with the Skype call, I take that raw mic audio file and run it through Izotope RX to remove some faint computer fan or heating system noise, turn it into a FLAC, and upload it to a place where Rich can get it. He then puts it together with _his_ mic audio, lines them up, edits them while keeping both channels in sync, applies whatever compression and gating effects he thinks best, and of course drops in the intro, outro, any sound effects, and the songs we're talking about.
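Incidentally, the "turn it into a FLAC" step is nothing more than a lossless re-encode so the upload is smaller. Here's a sketch of just that step -- assuming the cleaned-up take has already been exported from RX as a WAV, and that the third-party soundfile package (libsndfile bindings) is available; the filenames are made up:

```python
# Sketch: re-encode a cleaned-up WAV as FLAC for upload (lossless, but smaller).
# Requires the third-party "soundfile" package; filenames are illustrative only.
import soundfile as sf

data, sample_rate = sf.read("mic_raw_denoised.wav")
sf.write("mic_raw_denoised.flac", data, sample_rate, subtype="PCM_24")  # keep 24-bit depth; adjust to match the source
```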
Because of the hardware, I had to order a special short cable to do this: a TRS 1/4" (for the channel 8 analog output) to XLR male. It will actually work to use a short 1/4" TS-to-TS or TRS-to-TRS cable and run it into the "effects return" input for channel 1, which is designed to be used as a "return" from an external compressor, but this is not ideal (it loses the balanced connection and does something to the voltage that might change the level; I have to admit I'm not quite nerdy enough at the hardware level to explain exactly what, in terms of dBu or volts). If you can, use the right cable.
Make sure you turn on low-latency mode with a setup like this, so you don't hear noticeable delay in your headphones, and of course if you have monitor speakers on outputs 1-2 you'll want them turned down so you don't get feedback. I've also found that I need to turn the headphone level down to well below my normal listening level in order to avoid headphone bleed from the Skype call. This might normally not be noticeable -- if I'm recording a vocal track to use for a song, while listening to the backing, any bleed will likely be in sync and covered up in the final mix. But when using Skype, there are some delays involved, and if you're not talking over a music bed, your co-host might be audible. If you both have headphone bleed, it will be out of sync due to the Skype latencies -- trust me, it's a mess, so keep your headphone levels as low as you can bear.
So that works, although to me it seems frustratingly inelegant -- these are pretty good analog-to-digital and digital-to-analog converters, but it doesn't seem sensible that I have to give up my channel 1 input and do an extra set of conversions, even if the extra audio is only feeding a fairly low-quality Skype connection. It's also all a little touchy -- there are lots of settings, and sometimes I will have changed them, or they will have changed out from under me; every once in a while, my audio interface goes crazy. So before I record a show -- every single time -- I always have to do a Skype test call, to verify that the audio stream going up to Skype sounds OK. Sometimes it sounds perfectly fine on my end, but the audio input Skype is hearing is full of glitches. In that case what usually works is to just power-cycle my Apogee Ensemble.
There does seem to be a better way, or at least a different way. I've tried solutions that add an extra software-only audio bus: Soundflower, or Jack. But Soundflower is broken on Mac OS X, and has been for a long time; it seems to have a design issue that can't be fixed with a simple patch. I was able to get Jack working, but it is complicated and fiddly, I don't like the GUI, and it uses CPU. I could add an extra audio interface to my Mac Pro -- a separate USB-based interface, say -- but my Ensemble cost a lot of money, and I'm not really inclined to buy more hardware that likely sounds worse and still involves those extra D/A and A/D conversion steps. But there is another option. My Mac Pro has built-in digital audio, in the form of TOSLINK S/PDIF I/O. I've long been curious as to whether I could use this for this sort of thing and avoid an extra D/A and A/D step. So this past weekend I bought a TOSLINK cable. And it turns out I can make it work -- but with, you guessed it, a few minor issues. Like so:
The cable I got is nothing special -- not a glass cable, or a super-expensive one. Just make sure it has standard TOSLINK optical connectors at both ends, and not the Mini TOSLINK. The Ensemble's optical I/O uses the standard-sized connectors. If you're going into a laptop or a Mac Mini instead of a Mac Pro, you'll want a cable with a standard TOSLINK connector at one end and the Mini connector at the other. There are adapters available, but it is probably best just to get the right cable. Note that these cables only transmit data in one direction -- from an output to an input. If you want to get audio out of the Mac and back into the Ensemble this way -- say, to record system beeps or something via your Ensemble -- you'd need a second cable. I did not bother with that, but I'm sure there are some interesting possibilities there.
The hard part was figuring out where to find all the relevant settings. Here are the settings that worked for me. First, the Maestro control panel. Note that the I/O Allocation must be 10x10 or above; with 8x8, the digital I/O isn't supported at all. The format for the optical out must be set to SPDIF (I think it defaults to ADAT, which won't work).
This is what my Maestro mixer settings look like: first, my mixer is turned off (you can have it on if you want to do some kind of live monitoring of your direct inputs).
The input routing matrix is set up as follows: note that I'm not using any digital inputs on the Ensemble:
The outputs look like this. Take particular note that the S/PDIF outs are assigned to the first two optical outs, not the ones labeled SPDIF Out 1/2 -- even though Maestro labels them "SPDIF Coax L/R" in the column of outputs on the left. I had to mess with this a bit to get it right -- Maestro is slightly buggy when editing these -- and if it isn't right, there's no error or other clue; it simply won't work.
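When it isn't working, the quickest sanity check is whether anything is arriving at the Mac's optical input at all. Here's a rough console meter for that -- strictly a sketch, assuming the third-party python-sounddevice and numpy packages, and assuming the optical input shows up as a device whose name contains "Built-in Input" (adjust to whatever your digital input is actually called). The meter in the Sound panel, described below, does the same job with no code.

```python
# Sketch: crude peak meter for whatever arrives on the Mac's built-in (optical) input.
# Requires the third-party "sounddevice" and "numpy" packages; the device name is a
# guess -- run sounddevice.query_devices() and adjust it for your machine.
import numpy as np
import sounddevice as sd

def meter(indata, frames, time, status):
    if status:
        print(status)
    peak = float(np.max(np.abs(indata)))           # 0.0 (silence) to 1.0 (full scale)
    print("\r" + ("#" * int(peak * 50)).ljust(50), end="", flush=True)

with sd.InputStream(device="Built-in Input", channels=2, callback=meter):
    print("Listening on the digital input; Ctrl-C to stop.")
    sd.sleep(60000)  # keep the stream open for a minute
```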
When troubleshooting this, note that the little meter in the Sound panel under System Preferences comes in very handy -- it will show you if anything is coming in on the optical S/PDIF. You can leave this open and make some noise and tweak settings:
Here are my Audio MIDI Setup app settings:
And a live setup in Logic:
That "Solo Safe" problem has showed up, which has only just started appearing in my projects and is driving me batty, but never mind that, let's look at the routing. It's similar to the setup above for analog, except that I'm using a second microphone, and routing out to 9/10. I have the output level down a bit but that was just me experimenting. Note it is panned hard left, since Skype only listens to the first channel on its selected input anyway, although this is not strictly needed and changes nothing.
This setup lets me do other interesting things too -- for example, I can add my iSight camera and record a video where the audio comes through the Ensemble with all of Logic's processing. This adds a bit of latency, though, so there may be a visible lip-sync problem.
Things that don't work, or don't work reliably:
Note that since I'm doing compression in Logic, I have "Automatically Adjust Microphone Settings" turned off. Apparently Skype has a tendency to turn it back on, though: check out this blog entry on the subject. This is only an annoyance -- it affects the quality of the Skype call itself, but not what you're recording.
Editing the routings in Apogee's Maestro app is somewhat flaky. You can easily get it into a state where it displays incorrect or confusing things. If it does this, try resetting the routing and starting over. It also always seems to display "SPDIF Coax L/R" when it is actually routing to the optical outputs.
It seems to me that I ought to be able to set up Logic to route out to "channel 9" only -- just the left side of the digital S/PDIF TOSLINK stereo pair. However, if I choose a mono output from my Logic channel strip, it doesn't work at all; it does seem to work if I give the channel strip a stereo 9/10 output and pan it hard left. Somehow, telling it to send out on channel 9 alone seems to immediately put the S/PDIF output into an incompatible format, and the digital input on my Mac starts ignoring it.
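To poke at this outside of Logic, here's the kind of test I mean -- purely a sketch, assuming the third-party python-sounddevice and numpy packages, and assuming the Ensemble's optical outs are its channels 9/10 and that its CoreAudio name contains "Ensemble." It plays a stereo buffer with the tone only in the left channel, which is what "panned hard left" produces, rather than a one-channel buffer aimed at channel 9 alone:

```python
# Sketch: play a test tone to outputs 9/10 as a stereo pair, with the signal
# only on the left channel -- i.e. what "panned hard left" produces in Logic.
# Assumes the third-party "sounddevice" and "numpy" packages; the device name
# and channel numbers are guesses to adjust for your own setup.
import numpy as np
import sounddevice as sd

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate         # one second of samples
tone = 0.2 * np.sin(2 * np.pi * 440 * t)         # quiet 440 Hz test tone

hard_left = np.column_stack([tone, np.zeros_like(tone)])   # left = tone, right = silence

sd.play(hard_left, samplerate=sample_rate, device="Ensemble", mapping=[9, 10])
sd.wait()
```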
Supposedly it is possible to decouple the clock rate and format of the S/PDIF connection from the sample rate of my Logic project. I'd like, for example, to record at 24-bit/96 kHz while sending audio to Skype at 16-bit/44.1 kHz. There are options to do sample-rate conversion (SRC select and SRC rate), and these are described (inadequately) in the Ensemble manual. So far I have not been able to get this to work, even going from 48 kHz to 44.1 kHz. It seems like the digital audio input is always clocked at the rate of my Logic project, and the two never "decouple." Maybe this decoupling only works in "CD Mode," and maybe "CD Mode" only actually works on the coax digital output. I do seem to be able to set the sample size to 16-bit on the digital input side while leaving it at 24-bit on the Logic side, but I don't think that buys me anything useful.
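For what it's worth, the conversion I'm after is straightforward to do in software. Here's a sketch of a 96 kHz to 44.1 kHz resample using scipy -- my own illustration, not anything the Ensemble or CoreAudio exposes directly:

```python
# Sketch: resample 96 kHz audio down to 44.1 kHz. The ratio 44100/96000 reduces
# to 147/320, so a polyphase resampler can do it exactly.
# Requires the third-party "numpy" and "scipy" packages.
import numpy as np
from scipy.signal import resample_poly

audio_96k = np.random.randn(96000 * 2)            # stand-in for two seconds of 96 kHz audio
audio_44k1 = resample_poly(audio_96k, up=147, down=320)
print(len(audio_96k), "->", len(audio_44k1))      # 192000 -> 88200 samples
```

Presumably that's roughly what the SRC options do in hardware, and what CoreAudio does behind the scenes when a device rate and a project rate don't match.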
I have not really experimented all that much -- maybe I can just leave my project at 24/96 and the audio that Skype gets will automatically be down-sampled by CoreAudio. The extra processing ought to be pretty insignificant on a Mac Pro, and since Skype audio quality is not fantastic anyway, dither should be pretty much irrelevant. The FireWire bandwidth could be an issue, though; at my current FireWire utilization it seems to work to drop an iSight camera onto the machine, but 10x10 at 96 kHz might use enough that the camera won't work, or the connection to the Ensemble will flake out (and you won't get anything as friendly as an error message).
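Some back-of-the-envelope arithmetic suggests raw payload isn't the whole story -- these numbers are my own rough assumptions, not measurements:

```python
# Sketch: rough FireWire payload math for 10 channels in and 10 out at 96 kHz,
# assuming 24-bit samples carried in 32-bit slots on the wire (an assumption).
channels_each_way = 10
sample_rate = 96000
bits_per_slot = 32

audio_mbps = 2 * channels_each_way * sample_rate * bits_per_slot / 1e6
print("raw audio payload: ~%.0f Mbit/s of a nominal 400 Mbit/s bus" % audio_mbps)  # ~61 Mbit/s
```

On paper that leaves plenty of headroom even before the camera, so whatever breaks is presumably down to how the devices reserve isochronous bandwidth and protocol overhead rather than the simple bit count.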
Supposedly it is possible to send the system audio output and "beeps" to the speakers and tell Skype to use the Ensemble as its output. That doesn't actually work for me, though: I can tell Skype to use the Ensemble as its "speakers," but it always sends incoming audio to the selected system audio output no matter what the menu shows. Also, when I change these settings in Skype, it seems that I always need to quit and re-launch it before they take effect. (Skype's user interface is something I would charitably describe as "artistic," but this seems like a real bug. My experience attempting to report Skype bugs has caused me to swear off ever reporting another one.)
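Underneath those menus, Skype is just choosing from CoreAudio's device list. For what it's worth, here's a little sketch that dumps that list -- not part of my setup, and it assumes the third-party python-sounddevice package (a PortAudio wrapper) is installed -- which at least makes it easy to see how the Ensemble and the Mac's built-in optical I/O appear to applications:

```python
# Sketch: print every CoreAudio device with its input/output channel counts.
# Requires the third-party "sounddevice" package (pip install sounddevice).
import sounddevice as sd

for index, device in enumerate(sd.query_devices()):
    directions = []
    if device["max_input_channels"] > 0:
        directions.append("%d in" % device["max_input_channels"])
    if device["max_output_channels"] > 0:
        directions.append("%d out" % device["max_output_channels"])
    print("%d: %s (%s)" % (index, device["name"], ", ".join(directions)))
```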
If you figure out any of these issues, or you get a similar setup working for yourself (or fail to), leave a comment. Comments are moderated. Thanks!
Update 03 May 2012
I am continuing to use this setup, and it is working very reliably for my podcast recording. I made a change today -- I added a second TOSLINK cable going the "opposite" way. There is no difference between the ends or the direction of the cable itself, but this one goes from the output of the Mac Pro (the glowing one) to the input of the Ensemble (the non-glowing one). Then I had to make a change in the Maestro setup, adding an input, very much like the way I set up the output (like the output, it says "Coax" even though the actual configuration uses the optical ports). Then it was just a simple matter to set the system sound output to the optical port and add a channel in Logic -- one which I do not want to route to the output going to Skype.

This setup gives me the ability to send and receive audio via Skype, recording the Skype audio as a separate track if I want to, without using any analog I/O on the Ensemble -- except, of course, the stereo output on channels 1 and 2, which feeds the headphone output. With this setup I can also record at a higher sample rate: I can set the project to 24-bit/96 kHz and use Audio MIDI Setup to set the Mac's optical output to match. This allows me to record my live sources at 96K for later processing, while CoreAudio seems to know how to do the right sample-rate conversions behind the scenes. Very cool! Now I need to get a couple of different optical cables and try something similar with my Mac Mini.