Voice/Data Comm 101
 
 
 
Voice over Frame Relay Tutorial

by Ray Horak
CommWeb.com
05/02/01

In order to understand Voice over Frame Relay Fundamentals, it would be a really good idea for you to understand Frame Relay itself. So, the recommended prerequisite for this lesson is the last lesson, Frame Relay Basics. Actually, it wouldn't hurt to read it again before we get started with VoFR.

VoFR is a pretty wacky idea, or so it seemed only a couple of years ago. (Actually, it's still a pretty wacky idea.) As we discussed in Frame Relay Basics, Frame Relay is a protocol for access to a packet network. Frame Relay was designed for LAN-to-LAN internetworking across the WAN.

LAN traffic, by definition, has no expectations of QoS (Quality of Service), and Frame Relay offers none. Neither are there any mechanisms for error control or strong congestion control.

Sure, you can oversize the CIRs across the PVCs to improve the likelihood of smooth performance. Sure, there are some protocols like MPLS (MultiProtocol Label Switching) that can be run inside the network to improve performance. Sure, there is continuing work by the standards bodies on the QoS/GoS issues. Sure, some carriers offer multiple GoS (Grade of Service) levels -- at additional cost. But you can never get away from the fact that Frame Relay is a highly-shared packet data network service designed for traffic associated with applications that are reasonably tolerant of latency and loss.

Frame Relay is a best effort network service. Period. That's OK -- Frame Relay was designed as a tradeoff between cost and performance. That tradeoff is the essence of optimization, and Frame Relay was optimized for LAN-to-LAN internetworking. (Forgive me, please, for repeating myself, but the point is worth repeating.) So far, we've said nothing that would suggest that Frame Relay is even remotely appropriate for voice. Before we discuss VoFR, let's explore the basics of conventional voice networking.

Voice The Conventional Way

We won't get too basic here, but voice is analog in its native form. That's a good thing, since we humans are inherently analog creatures. Around the end of WWII (World War II, for you youngsters), the networks began a transition from analog to digital technology.

Digital offers a lot of advantages, including greater bandwidth, better error performance and enhanced management and control. Contemporary switches of all types are digital in nature, and so is a lot of terminal equipment. Most transmission facilities also are digital, with the notable exception of copper local loops serving residential and small business applications. That makes the WAN virtually 100% digital, at least from edge-to-edge.

Therefore, analog voice has to be coded (i.e., converted) into a digital format at some point after leaving your lips and prior to entering the WAN, and decoded (i.e., reconverted) back into an analog format as or after it exits the WAN, and certainly before it reaches your ear. Those conversion processes are accomplished by a matching pair of codecs, with the conventional method being PCM (Pulse Code Modulation), standardized by the ITU-T as G.711.

PCM specifies that the amplitude (i.e., volume) of the analog signal be sampled 8,000 times a second, at precise intervals of 125µs (microseconds, or millionths of a second), which is exactly 1/8000th of a second. Each sample is coded into an eight-bit digital value, which yields 64 Kbps per voice channel. The resulting eight-bit bytes are interleaved by multiplexers, and sent across channelized digital circuits (e.g., T-carrier) to be directed and redirected by circuit switches, sent across circuits (e.g., SONET) that interconnect the switches, and ultimately decoded on the receiving end of the transmission. The decoded signal, now in analog form, is only an approximation of the original analog signal, but it's thoroughly understandable to the human ear.
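If you want to check the PCM arithmetic for yourself, here is a minimal sketch in Python. It uses only the two figures given above (8,000 samples per second, eight bits per sample); the function names are mine, made up for illustration.

```python
# PCM arithmetic, using only the figures stated in the text.
SAMPLES_PER_SECOND = 8_000
BITS_PER_SAMPLE = 8


def pcm_bit_rate(samples_per_second: int = SAMPLES_PER_SECOND,
                 bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Bits per second for one uncompressed voice channel."""
    return samples_per_second * bits_per_sample


def sample_interval_us(samples_per_second: int = SAMPLES_PER_SECOND) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / samples_per_second


print(pcm_bit_rate())        # 64000 bits per second, i.e., 64 Kbps
print(sample_interval_us())  # 125.0 microseconds
```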

It's not quite that simple, of course. Timing is critical. The network must be in a position to accept, switch, transport, and deliver every voice byte precisely every 125µs (microseconds). That means that latency (i.e., delay) must be minimal and jitter (i.e., variability in delay) must be virtually zero. That translates into a network based on circuit-switching and channelized T-carrier and/or SONET.

Taken together, this approach ensures that, once the call is set up, the associated bandwidth is committed for the entire duration of a circuit-switched call, absolutely and without question.

Voice The Frame Relay Way

Now that your memory is refreshed about both traditional voice and Frame Relay, it should be clear that the two have very little in common. Voice speaks to circuit-switches, not frame switches and routers. Voice speaks to channelized, not unchannelized, circuits. Voice speaks to sound bytes and silence bytes created, accepted, multiplexed, switched, transported and delivered at a regular and frequent pace (every 125µs, or microseconds); not to frames that are created whenever there's some data around, and that move through the network whenever it's available.

Voice speaks to committed bandwidth from call setup to call teardown, not to bandwidth that's available whenever it happens to be available. Voice expects perfection, and most data wouldn't even appreciate it. Packet data thrives over a voice network, but voice trembles at the thought of traveling over a packet data network. VoFR? It just won't work!

Actually, it will work, and quite nicely -- if everything works just right. The trick is to keep latency and jitter within limits. While that can be tough in a network that is built around applications that can tolerate both, it's possible to pull a trick or two within the network.

One approach is to increase the CIR (Committed Information Rate) over each PVC (Permanent Virtual Circuit) that will carry voice traffic, and to not mark the voice traffic as DE (Discard Eligible). Another approach is to set up special PVCs just for voice. Yet another approach is to work with a carrier that offers PVCs with different levels of delay/priority. Some carriers offer as many as three PVC levels:

1. Top priority for delay-sensitive traffic (e.g., voice and SDLC).

2. No priority for traffic that can tolerate some level of delay (e.g., LAN traffic).

3. Low priority for applications that can tolerate significant levels of delay (e.g., Internet access and e-mail).

This approach makes use of some priority queuing mechanism or route selection mechanism such as MPLS. Any of these approaches can work, but it's all on a best effort basis. There are no guarantees in the Frame Relay world.
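To make the priority idea concrete, here is a minimal sketch of strict priority queuing in Python. It is an illustration only, not how any particular carrier or switch does it: the class name, the traffic labels and the mapping of traffic types to the three levels above are all my assumptions.

```python
import heapq
from itertools import count

# Hypothetical mapping of traffic types to the three PVC levels listed above
# (0 = top priority, 1 = no priority, 2 = low priority).
PRIORITY = {"voice": 0, "sdlc": 0, "lan": 1, "internet": 2, "email": 2}


class PriorityFrameQueue:
    """Strict priority queue: always sends the most urgent waiting frame."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = count()  # tie-breaker keeps arrival order within a level

    def enqueue(self, traffic_type: str, frame: str) -> None:
        heapq.heappush(
            self._heap, (PRIORITY[traffic_type], next(self._arrival), frame)
        )

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityFrameQueue()
queue.enqueue("email", "e1")
queue.enqueue("lan", "l1")
queue.enqueue("voice", "v1")
queue.enqueue("voice", "v2")
sent = [queue.dequeue() for _ in range(len(queue))]
print(sent)  # ['v1', 'v2', 'l1', 'e1']
```

Note that strict priority only decides which waiting frame goes first. It does nothing to create bandwidth that isn't there, which is why all of this remains best effort.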

Whether any of these approaches are used or not, the real key to sending voice over any packet data network is compression -- and VoFR is no exception. Compression offers several advantages, one of which is the reduction of raw bandwidth required to support the information transfer.

If we follow the trail of a voice signal over a typical digital network, we will see that it begins life as an analog signal, which is converted by a codec (coder/decoder) into a PCM format. VoFR makes use of codecs that not only convert the analog signal into a digital format, but that also compress the signal, thereby requiring less than the 64 Kbps demanded by PCM.

The standard approaches (there also are proprietary techniques) specified by the Frame Relay Forum in its FRF.11 are ADPCM and CS-ACELP, which I'll briefly explain below, along with LD-CELP:

  • ADPCM (Adaptive Differential Pulse Code Modulation), a technique also used in some PSTN networks, offers voice coding at 40, 32, 24 and 16 Kbps. At the most common implementation rate of 32 Kbps, the compression rate is 2:1 (2 to 1), which exactly halves the bandwidth required for voice. ADPCM imposes delay of 1.0ms, which is imperceptible. At 32 Kbps, ADPCM yields voice quality that is very close to that of PCM. At the higher compression rates that yield 24 Kbps and 16 Kbps, there is a corresponding drop in quality.
  • CS-ACELP (Conjugate Structure-Algebraic Code Excited Linear Prediction) runs at 8 Kbps, a compression rate of 8:1. There are two versions, which vary in terms of computational complexity, either of which offers raw voice quality that is similar to that of ADPCM at 32 Kbps. Compression delay, however, is in the range of 10.0ms, which is decidedly perceptible.
  • LD-CELP (Low Delay-Code Excited Linear Prediction) runs at 16 Kbps, a compression rate of 4:1. Compression delay is 3.0ms-5.0ms, and raw voice quality is similar to that of ADPCM at 32 Kbps.
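The compression rates quoted in the list above all fall out of one division against the 64 Kbps PCM baseline. Here is a minimal sketch in Python; the function names are mine, and the channel count deliberately ignores frame overhead, which the figures above don't cover.

```python
PCM_KBPS = 64  # uncompressed PCM baseline

# Coding rates as listed above, in Kbps.
CODEC_KBPS = {
    "ADPCM-40": 40,
    "ADPCM-32": 32,
    "ADPCM-24": 24,
    "ADPCM-16": 16,
    "LD-CELP": 16,
    "CS-ACELP": 8,
}


def compression_ratio(codec_kbps: int) -> float:
    """How many times smaller the coded stream is than 64 Kbps PCM."""
    return PCM_KBPS / codec_kbps


def voice_channels(link_kbps: int, codec_kbps: int) -> int:
    """Whole voice streams that fit in link_kbps, ignoring all overhead."""
    return link_kbps // codec_kbps


for name, kbps in CODEC_KBPS.items():
    print(f"{name}: {compression_ratio(kbps):.2f}:1, "
          f"{voice_channels(64, kbps)} stream(s) per 64 Kbps")
```

In real life, frame headers and other traffic on the PVC eat into those stream counts, so treat them as a ceiling rather than a promise.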

Regardless of the compression technique employed, the VoFR process is basically the same, although the specifics vary a bit. Let's use CS-ACELP as an example.

On the transmit side, 80 PCM voice samples, representing 10ms of a voice stream, are gathered together to form a set of voice data comprising 640 bits. That data set is run through the CS-ACELP compression algorithm by a codec embedded in a FRAD (Frame Relay Access Device), and reduced to 80 bits to be transmitted, consistent with the 8 Kbps rate and the 8:1 compression rate.

This is accomplished by consulting a code book that provides an abbreviated representation of an approximation of the data. The result is packed inside a Frame Relay frame, with an appropriate DLCI, not marked DE, and is presented to the network.

If multiple voice conversations are to be supported between the same two sites on the enterprise Frame Relay network, VoFR includes a sub-channel provision that allows multiple compressed voice data sets to be packed into a given frame. On the receiving end, the process is reversed, and all is well -- at least as far as compression and decompression are concerned.
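The numbers in the CS-ACELP example are easy to check: at 64 Kbps, a 10ms window of PCM is 640 bits, and at 8 Kbps, the same window works out to 80 bits (10 bytes). The Python sketch below does that arithmetic and extends it to several conversations sharing a frame. The function names are mine, and the sub-channel headers that FRF.11 adds are left out, since their sizes aren't given here.

```python
def bits_in_window(rate_bps: int, window_ms: int) -> int:
    """Bits produced by a constant-rate stream during window_ms milliseconds."""
    return rate_bps * window_ms // 1000


def voice_payload_bytes(conversations: int, codec_bps: int = 8_000,
                        window_ms: int = 10) -> int:
    """Compressed voice bytes per frame, excluding any sub-channel headers."""
    return conversations * bits_in_window(codec_bps, window_ms) // 8


samples = 8_000 * 10 // 1000              # PCM samples in 10ms
pcm_bits = bits_in_window(64_000, 10)     # before compression
acelp_bits = bits_in_window(8_000, 10)    # after CS-ACELP compression

print(samples, pcm_bits, acelp_bits)      # 80 640 80
print(voice_payload_bytes(1))             # 10
print(voice_payload_bytes(4))             # 40
```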

The complication, of course, comes from the fact that the frames are not delivered by the network at the same pace that they entered it. Additionally, some frames may be lost in transit. Remember that Frame Relay is a highly-shared packet network characterized by unpredictable levels of congestion. Latency is guaranteed, as is variability in delay (i.e., jitter).

Loss isn't guaranteed, but it's highly likely, over time. VoFR adjusts to latency, jitter and loss through various intelligent continuity algorithms employed by the receiving codec. These fill in the voids by stretching the voice frames received earlier and blending them with those received later. This logic is embedded in predictive decompression algorithms such as CS-ACELP and LD-CELP, which take advantage of the 10ms delay built into the compression/decompression processes to make the necessary predictions and do their stretching and blending. VoFR also makes use of various techniques for echo cancellation, as echo becomes perceptible when delay exceeds 15ms-20ms.
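Here is a deliberately simplified sketch, in Python, of the two ideas in that paragraph: a playout deadline that decides which frames arrive in time, and a fill-in step that blends the neighbors of a missing frame. This is not the CS-ACELP or LD-CELP algorithm. Each 10ms frame is reduced to a single number (think of it as a loudness level), and the function names and the 50ms playout delay are my assumptions.

```python
from typing import List, Optional, Sequence


def arrived_in_time(delays_ms: Sequence[Optional[float]],
                    playout_delay_ms: float) -> List[bool]:
    """True where a frame beat its playout deadline; a delay of None means lost."""
    return [d is not None and d <= playout_delay_ms for d in delays_ms]


def conceal(frames: Sequence[Optional[float]]) -> List[float]:
    """Fill each gap by blending the nearest good frames before and after it."""
    filled = []
    for i, value in enumerate(frames):
        if value is not None:
            filled.append(value)
            continue
        earlier = next((frames[j] for j in range(i - 1, -1, -1)
                        if frames[j] is not None), None)
        later = next((frames[j] for j in range(i + 1, len(frames))
                      if frames[j] is not None), None)
        if earlier is not None and later is not None:
            filled.append((earlier + later) / 2)   # blend both neighbors
        elif earlier is not None:
            filled.append(earlier)                  # stretch the earlier frame
        elif later is not None:
            filled.append(later)
        else:
            filled.append(0.0)                      # nothing to go on: silence
    return filled


levels = [1.0, 2.0, 3.0, 4.0, 5.0]        # what was sent, one value per frame
delays = [20, 25, 70, None, 30]           # network delay per frame; None = lost
on_time = arrived_in_time(delays, playout_delay_ms=50)
received = [lvl if ok else None for lvl, ok in zip(levels, on_time)]

print(on_time)             # [True, True, False, False, True]
print(conceal(received))   # [1.0, 2.0, 3.5, 3.5, 5.0]
```

A longer playout delay rescues more late frames but adds latency to every frame, so the tradeoff never goes away.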

The actual end-to-end process is a little more involved, of course. In an enterprise-wide VoFR application, both the calling and called parties sit behind a PBX.

The caller in San Francisco, for example, picks up the phone and dials the extension number of a co-worker in New York. The PBX checks its options for routing the call, courtesy of its LCR (Least Cost Routing) software. Over a special link, the PBX consults with the FRAD to determine the availability of the Frame Relay network.

If the FRAD is unaware of any significant congestion in the network and if bandwidth is available within the CIR, it accepts the call, and the PBX sends the voice traffic to the FRAD in PCM format. The FRAD compresses the voice data, packs it into a frame every 10ms, and off we go.
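The FRAD's accept-or-reject decision boils down to two tests: no significant congestion, and room left within the CIR. A minimal sketch in Python follows. The class, its fields and the sample numbers are hypothetical, and a real FRAD would learn about congestion from the network rather than from a simple flag.

```python
from dataclasses import dataclass


@dataclass
class PvcState:
    """Hypothetical snapshot of one PVC as seen by the FRAD."""
    cir_kbps: int       # committed information rate
    in_use_kbps: int    # bandwidth already committed to other traffic
    congested: bool     # True if the FRAD knows of significant congestion


def accept_call(pvc: PvcState, codec_kbps: int) -> bool:
    """Accept only if the network is not congested and the call fits in the CIR."""
    if pvc.congested:
        return False
    return pvc.in_use_kbps + codec_kbps <= pvc.cir_kbps


print(accept_call(PvcState(cir_kbps=64, in_use_kbps=48, congested=False), 8))  # True
print(accept_call(PvcState(cir_kbps=64, in_use_kbps=60, congested=False), 8))  # False
print(accept_call(PvcState(cir_kbps=64, in_use_kbps=0, congested=True), 8))    # False
```

When the answer is no, the PBX falls back on its other routing options, such as the PSTN.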

If network congestion levels remain low during the course of the call, conversation quality remains pretty good, but never as good as it is over the good old PSTN. If network congestion levels increase, so do latency and jitter, and frame loss may result. Voice quality suffers, and fond memories of the PSTN haunt the balance of the conversation.

So What...and Why?

So, voice data can be compressed in order to use shared bandwidth more efficiently, and this can be done with little loss in voice quality, if everything goes just right. Further, the decompression process can be sophisticated enough to smooth out some of the problems associated with latency, jitter and loss of voice data over a packet data network -- within limits.

At this point, you have to ask yourself why in the world you would go to all of this trouble and expense to run voice over a packet data network when the end result is uncertain quality that will never be as good as the PSTN, and which can be terrible when the network suffers congestion. (Yes, there is additional expense for things like special codecs, PBX-to-FRAD links, increased CIRs and special voice PVCs.)

There is only one answer, and that is cost. VoFR is free, or at least cheap in comparison to the PSTN. Domestic VoFR calls effectively are free, assuming that the Frame Relay network is already in place for other purposes, that bandwidth within the CIR is available at the moment, that the CIR hasn't been increased in consideration of VoFR, and that special VoFR PVCs haven't been set up.

There is no usage-sensitive element to the Frame Relay pricing algorithm. Rather, Frame Relay generally is priced on the basis of port speeds, PVCs and CIRs. However, you still have to wonder if VoFR is worth the trouble in the face of PSTN voice calling at rates in the range of $0.04-$0.06 per minute.

However, the cost issue takes on real significance in a multinational enterprise. Calls to Japan or South Africa, for example, may well be in the range of $0.50-$0.60 per minute. VoFR starts to look real good at these prices.
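To put the per-minute rates side by side, here is a back-of-the-envelope sketch in Python. The rates are the ones quoted in this article; the 1,000 minutes a month is a made-up volume for illustration, and the function name is mine.

```python
def monthly_pstn_cost(minutes: int, cents_per_minute: int) -> float:
    """PSTN charges in dollars for a month of calling at a flat per-minute rate."""
    return minutes * cents_per_minute / 100


MINUTES = 1_000  # hypothetical monthly call volume between two sites

domestic = (monthly_pstn_cost(MINUTES, 4), monthly_pstn_cost(MINUTES, 6))
international = (monthly_pstn_cost(MINUTES, 50), monthly_pstn_cost(MINUTES, 60))

print(domestic)       # (40.0, 60.0)
print(international)  # (500.0, 600.0)
```

Those PSTN charges are what VoFR has to beat, after subtracting whatever you spend on codecs, PBX-to-FRAD links, bigger CIRs and voice PVCs.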

In closing, I have to say that I'm not trying to pick a fight with the proponents of VoFR. I'm just calling it the way I see it. I may lose a few friends over this column, but my Mother, who loves me unconditionally, will be proud of me for telling the truth.