Voice over Frame Relay Tutorial
by Ray Horak
CommWeb.com
05/02/01
In order to understand Voice over Frame Relay Fundamentals, it
would be a really good idea for you to understand Frame Relay
itself. So, the recommended prerequisite for this lesson is the
last lesson, Frame Relay Basics. Actually, it wouldn't hurt to read
it again before we get started with VoFR.
VoFR is a pretty wacky idea, or so it seemed only a couple of
years ago. (Actually, it's still a pretty wacky idea.) As we discussed
in Frame Relay Basics, Frame Relay is a protocol for access
to a packet network. Frame Relay was designed for LAN-to-LAN internetworking
across the WAN.
LAN traffic, by definition, has no expectations of QoS (Quality
of Service), and Frame Relay offers none. Neither are there any
mechanisms for error control or strong congestion control.
Sure, you can oversize the CIRs across the PVCs to improve the
likelihood of smooth performance. Sure, there are some protocols
like MPLS (MultiProtocol Label Switching) that can be run inside
the network to improve performance. Sure, there is continuing
work by the standards bodies on the QoS/GoS issues. Sure, some
carriers offer multiple GoS (Grade of Service) levels -- at additional
cost. But you can never get away from the fact that Frame Relay
is a highly-shared packet data network service designed for traffic
associated with applications that are reasonably tolerant of latency
and loss.
Frame Relay is a best effort network service. Period. That's
OK -- Frame Relay was designed as a tradeoff between cost and
performance. That tradeoff is the essence of optimization, and
Frame Relay was optimized for LAN-to-LAN internetworking. (Forgive
me, please, for repeating myself, but the point is worth repeating.)
So far, we've said nothing that would suggest that Frame Relay
is even remotely appropriate for voice. Before we discuss VoFR,
let's explore the basics of conventional voice networking.
Voice: The Conventional Way
We won't get too basic here, but voice is analog in its native
form. That's a good thing, since we humans are inherently analog
creatures. Around the end of WWII (World War II, for you youngsters),
the networks began a transition from analog to digital technology.
Digital offers a lot of advantages, including greater bandwidth,
better error performance and enhanced management and control.
Contemporary switches of all types are digital in nature, and
so is a lot of terminal equipment. Most transmission facilities
also are digital, with the notable exception of copper local loops
serving residential and small business applications. That makes
the WAN virtually 100% digital, at least from edge-to-edge.
Therefore, analog voice has to be coded (i.e., converted) into
a digital format at some point after leaving your lips and prior
to entering the WAN, and decoded (i.e., reconverted) back into
an analog format as or after it exits the WAN, and certainly before
it reaches your ear. Those conversion processes are accomplished
by a matching pair of codecs, with the conventional method being
PCM (Pulse Code Modulation), standardized by the ITU-T as G.711.
PCM specifies that the amplitude (i.e., volume) of the analog
signal be sampled 8,000 times a second, at precise intervals of
125 µs (microseconds, or millionths of a second), which is exactly
1/8000th of a second. Each sample is coded into an eight-bit digital
value. The resulting eight-bit bytes are interleaved by multiplexers,
and sent across channelized digital circuits (e.g., T-carrier)
to be directed and redirected by circuit switches, sent across
circuits (e.g., SONET) that interconnect the switches, and ultimately
decoded on the receiving end of the transmission. The decoded
signal, now in analog form, is only an approximation of the original
analog signal, but it's thoroughly understandable to the human
ear.
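To make the PCM arithmetic concrete, here is a minimal Python sketch (the language is my choice; the column itself has no code). It samples a 1 kHz test tone 8,000 times a second and codes each sample into one byte. The byte mapping uses the continuous mu-law curve with mu = 255 as a stand-in; real G.711 uses a segmented approximation of that curve (and A-law in much of the world), so treat the codes as illustrative rather than bit-exact. All function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8_000   # PCM sampling rate from the text
BITS_PER_SAMPLE = 8      # each sample is coded into one eight-bit value
MU = 255                 # mu-law constant (assumption: North American flavor)


def sample_interval_us(rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time between samples, in microseconds."""
    return 1_000_000 / rate_hz


def pcm_bit_rate(rate_hz: int = SAMPLE_RATE_HZ, bits: int = BITS_PER_SAMPLE) -> int:
    """Bit rate of one PCM voice channel, in bits per second."""
    return rate_hz * bits


def mu_law_byte(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a code 0..255 with the continuous
    mu-law curve. Real G.711 uses a segmented approximation of this curve."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # y in [-1, 1]
    return round((y + 1.0) / 2.0 * 255)


def sample_tone(freq_hz: float, duration_s: float) -> list[int]:
    """Sample a sine tone at 8,000 samples/s and code each sample into a byte."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    return [mu_law_byte(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
            for i in range(n)]


if __name__ == "__main__":
    codes = sample_tone(1_000, 1.0)
    print(sample_interval_us())  # 125.0
    print(pcm_bit_rate())        # 64000
    print(len(codes))            # 8000
```

One second of voice is 8,000 bytes, which is where the 64 Kbps figure for a PCM channel comes from.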
It's not quite that simple, of course. Timing is critical. The
network must be in a position to accept, switch, transport, and
deliver every voice byte precisely every 125 µs. That means that
latency (i.e., delay) must be minimal and jitter (i.e., variability
in delay) must be virtually zero. That translates into a network
based on circuit-switching and channelized T-carrier and/or SONET.
Taken together, this approach ensures that, once the call is
set up, the associated bandwidth is committed for the entire duration
of a circuit-switched call, absolutely and without question.
Voice: The Frame Relay Way
Now that your memory is refreshed about both traditional voice
and Frame Relay, it should be clear that the two have very little
in common. Voice speaks to circuit switches, not frame switches
and routers. Voice speaks to channelized, not unchannelized, circuits.
Voice speaks to sound bytes and silence bytes created, accepted,
multiplexed, switched, transported and delivered at a regular
and frequent pace (every 125 µs); not to frames that are created
whenever there's some data around, and that move through the network
whenever it's available.
Voice speaks to committed bandwidth from call setup to call teardown,
not to bandwidth that's available whenever it happens to be available.
Voice expects perfection, and most data wouldn't even appreciate
it. Packet data thrives over a voice network, but voice trembles
at the thought of traveling over a packet data network. VoFR?
It just won't work!
Actually, it will work, and quite nicely -- if everything works
just right. The trick is to keep latency and jitter within limits.
While that can be tough in a network that is built around applications
that can tolerate both, it's possible to pull a trick or two within
the network.
One approach is to increase the CIR (Committed Information Rate)
over each PVC (Permanent Virtual Circuit) that will carry voice
traffic and not mark the voice traffic as DE (Discard Eligible).
Another approach is to set up special PVCs just for voice. Yet
another approach is to work with a carrier that offers PVCs with
different levels of delay/priority. Some carriers offer as many
as three PVC levels:
1. Top priority for delay-sensitive traffic (e.g., voice and
SDLC).
2. No priority for traffic that can tolerate some level of
delay (e.g., LAN traffic).
3. Low priority for applications that can tolerate significant
levels of delay (e.g., Internet access and e-mail).
This approach makes use of some priority queuing mechanism or
route selection mechanism such as MPLS
(MultiProtocol Label Switching). Any of these approaches can work,
but it's all on a best effort basis. There are no guarantees in
the Frame Relay world.
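Here is a minimal Python sketch of strict priority queuing across the three PVC levels listed above. The class and method names are hypothetical, and real carrier switches use their own schedulers; the point is simply to show why voice frames in the top queue never wait behind LAN traffic or e-mail.

```python
from collections import deque
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    """Three hypothetical PVC service levels; lower value is served first."""
    TOP = 0      # delay-sensitive: voice, SDLC
    NORMAL = 1   # tolerates some delay: LAN traffic
    LOW = 2      # tolerates significant delay: Internet access, e-mail


class PriorityScheduler:
    """Strict priority queuing: always send from the highest non-empty queue."""

    def __init__(self) -> None:
        self.queues = {p: deque() for p in Priority}

    def enqueue(self, frame: str, priority: Priority) -> None:
        self.queues[priority].append(frame)

    def dequeue(self) -> Optional[str]:
        for p in Priority:  # iterates TOP, NORMAL, LOW in definition order
            if self.queues[p]:
                return self.queues[p].popleft()
        return None  # nothing waiting to be sent


if __name__ == "__main__":
    s = PriorityScheduler()
    s.enqueue("email-1", Priority.LOW)
    s.enqueue("lan-1", Priority.NORMAL)
    s.enqueue("voice-1", Priority.TOP)
    s.enqueue("voice-2", Priority.TOP)
    print([s.dequeue() for _ in range(5)])
    # ['voice-1', 'voice-2', 'lan-1', 'email-1', None]
```

Strict priority is great for voice, but under sustained load it can starve the low-priority queue entirely, which is why practical schedulers usually police or weight the top class rather than letting it run unchecked.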
Whether any of these approaches are used or not, the real key
to sending voice over any packet data network is compression --
and VoFR is no exception. Compression offers several advantages,
one of which is the reduction of raw bandwidth required to support
the information transfer.
If we follow the trail of a voice signal over a typical digital
network, we will see that it begins life as an analog signal,
which is converted by a codec (coder/decoder) into a PCM format.
VoFR makes use of codecs that not only convert the analog signal
into a digital format, but that also compress the signal, thereby
requiring less than the 64 Kbps demanded by PCM.
The standard approaches (there also are proprietary techniques)
specified by the Frame Relay Forum in its FRF.11 implementation
agreement include ADPCM, CS-ACELP and LD-CELP, which I'll briefly
explain below:
- ADPCM (Adaptive Differential Pulse Code Modulation),
a technique also used in some PSTN networks, offers voice coding
at 40, 32, 24 and 16 Kbps. At the most common implementation
rate of 32 Kbps, the compression rate is 2:1, which exactly halves
the bandwidth required for voice. ADPCM imposes a delay of 1.0ms,
which is imperceptible. At 32 Kbps, ADPCM yields voice quality
that is very close to that of PCM. At the lower bit rates of
24 Kbps and 16 Kbps (compression rates of roughly 2.7:1 and 4:1),
there is a corresponding drop in quality.
- CS-ACELP (Conjugate Structure-Algebraic Code Excited
Linear Prediction) runs at 8 Kbps, a compression rate of 8:1.
There are two versions, which vary in terms of computational
complexity, either of which offers raw voice quality that is
similar to that of ADPCM at 32 Kbps. Compression delay, however,
is in the range of 10.0ms, which is decidedly perceptible.
- LD-CELP (Low Delay-Code Excited Linear Prediction)
runs at 16 Kbps, a compression rate of 4:1. Compression delay
is 3.0ms-5.0ms, and raw voice quality is similar to that of
ADPCM at 32 Kbps.
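Every compression rate above comes from one division: 64 Kbps PCM divided by the codec's output rate. Here is a small Python sketch using only the rates quoted in the text; `calls_within` is a hypothetical helper that ignores frame headers and other overhead, so real-world call counts will be somewhat lower.

```python
PCM_KBPS = 64  # G.711 PCM rate, the baseline for every ratio below

# Codec output rates mentioned in the text, in Kbps.
CODEC_RATES_KBPS = {
    "ADPCM-40": 40,
    "ADPCM-32": 32,
    "ADPCM-24": 24,
    "ADPCM-16": 16,
    "LD-CELP": 16,
    "CS-ACELP": 8,
}


def compression_ratio(rate_kbps: float, baseline_kbps: float = PCM_KBPS) -> float:
    """How many times smaller the coded stream is than 64 Kbps PCM."""
    return baseline_kbps / rate_kbps


def calls_within(cir_kbps: float, rate_kbps: float) -> int:
    """Voice payload streams that fit in a CIR, ignoring all frame overhead."""
    return int(cir_kbps // rate_kbps)


if __name__ == "__main__":
    for name, rate in CODEC_RATES_KBPS.items():
        print(f"{name:9} {rate:>2} Kbps  {compression_ratio(rate):.2f}:1  "
              f"{calls_within(64, rate)} calls per 64 Kbps")
```

The last column is the practical payoff: bandwidth that carries one PCM call could, in principle, carry eight CS-ACELP calls.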
Regardless of the compression technique employed, the VoFR process
is basically the same, although the specifics vary a bit. Let's
use CS-ACELP as an example.
On the transmit side, 80 PCM voice samples, representing 10ms
of a voice stream, are gathered together to form a set of voice
data comprising 640 bits (80 samples × 8 bits). That data set is
run through the CS-ACELP compression algorithm by a codec embedded
in a FRAD (Frame Relay Access Device), and reduced to 80 bits to be
transmitted, a compression rate of 8:1.
This is accomplished by consulting a code book that provides
an abbreviated representation of an approximation of the data.
The result is packed inside a Frame Relay frame, with an appropriate
DLCI (Data Link Connection Identifier), not marked DE, and is
presented to the network.
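The CS-ACELP arithmetic and the frame header can be sketched in Python. The address layout follows the standard two-octet Frame Relay (ITU-T Q.922) address field: a 10-bit DLCI split across both octets, plus the C/R, FECN, BECN, DE and EA bits. The function names, DLCI 100 and the all-zero stand-in payload are hypothetical, and the sketch leaves out the flags and FCS (frame check sequence) that a real FRAD adds, as well as the FRF.11 voice subframe header.

```python
SAMPLES_PER_10MS = 80      # 8,000 samples/s x 0.010 s
PCM_BITS_PER_SAMPLE = 8
CS_ACELP_RATIO = 8         # 64 Kbps in, 8 Kbps out


def frame_bits() -> tuple[int, int]:
    """(PCM bits, compressed bits) for one 10 ms chunk of voice."""
    pcm_bits = SAMPLES_PER_10MS * PCM_BITS_PER_SAMPLE   # 640
    return pcm_bits, pcm_bits // CS_ACELP_RATIO         # (640, 80)


def q922_address(dlci: int, de: bool = False, cr: bool = False,
                 fecn: bool = False, becn: bool = False) -> bytes:
    """Build the two-octet Frame Relay address field."""
    if not 0 <= dlci <= 1023:
        raise ValueError("a two-octet address carries a 10-bit DLCI (0-1023)")
    # Octet 1: upper 6 DLCI bits, C/R bit, EA = 0 (another octet follows).
    octet1 = ((dlci >> 4) & 0x3F) << 2 | int(cr) << 1
    # Octet 2: lower 4 DLCI bits, FECN, BECN, DE, EA = 1 (last octet).
    octet2 = ((dlci & 0x0F) << 4 | int(fecn) << 3 | int(becn) << 2
              | int(de) << 1 | 1)
    return bytes([octet1, octet2])


def build_voice_frame(dlci: int, payload: bytes) -> bytes:
    """Address field plus payload; voice is never marked DE."""
    return q922_address(dlci, de=False) + payload


if __name__ == "__main__":
    print(frame_bits())                            # (640, 80)
    compressed = bytes(10)                         # stand-in for 80 bits of CS-ACELP output
    print(build_voice_frame(100, compressed)[:2].hex())  # 1841
```

With DLCI 100 the address octets come out as 0x18 0x41; calling `q922_address(100, de=True)` would give 0x18 0x43, which is exactly the marking a FRAD should avoid for voice.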
If multiple voice conversations are to be supported between the
same two sites on the enterprise Frame Relay network, VoFR includes
a sub-channel provision that allows multiple compressed voice
data sets to be packed into a given frame. On the receiving end,
the process is reversed, and all is well -- at least as far as
compression and decompression are concerned.
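The sub-channel idea, several compressed voice data sets sharing one frame, can be sketched as length-prefixed packing. The one-byte channel ID and one-byte length layout below is a hypothetical format chosen for readability; it is not the actual FRF.11 subframe header, which defines its own fields.

```python
def pack_subframes(subframes: list[tuple[int, bytes]]) -> bytes:
    """Pack (channel_id, payload) pairs into one frame payload.
    Hypothetical layout: [channel id: 1 byte][length: 1 byte][payload]."""
    out = bytearray()
    for channel_id, payload in subframes:
        if not (0 <= channel_id <= 255 and len(payload) <= 255):
            raise ValueError("channel id and payload length must fit in one byte")
        out += bytes([channel_id, len(payload)]) + payload
    return bytes(out)


def unpack_subframes(data: bytes) -> list[tuple[int, bytes]]:
    """Reverse of pack_subframes, as the receiving FRAD would do it."""
    result = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated subframe header")
        channel_id, length = data[i], data[i + 1]
        end = i + 2 + length
        if end > len(data):
            raise ValueError("truncated subframe payload")
        result.append((channel_id, data[i + 2:end]))
        i = end
    return result


if __name__ == "__main__":
    # Three conversations, each contributing one 10-byte (80-bit) CS-ACELP set.
    calls = [(1, b"A" * 10), (2, b"B" * 10), (3, b"C" * 10)]
    payload = pack_subframes(calls)
    print(len(payload))                       # 36
    print(unpack_subframes(payload) == calls)  # True
```

Sharing one frame means the frame header is paid once for all three conversations rather than three times.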
The complication, of course, comes from the fact that the frames
are not delivered by the network at the same pace that they entered
it. Additionally, some frames may be lost in transit. Remember
that Frame Relay is a highly-shared packet network characterized
by unpredictable levels of congestion. Latency is guaranteed,
as is variability in delay (i.e., jitter).
Loss isn't guaranteed, but it's highly likely, over time. VoFR
adjusts to latency, jitter and loss through various intelligent
continuity algorithms employed by the receiving codec. These fill
in the voids by stretching the voice frames received earlier and
blending them with those received later. This logic is embedded
in predictive decompression algorithms such as CS-ACELP and LD-CELP,
which take advantage of the 10ms delay built into the compression/decompression
processes to make the necessary predictions and do their stretching
and blending. VoFR also makes use of various techniques for echo
cancellation, as echo becomes perceptible when delay exceeds 15ms-20ms.
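Here is a simplified Python sketch of what the receiving side is up against. Frames are scheduled for playout at a fixed delay; a frame that is lost, or arrives after its playout slot, is concealed. The concealment shown is plain repetition of the last good frame, a much cruder technique than the predictive stretching and blending that CS-ACELP and LD-CELP decoders perform. The function names, the 10-byte silence frame and the sample arrival times are hypothetical.

```python
FRAME_MS = 10  # one compressed voice frame every 10 ms, per the text


def playout(arrivals: dict[int, tuple[float, bytes]], n_frames: int,
            playout_delay_ms: float) -> list[tuple[bytes, bool]]:
    """Return (audio, concealed) for each 10 ms slot.

    arrivals maps sequence number -> (arrival time in ms, payload); frame n is
    sent at n * FRAME_MS and played at playout_delay_ms + n * FRAME_MS.
    """
    last_good = bytes(10)  # hypothetical frame of silence before any audio
    out = []
    for n in range(n_frames):
        deadline = playout_delay_ms + n * FRAME_MS
        entry = arrivals.get(n)
        if entry is not None and entry[0] <= deadline:
            last_good = entry[1]
            out.append((last_good, False))
        else:
            out.append((last_good, True))  # lost or late: repeat previous frame
    return out


def jitter_ms(arrivals: dict[int, tuple[float, bytes]]) -> float:
    """Spread (max minus min) of one-way delays; needs at least one frame."""
    delays = [t - n * FRAME_MS for n, (t, _) in arrivals.items()]
    return max(delays) - min(delays)


if __name__ == "__main__":
    # Frame 2 never arrives; frame 4 arrives after its playout slot.
    arrivals = {0: (30.0, b"a" * 10), 1: (45.0, b"b" * 10),
                3: (62.0, b"d" * 10), 4: (95.0, b"e" * 10)}
    print([concealed for _, concealed in playout(arrivals, 5, 50.0)])
    # [False, False, True, False, True]
    print(jitter_ms(arrivals))  # 25.0
```

The playout delay is the knob: a longer delay rescues more late frames but adds latency, which pushes the conversation toward the echo threshold mentioned above.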
The actual end-to-end process is a little more involved, of course.
In an enterprise-wide VoFR application, both the calling and called
parties sit behind a PBX.
The caller in San Francisco, for example, picks up the phone
and dials the extension number of a co-worker in New York. The
PBX checks its options for routing the call, courtesy of its LCR
(Least Cost Routing) software. Over a special link, the PBX consults
with the FRAD to determine the availability of the Frame Relay
network.
If the FRAD is unaware of any significant congestion in the network
and if bandwidth is available within the CIR, it accepts the call,
and the PBX sends the voice traffic to the FRAD in PCM format.
The FRAD compresses the voice data, packs it into a frame every
10ms, and off we go.
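The FRAD's accept-or-reject decision can be sketched as a simple check against the CIR. Everything here is an assumption for illustration: the class name, the idea that the congestion flag is driven by FECN/BECN notifications, the 8 Kbps per-call figure (CS-ACELP payload only, ignoring frame overhead), and the PBX falling back to the PSTN when the FRAD declines.

```python
from dataclasses import dataclass


@dataclass
class Frad:
    """Hypothetical admission state for one voice-carrying PVC."""
    cir_kbps: float
    in_use_kbps: float = 0.0   # bandwidth already committed (data and voice)
    congested: bool = False    # assumption: set when FECN/BECN bits are seen

    def admit(self, call_kbps: float) -> bool:
        """Accept only if no congestion is known and the call fits in the CIR."""
        if self.congested or self.in_use_kbps + call_kbps > self.cir_kbps:
            return False  # assumption: the PBX's LCR then routes via the PSTN
        self.in_use_kbps += call_kbps
        return True

    def release(self, call_kbps: float) -> None:
        """Return a finished call's bandwidth to the pool."""
        self.in_use_kbps = max(0.0, self.in_use_kbps - call_kbps)


if __name__ == "__main__":
    frad = Frad(cir_kbps=32.0, in_use_kbps=16.0)   # half the CIR already used by data
    print([frad.admit(8.0) for _ in range(3)])      # [True, True, False]
```

Note that admission only protects the call at setup time; nothing in this check can stop congestion that develops after the call is accepted, which is the point of the next paragraph.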
If network congestion levels remain low during the course of
the call, conversation quality remains pretty good, but never
as good as it is over the good old PSTN. If network congestion
levels increase, so do latency and jitter, and frame loss may
result. Voice quality suffers, and fond memories of the PSTN haunt
the balance of the conversation.
So What...and Why?
So, voice data can be compressed in order to use shared bandwidth
more efficiently, and this can be done with little loss in voice
quality, if everything goes just right. Further, the decompression
process can be sophisticated enough to smooth out some of the
problems associated with latency, jitter and loss of voice data
over a packet data network -- within limits.
At this point, you have to ask yourself why in the world you
would go to all of this trouble and expense to run voice over a
packet data network when the end result is uncertain quality that
will never be as good as the PSTN, and which can be terrible when
the network suffers congestion. (Yes, there is additional expense,
for things like special codecs, PBX-to-FRAD links, increased CIRs
and special voice PVCs.)
There is only one answer, and that is cost. VoFR is free, or
at least cheap in comparison to the PSTN. Domestic VoFR calls
effectively are free, assuming that the Frame Relay network is
already in place for other purposes, that bandwidth within the
CIR is available at the moment, that the CIR hasn't been increased
in consideration of VoFR, and that special VoFR PVCs haven't been
set up.
There is no usage-sensitive element to the Frame Relay pricing
algorithm. Rather, Frame Relay generally is priced on the basis
of port speeds, PVCs and CIRs. Still, you have to wonder
if VoFR is worth the trouble in the face of PSTN voice calling
at rates in the range of $0.04-$0.06 per minute.
However, the cost issue takes on real significance in a multinational
enterprise. Calls to Japan or South Africa, for example, may well
be in the range of $0.50-$0.60 per minute. VoFR starts to look real
good at these prices.
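The cost argument boils down to simple arithmetic. Since VoFR has no per-minute charge (assuming spare capacity on an existing network), the question is how many minutes a month it takes for the PSTN charges you avoid to cover VoFR's added fixed costs. This sketch uses the midpoints of the per-minute rates quoted above; the $1,000 per month of added VoFR expense is a purely hypothetical figure.

```python
def monthly_pstn_cost(minutes: float, rate_per_min: float) -> float:
    """What the same minutes would cost over the PSTN."""
    return minutes * rate_per_min


def breakeven_minutes(extra_monthly_cost: float, rate_per_min: float) -> float:
    """Minutes per month at which VoFR's added fixed costs equal the PSTN
    charges avoided (assumes VoFR itself has no usage-sensitive charge)."""
    return extra_monthly_cost / rate_per_min


if __name__ == "__main__":
    extra = 1_000.0  # hypothetical added VoFR expense per month
    print(f"{breakeven_minutes(extra, 0.05):,.0f}")  # 20,000 (domestic)
    print(f"{breakeven_minutes(extra, 0.55):,.0f}")  # 1,818 (international)
```

Under these assumptions, domestic traffic needs about eleven times as many minutes as international traffic to pay for the same VoFR investment.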
In closing, I have to say that I'm not trying to pick a fight
with the proponents of VoFR. I'm just calling it the way I see
it. I may lose a few friends over this column, but my Mother,
who loves me unconditionally, will be proud of me for telling
the truth.