Viterbi Decoder IP: convolutional FEC for Wi-Fi, DVB, LTE, satellite

FLAGSHIP

K=7 rate-1/2, placed and routed on UltraScale+

The flagship configuration is the constraint-length-7, rate-1/2 soft-decision decoder for the standard code of IEEE 802.11a, CCSDS, and related standards. It was placed and routed on a Zynq UltraScale+ RFSoC device; every hardware figure below comes from the Vivado Design Timing Summary and post-route utilization reports.
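For reference, the code this decoder inverts can be sketched as a shift-register encoder. This section does not list the generator polynomials, so the sketch assumes the 802.11a pair 133/171 (octal), with the newest input bit in the register MSB and output order A (g0) then B (g1). An impulse input produces 10 ones, which for this code equals the free distance of 10 listed in the code-rate table below.

```python
# Sketch of the K=7, rate-1/2 convolutional encoder.
# Assumed: 802.11a generators 133/171 (octal), newest input bit in the
# register MSB, output order A (g0) then B (g1) for each input bit.
K = 7
G0, G1 = 0o133, 0o171

def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def encode(bits, gens=(G0, G1), k=K):
    """Return len(gens) coded bits per input bit, starting from the all-zero state."""
    reg, out = 0, []
    for b in bits:
        reg = ((b << (k - 1)) | (reg >> 1)) & ((1 << k) - 1)  # shift the new bit in
        out.extend(parity(reg & g) for g in gens)
    return out

impulse = encode([1] + [0] * (K - 1))
print(impulse, sum(impulse))   # total weight 10: this code's free distance
```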

527 MHz

Honest-ceiling clock (median of a place-directive sweep), Zynq UltraScale+ (xczu28dr) MEASURED

527 Mbps

One decoded bit per clock at the 527 MHz timing-closed clock, continuous streaming MEASURED

0 DSP

Zero DSP48 blocks: the whole decoder lives in logic and block RAM MEASURED

2,604 LUT

667 flip-flops, 2 block RAMs: a compact, predictable footprint MEASURED

Measured on two devices, both real silicon

The same generated core was implemented on a high-performance UltraScale+ part and on the low-cost Zynq-7010 (the ADALM-Pluto SDR device). We publish both, including the speed the low-cost part actually reaches:

Result	Value	Provenance
K=7 closed clock, Zynq UltraScale+ (xczu28dr-2)	527 MHz	Vivado P&R honest ceiling, median of place sweep MEASURED
K=7 closed clock, low-cost Zynq-7010 (-1)	162 MHz	Vivado P&R honest ceiling, clears 160 MHz MEASURED
K=7 footprint, Zynq-7010	1,918 LUT / 740 FF / 2 BRAM / 0 DSP	Post-route utilization MEASURED
Half-hardware layout (fold 2), Zynq-7010	1,367 LUT @ 160 MHz	-29% logic at the full-rate clock MEASURED
K=9 256-state decoder, UltraScale+	405 MHz / 11,326 LUT / 8 BRAM	Vivado P&R honest ceiling, stronger coding gain MEASURED

Open vs encrypted: where the common commercial Viterbi core ships as encrypted RTL, our decoder ships as readable, synthesizable source that is verified bit-for-bit against a reference implementation (ka9q/libfec, the decoder behind gr-ieee802-11), so you can audit, port, and trust every gate.

Fold knob: trade logic for throughput

A single fold parameter, F, time-multiplexes the add-compare-select (ACS) engine across F clocks per decoded bit, shrinking the butterfly logic for channels that do not need full line rate; throughput is the clock divided by F. The 256-state K=9 core, on UltraScale+:

K=9 configuration	LUT	Clock	Throughput
fold 1, full rate (1 bit/clock)	10,885	405 MHz	405 Mbps MEASURED
fold 2, rotating-read (clock-first)	9,113	351 MHz	175 Mbps MEASURED
fold 4	5,116	323 MHz	81 Mbps MEASURED
fold 8, smallest area	2,627	322 MHz	40 Mbps MEASURED

Each fold factor uses one of two engines. Fold 2 uses the rotating-read engine (no select mux on the metric read) to hold a near-full-rate clock: 351 MHz with half the butterfly hardware. Folds 4 and 8 use the area-optimal engine for density, shrinking the 256-state core by 76% (10,885 -> 2,627 LUT) at fold 8 so that many low-rate channels fit on one device. The K=7 (64-state) core folds the same way: 2,301 -> 737 LUT (fold 1 -> 8) on Zynq-7010, and 527 -> 347 MHz on UltraScale+. Every fold point is verified bit-for-bit against the reference decoder. Clocks are the median of a place-directive sweep; LUT counts and clocks are both post-place-and-route.
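Why folding costs clocks but not accuracy can be seen in a small behavioral model. The sketch below is hypothetical, not the shipped RTL: it assumes K=7 with generators 133/171 (octal) and 4-bit soft symbols (0 = strong 0, 15 = strong 1), and splits one trellis step's N/2 butterflies over F clock passes. Every fold yields the same path metrics and survivor decisions as fold 1 and only spends F clocks per decoded bit, which is why throughput in the table is roughly the clock divided by F (for example 351 MHz / 2 ≈ 175 Mbps).

```python
import random

# Hypothetical behavioral model, not the RTL. Assumed: K=7, generators
# 133/171 (octal), 4-bit soft symbols from 0 (strong 0) to 15 (strong 1).
K = 7
G = (0o133, 0o171)
N = 1 << (K - 1)                 # 64 trellis states

def parity(x):
    return bin(x).count("1") & 1

def branch_metric(src, b, q):
    """Distance between soft symbols q and the coded bits of branch (src, input b)."""
    reg = (b << (K - 1)) | src
    return sum(abs(qi - 15 * parity(reg & g)) for qi, g in zip(q, G))

def acs_step(metrics, q, fold=1):
    """One trellis step; the N/2 butterflies are spread over `fold` clocks."""
    half = len(metrics) // 2
    per_clock = half // fold
    new, dec = [0] * len(metrics), [0] * len(metrics)
    for clk in range(fold):                       # one pass per clock
        for j in range(clk * per_clock, (clk + 1) * per_clock):
            # Butterfly j: predecessors 2j, 2j+1 feed successors j (input 0), j+half (input 1).
            for b, dst in ((0, j), (1, j + half)):
                c0 = metrics[2 * j] + branch_metric(2 * j, b, q)
                c1 = metrics[2 * j + 1] + branch_metric(2 * j + 1, b, q)
                new[dst], dec[dst] = (c0, 0) if c0 <= c1 else (c1, 1)
    return new, dec, fold                         # fold clocks per decoded bit

def run(symbols, fold):
    metrics, decisions, clocks = [0] + [10**6] * (N - 1), [], 0
    for q in symbols:
        metrics, dec, c = acs_step(metrics, q, fold)
        decisions.append(dec)
        clocks += c
    return metrics, decisions, clocks

rng = random.Random(1)
symbols = [(rng.randrange(16), rng.randrange(16)) for _ in range(50)]
base = run(symbols, 1)
for f in (2, 4, 8):
    m, d, clocks = run(symbols, f)
    assert (m, d) == base[:2]         # same metrics and decisions at every fold
    print(f"fold {f}: {clocks} clocks for {len(symbols)} decoded bits")
```

The model only captures the schedule; the area saving comes from the hardware instantiating N/(2F) butterfly units instead of N/2.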

Folding without the clock penalty. On the low-cost Zynq-7010, the half-hardware layout (fold 2) now closes at 160 MHz, the same clock as the unfolded rate-1/2 decoder, so a folded channel gives up no clock frequency while spending half the add-compare-select logic (1,367 LUT / 1,073 FF). A streaming read pattern (a fixed window over a rotating metric bank) removes the per-step selection that previously held a time-folded clock about 13 MHz below the full decoder. The layout is bit-exact against the reference and was validated in a full Wi-Fi receiver across every modulation and coding rate, packet after packet with no reset between them. MEASURED Read the case study.

Code rates 1/2 through 1/7, one core

The decoder spans the full convolutional-FEC rate range. Lower rates add redundancy for more coding gain (a larger free distance), and the same architecture serves them all: the clock stays within 160-167 MHz on the Zynq-7010, and within 433-454 MHz on UltraScale+ for the rates below 1/2. Higher rates (2/3, 3/4) are reached by puncturing the rate-1/2 base, so they reuse the rate-1/2 core unchanged.

Code rate	Free distance	Clock, Zynq-7010	Clock, UltraScale+
1/2 (base code)	10	160 MHz MEASURED	527 MHz MEASURED
1/3	15	167 MHz MEASURED	449 MHz MEASURED
1/4	20	162 MHz MEASURED	449 MHz MEASURED
1/5	24	163 MHz MEASURED	433 MHz MEASURED
1/6	30	164 MHz MEASURED	454 MHz MEASURED
1/7	33	163 MHz MEASURED	442 MHz MEASURED

Higher free distance means stronger error correction: at the same signal level the 1/3 code roughly halves the residual errors of 1/2, and the lower rates push further. Each rate is verified bit-for-bit against the reference decoder and uses zero DSP blocks and two block RAMs. A branch-metric pipeline keeps the deep rates (1/4 and below) at full clock on the low-cost part. The Zynq-7010 deep-rate clocks are for the deployable pipelined configuration; the UltraScale+ figures are single-run place-and-route results (the rate-1/2 527 MHz is the place-sweep ceiling).
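The free-distance column can be reproduced for the rate-1/2 base code with a shortest-path search over the trellis: d_free is the minimum Hamming weight of any path that leaves the all-zero state and later re-merges with it. This sketch assumes the 802.11a generators 133/171 (octal) for K=7; the polynomials behind the 1/3 to 1/7 rows are not given here, so those rows are not reproduced.

```python
import heapq

def parity(x):
    return bin(x).count("1") & 1

def free_distance(gens, k):
    """Minimum output weight of a trellis path that leaves state 0 and re-merges."""
    def step(s, b):
        reg = (b << (k - 1)) | s                     # newest input bit in the MSB
        weight = sum(parity(reg & g) for g in gens)  # Hamming weight of the branch output
        return (b << (k - 2)) | (s >> 1), weight     # next (k-1)-bit state
    start, w0 = step(0, 1)                           # diverge from the all-zero path
    dist = {start: w0}
    heap = [(w0, start)]
    while heap:                                      # Dijkstra over trellis states
        d, s = heapq.heappop(heap)
        if s == 0:
            return d                                 # lightest re-merge with the all-zero path
        if d > dist[s]:
            continue                                 # stale heap entry
        for b in (0, 1):
            nxt, w = step(s, b)
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return None

print(free_distance((0o133, 0o171), 7))   # 10, matching the rate-1/2 row
```

Because every branch weight is non-negative, Dijkstra's algorithm finds the lightest re-merging path without enumerating input sequences; the same function accepts three or more generators for the lower-rate codes.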

CATALOG

Configurations

K=7 rate-1/2 (the standard code)

Wi-Fi · DVB · LTE · CCSDS

The constraint-length-7, 64-state soft-decision decoder. Higher code rates (2/3, 3/4) are supported by a depuncture front-end, so the core stays rate-agnostic. One decoded bit per clock, continuous streaming.

Clock (UltraScale+): 527 MHz MEASURED

Logic / DSP: 2,604 LUT / 0 DSP MEASURED
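The depuncture front-end only has to undo the transmitter's bit stealing and hand the core a rate-1/2 stream again. The sketch below assumes the 802.11a puncturing patterns for rates 2/3 and 3/4, and signed soft values in which 0 carries no information, so an inserted erasure adds nothing to either branch of a correlation-style metric. A core using offset-binary soft bits would insert the mid-scale code instead; the names here are illustrative, not the IP's interface.

```python
# Puncturing masks over the interleaved coded stream A0 B0 A1 B1 ...
# (1 = transmitted, 0 = stolen); the 802.11a patterns are assumed here.
MASKS = {
    "1/2": [1, 1],
    "2/3": [1, 1, 1, 0],          # sends A0 B0 A1, steals B1
    "3/4": [1, 1, 1, 0, 0, 1],    # sends A0 B0 A1 B2, steals B1 and A2
}

def puncture(coded, mask):
    """Transmitter side: drop the coded bits the mask marks as stolen."""
    return [c for i, c in enumerate(coded) if mask[i % len(mask)]]

def depuncture(soft, mask, erasure=0):
    """Receiver side: re-insert a neutral soft value wherever a bit was stolen,
    restoring the rate-1/2 stream the core expects."""
    kept = sum(mask)
    if len(soft) % kept:
        raise ValueError("input is not a whole number of puncturing periods")
    it, out = iter(soft), []
    for _ in range(len(soft) // kept):
        out.extend(next(it) if keep else erasure for keep in mask)
    return out

coded = list(range(1, 13))                # 12 placeholder symbols = 6 data bits at rate 1/2
sent = puncture(coded, MASKS["3/4"])      # 8 symbols: 6 data bits / 8 = rate 3/4
print(sent)                               # [1, 2, 3, 6, 7, 8, 9, 12]
print(depuncture(sent, MASKS["3/4"]))     # [1, 2, 3, 0, 0, 6, 7, 8, 9, 0, 0, 12]
```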

K=9 256-state

stronger coding gain

The constraint-length-9, 256-state code for links that need roughly an extra decibel of coding gain. Generated from the same architecture: the state count is a single parameter, verified bit-exact end to end.

Clock (UltraScale+): 405 MHz MEASURED

Block RAM: 8 MEASURED

Three latency / area layouts

pick your trade

The survivor-path memory ships in three forms: a low-latency register-exchange layout, a uniform-latency windowed layout (2 block RAMs), and a half-block-RAM layout, covering minimum latency, minimum area, and the balanced middle.

Latency (windowed): 4×traceback + 1

Initiation interval: II = 1
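Register exchange and traceback differ in how they store survivor paths, not in what they decode. The following software sketch runs both forms over the same add-compare-select decisions on a zero-terminated frame. It uses hard decisions for brevity (the IP uses soft decisions), assumes K=7 with generators 133/171 (octal), and does not model the windowed or half-block-RAM layouts, whose traceback runs over a sliding window rather than the whole frame.

```python
import random

# Software model of two survivor-memory forms; hard decisions for brevity.
# Assumed code: K=7 with generators 133/171 (octal).
K, G = 7, (0o133, 0o171)
N = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") & 1

def outputs(src, b):
    """Coded bits emitted when input b arrives in encoder state src."""
    reg = (b << (K - 1)) | src
    return [parity(reg & g) for g in G]

def encode(bits):
    s, out = 0, []
    for b in bits:
        out += outputs(s, b)
        s = (b << (K - 2)) | (s >> 1)
    return out

def viterbi(rx, survivor="traceback"):
    """Decode a zero-terminated frame; `survivor` selects the path-memory form."""
    inf = float("inf")
    metric = [0] + [inf] * (N - 1)
    decisions = []                      # traceback: one decision bit per state per step
    paths = [[] for _ in range(N)]      # register exchange: a full bit path per state
    for t in range(0, len(rx), 2):
        r = rx[t:t + 2]
        new, dec, new_paths = [inf] * N, [0] * N, [None] * N
        for dst in range(N):
            b, j = dst >> (K - 2), dst & (N // 2 - 1)   # input bit, butterfly index
            cands = [metric[2 * j + x] + sum(a != c for a, c in zip(r, outputs(2 * j + x, b)))
                     for x in (0, 1)]
            x = 0 if cands[0] <= cands[1] else 1
            new[dst], dec[dst] = cands[x], x
            if survivor == "register_exchange":
                new_paths[dst] = paths[2 * j + x] + [b]  # copy the winner's path forward
        metric = new
        decisions.append(dec)
        if survivor == "register_exchange":
            paths = new_paths
    if survivor == "register_exchange":
        return paths[0]                 # the tail bits end the frame in state 0
    state, bits = 0, []
    for dec in reversed(decisions):     # traceback from state 0
        bits.append(state >> (K - 2))   # input bit that entered this state
        state = 2 * (state & (N // 2 - 1)) + dec[state]   # predecessor state
    return bits[::-1]

rng = random.Random(7)
msg = [rng.randrange(2) for _ in range(40)] + [0] * (K - 1)   # data + tail bits
rx = encode(msg)
for i in (3, 20, 51):                   # flip three coded bits
    rx[i] ^= 1
print(viterbi(rx) == viterbi(rx, "register_exchange") == msg)   # True
```

Register exchange copies a whole path per state on every step, so decoded bits are available almost immediately at the cost of many registers; traceback stores only one decision bit per state per step in RAM and walks back afterwards, which is where the extra latency of the windowed layout comes from.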

Continuous free-running

no per-frame reset

One cold start, then the decoder runs a continuous packet stream with no reset between frames, re-anchoring on each frame's tail bits by itself. It also tolerates idle gaps between packets, freezing and resuming with no loss. That makes integration simpler: there is no per-frame init handshake.

Frames back-to-back, no reset: bit-exact MEASURED

Idle gaps between packets: tolerated MEASURED
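Re-anchoring without a reset relies on the frame's tail: K-1 zero tail bits (six for K=7, as 802.11a appends) flush the encoder back to state 0 from any state, so the decoder knows the state in which every frame ends. A minimal sketch of that encoder-side property, assuming the shift convention in which the newest bit enters the top of the state; how the IP detects frame boundaries is not modeled.

```python
K = 7                                   # assumed constraint length
TAIL = [0] * (K - 1)                    # six zero tail bits, as in 802.11a

def next_state(state, bit, k=K):
    """Shift the newest input bit into the top of the (k-1)-bit encoder state."""
    return (bit << (k - 2)) | (state >> 1)

def state_after(bits, state=0, k=K):
    for b in bits:
        state = next_state(state, b, k)
    return state

# Whatever state a frame's data leaves the encoder in, the tail flushes it to 0.
print(all(state_after(TAIL, s) == 0 for s in range(1 << (K - 1))))   # True
```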

K=9 pulls ahead of K=7, and the lead grows with signal quality

The longer constraint length gives the K=9 code a larger free distance, so its error-rate waterfall is steeper: as the signal improves, the K=9 decoder's margin over K=7 widens. Measured over matched traffic, rate-1/2 BPSK on an AWGN channel with 4-bit soft decisions:

Eb/N0	K=7 bit-error rate	K=9 bit-error rate	K=9 advantage
2.0 dB	1.1×10^-2	6.2×10^-3	1.8× fewer errors MEASURED
2.5 dB	3.1×10^-3	1.2×10^-3	2.7× fewer errors MEASURED

Both codes already beat uncoded transmission by a wide margin: the classic rate-1/2 waterfall. K=9 then roughly halves K=7's residual errors at 2 dB, and does better still as the channel improves: the extra coding gain its constraint length is meant to deliver, confirmed bit by bit. Need a different constraint length, polynomial, soft-bit width, or traceback depth? See IP customization.
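To put "beat uncoded transmission by a wide margin" in numbers, the textbook bit-error rate of uncoded BPSK on AWGN is P_b = Q(√(2·Eb/N0)) = ½·erfc(√(Eb/N0)). The sketch below evaluates it at the table's two operating points and divides by the measured coded rates copied from the table; it is a back-of-envelope comparison, not part of the measurement.

```python
from math import erfc, sqrt

def uncoded_bpsk_ber(ebn0_db):
    """Theoretical BER of uncoded BPSK on AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)          # dB to linear ratio
    return 0.5 * erfc(sqrt(ebn0))

# Measured rate-1/2 bit-error rates (K=7, K=9), copied from the table above.
measured = {2.0: (1.1e-2, 6.2e-3), 2.5: (3.1e-3, 1.2e-3)}
for db, (k7, k9) in measured.items():
    ref = uncoded_bpsk_ber(db)
    print(f"{db} dB: uncoded {ref:.2e}, K=7 {ref / k7:.1f}x fewer errors, "
          f"K=9 {ref / k9:.1f}x fewer errors")
```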