A soft-decision Viterbi decoder for the convolutional codes that front almost every wireless and satellite link: Wi-Fi, DVB, 3GPP/LTE, WiMAX, and CCSDS deep-space. Generated from a Python algorithm and resource-optimized by AI: folding shrinks the logic to fit a small FPGA, and the right micro-architecture keeps the clock rate up. The result is a family of resource-versus-clock operating points, from full rate down to one quarter of the logic, each verified bit-for-bit and each using zero DSP.
The flagship configuration is the constraint-length-7, rate-1/2 soft-decision decoder (the 802.11a / CCSDS / IEEE standard code). It was placed and routed on a Zynq UltraScale+ RFSoC device; every figure below comes from the Vivado Design Timing Summary and post-route utilization reports.
The same generated core was implemented on a high-performance UltraScale+ part and on the low-cost Zynq-7010 (the ADALM-Pluto SDR device). We publish both, including the speed the low-cost part actually reaches:
| Result | Value | Provenance |
|---|---|---|
| K=7 closed clock, Zynq UltraScale+ (xczu28dr-2) | 527 MHz | Vivado P&R honest ceiling, median of place sweep MEASURED |
| K=7 closed clock, low-cost Zynq-7010 (-1) | 162 MHz | Vivado P&R honest ceiling, clears 160 MHz MEASURED |
| K=7 footprint, Zynq-7010 | 1,918 LUT / 740 FF / 2 BRAM / 0 DSP | Post-route utilization MEASURED |
| Half-hardware layout (fold 2), Zynq-7010 | 1,367 LUT @ 160 MHz | -29% logic at the full-rate clock MEASURED |
| K=9 256-state decoder, UltraScale+ | 405 MHz / 11,326 LUT / 8 BRAM | Vivado P&R honest ceiling, stronger coding gain MEASURED |
Open vs encrypted: where the common commercial Viterbi core ships as encrypted RTL, our decoder ships as readable, synthesizable source that is verified bit-for-bit against a reference implementation (ka9q/libfec, the decoder behind gr-ieee802-11), so you can audit, port, and trust every gate.
A single fold parameter time-multiplexes the add-compare-select engine across F clocks, shrinking the butterfly logic for channels that do not need full line rate. The 256-state K=9 core, on UltraScale+:
| K=9 configuration | LUT | Clock | Throughput |
|---|---|---|---|
| fold 1, full rate (1 bit/clock) | 10,885 | 405 MHz | 405 Mbps MEASURED |
| fold 2, rotating-read (clock-first) | 9,113 | 351 MHz | 175 Mbps MEASURED |
| fold 4 | 5,116 | 323 MHz | 81 Mbps MEASURED |
| fold 8, smallest area | 2,627 | 322 MHz | 40 Mbps MEASURED |
Two fold engines, picked per fold. Fold 2 uses the rotating-read engine (no select mux on the metric read) to hold a near-full-rate clock: 351 MHz at half the butterfly hardware. Folds 4 and 8 use the area-optimal engine for density, shrinking the 256-state core by 76% (10,885 -> 2,627 LUT) at fold 8, so many low-rate channels fit on one device. The K=7 (64-state) core folds the same way: 2,301 -> 737 LUT (fold 1 -> 8) on Zynq-7010, and 527 -> 347 MHz on UltraScale+. Every fold point is verified bit-for-bit against the reference decoder. Clock is the median of a place-directive sweep; LUT and clock are both post-place-and-route.
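The throughput column in the K=9 table is consistent with one decoded bit every F clocks at fold F. The sketch below reproduces that arithmetic and the LUT reduction from the table's own figures. The function names are illustrative and not part of the delivered core.

```python
def folded_throughput_mbps(clock_mhz: float, fold: int) -> float:
    """Assumes one decoded bit every `fold` clocks, so throughput = clock / fold."""
    if fold < 1:
        raise ValueError("fold must be >= 1")
    return clock_mhz / fold


def lut_reduction_pct(base_lut: int, folded_lut: int) -> float:
    """Percentage of LUTs saved relative to the unfolded (fold 1) core."""
    return 100.0 * (1.0 - folded_lut / base_lut)


# (fold, LUT, clock in MHz) copied from the K=9 UltraScale+ table.
K9_POINTS = [(1, 10885, 405), (2, 9113, 351), (4, 5116, 323), (8, 2627, 322)]

for fold, lut, clock in K9_POINTS:
    mbps = folded_throughput_mbps(clock, fold)
    saved = lut_reduction_pct(K9_POINTS[0][1], lut)
    print(f"fold {fold}: {mbps:6.1f} Mbps, {saved:4.1f}% fewer LUTs than fold 1")
```

Fold 2 works out to 175.5 Mbps, which the table lists as 175 Mbps.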
Folding without the clock penalty. On the low-cost Zynq-7010 the half-hardware layout (fold 2) now closes at 160 MHz, the same full-rate clock as the unfolded rate-1/2 decoder, so a folded channel keeps full throughput-per-clock while spending half the add-compare-select logic (1,367 LUT / 1,073 FF). A streaming read pattern (a fixed window over a rotating metric bank) removes the per-step selection that used to hold a time-folded clock about 13 MHz below the full decoder. Bit-exact against the reference, and validated in a full Wi-Fi receiver across every modulation and coding rate, packet after packet with no reset between them. MEASURED Read the case study.
The decoder spans the full convolutional-FEC rate range. Lower rates add redundancy for more coding gain (a larger free distance); the same architecture serves them all at a near-constant clock. Higher rates (2/3, 3/4) are reached by puncturing the rate-1/2 base, so they reuse the rate-1/2 core unchanged.
| Code rate | Free distance | Clock, Zynq-7010 | Clock, UltraScale+ |
|---|---|---|---|
| 1/2 (base code) | 10 | 160 MHz MEASURED | 527 MHz MEASURED |
| 1/3 | 15 | 167 MHz MEASURED | 449 MHz MEASURED |
| 1/4 | 20 | 162 MHz MEASURED | 449 MHz MEASURED |
| 1/5 | 24 | 163 MHz MEASURED | 433 MHz MEASURED |
| 1/6 | 30 | 164 MHz MEASURED | 454 MHz MEASURED |
| 1/7 | 33 | 163 MHz MEASURED | 442 MHz MEASURED |
Higher free distance means stronger error correction: at the same signal level the 1/3 code roughly halves the residual errors of 1/2, and the lower rates push further. Each rate is verified bit-for-bit against the reference decoder and uses zero DSP and two block RAMs. A branch-metric pipeline keeps the deep rates (1/4 and below) at full clock on the low-cost part. The Zynq-7010 deep-rate clocks are for the deployable pipelined configuration; the UltraScale+ figures are single-run place-and-route results, except the rate-1/2 figure of 527 MHz, which is the place-sweep ceiling.
The constraint-length-7, 64-state soft-decision decoder. Higher code rates (2/3, 3/4) are supported by a depuncture front-end, so the core stays rate-agnostic. One decoded bit per clock, continuous streaming.
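A depuncture front-end can be pictured as re-inserting a neutral soft value wherever the transmitter dropped a coded bit, so the rate-1/2 core always sees a full stream. The following is a minimal sketch, not the shipped front-end. The keep-masks are the patterns commonly given for 802.11a-style puncturing and should be checked against the target standard (DVB, for example, uses different patterns). The choice of 0 as the erasure value assumes signed soft decisions.

```python
from itertools import cycle

# Keep-masks over the rate-1/2 output stream in order A1 B1 A2 B2 ...
# 1 = transmitted, 0 = punctured. Assumed 802.11a-style patterns.
PUNCTURE_MASKS = {
    "1/2": [1, 1],
    "2/3": [1, 1, 1, 0],
    "3/4": [1, 1, 1, 0, 0, 1],
}

ERASURE = 0  # assumed neutral value for signed soft decisions ("no information")


def puncture(coded, rate):
    """Transmitter side: drop the coded symbols the mask marks with 0."""
    return [c for c, keep in zip(coded, cycle(PUNCTURE_MASKS[rate])) if keep]


def depuncture(received, rate):
    """Receiver side: put an erasure back at every punctured position."""
    out = []
    symbols = iter(received)
    for keep in cycle(PUNCTURE_MASKS[rate]):
        if keep:
            try:
                out.append(next(symbols))
            except StopIteration:
                break  # all received symbols have been placed
        else:
            out.append(ERASURE)
    return out


coded = list(range(1, 13))            # stand-in soft values, 12 coded symbols
sent = puncture(coded, "3/4")         # 8 symbols go over the air
restored = depuncture(sent, "3/4")    # 12 symbols again, erasures filled in
print(sent)
print(restored)
```

Because an erasure contributes nothing to either branch metric, the decoder itself needs no knowledge of the code rate.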
The constraint-length-9, 256-state code for links that need roughly an extra decibel of coding gain. Generated from the same architecture: the state count is a single parameter, verified bit-exact end to end.
The survivor-path memory ships in three forms: a low-latency register-exchange layout, a uniform-latency windowed layout (2 block RAMs), and a half-block-RAM layout, covering minimum latency, minimum area, and the balanced middle.
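To make the register-exchange idea concrete: each state keeps its own survivor register, and on every add-compare-select step the winning predecessor's register is copied and extended by one decision bit, so no traceback pass is needed. The sketch below is a plain software model under stated assumptions, not the generated RTL. It assumes the standard K=7 generator pair 133/171 (octal), soft values where positive means "coded bit 1", and it keeps full-length registers, whereas a hardware layout would truncate them to a fixed depth.

```python
K = 7                      # constraint length
POLYS = (0o133, 0o171)     # generator polynomials in octal (assumed standard pair)
N_STATES = 1 << (K - 1)    # 64 states


def branch_outputs(state, bit):
    """Coded bit pair emitted when `bit` enters the encoder in `state`."""
    reg = (bit << (K - 1)) | state            # newest bit sits in the MSB
    return tuple(bin(reg & g).count("1") & 1 for g in POLYS)


def encode(bits):
    state, out = 0, []
    for b in bits:
        out.extend(branch_outputs(state, b))
        state = ((b << (K - 1)) | state) >> 1
    return out


def decode_register_exchange(soft):
    """Maximise correlation; positive soft value = coded bit 1 more likely."""
    neg_inf = float("-inf")
    metric = [0.0] + [neg_inf] * (N_STATES - 1)   # encoder starts in state 0
    regs = [[] for _ in range(N_STATES)]          # one survivor register per state
    for i in range(0, len(soft) - 1, 2):
        pair = soft[i:i + 2]
        new_metric = [neg_inf] * N_STATES
        new_regs = [None] * N_STATES
        for ns in range(N_STATES):
            bit = ns >> (K - 2)                   # input bit that leads into ns
            low = (ns & ((1 << (K - 2)) - 1)) << 1
            best, best_prev = neg_inf, low
            for prev in (low, low | 1):           # the two predecessor states
                expected = branch_outputs(prev, bit)
                m = metric[prev] + sum(s if e else -s for s, e in zip(pair, expected))
                if m > best:                      # compare-select
                    best, best_prev = m, prev
            new_metric[ns] = best
            new_regs[ns] = regs[best_prev] + [bit]   # the register exchange
        metric, regs = new_metric, new_regs
    winner = max(range(N_STATES), key=metric.__getitem__)
    return regs[winner]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
frame = message + [0] * (K - 1)                   # zero tail bits
soft = [7 if c else -7 for c in encode(frame)]
for idx in (10, 31):                              # two weak, wrong-signed symbols
    soft[idx] = -3 if soft[idx] > 0 else 3
decoded = decode_register_exchange(soft)
print(decoded == frame)
```

The copy on every step is what makes this layout low-latency but register-heavy, which is the trade the windowed block-RAM layouts make in the other direction.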
One cold start, then the decoder runs a continuous packet stream with no reset between frames. It re-anchors on each frame's tail bits on its own. It also tolerates idle gaps between packets, freezing and resuming with no loss. Simpler to integrate: no per-frame init handshake.
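One way to see why tail bits allow this: K-1 zero bits flush the encoder's shift register, so every frame ends in the all-zero state whatever its payload, and the next frame starts from a known state. The sketch below illustrates that property only. It assumes frames end with K-1 zero tail bits (as 802.11a frames do) and does not model how the core itself detects or uses them.

```python
K = 7
TAIL = [0] * (K - 1)   # assumed: K-1 zero tail bits close every frame


def final_state(bits, state=0):
    """Encoder state after shifting in `bits` (newest bit enters at the MSB)."""
    for b in bits:
        state = ((b << (K - 1)) | state) >> 1
    return state


frame_a = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
frame_b = [0, 1, 1, 1, 1, 0, 1]

state = final_state(frame_a + TAIL)            # back in state 0 after frame A
state = final_state(frame_b + TAIL, state)     # and again after frame B
print(final_state(frame_a), final_state(frame_a + TAIL), state)
```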
The longer constraint length gives the K=9 code a larger free distance, so its error-rate waterfall is steeper: as the signal improves, the K=9 decoder's margin over K=7 widens. Measured over matched traffic, rate-1/2 BPSK on an AWGN channel with 4-bit soft decisions:
| Eb/N0 | K=7 bit-error rate | K=9 bit-error rate | K=9 advantage |
|---|---|---|---|
| 2.0 dB | 1.1×10⁻² | 6.2×10⁻³ | 1.8× fewer errors MEASURED |
| 2.5 dB | 3.1×10⁻³ | 1.2×10⁻³ | 2.7× fewer errors MEASURED |
Both codes already beat uncoded transmission by a wide margin: the classic rate-1/2 waterfall. K=9 then roughly halves K=7's residual errors at 2 dB, and does better still as the channel improves: the extra coding gain its constraint length is meant to deliver, confirmed bit by bit. Need a different constraint length, polynomial, soft-bit width, or traceback depth? See IP customization.
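For a sense of the uncoded baseline, the textbook bit-error rate of uncoded BPSK on an AWGN channel is 0.5·erfc(√(Eb/N0)). The sketch below evaluates that formula at the two measured Eb/N0 points and compares it with the table's K=7 and K=9 figures. The uncoded values are theoretical, not measurements from this decoder's test bench.

```python
import math


def uncoded_bpsk_ber(ebn0_db: float) -> float:
    """Theoretical uncoded BPSK bit-error rate on AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


# Measured coded BERs copied from the table: Eb/N0 in dB -> (K=7, K=9).
MEASURED = {2.0: (1.1e-2, 6.2e-3), 2.5: (3.1e-3, 1.2e-3)}

for ebn0_db, (k7, k9) in MEASURED.items():
    uncoded = uncoded_bpsk_ber(ebn0_db)
    print(f"{ebn0_db} dB: uncoded {uncoded:.1e}, "
          f"K=7 is {uncoded / k7:.1f}x lower, K=9 is {uncoded / k9:.1f}x lower")
```

At these signal levels the uncoded error rate is roughly 3 to 4 percent, so both coded results sit well below it.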
Every decoded bit is checked against a reference model and a cycle-accurate model before it is checked in hardware. The clock and resource figures here are post-route Vivado results on the stated devices, not estimates. The full RTL micro-architecture and the measured trade space are in the technical report.