CASE STUDY · 5G NR SSB RECEIVER

An AI-generated 5G receiver that reads real cells

A 5G NR cell-search receiver went from a reference design to FPGA hardware that no one wrote by hand. The convincing test: point it at a real signal captured off the air, and it decodes the same broadcast message from all three cells in the capture.

AlgoSilicon engineering · 2026

When a handset powers on, the first thing it does is find a base station. It does not know which cells are nearby or what their identities are, so it scans the spectrum for a small synchronization block each cell transmits periodically. In 5G that block is the SSB: it bundles the primary and secondary synchronization signals and the broadcast channel across four OFDM symbols and 240 subcarriers.
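To make the block's shape concrete, here is a minimal sketch of the SSB resource grid. The subcarrier ranges and the DM-RS comb are assumptions taken from a reading of 3GPP TS 38.211, not from this article, and `ssb_layout` and `count` are hypothetical helper names.

```python
# Sketch of the SSB grid: 4 OFDM symbols x 240 subcarriers.
# Index ranges are assumed from 3GPP TS 38.211, not from the article.
N_SYMBOLS = 4
N_SUBCARRIERS = 240

def ssb_layout(cell_id):
    """Return a 4 x 240 grid of labels: 'PSS', 'SSS', 'PBCH', 'DMRS', or None."""
    grid = [[None] * N_SUBCARRIERS for _ in range(N_SYMBOLS)]
    v = cell_id % 4  # the DM-RS comb offset depends on the cell identity
    for k in range(56, 183):  # 127 subcarriers in the middle of the block
        grid[0][k] = "PSS"
        grid[2][k] = "SSS"
    pbch_cells = [(1, k) for k in range(240)] + [(3, k) for k in range(240)]
    pbch_cells += [(2, k) for k in list(range(48)) + list(range(192, 240))]
    for sym, k in pbch_cells:
        grid[sym][k] = "DMRS" if k % 4 == v else "PBCH"
    return grid

def count(grid, label):
    """Count the resource elements carrying a given label."""
    return sum(row.count(label) for row in grid)

grid = ssb_layout(cell_id=0)
pss, sss = count(grid, "PSS"), count(grid, "SSS")
dmrs, pbch = count(grid, "DMRS"), count(grid, "PBCH")
pbch_bits = 2 * pbch  # QPSK carries two bits per resource element
print(pss, sss, dmrs, pbch, pbch_bits)
```

Under these assumptions the broadcast channel occupies 432 data resource elements, which is why a low-rate code has room to protect a short message.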

Re-implementing that cell-search receiver on an FPGA by hand is tedious enough that such a project tends to sit on the shelf. Instead we pointed our algorithm-to-silicon flow at the reference design and let it generate the receiver directly, targeting the Kria KV260. Then we fed it a real 5G downlink captured off the air: a signal with 30 kHz subcarrier spacing that carries three cells.

Generated in layers, checked at every layer

The flow does not write hardware in one leap. It builds the receiver in three layers and has a machine check each layer against the one above it: a floating-point golden model whose output matches the 3GPP definitions and the MATLAB 5G Toolbox; a cycle-accurate fixed-point model checked bit-for-bit against the golden model; and the Verilog RTL, checked bit-for-bit against the cycle model. Every adjacent pair of layers is reconciled by machine, not by assertion.
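The layer-by-layer check can be sketched as a comparison of integer output streams between each adjacent pair of layers. This is only an illustration: the function names and toy streams are hypothetical, and it assumes the golden model's output has already been quantized to the same integer format as the layers below it.

```python
def first_mismatch(reference, candidate):
    """Return None if two integer streams match exactly, else (index, ref, got)."""
    if len(reference) != len(candidate):
        # A length difference is reported at the first missing position.
        return (min(len(reference), len(candidate)), None, None)
    for i, (ref, got) in enumerate(zip(reference, candidate)):
        if ref != got:
            return (i, ref, got)
    return None

def check_layers(layers):
    """layers: list of (name, stream), ordered from the golden model down to RTL."""
    report = []
    for (upper_name, upper), (lower_name, lower) in zip(layers, layers[1:]):
        report.append((upper_name, lower_name, first_mismatch(upper, lower)))
    return report

# Toy streams (made up): the last layer deliberately differs at index 2.
layers = [
    ("golden (quantized)", [3, -7, 12, 0]),
    ("cycle model", [3, -7, 12, 0]),
    ("rtl", [3, -7, 11, 0]),
]
report = check_layers(layers)
for upper_name, lower_name, mismatch in report:
    print(upper_name, "vs", lower_name, "->", "match" if mismatch is None else mismatch)
```

Comparing only adjacent layers keeps each failure local: a mismatch points at the one translation step that introduced it.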

What the receiver does

The input is a stream of baseband I/Q samples. Turning that into a cell identity and a decoded broadcast message takes four stages. Because the receiver does not know at power-on which of the three primary synchronization sequences the cell transmits, it correlates against all three candidates at once. Each is a complex matched filter, and a complex matched filter can be factored into three real sub-correlations, so the front end runs nine folded correlators in parallel to find the synchronization peak. An energy measure and a peak finder then cut the block out, a 256-point FFT recovers the 240 subcarriers, and the broadcast decode recovers the message.
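One standard way to get a complex correlation from three real ones is Gauss's three-multiplication trick. The sketch below assumes that factoring; the article does not say which factoring the generated hardware uses, and the function names and toy sequences are hypothetical.

```python
def real_correlate(samples, taps):
    """Sliding real correlation: out[n] = sum_k samples[n + k] * taps[k]."""
    span = len(samples) - len(taps) + 1
    return [sum(samples[n + k] * taps[k] for k in range(len(taps))) for n in range(span)]

def complex_correlate_3(samples, reference):
    """Complex matched filter built from three real correlations (Gauss's trick)."""
    xr = [s.real for s in samples]
    xi = [s.imag for s in samples]
    cr = [r.real for r in reference]   # conj(reference) = cr + j*ci
    ci = [-r.imag for r in reference]
    k1 = real_correlate([a + b for a, b in zip(xr, xi)], cr)
    k2 = real_correlate(xr, [d - c for c, d in zip(cr, ci)])
    k3 = real_correlate(xi, [c + d for c, d in zip(cr, ci)])
    # real part = k1 - k3, imaginary part = k1 + k2
    return [complex(p - r, p + q) for p, q, r in zip(k1, k2, k3)]

def complex_correlate_direct(samples, reference):
    """Reference implementation using ordinary complex multiplies."""
    span = len(samples) - len(reference) + 1
    return [sum(samples[n + k] * reference[k].conjugate() for k in range(len(reference)))
            for n in range(span)]

# Toy data: the reference sequence is embedded at offset 2.
reference = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
samples = [0j, 2 + 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 3 - 2j, 0j]
fast = complex_correlate_3(samples, reference)
direct = complex_correlate_direct(samples, reference)
peak = max(range(len(fast)), key=lambda n: abs(fast[n]))
N_CORRELATORS = 3 * 3  # three candidates, three real correlations each
print(peak, N_CORRELATORS)
```

The trick trades one multiplier bank for a few extra additions, which is usually a good exchange when multipliers are the scarce resource.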

The receiver chain, split at the narrowest interface across two compute tiers. Detection, extraction, and the 256-point FFT run at the line rate on the KV260 fabric; the control-heavy broadcast decode (equalization, polar decode, and MIB parsing) runs on the host, with the fixed-point spectrum as the feed-forward hand-off.

Split across two tiers so it runs in real time

The chain is not one monolithic block on the FPGA. Detection and demodulation are fixed-rate streaming work that moves every clock, which suits the fabric. The broadcast decode is control-heavy: it searches 336 secondary-synchronization candidates and runs a polar list decoder, work that is cheaper on a CPU. So the design is cut at the narrowest interface in the chain, the recovered spectrum, and each half runs where it is efficient. The front end emits a 16-bit fixed-point spectrum; the host still decodes the broadcast message with a clean CRC, because the pilot equalization and the polar code carry enough coding gain to absorb the quantization noise.
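To give a feel for how small 16-bit quantization noise is, the sketch below quantizes a toy spectrum and measures the signal-to-quantization-noise ratio. The Q1.15 scaling, the function names, and the synthetic spectrum are all assumptions; the article states only that the hand-off is 16-bit fixed point.

```python
import cmath
import math

FULL_SCALE = 32768  # assumed Q1.15 scaling: 1.0 maps to 2**15

def quantize_q15(value):
    """Round a real value to a signed 16-bit integer, saturating at the rails."""
    scaled = int(round(value * FULL_SCALE))
    return max(-32768, min(32767, scaled))

def quantize_spectrum(spectrum):
    """Quantize each complex bin to a pair of signed 16-bit integers."""
    return [(quantize_q15(z.real), quantize_q15(z.imag)) for z in spectrum]

def sqnr_db(spectrum, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(abs(z) ** 2 for z in spectrum)
    noise = sum(abs(z - complex(re, im) / FULL_SCALE) ** 2
                for z, (re, im) in zip(spectrum, quantized))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

# Synthetic 240-bin spectrum at half scale with a slowly rotating phase.
spectrum = [0.5 * cmath.exp(1j * 0.1 * k) for k in range(240)]
quantized = quantize_spectrum(spectrum)
print(round(sqnr_db(spectrum, quantized), 1))
```

With a well-scaled signal, the quantization noise sits far below the channel noise levels at which the broadcast decode operates, which is consistent with the hand-off not costing any decodes.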

The deployment datapath on the KV260, redrawn from the real Vivado integration for legibility: host processor and DDR, through the DMA movers and the AXIS switch, to the generated receiver front-end on the fabric, with samples going down and spectra coming back up.

Read from three real cells off the air

Validation runs two ways. First, against the MATLAB 5G Toolbox as an independent judge: thousands of standard-compliant blocks, each carrying a known cell identity and broadcast message, are decoded and compared bit for bit. Second, against the real captured signal: the receiver decoded one consistent broadcast message from all three cells in the capture, with system frame number 788 and 30 kHz subcarrier spacing for each. That consistency is itself the check, because cells synchronized in one area should broadcast the same message.

The work also corrected an error in the reference script, which parsed the scrambled payload without descrambling it and so produced a wrong frame number that differed from cell to cell. Adding the descrambling and de-interleaving that the standard specifies is what makes the three cells agree.
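The descrambling step can be illustrated with the length-31 Gold sequence that 3GPP TS 38.211 defines for NR scrambling. This is a simplified sketch: it seeds the sequence with the cell identity and omits the sequence offsets and the de-interleaving that the full PBCH procedure applies, and the payload, cell identities, and function names are hypothetical.

```python
def gold_sequence(c_init, length, nc=1600):
    """Length-31 Gold sequence c(n), following 3GPP TS 38.211 section 5.2.1."""
    total = nc + length
    x1 = [0] * (total + 31)
    x2 = [0] * (total + 31)
    x1[0] = 1  # fixed initial state of the first m-sequence
    for i in range(31):
        x2[i] = (c_init >> i) & 1  # second m-sequence is seeded by c_init
    for n in range(total):
        x1[n + 31] = (x1[n + 3] + x1[n]) % 2
        x2[n + 31] = (x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2
    return [(x1[n + nc] + x2[n + nc]) % 2 for n in range(length)]

def scramble(bits, cell_id):
    """XOR bits with the cell-specific sequence; the same call descrambles."""
    sequence = gold_sequence(cell_id, len(bits))
    return [b ^ c for b, c in zip(bits, sequence)]

# Hypothetical payload and cell identities, for illustration only.
payload = [1, 0, 1, 1, 0, 0, 1, 0] * 4
scrambled = scramble(payload, cell_id=123)
recovered = scramble(scrambled, cell_id=123)
wrong_cell = scramble(scrambled, cell_id=124)
print(recovered == payload, wrong_cell == payload)
```

Because the sequence depends on the cell identity, bits parsed without descrambling, or descrambled with the wrong identity, differ from cell to cell, which matches the symptom described above.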

How weak a signal it still decodes

The real capture is one operating point. To measure robustness, 1,460 diverse vectors from the MATLAB toolbox cover all 1,008 cell identities, every SSB position, arbitrary broadcast messages, and two subcarrier spacings, plus a noise sweep, carrier offset, and frequency-selective fading. On a clean channel, 300 of 300 random configurations decode bit-exact. Under the noise sweep the block error rate is zero down to an SNR of -2 dB per resource element and only starts to climb near -6 dB, which is the low-SNR regime a low-rate broadcast channel is designed to reach.
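A block-error-rate curve like this one is a tally of decode outcomes per SNR point. The sketch below shows that bookkeeping; the records are toy values chosen for illustration, not the measured sweep, and the function names are hypothetical.

```python
from collections import defaultdict

def bler_by_snr(results):
    """results: iterable of (snr_db, decoded_ok). Returns {snr_db: block error rate}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for snr_db, ok in results:
        totals[snr_db] += 1
        errors[snr_db] += 0 if ok else 1
    return {snr: errors[snr] / totals[snr] for snr in sorted(totals)}

def lowest_error_free_snr(bler):
    """Lowest SNR such that it and every higher SNR point show zero block errors."""
    best = None
    for snr in sorted(bler, reverse=True):
        if bler[snr] > 0:
            break
        best = snr
    return best

# Toy records (made up): four blocks at each of three SNR points.
results = [(3, True)] * 4 + [(0, True)] * 4 + [(-3, True)] * 2 + [(-3, False)] * 2
bler = bler_by_snr(results)
threshold = lowest_error_free_snr(bler)
print(bler, threshold)
```

A zero measured error rate only bounds the true rate by the number of blocks tested at that point, so the vector count per SNR step matters when reading the curve.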

Block error rate versus SNR per resource element, measured over the vector sweep: zero errors down to -2 dB, climbing near -6 dB. The broadcast decode holds error-free down to the coding limit before it degrades.

Leaner than the commercial reference

The nine correlators are the algorithmically required cost of blind detection, and they are where the resources go. Each one is generated as the family's most efficient folded FIR, so on the same part the detector uses 38% fewer look-up tables and 75% fewer flip-flops than the commercial high-level-synthesis reference, with one block RAM against seven. The full receiver, detector plus demodulation, closes place-and-route at 256 MHz on the KV260, and the deployed overlay meets timing at the 200 MHz link rate.

Detector resources on the same part, from the place-and-route report: this design versus the commercial HLS reference, with 38% fewer LUTs, 75% fewer FFs, and one block RAM against seven.

On real silicon, the reference signal was streamed through the KV260 front end over DMA, and the computed spectra were read back and compared with the reference model: 2,540 beats, zero mismatches.

See it run on real captured signals

The analyzer below replays a stream of diverse synchronization blocks through the same host decode chain that runs on the KV260 spectra, with no server and no radio attached. As the channel worsens the constellation spreads from a tight QPSK to a noise cloud, the signal-quality history climbs, and the broadcast decode holds bit-exact down to the coding limit before it finally fails.

The live signal-quality analyzer replaying the SNR sweep: the PBCH constellation spreading from a tight QPSK to a noise cloud, the EVM and SNR readout, the decoded Master Information Block, and the decode stream.

Why this is repeatable

This was a 5G receiver, but nothing in the method is specific to 5G. From the standard definition, to a fixed-point model with timing, to the generated circuit, each layer is bit-equivalent to the one above it, and the equivalence is checked by machine. Running the front end on the fabric and the decode on the host is a cut at the narrowest interface in the chain, a split-by-compute idea that holds for any long, uneven signal-processing pipeline. And the leaner resource number is not hand-tuned: the flow maps each correlator to its most efficient hardware structure, which is why it comes in below what a commercial tool synthesizes. What carries over is the method, not this one receiver.
