

#### RFNoC<sup>™</sup> Deep Dive: FPGA Side Jonathon Pendlum 5/28/2014



# Outline

Ettus Research<sup>TM</sup> A National Instruments Company

- Getting started with RFNoC FPGA
- Anatomy of a Computation Engine
- FFT RFNoC Block walkthrough
- Design Tips

### Hardware / Software Prerequisites

- Hardware:
  - PC or Laptop, decent amount of RAM (4+ GB)
  - USRP X300, X310, or E310
  - Xilinx JTAG debugger module (optional)
    - Only needed for E310, X3x0 series has built in USB JTAG
- Software:
  - Linux based OS
  - Xilinx Vivado 2014.4
    - E310 can use free Webpack version
  - Modelsim (optional)
  - UHD
  - GNU Radio (optional, but highly recommended)
    - gr-ettus





**Ettus** 

### **FPGA Directory Structure**

- Ettus Researc
- usrp3 -- Third gen devices (X300, E310, B200)
  - **tools** -- Build infrastructure software
  - sim -- Simulations
  - lib -- HDL shared between devices
    - **ip** -- IP cores shared between devices
    - rfnoc -- NoC Shell, AXI Wrapper, NoC blocks, basic blocks
  - top -- Device specific files & toplevel
    - **x300** 
      - **ip** -- Device specific IP cores
      - x300.v
      - rfnoc\_ce\_auto\_inst\_x300.v
      - Makefile

# **Building RFNoC FPGA Images**

Ettus

- Make sure Vivado 2014.4 installed
  - Helpful if installed in default directory (in /opt/Xilinx)
- Go to toplevel directory
  - usrp3/top/x300, e300
- source setupenv.sh
  - Sets up Xilinx tools
- make X310\_HGS\_RFNOC (or E310\_RFNOC)
  - To launch Vivado GUI: make GUI=1 X310\_HGS\_RFNOC
- Build takes about an hour

# **Building RFNoC FPGA Images**



- Output in build directory
  - usrp\_x310\_fpga\_HGS\_RFNOC.bit
  - Includes a report file with utilization / timing info
- Program bitfile with usrp\_x3xx\_fpga\_burner
- make clean
- make cleanall -- also removes build-ip



Ο

Researc

 $\bigcirc$ 

**Ettus** 



Ο

Researc

 $\bigcirc$ 

**Ettus** 



Ο

Researc

**Ettus** 

### **CHDR Packet Protocol**

Ettus

 $\bigcirc$ 

| Pkt<br>Type                              | Has<br>Time | EOB | Seq #   | Length (in bytes) | SRC SID | DST SID |  |  |  |
|------------------------------------------|-------------|-----|---------|-------------------|---------|---------|--|--|--|
| 63 - 62                                  | 61          | 60  | 59 - 48 | 47 - 32           | 31 - 16 | 15 - 0  |  |  |  |
| Fractional Time (Optional, Has Time = 1) |             |     |         |                   |         |         |  |  |  |
| Payload                                  |             |     |         |                   |         |         |  |  |  |
|                                          |             |     |         |                   |         |         |  |  |  |

#### Packet type based on bits 63, 62, & 60

| 63 | 62 | 60 | Packet Type         |  |
|----|----|----|---------------------|--|
| 0  | 0  | 0  | Data                |  |
| 0  | 0  | 1  | Data (End of Burst) |  |
| 0  | 1  | 0  | Flow Control        |  |
| 1  | 0  | 0  | Command             |  |
| 1  | 1  | 0  | Response            |  |
| 1  | 1  | 1  | Response (Error)    |  |

### **Stream IDs**



| SID      |          |       |  |  |  |  |
|----------|----------|-------|--|--|--|--|
| Crossbar | Crossbar | Block |  |  |  |  |
|          | Port     | Port  |  |  |  |  |
| 15 - 8   | 7 - 4    | 3 - 0 |  |  |  |  |

- 16 bit Stream ID
  - 256 unique crossbar (or device) IDs
  - 16 ports per crossbar
  - 16 ports per block
- Example Crossbar ID: 2, Port: 1, Block Port: 0
   SID: 2.1.0



Ο

Researc

A National Instruments

**Ettus** 

0 4

Ettus

 $\bigcirc$ 



Ettus

 $\bigcirc$ 

A National Instruments Company



• Research

**Ettus** 



Ettus Research

 $\bigcirc$ 



Ettus



Ettus

 $\bigcirc$ 





Ο

Researc

 $\bigcirc$ 

**Ettus** 



Ο

Researc

**Ettus** 













Ο

Researc

**Ettus** 

Ó  $\diamond$ 

### **Register Space**



- Settings Bus 8-bit address = 256 regs
- NoC Shell Regs: 0 127
  - 0 63: Flow control (4 per block port, x16)
  - 127: NoC Shell readback address
- NoC Shell Readback Regs:
  - 0: NoC ID
  - 1: Window Size
- User Regs: 128 255
  - 128: Next destination (AXI Wrapper, set by software)
  - 129: Control data (tlast not asserted)
  - 130: Control data (tlast asserted)
  - 255: User readback address



Ο

Researc

**Ettus** 

Ó  $\diamond$ 

# **Basic Building Blocks**

Ettus

- Many basic functions already written
  - All located in usrp3/lib/rfnoc
- AXI-Stream interfaces
- Notable blocks:
  - Muliplication: mult, mult\_add, cmult, mult\_rc
  - Add: cadd
  - Clipping / Rounding: axi\_round, axi\_clip
  - FIFO: axi\_fifo
  - Delay: delay, delay\_type2, delay\_type3
  - Utility: split\_stream, axi\_join, packet\_resizer

# **Existing RFNoC blocks**

#### Ettus A National Instruments Company

#### Function:

- FFT, Windowing, FIR, Vector Averaging
- AddSub Block ports
- Utility:
  - Packet resizer, Split stream, Keep one in N, FIFO
  - Null source and sink
- In development:
  - OFDM receiver & transmitter, Polyphase filter bank
  - Radio lite, separate DSP block

# **NoC Block FFT Walkthrough**

Ettus Research<sup>m</sup> A National Instruments Company

- usrp3/lib/rfnoc/noc\_block\_fft.v
- Based on Xilinx AXI-Stream FFT core
- FFT size 32 4096
  - Up to 64K
- Configurable output
  - Complex, Magnitude, Magnitude Squared
- Built in FFT shift
- No bubble states, i.e. throughput = clock rate



Ο



Ο



0



Ο



Ο

#### **NoC Block FFT**



Ο

#### **NoC Block FFT**



Ο

#### **NoC Block FFT**



Ο

#### **RFNoC Simlib**

**AXI Crossbar** rfnoc\_sim\_lib.v DUT **NoC Shell** Computation SendAxi RecvAxi Engine Task Task **AXI Wrapper** Testbench

**Ettus** 

0 4

- Sets up crossbar and NoC Shell for testbench
- Provides useful tasks such as sending packets

#### **RFNoC Simlib**



0 9 0

Research

 $\bigcirc$ 

A National Instruments C

Ettus

Ó

 $\cap$ 

60

# **Adding Computation Engines**

- Ettus
- How do we add CEs to the build system?
  - 1. Place new CE in usrp3/lib/rfnoc
  - 2. Add block(s) to Makefile.inc
  - 3. Edit rfnoc\_auto\_ce\_inst\_x300.v
    - Or rfnoc\_auto\_ce\_inst\_x310.v, rfnoc\_auto\_ce\_inst\_e310.v
  - 4. Run make X300\_HGS\_RFNOC
    - 1. Or X310\_HGS\_RFNOC, E310\_RFNOC

## **RFNoC FPGA Design Tips**

- Ettus
- Avoid bubble cycles to improve throughput
- When splitting an AXI stream, use FIFOs (i.e. split\_stream\_fifo.v)
  - Prevents back up in one stream from blocking another
  - May need to adjust FIFO depth
- When combining streams, make sure to account for different path latency
  - Inserting a FIFO on the shorter path can prevent bubble cycles (waiting for data to "catch up")
  - Manual intervention, best done in simulation

# **RFNoC FPGA Design Tips**

- Be mindful of dead lock situations
  - Mostly avoided by carefully following "do not deassert tvalid without tready" rule
- Watch out for tready, tvalid combinatorial paths

- Subtle source of timing issues (when chaining multiple blocks)
- Consider inserting register: axi\_fifo.v SIZE=0
- Ignore output without a tvalid (when debugging)
- Crossbar statistics tool: firmware/usrp3/x300/x300\_debug.py

# Appendix



#### **Flow Control**

- Fine and course grained flow control
- Fine grained: AXI-Stream
- Course grained: Credit based flow control
  - Every endpoint (or consumer) has a receive window

- Every source (or producer) knows window size
- Producers send packets until consumer window is full
- Consumer notifies (acks) producer as window empties
  - i.e. gives credits back to producer
- Some similarities to TCP flow control

## **Flow Control Cont.**

Ettus Research<sup>TM</sup> A National Instruments Company

- A few additional design rules
  - 1. One consumer per producer
    - Otherwise very difficult to keep track of window
    - Use block ports to allow a CE to receive multiple streams
  - 2. Producers must buffer entire packet before releasing
    - Prevents deadlock in crossbar
    - Prevents slow packets from causing congestion
  - 3. All routing based on first line of CHDR header
    - Higher performance
  - 4. Samples / data always dropped at producer
    - Deterministic -- ensures no lost packets in the middle
    - Single point to restart stream

#### **AXI Stream**



- Part of ARM AMBA standard
- Simple handshake protocol:
  - Upstream block asserts tvalid
  - Downstream block asserts tready
  - Data is consumed when tvalid & tready == 1
  - tlast used to delimit packets
- Once tvalid is asserted, it cannot be deasserted without at least one tready cycle
- Why is AXI Stream so useful?
  - No need for complicated strobes data flows through

## **Settings Bus**

- Common bus in USRP FPGA designs
- Implements control & status registers
- set\_addr, set\_data, set\_stb, rb\_data
  - Why no rb\_addr? It is a control register
- Control packet payload sets addr & data
  - [63:0] = {24'd0, set\_addr[7:0], set\_data[31:0]}

- Response packet payload has readback data
  - All control packets receive a response packet
  - Usually write readback address control register to read a status register

# **Block Ports**

- CEs support up to 16 block ports
- Allows CEs to accept multiple streams
  - Time share a function (i.e. FFT)
  - Use data from multiple sources (i.e. addsub block)
- Each block port has dedicated NoC shell output
  - Sample data packets only (str\_src\_t\*, str\_sink\_t\*)
    - Bit width widened to 64\*N where N = number of block ports

- Individual sample data FIFOs for each block port
- Command packets share interface (cmdout\_t\*, ackin\_t\*)
- Resource utilization can increase quickly
- Shared input bus
  - Does not allow receiving two packets simultaneously

#### **Generating IP with Vivado**



# Demo

# **Adding IP cores**



- Find \*.xci file in build directory
  - Example: build-E310/project\_1/project\_1.srcs/sources\_1/ ip/xfft\_0/xfft\_0.xci
- Make IP specific directory in *usrp3/lib/ip* and add \*.xci file
  - Example: usrp3/lib/ip/xfft\_0/xfft\_0.xci
- Edit usrp3/lib/ip/Makefile.inc
- Add IP specific Makefile.inc in IP subdirectory
  - Use existing Makefile.inc as template
  - Example: usrp3/lib/ip/xfft\_0/Makefile.inc
- Note: We are likely to change this approach to make adding "OOT" RFNoC modules easier

# **Simulating our Custom Block**

- Modelsim simulation infrastructure
  - Tested with ModelSim SE on Linux
- Makefile based easy to add new testbenches

- Example testbenches in usrp3/lib/rfnoc
  - rfnoc\_fir\_filter\_tb
  - rfnoc\_fft\_vector\_iir\_tb
- Note: Big changes coming soon
  - More generic infrastructure
  - Support for Vivado simulator

## In system debugging

- Vivado ILA: Integrated Logic Analyzer
  - Real time logic analyzer inserted into design

**Ettus** 

- Integrated into Vivado GUI
- Uses logic resources (Slices, BRAM)
- Multiple ways to instantiate, see Xilinx docs
- Easiest to mark nets & registers for debug in HDL
  - Verilog:

     (\* keep = "true", dont\_touch = "true", mark\_debug = "true" \*) wire a;
     (\* keep = "true", dont\_touch = "true", mark\_debug = "true" \*) reg b;
  - VHDL:

ATTRIBUTE MARK\_DEBUG : string; ATTRIBUTE MARK\_DEBUG of a : SIGNAL IS "TRUE";