Difference between revisions of "RFNoC Frequently Asked Questions"

From Ettus Knowledge Base
Jump to: navigation, search
(Add DRAM throughput numbers.)
m
Line 49: Line 49:
 
|}
 
|}
  
=== What data rates can I expect on each USRP? ===
+
=== What DRAM data rates can I expect on each USRP? ===
  
 
DRAM performance is highly application-specific. For example, reading vs. reading and writing simultaneously, one data stream vs. multiple data streams, random access vs. sequential access, etc., can give dramatically different performance. Below are some measurements taken on different USRPs where a Null-Source-Sink RFNoC block is directly connected to a DMA FIFO block to test maximum streaming rates through the DRAM. The DRAM is shared between channels, so throughput goes down as the number of channels going through the DRAM is increased.
 
DRAM performance is highly application-specific. For example, reading vs. reading and writing simultaneously, one data stream vs. multiple data streams, random access vs. sequential access, etc., can give dramatically different performance. Below are some measurements taken on different USRPs where a Null-Source-Sink RFNoC block is directly connected to a DMA FIFO block to test maximum streaming rates through the DRAM. The DRAM is shared between channels, so throughput goes down as the number of channels going through the DRAM is increased.

Revision as of 16:12, 20 April 2022

Configuring the Stream Endpoint Buffer Size in RFNoC

What is the SEP buffer size?

Each stream endpoint (SEP) has an ingress buffer to store data received from others stream endpoints. This size of this buffer affects the data transfer rate that can be achieved when streaming to that endpoint. A larger ingress buffer in the stream endpoint means that there is more space to put data, minimizing idle time on the network. Additionally, streamers can queue up data before it is needed, reducing the chance of a buffer underflow.

How do I set the SEP buffer size?

The stream endpoint buffer size is set by adding a parameter under the endpoint you want to configure in the RFNoC image core YAML file. There are two parameters you can use to set the stream endpoint ingress buffer size in your RFNoC image core YAML file.

  • buff_size: Buffer size in CHDR words. The size in bytes depends on the CHDR width. For example, if the chdr_width parameter for the device is 64, then each CHDR word is 8 bytes. So a buff size of 32768 would be 262,144 bytes or 256 KiB. See here for an example.
  • buff_size_bytes: Buffer size in bytes. See here for an example.

To what value should I set the SEP buffer size?

The buffer size should be a power of two in size to make optimal use of FPGA RAM resources. The default FPGA bitstreams typically set them to the largest size the FPGA can fit in order to maximize performance. Here are some general recommendations:

  • Set to 0 if you don't need to send data to that SEP.
  • Set to 8192 bytes (8 KiB = 1 MTU) minimum in order to stream data packets.
  • Set to 32768 bytes (32 KiB = 4 MTU) in order to stream at maximum rates between SEPs on the same FPGA.
  • Set to 262144 bytes (256 KiB = 32 MTU) or lager for high performance streaming between a host computer and the FPGA.

Note that the requirements are application-dependent, so optimal sizes for your application may be different. MTU refers to the maximum transmission unit, which is the largest CHDR packet supported by the FPGA.

If you need to free up FPGA resources (particularly block RAM) for your application, you can reduce the SEP buffer sizes. Just keep in mind that the maximum streaming rate may be affected.

USRP DRAM

How much and what speed DRAM is available on each USRP?

The table below summarizes the DRAM that is connected to the USRP for use by RFNoC.

USRP DRAM Summary
USRP Model DRAM Size Default DRAM Speed Default User Interface
E31x 512 MiB 16-bit @ 800 MT/s (1.6 GB/s) 2 ch x 64-bit @ 100 MHz
E320 2 GiB 32-bit @ 1333 MT/s (5.33 GB/s) 4 ch x 64-bit @ 300 MHz
N3xx 2 GiB 32-bit @ 1300 MT/s (5.2 GB/s) 4 ch x 64-bit @ 303.819 MHz
X31x 1 GiB 32-bit @ 1200 MT/s (4.8 GB/s) 2 ch x 64-bit @ 300 MHz
X410 (100 and 200 MHz BW) 4 GiB 64-bit @ 2.0 GT/s (16.0 GB/s) 4 x 64-bit @ 250 MHz
X410 (400 MHz BW) 4 GiB per bank
(8 GiB total)
64-bit @ 2.0 GT/s (16.0 GB/s) per bank
(32.0 GB/s total)
4 x 128-bit @ 250 MHz (using 2 banks)

What DRAM data rates can I expect on each USRP?

DRAM performance is highly application-specific. For example, reading vs. reading and writing simultaneously, one data stream vs. multiple data streams, random access vs. sequential access, etc., can give dramatically different performance. Below are some measurements taken on different USRPs where a Null-Source-Sink RFNoC block is directly connected to a DMA FIFO block to test maximum streaming rates through the DRAM. The DRAM is shared between channels, so throughput goes down as the number of channels going through the DRAM is increased.

Example DRAM Throughput
USRP Model BIST (MB/s) 1 Ch (MS/s) 2 Ch (MS/s) 3 Ch (MS/s) 4 Ch (MS/s)
E31x 666 166 91 N/A N/A
E320 1361 340 170 113 85
N3xx 1368 341 295 191 144
X31x 1347 336 115 N/A N/A
X410 (64-bit) 1288 321 316 314 303
X410 (128-bit) 2801 697 672 672 672

Notes:

  1. BIST refers to the built-in self test, which gives a measure of raw data throughput for a single channel.
  2. For MS/s, we assumes 4 bytes per sample.
  3. The 128-bit DRAM on X410 uses two memory banks. Channels 0 and 1 are on Bank 0, and channels 2 and 3 are on Bank 1.

What can the DRAM be used for?

  • DMA FIFO Block: The DMA FIFO block is used in situations where you need a large buffer to store samples.
  • Replay Block: The Replay block is used to record and play back RF data. For example, you can record data from a host computer, then play it back over the radio. Or, record data from the radio, then play it back later to the host for analysis, or play it back to a radio at a specific timestamp. See Using the RFNoC Replay Block in UHD 4 for additional information. The Replay block also has a FIFO capability for situations in which the DMA FIFO block is not available in your FPGA image.
  • Custom Blocks: You can also create your own RFNoC block that uses DRAM. Refer to the DMA FIFO and/or Replay blocks as examples.

How do I add the Replay/DMA FIFO block to my FPGA image?

If the block you want is not included by default in the FPGA image you are using, you can add it to the RFNoC image core YAML file and rebuild the FPGA image using Vivado. See Getting Started with RFNoC in UHD 4.0 for additional information on customizing an RFNoC image.

Note: DRAM is not enabled by default on E31x FPGA builds because the FPGA is not large enough to fit the default image with DRAM. You will need to remove components from your RFNoC image's YAML file to make room, then build the E31x image with the variable DRAM=1 set, or modify the E31x Makefile to enable DRAM by default.

Note: The X410 configures its DRAM differently for 100/200 MHz bandwidth images and 400 MHz bandwidth. The parameters used will be different in each case, as shown in the table below.

When adding the blocks to your RFNoC image core YAML file, the parameters must be set correctly for the type of USRP you intend to use. The memory data width (MEM_DATA_W) and address width (MEM_ADDR_W) must match exactly. The number of ports (NUM_PORTS) must not exceed the maximum number available. You can use fewer ports to save resources if you don't need all the DRAM ports.

RFNoC Block Memory Parameters
USRP Model MEM_DATA_W MEM_ADDR_W NUM_PORTS (Max)
E31x 64 29 2
E320 64 31 4
N3xx 64 31 4
X31x 64 30 2
X410 (100 and 200 MHz BW) 64 32 4
X410 (400 MHz BW) 128 32 4

The DMA FIFO has a few additional parameters that should be provided. The clock rate (MEM_CLK_RATE) must match the value below for the built-in self test (BIST) to work correctly. The base address (FIFO_ADDR_BASE) and address mask (FIFO_ADDR_MASK) are written as Verilog constants and can be changed depending on your application. The FIFO_ADDR_BASE parameter contains the byte address for the first byte of the memory region to use for each port. The FIFO_ADDR_MASK parameter contains the address mask for each port, which tells the FIFO how much memory to use for each port. For example, an address mask of 30'h1FFFFFFF means that 0x1FFFFFFF+1 bytes (i.e., 0x20000000 bytes or 512 MiB) will be used by the corresponding port. The address mask must be 1 less than a power of 2.

The example values in the table below use the entire memory and divide it evenly between all available ports.

DMA FIFO Parameters
USRP Model MEM_CLK_RATE FIFO_ADDR_BASE FIFO_ADDR_MASK
E31x "200e6" "{29'h10000000, 29'h00000000}" "{29'h0FFFFFFF, 29'h0FFFFFFF}"
E320 "300e6" "{31'h60000000, 31'h40000000, 31'h20000000, 31'h00000000}" "{31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF}"
N3xx "303819444" "{31'h60000000, 31'h40000000, 31'h20000000, 31'h00000000}" "{31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF}"
X31x "300e6" "{30'h20000000, 30'h00000000}" "{30'h1FFFFFFF, 30'h1FFFFFFF}"
X410 (100 and 200 MHz BW) "250e6" "{32'hC0000000, 32'h80000000, 32'h40000000, 32'h00000000}" "{32'h3FFFFFFF, 32'h3FFFFFFF, 32'h3FFFFFFF, 32'h3FFFFFFF}"
X410 (400 MHz BW) "250e6" "{32'h80000000, 32'h00000000, 32'h80000000, 32'h00000000}" "{32'h7FFFFFFF, 32'h7FFFFFFF, 32'h7FFFFFFF, 32'h7FFFFFFF}"

Replay Example

See x310_rfnoc_image_core.yml for an example of how to instantiate the Replay block in the RFNoC image core YAML description. The following is a generic example that can be used for any USRP:

noc_blocks:
  # Instantiate the replay block
  replay0:
    block_desc: 'replay.yml'
    parameters:
      NUM_PORTS: <see table>
      MEM_DATA_W: <see table>
      MEM_ADDR_W: <see table>

connections:
  # Connect the replay block memory interface to the USRP DRAM
  - { srcblk: replay0, srcport: axi_ram, dstblk: _device_, dstport: dram }

Connect the DRAM clock to the block:
clk_domains:
  # Connect the DRAM clock to the replay block
  - { srcblk: _device_, srcport: dram, dstblk: replay0, dstport: mem }

DMA FIFO Example

See e320_rfnoc_image_core.yml for an example of how to instantiate the DMA FIFO block in the RFNoC image core YAML description. The following is a generic example that can be used for any USRP:

noc_blocks:
  # Instantiate the DMA FIFO block
  fifo0:
    block_desc: 'axi_ram_fifo.yml'
    parameters:
      NUM_PORTS: <see table>
      MEM_DATA_W: <see table>
      MEM_ADDR_W: <see table>
      FIFO_ADDR_BASE: <see table>
      FIFO_ADDR_MASK: <see table>
      MEM_CLK_RATE: <see table>

connections:
  # Connect the DMA FIFO block memory interface to the USRP DRAM
  - { srcblk: fifo0, srcport: axi_ram, dstblk: _device_, dstport: dram }

clk_domains:
  # Connect the DRAM clock to the replay block
  - { srcblk: _device_, srcport: dram, dstblk: fifo0,  dstport: mem }


RFNoC Clocks

What clocks are available for me to use?

Each device has different clocks available. See below for a list of clocks exposed to RFNoC. Although they have intended purposes, you can use any of these clocks for any purpose. The rfnoc_chdr_clock is a good default choice. This clock is always available in your block, even if it is not explicitly connected in the RFNoC image YAML description.

What are the clock frequencies?

See the table below for the clock rates. The radio clock rate depends on the master clock rate.

E31x

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 100 MHz
dram DRAM interface clock 100 MHz
radio Radio interface clock Same as master clock rate

E320

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 200 MHz
dram DRAM interface clock 166.667 MHz
radio Radio interface clock Same as master clock rate (200 kHz to 61.44 MHz)

N300/N310

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 200 MHz
dram DRAM interface clock 303.189 MHz
radio Radio interface clock Same as master clock rate (122.88 MHz, 125.0 MHz, or 153.6 MHz)

N32x

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 187.5 MHz
dram DRAM interface clock 303.819 MHz
radio Radio interface clock Same as master clock rate (200 MHz, 245.76 MHz, or 250 MHz)

X310

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 187.5 MHz
ce Compute Engine clock 214.286 MHz
dram DRAM interface clock 300 MHz
radio Radio interface clock Same as master clock rate (184.32 MHz or 200 MHz)

X410

Clock Name Description Frequency
rfnoc_chdr RFNoC CHDR clock 200 MHz
dram DRAM interface clock 250 MHz
radio Radio interface clock 122.88 MHz when master clock rate is 122.88, 245.76, or 491.52 MHz
125 MHz when master clock rate is 125, 250, or 500 MHz
radio_2x Radio interface clock 2x Twice the frequency of radio_clk

How do I add a clock with a different frequency?

Adding custom clocks is not directly supported yet. Describing them in the YAML file will not cause them to be generated for you. If you can't use any of the available clocks, you can modify the HDL code to generate a clock.

If you only need the clock within your own RFNoC block, you can modify the HDL for your block to generate the clock that you need from one of the available clocks. To do this, add a new clock to your block's YAML description, connect the available clock to your block in the YAML description of your RFNoC image, then add a Xilinx MMCM IP instance to your block's HDL and connect the available clock to its input.

If the clock is needed by multiple RFNoC blocks, or if you want to change an existing clock, you can modify the HDL for the USRP you are using to add or change a clock. If you add a new clock to the RFNoC image core, you must also update the BSP YAML file (located in <repo>/host/include/uhd/rfnoc/core) so that the rfnoc_image_builder knows that the clock exists. How and where the clocks are generated varies between USRPs. Please refer to the source code for that USRP (<repo>/fpga/usrp3/top).