RFNoC Frequently Asked Questions
Revision as of 16:12, 20 April 2022
Configuring the Stream Endpoint Buffer Size in RFNoC
What is the SEP buffer size?
Each stream endpoint (SEP) has an ingress buffer to store data received from other stream endpoints. The size of this buffer affects the data transfer rate that can be achieved when streaming to that endpoint. A larger ingress buffer means that there is more space to put data, minimizing idle time on the network. Additionally, streamers can queue up data before it is needed, reducing the chance of a buffer underflow.
How do I set the SEP buffer size?
To set the stream endpoint ingress buffer size, add one of the following two parameters under the endpoint you want to configure in the RFNoC image core YAML file.
- buff_size: Buffer size in CHDR words. The size in bytes depends on the CHDR width. For example, if the chdr_width parameter for the device is 64, then each CHDR word is 8 bytes, so a buff_size of 32768 would be 262,144 bytes or 256 KiB. See here for an example.
- buff_size_bytes: Buffer size in bytes. See here for an example.
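The relationship between the two parameters is simple arithmetic. The following Python sketch shows the conversion described above. The function name buff_size_to_bytes is hypothetical and is not part of UHD or the RFNoC tools.

```python
def buff_size_to_bytes(buff_size_words: int, chdr_width_bits: int) -> int:
    """Convert a buff_size value (in CHDR words) to a size in bytes."""
    if chdr_width_bits % 8 != 0:
        raise ValueError("chdr_width must be a whole number of bytes")
    bytes_per_word = chdr_width_bits // 8
    return buff_size_words * bytes_per_word


# Example from the text: 32768 CHDR words at a CHDR width of 64 bits
size = buff_size_to_bytes(32768, 64)
print(size)          # 262144
print(size // 1024)  # 256 (KiB)
```

The same buffer could therefore be described either as 32768 words (with a 64-bit CHDR width) or as 262144 bytes.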
To what value should I set the SEP buffer size?
The buffer size should be a power of two to make optimal use of FPGA RAM resources. The default FPGA bitstreams typically set each buffer to the largest size the FPGA can fit in order to maximize performance. Here are some general recommendations:
- Set to 0 if you don't need to send data to that SEP.
- Set to 8192 bytes (8 KiB = 1 MTU) minimum in order to stream data packets.
- Set to 32768 bytes (32 KiB = 4 MTU) in order to stream at maximum rates between SEPs on the same FPGA.
- Set to 262144 bytes (256 KiB = 32 MTU) or larger for high-performance streaming between a host computer and the FPGA.
Note that the requirements are application-dependent, so optimal sizes for your application may be different. MTU refers to the maximum transmission unit, which is the largest CHDR packet supported by the FPGA.
If you need to free up FPGA resources (particularly block RAM) for your application, you can reduce the SEP buffer sizes. Just keep in mind that the maximum streaming rate may be affected.
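As a quick sanity check when choosing a size, the following Python sketch tests the power-of-two recommendation and expresses a size in MTUs. It assumes an MTU of 8192 bytes, as in the recommendations above. The helper name describe_sep_buffer is hypothetical.

```python
MTU_BYTES = 8192  # Assumption: 1 MTU = 8 KiB, as in the recommendations above


def describe_sep_buffer(size_bytes: int) -> str:
    """Check a SEP buffer size in bytes and describe it in KiB and MTUs."""
    if size_bytes == 0:
        return "0 bytes: no data can be sent to this SEP"
    # A positive power of two has exactly one bit set
    if size_bytes < 0 or size_bytes & (size_bytes - 1):
        raise ValueError("buffer size should be a power of two")
    kib = size_bytes // 1024
    mtus = size_bytes / MTU_BYTES
    return f"{size_bytes} bytes = {kib} KiB = {mtus:g} MTU"


print(describe_sep_buffer(8192))    # 8192 bytes = 8 KiB = 1 MTU
print(describe_sep_buffer(32768))   # 32768 bytes = 32 KiB = 4 MTU
print(describe_sep_buffer(262144))  # 262144 bytes = 256 KiB = 32 MTU
```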
USRP DRAM
How much and what speed DRAM is available on each USRP?
The table below summarizes the DRAM that is connected to the USRP for use by RFNoC.
USRP Model | DRAM Size | Default DRAM Speed | Default User Interface |
---|---|---|---|
E31x | 512 MiB | 16-bit @ 800 MT/s (1.6 GB/s) | 2 ch x 64-bit @ 100 MHz |
E320 | 2 GiB | 32-bit @ 1333 MT/s (5.33 GB/s) | 4 ch x 64-bit @ 300 MHz |
N3xx | 2 GiB | 32-bit @ 1300 MT/s (5.2 GB/s) | 4 ch x 64-bit @ 303.819 MHz |
X31x | 1 GiB | 32-bit @ 1200 MT/s (4.8 GB/s) | 2 ch x 64-bit @ 300 MHz |
X410 (100 and 200 MHz BW) | 4 GiB | 64-bit @ 2.0 GT/s (16.0 GB/s) | 4 ch x 64-bit @ 250 MHz |
X410 (400 MHz BW) | 4 GiB per bank (8 GiB total) | 64-bit @ 2.0 GT/s (16.0 GB/s) per bank (32.0 GB/s total) | 4 ch x 128-bit @ 250 MHz (using 2 banks) |
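The GB/s figures in the Default DRAM Speed column follow from the bus width and the transfer rate. The Python sketch below reproduces that arithmetic. The function name is hypothetical, and the results are theoretical peak rates, not the throughput an application will actually see.

```python
def dram_peak_gb_per_s(bus_width_bits: int, megatransfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_s / 1000.0


# Values taken from the table above
print(f"{dram_peak_gb_per_s(16, 800):.2f}")   # E31x: 1.60
print(f"{dram_peak_gb_per_s(32, 1333):.2f}")  # E320: 5.33
print(f"{dram_peak_gb_per_s(64, 2000):.2f}")  # X410, one bank: 16.00
```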
What DRAM data rates can I expect on each USRP?
DRAM performance is highly application-specific. For example, reading vs. reading and writing simultaneously, one data stream vs. multiple data streams, random access vs. sequential access, etc., can give dramatically different performance. Below are some measurements taken on different USRPs where a Null-Source-Sink RFNoC block is directly connected to a DMA FIFO block to test maximum streaming rates through the DRAM. The DRAM is shared between channels, so throughput goes down as the number of channels going through the DRAM is increased.
USRP Model | BIST (MB/s) | 1 Ch (MS/s) | 2 Ch (MS/s) | 3 Ch (MS/s) | 4 Ch (MS/s) |
---|---|---|---|---|---|
E31x | 666 | 166 | 91 | N/A | N/A |
E320 | 1361 | 340 | 170 | 113 | 85 |
N3xx | 1368 | 341 | 295 | 191 | 144 |
X31x | 1347 | 336 | 115 | N/A | N/A |
X410 (64-bit) | 1288 | 321 | 316 | 314 | 303 |
X410 (128-bit) | 2801 | 697 | 672 | 672 | 672 |
Notes:
- BIST refers to the built-in self test, which gives a measure of raw data throughput for a single channel.
- For MS/s, we assume 4 bytes per sample.
- The 128-bit DRAM on X410 uses two memory banks. Channels 0 and 1 are on Bank 0, and channels 2 and 3 are on Bank 1.
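To compare the per-channel sample rates with the byte rates, convert using the 4 bytes per sample stated in the notes. The following Python sketch does this. The function name is hypothetical, and the inputs are the measured values from the table above.

```python
BYTES_PER_SAMPLE = 4  # As stated in the notes above


def total_mb_per_s(rate_ms_per_ch: float, num_channels: int = 1) -> float:
    """Aggregate DRAM throughput in MB/s for channels running at the same MS/s."""
    return rate_ms_per_ch * BYTES_PER_SAMPLE * num_channels


print(total_mb_per_s(85, 4))   # E320, 4 channels: 1360
print(total_mb_per_s(115, 2))  # X31x, 2 channels: 920
print(total_mb_per_s(672, 4))  # X410 (128-bit), 4 channels: 10752
```

In these measurements, the aggregate rate for E320 stays close to its single-channel BIST figure, while X410 (128-bit) sustains nearly the same per-channel rate on all four channels.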
What can the DRAM be used for?
- DMA FIFO Block: The DMA FIFO block is used in situations where you need a large buffer to store samples.
- Replay Block: The Replay block is used to record and play back RF data. For example, you can record data from a host computer, then play it back over the radio. Or, record data from the radio, then play it back later to the host for analysis, or play it back to a radio at a specific timestamp. See Using the RFNoC Replay Block in UHD 4 for additional information. The Replay block also has a FIFO capability for situations in which the DMA FIFO block is not available in your FPGA image.
- Custom Blocks: You can also create your own RFNoC block that uses DRAM. Refer to the DMA FIFO and/or Replay blocks as examples.
How do I add the Replay/DMA FIFO block to my FPGA image?
If the block you want is not included by default in the FPGA image you are using, you can add it to the RFNoC image core YAML file and rebuild the FPGA image using Vivado. See Getting Started with RFNoC in UHD 4.0 for additional information on customizing an RFNoC image.
Note: DRAM is not enabled by default on E31x FPGA builds because the FPGA is not large enough to fit the default image with DRAM. You will need to remove components from your RFNoC image's YAML file to make room, then build the E31x image with the variable DRAM=1 set, or modify the E31x Makefile to enable DRAM by default.
Note: The X410 configures its DRAM differently for 100/200 MHz bandwidth images and 400 MHz bandwidth images. The parameters used will be different in each case, as shown in the table below.
When adding the blocks to your RFNoC image core YAML file, the parameters must be set correctly for the type of USRP you intend to use. The memory data width (MEM_DATA_W) and address width (MEM_ADDR_W) must exactly match the values in the table below. The number of ports (NUM_PORTS) must not exceed the maximum number available. You can use fewer ports to save resources if you don't need all the DRAM ports.
USRP Model | MEM_DATA_W | MEM_ADDR_W | NUM_PORTS (Max) |
---|---|---|---|
E31x | 64 | 29 | 2 |
E320 | 64 | 31 | 4 |
N3xx | 64 | 31 | 4 |
X31x | 64 | 30 | 2 |
X410 (100 and 200 MHz BW) | 64 | 32 | 4 |
X410 (400 MHz BW) | 128 | 32 | 4 |
The DMA FIFO has a few additional parameters that should be provided. The clock rate (MEM_CLK_RATE) must match the value below for the built-in self test (BIST) to work correctly. The base address (FIFO_ADDR_BASE) and address mask (FIFO_ADDR_MASK) are written as Verilog constants and can be changed depending on your application. The FIFO_ADDR_BASE parameter contains the byte address for the first byte of the memory region to use for each port. The FIFO_ADDR_MASK parameter contains the address mask for each port, which tells the FIFO how much memory to use for each port. For example, an address mask of 30'h1FFFFFFF means that 0x1FFFFFFF+1 bytes (i.e., 0x20000000 bytes or 512 MiB) will be used by the corresponding port. The address mask must be 1 less than a power of 2.
The example values in the table below use the entire memory and divide it evenly between all available ports.
USRP Model | MEM_CLK_RATE | FIFO_ADDR_BASE | FIFO_ADDR_MASK |
---|---|---|---|
E31x | "200e6" | "{29'h10000000, 29'h00000000}" | "{29'h0FFFFFFF, 29'h0FFFFFFF}" |
E320 | "300e6" | "{31'h60000000, 31'h40000000, 31'h20000000, 31'h00000000}" | "{31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF}" |
N3xx | "303819444" | "{31'h60000000, 31'h40000000, 31'h20000000, 31'h00000000}" | "{31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF}" |
X31x | "300e6" | "{30'h20000000, 30'h00000000}" | "{30'h1FFFFFFF, 30'h1FFFFFFF}" |
X410 (100 and 200 MHz BW) | "250e6" | "{32'hC0000000, 32'h80000000, 32'h40000000, 32'h00000000}" | "{32'h3FFFFFFF, 32'h3FFFFFFF, 32'h3FFFFFFF, 32'h3FFFFFFF}" |
X410 (400 MHz BW) | "250e6" | "{32'h80000000, 32'h00000000, 32'h80000000, 32'h00000000}" | "{32'h7FFFFFFF, 32'h7FFFFFFF, 32'h7FFFFFFF, 32'h7FFFFFFF}" |
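The single-bank rows of the table above can be derived mechanically by splitting the memory evenly between the ports. The Python sketch below shows one way to do that. The function name fifo_addr_params is hypothetical and is not part of the RFNoC tools. The sketch does not cover the X410 400 MHz case, where the ports are spread over two banks.

```python
def fifo_addr_params(mem_size_bytes: int, addr_width: int, num_ports: int):
    """Return (FIFO_ADDR_BASE, FIFO_ADDR_MASK) strings for an even split.

    Ports are listed highest first, matching the tables above.
    """
    if mem_size_bytes > 2 ** addr_width:
        raise ValueError("memory does not fit in the given address width")
    region = mem_size_bytes // num_ports
    # The mask must be 1 less than a power of 2, so the region must be a power of 2
    if region <= 0 or region & (region - 1):
        raise ValueError("per-port region must be a power of two")
    digits = -(-addr_width // 4)  # Hex digits needed: ceil(addr_width / 4)

    def literal(value: int) -> str:
        # Verilog sized hex constant, e.g. 31'h20000000
        return f"{addr_width}'h{value:0{digits}X}"

    bases = [literal(port * region) for port in reversed(range(num_ports))]
    masks = [literal(region - 1)] * num_ports
    return "{" + ", ".join(bases) + "}", "{" + ", ".join(masks) + "}"


# E31x: 512 MiB, 29-bit addresses, 2 ports
base, mask = fifo_addr_params(512 * 2**20, 29, 2)
print(base)  # {29'h10000000, 29'h00000000}
print(mask)  # {29'h0FFFFFFF, 29'h0FFFFFFF}
```

If you use fewer ports or want unequal regions, the values need to be chosen by hand, keeping each mask at 1 less than a power of 2.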
Replay Example
See x310_rfnoc_image_core.yml for an example of how to instantiate the Replay block in the RFNoC image core YAML description. The following is a generic example that can be used for any USRP:
    noc_blocks:
      # Instantiate the replay block
      replay0:
        block_desc: 'replay.yml'
        parameters:
          NUM_PORTS: <see table>
          MEM_DATA_W: <see table>
          MEM_ADDR_W: <see table>
    connections:
      # Connect the replay block memory interface to the USRP DRAM
      - { srcblk: replay0, srcport: axi_ram, dstblk: _device_, dstport: dram }

Connect the DRAM clock to the block:

    clk_domains:
      # Connect the DRAM clock to the replay block
      - { srcblk: _device_, srcport: dram, dstblk: replay0, dstport: mem }
DMA FIFO Example
See e320_rfnoc_image_core.yml for an example of how to instantiate the DMA FIFO block in the RFNoC image core YAML description. The following is a generic example that can be used for any USRP:
    noc_blocks:
      # Instantiate the DMA FIFO block
      fifo0:
        block_desc: 'axi_ram_fifo.yml'
        parameters:
          NUM_PORTS: <see table>
          MEM_DATA_W: <see table>
          MEM_ADDR_W: <see table>
          FIFO_ADDR_BASE: <see table>
          FIFO_ADDR_MASK: <see table>
          MEM_CLK_RATE: <see table>
    connections:
      # Connect the DMA FIFO block memory interface to the USRP DRAM
      - { srcblk: fifo0, srcport: axi_ram, dstblk: _device_, dstport: dram }
    clk_domains:
      # Connect the DRAM clock to the DMA FIFO block
      - { srcblk: _device_, srcport: dram, dstblk: fifo0, dstport: mem }
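To illustrate how the placeholders are filled in, the Python sketch below renders the noc_blocks entry for an E320 using the values from the tables above. The helper render_dma_fifo_block is hypothetical. It only builds text and is not part of rfnoc_image_builder. String parameters are emitted in double quotes, as they appear in the tables.

```python
def render_dma_fifo_block(name: str, params: dict) -> str:
    """Render a noc_blocks entry for the DMA FIFO block as YAML text."""
    lines = [
        "noc_blocks:",
        f"  {name}:",
        "    block_desc: 'axi_ram_fifo.yml'",
        "    parameters:",
    ]
    for key, value in params.items():
        # Quote string values (Verilog constants, clock rate); leave integers bare
        text = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"      {key}: {text}")
    return "\n".join(lines)


# E320 values copied from the tables above
E320_FIFO_PARAMS = {
    "NUM_PORTS": 4,
    "MEM_DATA_W": 64,
    "MEM_ADDR_W": 31,
    "FIFO_ADDR_BASE": "{31'h60000000, 31'h40000000, 31'h20000000, 31'h00000000}",
    "FIFO_ADDR_MASK": "{31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF, 31'h1FFFFFFF}",
    "MEM_CLK_RATE": "300e6",
}

print(render_dma_fifo_block("fifo0", E320_FIFO_PARAMS))
```

The connections and clk_domains entries from the generic example still need to be added to the image core file.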
RFNoC Clocks
What clocks are available for me to use?
Each device has different clocks available. See below for a list of clocks exposed to RFNoC. Although they have intended purposes, you can use any of these clocks for any purpose. The rfnoc_chdr_clock is a good default choice. This clock is always available in your block, even if it is not explicitly connected in the RFNoC image YAML description.
What are the clock frequencies?
See the tables below for the clock rates of each device. The radio clock rate depends on the master clock rate.
E31x
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 100 MHz |
dram | DRAM interface clock | 100 MHz |
radio | Radio interface clock | Same as master clock rate |
E320
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 200 MHz |
dram | DRAM interface clock | 166.667 MHz |
radio | Radio interface clock | Same as master clock rate (200 kHz to 61.44 MHz) |
N300/N310
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 200 MHz |
dram | DRAM interface clock | 303.819 MHz |
radio | Radio interface clock | Same as master clock rate (122.88 MHz, 125.0 MHz, or 153.6 MHz) |
N32x
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 187.5 MHz |
dram | DRAM interface clock | 303.819 MHz |
radio | Radio interface clock | Same as master clock rate (200 MHz, 245.76 MHz, or 250 MHz) |
X310
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 187.5 MHz |
ce | Compute Engine clock | 214.286 MHz |
dram | DRAM interface clock | 300 MHz |
radio | Radio interface clock | Same as master clock rate (184.32 MHz or 200 MHz) |
X410
Clock Name | Description | Frequency |
---|---|---|
rfnoc_chdr | RFNoC CHDR clock | 200 MHz |
dram | DRAM interface clock | 250 MHz |
radio | Radio interface clock | 122.88 MHz when master clock rate is 122.88, 245.76, or 491.52 MHz; 125 MHz when master clock rate is 125, 250, or 500 MHz |
radio_2x | Radio interface clock 2x | Twice the frequency of radio_clk |
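Because the X410 radio clock does not simply equal the master clock rate, a small lookup can make the relationship explicit. The Python sketch below encodes the X410 table rows above. The names are hypothetical, and rates are given in Hz as integers to avoid floating-point comparisons.

```python
# Master clock rate (Hz) -> radio clock (Hz), from the X410 table above
X410_RADIO_CLOCK_HZ = {
    122_880_000: 122_880_000,
    245_760_000: 122_880_000,
    491_520_000: 122_880_000,
    125_000_000: 125_000_000,
    250_000_000: 125_000_000,
    500_000_000: 125_000_000,
}


def x410_radio_clocks(master_clock_rate_hz: int):
    """Return (radio, radio_2x) clock frequencies in Hz for an X410."""
    try:
        radio = X410_RADIO_CLOCK_HZ[master_clock_rate_hz]
    except KeyError:
        raise ValueError("master clock rate not listed for X410") from None
    return radio, 2 * radio


print(x410_radio_clocks(491_520_000))  # (122880000, 245760000)
print(x410_radio_clocks(250_000_000))  # (125000000, 250000000)
```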
How do I add a clock with a different frequency?
Adding custom clocks is not directly supported yet. Describing them in the YAML file will not cause them to be generated for you. If you can't use any of the available clocks, you can modify the HDL code to generate a clock.
If you only need the clock within your own RFNoC block, you can modify the HDL for your block to generate the clock that you need from one of the available clocks. To do this, add a new clock to your block's YAML description, connect the available clock to your block in the YAML description of your RFNoC image, then add a Xilinx MMCM IP instance to your block's HDL and connect the available clock to its input.
If the clock is needed by multiple RFNoC blocks, or if you want to change an existing clock, you can modify the HDL for the USRP you are using to add or change a clock. If you add a new clock to the RFNoC image core, you must also update the BSP YAML file (located in <repo>/host/include/uhd/rfnoc/core) so that the rfnoc_image_builder knows that the clock exists. How and where the clocks are generated varies between USRPs. Please refer to the source code for that USRP (<repo>/fpga/usrp3/top).