The PCIe module

The PCIe module handles all PCIe communication. It forwards and transforms PCIe transactions for the DMA controller and the MI bus. The architecture of the PCIe module is divided into two main parts: PCIE_CORE and PCIE_CTRL. The block diagram is shown below.

[Figure: PCIe module architecture (pcie_module_arch.drawio.svg)]

Note

The PCIe module can support more than one PCIe endpoint. In this case, the individual parts of the PCIe module are duplicated for each PCIe endpoint. Bifurcation is also supported for some PCIe Hard IPs.

Selecting a PCIe configuration

Before compiling the FPGA firmware, you can select the target PCIe configuration with the makefile parameter PCIE_CONF. Without this parameter, the card's default configuration is selected automatically. Only some FPGA cards support multiple PCIe configurations. If you enter an unsupported value (for example, PCIE_CONF=1xGen1x16), the console lists the configurations supported on the target FPGA card.

Examples of some allowed configurations:

  • PCIE_CONF=1xGen3x16 – Single PCIe slot in Gen3 x16 mode.

  • PCIE_CONF=2xGen4x8x8 – Two PCIe slots in Gen4 x8x8 (bifurcation) mode.

  • PCIE_CONF=2xGen5x8x8 – Two PCIe slots in Gen5 x8x8 (bifurcation) mode.

  • PCIE_CONF=1xGen3x8LL – Single PCIe slot in Gen3 x8 Low-Latency mode (for Xilinx UltraScale+ only).
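The example values above appear to follow the pattern "<slots>xGen<generation>x<lanes>[x<lanes>...][LL]". The following Python sketch splits such a string into its parts. The pattern is inferred only from the examples listed here, and the names parse_pcie_conf and PcieConf are hypothetical; the NDK build scripts may use a different grammar and do their own validation against the target card.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the documented examples only (an assumption,
# not the grammar used by the actual build scripts).
_PCIE_CONF_RE = re.compile(r"^(\d+)xGen(\d+)((?:x\d+)+)(LL)?$")


@dataclass(frozen=True)
class PcieConf:
    slots: int           # number of PCIe slots, e.g. 2
    generation: int      # PCIe generation, e.g. 4 for Gen4
    lane_widths: tuple   # lanes per link, e.g. (8, 8) for x8x8
    low_latency: bool    # True when the value ends with "LL"


def parse_pcie_conf(value: str) -> PcieConf:
    """Split a PCIE_CONF string such as '2xGen4x8x8' into its fields."""
    match = _PCIE_CONF_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized PCIE_CONF value: {value!r}")
    widths = tuple(int(w) for w in match.group(3).split("x") if w)
    return PcieConf(
        slots=int(match.group(1)),
        generation=int(match.group(2)),
        lane_widths=widths,
        low_latency=match.group(4) is not None,
    )


if __name__ == "__main__":
    for conf in ("1xGen3x16", "2xGen4x8x8", "1xGen3x8LL"):
        print(conf, "->", parse_pcie_conf(conf))
```

The sketch only checks the syntax of the value. Whether a given configuration is supported still depends on the target FPGA card, so a syntactically valid value such as 1xGen1x16 can still be rejected by the build.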

The PCIe Core (PCIE_CORE)

The PCIe Core varies according to the PCIe Hard IP or FPGA used. It contains the instance(s) of the PCIe Hard IP, an adapter that converts the AXI/Avalon-ST buses to MFB buses, the Vendor-Specific Extension Capability (VSEC) registers (implemented in the PCI_EXT_CAP module), which mainly hold the DeviceTree firmware description, and additional configuration logic. The main purpose of the PCIe Core is thus to unify the buses and provide the necessary information about the active PCIe link.

Supported PCIe Hard IP

A list of the supported PCIe Hard IPs is below. You can select the target architecture by setting the NDK parameter PCIE_MOD_ARCH. According to this parameter, the correct PCIE_CORE module variant is used and the VHDL generic PCIE_ENDPOINT_TYPE is set appropriately.

The PCIe Control unit (PCIE_CTRL)

The PCIe Control unit always includes the MI Transaction Controller (MTC), which transforms the associated PCIe memory transactions into read or write requests on the MI bus. For a read request, the MI response is also transformed back into a PCIe completion transaction and sent to the host PC. PCIe transactions from the BAR0 address space are routed to the MTC module. If the NDK uses a DMA controller that requires its own BAR, the PCIe transactions from the DMA-BAR address space (BAR2) are routed directly to the DMA module. This functionality must be enabled via the DMA_BAR_ENABLE parameter.
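The routing rule described above can be summarized in a few lines of Python. This is only an illustration of the decision, not the hardware implementation. The function name route_bar and the "UNROUTED" label are hypothetical, and the text does not state what happens to BAR2 transactions when DMA_BAR_ENABLE is false, so the sketch simply marks them as not routed.

```python
def route_bar(bar: int, dma_bar_enable: bool = False) -> str:
    """Return the unit that receives a PCIe transaction hitting the given BAR.

    BAR0 always goes to the MTC. BAR2 (the DMA-BAR) goes to the DMA module
    only when DMA_BAR_ENABLE is set. Anything else is labeled "UNROUTED",
    which is a placeholder of this sketch, not documented behavior.
    """
    if bar == 0:
        return "MTC"
    if bar == 2 and dma_bar_enable:
        return "DMA"
    return "UNROUTED"


if __name__ == "__main__":
    print(route_bar(0))                       # BAR0 with default settings
    print(route_bar(2, dma_bar_enable=True))  # DMA-BAR enabled
    print(route_bar(2))                       # DMA-BAR disabled
```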

Note

We assume that 64-bit PCIe BARs are used. Each 64-bit BAR occupies two consecutive BAR registers, so at most half of them are available (BAR0, BAR2, and BAR4). You can find more information in the PCIe specification.
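As a small worked example of the note above: a PCIe endpoint has six 32-bit BAR registers, and a 64-bit BAR uses two consecutive ones. The helper below (usable_bar_indices is a hypothetical name used only for this illustration) lists the BAR indices that remain usable.

```python
def usable_bar_indices(num_registers: int = 6, bar_bits: int = 64) -> list:
    """List the BAR indices at which a BAR of the given width can start.

    A 64-bit BAR spans two consecutive 32-bit BAR registers, so only every
    second index is usable; 32-bit BARs can use every index.
    """
    if bar_bits not in (32, 64):
        raise ValueError("bar_bits must be 32 or 64")
    step = 2 if bar_bits == 64 else 1
    return list(range(0, num_registers, step))


if __name__ == "__main__":
    print(usable_bar_indices())             # 64-bit BARs
    print(usable_bar_indices(bar_bits=32))  # 32-bit BARs
```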

By default, this unit also contains the PTC module, which transforms memory requests (in a simplified format) coming from the DMA into the desired PCIe format and vice versa. The PTC module also implements a completion buffer and handles the allocation of the PCIe TAGs, etc. The PTC can be disabled using the PTC_DISABLE parameter, in which case the DMA requests (in the PCIe transaction format) are directly forwarded to the PCIe Hard IP and vice versa.

The PCIe module entity

ENTITY PCIE IS
Generics

| Generic | Type | Default | Description |
| --- | --- | --- | --- |
| ===== | BAR base address configuration | ===== | ===== |
| BAR0_BASE_ADDR | std_logic_vector(31 downto 0) | X"01000000" | |
| BAR1_BASE_ADDR | std_logic_vector(31 downto 0) | X"02000000" | |
| BAR2_BASE_ADDR | std_logic_vector(31 downto 0) | X"03000000" | |
| BAR3_BASE_ADDR | std_logic_vector(31 downto 0) | X"04000000" | |
| BAR4_BASE_ADDR | std_logic_vector(31 downto 0) | X"05000000" | |
| BAR5_BASE_ADDR | std_logic_vector(31 downto 0) | X"06000000" | |
| EXP_ROM_BASE_ADDR | std_logic_vector(31 downto 0) | X"0A000000" | |
| ===== | MFB configuration | ===== | ===== |
| CQ_MFB_REGIONS | natural | 2 | |
| CQ_MFB_REGION_SIZE | natural | 1 | |
| CQ_MFB_BLOCK_SIZE | natural | 8 | |
| CQ_MFB_ITEM_WIDTH | natural | 32 | |
| RC_MFB_REGIONS | natural | 2 | |
| RC_MFB_REGION_SIZE | natural | 1 | |
| RC_MFB_BLOCK_SIZE | natural | 8 | |
| RC_MFB_ITEM_WIDTH | natural | 32 | |
| CC_MFB_REGIONS | natural | 2 | |
| CC_MFB_REGION_SIZE | natural | 1 | |
| CC_MFB_BLOCK_SIZE | natural | 8 | |
| CC_MFB_ITEM_WIDTH | natural | 32 | |
| RQ_MFB_REGIONS | natural | 2 | |
| RQ_MFB_REGION_SIZE | natural | 1 | |
| RQ_MFB_BLOCK_SIZE | natural | 8 | |
| RQ_MFB_ITEM_WIDTH | natural | 32 | |
| ===== | Other configuration | ===== | ===== |
| DMA_PORTS | natural | 2 | Total number of DMA_EP, DMA_EP=PCIE_EP or 2*DMA_EP=PCIE_EP |
| PCIE_ENDPOINT_TYPE | string | "P_TILE" | Connected PCIe endpoint type |
| PCIE_ENDPOINT_MODE | natural | 0 | Connected PCIe endpoint mode: 0=x16, 1=x8x8, 2=x8 |
| PCIE_ENDPOINTS | natural | 1 | Number of PCIe endpoints |
| PCIE_CLKS | natural | 2 | Number of PCIe clocks per PCIe connector |
| PCIE_CONS | natural | 1 | Number of PCIe connectors |
| PCIE_LANES | natural | 16 | Number of PCIe lanes in each PCIe connector |
| CARD_ID_WIDTH | natural | 0 | Width of CARD/FPGA ID number |
| PTC_DISABLE | boolean | false | Disables the PTC module and allows direct connection of the DMA module to the PCIe IP RQ and RC interfaces. |
| DMA_BAR_ENABLE | boolean | false | Enable CQ/CC interface for DMA-BAR, condition DMA_PORTS=PCIE_ENDPOINTS |
| XVC_ENABLE | boolean | false | Enable the XVC IP, for Xilinx only |
| DEVICE | string | "STRATIX10" | FPGA device |
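The four MFB generics of each interface determine the width of its data bus: the DATA port types of the entity multiply REGIONS, REGION_SIZE, BLOCK_SIZE and ITEM_WIDTH. The following Python sketch evaluates that product for the default values listed above. The function name mfb_data_width and the DEFAULTS dictionary are hypothetical helpers for this illustration.

```python
def mfb_data_width(regions: int, region_size: int,
                   block_size: int, item_width: int) -> int:
    """Width of an MFB data bus in bits, following the DATA port types:
    REGIONS * REGION_SIZE * BLOCK_SIZE * ITEM_WIDTH."""
    return regions * region_size * block_size * item_width


# Default generics of the PCIE entity as documented above:
# (REGIONS, REGION_SIZE, BLOCK_SIZE, ITEM_WIDTH) per interface.
DEFAULTS = {
    "CQ": (2, 1, 8, 32),
    "RC": (2, 1, 8, 32),
    "CC": (2, 1, 8, 32),
    "RQ": (2, 1, 8, 32),
}

if __name__ == "__main__":
    for name, params in DEFAULTS.items():
        print(f"{name}: {mfb_data_width(*params)} bits per DMA port")
```

With the default generics, each of the four interfaces therefore carries a 512-bit data word per DMA port.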

Ports

| Port | Type | Mode | Description |
| --- | --- | --- | --- |
| ===== | CLOCKS AND RESETS | ===== | ===== |
| PCIE_SYSCLK_P | std_logic_vector(PCIE_CONS*PCIE_CLKS-1 downto 0) | in | Clock from PCIe port, 100 MHz |
| PCIE_SYSCLK_N | std_logic_vector(PCIE_CONS*PCIE_CLKS-1 downto 0) | in | |
| PCIE_SYSRST_N | std_logic_vector(PCIE_CONS-1 downto 0) | in | PCIe reset from PCIe port |
| INIT_DONE_N | std_logic | in | nINIT_DONE output of the Reset Release Intel Stratix 10 FPGA IP |
| PCIE_USER_CLK | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | PCIe user clock and reset |
| PCIE_USER_RESET | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | |
| DMA_CLK | std_logic | in | DMA module clock and reset |
| DMA_RESET | std_logic | in | |
| ===== | PCIE SERIAL INTERFACE | ===== | ===== |
| PCIE_RX_P | std_logic_vector(PCIE_CONS*PCIE_LANES-1 downto 0) | in | Receive data |
| PCIE_RX_N | std_logic_vector(PCIE_CONS*PCIE_LANES-1 downto 0) | in | |
| PCIE_TX_P | std_logic_vector(PCIE_CONS*PCIE_LANES-1 downto 0) | out | Transmit data |
| PCIE_TX_N | std_logic_vector(PCIE_CONS*PCIE_LANES-1 downto 0) | out | |
| ===== | Configuration status interface (PCIE_USER_CLK) | ===== | ===== |
| PCIE_MPS | slv_array_t(PCIE_ENDPOINTS-1 downto 0)(3-1 downto 0) | out | PCIe maximum payload size |
| PCIE_MRRS | slv_array_t(PCIE_ENDPOINTS-1 downto 0)(3-1 downto 0) | out | PCIe maximum read request size |
| PCIE_EXT_TAG_EN | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | PCIe extended tag enable (8-bit tag) |
| PCIE_10B_TAG_REQ_EN | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | PCIe 10-bit tag requester enable |
| PCIE_RCB_SIZE | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | PCIe RCB size control |
| CARD_ID | slv_array_t(PCIE_ENDPOINTS-1 downto 0)(CARD_ID_WIDTH-1 downto 0) | in | Card ID / PCIe Device Serial Number |
| ===== | DMA RQ MFB+MVB interface (PCIE_CLK or DMA_CLK) | ===== | PTC ENABLE: MFB+MVB bus for transferring RQ PTC-DMA transactions. MFB+MVB bus is clocked at DMA_CLK. PTC DISABLE: MFB bus only for transferring RQ PCIe transactions (format according to the PCIe IP used). Compared to the standard MFB specification, it does not allow gaps (SRC_RDY=0) inside transactions and requires that the first transaction in a word starts at byte 0. MFB bus is clocked at PCIE_CLK. |
| DMA_RQ_MFB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS*RQ_MFB_REGION_SIZE*RQ_MFB_BLOCK_SIZE*RQ_MFB_ITEM_WIDTH-1 downto 0) | in | |
| DMA_RQ_MFB_META | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS*PCIE_RQ_META_WIDTH-1 downto 0) | in | |
| DMA_RQ_MFB_SOF | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS-1 downto 0) | in | |
| DMA_RQ_MFB_EOF | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS-1 downto 0) | in | |
| DMA_RQ_MFB_SOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS*max(1,log2(RQ_MFB_REGION_SIZE))-1 downto 0) | in | |
| DMA_RQ_MFB_EOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS*max(1,log2(RQ_MFB_REGION_SIZE*RQ_MFB_BLOCK_SIZE))-1 downto 0) | in | |
| DMA_RQ_MFB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| DMA_RQ_MFB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| DMA_RQ_MVB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS*DMA_UPHDR_WIDTH-1 downto 0) | in | |
| DMA_RQ_MVB_VLD | slv_array_t(DMA_PORTS-1 downto 0)(RQ_MFB_REGIONS-1 downto 0) | in | |
| DMA_RQ_MVB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| DMA_RQ_MVB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| ===== | DMA RC MFB+MVB interface (PCIE_CLK or DMA_CLK) | ===== | PTC ENABLE: MFB+MVB bus for transferring RC PTC-DMA transactions. MFB+MVB bus is clocked at DMA_CLK. PTC DISABLE: MFB bus only for transferring RC PCIe transactions (format according to the PCIe IP used). Compared to the standard MFB specification, it does not allow gaps (SRC_RDY=0) inside transactions and requires that the first transaction in a word starts at byte 0. MFB bus is clocked at PCIE_CLK. |
| DMA_RC_MFB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS*RC_MFB_REGION_SIZE*RC_MFB_BLOCK_SIZE*RC_MFB_ITEM_WIDTH-1 downto 0) | out | |
| DMA_RC_MFB_META | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS*PCIE_RC_META_WIDTH-1 downto 0) | out | |
| DMA_RC_MFB_SOF | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS-1 downto 0) | out | |
| DMA_RC_MFB_EOF | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS-1 downto 0) | out | |
| DMA_RC_MFB_SOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS*max(1,log2(RC_MFB_REGION_SIZE))-1 downto 0) | out | |
| DMA_RC_MFB_EOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS*max(1,log2(RC_MFB_REGION_SIZE*RC_MFB_BLOCK_SIZE))-1 downto 0) | out | |
| DMA_RC_MFB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| DMA_RC_MFB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| DMA_RC_MVB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS*DMA_DOWNHDR_WIDTH-1 downto 0) | out | |
| DMA_RC_MVB_VLD | slv_array_t(DMA_PORTS-1 downto 0)(RC_MFB_REGIONS-1 downto 0) | out | |
| DMA_RC_MVB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| DMA_RC_MVB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| ===== | DMA CQ MFB interface - DMA-BAR (PCIE_CLK) | ===== | MFB bus for transferring CQ DMA-BAR PCIe transactions (format according to the PCIe IP used). Compared to the standard MFB specification, it does not allow gaps (SRC_RDY=0) inside transactions and requires that the first transaction in a word starts at byte 0. |
| DMA_CQ_MFB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS*CQ_MFB_REGION_SIZE*CQ_MFB_BLOCK_SIZE*CQ_MFB_ITEM_WIDTH-1 downto 0) | out | |
| DMA_CQ_MFB_META | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS*PCIE_CQ_META_WIDTH-1 downto 0) | out | |
| DMA_CQ_MFB_SOF | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS-1 downto 0) | out | |
| DMA_CQ_MFB_EOF | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS-1 downto 0) | out | |
| DMA_CQ_MFB_SOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS*max(1,log2(CQ_MFB_REGION_SIZE))-1 downto 0) | out | |
| DMA_CQ_MFB_EOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(CQ_MFB_REGIONS*max(1,log2(CQ_MFB_REGION_SIZE*CQ_MFB_BLOCK_SIZE))-1 downto 0) | out | |
| DMA_CQ_MFB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| DMA_CQ_MFB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| ===== | PCIE CC MFB interface - DMA-BAR (PCIE_CLK) | ===== | MFB bus for transferring CC DMA-BAR PCIe transactions (format according to the PCIe IP used). Compared to the standard MFB specification, it does not allow gaps (SRC_RDY=0) inside transactions and requires that the first transaction in a word starts at byte 0. |
| DMA_CC_MFB_DATA | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS*CC_MFB_REGION_SIZE*CC_MFB_BLOCK_SIZE*CC_MFB_ITEM_WIDTH-1 downto 0) | in | |
| DMA_CC_MFB_META | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS*PCIE_CC_META_WIDTH-1 downto 0) | in | |
| DMA_CC_MFB_SOF | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS-1 downto 0) | in | |
| DMA_CC_MFB_EOF | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS-1 downto 0) | in | |
| DMA_CC_MFB_SOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS*max(1,log2(CC_MFB_REGION_SIZE))-1 downto 0) | in | |
| DMA_CC_MFB_EOF_POS | slv_array_t(DMA_PORTS-1 downto 0)(CC_MFB_REGIONS*max(1,log2(CC_MFB_REGION_SIZE*CC_MFB_BLOCK_SIZE))-1 downto 0) | in | |
| DMA_CC_MFB_SRC_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | in | |
| DMA_CC_MFB_DST_RDY | std_logic_vector(DMA_PORTS-1 downto 0) | out | |
| ===== | MI32 interfaces (MI_CLK) | ===== | MI - Root of the MI32 bus tree for each PCIe endpoint (connection to the MTC). MI_DBG - MI interface to PCIe registers (currently only debug registers). |
| MI_CLK | std_logic | in | |
| MI_RESET | std_logic | in | |
| MI_DWR | slv_array_t (PCIE_ENDPOINTS-1 downto 0)(32-1 downto 0) | out | |
| MI_ADDR | slv_array_t (PCIE_ENDPOINTS-1 downto 0)(32-1 downto 0) | out | |
| MI_BE | slv_array_t (PCIE_ENDPOINTS-1 downto 0)(32/8-1 downto 0) | out | |
| MI_RD | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | |
| MI_WR | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | out | |
| MI_DRD | slv_array_t (PCIE_ENDPOINTS-1 downto 0)(32-1 downto 0) | in | |
| MI_ARDY | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | in | |
| MI_DRDY | std_logic_vector(PCIE_ENDPOINTS-1 downto 0) | in | |
| MI_DBG_DWR | std_logic_vector(32-1 downto 0) | in | |
| MI_DBG_ADDR | std_logic_vector(32-1 downto 0) | in | |
| MI_DBG_BE | std_logic_vector(32/8-1 downto 0) | in | |
| MI_DBG_RD | std_logic | in | |
| MI_DBG_WR | std_logic | in | |
| MI_DBG_DRD | std_logic_vector(32-1 downto 0) | out | |
| MI_DBG_ARDY | std_logic | out | |
| MI_DBG_DRDY | std_logic | out | |
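The SOF_POS and EOF_POS port types above use the expressions REGIONS*max(1,log2(REGION_SIZE)) and REGIONS*max(1,log2(REGION_SIZE*BLOCK_SIZE)). The Python sketch below evaluates them for one DMA port. It assumes that log2 in the port types is a ceiling-of-log2 helper, as is common in VHDL utility packages; this is an assumption, since the helper is not defined in this section. The function names are hypothetical.

```python
def log2_ceil(n: int) -> int:
    """Ceiling of log2(n) for n >= 1 (assumed meaning of log2 in the port types)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def sof_pos_width(regions: int, region_size: int) -> int:
    """Bits in an MFB SOF_POS signal: REGIONS * max(1, log2(REGION_SIZE))."""
    return regions * max(1, log2_ceil(region_size))


def eof_pos_width(regions: int, region_size: int, block_size: int) -> int:
    """Bits in an MFB EOF_POS signal:
    REGIONS * max(1, log2(REGION_SIZE * BLOCK_SIZE))."""
    return regions * max(1, log2_ceil(region_size * block_size))


if __name__ == "__main__":
    # Default generics: REGIONS=2, REGION_SIZE=1, BLOCK_SIZE=8.
    print("SOF_POS bits:", sof_pos_width(2, 1))
    print("EOF_POS bits:", eof_pos_width(2, 1, 8))
```

With the default generics, REGION_SIZE=1 gives log2 = 0, so the max(1, ...) term keeps SOF_POS at one bit per region. Under the stated assumption, SOF_POS is 2 bits wide and EOF_POS is 6 bits wide per DMA port.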