Source code: link.
AXI4[-Stream] In Flits
To handle a mismatch between the width of an AXI4 protocol flit and the width of the flits actually sent over the network, each master and slave device needs a serializer and a deserializer. For example, with an AXI4-Stream request flit of 101+5 bits (the 5 bits are metadata: valid bit, tail bit, etc.) and a network flit width of 38+5 bits, the serializer on the sender side splits a single request into 3 small flits, and the deserializer on the receiver side reassembles them into a complete AXI request.
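The split-and-reassemble step can be illustrated with a small behavioral model in Python (not the RTL). This is only a sketch: the function names, the least-significant-chunk-first order, and the omission of the metadata and source ID bits are assumptions made for the example.

```python
def serialize(payload, in_width, out_width):
    """Split an in_width-bit payload into out_width-bit chunks, LSB chunk first."""
    num_flits = -(-in_width // out_width)  # ceiling division
    mask = (1 << out_width) - 1
    return [(payload >> (i * out_width)) & mask for i in range(num_flits)]


def deserialize(chunks, in_width, out_width):
    """Reassemble chunks produced by serialize() into the original payload."""
    payload = 0
    for i, chunk in enumerate(chunks):
        payload |= chunk << (i * out_width)
    return payload & ((1 << in_width) - 1)


request = (1 << 100) | 0xDEADBEEF  # an arbitrary 101-bit value
flits = serialize(request, in_width=101, out_width=38)
assert len(flits) == 3  # 101 data bits need three 38-bit chunks
assert deserialize(flits, in_width=101, out_width=38) == request
```

The last chunk carries only 101 - 2 * 38 = 25 useful bits; the remaining bits are zero padding that the deserializer masks off.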
Things get more complicated when two master devices send data to a single slave device: their flits may interleave (when virtual link is not enabled in CONNECT, or in more general NoCs). We therefore encode the source ID into each small flit so that the deserializer can tell interleaved flits apart. Fortunately, both AXI4 and AXI4-Stream support interleaved requests and responses at the protocol level, so we don’t need to store the whole packet, whose size is unknown at the beginning. Namely, we set the tail bit to 1 for every single transfer in AXI4[-Stream].
For the serializer, the number of flits is ceil(IN_FLIT_DATA_WIDTH / OUT_FLIT_EFF_DATA_WIDTH), where OUT_FLIT_EFF_DATA_WIDTH = OUT_FLIT_DATA_WIDTH - SRC_BITS. Similarly, for the deserializer, the number of flits is ceil(OUT_FLIT_DATA_WIDTH / IN_FLIT_EFF_DATA_WIDTH), where IN_FLIT_EFF_DATA_WIDTH = IN_FLIT_DATA_WIDTH - SRC_BITS. The structure of a small flit is as follows:
-----------------------------------------------------------------------------------
| <valid bit> | <is_tail> | <destination> | <virtual channel> | <source> | <data> |
-----------------------------------------------------------------------------------
       1            1        `DEST_BITS         `VC_BITS        `SRC_BITS  OUT_FLIT_EFF_DATA_WIDTH
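To make the layout above and the flit-count formula concrete, here is a hedged Python sketch that packs and unpacks a small flit. The values of DEST_BITS, VC_BITS, and SRC_BITS are assumptions picked for illustration (the real values come from the NoC configuration), as is the choice to treat the leftmost field in the diagram as the most significant bits. The helper names are hypothetical.

```python
# Assumed widths for illustration only; the real values come from the NoC configuration.
DEST_BITS = 2
VC_BITS = 1
SRC_BITS = 2
OUT_FLIT_DATA_WIDTH = 38
OUT_FLIT_EFF_DATA_WIDTH = OUT_FLIT_DATA_WIDTH - SRC_BITS  # 36 payload bits per small flit

# Fields listed MSB first, mirroring the diagram from left to right (an assumption).
FIELDS = [
    ("valid", 1),
    ("is_tail", 1),
    ("dest", DEST_BITS),
    ("vc", VC_BITS),
    ("src", SRC_BITS),
    ("data", OUT_FLIT_EFF_DATA_WIDTH),
]


def num_flits(in_flit_data_width, eff_data_width):
    """ceil(in_flit_data_width / eff_data_width) using integer arithmetic."""
    return -(-in_flit_data_width // eff_data_width)


def pack_flit(**values):
    """Concatenate the fields into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        assert 0 <= value < (1 << width), f"{name} does not fit in {width} bits"
        word = (word << width) | value
    return word


def unpack_flit(word):
    """Inverse of pack_flit(): peel fields off starting from the low bits."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


flit = pack_flit(valid=1, is_tail=0, dest=3, vc=0, src=1, data=0x123456789)
assert unpack_flit(flit)["src"] == 1
assert num_flits(101, OUT_FLIT_EFF_DATA_WIDTH) == 3
```

With these assumed widths a small flit is 1 + 1 + 2 + 1 + 2 + 36 = 43 bits, and a 101-bit request still fits in 3 small flits even after SRC_BITS is taken out of the data field.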
To handle interleaved flits, we currently adopt a simple solution that relies on two facts: flits arrive in order, and the number of senders is known. We keep a table with one row per source ID and fill each row with successive flits until the tail flit arrives. At that point the flits are reassembled into an AXI4 flit and passed on to a later stage. We also implement a priority encoder as an arbiter to handle multiple valid reassembled flits in a single cycle. See the code below.
// Use a priority encoder to decide which flit to send.
// The loop runs from the highest index down, so the lowest valid index wins.
logic [`SRC_BITS - 1 : 0] out_idx;
always_comb begin
    out_idx = 0;
    for (int i = `NUM_USER_SEND_PORTS - 1; i >= 0; i--) begin
        if (out_valid[i])
            out_idx = i;
    end
end

// Forward the selected row; only that row sees the downstream ready signal.
always_comb begin
    for (int i = 0; i < `NUM_USER_SEND_PORTS; i++) begin
        out_ready[i] = 0;
    end
    out_ready[out_idx] = out_flit_ready;
    out_flit = {flit_meta_reg[out_idx], flit_data_reg[out_idx][OUT_FLIT_DATA_WIDTH - 1 : 0]};
end
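The table-based reassembly and the fixed-priority arbitration described above can also be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not the RTL: the class and method names are hypothetical, chunks are assumed to arrive least-significant first, and backpressure (a row that completes again before it has been drained) is ignored.

```python
class Reassembler:
    """One table row per source ID; a row completes when its tail flit arrives."""

    def __init__(self, num_sources, eff_width):
        self.eff_width = eff_width
        self.rows = [[] for _ in range(num_sources)]  # partial chunks per source
        self.done = [None] * num_sources              # completed payload per source

    def push(self, src, data, is_tail):
        """Store one small flit in the row of its source."""
        self.rows[src].append(data)
        if is_tail:
            payload = 0
            for i, chunk in enumerate(self.rows[src]):
                payload |= chunk << (i * self.eff_width)
            self.done[src] = payload  # assumes the previous result was already popped
            self.rows[src] = []

    def pop(self):
        """Fixed-priority arbiter: the lowest source ID with a completed payload wins."""
        for src, payload in enumerate(self.done):
            if payload is not None:
                self.done[src] = None
                return src, payload
        return None


r = Reassembler(num_sources=2, eff_width=36)
# Flits from source 1 and source 0 interleave, but each source stays in order.
r.push(src=1, data=0xA, is_tail=False)
r.push(src=0, data=0x1, is_tail=False)
r.push(src=1, data=0xB, is_tail=True)
r.push(src=0, data=0x2, is_tail=True)
first = r.pop()
second = r.pop()
assert first == (0, 0x1 | (0x2 << 36))   # source 0 has priority
assert second == (1, 0xA | (0xB << 36))
```

Because each row is keyed by source ID, interleaving between sources never mixes chunks; the model would break only if flits from the same source arrived out of order.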
However, if flits can arrive out of order, a much more complex solution is needed, e.g., a reassembler like the one Pigasus uses to process out-of-order TCP packets.
Block Diagram
AXI4:
+--------+            +------------+             +--------------+             +-------------+             +---------+
| Device | <--AXI4--> | AXI4Bridge | ---flits--> |  Serializer  | ---flits--> | InPortFIFO  | ---flits--> |         |
+--------+            +------------+   97 bits   +--------------+   n bits    +-------------+   n bits    |         |
                            ^                                                                             | Network |
                            |                    +--------------+             +-------------+             |         |
                            +-----------flits--- | Deserializer | <--flits--- | OutPortFIFO | <--flits--- |         |
                                       97 bits   +--------------+   n bits    +-------------+   n bits    +---------+
AXI4-Stream:
+---------------+                   +------------+             +--------------+             +-------------+             +---------+
| Master Device | ---AXI4-Stream--> | AXI4Bridge | ---flits--> |  Serializer  | ---flits--> | InPortFIFO  | ---flits--> |         |
+---------------+                   +------------+  106 bits   +--------------+   n bits    +-------------+   n bits    |         |
                                                                                                                        | Network |
+---------------+                   +------------+             +--------------+             +-------------+             |         |
| Slave Device  | <--AXI4-Stream--- | AXI4Bridge | <--flits--- | Deserializer | <--flits--- | OutPortFIFO | <--flits--- |         |
+---------------+                   +------------+  106 bits   +--------------+   n bits    +-------------+   n bits    +---------+