Performance samples
A description of the Tensor Compute Unit performance samples
Performance sampling
The program counter and decoder control bus handshake signals can be sampled at a fixed interval of L cycles in order to measure system performance. The samples are written out to the sample IO bus in blocks of N sample words. The block is terminated by asserting the AXI stream TLAST signal. Each sample word is a 64-bit word, with the following meaning:
Bus name | Signal | Bit field(s) | Comments |
---|---|---|---|
Program counter | 0:31 | Contains all 1s if the sample is invalid. Invalid samples are produced when the sampling interval is set to 0. | |
Array | Valid | 32 | Contains all 0s if the sample is invalid. |
Ready | 33 | ||
Acc | Valid | 34 | |
Ready | 35 | ||
Dataflow | Valid | 36 | |
Ready | 37 | ||
DRAM1 | Valid | 38 | |
Ready | 39 | ||
DRAM0 | Valid | 40 | |
Ready | 41 | ||
MemPortB | Valid | 42 | |
Ready | 43 | ||
MemPortA | Valid | 44 | |
Ready | 45 | ||
Instruction | Valid | 46 | |
Ready | 47 | ||
<unused> | 48:64 |
Value of L can be changed by setting the configuration register. Value of N is defined by architecture.
Last modified March 23, 2022: Add opset doc and break up compiler doc (5edaf53)