Channel: Raspberry Pi Forums

Compute Module • Re: What is performance of the VideoCore VII GPU in the CM5?

something that is always overlooked, and that the pi5/pi500/cm5 has begun to lock down, is the 3rd compute source in the SoC: the VPU!

the VPU is a dual-core CPU, with 32 general-purpose registers of 32 bits each (some are taken for special purposes like stack, status, and pc, much like ARM)
unlike ARM, those 32-bit GP regs can hold int or float, entirely depending on which opcode you're using
but the hidden power is the vector opcodes: each core has a `uint8_t[64][64]` of vector space, a full 4 KB!
all vector opcodes operate on a `[16]` slice of it, and can combine 1/2/4 bytes per element to create 8/16/32-bit values
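
to make the "combine 1/2/4 elements" part concrete, here is a minimal C model of the vector space, not real VPU code. `vrf_t`, `vrf_read_h` and the `byte_stride` parameter are hypothetical names made up for this sketch, and the byte layout (which columns supply the upper bytes of a 16/32-bit lane) is an assumption, since the post does not specify it:

```c
#include <assert.h>
#include <stdint.h>

#define VRF_DIM 64   /* uint8_t[64][64] per core, as described above */
#define LANES   16   /* every vector opcode works on a [16] slice */

typedef struct { uint8_t byte[VRF_DIM][VRF_DIM]; } vrf_t;

/* 64 * 64 bytes = 4 KB of vector space per core */
static_assert(sizeof(vrf_t) == 4096, "vector space should be 4 KB");

/* Read 16 horizontal lanes starting at (row, col).
 * width is 1, 2 or 4 bytes per lane, giving 8/16/32-bit values.
 * ASSUMPTION: byte k of lane i sits at column col + i + k * byte_stride,
 * least significant byte first. The real hardware layout may differ. */
static void vrf_read_h(const vrf_t *v, int row, int col, int width,
                       int byte_stride, uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t value = 0;
        for (int k = 0; k < width; k++) {
            int c = (col + i + k * byte_stride) % VRF_DIM;
            value |= (uint32_t)v->byte[row][c] << (8 * k);
        }
        out[i] = value;
    }
}
```

calling `vrf_read_h(&v, 0, 0, 2, 16, out)` would then give 16 values of 16 bits each from row 0, with the high bytes taken 16 columns to the right of the low bytes under the assumed layout.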

Code:

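// C-style pseudocode for one vector mult opcode
// uint48_t is not a real C type: it stands for a 48-bit accumulator per lane
uint48_t accumulator[16];
uint16_t a[16], b[16];
uint32_t c[16];
for (int i = 0; i < 16; i++) {
  // widen before multiplying so the 16x16 product keeps all 32 bits
  uint32_t temp = (uint32_t)a[i] * b[i];
  if (store) c[i] = temp;
  if (accumulate) accumulator[i] += temp;
}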
the mult opcode can take a pair of `uint16_t[16]`s, multiply each element with the matching element in the other array, and produce up to 32 bits of output per element
the entire chunk of code above can be run in 2 clock cycles, at the full clock speed (500 MHz on pi2-pi4 i believe, 750 MHz on the pi5)

so the VPU can add about 4 GOPS of integer compute: 16-bit inputs for mults, 32-bit for all other opcodes
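
the 4 GOPS figure lines up with the numbers above if you count one op per lane: 16 lanes every 2 cycles at 500 MHz. a small sketch of that arithmetic follows; `vpu_lane_ops_per_sec` is a hypothetical helper, and the post does not say whether the figure is per core or for both cores, so this only reproduces the per-opcode math:

```c
#include <stdint.h>

/* Lane-operations per second for a stream of back-to-back vector opcodes.
 * Figures from the post: 16 lanes per opcode, 2 clock cycles per opcode.
 * ASSUMPTION: one "op" is one lane result, with no stalls or load/store cost. */
static uint64_t vpu_lane_ops_per_sec(uint64_t clock_hz, unsigned lanes,
                                     unsigned cycles_per_opcode)
{
    return clock_hz / cycles_per_opcode * lanes;
}
```

with a 500 MHz clock this gives 4,000,000,000 lane-ops per second, and the same formula at the pi5's 750 MHz gives 6,000,000,000.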

Statistics: Posted by cleverca22 — Thu Dec 19, 2024 1:26 am


