[Figure: Latency-throughput Pareto frontier for LLAMA 3-70B on TPU v5e 4x4 in int8. Axes: Per-token Latency (ms) and Throughput (tokens / ms / chip). One curve per context length (2048, 4096, 8192, 16384, 32768, 65536); color scale: batch size.]