[Figure: Latency-throughput Pareto frontier for LLAMA 3-70B on TPU v5e 4x4 in int8. Axes: per-token latency (ms) and throughput (tokens / ms / chip). One curve per context length (2048, 4096, 8192, 16384, 32768, 65536); color scale: batch size.]