“After the dramas of the GA100 and GH100, it looks like the GB100 will finally use MCM,” @kopite7kimi wrote in a post on X. “Maybe GB100=2*GB102.”
Nvidia’s GA100 and GH100 GPUs have die sizes of 826 mm² and 814 mm², respectively, both very close to the maximum reticle size of 858 mm². Producing such large chips with good yields is difficult, but TSMC seems to be doing it well since Nvidia is literally shipping tons of H100 and A100 GPUs every quarter.
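To put those numbers in perspective, the short Python sketch below computes how much of the reticle each die occupies (roughly 96% and 95%) and runs the areas through the textbook Poisson yield model, Y = exp(-A × D0). The defect density `ASSUMED_D0` is a purely hypothetical placeholder, not a TSMC figure, and the helper names are made up for illustration; the point is only that, under this simple model, yield drops exponentially as die area grows.

```python
import math

RETICLE_LIMIT_MM2 = 858.0  # maximum reticle size cited in the article
ASSUMED_D0 = 0.1           # defects per cm^2: hypothetical value, NOT a real fab figure


def reticle_fraction(die_area_mm2, reticle_mm2=RETICLE_LIMIT_MM2):
    """Share of the reticle field taken up by a single die."""
    return die_area_mm2 / reticle_mm2


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Simple Poisson yield model: probability that a die has zero defects."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    for name, area in [("GA100", 826.0), ("GH100", 814.0)]:
        frac = reticle_fraction(area)
        full = poisson_yield(area, ASSUMED_D0)
        half = poisson_yield(area / 2, ASSUMED_D0)  # one of two equal chiplets
        print(f"{name}: {frac:.1%} of reticle, "
              f"modelled yield {full:.1%} (monolithic) vs {half:.1%} (half-size die)")
```

The model ignores redundancy and the salvaging of partially defective dies, both of which vendors rely on in practice, so treat the output as a qualitative illustration only.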
While Nvidia has so far managed to increase the performance of its GPUs quite tangibly with each new generation, die size is still a concern. Using a multi-chip design will enable Nvidia to pack more transistors into its next-generation compute GPU and deliver larger performance gains over the H100 than it could with architectural improvements alone.
Since both AMD and Intel have already adopted multi-chip designs for their GPUs, and will only increase the number of chips and transistors in the future, Nvidia may have no choice but to adopt a multi-tile design as well. It remains to be seen whether the company will take a dual-chip approach (like AMD’s Instinct MI250) or a multi-chip approach (like AMD’s Instinct MI300 or Intel’s Ponte Vecchio), but the company can’t ignore the advantages offered by modern packaging technologies.
If we continue to speculate, we may assume that Nvidia will only adopt a multi-tile design for its Blackwell GPUs for AI and HPC computing, while its gaming GPUs will remain monolithic. This would make sense since getting two GPUs to work in parallel on graphics workloads is difficult. But then again, Nvidia may not be able to ignore multi-chip designs even for client PCs in the High-NA era, since next-generation ASML scanners will cut the reticle size in half (to 429 mm²). At that point, Nvidia will no longer be able to serve high-end gaming machines with monstrous monolithic GPUs like the AD102 (609 mm²) and would need at least two chiplets instead.
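The "at least two" figure follows from simple division, which the Python sketch below makes explicit. The function name `min_dies` is hypothetical, and the calculation is only a lower bound: it assumes the silicon area can be split evenly and ignores the extra area that die-to-die interconnects would require.

```python
import math

HIGH_NA_RETICLE_MM2 = 429.0  # halved reticle size cited in the article


def min_dies(total_area_mm2, reticle_mm2=HIGH_NA_RETICLE_MM2):
    """Lower bound on the number of dies needed to fit a given silicon area."""
    return math.ceil(total_area_mm2 / reticle_mm2)


if __name__ == "__main__":
    for name, area in [("AD102", 609.0), ("GH100", 814.0), ("GA100", 826.0)]:
        print(f"{name} ({area:.0f} mm^2): at least {min_dies(area)} dies")
```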