NVIDIA has announced the NVIDIA HGX H200, a new addition to its AI computing platform lineup. Based on the NVIDIA Hopper architecture, the platform features the NVIDIA H200 Tensor Core GPU, with advanced memory designed to handle the massive volumes of data involved in generative AI and high-performance computing (HPC) workloads.
The NVIDIA H200 is notable for being the first GPU to offer HBM3e memory. HBM3e provides faster, larger memory, which is crucial for generative AI applications and large language models. With it, the H200 delivers 141GB of memory at 4.8 terabytes per second of memory bandwidth, a significant increase over its predecessor, the NVIDIA A100, in both capacity and bandwidth.
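To put those figures in perspective, the back-of-envelope sketch below is a hypothetical illustration using only the numbers quoted above, not an NVIDIA benchmark. It estimates how long a single pass over the H200's full memory would take at the stated bandwidth, which gives a rough sense of the latency floor for memory-bound workloads such as large language model inference, where the model weights are streamed from memory for each generated token.

```python
# Hypothetical back-of-envelope estimate using the figures quoted in this
# article (141GB of HBM3e at 4.8 TB/s). Not an official NVIDIA metric.

HBM_CAPACITY_GB = 141      # H200 memory capacity
BANDWIDTH_GB_S = 4800      # 4.8 TB/s expressed in GB/s

# Time to stream the entire memory contents once, in milliseconds.
sweep_time_ms = HBM_CAPACITY_GB / BANDWIDTH_GB_S * 1000
print(f"Time to stream all {HBM_CAPACITY_GB} GB once: {sweep_time_ms:.1f} ms")
# -> roughly 29 ms per full memory sweep
```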
Systems powered by H200 are expected to be available from leading server manufacturers and cloud service providers starting in the second quarter of 2024.
Ian Buck, NVIDIA’s vice president of hyperscale and HPC, stated that the H200 facilitates efficient processing of vast amounts of data at high speeds, which is essential for generative AI and HPC applications.
The NVIDIA Hopper architecture, on which the H200 is built, has delivered substantial performance improvements over previous GPU generations, aided by ongoing software enhancements such as the open-source library NVIDIA TensorRT™-LLM. The H200 is expected to extend those gains, including a significant boost in inference speed on large language models.
NVIDIA will offer the H200 in several form factors. It will be available on NVIDIA HGX H200 server boards in four- and eight-way configurations, which are compatible with both the hardware and software of HGX H100 systems. The H200 is also available in the NVIDIA GH200 Grace Hopper™ Superchip with HBM3e.
The H200 is designed for deployment in various data center environments, including on-premises, cloud, hybrid-cloud, and edge. Several leading server manufacturers will offer systems updated with H200, and major cloud service providers like Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure plan to deploy H200-based instances starting next year.
Featuring NVIDIA NVLink and NVSwitch high-speed interconnects, the HGX H200 is designed to deliver high performance across a range of application workloads. An eight-way HGX H200 configuration provides over 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory. Paired with an NVIDIA Grace™ CPU, the H200 also forms the GH200 Grace Hopper Superchip, an integrated module for large-scale HPC and AI applications.
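The aggregate figures above follow directly from the per-GPU numbers stated earlier, as the short sketch below shows. The per-GPU FP8 value is simply the stated 32-petaflop total divided by eight, and the summed bandwidth is plain arithmetic rather than an official NVIDIA specification.

```python
# Back-of-envelope check of the eight-way HGX H200 figures quoted above.
# Per-GPU memory and bandwidth come from this article; the FP8 total is
# the stated platform-level figure. Assumes simple summation across GPUs.

GPUS_PER_BOARD = 8
HBM_PER_GPU_GB = 141          # HBM3e capacity per H200
BW_PER_GPU_TBS = 4.8          # memory bandwidth per H200, in TB/s
FP8_TOTAL_PFLOPS = 32         # stated aggregate FP8 compute for 8 GPUs

aggregate_hbm_tb = GPUS_PER_BOARD * HBM_PER_GPU_GB / 1000   # ~1.1 TB
aggregate_bw_tbs = GPUS_PER_BOARD * BW_PER_GPU_TBS          # ~38.4 TB/s
fp8_per_gpu_pflops = FP8_TOTAL_PFLOPS / GPUS_PER_BOARD      # ~4 PFLOPS

print(f"Aggregate HBM3e:       {aggregate_hbm_tb:.2f} TB")
print(f"Summed memory bandwidth: {aggregate_bw_tbs:.1f} TB/s")
print(f"FP8 compute per GPU:   {fp8_per_gpu_pflops:.1f} PFLOPS")
```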
NVIDIA also emphasizes the role of its full-stack software in supporting the new chip. This software suite enables developers and enterprises to create and accelerate AI and HPC applications.
The NVIDIA H200 is slated for availability through global system manufacturers and cloud service providers from the second quarter of 2024.