With its systems approach to chips, Microsoft aims to design everything “from silicon to service” to meet the demand for AI
“Microsoft is building the infrastructure to support AI innovation, and we are reimagining every aspect of our data centers to meet the needs of our customers,” said Scott Guthrie, executive vice president of Microsoft’s Cloud + AI Group. “At the scale at which we operate, it is important for us to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain and give customers infrastructure choice.”
Optimize every layer of the stack
Chips are the workhorses of the cloud. They pack billions of transistors that process the massive streams of ones and zeros flowing through data centers, ultimately enabling almost everything you do on a screen, from sending an email to creating an image in Bing with a simple sentence.
Just as building a house lets you control every design choice and detail, Microsoft sees building its own chips as a way to ensure every element is tailored for its cloud and AI workloads. The chips will sit on custom server boards, housed inside specially designed racks that fit easily into Microsoft’s existing data centers. The hardware will work hand in hand with software co-designed to unlock new capabilities and opportunities.
The ultimate goal is an Azure hardware system that offers maximum flexibility and can also be optimized for power, performance, sustainability or cost, said Rani Borkar, the company’s corporate vice president of Azure Hardware Systems and Infrastructure (AHSI).
“Software is our core strength, but frankly, we are a systems company. At Microsoft, we co-design and optimize hardware and software together so that one plus one is greater than two,” Borkar said. “We have a clear vision of the entire stack, and silicon is just one of the components.”
At Microsoft Ignite, the company also announced the general availability of one of these key components: Azure Boost, a system that makes storage and networking faster by moving those operations from host servers to purpose-built hardware and software.
To complement its dedicated silicon efforts, Microsoft also announced that it is expanding its industry partnerships to provide more infrastructure options to customers. Microsoft has launched a preview of its new NC H100 v5 series of virtual machines designed for NVIDIA H100 Tensor Core GPUs, delivering greater performance, reliability and efficiency for mid-range AI training and generative AI inference. Microsoft will also add the latest NVIDIA H200 Tensor Core GPU to its fleet next year to support inference on larger models without an increase in latency.
The company also announced that it will add AMD MI300X accelerated VMs to Azure. The ND MI300 virtual machines are designed to accelerate processing of AI workloads for high-scale AI model training and generative inference, and will feature AMD’s latest GPU, the AMD Instinct MI300X.
By adding first-party silicon to the growing ecosystem of chips and devices from industry partners, Microsoft will be able to offer more choices in price and performance to its customers, Borkar said.
“Customer obsession means we deliver what is best for our customers, and that means taking what is available in the ecosystem as well as what we have developed,” she said. “We will continue to work with all our partners to deliver what the customer wants.”
Advanced hardware and software
The company’s new Maia 100 AI Accelerator will power some of the largest internal AI workloads running on Microsoft Azure. OpenAI has provided feedback on Azure Maia, and Microsoft’s deep insights into how OpenAI’s workloads run on infrastructure designed for its large language models are helping to inform future Microsoft designs.
“Since our first partnership with Microsoft, we’ve collaborated to co-engineer Azure’s AI infrastructure at every layer to meet our unprecedented modeling and training needs,” said Sam Altman, CEO of OpenAI. “We were excited when Microsoft first shared its designs for the Maia chip, and we worked together to improve and test them with our models. Azure’s end-to-end AI architecture, now optimized down to silicon with Maia, paves the way to train more capable models and make those models cheaper for our customers.”
The Maia 100 AI Accelerator was also designed specifically for the Azure hardware stack, said Brian Harry, a Microsoft technical fellow who leads the Azure Maia team. This vertical integration, aligning the chip design with the larger AI infrastructure built with Microsoft’s workloads in mind, could deliver huge gains in performance and efficiency, he said.
“Azure Maia is designed specifically for AI and to get the absolute most out of hardware,” he said.
Meanwhile, the Cobalt 100 CPU is built on the Arm architecture, a type of power-efficient chip design, and is optimized for greater efficiency and performance in cloud-native offerings, said Wes McCullough, the company’s vice president of hardware product development. Choosing Arm technology was a key component of Microsoft’s sustainability goal. It aims to improve “performance per watt” across all of its data centers, which essentially means getting more computing power per unit of power consumed.
“The architecture and implementation were designed with energy efficiency in mind,” he said. “We are making the most efficient use of transistors on silicon. Multiply these efficiency gains in servers across all of our data centers, and it adds up to a very large number.”
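The “performance per watt” metric described above can be made concrete with a toy calculation. Every figure here, the throughput, power draw and fleet size, is invented purely for illustration; these are not Microsoft, Arm or Cobalt numbers.

```python
# Toy "performance per watt" comparison. All figures (throughput, power,
# fleet size) are invented for illustration only.

def perf_per_watt(throughput_ops: float, power_watts: float) -> float:
    """Operations per second delivered per watt of power consumed."""
    return throughput_ops / power_watts

# Two hypothetical servers doing the same work at different power draws.
legacy = perf_per_watt(throughput_ops=1.0e12, power_watts=400)     # ops/W
efficient = perf_per_watt(throughput_ops=1.0e12, power_watts=250)  # ops/W

improvement = efficient / legacy - 1.0
print(f"Per-server efficiency gain: {improvement:.0%}")  # 60%

# Small per-server gains compound across a fleet, which is the point of
# McCullough's "multiply across all of our data centers" remark.
fleet_size = 10_000
fleet_savings_mw = fleet_size * (400 - 250) / 1e6
print(f"Hypothetical fleet-wide savings: {fleet_savings_mw:.1f} MW")  # 1.5 MW
```

The same 1.5 MW arithmetic scales linearly: halving per-server power at constant throughput doubles performance per watt, regardless of the absolute numbers chosen.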
Dedicated hardware, from chip to data center
Before 2016, Microsoft bought most layers of its cloud off the shelf, said Pat Stemen, partner program manager on the AHSI team. Microsoft then began custom-designing its servers and racks, cutting costs and giving customers a more consistent experience. Over time, silicon became the primary missing piece.
The ability to build its own custom silicon allows Microsoft to target specific qualities and ensure chips perform optimally in the workloads that matter most. Its testing process involves determining how each chip will perform under different frequency, temperature and power conditions to achieve peak performance and, most importantly, testing each chip under the same conditions and configurations it would encounter in a real Microsoft data center.
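The characterization process described above, exercising each chip across frequency, temperature and power conditions, can be sketched as a simple parameter sweep. The operating points and the pass/fail rule below are invented for illustration and do not reflect Microsoft’s actual test conditions or criteria.

```python
# Illustrative sketch of a chip characterization sweep across frequency,
# temperature and power corners. The operating points and pass/fail rule
# are invented for demonstration, not real test criteria.
import itertools

def passes(freq_mhz: int, temp_c: int, power_w: int) -> bool:
    # Invented criterion: high frequency at high temperature under a tight
    # power cap is treated as the stressful corner that fails.
    return not (freq_mhz >= 3000 and temp_c >= 85 and power_w <= 250)

frequencies = [2400, 2800, 3000, 3200]   # MHz
temperatures = [25, 60, 85]              # deg C
power_caps = [200, 250, 300]             # W

# Test every combination of conditions, as the article describes.
results = {
    (f, t, p): passes(f, t, p)
    for f, t, p in itertools.product(frequencies, temperatures, power_caps)
}
failures = [corner for corner, ok in results.items() if not ok]
print(f"{len(results)} operating points tested, {len(failures)} failing corners")
# -> 36 operating points tested, 4 failing corners
```

In practice such sweeps feed binning and voltage/frequency curve decisions; the key idea the article highlights is running the sweep under the same configurations the chip would see in a real data center.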
The silicon architecture unveiled today allows Microsoft to not only enhance cooling efficiency but also optimize utilization of existing data center assets and maximize server capacity within its existing footprint, the company said.
For example, no existing racks could accommodate the unique requirements of the Maia 100 server boards, so Microsoft built its own from scratch. These racks are wider than those typically found in the company’s data centers. The expanded design provides ample space for both power and networking cables, which is essential to meet the unique demands of AI workloads.
These AI tasks come with intense computational requirements that consume more energy. Traditional air cooling is not sufficient for such high-performance chips, so liquid cooling, which uses circulating fluids to dissipate heat, has emerged as the preferred solution to these thermal challenges, ensuring the chips operate efficiently without overheating.
But Microsoft’s current data centers weren’t designed for large liquid chillers. So the company developed a “sidekick” that sits next to the Maia 100 rack. These sidekicks work a bit like a car’s radiator: cold liquid flows from the sidekick to cold plates attached to the surface of the Maia 100 chips. Each plate has channels through which the liquid circulates to absorb and transfer heat. The warmed liquid flows back to the sidekick, which removes the heat from the fluid and sends it back to the rack to absorb more heat, and so on.
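The cooling loop described above follows ordinary heat-transfer arithmetic: the coolant’s mass flow must be large enough to carry the rack’s heat away at an acceptable temperature rise (Q = ṁ·c·ΔT). The heat load and temperature figures below are illustrative assumptions, not Maia specifications.

```python
# Back-of-the-envelope estimate of the coolant flow a liquid-cooling loop
# needs. The 40 kW rack load and 10 K temperature rise are assumptions for
# illustration, not Maia specifications.

SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), approximate, for water-based coolant

def coolant_flow_kg_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Mass flow rate so the coolant absorbs heat_load_w while warming by
    delta_t_k between cold-plate inlet and outlet (from Q = m_dot * c * dT)."""
    return heat_load_w / (SPECIFIC_HEAT_WATER * delta_t_k)

# Hypothetical 40 kW rack; coolant allowed to warm by 10 K across the plates.
flow = coolant_flow_kg_per_s(heat_load_w=40_000, delta_t_k=10)
print(f"Required flow: {flow:.2f} kg/s (~{flow * 60:.0f} L/min for water)")
# -> Required flow: 0.96 kg/s (~57 L/min for water)
```

The same relation shows why air cooling runs out of headroom: air’s specific heat and density are far lower than water’s, so carrying the same 40 kW would require enormous volumetric airflow.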
The tandem design of the rack and its liquid-cooling companion underscores the value of a systems approach to infrastructure, McCullough said. By controlling every aspect, from the low-power ethos of the Cobalt 100 chip to the intricacies of data center cooling, Microsoft can orchestrate a harmonious interaction between every component, ensuring that the whole is truly greater than the sum of its parts in reducing environmental impact.
Microsoft has shared what it learned from designing its custom rack with industry partners and can use those learnings no matter what silicon sits inside, Stemen said. “All the things we build, whether it’s infrastructure, software or firmware, we can leverage them whether we deploy our own chips or those provided by our industry partners,” he said. “This is a choice that the customer has to make, and we try to offer them the best set of options, whether that is in terms of performance, cost or any other dimension they care about.”
Microsoft plans to expand this set of options in the future; it is already designing second-generation versions of the Azure Maia AI Accelerator series and the Azure Cobalt CPU series. Stemen said the company’s mission remains clear: to optimize every layer of its technology stack, from core silicon to end-to-end service.
“With this silicon work, Microsoft is pushing innovation further down the stack to future-proof our customers’ workloads on Azure, prioritizing performance, energy efficiency and cost,” he said. “We chose this innovation intentionally so that our customers get the best experience they can have with Azure today and in the future.”
Read more: Microsoft delivers purpose-built cloud infrastructure in the age of artificial intelligence
Read more: Azure announces new AI-optimized VM series featuring AMD’s leading MI300X GPU
Read more: Introducing Azure NC H100 v5 VMs for mid-range AI and HPC workloads
Learn more: Microsoft Ignite
Top image: A technician installs the first server racks containing Microsoft Azure Cobalt 100 CPUs in a data center in Quincy, Washington. It is the first CPU designed by Microsoft for the Microsoft Cloud. Photography by John Brecher for Microsoft.