Intel has shared a slew of new benchmarks of its fourth-gen Xeon Scalable Sapphire Rapids CPUs going head-to-head with AMD's fourth-gen EPYC Genoa processors, claiming up to 7 times more performance in AI workloads when comparing two 32-core chips. Intel also touts higher performance in a spate of standard general-purpose workloads under certain conditions, such as when Sapphire Rapids' built-in accelerators are brought into play. Intel's 56-core Xeon Max, the first x86 data center CPU with HBM memory, also takes on AMD's 96-core flagship in several HPC workloads, matching or exceeding AMD's bulkier chip.

Intel's performance comparisons come well after the company's launch of its Sapphire Rapids Xeons back in January of this year, but the company says its benchmark comparisons were delayed due to difficulties procuring AMD's competing EPYC Genoa chips, which launched in November of last year. The benchmarks come a day before AMD's AI and Data Center event that we're flying out to cover, so we'll attempt to get AMD's feedback about Intel's benchmarks while we're at the event.

With a few shipping OEM systems powered by AMD's Genoa in hand, Intel has conducted a wide range of benchmarks spanning AI, HPC, and general-purpose workloads to present its view of the competitive landscape. However, as with all vendor-provided benchmarks, these should be approached with caution. Intel claims it enabled all rational optimizations for both its own and AMD's silicon for these tests, but be aware that the comparisons can be a bit lopsided, which we'll call out where we see it. The prices of the chips used for comparison are also lopsided. We've included Intel's full test notes for the tested configurations in the relevant image albums below. With that, let's take a closer look at Intel's results.

AI Workloads: Intel Sapphire Rapids Xeon vs AMD EPYC Genoa

For nearly every large organization, the question is no longer "if" or "when" they should deploy AI-driven applications - the question is where and how. Yes, AI training remains the land of GPUs and various flavors of custom silicon, and we can expect Large Language Models (LLMs) to continue to rely upon those types of accelerators for the foreseeable future, but the majority of AI inference workloads still tend to run on CPUs. Given the quickening pace of AI infusion in the data center, CPU performance in various types of inference will only become more important in the years to come.

Intel has had its eyes on accelerating AI workloads since the debut of its DL (Deep Learning) Boost suite with its second-gen Cascade Lake Xeon Scalable chips in 2019, which it claimed made them the first CPUs specifically optimized for AI workloads. Those chips came with support for new VNNI (Vector Neural Network Instructions) that optimized instructions for the smaller data types prized in AI applications. One of Intel's bedrock principles behind its AI strategy has been to use AVX-512, via VNNI and BF16, to vastly improve Xeon's performance and power efficiency in AI workloads.

Intel's focus on AI acceleration features, including software optimizations, has expanded over the years to now include purpose-built AI acceleration engines on its Sapphire Rapids chips - if you're willing to pay the extra fee. But a more important development lurks in the Sapphire Rapids silicon - Intel has now progressed to its new Advanced Matrix Extensions (AMX) x86 instructions, which deliver tremendous performance uplift in AI workloads by using a new set of two-dimensional registers called tiles. The Tile Matrix Multiply Unit (TMUL) that powers AMX is native to the Sapphire Rapids chips - you don't have to pay extra to use it like you do the dedicated AI accelerator engine - and leverages BF16 and INT8 to perform matrix multiply operations that can vastly enhance AI performance.
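To give a sense of how software actually reaches AMX, here's a minimal sketch of BF16 inference in PyTorch on a Sapphire Rapids CPU. PyTorch's oneDNN backend can route matmul and convolution work to the AMX tile instructions when the processor advertises support; the model choice and input shape below are our own illustration, not Intel's test configuration.

```python
# Minimal sketch of BF16 CPU inference (assumed setup, not Intel's harness).
# On a 4th-gen Xeon, PyTorch's oneDNN backend can dispatch matmul/conv work
# to AMX tile instructions when the CPU reports amx_bf16 support.
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()  # untrained weights, for illustration
x = torch.randn(1, 3, 224, 224)               # batch size 1, as in the latency tests

with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(x)

print(logits.shape)  # torch.Size([1, 1000])
```

On Linux, you can confirm the chip exposes the instructions by looking for the amx_tile, amx_bf16, and amx_int8 flags in /proc/cpuinfo.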
These benchmarks leverage Intel's AMX, and not the optional built-in AI accelerator engine. Intel claims a 7X advantage over EPYC Genoa in ResNet34, a 34-layer object detection CNN model, using INT8 instructions at a batch size of 1 to measure latency - in this case, with an SLA of sub-100ms. Intel also claims a ~5.5X advantage in this same workload with a batched test. This model is trained in PyTorch but converted to the ONNX format. Intel claims a ~3.3X advantage over AMD in ResNet50 (INT8, batch size 1) image classification with a sub-15ms SLA, and a 3X advantage in DLRM, a Deep Learning Recommendation Model, with PyTorch BF16 and INT8 in a batched workload.
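Intel doesn't detail its conversion pipeline, but the PyTorch-to-ONNX step described above typically looks like the sketch below; the model, filename, and opset version are assumptions for illustration rather than Intel's actual setup.

```python
# Hypothetical sketch of the PyTorch -> ONNX conversion step described above;
# model choice, filename, and opset are illustrative, not Intel's setup.
import torch
import torchvision.models as models

model = models.resnet34(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input used to trace the graph

torch.onnx.export(
    model,
    dummy,
    "resnet34.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # supports both BS1 and batched runs
    opset_version=13,
)
```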
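Since several of these results are gated by a latency SLA rather than raw throughput, here is a rough sketch of how one might check a sub-100ms, batch-size-1 SLA with ONNX Runtime. The iteration count and percentile are our assumptions; Intel's test notes define the real measurement methodology.

```python
# Rough sketch of checking a batch-size-1 latency SLA with ONNX Runtime
# (assumed methodology; Intel's test notes define the actual measurement).
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet34.onnx", providers=["CPUExecutionProvider"])
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

latencies_ms = []
for _ in range(200):  # warm-up runs omitted for brevity
    start = time.perf_counter()
    sess.run(None, {"input": x})
    latencies_ms.append((time.perf_counter() - start) * 1e3)

p99 = np.percentile(latencies_ms, 99)
print(f"p99 latency: {p99:.2f} ms (sub-100ms SLA {'met' if p99 < 100 else 'missed'})")
```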