Tuesday, 19 November 2019

SC19: Intel Unveils New GPU Stack, oneAPI Development Effort

Intel made several significant announcements at Supercomputing 19 on Sunday, including new details on its Xe GPU architecture and a programming model it calls oneAPI. Both are critical to the company’s future plans: Xe represents Intel’s first-ever push into data center GPUs and its first discrete GPU in nearly a decade, while oneAPI is part of Intel’s effort both to expand its total addressable market and to unify the compute space developers use to target its products.

The goal of oneAPI is to present a single, unified development target for the four major types of workloads (scalar, vector, matrix, spatial) and the various components Intel manufactures (CPUs, GPUs, FPGAs, and other AI accelerators from subsidiaries such as Movidius and Mobileye). It is meant to abstract away the work of optimizing for any single architecture, allowing developers to focus on writing code that runs on any supported underlying hardware.

The “write once, run anywhere” idea that Intel is going for with oneAPI is clearly reminiscent of Java, but there are major differences between the two. Java compiles to bytecode and runs inside a JVM, while oneAPI is a set of libraries. Those libraries translate hardware-agnostic API calls into more specific low-level code that runs on whatever target hardware is present in the system. oneAPI isn’t completely without targeting (users are expected to state whether they’re writing code for an FPGA, CPU, or GPU, for example), but anything more specific than that should be abstracted away.
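The layering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not oneAPI’s actual interface: the names `register`, `vector_add`, and the `"cpu"`/`"gpu"` device strings are all hypothetical, and both kernels are plain-Python stand-ins for real device code. The caller names a device kind and nothing more specific; the library picks the implementation.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[float], List[float]], List[float]]

# Hypothetical registry mapping a device kind to a device-specific kernel.
# None of these names come from oneAPI; they only illustrate the layering.
_BACKENDS: Dict[str, Kernel] = {}


def register(device: str) -> Callable[[Kernel], Kernel]:
    """Decorator that records a device-specific implementation."""
    def wrap(fn: Kernel) -> Kernel:
        _BACKENDS[device] = fn
        return fn
    return wrap


@register("cpu")
def _add_cpu(a: List[float], b: List[float]) -> List[float]:
    # Stand-in for a scalar loop on a CPU.
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


@register("gpu")
def _add_gpu(a: List[float], b: List[float]) -> List[float]:
    # Stand-in for a data-parallel kernel: one "work item" per index.
    return [a[i] + b[i] for i in range(len(a))]


def vector_add(a: List[float], b: List[float], device: str = "cpu") -> List[float]:
    """Hardware-agnostic entry point: the caller names only the device kind."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    try:
        kernel = _BACKENDS[device]
    except KeyError:
        raise ValueError(f"unsupported device kind: {device!r}") from None
    return kernel(a, b)
```

Calling `vector_add([1.0, 2.0], [3.0, 4.0], device="gpu")` goes through the same entry point as the CPU case; only the registered kernel behind it changes.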

Ponte Vecchio: Intel’s First Data Center GPU

Intel also unveiled details on Ponte Vecchio, its first data center and HPC GPU. The Ponte Vecchio is a medieval bridge in Florence; it isn’t clear why Intel picked the name, though the company may have opted for famous bridges as a codename source. ServeTheHome has extensive details on Ponte Vecchio, which is optimized more for compute workloads than for graphics. The design uses variable vector width and supports both SIMT and SIMD execution, offering top performance when the two modes are used together.

Ponte Vecchio can scale to thousands of execution units (EUs; firmer figures were not offered) and supports data types like INT8, bfloat16, and FP16. Xe is said to offer a 40x increase in double-precision floating-point throughput per EU compared with Intel’s existing integrated graphics. Xe will use CXL as a coherent interconnect between CPU and GPU. The GPU also includes what Intel calls a “Rambo” cache, connected to the Xe Memory Fabric (XEMF).
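For readers unfamiliar with bfloat16: it keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it covers the same range as FP32 at much lower precision (FP16, by contrast, has 5 exponent bits and 10 mantissa bits). A minimal sketch of the conversion follows. The function names are made up for illustration, and the code simply truncates the low 16 bits, whereas real hardware typically rounds to nearest; nothing here describes how Ponte Vecchio itself implements the format.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncation (no rounding)."""
    # Reinterpret the value as an IEEE 754 float32 bit pattern.
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 is the upper 16 bits: 1 sign, 8 exponent, 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    # Pad the dropped mantissa bits with zeros and reinterpret as float32.
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    bits = float_to_bfloat16_bits(3.14159)
    print(hex(bits), bfloat16_bits_to_float(bits))  # 0x4049 3.140625
```

The round trip shows the precision cost: 3.14159 comes back as 3.140625, which is acceptable for many machine-learning workloads but not for general numerical work.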

Image by ServeTheHome

Intel believes the cache is essential to its plan for improving performance when working with large matrices. Both of Intel’s newer packaging technologies are in play on this project, with EMIB used to connect HBM and Foveros used for the Rambo cache. Ponte Vecchio will be built on Intel’s 7nm process, and it may be the GPU that Intel expects to debut on that node when it’s ready for manufacturing.

Both oneAPI and Xe are critical components of Intel’s broad future approach to computing. The company has articulated a multi-faceted future that leverages FPGAs, CPUs, GPUs, and other accelerators from the Loihi and NNP-I/NNP-T families to create an overall product ecosystem. We’ll start to see how those pieces are coming together in 2020, as consumer Xe moves into production and next-generation products built on 10nm ship in greater volume.
