HOTINFRA'23
In this paper, we investigate system virtualization techniques for NPUs across the entire hardware and software stack. In the hardware stack, we design a hardware-assisted multi-tenant NPU for fine-grained resource sharing and isolation. It employs an operator scheduler on the NPU core to enable concurrent operator execution and flexible priority-based resource scheduling. In the software stack, we propose a flexible vNPU abstraction. We leverage this abstraction to design vNPU allocation, mapping, and scheduling policies that maximize resource utilization while guaranteeing both performance and security isolation for vNPU instances at runtime... We then extend AI/ML accelerators with additional operations to support more workloads. We propose SIMD², a new programming paradigm that supports generalized matrix operations with a semiring-like structure. SIMD² instructions accelerate eight more types of matrix operations, in addition to matrix multiplication. Since SIMD²..
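To make the idea of generalized matrix operations with a semiring-like structure concrete, the following is a minimal software sketch, not the hardware instruction set described above. It replaces the multiply and add of an ordinary matrix product with a caller-supplied pair of operators. The names `semiring_matmul`, `plus_times`, and `min_plus` are hypothetical, and the two operator pairs shown are illustrative choices rather than a list taken from the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]
INF = float("inf")


def semiring_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> Matrix:
    """Generalized matrix product: out[i][j] = add-reduce over t of mul(a[i][t], b[t][j]).

    `zero` must be the identity element of `add` (0 for +, infinity for min).
    Hypothetical helper for illustration only.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[zero] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = zero
            for t in range(inner):
                acc = add(acc, mul(a[i][t], b[t][j]))
            out[i][j] = acc
    return out


def plus_times(a: Matrix, b: Matrix) -> Matrix:
    # Ordinary matrix multiplication: (+, *) with identity 0.
    return semiring_matmul(a, b, lambda x, y: x + y, lambda x, y: x * y, 0.0)


def min_plus(a: Matrix, b: Matrix) -> Matrix:
    # Min-plus product: (min, +) with identity infinity.
    # Applied to a distance matrix, it yields shortest paths using at most two edges.
    return semiring_matmul(a, b, min, lambda x, y: x + y, INF)


# Edge weights of a small directed graph; INF means "no edge".
dist = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
two_hop = min_plus(dist, dist)
```

Only the operator pair changes between `plus_times` and `min_plus`; the loop structure and data movement are identical, which is the property that lets one matrix unit serve several such operations.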