[Research] Computer Systems Lab HPCA 2023 Paper
- College of Software Convergence
- 2023-01-25
Title: Computer Systems Lab (Advisor: Prof. Euiseong Seo) HPCA 2023 Paper
Ph.D. students Junyeol Yu and Jongseok Kim and Prof. Euiseong Seo of the Computer Systems Lab
found that, when serving AI in the cloud,
energy efficiency varies greatly with the combination of server, accelerator, and AI model,
as well as with the amount of resources allocated.
Building on this observation, they developed a cloud platform resource management technique
that can cut the energy consumed by the GPU servers serving AI in existing clouds by more than 20%.
These findings and the proposed technique will be presented under the title
"Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving"
at the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA),
to be held in Montreal, Canada, starting February 25.
HPCA is a top-tier conference in the field of computer architecture and systems,
and is recognized at the highest grade (IF 4) under the BK21+ program.
Abstract: The proportion of machine learning (ML) inference in modern cloud workloads is rapidly increasing,
and graphics processing units (GPUs) are the most preferred computational accelerators for it.
The massively parallel computing capability of GPUs is well-suited to the inference workloads but consumes more power than conventional CPUs.
Therefore, GPU servers contribute significantly to the total power consumption of a data center.
However, despite their heavy power consumption, GPU power management at cloud scale has not yet been actively researched.
In this paper, we reveal three findings about energy efficiency of ML inference clusters in the cloud.
(1) GPUs of different architectures hold comparative advantages over one another in energy efficiency, each for a different set of ML models.
(2) The energy efficiency of a GPU set may vary significantly with the number of active GPUs and their clock frequencies, even when producing the same level of throughput.
(3) The service-level objective (SLO)-blind dynamic voltage and frequency scaling (DVFS) driver of commercial GPUs maintains an immoderately high clock frequency.
Based on these implications, we propose a hierarchical GPU resource management approach for cloud-scale inference services.
The proposed approach consists of energy-aware cluster allocation,
intra-cluster node scaling, intra-node GPU scaling, and GPU clock scaling schemes, considering the inference service architecture hierarchy.
We evaluated our approach with its prototype implementation and cloud-scale simulation.
The evaluation with real-world traces showed that the proposed schemes can save up to 28.3% of the cloud-scale energy consumption when serving five ML models with 105 servers having three different kinds of GPUs.
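To make findings (2) and (3) concrete, below is a minimal Python sketch of the kind of decision the proposed GPU and clock scaling schemes imply: among configurations that sustain the same demand within a latency SLO, pick the number of active GPUs and the locked clock frequency that minimize energy per request. This is not the authors' implementation; every name and number in it is an illustrative assumption.

```python
# Minimal sketch (illustrative assumptions, not the authors' code or data):
# choose the GPU count and clock frequency that minimize energy per request
# while still meeting a latency SLO at the required throughput.
from dataclasses import dataclass

@dataclass
class Config:
    active_gpus: int       # GPUs kept powered and serving
    clock_mhz: int         # locked GPU core clock
    throughput_rps: float  # requests/s this configuration sustains
    power_w: float         # total power draw of the active GPUs
    p99_latency_ms: float  # tail latency under this configuration

def energy_per_request(c: Config) -> float:
    """Joules spent per served request: power divided by throughput."""
    return c.power_w / c.throughput_rps

def pick_config(configs: list[Config], demand_rps: float, slo_ms: float) -> Config:
    """Among configurations meeting both demand and the latency SLO,
    pick the one with the lowest energy per request."""
    feasible = [c for c in configs
                if c.throughput_rps >= demand_rps and c.p99_latency_ms <= slo_ms]
    if not feasible:
        raise ValueError("no configuration satisfies demand and SLO")
    return min(feasible, key=energy_per_request)

# Hypothetical profile: roughly the same ~1000 rps can come from 4 GPUs at a
# high clock or more GPUs at lower clocks, with very different power draws.
profile = [
    Config(4, 1410, 1050.0, 1200.0, 38.0),
    Config(6,  960, 1020.0,  950.0, 45.0),
    Config(8,  705, 1000.0, 1010.0, 52.0),
]

best = pick_config(profile, demand_rps=1000.0, slo_ms=50.0)
print(f"{best.active_gpus} GPUs @ {best.clock_mhz} MHz, "
      f"{energy_per_request(best):.2f} J/request")
```

In this toy profile the 6-GPU, lower-clock configuration wins: an SLO-aware policy can settle on a lower clock than a default DVFS driver would hold, which is exactly the headroom finding (3) points to.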
Website: https://hpca-conf.org/2023/