Plinius: Secure and Persistent Machine Learning Model Training
With the increasing popularity of cloud based machine learning (ML) techniques there comes a need for privacy and integrity guarantees for ML data. In addition, the significant scalability challenges faced by DRAM coupled with the high access-times of secondary storage represent a huge performance bottleneck for ML systems. While solutions exist to tackle the security aspect, performance remains an issue. Persistent memory (PM) is resilient to power loss (unlike DRAM), provides fast and fine-granular access to memory (unlike disk storage) and has latency and bandwidth close to DRAM (in the order of ns and GB/s, respectively). We present PLINIUS, a ML framework using Intel SGX enclaves for secure training of ML models and PM for fault tolerance guarantees. P LINIUS uses a novel mirroring mechanism to create and maintain (i) encrypted mirror copies of ML models on PM, and (ii) encrypted training data in byte-addressable PM, for near-instantaneous data recovery after a system failure. Compared to disk-based checkpointing systems,PLINIUS is 3.2x and 3.7x faster respectively for saving and restoring models on real PM hardware, achieving robust and secure ML model training in SGX enclaves.
READ FULL TEXT