2024-06-15 (Day 1):
Time Sessions Talks
8:45 - 8:55 Opening Introduction
8:55 - 9:00 Welcome Address
9:00 - 9:40 Keynote Session #1
Session Chair:
9:40 - 10:20 The 智海 Series of Vertical-Domain Large Models and AI Agents
10:20 - 10:30 Tea Break
10:30 - 11:10 Keynote 方天视窗 Highly Parallel Unified Rendering Architecture
11:10 - 12:00 Lightning Talk #1
Session Chair:
Lightning Talk
12:00 - 13:30 Lunch + Poster Session
13:30 - 13:50 Oral Session #1
Operating System
Session Chair:
An Empirical Study of Rust-for-Linux: The Success, Dissatisfaction, and Compromise
13:50 - 14:10 PathFuzz: Broadening Fuzzing Horizons with Footprint Memory for CPUs
14:10 - 14:30 On-the-fly Quarantine Before Patches for N-day Kernel Vulnerabilities Are Available
14:30 - 14:50 Flexible, Secure and Efficient CVM Maintenance with Confidential Procedure Calls
陈家浩 (上海交通大学)
14:50 - 15:10 Taming Hot Bloat Under Virtualization with HugeScope
15:10 - 15:20 Tea Break
15:20 - 15:40 Oral Session #2
Session Chair:
CMC: Video Transformer Accelerator with CODEC Assisted Matrix Condensing
15:40 - 16:00 MagPy: Effective Operator Graph Instantiation for Deep Learning by Execution State Monitoring
16:00 - 16:20 Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
16:20 - 16:40 Removing Obstacles before Breaking Through the Memory Wall: A Close Look at HBM Errors in the Field
16:40 - 16:50 Tea Break
16:50 - 17:10 Industry Session
Session Chair:
17:10 - 17:30 Building 10,000-Card Clusters for the AI 2.0 Era: 零一万物's AI Infra Practice
17:30 - 17:50 AI Systems in the Era of Large Models: Challenges and Outlook
17:50 - 18:10 Q & A
18:10 Banquet

2024-06-16 (Day 2):
Time Sessions Talks
9:00 - 9:40 Keynote Session #2
Session Chair:
9:40 - 10:20 Matrix Computation Optimization for Multi-core Processors
10:20 - 10:30 Tea Break
10:30 - 10:50 Best Paper Session
Session Chair:
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
10:50 - 11:10 What’s the Story in EBS Glory: Evolutions and Lessons in Building Cloud Block Store
11:10 - 11:30 Towards a Shared-storage-based Serverless Database Achieving Seamless Scale-up and Read Scale-out
11:30 - 12:00 Lightning Talk #2
Session Chair:
Lightning Talk
12:00 - 13:30 Lunch + Poster Session
13:30 - 13:50 Oral Session #3
Session Chair:
StreamPIM: Streaming Matrix Computation in Racetrack Memory
13:50 - 14:10 UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space
14:10 - 14:30 AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer
14:30 - 14:50 An Instruction Inflation Analyzing Framework for Dynamic Binary Translators
14:50 - 15:00 Tea Break
15:00 - 15:20 Oral Session #4
Cloud Computing
Session Chair:
Harmonizing Efficiency and Practicability: Optimizing Resource Utilization in Serverless Computing with Jiagu
15:20 - 15:40 UFO: The Ultimate QoS-Aware CPU Core Management for Virtualized and Oversubscribed Public Clouds
彭雅娟 (中国科学院深圳先进技术研究院)
15:40 - 16:00 AND: Application-network Diagnosing System for Millions of IPs in Production Clouds
16:00 - 16:20 Improving Resource and Energy Efficiency for Cloud 3D through Excessive Rendering Reduction
刘天义 (得克萨斯大学圣安东尼奥分校)
16:20 - 16:30 Tea Break
16:30 - 16:50 Oral Session #5
Session Chair:
Designing an Efficient Data Deduplication Scheme for File-Based Encrypted Mobile Systems
16:50 - 17:10 Ethane: An Asymmetric File System for Disaggregated Persistent Memory
17:10 - 17:30 TeRM: Extending RDMA-Attached Memory with SSD
17:30 - 17:50 Sync+Sync: A Covert Channel Built on fsync with Persistent Storage
18:00 Dinner


2024-06-15 (Day 1):
Session ID Paper Author
0-Operating System 1 Adaptive Memory Swapping to Improve User Experience on Mobile Devices 李文通(华东师范大学)
2 Detecting Smart Home Automation Application Interferences with Domain Knowledge 汪涛(中国科学院软件研究所)
3 Efficient Maximal Biclique Enumeration on GPUs 潘哲(浙江大学)
4 GraalVM as a generic runtime for FOSS EDA 李枫(独立开发者)
5 HydraRPC: RPC in the CXL Era 马腾(阿里巴巴集团)
6 Live Migration of Virtual Machines Based on Dirty Page Similarity 程延博(兰州大学)
7 Quantized Data Transmission Optimization for Distributed GMRES Algorithm 高建花(北京师范大学)
8 SandTable: Scalable Distributed System Model Checking with Specification-Level State Exploration 唐瑞泽(南京大学)
9 TCSA: Efficient Localization of Busy-Wait Synchronization Bugs for Latency-Critical Applications 李宁(华东师范大学)
10 Userspace Bypass: Accelerating Syscall-intensive Applications 周喆(复旦大学)
11 A Network Intrusion Detection System for Container Clusters 张良康(华中科技大学)
1-MLSys+GPU 12 Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation 刘国栋(中国科学院计算技术研究所)
13 GNNavigator: Towards Adaptive Training of Graph Neural Networks via Automatic Guideline Exploration 乔同(北京航空航天大学)
14 Graph Neural Networks Automated Design and Deployment on Device-Edge Co-Inference Systems 周傲(北京航空航天大学)
15 INSPIRE: Accelerating Deep Neural Networks via Hardware-friendly Index-Pair Encoding 汪宗武(上海交通大学)
16 MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs 陈扬锐(字节跳动)
17 Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances 段江飞(香港中文大学)
18 Personalized Meta-Federated Learning for Embedded Health Monitoring System 贾振格(山东大学)
19 SiBrain: A Sparse Spatio-temporal Parallel Neuromorphic Architecture for Accelerating Spiking Convolution Neural Networks with Low Latency 崔友锋(广东工业大学)
20 SpecFL: An Efficient Speculative Federated Learning System for Tree-based Model Training 张玉会(中国科学院信息工程研究所)
21 Efficient SpMM Accelerator for Deep Learning: Sparkle and Its Automated Generator 姜晶菲(中国人民解放军国防科技大学)
22 A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning 麻津铭(上海人工智能实验室)
2-Architecture 23 AIG-CIM: A Scalable Chiplet Module with Tri-Gear Heterogeneous Compute-in-Memory for Diffusion Acceleration 孙奕扬(北京大学)
24 Alchemist: A Unified Accelerator Architecture for Cross-Scheme Fully Homomorphic Encryption 穆嘉楠(中国科学院计算技术研究所)
25 Cuper: Customized Dataflow and Perceptual Decoding for Sparse Matrix-Vector Multiplication on HBM-Equipped FPGAs 伊恩鑫(中国石油大学(北京))
26 Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping 刘通宇(华东师范大学)
27 NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures 田博宇(清华大学)
28 SMG: A System-level Modality Gating Facility for Fast and Energy-Efficient Multimodal Computing 侯小凤(上海交通大学)
29 ReCG: ReRAM-Accelerated Sparse Conjugate Gradient 范明嘉(中国石油大学(北京))
30 SegScope: Probing Fine-grained Interrupts via Architectural Footprints 张鑫(北京大学)
31 QuFEM: Fast and Accurate Quantum Readout Calibration Using the Finite Element Method 张涵禹(浙江大学)
32 SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation 张涵禹(浙江大学)
33 Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses 薛峰(中国科学院计算技术研究所)
34 Optimization of current DMA operation with allocation and mapping 朱彦军(Intel/IONOS)

2024-06-16 (Day 2):
Session ID Paper Author
3-Cloud Computing 1 A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications 陈磊(中国科学院计算技术研究所)
2 Flagger: Near-Data Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation 张杰(北京大学)
3 FUYAO: DPU-enabled Direct Data Transfer for Serverless Computing 刘国威(天津大学)
4 Rethinking an eBPF-based lightweight and unified solution for Edge networking 李枫(独立开发者)
4-Storage 5 A Write-Optimized PM-oriented B+-tree with Aligned Flush and Selective Migration 李明杰(贵州大学)
6 Boosting File Systems Elegantly: a Case for a Transparent NVM Page Cache 王国毓(吉林大学)
7 CCL-BTree: A Crash-Consistent Locality-Aware B+-Tree for Reducing XPBuffer-Induced Write Amplification in Persistent Memory 李振鑫(浙江大学)
8 Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction 宋建森(中国科学院软件研究所)
9 Differential Optimization Testing of Gremlin-Based Graph Database Systems 郑莹莹(中国科学院软件研究所)
10 Efficient Large Graph Processing with Chunk-Based Graph Representation Model 宗威旭(浙江大学)
11 Exploit both SMART Attributes and NAND Flash Wear Characteristics to Effectively Forecast SSD-based Storage Failures in Clusters 谷云飞(上海交通大学)
12 Fast and Scalable In-network Lock Management Using Lock Fission 张汉泽(上海交通大学)
13 HADB: Hotness-Aware Key-Value Store with Persistent Memory 谭蕴麟(贵州大学)
14 Hardware-Software Co-Designs of User-Space All-Flash Array Engine 张杰(北京大学)
15 Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters 莫梓钊(澳门大学)
16 Improving Graph Compression for Efficient Resource-Constrained Graph Analytics 许骞(中国人民大学)
17 OmniCache: Collaborative Caching for Near-storage Accelerators 张坚(罗格斯大学)
18 PolarDB-SCC: A Cloud-Native Database Ensuring Low Latency for Strongly Consistent Reads 陈浩(阿里云)
19 SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory 罗旭川(复旦大学)
20 SODA: A Set of Fast Oblivious Algorithms in Distributed Secure Data Analytics 李想(清华大学)
21 Understanding Transaction Bugs in Database Systems 崔紫玉(中国科学院软件研究所)