2020

D1

Topic: Anomaly prediction and diagnosis in distributed systems

Abstract: Data centers deploy hundreds of thousands of servers. At such scale, numerous anomalies and faults arise in the system architecture daily. These anomalies significantly degrade service performance and availability even when the system draws on reliable-design principles such as redundancy and replication. When anomalies arise, data center operators wish to arrive rapidly at a diagnosis and a remedy. To achieve these goals, we propose two research directions. First, we will develop statistical machine learning models to predict hardware failures in components such as disks, memory, power delivery, and motherboards. These models should anticipate future anomalies and failures based on current system conditions. Second, we will develop statistical machine learning models to localize faults in large distributed systems. These models should draw on server logs and other diagnostic data to detect anomalies and identify problematic components in the hardware architecture or system firmware.
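
As a concrete illustration of the first direction (a minimal sketch, not the proposed models), the snippet below trains a gradient-boosted classifier on synthetic, SMART-style disk telemetry to predict imminent failure; the feature set, labels, and the 7-day failure window are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_recall_fscore_support

    # Hypothetical telemetry: each row is one disk-day of SMART-style counters;
    # label = 1 if the disk fails within the next 7 days (assumed window).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 4))  # e.g., reallocated sectors, seek errors, temperature, age
    y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=10_000) > 2.0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)

    p, r, f1, _ = precision_recall_fscore_support(y_te, clf.predict(X_te), average="binary")
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

In practice, recall at a fixed precision matters most here: a missed failure costs far more than an early, unnecessary disk swap.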

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Requested Resources: 2 graduate students

Participating Site: Duke (PI: Benjamin Lee)

 

D2

Topic: Server architectures for distributed shared memory

Abstract: Data center servers are increasingly disaggregated. Disaggregated architectures meet workload demands for diverse hardware capabilities, provisioned in a modular fashion, by permitting nodes within a server to access each other's hardware resources such as memories, accelerators, and network interfaces. Although accessing remote hardware has historically incurred significant performance and energy costs, emerging technologies motivate a new look at disaggregated architectures. Fast serial links offer low latency and high bandwidth, and messaging protocols support cache-coherent shared memory and fine-grained communication. We will rethink data center server design given these new technology parameters. First, we will characterize and simulate the capabilities of emerging interconnects, such as PCIe 5.0, and communication protocols, such as Compute Express Link. Second, we will study the potential for distributed, cache-coherent shared memory. Third, we will deploy data analytics workloads and assess performance, power, and energy costs.
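
As a back-of-envelope illustration of the trade-off under study, the sketch below models average memory access time in a two-tier local/disaggregated hierarchy; the latency constants are placeholders, not measured PCIe 5.0 or CXL figures.

    # Average memory access time under a simple two-tier model:
    # a fraction `remote_frac` of accesses go to disaggregated memory.
    LOCAL_NS  = 90    # assumed local DRAM latency (ns), placeholder
    REMOTE_NS = 350   # assumed CXL-attached memory latency (ns), placeholder

    def avg_access_ns(remote_frac: float) -> float:
        return (1.0 - remote_frac) * LOCAL_NS + remote_frac * REMOTE_NS

    for f in (0.0, 0.1, 0.25, 0.5):
        print(f"remote fraction {f:4.2f}: {avg_access_ns(f):6.1f} ns")

The characterization work replaces these constants with measured distributions and adds bandwidth and coherence-protocol effects the linear model ignores.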

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Requested Resources: 1 graduate student

Participating Site: Duke (PI: Benjamin Lee)

 

D3

Topic: Efficient and secure online learning with local smart sensor data

Abstract: Smart sensors are now embedded in a wide range of household appliances, aiming to provide personalized services for each user, especially healthcare-related services based on smart watches or smart beds. However, building a personalized machine learning model for each customer from locally gathered data is challenging. For example, the scarcity of computation and storage resources near the local sensor requires the acquired data to be transported elsewhere for processing and storage, commonly a smartphone (via Bluetooth) or an online server (via Wi-Fi); communication efficiency is a major concern in both scenarios. Also, the locally sensed data may contain private personal information that is unsafe to share online directly. In this project, we propose a system that achieves online learning with local data both securely and efficiently. On the local side, the raw sensor data will be preprocessed to remove sensitive private information, which can be defined by the user and evaluated with information-theoretic metrics. The local feature extraction model will be compressed with various techniques to fit the available computation resources. The features extracted by the local model will be much smaller than the raw data yet contain adequate information for the online learning task. The communication cost to the online server can be further reduced by adopting an active learning scheme, in which the importance of each local sample is evaluated and only the important ones are uploaded and stored.
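
A minimal sketch of one possible importance measure for the active learning scheme: upload only the samples whose predictive entropy under the local model is highest. The data, model confidences, and upload budget below are illustrative assumptions.

    import numpy as np

    def predictive_entropy(probs: np.ndarray) -> np.ndarray:
        """Entropy of per-sample class distributions (rows of `probs`)."""
        p = np.clip(probs, 1e-12, 1.0)
        return -(p * np.log(p)).sum(axis=1)

    def select_for_upload(features, probs, budget: int):
        """Keep only the `budget` most uncertain samples for upload."""
        idx = np.argsort(-predictive_entropy(probs))[:budget]
        return features[idx], idx

    # Toy example: 100 extracted feature vectors with local model confidences.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 16))
    probs = rng.dirichlet(alpha=[1.0] * 5, size=100)
    uploaded, idx = select_for_upload(feats, probs, budget=10)
    print(f"uploading {len(idx)} of {len(feats)} samples")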

Deliverable: 1) A submission to a premier ML conference or journal; 2) A demo system for preprocessing and analyzing sensor data based on a market-available smart sensor application.

Requested resource: 1 graduate student.

Participating site: Duke (PI: Hai Li)

 

D4

Topic: A machine learning-based pre-placement wirelength estimator

Abstract: In recent years, we have witnessed many inspiring works on machine learning (ML)-based EDA algorithms. Despite this success, the majority of ML applications apply only after placement, including estimators of timing, congestion, clock tree quality, IR drop, etc. These estimators require designers to go through placement for every candidate solution, which is infeasible when the search space is large. In practice, many designers hope for a pre-placement estimator of net/path wirelength in order to address critical paths at a very early stage. This is highly challenging for several reasons: (1) the behavior of modern placers is highly complex and depends on many parameters; (2) there is no efficient input format for encoding netlists; (3) many existing pre-placement ML applications assume label propagation, which is not suitable for the wirelength problem. To solve this, we propose an ML-based pre-placement wirelength estimator that targets two problems: (1) estimating the half-perimeter wirelength (HPWL) of individual nets; (2) estimating register-to-register path lengths, where a path length is the sum of the HPWLs of all nets on the path. For problem 1, we plan to apply graph classification algorithms to learn the neighboring topology of each net; preliminary results already outperform a popular traditional wirelength estimator [Kahng, 05]. For problem 2, we adopt an LSTM to handle paths with variable numbers of nets; the absolute error of current cross-design models is below 15%. Both ML algorithms will be customized for this problem. Later, we will seek to further improve the current models and to achieve reduced turn-around time or better PPA in real applications.
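
For concreteness, the HPWL of a net is the half-perimeter of the bounding box of its pins, and the path-length label is the sum of the HPWLs along the path. A minimal sketch (pin coordinates are illustrative):

    def hpwl(pins):
        """Half-perimeter wirelength of one net: (max_x - min_x) + (max_y - min_y)."""
        xs, ys = zip(*pins)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    def path_length(nets):
        """Register-to-register path length: sum of the HPWLs of the nets on the path."""
        return sum(hpwl(net) for net in nets)

    # Toy example: a 2-net path.
    print(path_length([[(0, 0), (3, 4)], [(3, 4), (5, 4), (4, 7)]]))  # 7 + 5 = 12

The challenge, of course, is that pin coordinates do not exist before placement; the estimator must predict these quantities from netlist topology alone.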

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Requested Resources: 1 graduate student

Participating Site: Duke (PI: Yiran Chen)

 

N1

Topic: Software-defined FPGA hardware and neural architecture co-exploration for real-time applications

Abstract: Neural architecture search (NAS) has achieved great success in reducing the human labor needed to design neural architectures for image classification. However, conventional NAS optimizes the single objective of accuracy without considering hardware, which can lead to latencies that exceed specifications and render the resulting architectures useless. The problem is more serious in applications with real-time requirements, e.g., autonomous driving and surgical robotics. We propose to co-explore neural architectures and hardware (FPGA) designs to guarantee timing specifications such as latency and throughput. In addition, we plan to extend existing NAS to support more complicated machine learning tasks, such as image segmentation and object detection. Given a dataset, a machine learning task, a target FPGA, and a real-time constraint, the proposed automation tool will find the architecture and FPGA implementation pair that maximizes prediction accuracy while satisfying the required real-time performance. As a starting point for the project, we aim to create, optimize, and characterize basic and advanced hardware IP blocks that serve as the IP libraries of the automation tool, and to perform algorithm-level model compilation and optimization.
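
A minimal sketch of the co-exploration loop, with random search standing in for the actual search strategy; the search space, latency model, and accuracy proxy below are placeholder assumptions, not the proposed tool.

    import random

    SEARCH_SPACE = {"depth": [4, 8, 12], "width": [16, 32, 64], "pe_array": [8, 16]}
    LATENCY_BUDGET_MS = 10.0  # assumed real-time constraint

    def fpga_latency_ms(arch):   # placeholder analytical latency model
        return 0.05 * arch["depth"] * arch["width"] / arch["pe_array"]

    def estimate_accuracy(arch):  # placeholder accuracy proxy
        return 0.7 + 0.002 * arch["depth"] * (arch["width"] ** 0.5) + random.gauss(0, 0.01)

    best = None
    for _ in range(200):          # random search as a stand-in for RL/evolutionary NAS
        arch = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if fpga_latency_ms(arch) > LATENCY_BUDGET_MS:
            continue              # reject architectures violating the real-time constraint
        acc = estimate_accuracy(arch)
        if best is None or acc > best[1]:
            best = (arch, acc)
    print(best)

The key design point is that the FPGA configuration (here, the PE array size) is a searchable variable alongside the network hyperparameters, rather than a fixed target.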

Deliverable: 1) A submission to a premier ML conference or journal; 2) Code and benchmarks used in the experiments.

Requested resource: 1 postdoctoral scholar and 1 graduate student

Participating sites: Notre Dame (PI: Weiwen Jiang), Duke (PI: Yiran Chen)

 

N2

Topic: Hardware-aware neural network competency awareness

Abstract: For deep neural networks (DNNs) to gain human trust in making decisions reliably, especially in mission-critical scenarios, they must be equipped with self-awareness of their task competency. An emerging trend is to use confidence scores obtained through uncertainty estimation, in either fine-grained or coarse-grained form, to enable the competency awareness of DNNs. However, existing developments and improvements in uncertainty estimation are mostly driven by performance merit; they may not fit real-world platforms due to hardware resource constraints. In addition, little is known about how hardware designs affect uncertainty estimation, which makes it difficult to design hardware-aware neural networks with competency awareness. In this project, we will tackle this challenge with two approaches. First, we will systematically evaluate how practical hardware constraints affect uncertainty estimation performance, so as to facilitate the manual design of hardware-aware neural networks; the factors considered include memory size, FLOPs, quantization, and process variation. Second, we will apply hardware-aware neural architecture search to competency-aware neural networks, so as to automate model design and achieve near-optimal performance.
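
As one example of a coarse-grained uncertainty estimator the project might evaluate, the sketch below uses Monte Carlo dropout; its cost of T extra forward passes is exactly the kind of hardware constraint in question. The model, dropout rate, and sample count are illustrative.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(32, 3))

    def mc_dropout_confidence(x, T=30):
        model.train()                  # keep dropout active at inference time
        with torch.no_grad():
            probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
        mean = probs.mean(dim=0)       # predictive distribution over T stochastic passes
        entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
        return mean, entropy           # low entropy -> high competence on this input

    x = torch.randn(5, 8)
    mean, ent = mc_dropout_confidence(x)
    print(ent)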

Deliverable: 1) A submission to a premier ML conference or journal; 2) Code and benchmarks used in the experiments.

Requested resource: 1 graduate student.

Participating site: Notre Dame (PI: Yiyu Shi)

 

N3

Topic: Uncertainty-aware training for emerging-device-based in-memory computing

Abstract: Emerging-device-based computing-in-memory platforms have shown great potential for deep neural network (DNN) implementations due to their low latency and high energy efficiency. However, the finite number of states that emerging devices can represent, as well as spatial and temporal variations (across devices and across cycles), induce considerable uncertainty in the weights of DNNs, which jeopardizes the performance of DNNs deployed on these platforms. To tackle these issues, in this project we will: (1) build models of the weight uncertainties in quantized neural networks and provide both modeling and simulation results for the effects of such variations; and (2) incorporate the models into the training phase of DNNs through uncertainty-aware loss terms and error propagation schemes, so that the trained DNN models are more robust against uncertainty in the weights.
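
A minimal sketch of one way to make training uncertainty-aware: inject multiplicative Gaussian noise into the weights on every forward pass so the learned model tolerates device variation. The noise model and sigma are placeholder assumptions, not the device models to be built in (1).

    import torch
    import torch.nn as nn

    class NoisyLinear(nn.Linear):
        """Linear layer that perturbs its weights each forward pass to emulate
        device variation (assumed multiplicative Gaussian; sigma is a placeholder)."""
        def __init__(self, in_f, out_f, sigma=0.05):
            super().__init__(in_f, out_f)
            self.sigma = sigma
        def forward(self, x):
            noise = 1.0 + self.sigma * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)

    model = nn.Sequential(NoisyLinear(16, 64), nn.ReLU(), NoisyLinear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 16), torch.randint(0, 10, (32,))
    for _ in range(10):             # training under noisy weights encourages robustness
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    print(float(loss))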

Deliverable: 1) A submission to a premier ML/DAC conference or journal; 2) Models, code, and benchmarks used in the experiments.

Requested resource: 1 graduate student.

Participating site: Notre Dame (PI: Yiyu Shi)

 

S1 (Continued project)

Topic: Attribute-based object localization

Abstract: Despite recent advances in object detection, localizing a free-form textual phrase in an image remains challenging. Unlike locating objects over a fixed number of classes, localizing textual phrases involves a massively larger search space. Thus, along with learning from visual cues, it is necessary to develop an understanding of these textual phrases and their relation to the visual cues in order to reliably reason about the locations of the targets the phrases describe. Spatial attention networks are known to learn this relationship and enable the language-encoding recurrent network to focus on salient objects in the image. We propose to utilize spatial attention networks to refine region proposals for the phrases from a Region Proposal Network (RPN) and to localize them through reconstruction. Utilizing an in-network RPN and attention yields a self-sufficient model and interpretable results, respectively.
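
A minimal sketch of the attention step, assuming a GRU phrase encoder and precomputed RPN proposal features; the dimensions and module choices are illustrative, and the reconstruction branch is omitted.

    import torch
    import torch.nn as nn

    D = 256                                          # shared embedding size (assumed)
    phrase_enc  = nn.GRU(300, D, batch_first=True)   # stands in for the language encoder
    proj_region = nn.Linear(1024, D)                 # projects RPN proposal features

    def localize(word_vecs, proposal_feats):
        """Score each region proposal against the phrase via dot-product attention."""
        _, h = phrase_enc(word_vecs)                 # (1, 1, D) phrase embedding
        regions = proj_region(proposal_feats)        # (N, D)
        scores = torch.softmax(regions @ h.squeeze(0).squeeze(0), dim=0)
        return scores.argmax(), scores               # best proposal index + attention weights

    words = torch.randn(1, 6, 300)                   # 6 embedded words of the phrase
    props = torch.randn(50, 1024)                    # 50 RPN proposal features
    idx, attn = localize(words, props)
    print(int(idx))

The attention weights double as an interpretability signal: they show which proposals the model considered for the phrase.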

Deliverable: Code and benchmarks used in the experiments.

Requested resource: 1 PhD student for 2 years

Performing site: Syracuse University (PI: Qinru Qiu)

 

S2 (Continued project)

Topic: Biologically plausible spike-domain backpropagation for in-hardware learning

Abstract: Asynchronous event-driven computation and communication through spikes enable massively parallel, extremely energy-efficient, and highly robust neuromorphic hardware specialized for spiking neural networks (SNNs). However, the lack of a unified and effective learning algorithm limits SNNs to shallow networks with low accuracies. While the backpropagation algorithm, which uses gradient descent to train networks, has been successfully applied to artificial neural networks (ANNs), it is neither biologically plausible nor friendly to neuromorphic implementation. In this project, we propose to develop methods that achieve backpropagation in spiking neural networks. This will enable error propagation through spiking neurons in a more biologically plausible way and hence make in-hardware learning feasible on existing neuromorphic processors.
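
One family of methods this line of work can build on replaces the spike nonlinearity's zero-almost-everywhere derivative with a smooth surrogate in the backward pass. A minimal sketch (the surrogate shape and slope are illustrative choices, not the proposed method):

    import torch

    class SpikeFn(torch.autograd.Function):
        """Heaviside spike with a surrogate gradient (fast-sigmoid style),
        one common way to backpropagate through non-differentiable spikes."""
        @staticmethod
        def forward(ctx, v):
            ctx.save_for_backward(v)
            return (v > 0).float()   # spike when membrane potential crosses threshold
        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            return grad_out / (1.0 + 10.0 * v.abs()) ** 2  # smooth surrogate; slope is an assumption

    # Toy LIF-style step: gradients now flow back to the membrane potential.
    v = torch.randn(4, requires_grad=True)
    spikes = SpikeFn.apply(v)
    spikes.sum().backward()
    print(v.grad)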

Deliverable: Code and benchmarks used in the experiments.

Requested resource: 1 PhD student for 2 years

Performing site: Syracuse University (PI: Qinru Qiu)

 

S3 (Continued project)

Topic: Fast prediction for UAV traffic/communication in a metropolitan area

Abstract: With the rapid growth of UAV applications in delivery, surveillance, and rescue missions, there is an urgent need for UAV traffic management that ensures the safety and timeliness of the missions. Accurate and fast UAV traffic prediction and resource usage estimation is an enabling technique for UAV traffic management. Due to the temporal and spatial behavior of UAV traffic, the complexity and difficulty of traffic density prediction are far beyond what traditional analytical approaches can handle. We propose to solve this problem using machine learning. We will investigate the network structure for the prediction model that gives the best accuracy with the least complexity. The inputs are multi-channel time-varying streams, such as weather maps, cellular network usage maps, geographical constraints, and UAV launch/landing information. The outputs will be predicted UAV density maps and congestion distributions. Compression and acceleration of the model will also be studied to enable real-time prediction.
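
A minimal sketch of the input/output interface for such a predictor, using a small fully convolutional network; the channel set, grid size, and architecture are placeholder assumptions, not the structure to be investigated.

    import torch
    import torch.nn as nn

    C_IN = 4        # assumed input channels: weather, cell usage, geography, launch/landing
    H = W = 64      # assumed grid over the metropolitan area

    model = nn.Sequential(                     # minimal fully convolutional predictor
        nn.Conv2d(C_IN, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 1),                   # 1 output channel: predicted UAV density
    )

    x = torch.randn(8, C_IN, H, W)             # batch of multi-channel input maps
    density = model(x)                         # (8, 1, 64, 64) predicted density map
    print(density.shape)

Capturing the temporal dimension would add a recurrent or temporal-convolution component over a window of past maps, which is part of the structure search described above.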

Deliverable: Model and implementation of the UAV traffic/communication prediction framework. Technical publications.

Requested resource: 1 PhD student for 2 years.

Performing site: Syracuse University (PI: Qinru Qiu)