2021

ASIC1

Topic: Comparative Analysis of Anomaly Detection Strategies in Distributed Systems

Abstract: Effective strategies for anticipating hardware anomalies and failures would allow system operators to proactively mitigate their effects. Recent efforts on disk failure prediction have focused primarily on using hardware telemetry and neural networks to classify healthy and failed disks. Yet these neural networks have not delivered the precision and recall needed to act on their predictions. We propose a comparative analysis that examines the relative merits of classification, regression, and anomaly detection. Classification and regression, forms of supervised learning, will develop neural network architectures to predict disk failure and time to failure, respectively. Anomaly detection, a form of unsupervised learning, will estimate probability density functions to determine which system conditions are (a)typical. We will focus on disk failures but expect our findings to yield generalizable lessons for modeling resilient systems.
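
As a minimal sketch of the anomaly detection strategy, the Python fragment below fits a Gaussian density to telemetry from healthy disks and flags low-likelihood samples. The two features, their statistics, and the 1% threshold are illustrative placeholders, not values from the proposed study.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical telemetry for healthy disks: temperature, reallocated sectors.
healthy = rng.normal(loc=[40.0, 0.0], scale=[5.0, 0.5], size=(1000, 2))

mu = healthy.mean(axis=0)
cov = np.cov(healthy, rowvar=False)
cov_inv = np.linalg.inv(cov)
log_det = np.linalg.slogdet(cov)[1]

def log_density(x):
    """Log-likelihood of a telemetry sample under the fitted Gaussian."""
    d = x - mu
    return -0.5 * (d @ cov_inv @ d + log_det + len(mu) * np.log(2 * np.pi))

# Flag the least likely 1% of observed conditions as atypical (placeholder threshold).
threshold = np.percentile([log_density(x) for x in healthy], 1)
suspect = np.array([70.0, 3.0])
print("anomalous" if log_density(suspect) < threshold else "typical")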

Deliverable: (1) Software for statistical modeling techniques, (2) A manuscript targeted for a computer systems or architecture conference.

Dates: 2/2021-2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: University of Pennsylvania (PI: Benjamin Lee)

 

ASIC2

Topic: Coherent GPUDirect Storage Transactions with Computational Storage

Abstract: Computational storage devices can aid GPU-based applications (e.g., distributed learning, graph processing) by offloading data preprocessing from the host CPU; meanwhile, connection fabrics such as GPUDirect allow storage-to-GPU communication without involving the host. However, maintaining coherence across a computational storage device and multiple GPUs is not currently possible over GPUDirect. We therefore propose augmenting the computational storage device to manage coherent traffic over GPUDirect for the participating GPUs and devices. To achieve this goal, we will: (1) implement a directory-based coherence protocol on the computational storage device and verify the system with simulators; (2) explore storage-to-GPU bottlenecks and opportunities by evaluating the performance of an important workload -- fine-tuning and inference of sparse deep neural networks -- and optimize the proposed system accordingly.
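
To make the directory-based direction concrete, the sketch below models the per-block state a computational storage device could keep to track sharing GPUs. The MSI-style states and message names are assumptions for exposition, not the protocol that will be implemented.

from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class DirectoryEntry:
    def __init__(self):
        self.state = State.INVALID
        self.sharers = set()   # GPU ids holding a read-only copy
        self.owner = None      # GPU id holding the modified copy

    def read(self, gpu):
        """Read request from a GPU: downgrade any exclusive owner, add a sharer."""
        msgs = []
        if self.state is State.MODIFIED:
            msgs.append(("writeback", self.owner))
            self.sharers.add(self.owner)
            self.owner = None
        self.state = State.SHARED
        self.sharers.add(gpu)
        return msgs

    def write(self, gpu):
        """Write request from a GPU: invalidate other copies, grant ownership."""
        others = (self.sharers | {self.owner}) - {None, gpu}
        msgs = [("invalidate", g) for g in others]
        self.state, self.owner, self.sharers = State.MODIFIED, gpu, set()
        return msgs

entry = DirectoryEntry()
entry.read(gpu=0); entry.read(gpu=1)
print(entry.write(gpu=0))   # -> [('invalidate', 1)]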

Deliverable: (1) Functional simulator framework; (2) A manuscript targeted for a computer architecture conference

Dates: 2/2021 – 2/2022

Budget: $150,000

Requested Resources: 3 graduate students

Participating Site: Duke (PI: Yiran Chen)

 

ASIC3

Topic: Multimodal Physiological Signal Analysis for Intelligent Health Monitoring   

Abstract: With rapid developments in IoT sensors and short-range wireless communications, it has become feasible to place tiny sensors on or near the human body (e.g., a sticker on the chest or a sensor beneath the mattress) to continuously monitor physiological signals such as heart sounds, breath sounds, the electrocardiogram (ECG), and the ballistocardiogram (BCG) throughout daily life. These signals, traditionally monitored at hospitals and analyzed by experts, are important for identifying the risk of life-threatening conditions like stroke and myocardial infarction, and can also reflect minor issues such as intoxication or fatigue. While recent services such as the Apple Watch use a single signal source (e.g., ECG) for health monitoring, we believe that jointly considering multimodal signals can provide previously undiscovered features for accurate health monitoring and allow the system to cover more health conditions. In this project, we propose to build an intelligent health monitoring system by jointly analyzing the multimodal data provided by the member companies. We will also incorporate the model compression and federated learning techniques established in previous projects to make the system feasible and safe in IoT settings.
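
A minimal late-fusion sketch of the joint multimodal analysis is shown below, assuming two modalities (ECG and BCG windows) and a small two-class output; the encoder sizes, window lengths, and labels are placeholders, since the actual modalities and tasks depend on the member companies' data.

import torch
import torch.nn as nn

class MultimodalMonitor(nn.Module):
    def __init__(self, ecg_dim=250, bcg_dim=100, hidden=64, n_classes=2):
        super().__init__()
        self.ecg_enc = nn.Sequential(nn.Linear(ecg_dim, hidden), nn.ReLU())
        self.bcg_enc = nn.Sequential(nn.Linear(bcg_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)   # fuse by concatenation

    def forward(self, ecg, bcg):
        fused = torch.cat([self.ecg_enc(ecg), self.bcg_enc(bcg)], dim=-1)
        return self.head(fused)

model = MultimodalMonitor()
logits = model(torch.randn(8, 250), torch.randn(8, 100))   # batch of 8 windows
print(logits.shape)   # torch.Size([8, 2])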

Deliverable: (1) Algorithm and demo system for health monitoring with multimodal signals; (2) A manuscript targeted for a machine learning or digital health conference

Dates: 2/2021 – 2/2022

Budget: $100,000

Requested Resources: 2 graduate students

Participating Site: Duke (PI: Binghui Wang)

 

ASIC4

Topic: Prediction and Management of Communication Resource in a Dynamic UAV Traffic Environment

Abstract: In this project, we consider a dynamic environment in which resources are shared by agents in the same neighborhood. The action of one agent affects the resource usage of another; as a result, overall resource availability and usage is a function not only of each agent's location and action but also of the joint locations and actions of all agents. We use LTE resource block usage in a UAV traffic environment as a case study. When two UAVs communicate with two neighboring base stations in the same frequency band, interference forces the use of lower-efficiency codes, so more bandwidth (i.e., more resource blocks) must be allocated. At the same time, the distance between a UAV and its base station also affects the number of resource blocks that must be allocated. Because of these interactions, even if each agent behaves deterministically, it is hard to foresee the system state far into the future without detailed simulation once the number of agents grows. We propose to develop a prediction model that can be trained to predict future resource usage and availability. Based on this prediction model, we will also investigate better UAV routing and resource block allocation policies.
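
The toy model below illustrates why resource-block demand depends on the joint configuration: spectral efficiency falls with distance to the base station and with co-channel interference from the other UAVs. The path-loss form, noise level, and traffic demand are illustrative constants, not parameters of the planned study.

import numpy as np

def required_blocks(uav_pos, bs_pos, demand_bits=1e5, rb_bits=1e3, noise=0.05):
    """Resource blocks each UAV needs from the base station in one slot."""
    d = np.linalg.norm(uav_pos - bs_pos, axis=1)
    signal = 1.0 / (1.0 + d ** 2)             # toy path-loss model
    interference = signal.sum() - signal      # co-channel UAVs on the same band
    efficiency = np.log2(1.0 + signal / (noise + interference))   # bits per symbol
    return np.ceil(demand_bits / (rb_bits * efficiency)).astype(int)

uavs = np.array([[1.0, 0.0], [1.2, 0.1], [5.0, 3.0]])   # toy positions (km)
print(required_blocks(uavs, bs_pos=np.array([0.0, 0.0])))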

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Syracuse University (PI: Carlos Caicedo, Mustafa C Gursoy and Qinru Qiu)

 

ASIC5

Topic: Dynamically Provisioned SSDs for Container-Native Storage

 

Abstract: Container-native storage manages the heterogeneous and ephemeral nature of containers through dynamic provisioning of storage volumes. However, state-of-the-art SSDs implement static partitioning of resources to achieve deterministic performance across multiple applications - a stark contrast to the flexibility of containers. Motivated to bridge the gap between the requirements of container-native storage and current SSD protocols, we propose an SSD-driven approach in which the device autonomically and intelligently manages, allocates, and provisions its internal resources. The key insights behind this project are that an SSD (1) has the freedom to determine data placement through indirection, and (2) already relocates data through garbage collection, wear leveling, and read scrubbing, which can be used to adjust and balance load. Thus, by exploiting existing SSD-internal mechanisms, we aim to design a cloud-native SSD architecture that achieves both performance isolation and efficient resource utilization. In this project, we will first design an SSD architecture that dynamically provisions resources while achieving strong performance isolation. We will then improve the efficiency of container-native storage by orchestrating the SSD's resource allocation mechanisms with container scheduling policies.
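
A simplified sketch of the indirection-based placement freedom this proposal builds on appears below: logical pages can map to any channel, so writes (and GC relocations) can be steered toward lightly loaded channels per container volume. The channel count, load metric, and volume accounting are placeholders for illustration.

from collections import defaultdict

class DynamicSSD:
    def __init__(self, channels=8):
        self.load = [0] * channels            # outstanding pages per channel
        self.l2p = {}                         # logical page -> (channel, slot)
        self.per_volume = defaultdict(int)    # pages written per container volume

    def write(self, volume, lpn):
        ch = min(range(len(self.load)), key=self.load.__getitem__)   # least-loaded channel
        self.load[ch] += 1
        self.l2p[lpn] = (ch, self.load[ch])
        self.per_volume[volume] += 1

    def relocate(self, lpn):
        """GC/wear-leveling relocation doubles as load rebalancing."""
        old_ch, _ = self.l2p[lpn]
        self.load[old_ch] -= 1
        new_ch = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[new_ch] += 1
        self.l2p[lpn] = (new_ch, self.load[new_ch])

ssd = DynamicSSD()
for i in range(16):
    ssd.write("container-a" if i % 2 else "container-b", lpn=i)
print(ssd.load, dict(ssd.per_volume))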

 

Deliverable: (1) A manuscript targeted for a system conference; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2023

Budget: $100,000 over 2 years

Requested Resources: 2 graduate students over 2 years

Participating Site: Syracuse (PI: Bryan S. Kim)

 

ASIC6

Topic: NAS-based Fully Automatic ML Estimator Development Flow in EDA

Abstract: The rise of machine learning has inspired a boom of applications in electronic design automation (EDA) and has helped improve the degree of automation in chip design. However, manually crafted machine learning models rely on extensive human expertise and tremendous engineering effort. To address this, we have proposed a neural architecture search (NAS)-based algorithm that automatically develops a routability estimator whose performance is superior to previous models tuned by human engineers. Building on this pioneering effort, we propose a more efficient and fully automatic estimator development flow targeting at least three directions. (1) Support a larger search space with more candidate model structures. Many advanced structures that have proven effective in computer vision are not yet widely adopted in EDA, including depth-wise and group convolution, dilated convolution, and dense connections. A sufficiently large search space containing these structures ensures that optimal solutions are not left out; to search it effectively, we can begin with coarse-grained search strategies. (2) Automatic feature selection. Since feature engineering also requires extensive human effort, the development flow is not fully automatic without automatic feature selection. This selection can be performed by either top-down pruning or bottom-up selection over a large pool of candidate features. (3) Predict design objectives beyond routability. The development flow can easily be extended to other post-placement predictions, where the data is represented as matrices. For early-stage prediction on netlists, we can define search spaces based on various graph convolutional networks, such as GCN, GraphSAGE, and GAT.
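
The fragment below sketches what an enlarged operator space with coarse-grained random sampling could look like, including the depth-wise, group, and dilated convolutions mentioned above. The channel width, layer count, and op list are illustrative; the real search space would also encode connections and feature choices.

import random
import torch.nn as nn

def candidate_op(name, ch=32):
    ops = {
        "conv3x3":   nn.Conv2d(ch, ch, 3, padding=1),
        "depthwise": nn.Conv2d(ch, ch, 3, padding=1, groups=ch),
        "group4":    nn.Conv2d(ch, ch, 3, padding=1, groups=4),
        "dilated":   nn.Conv2d(ch, ch, 3, padding=2, dilation=2),
    }
    return ops[name]

def sample_architecture(n_layers=6):
    """Coarse-grained random sampling over the op space (a starting strategy)."""
    choices = [random.choice(["conv3x3", "depthwise", "group4", "dilated"])
               for _ in range(n_layers)]
    model = nn.Sequential(*[nn.Sequential(candidate_op(c), nn.ReLU()) for c in choices])
    return choices, model

print(sample_architecture()[0])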

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Duke (PI: Yiran Chen)

 

 

ASIC8

Topic: Accelerating Transformer-based Neural Network to Resource-Constrained Hardware

Abstract: The Transformer is one of the state-of-the-art models for natural language processing (NLP) applications; most recently, Google has also demonstrated that Transformers can be applied to image recognition at scale. However, hardware acceleration for these models has not been well studied. This project focuses on edge devices with limited resources, including mobile platforms and microcontroller units (MCUs). The project will be carried out in two steps: (1) targeting mobile platforms, we will employ pruning techniques to enable dynamic reconfiguration of the Transformer at run time, guaranteeing real-time performance and prolonging battery life; (2) moving to MCUs with even more constrained hardware resources, we will further compress the model and apply compiler optimizations, coupling model compression and compiler optimization with a neural architecture search approach to identify a model that fits the MCU's memory and timing constraints. Given a dataset, a machine learning task, and target hardware specifications, the proposed automation tool will find the best Transformer and compiler optimization pair that maximizes prediction accuracy while satisfying the required real-time performance.
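
A hedged sketch of the constraint check at the core of such a flow is shown below: enumerate compressed Transformer configurations and keep only those whose estimated memory and latency fit the MCU budget. The budgets and cost models are deliberately crude placeholders (parameter count with 8-bit weights and a FLOP proxy), not the estimators the project will develop.

from itertools import product

MCU_SRAM_BYTES = 512 * 1024        # assumed memory budget
LATENCY_BUDGET_FLOPS = 50e6        # assumed per-inference compute budget

def transformer_cost(d_model, n_layers, seq_len=64):
    """Crude cost model: attention + FFN weights, and a FLOP proxy for latency."""
    params = n_layers * 12 * d_model ** 2
    flops = n_layers * (4 * seq_len * d_model ** 2 + 2 * seq_len ** 2 * d_model)
    return params, flops

feasible = []
for d_model, n_layers in product([64, 128, 192, 256], [2, 4, 6]):
    params, flops = transformer_cost(d_model, n_layers)
    if params <= MCU_SRAM_BYTES and flops <= LATENCY_BUDGET_FLOPS:   # 8-bit weights: 1 byte each
        feasible.append((d_model, n_layers))
print(feasible)   # candidates to be ranked by predicted accuracy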

Deliverable: (1) A manuscript targeted for a machine learning or design automation conference; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Notre Dame (PI: Weiwen Jiang)

 

ASIC9

Topic: Device Uncertainty Aware Co-Exploration for Computing-in-Memory Neural Accelerators

Abstract: To offer lower delay and higher energy efficiency for deep neural network (DNN) inference, various DNN accelerators have been proposed. One of the most promising design approaches uses emerging device-based crossbar arrays to perform vector-matrix multiplication, which can effectively reduce the delay and energy consumption of these operations. However, two challenges of emerging device-based designs remain: (1) such designs require frequent transformations between the digital and analog domains, demanding energy-hungry DAC and ADC operations and thus reducing energy efficiency; (2) device-to-device variations among RRAM cells induce considerable uncertainty in the weights of DNNs. To address the first issue, we propose to co-explore DNN topologies and Computing-in-Memory (CiM) accelerator designs to guarantee both inference accuracy and hardware efficiency in terms of latency and energy consumption. For the second issue, we plan to provide a simulation-based framework that analyzes the robustness of DNN implementations on CiM accelerators via the Monte Carlo method and uses distributional properties of CiM accelerator behavior to estimate the robustness of target accelerators against device uncertainties. The robustness estimation scheme can then be embedded into the co-exploration framework to design DNN topology and CiM accelerator pairs that are accurate, robust, and efficient.
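
A minimal Monte Carlo sketch of the robustness analysis follows: inject device-to-device variation into the weights of a toy model many times and examine the resulting accuracy distribution. The noise model (log-normal multiplicative variation) and its magnitude are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
w_true = rng.normal(size=16)
y = (X @ w_true > 0).astype(int)
w = w_true.copy()                        # stand-in for trained weights mapped to the crossbar

def accuracy(weights):
    return np.mean((X @ weights > 0).astype(int) == y)

samples = []
for _ in range(1000):                    # Monte Carlo trials
    variation = rng.lognormal(mean=0.0, sigma=0.1, size=w.shape)   # device spread
    samples.append(accuracy(w * variation))

print(f"mean={np.mean(samples):.3f}  std={np.std(samples):.3f}  "
      f"5th percentile={np.percentile(samples, 5):.3f}")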

Deliverable: (1) A manuscript targeted for a machine learning or design automation conference; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Notre Dame (PI: Weiwen Jiang)

 

ASIC10

Topic: Unsupervised Learning and Bayesian Inference Using Neuromorphic Computing

Abstract: Bayesian networks capture probabilistic relationships among data and support many machine intelligence applications such as prediction, reasoning, anomaly detection, and error correction. The model closely resembles high-level cognitive functions of the biological brain. In our previous work, we developed a special Bayesian network model for information completion and anomaly detection, and showed that the weight of a Bayesian connection between two nodes can be learned using the unsupervised spike-timing-dependent plasticity (STDP) rule of stochastic neurons. In this project, we propose to investigate biologically plausible learning of the Bayesian network and its implementation on a neuromorphic processor.
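
For illustration, the fragment below shows a generic unsupervised STDP-style update for the weight between two stochastic neurons: near-coincident pre-then-post spikes potentiate the connection, the reverse order depresses it. The time constant, learning rates, and weight bounds are placeholders, not the values from our previous work.

import numpy as np

def stdp_update(w, pre_times, post_times, a_plus=0.01, a_minus=0.012, tau=20.0):
    for t_pre in pre_times:
        for t_post in post_times:
            dt = t_post - t_pre
            if dt > 0:
                w += a_plus * np.exp(-dt / tau)    # pre before post: potentiate
            elif dt < 0:
                w -= a_minus * np.exp(dt / tau)    # post before pre: depress
    return np.clip(w, 0.0, 1.0)

w = 0.5
w = stdp_update(w, pre_times=[10.0, 30.0], post_times=[12.0, 55.0])
print(round(w, 4))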

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Syracuse University (PI: Qinru Qiu)

 

ASIC11

Topic: In-hardware Real-time Learning of Spatiotemporal Spiking Neural Networks

Abstract: The recently discovered spatiotemporal information-processing capability of bio-inspired spiking neural networks (SNNs) has enabled interesting models and applications. In a realistic neuron model, each synapse and neuron behaves as a filter capable of preserving temporal information. Because these neuron dynamics and filter effects are ignored in existing training algorithms, the SNN degrades into a memoryless system and loses its ability to process temporal signals. Furthermore, spike timing plays an important role in information representation, yet conventional rate-based spike coding models discard the information carried by temporal structure. Our previous work formulates the SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearities to exploit the temporal dynamics of SNNs; the model is trained offline using backpropagation through time. In this project, we will investigate in-hardware real-time learning of this spatiotemporal SNN model. Novel parameter update rules will be studied, and the trade-off between hardware cost and learning efficiency will be explored.
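
The sketch below captures the neuron-as-filter view in simplified form: each synapse is a first-order IIR filter and the membrane integrates filtered input until a threshold spike and reset. The decay constants, threshold, and input statistics are illustrative, not the parameters of the published model.

import numpy as np

def simulate(spikes_in, w, alpha=0.9, beta=0.95, threshold=1.0):
    """spikes_in: (T, n_in) binary array; returns a (T,) output spike train."""
    syn = np.zeros(spikes_in.shape[1])   # synaptic IIR filter states
    v = 0.0                              # membrane potential
    out = np.zeros(spikes_in.shape[0])
    for t, s in enumerate(spikes_in):
        syn = alpha * syn + s            # filtering preserves input timing
        v = beta * v + w @ syn
        if v >= threshold:               # spiking nonlinearity with reset
            out[t] = 1.0
            v = 0.0
    return out

rng = np.random.default_rng(1)
spikes = (rng.random((100, 4)) < 0.1).astype(float)
print(simulate(spikes, w=np.array([0.3, 0.2, 0.4, 0.1])).sum(), "output spikes")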

Deliverable: (1) A manuscript submitted to a premier system conference or journal; (2) Code and benchmarks used in the experiments.

Dates: 2/2021 – 2/2022

Budget: $50,000

Requested Resources: 1 graduate student

Participating Site: Syracuse University (PI: Qinru Qiu)

 

ASIC12

Topic: Reliability, Security, and Learning Co-Design for Processing in Memory

Abstract: This project explores the collaborative design of storage and processing-in-memory (PIM) architectures that address reliable operation in conjunction with security provisions, while maintaining energy efficiency and high performance in deeply scaled conventional and emerging memories. For example, we propose a PIM technique using domain-wall memory (DWM), a cousin of STT-MRAM that improves density while requiring pseudo-sequential access. Using our recently developed transverse access technique, we can develop innovative reliability schemes, and we envision multi-operand gates to support PIM, construct arithmetic operators, and even implement accelerators for application kernels such as AES block ciphers. Moreover, we will collaboratively design these systems to address multiple concerns across these metrics within the same system. As part of this project, we plan to develop new approaches for reliability, security, and PIM in DWM and other scaled memories. We also target a custom architecture for learning through hierarchical temporal memory acceleration.
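
As a purely functional illustration of a multi-operand PIM gate (not a model of the device physics or timing), the sketch below assumes a transverse access returns the count of ones stored along a nanowire position, so thresholding that count yields bitwise multi-operand AND, OR, and majority.

import numpy as np

def pim_gate(operands, op):
    """operands: (k, width) bit matrix; gate applied bitwise across the k operands."""
    ones = np.sum(operands, axis=0)      # ones seen by a transverse access per bit position
    k = operands.shape[0]
    if op == "AND":
        return (ones == k).astype(int)
    if op == "OR":
        return (ones >= 1).astype(int)
    if op == "MAJ":
        return (ones > k // 2).astype(int)
    raise ValueError(op)

ops = np.array([[1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 1]])
print(pim_gate(ops, "MAJ"))   # -> [1 1 1 1]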

Deliverable: New theory, simulation flows, and prototype PIM architectures in addition to papers published at top conference venues and journals.

Dates: 2/2021 – 2/2022

Budget: $150,000

Requested Resources: 3 graduate students

Participating Site: University of Pittsburgh (PI: Alex Jones; Co-PI: Jingtong Hu), University of South Florida (Co-PI: Sanjukta Bhanja)