Enyue Annie Lu

Research ExperienceS for Undergraduates REU SITE: Exercise - Explore Emerging Computing in Science and Engineering

Flyer Application SCHEDULE PROJECTS People Contact Photo Gallery REU HOME		Sample Research Projects: The research project background materials and problem sets will be posted in the spring pre-program online workshop. The research projects will be finalized in the first week of the summer REU program. The research projects will be designed to allow undergraduate student participants to be at the forefront of new innovations, yet work within an environment that is manageable for their level of expertise. Some sample research projects are listed as follows. Graph mining for large-scale networks using MapReduce Faculty Mentor: Dr. Enyue (Annie) Lu Analyzing patterns in large-scale graphs, such as social and cyber networks (e.g. Facebook, Linkedin, Twitter), with millions, even billions of edges has many important applications such as community detection, blog analysis, intrusion and spamming detections, and many more. Currently, it is impossible to process information in real-world large-scale networks with millions even billions of objects with a single processor. To overcome single processor limitations, a cluster of computers with multiple processing elements operated in parallel connected by a distributed network are used to solve large-size problems and reduce processing time. In this project, students will try to enumerate and identify important graph patterns. The network is modeled as a graph. Each person is represented as a vertex and a mutual friendship between people is represented as an edge in the graph. Finding a pattern in a real-world network is equivalent to finding a subgraph in a large-scale graph. We will map graph decomposing operations into a series of MapReduce processes. The proposed MapReduce algorithms will be implemented in Amazon Elastic MapReduce. We will also do performance comparison and analysis for the proposed MapReduce algorithms and simulation results. Implementation of parallel iterative improvement stable matching algorithms Faculty Mentor: Dr. Enyue (Annie) Lu In a graph, a set of independent edges (no two edges in the set are adjacent) is called matching. Matching algorithms are widely used in many applications including database search, image processing, pattern analysis, and scheduling. The stable matching problem was first introduced by Gale and Shapley in 1962. Given n men, n women, and 2n ranking lists in which each person ranks all members of the opposite sex in the order of preference, a stable matching is a set of n pairs of man and woman with each man and woman in exactly one pair, where there is no pair who are not matched who both prefer the other to their current partner. Gale and Shapley showed that every instance of the stable matching problem admits at least one stable matching, which can be computed in O(n2) iterations. For real-time applications such as switch scheduling, the algorithm proposed by Gale and Shapley is not fast enough. To date, the most well-known parallel algorithms for the stable matching problem are all run on theoretical parallel computing models such as CRCW PRAM. In this project, students will implement a parallel iterative improvement (PII) stable matching algorithm with linear time average performance. For real-time applications with a hard time-constraint, the PII algorithm can terminate at any time during its execution, and the matching with the minimum number of unstable matching pairs can be used as an approximation to a stable matching. The PII algorithm will be implemented using MPICH2 from Argonne National Lab, NVIDIA CUDA-enabled GPUs, and MapReduce computing on Amazon EC2. The goal of the project is to examine the parallelism of the stable matching by using the practical PII algorithm. We will implement the PII algorithm on the software level, measure its speedup against the sequential approaches, find out its efficiency and applicability, and investigate the parallelism limits of the stable matching algorithms. Anomaly Detection for Network Data Using MapReduce Faculty Mentor: Dr. Enyue (Annie) Lu As the volume of collected data grows at an unprecedented rate with the recent technology advances, information retrievals on very large data sets have becoming a beneficial and challenging task. Anomaly detection, which is used to identify abnormal events or patterns that do not conform to expected events or patterns, has been a very useful methodology to perform predictive modeling checking of available data and become increasingly complex to detect intrusion through network records due to large volumes of network traffic data. In this project, we will develop a new framework that combines graph modeling with MapReduce computing techniques to tackle anomaly detection on network record data at extreme scales. Graph is an expressive data structure and has been widely used to model complex data in many applications. By graph modeling, data will be represented as vertices and the relationship (e.g., similarity in spatial, temporal, or semantic attributes) within data are represented as edges in a graph. We will detect the anomalies by analyzing the graph generated by the network data. We plan to tackle the problem in the following three research steps. In the first step, we analyze the characteristics of the data and develop an effective graph model for the data. In the second step, we will develop efficient MapReduce graph-based anomaly detection algorithms and analyze their performance. In the last step, we will test the proposed algorithms and verify their performance using real-world network data. Graph Clustering with Applications on Covid-19 Growth Data Faculty Mentors: Dr. Enyue (Annie) Lu Coronavirus Disease 2019 (Covid-19) has affected all people across the world. According to the Center for Disease Control (CDC), more than 600,000 lives have been lost due to Covid-19 across the U.S.. Much of this has been attributed to a lack of preparedness and a lack of resources. Different counties across the U.S. have had varying rates of Covid-19 case growth due to area density, strictness of guidelines, deployment of Covid-19 vaccines, or other factors. The ability to analyze Covid-19 growth data across counties in the US and find similarities and possible relations between them could help predict future trends in the nation, so that resources can be allocated in a way that would allow the spread, number of cases, and mortality rate of the virus to decrease substantially in all counties. Graph is an expressive data structure and has been widely used to model complex data in many applications. Data-driven graph construction and graph learning methods have been proven to be an effective way of designing general machine learning algorithms and have achieved promising research results. In this project, we plan to leverage our prior work on graph mining MapReduce cloud computing algorithms for large-scale graph to develop graph clustering machine learning algorithms with applications in public health. The REU students will investigate graph clustering techniques for Covid-19 data and apply graph clustering machine learning algorithms to identify centers and dimeters of the Covid-19 case clusters. We will test the accuracy of our algorithms on a testbed that has been be created based on our preliminary work on network data using Apache Spark. We will also test the scalability of our algorithms in Amazon Elastic MapReduce, Google Colab, and XSEDE infrastructure. Image Processing and Computer Vision Algorithms for Sustainable Shellfish Farming Faculty Mentors: Drs. Enyue (Annie) Lu, Yuanwei Jin, Lei Zhang Aquaculture of shellfish such as oysters, mussels, and scallops provides a sustainable, environmentally beneficial source of high-protein food, as well a way to grow the economy in rural coastal areas. As the demand for seafood continues to surpass the supply of wild-caught fish and shellfish, sustainable aquaculture is becoming recognized as a solution for feeding a future global population of nine billion. Current practices and technologies used in shellfish farming lack the advancement found in today’s digital, automated world. Transforming traditional shellfish farming to sustainable smart farming demands new technologies, such as sensing and imaging, machine learning, artificial intelligence, high performance computing, computer vision, and robotics in the seeding, dredging, and harvesting processes. In this project, we propose to develop innovative image processing and computer vision algorithms by applying machine learning and high performance computing. By leveraging our current funded USDA research, this project will enable undergraduate students to develop algorithms based upon our previous REU work on imaging processing algorithms using deep neural networks. Students will dive into two sources of authentic data – data collected in water tanks in the SPIS-Lab at UMES and the data collected by underwater drones at Pacific Shellfish Institute to decipher what happens in the video frames in order to detect oysters and recognize the activities (active versus resting) of each individual oyster. Students will further design effective algorithms to monitor the oysters’ amount of interaction, length of interaction, growth, and its overall adaptivity to its surroundings, and develop a smart shellfish farming software system for crop inventory monitoring and identification of behaviors of oysters and their relationship with their habitat remotely. Finally, students will test the algorithms over commodity off-the-shelf GPU clusters for HPC implementation. GPU accelerated medical image reconstruction and processing Faculty Mentors: Dr. Yuanwei Jin and Dr. Enyue (Annie) Lu Image reconstruction and processing is a rapidly developing field based both on engineering, mathematics, and computer science. Algebraic Reconstruction Technique (ART) is a well known reconstruction method for computed tomography (CT) scanners. Although the ART method has many advantages over the popular filtered back-projection approaches, due to its high complexity, it is rarely applied in most of today’s medical CT systems. The typical medical environment requires fast reconstructions in order to save valuable time. Industrial solutions address the performance challenge using dedicated special-purpose reconstruction platforms with digital signal processors (DSPs) and field programmable gate arrays (FPGAs). The most apparent downside of such solutions is the loss of flexibility and their time-consuming implementation, which can lead to long innovation cycles. In contrast, research has already shown that current GPUs offer massively parallel processing capability that can handle the computational complexity of two-dimensional or three-dimensional cone beam reconstruction. In this project, students will accelerate a new iterative image reconstruction algorithm called “propagation and backpropagation (PBP)” image reconstruction method using Matlab computing with NVIDIA CUDA-enabled GPUs. Through the project, students will learn the basics of Matlab parallel computing for medical imaging with GPU support and gain understanding of the benefits of parallel processing for large scientific computing tasks based on a real-world medical imaging problem. Furthermore, students will be able to verify their algorithms using experimentally collected data through data measurement systems funded by a Department of Defense (DOD) award and an NSF Major Research Instrumentation (MRI) award. Deep Learning and Data Analytics for Remote Sensing Applications Faculty Mentor: Dr. Yuanwei Jin With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Research in this area attempts to make better representations and create models to learn these representations from large-scale unlabeled data. Deep learning is part of a broader family of machine learning methods based on learning representations of data. Deep learning algorithms attempt to learn multi-level representations of data, embodying a hierarchy of factors that may explain them. Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, and natural language processing where they have been demonstrated to be effective at uncovering underlying structure in data and producing state-of-the-art results on various tasks. In this project, we will focus on remote sensing applications such as radar target recognition and feature extraction of acoustic dispersion characteristics. For example, automatic target recognition based upon a sequence of synthetic aperture radar (SAR) images is an important task for both military and civilian applications. By employing emerging deep learning method applicable to SAR images and implementing the algorithms on commercial off-the-shelf graphics processing units (GPUs), significant improvement in recognition performance is expected. Flood detection by Deep Learning with Uninhabited Aerial Vehicle Radar Imagery Faculty Mentors: Yuanwei Jin and Enyue Lu Floods are the most frequent, disastrous, and widespread natural hazards. Research shows that floods account for more than 70% of hazard events occurring globally between 1994 and 2013. One of the leading causes of flooding in the US is hurricane. As climate change intensifies, the frequency and severity of floods is expected to increase, posing significant challenges for societies worldwide. One modern approach to minimize the risk associated with massive inundation is building radar technology, for example, NASA’s Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) to automate flood detection, allowing organizations and first responders to aid in the response and recovery efforts sooner. Unlike optical images, UAVSAR has its capability for disaster segmentation due to its ability to work in a variety of lighting and weather conditions even when clouds are present in the atmosphere. In this project, we will use images captured by NASA’s UAVSAR for flood detection based upon emerging machine learning and image processing techniques. Machine learning often refers to the use of data and learning algorithms to enable machines or computers to perform tasks by imitating intelligence human behaviors. In the field of flood detection, reports have shown that machine learning can be applied to flooded water segmentation utilizing UAVSAR include edge detection, random forests and clustering methods (k-means and fuzzy c-means). Moreover, deep learning (DL), a type of machine learning but with a more complex structure of algorithms to process and learn from data, has proven to be of great use for several applications of UAVSAR. We’ll focus on two major research tasks: (1) Revisiting flood zone maps using AI/DL and remote sensing integration; and (2) Measuring coastal land cover changes from flooding using AI/DL models. By leveraging our prior work of former REU students in the hurricane Florence in 2018 in North Carolina, this work will utilize of deep learning and AI methods to segment bodies of water in flooded areas caused by hurricanes using UAVSAR imagery. Exploring the Design of Optical Interconnected Multicore Computer Architectures Faculty Mentor: Lei Zhang In pursuing more powerful computing capability, multiple and even many computing cores are integrated into a chip. As a result the bottleneck of computing is shifted from the how fast a core can compute to how fast cores can transfer data to each other. By replacing the traditional electrical wire to optical waveguide, in the Optical Network-on-Chip (ONoC), computing cores are integrated to one chip by which they can communicate via lights. In addition, the ONoC offers better energy conservation because of the higher power efficiency in optical transmission. These outstanding properties enable the ONoC to be the most promising candidates in constructing the next generation super computers. In this project, we will explore the ONoC system design and development process. Students will be exposed to the advanced computer architecture concepts, optical computing theories, optoelectronics fundamentals, photonic VLSI design basics, and dynamically reconfigurable ONoC architectures. Through the project, participants will study the methodology of computing system architecture design, explore network topologies, play with mathematical tools, and develop software for simulation. Personality-Augmented Intelligent Agents and Their Behaviors in HPC Faculty Mentor: Dr. Randall Cone Visual representation and analysis of textual works have often aided human learning and understanding. In the Digital Age this is particularly true, given the advent of natural language processing, the wholesale availability of general programming languages, and the maturation of digital visualization. In our research, we eschew disciplinary boundaries to view and analyze classic literary and other textual works in unconventional ways. We study these texts with a sequence of progressively sophisticated content analysis and feature extraction software packages, many of which renders a useful artistic visual representation of a given text. To examine the entire corpus of an author’s (or group of authors’) works, appealing to the power of HPC is a natural choice. We have recently begun to take the above mentioned content analysis and feature extraction research into the realm of Artificial Intelligence (AI). Using a bootstrap of cognitive and emotional reaction vectors, we endow artificially intelligent agents with personalities, then observe their reactions to sets of textual information. This work currently incorporates the following technologies: WordNet, Word2Vec, neural networks, and a novel AI framework written in the Python programming language. Our future plans are to extend this work into distributed computing environments and HPC (via mpi4py), strongly coupling it with a study wherein we establish groups of personality-endowed AI to study a population of such intelligences and their behavior. Real-time Model Adaptation for Human Activity Recognition Faculty Mentor: Dr. Shuangquan (Peter) Wang Human activity recognition (HAR) is usually formulated as a classification problem to identify physical activities and body postures. It is used in various applications such as human-computer interaction, smart health, assisted living, sports, and entertainment. Recently, deep learning techniques have been widely adopted for HAR tasks because of their abilities in automatic feature extraction and high-level data representation. The strong learning ability of a deep learning model is often accompanied by its weak generalization ability, which degrades the model’s recognition performance in scenarios where user behavior variations, device deployment changes, and dynamic ambient environments are present. Thus, it is essential to adapt a HAR model that can evolve in real time with regard to different application scenarios. However, deep learning model adaptation may involve intensive computation. One challenge is how to accurately adapt deep learning HAR models to diverse application scenarios in real time. In this project, we will address the above research challenge by combining the following three strategies: 1) Parallel computing based HAR model training on GPUs. We will investigate how to minimize the training time of deep learning HAR models by taking full advantage of the parallel computing ability of GPUs; 2) Sample selection for model adaptation. Sample selection eliminates redundant training samples to further reduce the computation cost of model adaptation. We will investigate how to evaluate and select the most representative training samples; and 3) Incremental learning based model update. Model adaptation contains model retraining and/or model update. Compared with model retraining, model update adapts a model incrementally and requires less computation cost. We will investigate how to update deep learning HAR models using incremental learning methods. Through this project, students will learn the process of HAR and the related deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM). In addition, they will accumulate hands-on experience by conducting experiments on a public dataset using Matlab. Online Deep Network Model Adaptation for Facial Emotion Recognition Faculty Mentor: Dr. Shuangquan (Peter) Wang Facial emotion recognition (FER) aims to identify and distinguish human emotional states (joy, anger, fear, disgust, sadness, etc.) through analysis of facial images and/or videos, employing machine learning and artificial intelligence technologies. Its applications span healthcare, education, advertising, public safety, social media, and entertainment. In recent years, deep network models, especially convolutional neural networks (CNN), have revolutionized FER, outperforming traditional machine learning approaches that rely on manual feature engineering and simple algorithms. Nevertheless, achieving high-accurate FER remains challenging, particularly in diverse application scenarios, including highly personalized facial expressions (impacted by socio-cultural contexts, individual differences, etc.) and dynamic environments (impacted by lighting conditions, camera angles, backgrounds, facial occlusions, etc.). Therefore, it is essential to adapt FER models to diverse users and environments. However, retraining deep network models demands a large number of labeled samples and involves intensive computation to re-learn millions of model parameters, leading to unacceptable delays in real-world applications. One important research question is how to adapt deep network models online and in real-time for high-accurate emotion recognition across diverse application scenarios. In this project, we will tackle the aforementioned research question and address corresponding research challenges by combining the following three strategies: 1) Automated training sample generation via data augmentation. This project will explore how to automatically extract scenario characters (lighting conditions, backgrounds, facial occlusions, etc.) from live images/videos. Subsequently, we will investigate how to embed these scenario characters into existing training samples via data augmentation techniques to generate new training samples for model adaptation; 2) Transfer learning-based lightweight model adaptation. Leveraging transfer learning techniques, we will investigate how to select and freeze reusable layers of pre-trained models in new application scenarios. Then, this project will explore how to quickly update the unfrozen layers using lightweight parameter adjustment and optimization methods, based on the newly generated training samples; 3) Incremental learning-based online model update. Model adaptation will occur online and in real-time during application processes. We will investigate how to utilize incremental learning methods to update deep network models promptly while maintaining a balance between model stability and adaptability. Through this project, students will gain proficiency in FER and related deep network models. In addition, they will also accumulate hands-on experience through experimentation with public real-world datasets (e.g., FER2013 and AffectNet ) using Python programming language. Reeb Graph and Persistent Diagram of 3D Mesh Models Faculty Mentor: Junyi Tu Topological Data Analysis (TDA) has emerged as a new and promising field for processing, analyzing and understanding complex data and has gained great impetus in the last two decade. TDA has been applied in machine learning, computer vision, drug design, computer graphics and many other fields. The popularity of topology-based techniques is due in large part to their ability to capture the intrinsic property of data, robustness and their applicability to a wide variety of datasets and scientific domains. Reeb graph was originally proposed as a data structure to encode the geometric skeleton of 3D objects, but recently it has been re-purposed as an important tool in TDA. Reeb graph encodes the evolution of level sets obtained from a scalar function by sweeping the entire domain space and tracking the topology changes such as birth and death of the connected components in the level sets. The scalar fields on the 3D mesh model determine the shape of a Reeb graph. In this project, students will explore different scalar fields on the 3D mesh model. The first one is the Gaussian distribution on the vertices of 3D mesh model, and the second is the geodesic distance integral. We will deploy our algorithms on HPC machines to speed up the computing process. After computing Reeb graphs, we will obtain the persistent diagrams using the software. Another goal of this project is to have a better understanding of visualizations between Reeb graph and persistent diagram, and add an interaction interface to the visualization/user study in WebGL browser. The interaction interface will have the ability to adjust the number of contours shown, reduce low persistence features in Reeb graphs, and remove low persistence points in persistence diagrams. Illustrating n-tuple graphs and their internally disjoint paths Faculty Mentor: Alexander Halperin Can subcollections of nodes be used a coordinates for an even larger structure? Consider a graph G, which is a collection of points connected by lines, whose vertices have labels 1 through k. Now, imagine an n-tuple graph Un(G), whose points correspond to n-element subsets of {1,…,k} and whose lines connect nearly identical subsets. While small examples can be done by hand, illustrating Un(G) for large G is nearly impossible without HPC because of the large number of points and lines. It was recently shown that each pair of points in Un(G) has (n-1)(d-n+1)+1 internally disjoint paths between them. Viewing these (n-1)(d-n+1)+1 internally disjoint paths would provide insight into their structure and symmetry throughout Un(G). Students will be asked to create a program that displays Un(G) for graphs G and n as large as possible. Further, the program will highlight the (n-1)(d-n+1)+1 known internally disjoint paths between each pair of vertices as well as determine the (lack of) existence of other internally disjoint paths. An ideal program will also provide information on the partite sets, regularity, and symmetry of Un(G). Once a clear visualization of Un(G) is achieved, applications into social media bots and gerrymandered districts can be explored.