Yanyi Zhang
Rutgers University
Publication
Featured research published by Yanyi Zhang.
international conference on embedded networked sensor systems | 2016
Xinyu Li; Yanyi Zhang; Ivan Marsic; Aleksandra Sarcevic; Randall S. Burd
We present a system for activity recognition from passive RFID data using a deep convolutional neural network. Instead of selecting features and using a cascade structure that first detects object use from RFID data and then predicts the activity, we feed the RFID data directly into a deep convolutional neural network. Because our system treats activity recognition as a multi-class classification problem, it scales to applications with a large number of activity classes. We tested our system using RFID data collected in a trauma room, including 14 hours of RFID data from 16 actual trauma resuscitations. Our system outperformed existing systems developed for activity recognition and achieved performance on process-phase detection similar to systems that require wearable sensors or manually generated input. We also analyzed the strengths and limitations of our current deep learning architecture for activity recognition from RFID data.
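Feeding RFID data directly into a convolutional classifier might look roughly like the sketch below: a small 1D CNN over fixed-length windows of per-tag RSSI readings, trained as a plain multi-class classifier. The tag count, window length, layer sizes, and class count are placeholders, not values from the paper.

```python
# Hypothetical sketch: a small 1D CNN over windows of RFID RSSI readings
# (shape: tags x time steps), trained as a plain multi-class classifier.
# All dimensions and layer sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class RFIDActivityCNN(nn.Module):
    def __init__(self, num_tags=12, window=64, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(num_tags, 32, kernel_size=5, padding=2),  # convolve over time
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(64 * (window // 4), num_classes)

    def forward(self, x):                         # x: (batch, num_tags, window)
        h = self.features(x)
        return self.classifier(h.flatten(1))      # raw logits; train with CrossEntropyLoss

model = RFIDActivityCNN()
logits = model(torch.randn(8, 12, 64))            # 8 windows of synthetic RSSI data
```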
ubiquitous computing | 2016
Xinyu Li; Yanyi Zhang; Mengzhu Li; Shuhong Chen; Farneth R. Austin; Ivan Marsic; Randall S. Burd
We present a multimodal deep-learning structure that automatically predicts phases of the trauma resuscitation process in real time. The system first pre-processes the audio and video streams captured by a Kinect's built-in microphone array and depth sensor. A multimodal deep learning structure then extracts video and audio features, which are later combined through a “slow fusion” model. The final decision is then made from the combined features through a modified softmax classification layer. The model was trained on 20 trauma resuscitation cases (>13 hours) and tested on 5 other cases. Our results showed over 80% online detection accuracy with an F-score of 0.7, outperforming previous systems.
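The “slow fusion” idea can be sketched roughly as follows: audio and video features pass through separate branches and are concatenated only later, before a single classification layer. Written as a hypothetical PyTorch module, with feature dimensions and layer widths chosen for illustration rather than taken from the paper:

```python
# A minimal sketch of the "slow fusion" idea: process audio and video features
# separately for a few layers, then merge them before the final phase classifier.
# Feature dimensions and layer widths are placeholders, not the paper's values.
import torch
import torch.nn as nn

class SlowFusionPhaseNet(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, num_phases=6):
        super().__init__()
        self.video_branch = nn.Sequential(nn.Linear(video_dim, 256), nn.ReLU())
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.fusion = nn.Sequential(nn.Linear(256 + 64, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_phases)  # softmax applied in the loss

    def forward(self, video_feat, audio_feat):
        v = self.video_branch(video_feat)
        a = self.audio_branch(audio_feat)
        fused = self.fusion(torch.cat([v, a], dim=1))  # late ("slow") merge
        return self.classifier(fused)

net = SlowFusionPhaseNet()
scores = net(torch.randn(4, 512), torch.randn(4, 128))  # 4 synthetic time steps
```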
Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop | 2016
Xinyu Li; Yanyi Zhang; Mengzhu Li; Ivan Marsic; JaeWon Yang; Randall S. Burd
We propose a Deep Neural Network (DNN) structure for RFID-based activity recognition. RFID data collected from several reader antennas with overlapping coverage have potential spatiotemporal relationships that can be used for object tracking. We augmented the standard fully-connected DNN structure with additional pooling layers to extract the most representative features. For model training and testing, we used RFID data from 12 tagged objects collected during 25 actual trauma resuscitations. Our results showed 76% recognition micro-accuracy for 7 resuscitation activities and 85% average micro-accuracy for 5 resuscitation phases, which is similar to existing systems that, however, require the user to wear an RFID antenna.
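A hedged sketch of what a fully-connected DNN augmented with pooling layers might look like is given below; the input size, hidden widths, and pooling placement are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical sketch of a fully-connected network augmented with pooling:
# max-pooling over groups of hidden units keeps the strongest responses,
# roughly in the spirit of the "most representative features" described above.
import torch
import torch.nn as nn

class PooledDNN(nn.Module):
    def __init__(self, in_dim=12 * 20, num_classes=7):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 256)
        self.pool = nn.MaxPool1d(kernel_size=2)    # 256 units -> 128 pooled features
        self.fc2 = nn.Linear(128, 64)
        self.out = nn.Linear(64, num_classes)

    def forward(self, x):                          # x: (batch, in_dim)
        h = torch.relu(self.fc1(x))
        h = self.pool(h.unsqueeze(1)).squeeze(1)   # pool across hidden units
        h = torch.relu(self.fc2(h))
        return self.out(h)

model = PooledDNN()
logits = model(torch.randn(8, 240))                # 8 windows of flattened RFID readings
```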
acm multimedia | 2017
Xinyu Li; Yanyi Zhang; Jianyu Zhang; Yueyang Chen; Huangcan Li; Ivan Marsic; Randall S. Burd
We present a method for activity recognition that first estimates the activity performer's location and uses it together with the input data for activity recognition. Existing approaches directly take video frames or entire videos for feature extraction and recognition, and treat the classifier as a black box. Our method first locates the activities in each input video frame by generating an activity mask using a conditional generative adversarial network (cGAN). The generated mask is appended to the color channels of the input images and fed into a VGG-LSTM network for activity recognition. To test our system, we produced two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks facilitate the learning of features that are representative of the activity rather than of accidental surrounding information.
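The mask-as-extra-channel idea can be sketched as below: the predicted activity mask is stacked with the RGB frame as a fourth channel, a per-frame CNN encodes the result, and an LSTM aggregates frames over time. The small encoder stands in for the VGG backbone, the cGAN that generates the mask is not shown, and all sizes are illustrative.

```python
# Rough sketch of appending the activity mask to the color channels before a
# CNN + LSTM recognizer. The tiny encoder is a stand-in for VGG; sizes are
# placeholders, and the mask generator (cGAN) is assumed to run beforehand.
import torch
import torch.nn as nn

class MaskedFrameLSTM(nn.Module):
    def __init__(self, num_classes=10, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(              # input: RGB + mask = 4 channels
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frames, masks):              # (B, T, 3, H, W), (B, T, 1, H, W)
        x = torch.cat([frames, masks], dim=2)      # append mask to color channels
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.classifier(out)                # per-frame activity scores

net = MaskedFrameLSTM()
scores = net(torch.randn(2, 8, 3, 64, 64), torch.rand(2, 8, 1, 64, 64))
```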
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies | 2017
Xinyu Li; Yanyi Zhang; Jianyu Zhang; Moliang Zhou; Shuhong Chen; Yue Gu; Yueyang Chen; Ivan Marsic; Richard A. Farneth; Randall S. Burd
Process modeling and understanding are fundamental for advanced human-computer interfaces and automation systems. Most recent research has focused on activity recognition, but little has been done on sensor-based detection of process progress. We introduce a real-time, sensor-based system for modeling, recognizing and estimating the progress of a work process. We implemented a multimodal deep learning structure to extract the relevant spatio-temporal features from multiple sensory inputs and used a novel deep regression structure for overall completeness estimation. Using process completeness estimation with a Gaussian mixture model, our system can predict the phase for sequential processes. The speed of progress, calculated from the completeness estimates, allows online estimation of the remaining time. To train our system, we introduced a novel rectified hyperbolic tangent (rtanh) activation function and a conditional loss. Our system was tested on data obtained from a medical process (trauma resuscitation) and sports events (Olympic swimming competition). Our system outperformed the existing trauma-resuscitation phase detectors with a phase detection accuracy of over 86%, an F1-score of 0.67, a completeness estimation error of under 12.6%, and a remaining-time estimation error of less than 7.5 minutes. For the Olympic swimming dataset, our system achieved an accuracy of 88%, an F1-score of 0.58, a completeness estimation error of 6.3% and a remaining-time estimation error of 2.9 minutes.
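One plausible reading of the rectified hyperbolic tangent (rtanh) activation is max(0, tanh(x)), which keeps the predicted completeness in [0, 1). The sketch below shows such a regression head under that assumption; the exact definition, the conditional loss, and the GMM phase model are described in the paper and not reproduced here.

```python
# Hedged sketch of a "rectified tanh" completeness-regression head.
# Assumption: rtanh(x) = max(0, tanh(x)), clamping the output into [0, 1).
# The conditional loss and GMM-based phase prediction are omitted.
import torch
import torch.nn as nn

def rtanh(x):
    return torch.clamp(torch.tanh(x), min=0.0)       # rectified hyperbolic tangent

class CompletenessHead(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, features):                      # features from the fusion network
        return rtanh(self.fc(features)).squeeze(-1)   # completeness estimate per time step

head = CompletenessHead()
completeness = head(torch.randn(16, 128))
# Remaining time could then be extrapolated from how completeness changes over time.
```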
international conference on image and signal processing | 2016
Xinyu Li; Yanyi Zhang; Ivan Marsic; Randall S. Burd
We present a novel and efficient room layout mapping strategy that does not reveal people's identity. The system uses only a Kinect depth sensor instead of RGB cameras or a high-resolution depth sensor. Users' facial details are neither captured nor recognized by the system. The system recognizes and localizes 3D objects in an indoor environment, including furniture and equipment, and generates a 2D map of the room layout. Our system accomplishes layout mapping in three steps. First, it converts a depth image from the Kinect into a top-view image. Second, it processes the top-view image by restoring information missing due to occlusion by moving people and random noise from the Kinect depth sensor. Third, it recognizes and localizes different objects in the top-view image based on their shape and height. We evaluated this system in two challenging real-world application scenarios: a laboratory room with four people present and a trauma room with up to 10 people during actual trauma resuscitations. The system achieved 80% object recognition accuracy with 9.25 cm average layout mapping error for the laboratory furniture scenario and 82% object recognition accuracy for the trauma resuscitation scenario during six actual trauma cases.
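The first step, converting a depth image into a top-view image, might look roughly like the sketch below: each depth pixel is back-projected with the camera intrinsics and binned into a floor-plane grid that keeps the highest point per cell. The intrinsics and grid resolution are placeholders, not calibration values from the paper.

```python
# Illustrative depth-to-top-view conversion: back-project depth pixels to 3D
# points, then bin them into a 2D grid, keeping the maximum relative height
# per cell. Intrinsics and grid resolution below are placeholder values.
import numpy as np

def depth_to_top_view(depth, fx=365.0, fy=365.0, cx=256.0, cy=212.0,
                      cell=0.05, grid=(100, 100)):
    """depth: (H, W) array in meters; returns a top-view height map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                      # lateral position in meters
    y = (v - cy) * z / fy                      # camera y axis points down
    top = np.zeros(grid)
    gx = np.clip((x / cell + grid[0] // 2).astype(int), 0, grid[0] - 1)
    gz = np.clip((z / cell).astype(int), 0, grid[1] - 1)
    valid = z > 0
    # keep the highest point (relative to the camera axis) seen in each cell
    np.maximum.at(top, (gx[valid], gz[valid]), -y[valid])
    return top

top_view = depth_to_top_view(np.random.uniform(0.5, 4.5, size=(424, 512)))
```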
information processing in sensor networks | 2017
Xinyu Li; Yanyi Zhang; Jianyu Zhang; Shuhong Chen; Yue Gu; Richard A. Farneth; Ivan Marsic; Randall S. Burd
We present a deep learning framework for fast 3D activity localization and tracking in a dynamic and crowded real-world setting. Our training approach reverses the traditional activity localization approach, which first estimates the possible location of activities and then predicts their occurrence. Instead, we first trained a deep convolutional neural network for activity recognition using depth video and RFID data as input, and then used the activation maps of the network to locate the recognized activity in 3D space. Our system achieved around 20 cm average localization error (in a 4 m × 5 m room), which is comparable to the Kinect's body-skeleton tracking error (10-20 cm), but our system tracks activities rather than people's locations.
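Using a recognition network's activation maps for localization is close in spirit to class activation mapping (CAM); the sketch below shows only that 2D step, with the projection into 3D via depth omitted. The architecture and sizes are illustrative assumptions, not the paper's.

```python
# Minimal CAM-style sketch: weight the last convolutional feature maps by the
# classifier weights of the predicted class to get a map that peaks near the
# recognized activity. Mapping that peak into 3D with depth is not shown.
import torch
import torch.nn as nn

class RecognizerWithCAM(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, num_classes)        # applied after global pooling

    def forward(self, depth_frame):                 # (B, 1, H, W)
        fmap = self.conv(depth_frame)               # (B, 64, H, W)
        logits = self.fc(fmap.mean(dim=(2, 3)))     # global average pooling
        cls = logits.argmax(dim=1)
        w = self.fc.weight[cls]                     # (B, 64) weights of predicted class
        cam = (w[:, :, None, None] * fmap).sum(dim=1)
        return logits, cam                          # cam peaks near the activity

net = RecognizerWithCAM()
logits, cam = net(torch.randn(2, 1, 60, 80))
```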
information processing in sensor networks | 2017
Yanyi Zhang; Xinyu Li; Jianyu Zhang; Shuhong Chen; Moliang Zhou; Richard A. Farneth; Ivan Marsic; Randall S. Burd
We introduce the Concurrent Activity Recognizer (CAR), an efficient deep learning structure that recognizes complex concurrent teamwork activities from multimodal data. We implemented the system in a challenging medical setting, where it recognizes 35 different activities using Kinect depth video and data from passive RFID tags on 25 types of medical objects. Our preliminary results showed that our system achieved 84% average accuracy with an F1-score of 0.20.
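Recognizing concurrent activities can be framed as multi-label classification, with one independent sigmoid output per activity. The sketch below illustrates only that output head over an assumed fused feature vector, not the full CAR structure.

```python
# Sketch of treating concurrent team activities as a multi-label problem:
# each of the 35 activities gets an independent sigmoid output, so several
# activities can be active at once. The fused feature dimension and layer
# sizes are placeholders for the paper's actual CAR architecture.
import torch
import torch.nn as nn

class ConcurrentActivityHead(nn.Module):
    def __init__(self, feat_dim=256, num_activities=35):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_activities),        # one logit per activity
        )

    def forward(self, fused_features):
        return self.net(fused_features)            # train with BCEWithLogitsLoss

head = ConcurrentActivityHead()
logits = head(torch.randn(4, 256))
active = torch.sigmoid(logits) > 0.5               # concurrent activity predictions
```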