Computer Science Computer Vision And Pattern Recognition - Researchain

Featured Research

Automated Rip Current Detection with Region based Convolutional Neural Networks

This paper presents a machine learning approach for the automatic identification of rip currents with breaking waves. Rip currents are dangerous, fast-moving currents of water that cause many deaths by sweeping people out to sea. Most people do not know how to recognize rip currents in order to avoid them. Furthermore, efforts to forecast rip currents are hindered by a lack of observations to help train and validate hazard models. The widespread presence of webcams and smartphones has made video and still imagery of the coast ubiquitous, providing a potential source of rip current observations. These same devices could also raise public awareness of the presence of rip currents. What is lacking is a method to detect the presence or absence of rip currents in coastal imagery. This paper provides expert-labeled training and test data sets for rip currents. We use Faster R-CNN and a custom temporal aggregation stage to make detections from still images or videos with higher measured accuracy than both humans and other rip current detection methods previously reported in the literature.
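
The abstract does not say how the temporal aggregation stage works. One plausible reading, shown here purely as an assumption, is a per-pixel vote map: each video frame's detector boxes mark pixels, and only pixels flagged in at least a chosen fraction of frames survive, which suppresses one-off false positives. The function name `aggregate_detections`, the integer `(x0, y0, x1, y1)` box format, and the `min_fraction` threshold are hypothetical and are not taken from the paper.

```python
import numpy as np

def aggregate_detections(frame_boxes, frame_shape, min_fraction=0.5):
    """Hypothetical temporal aggregation: keep pixels covered by a detection
    in at least `min_fraction` of the frames.

    frame_boxes: list (one entry per frame) of lists of (x0, y0, x1, y1) boxes
    frame_shape: (height, width) of the video frames
    """
    h, w = frame_shape
    votes = np.zeros((h, w), dtype=np.float64)
    for boxes in frame_boxes:
        hit = np.zeros((h, w), dtype=bool)  # a pixel votes at most once per frame
        for x0, y0, x1, y1 in boxes:
            hit[y0:y1, x0:x1] = True
        votes += hit
    # fraction of frames in which each pixel was detected
    return votes / max(len(frame_boxes), 1) >= min_fraction
```

For example, a box that appears in two of three frames is kept, while a box seen in only one frame is discarded; a real pipeline would convert the surviving mask back into boxes or a per-video decision.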

Computer Vision And Pattern Recognition

Automatic Ship Classification Utilizing Bag of Deep Features

Detection and classification of ships based on their silhouette profiles in natural imagery is an important problem in computer science. It can be viewed from a variety of perspectives, including security, traffic control, and even military applications, each of which requires specific processing. In this paper, we present a new method based on the "bag of words" (BoW) model, in which the words are features obtained from pre-trained deep convolutional networks. Three VGG models, which achieve high accuracy in object recognition, are used. The image regions selected as initial proposals are derived by a greedy algorithm applied to the key points generated by the Scale-Invariant Feature Transform (SIFT). Using deep features in the BoW method noticeably improves the recognition and classification of ships: we obtained a classification accuracy of 91.8%, an improvement of about 5% over previous methods.
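
A bag of deep features works like a classic BoW: descriptors from many training regions are clustered into a codebook of "visual words", and each image is then encoded as a normalized histogram of its nearest words. The sketch below assumes plain k-means for the codebook and uses random vectors as stand-ins for the VGG region features; `build_codebook` and `encode` are hypothetical names, and the paper's actual codebook size and clustering method are not given in the abstract.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Cluster (N, D) descriptors into k visual words with basic k-means."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # squared Euclidean distance from every descriptor to every center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def encode(descriptors, centers):
    """Encode one image's (M, D) region descriptors as a normalized word histogram."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(centers)).astype(np.float64)
    return hist / hist.sum()
```

The resulting fixed-length histograms can then be fed to any standard classifier, regardless of how many region proposals each image produced.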

Automatic analysis of artistic paintings using information-based measures

The artistic community increasingly relies on automatic computational analysis for the authentication and classification of artistic paintings. In this paper, we identify hidden patterns and relationships in artistic paintings by analysing their complexity, a measure that quantifies the sum of characteristics of an object. Specifically, we apply Normalized Compression (NC) and the Block Decomposition Method (BDM) to a dataset of 4,266 paintings from 91 authors and examine the potential of these information-based measures as descriptors of artistic paintings. Both measures consistently characterized the same types of paintings, authors, and artistic movements. Moreover, combining NC with a measure of the roughness of the paintings yields an efficient stylistic descriptor. Furthermore, by quantifying the local information of each painting, we define a fingerprint that captures critical information about the artists' style, their artistic influences, and shared techniques. More fundamentally, this information describes how each author typically composes and distributes elements across the canvas and, therefore, how their work is perceived. Finally, we demonstrate that regional complexity and the two-point height difference correlation function are useful auxiliary features that improve current methods for style and author classification of artistic paintings. The whole study is supported by an extensive website (this http URL) for fast author characterization and authentication.
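
One common definition of Normalized Compression divides the compressed size of a sequence, in bits, by the size it would occupy without any redundancy: $\mathrm{NC}(x) = C(x) / (|x| \log_2 |A|)$, where $|A|$ is the alphabet size. Values near 0 indicate highly regular content and values near 1 indicate content close to random. The sketch assumes 8-bit pixel values serialized to bytes and uses `zlib` as a stand-in compressor; the compressor used in the paper is not stated in the abstract, and `normalized_compression` is a hypothetical name.

```python
import math
import zlib

def normalized_compression(data: bytes, alphabet_size: int = 256) -> float:
    """NC(x) = C(x) / (|x| * log2|A|), with C(x) the zlib-compressed size in bits.

    zlib is an assumed stand-in compressor; stronger compressors give tighter estimates.
    """
    compressed_bits = 8 * len(zlib.compress(data, 9))
    return compressed_bits / (len(data) * math.log2(alphabet_size))
```

A flat, single-color image compresses to almost nothing (NC close to 0), while pixel noise is essentially incompressible (NC close to, or slightly above, 1 because of the compressor's own overhead).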

BEDS: Bagging ensemble deep segmentation for nucleus segmentation with testing stage stain augmentation

Reducing outcome variance is an essential task in deep learning based medical image analysis. Bootstrap aggregating, also known as bagging, is a canonical ensemble algorithm for aggregating weak learners into a strong learner. Random forest was one of the most powerful machine learning algorithms before the deep learning era, and its superior performance is driven by fitting bagged decision trees (weak learners). Inspired by the random forest technique, we propose a simple bagging ensemble deep segmentation (BEDs) method that trains multiple U-Nets on partial training data to segment dense nuclei in pathological images. The contributions of this study are three-fold: (1) developing a self-ensemble learning framework for nucleus segmentation; (2) aggregating testing stage augmentation with self-ensemble learning; and (3) elucidating the idea that self-ensemble and testing stage stain augmentation are complementary strategies for superior segmentation performance. Implementation details: this https URL.
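
The two ingredients can be sketched independently of any particular network: draw a different partial subset of the training data for each ensemble member, then at test time average the probability maps over every (model, stain augmentation) pair before thresholding. Because stain augmentation only changes colors, the predicted maps stay spatially aligned and need no inverse transform. This is a minimal sketch under assumptions: subsets are drawn without replacement (classic bagging samples with replacement), models and augmentations are arbitrary hypothetical callables, and `partial_subsets` and `ensemble_predict` are illustrative names, not the released implementation.

```python
import numpy as np

def partial_subsets(n_samples, n_models, fraction, seed=0):
    """Draw one random subset of training indices (without replacement) per model."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * n_samples))
    return [rng.choice(n_samples, size=size, replace=False) for _ in range(n_models)]

def ensemble_predict(models, augmentations, image, threshold=0.5):
    """Average probability maps over all models and test-time stain augmentations.

    models: callables mapping an image to a per-pixel foreground probability map
    augmentations: callables that perturb colors only (geometry is unchanged)
    """
    probs = [model(aug(image)) for model in models for aug in augmentations]
    mean_prob = np.mean(probs, axis=0)
    return mean_prob, mean_prob >= threshold
```

Each U-Net would be trained on the indices returned for it, and the ensemble's averaged map is what reduces the variance of any single model's output.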

Benefits of Linear Conditioning with Metadata for Image Segmentation

Medical images are often accompanied by metadata describing the image (vendor, acquisition parameters) and the patient (disease type or severity, demographics, genomics). This metadata is usually disregarded by image segmentation methods. In this work, we adapt a linear conditioning method called FiLM (Feature-wise Linear Modulation) for image segmentation tasks. This FiLM adaptation enables integrating metadata into segmentation models for better performance. We observed an average Dice score increase of 5.1% on spinal cord tumor segmentation when incorporating the tumor type with FiLM. The metadata modulates the segmentation process through low-cost affine transformations applied to feature maps, which can be included in any neural network architecture. Additionally, we assess the relevance of segmentation FiLM layers for tackling common challenges in medical imaging: multi-class training with missing segmentations, model adaptation to multiple tasks, and training with a limited or unbalanced number of annotated data. Our results demonstrated the following benefits of FiLM for segmentation: the FiLMed U-Net was robust to missing labels and reached higher Dice scores with few labels (up to 16.7%) compared to a single-task U-Net. The code is open-source and available at this http URL.
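
A FiLM layer computes a per-channel scale $\gamma$ and shift $\beta$ from the conditioning input and applies $\mathrm{FiLM}(x_c) = \gamma_c x_c + \beta_c$ to each feature map $x_c$. The sketch assumes, for simplicity, that a single linear layer maps a metadata vector (for example, a one-hot tumor type) to $\gamma$ and $\beta$; in practice the generator is usually a small learned network, and the weight names here are hypothetical.

```python
import numpy as np

def film(feature_maps, metadata, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation of (C, H, W) feature maps.

    metadata: (M,) conditioning vector, e.g. a one-hot tumor type (assumed encoding)
    w_gamma, w_beta: (C, M) weights; b_gamma, b_beta: (C,) biases (hypothetical linear generator)
    """
    gamma = w_gamma @ metadata + b_gamma  # per-channel scale
    beta = w_beta @ metadata + b_beta     # per-channel shift
    # broadcast the per-channel parameters over the spatial dimensions
    return gamma[:, None, None] * feature_maps + beta[:, None, None]
```

Because only two vectors of length C are produced per sample, the cost of conditioning is negligible compared with the convolutions around it, which is what makes it easy to insert into any architecture.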

Bidirectional Multi-scale Attention Networks for Semantic Segmentation of Oblique UAV Imagery

Semantic segmentation for aerial platforms has been one of the fundamental scene understanding tasks for Earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects have relatively small scale variation compared with scenes captured in oblique view. The large scale variation of objects in oblique images limits the performance of deep neural networks (DNNs) that process images at a single scale. To tackle this scale variation issue, we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. Experiments on the UAVid2020 dataset demonstrate the effectiveness of our method. Our model achieved a state-of-the-art (SOTA) mean intersection over union (mIoU) score of 70.80%.
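
The abstract does not detail the bidirectional fusion, but the basic idea of attention-based multi-scale fusion can be illustrated generically: predictions from several input scales are resized to a common resolution, and a per-pixel softmax over learned attention logits decides how much each scale contributes at each location. The sketch below shows only this generic softmax weighting, not the paper's bidirectional design; `fuse_scales` and the array layouts are assumptions.

```python
import numpy as np

def fuse_scales(score_maps, attention_logits):
    """Generic per-pixel attention fusion across scales.

    score_maps: (S, C, H, W) class scores from S scales, already resized to H x W
    attention_logits: (S, H, W) per-scale, per-pixel attention scores
    """
    # subtract the max over scales for numerical stability before exponentiating
    logits = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax over the scale axis
    return (weights[:, None, :, :] * score_maps).sum(axis=0)
```

With equal logits the fusion reduces to plain averaging; as one scale's logit dominates, the output approaches that scale's prediction, which lets large objects rely on coarse scales and small objects on fine ones.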

BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices

Face masks have long been used in many areas of everyday life to protect against the inhalation of hazardous fumes and particles. They also offer an effective solution in healthcare for bi-directional protection against airborne diseases. Wearing and positioning the mask correctly is essential for its function. Convolutional neural networks (CNNs) offer an excellent solution for face recognition and for classifying whether a mask is worn and positioned correctly. In the context of the ongoing COVID-19 pandemic, such algorithms can be used at entrances to corporate buildings, airports, shopping areas, and other indoor locations to mitigate the spread of the virus. These application scenarios pose major challenges for the underlying compute platform. The inference hardware must be cheap, small, and energy efficient, while providing sufficient memory and compute power to execute accurate CNNs at reasonably low latency. To preserve the privacy of the public, all processing must remain on the edge device, without any communication with cloud servers. To address these challenges, we present a low-power binary neural network classifier for correct facial-mask wear and positioning. The classification task is implemented on an embedded FPGA, performing high-throughput binary operations. Classification can take place at up to ~6400 frames per second, easily enabling multi-camera and speed-gate settings or statistics collection in crowd settings. When deployed at a single entrance or gate, the idle power consumption is reduced to 1.6 W, improving the battery life of the device. We achieve an accuracy of up to 98% for four wearing positions of the MaskedFace-Net dataset. To maintain equivalent classification accuracy for all face structures, skin tones, hair types, and mask types, the algorithms are tested for their ability to generalize the relevant features over all subjects using the Grad-CAM approach.
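
The "binary operations" that make such classifiers cheap on an FPGA come from restricting weights and activations to {-1, +1}. If each value is stored as one bit (1 for +1, 0 for -1), a dot product of length n reduces to an XNOR followed by a population count: agreeing positions contribute +1 and disagreeing ones -1, so the result is 2 × matches − n. The Python sketch below illustrates this general BNN identity only; on the FPGA it is realized in logic, and the function names are hypothetical.

```python
def pack_signs(values):
    """Encode a sequence of +1/-1 values as an integer bit mask (bit i set for +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to n bits: 1 where signs agree
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # agreements count +1, disagreements -1
```

For instance, `binary_dot(pack_signs([1, -1, 1, 1, -1]), pack_signs([1, 1, -1, 1, -1]), 5)` equals the ordinary dot product of the two vectors, 1.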

Blocks World Revisited: The Effect of Self-Occlusion on Classification by Convolutional Neural Networks

Despite the recent successes in computer vision, there remain new avenues to explore. In this work, we propose a new dataset to investigate the effect of self-occlusion on deep neural networks. With TEOS (The Effect of Self-Occlusion), we propose a 3D blocks world dataset that focuses on the geometric shape of 3D objects and their omnipresent challenge of self-occlusion. We designed TEOS to investigate the role of self-occlusion in the context of object classification. Even though remarkable progress has been made in object classification, self-occlusion of 3D objects in the real world still presents significant challenges for deep learning approaches. Humans, however, deal with it by deploying complex strategies, for instance by changing the viewpoint or manipulating the scene to gather the necessary information. TEOS has two difficulty levels (L1 and L2), containing 36 and 12 objects, respectively. We provide 738 uniformly sampled views of each object, together with their masks, object and camera positions and orientations, the amount of self-occlusion, and the CAD model of each object. We present baseline evaluations with five well-known classification deep neural networks and show that TEOS poses a significant challenge for all of them. The dataset and the pre-trained models are publicly available to the scientific community at this https URL.
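
The abstract does not state how the 738 views are distributed. A common way to place n roughly uniform camera positions on a sphere around an object is the Fibonacci (golden-angle) lattice: heights are spaced evenly along the vertical axis and the azimuth advances by the golden angle at each step. The sketch assumes this lattice and an object centered at the origin; it is an illustration, not TEOS's documented sampling scheme, and `fibonacci_sphere` is a hypothetical name.

```python
import math

def fibonacci_sphere(n, radius=1.0):
    """Return n near-uniform (x, y, z) camera positions on a sphere of the given radius."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n   # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)      # radius of the horizontal circle at height z
        theta = golden_angle * i        # azimuth advances by the golden angle
        points.append((radius * r * math.cos(theta),
                       radius * r * math.sin(theta),
                       radius * z))
    return points
```

Calling `fibonacci_sphere(738, radius)` yields one position per view; each camera would then be oriented to look at the object center.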

Boosting Deep Transfer Learning for COVID-19 Classification

COVID-19 classification using chest Computed Tomography (CT) has been found useful in practice by several studies. Due to the lack of annotated samples, these studies recommend transfer learning and explore the choice of pre-trained models and data augmentation. However, it is still unknown whether there are better strategies than vanilla transfer learning for more accurate COVID-19 classification with limited CT data. This paper provides an affirmative answer, devising a novel "model" augmentation technique that considerably boosts the performance of transfer learning for this task. Our method systematically reduces the distributional shift between the source and target domains and considers augmenting deep learning with complementary representation learning techniques. We establish the efficacy of our method on publicly available datasets and models, and we identify observations that contrast with those of previous studies.

Boundary-induced and scene-aggregated network for monocular depth prediction

Monocular depth prediction is an important task in scene understanding. It aims to predict the dense depth of a single RGB image. With the development of deep learning, performance on this task has improved greatly. However, two issues remain unresolved: (1) the deep features encode the wrong farthest region in a scene, which leads to a distorted 3D structure in the predicted depth; (2) the low-level features are insufficiently utilized, which makes it even harder to estimate depth near edges with sudden depth changes. To tackle these two issues, we propose the Boundary-induced and Scene-aggregated network (BS-Net). In this network, the Depth Correlation Encoder (DCE) is first designed to obtain the contextual correlations between regions in an image and to perceive the farthest region by considering these correlations. Meanwhile, the Bottom-Up Boundary Fusion (BUBF) module is designed to extract accurate boundaries that indicate depth changes. Finally, the Stripe Refinement Module (SRM) is designed to refine the dense depth induced by the boundary cue, which improves the boundary accuracy of the predicted depth. Experimental results on the NYUD v2 and iBims-1 datasets illustrate the state-of-the-art performance of the proposed approach, and the SUN-RGBD dataset is used to evaluate the generalization of our method. Code is available at this https URL.
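
To make "boundaries that indicate depth changes" concrete, one simple way to derive such a boundary map from a ground-truth depth image is to mark pixels whose depth differs from a horizontal or vertical neighbor by more than a relative threshold. This is only an assumed, hand-crafted illustration of the target signal (for example, as a possible supervision map), not the learned BUBF module; the function name, the 10% default threshold, and the requirement of strictly positive depth are all assumptions.

```python
import numpy as np

def depth_boundaries(depth, rel_threshold=0.1):
    """Mark pixels where depth jumps by more than rel_threshold (relative to the
    nearer neighbour) between horizontally or vertically adjacent pixels.

    depth: (H, W) array of strictly positive depth values
    """
    depth = np.asarray(depth, dtype=np.float64)
    edges = np.zeros(depth.shape, dtype=bool)
    # horizontal neighbour pairs
    left, right = depth[:, :-1], depth[:, 1:]
    jump_x = np.abs(right - left) / np.minimum(left, right) > rel_threshold
    edges[:, :-1] |= jump_x
    edges[:, 1:] |= jump_x
    # vertical neighbour pairs
    up, down = depth[:-1, :], depth[1:, :]
    jump_y = np.abs(down - up) / np.minimum(up, down) > rel_threshold
    edges[:-1, :] |= jump_y
    edges[1:, :] |= jump_y
    return edges
```

Using a relative rather than absolute threshold keeps a 0.5 m jump significant near the camera but tolerates it at long range, where depth estimates are less precise.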
