We propose incorporating human labelers in a model fine-tuning system that provides immediate user feedback. In our framework, human labelers can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model's predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. Our hypothesis is that this rich feedback allows human labelers to create mental models that enable them to better choose which biases to introduce to the model. We compare human-selected points to points selected using standard active learning methods. We further investigate how the fine-tuning methodology impacts the human labelers' performance. We implement this framework for fine-tuning high-resolution land cover segmentation models. Specifically, we fine-tune a deep neural network -- trained to segment high-resolution aerial imagery into different land cover classes in Maryland, USA -- to a new spatial area in New York, USA. The tight loop turns the algorithm and the human operator into a hybrid system that can produce land cover maps of a large area much more efficiently than traditional workflows. Our framework has applications in geospatial machine learning settings where there is a practically limitless supply of unlabeled data, of which only a small fraction can feasibly be labeled through human efforts.
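For context, the sketch below shows the kind of uncertainty-based selection rule used by standard active learning baselines of the sort compared against here; the model interface (`predict_proba`), the pool handling, and the batch size are illustrative assumptions, not details of the paper's framework.

```python
import numpy as np

def entropy_sampling(model, unlabeled_pool, batch_size=10):
    """Uncertainty-based active learning baseline (illustrative sketch).

    Scores each candidate patch by its mean per-pixel predictive entropy
    and returns the most uncertain patches for labeling.
    """
    scores = []
    for patch in unlabeled_pool:
        probs = model.predict_proba(patch)  # assumed shape (H, W, num_classes)
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
        scores.append(entropy.mean())
    ranked = np.argsort(scores)[::-1]       # most uncertain first
    return [unlabeled_pool[i] for i in ranked[:batch_size]]
```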
Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield performance similar to, and in some domains even superior to, batch normalization. All of these methods apply a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performance on the tasks of visual question answering, few-shot learning, and conditional image generation.
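As a point of reference, a conditional formulation of group normalization can be written as below, with the affine parameters predicted from a conditioning vector in a FiLM-like manner; this is a minimal PyTorch-style sketch, and the class and argument names are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ConditionalGroupNorm(nn.Module):
    """Group normalization whose affine parameters are predicted from
    conditioning information (e.g., a question embedding or class label),
    analogous to conditional batch normalization. Illustrative sketch."""

    def __init__(self, num_channels, num_groups, cond_dim):
        super().__init__()
        # Normalize without a fixed affine transform; the affine part
        # is produced per-example from the conditioning vector.
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.to_gamma = nn.Linear(cond_dim, num_channels)
        self.to_beta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # x: (N, C, H, W); cond: (N, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta
```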
The interpretability literature focuses on understanding black-box classifiers, yet many problems, ranging from medicine to agriculture and humanitarian crisis response, are tackled with semantic segmentation models. The absence of interpretability methods for this canonical problem in computer vision motivates this study. We present a user-centric approach that blends techniques from interpretability, representation learning, and interactive visualization. It allows users to visualize and link latent representations to real data instances, as well as qualitatively assess the strength of predictions. We apply our method to a deep learning model for semantic segmentation, U-Net, in a remote sensing application of building detection. This application is of high interest to humanitarian crisis response teams that rely on satellite image analysis.
Preparing for and responding to humanitarian disasters requires accurate and timely mapping of affected regions. Foundational data such as roads, waterways, and population settlements are critical for mapping evacuation routes, community gathering points, and resource allocation. Current approaches require time-intensive manual labeling by teams of crowdsourced human volunteers, such as the Humanitarian OpenStreetMap Team (HOT). We are partnering with the American Red Cross to explore how machine learning techniques can be leveraged to automate the generation of accurate foundational maps from remote sensing data. Here, we describe two critical Red Cross missions in Uganda, our proposed application of machine learning, and the constraints and challenges we anticipate encountering in deployment and evaluation. The American Red Cross described two missions whose effectiveness is hampered by the lack of accurate foundational data. (1) Pandemic response: containing outbreaks of diseases endemic to the region, such as viral hemorrhagic fevers, requires accessible facilities to act as local outposts to coordinate the response and train healthcare workers. (2) Severe flooding: heavy rainfall can cause disruptive flooding in Uganda, rendering transportation infrastructure unusable and displacing hundreds of thousands of people, who often rely on emergency relief for food and clean water. These events are expected to become more frequent due to climate change. Flooding that coincides with outbreaks could exacerbate pandemics by disrupting communities’ evacuation routes and hindering aid organizations’ ability to bring in needed supplies. Quickly identifying viable infrastructure after flooding would accelerate the ability of aid organizations to respond. For both types of emergencies, well-annotated, reliable maps can provide emergency preparedness teams with crucial information needed to conduct their missions quickly and successfully.
Machine learning (ML) models based on RGB images are vulnerable to adversarial attacks, representing a potential cyber threat to the user. Adversarial examples are inputs maliciously constructed to induce errors in ML systems at test time. Recently, researchers showed that such attacks can also be successfully applied at test time to ML models based on multispectral imagery, suggesting this threat is likely to extend to the hyperspectral data space as well. Military communities across the world continue to grow their investment portfolios in multispectral and hyperspectral remote sensing while expressing interest in machine-learning-based systems. This paper aims to increase the military community’s awareness of the adversarial threat and to propose ML training strategies and resilient solutions for state-of-the-art artificial neural networks. Specifically, the paper introduces an adversarial detection network that exploits domain-specific knowledge of material response in the shortwave infrared spectrum, and a framework that jointly integrates an automatic band selection method for multispectral imagery with adversarial training and adversarial spectral rule-based detection. Experimental results show the effectiveness of the approach in an automatic semantic segmentation task using DigitalGlobe’s WorldView-3 16-band satellite imagery.
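For readers unfamiliar with adversarial examples, the sketch below gives the canonical Fast Gradient Sign Method attack as it would apply to a multispectral input tensor; the model, loss function, and perturbation budget are illustrative assumptions, and this is not the specific attack or detection network studied in the paper.

```python
import torch

def fgsm_attack(model, x, y, loss_fn, epsilon=0.01):
    """Fast Gradient Sign Method (illustrative sketch).

    x: input tensor of shape (N, bands, H, W); y: ground-truth labels.
    Perturbs each band in the direction that increases the loss.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    return perturbed.detach()
```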
We focus on the automatic 3D terrain segmentation problem using hyperspectral shortwave IR (HS-SWIR) imagery and 3D Digital Elevation Models (DEM). The datasets were independently collected, and metadata for the HS-SWIR dataset are unavailable. We exploit the overall slope of the SWIR spectrum, which correlates with the presence of moisture in soil, to propose a band-ratio test that serves as a proxy for soil moisture content and distinguishes two broad classes of objects: live vegetation and impermeable manmade surfaces. We show that image-based localization techniques combined with the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm achieve precise spatial matches between HS-SWIR data of a portion of downtown Los Angeles (LA, USA) and the visible-band image of a geo-registered 3D DEM covering a wider area of LA. Our spectral-elevation rule-based approach yields an overall accuracy of 97.7%, segmenting the scene into buildings, houses, trees, grass, and roads/parking lots.
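The band-ratio idea can be illustrated as follows; the specific band indices and the decision threshold are placeholders, since the actual bands and threshold used in the paper are not reproduced here.

```python
import numpy as np

def swir_slope_ratio(cube, band_short, band_long):
    """Band-ratio proxy for the overall SWIR slope (illustrative sketch).

    cube: reflectance cube of shape (H, W, bands).
    band_short, band_long: indices of a shorter- and longer-wavelength
    SWIR band; these are placeholder choices, not the paper's bands.
    """
    return cube[..., band_long] / (cube[..., band_short] + 1e-6)

def vegetation_vs_manmade(cube, band_short, band_long, threshold=0.8):
    # Pixels whose SWIR-slope proxy falls below the (placeholder) threshold
    # are flagged as live vegetation; the rest as impermeable manmade surfaces.
    ratio = swir_slope_ratio(cube, band_short, band_long)
    return np.where(ratio < threshold, "vegetation", "manmade")
```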
The curse of dimensionality is a well-known phenomenon that arises when applying machine learning algorithms to high-dimensional data; it degrades performance as the dimension increases. Due to the high dimensionality of multispectral and hyperspectral imagery, classifiers trained on limited samples with many spectral bands tend to overfit, leading to weak generalization. In this work, we propose an end-to-end framework that integrates input feature selection into the training procedure of a deep neural network for dimensionality reduction. We show that Integrated Learning and Feature Selection (ILFS) significantly improves the performance of neural networks in multispectral imagery applications. We also evaluate the proposed methodology as a potential defense against adversarial examples, which are malicious inputs carefully designed to fool a machine learning system. Our experimental results show that methods for generating adversarial examples designed for RGB space are also effective for multispectral imagery and that ILFS significantly mitigates their effect.
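One generic way to integrate feature selection into training is a learnable per-band gate with a sparsity penalty, sketched below; this is an illustrative stand-in for the idea rather than the exact ILFS formulation.

```python
import torch
import torch.nn as nn

class BandSelectionGate(nn.Module):
    """Learnable per-band input gate trained jointly with the network and
    pushed toward sparsity with an L1 penalty (illustrative sketch, not
    necessarily the exact ILFS mechanism)."""

    def __init__(self, num_bands):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(num_bands))

    def forward(self, x):
        # x: (N, bands, H, W); scale each spectral band by its learned gate.
        return x * self.gate.view(1, -1, 1, 1)

    def sparsity_penalty(self):
        return self.gate.abs().sum()

# During training, the total loss could take the form:
#   loss = task_loss + lambda_sparsity * gate.sparsity_penalty()
# so that uninformative bands are driven toward zero weight.
```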
We focus on the problem of spatial feature correspondence between images generated by sensors operating in different regions of the spectrum, in particular the Visible (Vis: 0.4-0.7 µm) and Shortwave Infrared (SWIR: 1.0-2.5 µm). Under the assumption that only one of the available datasets is geospatially ortho-rectified (e.g., Vis), this spatial correspondence can play a major role in enabling a machine to automatically register SWIR and Vis images representing the same swath, as the first step toward achieving a full geospatial ortho-rectification of, in this case, the SWIR dataset. Assuming further that the Vis images are associated with a Lidar-derived Digital Elevation Model (DEM), corresponding local spatial features between SWIR and Vis images can also lead to the association of all of the additional data available in these sets, including SWIR hyperspectral and elevation data. Such a data association may also be interpreted as data fusion from the two sensing modalities: hyperspectral and Lidar. We show that, using the Scale-Invariant Feature Transform (SIFT) and the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm, a software method can successfully find spatial correspondences between SWIR and Vis images for a complete pixel-by-pixel alignment. Our method is validated through an experiment using a large SWIR hyperspectral data cube, representing a portion of Los Angeles, California, and a DEM with associated Vis images covering a significantly wider area of Los Angeles.
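A minimal OpenCV sketch of the SIFT-plus-RANSAC correspondence step is shown below; OpenCV's standard RANSAC homography estimator stands in for the Optimal Randomized RANSAC variant used in the paper, and the function name, ratio-test constant, and reprojection threshold are illustrative.

```python
import cv2
import numpy as np

def register_swir_to_vis(swir_gray, vis_gray):
    """Estimate a homography aligning a SWIR band image to a Vis image.

    Illustrative sketch: SIFT keypoints, Lowe ratio-test matching, and
    RANSAC-based homography fitting (standing in for Optimal Randomized
    RANSAC).
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(swir_gray, None)
    kp2, des2 = sift.detectAndCompute(vis_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, inliers
```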
We address the problem of automatically fusing hyperspectral data of a digitized scene with an image-based 3D model, overlapping the same scene, in order to associate material spectra with corresponding height information for improved scene understanding. The datasets have been independently collected at different spatial resolutions by different aerial platforms and the georegistration information about the datasets is assumed to be insufficient or unavailable. We propose a method to solve the fusion problem by associating Scale Invariant Feature Transform (SIFT) descriptors from the hyperspectral data with the corresponding 3D point cloud in a large scale 3D model. We find the correspondences efficiently without affecting matching performance by limiting the initial search space to the centroids obtained after performing k-means clustering. Finally, we apply the Optimal Randomized RANdom Sample Consensus (RANSAC) algorithm to enforce geometric alignment of the hyperspectral images onto the 3D model. We present preliminary results that show the effectiveness of the method using two large datasets collected from drone-based sensors in an urban setting.
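The centroid-based search-space reduction can be sketched as follows; the cluster count and the number of candidate clusters per query are illustrative parameters, and scikit-learn's KMeans stands in for whatever clustering implementation the paper used.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_match(query_desc, ref_desc, num_clusters=64, top_k=2):
    """Restrict descriptor matching to the nearest k-means clusters
    (illustrative sketch of the search-space reduction described above).

    query_desc, ref_desc: arrays of shape (num_descriptors, 128) for SIFT.
    """
    # Cluster the reference descriptors once.
    km = KMeans(n_clusters=num_clusters, n_init=10).fit(ref_desc)
    labels = km.labels_

    matches = []
    for qi, q in enumerate(query_desc):
        # Find the closest cluster centroids for this query descriptor.
        dists = np.linalg.norm(km.cluster_centers_ - q, axis=1)
        near = np.argsort(dists)[:top_k]
        candidates = np.where(np.isin(labels, near))[0]
        # Exhaustive nearest-neighbor search only within the candidate subset.
        best = candidates[np.argmin(np.linalg.norm(ref_desc[candidates] - q, axis=1))]
        matches.append((qi, best))
    return matches
```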
Following an initiative formalized in April 2016, formally known as ARL West, between the U.S. Army Research Laboratory (ARL) and the University of Southern California’s Institute for Creative Technologies (USC ICT), a field experiment was coordinated and executed in the summer of 2016 by ARL, USC ICT, and Headwall Photonics. The purpose was to image part of the USC main campus in Los Angeles, USA, using two portable COTS (commercial off-the-shelf) aerial drone solutions for data acquisition, photogrammetry (3D reconstruction from images), and fusion of hyperspectral data with the recovered set of 3D point clouds representing the target area. The research aims to determine the viability of having a machine capable of segmenting the target area into key material classes (e.g., manmade structures, live vegetation, water) for multiple purposes, including providing the user with a more accurate scene understanding and enabling the unsupervised automatic sampling of meaningful material classes from the target area for adaptive semi-supervised machine learning. In the latter case, a target-set library may be used to automatically train machines with data of local material classes, for example, to increase the likelihood that machines recognize targets. The field experiment and the associated data post-processing approach to correct for reflectance, georectify, recover the area’s dense point clouds from images, register spectral with elevation properties of scene surfaces from the independently collected datasets, and generate the desired scene-segmented maps are discussed. Lessons learned from the experience are also highlighted throughout the paper.