Plantation forests provide critical ecosystem services and have expanded worldwide over the past few decades. Accurate mapping of tree species through remote sensing is essential for managing them. The characteristic temporal behaviors and traits of tree species in satellite image time series (SITS) produce temporal and spectral features across multiple phenological stages, and these features are key to improving tree species mapping. However, the diverse input features, sequential relations, and complex structures in SITS drastically increase the dimensionality and difficulty of spectral-temporal feature extraction, which challenges general-purpose classifiers that are not explicitly adapted for spectral-temporal learning. As a result, a method that can automatically extract spectral-temporal features with high separability and regional adaptability from high-dimensional SITS for plantation tree species mapping is still lacking. Moreover, the effects of temporal resolution and feature combination on plantation tree species mapping remain under-explored. Here, we developed a multi-head attention-based method, built on a modified Transformer network (Transformer4SITS), that automatically extracts highly separable spectral-temporal features for improved plantation tree species mapping. The end-to-end network consists of a feature extraction module that learns deep spectral-temporal features from SITS and a fusion module that combines multiple features to improve mapping accuracy. We applied this method to tree species mapping in two representative plantation forests in southern and northern China.
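To make the multi-head attention idea concrete, the following is a minimal sketch of multi-head self-attention applied to a single pixel's time series. It is not the Transformer4SITS implementation: the function names, the randomly initialized (untrained) projection weights, and the toy input of 6 acquisition dates by 4 features are all assumptions made for illustration, and positional encoding, the fusion module, and training are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: small random values."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(series, num_heads, rng):
    """Apply multi-head self-attention to a T x D series (T dates, D features).

    Returns the T x D output and, per head, the T x T attention weights,
    where row t says how strongly date t attends to every other date.
    """
    d_model = len(series[0])
    assert d_model % num_heads == 0, "feature count must divide evenly across heads"
    d_head = d_model // num_heads

    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head has its own query, key, and value projections.
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        q, k, v = matmul(series, w_q), matmul(series, w_k), matmul(series, w_v)

        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V.
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)

    # Concatenate the heads along the feature axis, then mix them linearly.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(series))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), head_weights


rng = random.Random(0)
# Toy input (assumed): 6 acquisition dates x 4 spectral features for one pixel.
series = [[rng.random() for _ in range(4)] for _ in range(6)]
output, weights = multi_head_self_attention(series, num_heads=2, rng=rng)
```

In a trained model, the per-head attention weights are one place to look for which dates (for example, greenness rising or falling) contribute most to each time step's representation; with the random weights used here they carry no such meaning.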
The results show that: (1) the Transformer4SITS method self-adaptively extracted typical spectral-temporal features of key phenological stages (e.g., greenness rising and falling) from SITS, and improved accuracy significantly, by up to 15%, over all four baseline methods (i.e., long short-term memory, harmonic analysis, time-weighted dynamic time warping, and linear discriminant analysis); (2) time series with higher temporal resolution tended to produce more accurate species maps at both sites, with overall accuracy (OA) increasing from 91.05% and 84.33% (60-day) to 94.88% and 88.72% (5-day), respectively, although the benefit of higher temporal resolution leveled off at around 90-day and 50-day resolution at the two sites, respectively; (3) mapping with all available bands and two-band spectral indices outperformed mapping with a subset of them, but with only a modest gain in accuracy (i.e., OA increased from 93.63% and 86.01% to 94.88% and 88.72%). This study thus provides a state-of-the-art deep learning-based method for improved tree species mapping, which is critical for sustainable management and biodiversity monitoring of plantation forests across large scales.
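The temporal-resolution comparison and the OA metric can be illustrated with a short sketch. How the study derived its coarser time series is not stated here, so the thinning rule below (keep an observation only if at least a given number of days has passed since the last kept one) is an assumption, as are the function names and the toy data. Overall accuracy is computed in the standard way, as the fraction of samples whose predicted label matches the reference label.

```python
def thin_time_series(days, values, step_days):
    """Coarsen a time series by enforcing a minimum spacing between kept dates.

    Assumed rule for illustration: walk the observations in date order and
    keep one only if at least `step_days` have elapsed since the last kept one.
    """
    kept_days, kept_values = [], []
    next_allowed = None
    for day, value in sorted(zip(days, values), key=lambda pair: pair[0]):
        if next_allowed is None or day >= next_allowed:
            kept_days.append(day)
            kept_values.append(value)
            next_allowed = day + step_days
    return kept_days, kept_values


def overall_accuracy(reference, predicted):
    """Fraction of samples whose predicted class equals the reference class."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct / len(reference)


# Toy 5-day series over one year (assumed): day offsets 0, 5, ..., 355.
days = list(range(0, 360, 5))
values = [float(d) for d in days]  # placeholder for a band or index value

days_60, values_60 = thin_time_series(days, values, step_days=60)

# Toy labels (assumed): three of four samples are classified correctly.
oa = overall_accuracy(["pine", "pine", "larch", "larch"],
                      ["pine", "larch", "larch", "larch"])
```

Running the same classifier on series thinned to different spacings and comparing the resulting OA values is one simple way to reproduce the kind of comparison reported in point (2).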
https://www.sciencedirect.com/science/article/pii/S0924271623002502?via%3Dihub