Abstract
Accurate and timely crop type classification is essential for effective agricultural monitoring, cropland management, and yield estimation. Unfortunately, the complicated temporal patterns of different crops, combined with gaps and noise in satellite observations caused by clouds and rain, limit crop classification accuracy, particularly early in the season when temporal information is scarce. Although deep learning-based methods have shown great potential for improving crop type mapping, insufficient and noisy training data may cause them to overlook more generalizable features and yield inferior classification performance. To address these challenges, we developed a Mask Pixel-set SpatioTemporal Integration Network (Mask-PSTIN), which integrates a temporal random masking technique and a novel PSTIN model. Temporal random masking augments the training data by randomly removing part of the temporal information to increase data variability, forcing the model to learn more generalized features. The PSTIN, comprising a pixel-set aggregation encoder (PSAE) and a long short-term memory (LSTM) module, effectively captures comprehensive spatiotemporal features from time-series satellite images. The effectiveness of Mask-PSTIN was evaluated across three regions with different landscapes and cropping systems. Results demonstrated that adding the PSAE in PSTIN significantly improved crop classification accuracy compared to a basic LSTM, with average overall accuracy (OA) increasing from 80.9% to 83.9% and the mean F1-score (mF1) rising from 0.781 to 0.818. Incorporating temporal random masking in training led to further improvements, increasing average OA and mF1 to 87.4% and 0.865, respectively. Mask-PSTIN significantly outperformed traditional machine learning and deep learning methods (i.e., RF, SVM, Transformer, and CNN-LSTM) in crop type mapping across all three regions.
Furthermore, Mask-PSTIN identified crop types earlier and more accurately than the machine learning models, before or during the crops' development stages. Feature importance analysis based on the gradient backpropagation algorithm revealed that Mask-PSTIN effectively leveraged multi-temporal features, exhibiting broader attention across time steps and capturing critical crop phenological characteristics. These results suggest that Mask-PSTIN is a promising approach for improving both post-harvest and early-season crop type classification, with potential applications in agricultural management and monitoring.
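To make the idea of temporal random masking concrete, the following is a minimal sketch and not the implementation used for Mask-PSTIN. It assumes each training sample is a time series of per-date band values, that masking replaces whole time steps with a constant, and that a fixed fraction of steps is masked. The function name `temporal_random_mask` and the parameters `mask_ratio` and `fill_value` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Optional, Sequence


def temporal_random_mask(
    series: Sequence[Sequence[float]],
    mask_ratio: float = 0.3,      # assumed fraction of time steps to hide
    fill_value: float = 0.0,      # assumed placeholder for masked steps
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a copy of `series` with a random subset of time steps masked.

    `series` has shape (time steps, bands). The input is not modified.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = rng or random.Random()

    n_steps = len(series)
    n_masked = int(mask_ratio * n_steps)
    # Pick distinct time steps to hide, uniformly at random.
    masked_steps = set(rng.sample(range(n_steps), n_masked))

    return [
        [fill_value] * len(step) if t in masked_steps else list(step)
        for t, step in enumerate(series)
    ]


# Example: six acquisition dates, two spectral bands per date.
sample = [[0.21, 0.35], [0.25, 0.41], [0.30, 0.52],
          [0.33, 0.60], [0.28, 0.47], [0.22, 0.36]]
augmented = temporal_random_mask(sample, mask_ratio=0.5, rng=random.Random(0))
```

Calling the function anew in every training epoch would show the model a different masked version of the same sample each time, which is one way such masking can act as data augmentation and mimic the gaps left by clouds.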