Improved Forest Signal Detection for Space-Borne Photon-Counting LiDAR Using Automatic Machine Learning

ICESat-2, operated by NASA (National Aeronautics and Space Administration), carries a photon-counting LiDAR (Light Detection And Ranging) sensor, the Advanced Topographic Laser Altimeter System (ATLAS), which detects signal photons at high speed and with high sensitivity. However, the sensor also records a large amount of background photon noise from the atmosphere, the ground, the sun, and other radiation sources. This problem is particularly evident in forest areas. This study proposes an automatic machine learning approach that makes the data more usable for forestry applications and improves data availability relative to NASA’s official product. Our method uses only a small fraction (10%) of the sample points for training, which keeps it operationally efficient while preserving training accuracy. We conclude that ensemble learning generally outperforms single models, and the mean F1 score across all tests is approximately 0.9. The Stacked Ensembles model achieves a mean F1 score of 0.957, ahead of the other models. The three most important variables in the trained models are kNNdist5, kNNdist10, and h. Together, these three variables explain 51.6% of the components of the models. Over the regions tested, the proposed method improves the proportion of correctly identified signals by 6.4%, 12.2%, 2.7%, 9.3%, and 1.4% across the five datasets, respectively. The model performs better on datasets with a low signal-to-noise ratio (SNR), below 7.5. The method is largely unaffected by differences in topography, noise distribution, and SNR. The classifiers can correct misclassified labels in ATL08 products and show good stability under different conditions.
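To make the kNN-distance variables and the F1 metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that kNNdist5 and kNNdist10 denote the distance from a photon to its 5th and 10th nearest neighbour in the along-track distance and height plane, and that h is the photon height; the abstract does not define these variables. The function names, the toy photon coordinates, and the 5.0 threshold are all hypothetical, and a fixed threshold stands in here for the trained classifiers.

```python
import math


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force).

    `points` is a list of (along_track, height) tuples. Interpreting
    kNNdist5 / kNNdist10 as this quantity with k=5 / k=10 is an assumption.
    """
    result = []
    for i, (xi, hi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, hi - hj)
            for j, (xj, hj) in enumerate(points)
            if j != i
        )
        result.append(dists[k - 1])
    return result


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six clustered "signal" photons and two isolated
# "noise" photons, given as (along-track distance, height).
photons = [
    (0.0, 10.0), (0.5, 10.2), (1.0, 9.9), (1.5, 10.1), (2.0, 10.0), (2.5, 10.3),
    (1.0, 60.0), (2.0, -40.0),
]
labels = [1, 1, 1, 1, 1, 1, 0, 0]  # 1 = signal, 0 = noise

knn5 = knn_distance(photons, k=5)

# Illustrative rule only: call a photon "signal" when its 5th neighbour is
# close. The threshold 5.0 is arbitrary; the study trains classifiers instead.
predicted = [1 if d < 5.0 else 0 for d in knn5]
score = f1_score(labels, predicted)
print(score)
```

Signal photons sit in dense clusters along the ground and canopy, so their k-th neighbour is near, while isolated background photons have a distant k-th neighbour. That contrast is the intuition for why distance-based variables can carry much of a classifier's discriminating power.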