Physics-Guided ML for Global Sensor Data

Trained and benchmarked ML models on high-frequency IoT sensor data from 124 global locations, using location-based data splitting so that models are evaluated on sites they never saw during training, a closer match to real-world use.

Python, Scikit-Learn, TensorFlow, Data Pipelines

Inspiration & Context

Evapotranspiration (ET) is one of the most difficult variables to measure directly in the field, yet it is critical for analyzing the global water cycle. I saw great potential in machine learning models to estimate ET at locations where direct measurement is practically impossible. Using the large FLUXNET dataset, I set out to answer one main question: despite their success in other fields, could these models actually learn physical patterns that generalize to unseen locations?

Global distribution of the FLUXNET sites used in my analysis, categorized by IGBP vegetation type and Köppen climate zone.

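I also noticed that models in this field are often trained and tested on the same locations. This causes spatial data leakage: the model is scored on sites it has already seen, so the results overstate how well it would perform at a new location. That is why I decided to take three of the most powerful ML regression models and put them to the test on locations they had never seen during training.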
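
To make the leak concrete, here is a minimal sketch that contrasts a random row-level split with a site-level split in scikit-learn. It assumes that each record carries a site identifier. The synthetic data, the five-site setup, the `Ridge` model and the helper `shared_sites` are hypothetical placeholders, not the project's actual pipeline or models.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

# Hypothetical synthetic stand-in for FLUXNET records: 5 sites x 100 time steps each.
rng = np.random.default_rng(0)
n_sites, n_per_site = 5, 100
site_ids = np.repeat(np.arange(n_sites), n_per_site)  # site label for every row
X = rng.normal(size=(site_ids.size, 3))  # placeholder meteorological drivers
# Placeholder ET target: a linear signal plus noise (not real physics).
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=site_ids.size)


def shared_sites(train_idx, test_idx, groups):
    """Return the sites whose records appear in both the training and the test fold."""
    return set(groups[train_idx]) & set(groups[test_idx])


# Row-level random split: records from the same site land on both sides (spatial leak).
random_cv = KFold(n_splits=5, shuffle=True, random_state=0)
random_overlap = [shared_sites(tr, te, site_ids) for tr, te in random_cv.split(X)]

# Site-level split: all records of a site fall into exactly one test fold.
group_cv = GroupKFold(n_splits=5)
group_overlap = [
    shared_sites(tr, te, site_ids) for tr, te in group_cv.split(X, y, groups=site_ids)
]

print([len(s) for s in random_overlap])  # sites shared between train and test, per fold
print([len(s) for s in group_overlap])   # [0, 0, 0, 0, 0]

# Scores computed this way measure performance only on sites the model never saw.
scores = cross_val_score(Ridge(), X, y, groups=site_ids, cv=group_cv)
print(scores.round(3))
```

The model and features are identical in both cases. Only the splitter and the `groups` argument change, which is what makes a site-level split a cheap safeguard against overly optimistic scores.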

New Geographical Locations