University of Wisconsin professor of soil science Jingyi Huang and data scientist Maria Oros worked over the summer on a new modeling tool for soil scientists. The pair used machine learning and public data to build the Soil Organic Carbon Assistant, which models changes in soil organic carbon.
According to the project summary, the SOCA was created to help model changes to soil organic carbon stocks across the country. Soil organic carbon is the carbon component of soil organic matter, according to the Department of Primary Industries and Regional Development. Organic matter typically makes up 2-10% of a soil's mass.
Soil organic carbon is a major sequestering site for atmospheric carbon and a key factor in fighting climate change. Huang said they hope to use machine learning models to better understand and forecast changes in soil organic carbon and its impacts on agriculture, climate change and policy.
Oros said the SOCA is a combination of a data set, a modeling tool and a visualizer. The SOCA runs as an online applet that contains an interactive map of all of their data points from soil organic carbon samples across the U.S. A separate tab displays a matrix showing the relationship between soil organic carbon and other variables, like precipitation, temperature and the amount of sand in the soil.
“We are interested in assessing or describing how these variables correlate with different levels of soil organic carbon,” Oros said.
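The kind of matrix Oros describes can be illustrated with a short sketch. It assumes a Pearson correlation, which the project may or may not use, and every column name and value below is hypothetical rather than taken from the SOCA data set.

```python
from math import sqrt

# Hypothetical sample records; values are illustrative, not SOCA data.
samples = {
    "soc_pct":          [1.2, 2.5, 3.1, 0.8, 2.0],
    "precipitation_mm": [600, 850, 900, 450, 700],
    "temperature_c":    [12.0, 8.5, 7.0, 15.0, 10.0],
    "sand_pct":         [55, 30, 25, 70, 40],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

matrix = correlation_matrix(samples)
for name, row in matrix.items():
    # One row per variable, one signed coefficient per column.
    print(f"{name:>18}", "  ".join(f"{row[other]:+.2f}" for other in matrix))
```

In this made-up sample, soil organic carbon rises with precipitation and falls with sand content, so those cells come out positive and negative respectively. Real data would not necessarily show the same pattern.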
Huang said the project began as an extension of his 2019 work on how soil organic carbon changed in Wisconsin over the past 150 years. For that work, Huang wanted to create a historic database of soil organic carbon across the state, using machine learning to fill in missing data. Building on the success of that paper, Huang set his sights on a bigger project: creating a dataset and visualization for the entire country.
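How a model can fill in missing measurements can be sketched in a few lines. The article does not say which model Huang used, so this example swaps in a simple k-nearest-neighbors average as a stand-in. The function names and all of the records are hypothetical.

```python
from math import dist

# Hypothetical records: (precipitation_mm, temperature_c, sand_pct) -> soc_pct.
# None marks a site with no measured soil organic carbon.
records = [
    ((600, 12.0, 55), 1.2),
    ((850, 8.5, 30), 2.5),
    ((900, 7.0, 25), 3.1),
    ((450, 15.0, 70), 0.8),
    ((720, 10.0, 40), None),
]

def scale(features, lows, highs):
    """Rescale each feature to the 0-1 range so no single unit dominates."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(features, lows, highs))

def impute_knn(records, k=2):
    """Fill each missing value with the mean of its k nearest measured sites."""
    feats = [f for f, _ in records]
    lows = [min(col) for col in zip(*feats)]
    highs = [max(col) for col in zip(*feats)]
    known = [(scale(f, lows, highs), y) for f, y in records if y is not None]
    filled = []
    for f, y in records:
        if y is None:
            point = scale(f, lows, highs)
            nearest = sorted(known, key=lambda item: dist(point, item[0]))[:k]
            y = sum(value for _, value in nearest) / len(nearest)
        filled.append((f, y))
    return filled

filled = impute_knn(records)
print(filled[-1])  # the previously missing site, now with an estimate
```

The estimate for the unmeasured site is borrowed from the sites that most resemble it in climate and texture. A production model would likely be more elaborate and would be validated against held-out samples.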
UW’s Data Science Institute helped make the project possible by pairing data scientists with researchers to make their work accessible to other researchers and students. For Huang, access to extra computing resources gave the project a boost. It also made it possible to communicate the research through visualizations available to other researchers.
“First, we [didn’t] have the hardware to run the computations in a fast way, second we [didn’t] have a platform to host this service,” Huang said. “All the previous models were offline.”
Oros provided the expertise needed both to create the project's visualizations and to build its machine learning models. The SOCA’s dataset and code are both fully public, which Huang and Oros hope will let researchers and students easily access them for their own research.
Huang said the most difficult part of research is visualizing data with a team of people who are not specialized in data science. He said the ability for researchers to easily translate their findings into visualizations, without having to dig through both the data and the source code for programs, is key to communication.
Oros said having a public app allows other researchers to run their own models directly on the data or retrieve the code for the current model to run on their own data sets. The number of data points per state varies widely: some states, like Missouri, have hundreds of samples, while Ohio has only a handful.
Advances in AI could also create new paths for the project, Oros said. Beyond the buzz around LLMs, machine learning has made recent leaps that could help the project and other researchers, and Oros said the team is open to new approaches.
“We are open to applying new techniques or going through AI development to describe soil organic carbon and to understand and to provide insights for people that will use it in the future,” Oros said.
Huang hopes this step, especially the machine learning component, can help both model soil organic carbon across the country and teach about the process of machine learning.
Huang said the project could find applications outside of the classroom with farmers or conservationists who need to understand the history of soil organic carbon and its implications for their work. Looking at the history of Wisconsin, Huang said the impact of being able to model the changes in soil organic carbon for farmers could be enormous.
Wisconsin is one of the few places on Earth where mollisol, or “black earth,” soil is found. Other major regions include elsewhere in the Midwest and Great Plains of the U.S. and Ukraine. These soils are incredibly fertile and delicate, having taken millions of years to develop.
“If you destroy those soils, it can take millions of years to recover,” Huang said. “One of the indicators of the quality of the soil is carbon. The more soil organic carbon, the more fertile and the [blacker] it is.”