Disease in plane sight
What is this?
The oaks of Minnesota are being attacked by an invasive fungus that makes them wilt! We need a way to detect it across the landscape so that forest managers can control it. We are using visible and near-infrared technology mounted on a plane and machine learning algorithms!
Why would I care?
- Because oaks are one of the most valuable trees in North America both for society and nature.
- Because the disease has already spread over 24 states and is spreading into Canada.
- Because if you like drinking, you might want to preserve the trees used to build those nice barrels.
How did you do this?
We flew a plane with a special sensor over the forests of Minnesota, where different types of oaks are abundant. Oak wilt (a disease caused by a fungus that parasitizes the wood of a group of oaks called red oaks) is commonly found attacking pockets of trees here. The sensor on this plane can capture the light reflected from any surface and measure how much light is reflected across wavelengths that span from the visible range to wavelengths we cannot see, like the shortwave infrared. In total, the sensor can see more than 400 types of light! Embedded in these different types of light is information about plants: we can tell whether they are stressed, how much water they hold, how much sugar they are making through photosynthesis, and more. You can read more about what we can tell from light in this other post, where we discover why we can detect this disease using light and develop methods to distinguish it from drought even before symptoms appear.
A few months after the plane captured images across the landscape, our team walked through the forest and identified red oak trees that had recently died from oak wilt and that were likely alive but already diseased when the flights took place. We also identified red oak trees that were healthy, oaks that are not killed by this disease (such as white oaks), and trees from species that are never affected by it. Identifying all these other trees was very important because we wanted to build a machine-learning algorithm that could accurately distinguish diseased red oaks from any other kind of tree: identifying trees that do not need treatment or removal is as important as identifying those that do. Lastly, we obtained high-precision coordinates for all these trees so that we could extract the pixels from the flight images that contained them. With that, we were ready to start building fancy machine-learning algorithms!
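Extracting the pixels that contain each surveyed tree boils down to converting a map coordinate into a row and column of the image. Here is a minimal sketch of that step; the simulated image cube, the affine geotransform values, and the tree coordinates are all hypothetical stand-ins for the real flight data:

```python
import numpy as np

# Hypothetical hyperspectral cube: (bands, rows, cols). The airborne sensor
# recorded more than 400 bands; here we simulate a small random one.
rng = np.random.default_rng(0)
cube = rng.random((400, 100, 100))

# Assumed geotransform: 1 m pixels, upper-left corner at (500000, 5000000).
origin_x, origin_y, pixel_size = 500_000.0, 5_000_000.0, 1.0

def pixel_spectrum(x, y):
    """Return the full reflectance spectrum at map coordinate (x, y)."""
    col = int((x - origin_x) / pixel_size)
    row = int((origin_y - y) / pixel_size)
    return cube[:, row, col]

# High-precision coordinates of two surveyed trees (made up for the sketch).
tree_xy = [(500_010.3, 4_999_980.6), (500_055.9, 4_999_930.2)]
spectra = np.stack([pixel_spectrum(x, y) for x, y in tree_xy])
print(spectra.shape)  # one 400-band spectrum per tree
```

In practice a geospatial library would handle the coordinate transform, but the idea is the same: each tree's GPS position indexes into the image and yields one labeled spectrum for training.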
Just like with humans, algorithms do better when they only have to distinguish between two options rather than dozens. Thus, we built an algorithm that sequentially asks "yes" or "no" questions when presented with a pixel from the image until the pixel is classified as "diseased" or "other". The questions that the algorithm asks are:
- Does this pixel belong to an oak?
- If so, does this pixel belong to a red oak or a white oak?
- And if it is a red oak, does this pixel belong to a healthy or a diseased red oak?
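The three questions above form a funnel of yes/no checks. In this sketch, each "model" is a hypothetical threshold on the spectrum standing in for a trained classifier; only the chaining logic reflects the approach described here:

```python
import numpy as np

# Stand-in binary models. In the real workflow each of these would be a
# trained classifier; the thresholds and band indices here are made up.
def is_oak(spectrum):      return spectrum.mean() > 0.2
def is_red_oak(spectrum):  return spectrum[150] > 0.4   # hypothetical band
def is_diseased(spectrum): return spectrum[300] < 0.3   # hypothetical band

def classify(spectrum):
    """Funnel a pixel through the three yes/no questions in order."""
    if not is_oak(spectrum):
        return "other"
    if not is_red_oak(spectrum):
        return "white oak"   # an oak, but not a red oak
    return "diseased red oak" if is_diseased(spectrum) else "healthy red oak"

print(classify(np.full(400, 0.05)))  # → "other" (fails the first question)
```

Each pixel exits the funnel at the first "no", so the later, more specialized models only ever see the kinds of pixels they were trained on.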
Like us, algorithms like these need to be trained before they can answer questions. We did that by using the pixels that we identified as oaks, red oaks, and diseased red oaks. The training process is like an exam. First, we present a portion of our extracted pixels to the algorithm and say, "Look, these are oaks and they should appear like this. These are not oaks and they should look like that." Then, during a test phase, we present the remaining extracted pixels and ask the algorithm to classify them based on what we just taught it. This process is repeated hundreds of times with different groups of pixels until the algorithm has a set of models that distinguish the two requested groups as well as it can. This is called "iteration". We performed this iterative training and testing process for each of the three questions to obtain three sets of models that could accurately answer them.
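The teach-then-quiz loop can be sketched with a toy example. Everything below is illustrative: the simulated spectra, the simple nearest-centroid "model", and the 100 repeats are assumptions, not the study's actual classifier or settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 labeled pixel spectra from two made-up classes.
X = np.vstack([rng.normal(0.3, 0.05, (100, 400)),
               rng.normal(0.5, 0.05, (100, 400))])
y = np.array([0] * 100 + [1] * 100)

accuracies = []
for _ in range(100):                    # "iteration": many random splits
    idx = rng.permutation(len(y))
    train, test = idx[:150], idx[150:]  # teach with one part of the pixels...
    centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
    # ...then quiz the model on pixels it has never seen
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None], axis=2)
    pred = dists.argmin(axis=1)
    accuracies.append((pred == y[test]).mean())

print(round(float(np.mean(accuracies)), 2))  # average score across the exams
```

Averaging the score over many random splits tells you how well the model generalizes, rather than how well it memorized one particular batch of pixels.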
Notice, however, that this training method makes the models for the second question very specific. This is because we trained the algorithm to distinguish red oak pixels from white oak pixels, but we never presented a pixel from another species during this training. Thus, the model thinks that there are only red and white oaks in the world and that everything must be one or the other, which means it will likely fail to handle other types of pixels. The same problem applies to the third question, because we never presented anything other than red oak pixels during that training. This is where teamwork makes the dream work: by making the pixels "flow" through the three models, the questions are answered sequentially and each pixel gets funneled into its corresponding class with maximum accuracy.
The diagram above shows how this process works. The top boxes represent the training process in which we used 75% of the data and the bottom triangle represents the funneling process in which we used the remaining 25% of the data. The whole process is also iterated so that we can see how well it performs with different subsets of data.
So what did you find?
So... Does it work? The answer is yes! The graph on the right shows the percentage of times that the algorithm correctly or incorrectly predicted the true identity of a pixel. Blue circles are correct predictions and red circles are incorrect predictions. The small colored boxes represent the success rates of the testing of each question.
On average, the first two questions correctly classified pixels 97% of the time. For the third question, the algorithm correctly classified diseased red oaks 87% of the time and healthy red oaks 95% of the time. That's great, but as explained above these models must work together.
When linked together to separate diseased red oaks from any other kind of pixel, the algorithm correctly classified 71% of pixels belonging to diseased red oaks and 97.3% of pixels belonging to any other class. That's very good! It's not perfect, but remember that we are detecting a disease from a plane across the landscape.
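The percentages reported here are per-class rates: for each true class, the fraction of its pixels that the funnel labeled correctly. A minimal sketch with made-up labels shows how such numbers are computed:

```python
import numpy as np

# Hypothetical test labels and funnel predictions
# (1 = diseased red oak, 0 = any other class).
true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

def per_class_accuracy(true, pred, cls):
    """Fraction of pixels of a given true class that were labeled correctly."""
    mask = true == cls
    return float((pred[mask] == cls).mean())

print(per_class_accuracy(true, pred, 1))  # diseased pixels caught: 3 of 4
print(per_class_accuracy(true, pred, 0))  # "other" pixels kept out: 5 of 6
```

Reporting each class separately matters here because the classes are very unbalanced: diseased trees are rare, so a single overall accuracy would hide how often the disease itself is caught.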
There are a few things that could explain the incorrect classifications. First, we may have incorrectly identified some trees as diseased during the field survey, which would "confuse" the models during training and reduce their accuracy. Second, some trees may have been diseased when we visited the forest but still quite healthy when the flight took place; the algorithm would correctly classify those pixels as healthy, yet they would be counted as misclassifications because of the identity we assigned to them. Lastly, the few pixels incorrectly classified as oaks or as red oaks in the first two stages of the algorithm may end up incorrectly classified as diseased in the final stage. Regardless, the rate of success is promising for everyone working on the detection of this disease.
You might be thinking at this point that planes with expensive sensors might not be accessible to everyone, and you are correct. Small drones would be much more accessible because they are affordable. However, affordable drones are small and cannot carry the load of the sensor used on this plane. Thus, they are often limited to simpler sensors that can only measure a few selected wavelengths of the light spectrum. This is why we took the opportunity to ask our algorithm which light wavelengths are the most important for differentiating each of our target classes. The points in the three panels below show, from top to bottom, the 20 most important wavelengths for distinguishing diseased from healthy red oaks, red oaks from white oaks, and oaks from other species.
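One way to picture such a wavelength ranking: score every band by how well it separates two classes, then keep the top 20. The separability score below is a simple stand-in for the model-derived importances, and the data and "informative" bands are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
wavelengths = np.linspace(400, 2400, 400)   # nm, visible through SWIR

# Toy spectra for two classes; a few bands are made genuinely different.
healthy = rng.normal(0.5, 0.05, (100, 400))
diseased = rng.normal(0.5, 0.05, (100, 400))
informative = [120, 250, 310]               # hypothetical band indices
diseased[:, informative] += 0.2

# Importance score: class-mean difference scaled by the pooled spread.
score = np.abs(healthy.mean(0) - diseased.mean(0)) / (
    healthy.std(0) + diseased.std(0))
top20 = np.argsort(score)[::-1][:20]

print(all(b in top20 for b in informative))  # the shifted bands rank highest
```

A ranking like this tells a drone builder which few wavelengths a cheap sensor would need to measure to still catch most of the signal.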
The vertical lines in the top panel indicate wavelengths that are commonly used in drone-compatible spectral indexes. This graph allows us to select a few of these already-known indexes that contain wavelengths that should be able to detect diseased red oaks. We can then process our flight images and color them using these indexes to see if diseased red oaks look different from healthy red oaks. For instance, numbers 4, 10, and 13 in the figure show high-importance wavelengths and correspond to the Chlorophyll Index (CI), Carter-Miller Stress index (CMS), and Water Band Index in the Short-Wave InfraRed index (WBI SWIR). Let's see what the flight image looks like in the typical red, green, and blue wavelengths that we see through our eyes compared to a combination of these three indexes.
The top picture is the image as our eyes would see it. I marked four healthy trees with a white point, a dead tree in orange, and three diseased trees in blue. The diseased trees look just like the healthy trees and only the dead tree can be distinguished from the others.
The bottom picture is the same image seen through the combination of the CMS, CI, and WBI SWIR indexes. You can clearly see that the diseased trees are lighter in color and show areas with tones similar to those of the dead tree.
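Index images like these are typically simple functions of just a few bands, which is what makes them drone-friendly. Here is a minimal sketch of a two-band ratio map; the 694/760 nm pair follows the common form of the Carter-Miller stress index, but treat the exact bands, and the random cube, as assumptions rather than the paper's recipe:

```python
import numpy as np

rng = np.random.default_rng(3)
cube = rng.random((400, 50, 50))           # hypothetical (bands, rows, cols)
wavelengths = np.linspace(400, 2400, 400)  # nm

def band(nm):
    """Index of the sensor band closest to a target wavelength."""
    return int(np.argmin(np.abs(wavelengths - nm)))

# A two-band ratio in the spirit of the stress indexes discussed above:
# reflectance near 694 nm divided by reflectance near 760 nm.
cms_like = cube[band(694)] / cube[band(760)]
print(cms_like.shape)  # one index value per pixel: a grayscale stress map
```

Coloring each pixel by a value like this (or by a combination of several indexes) is what turns the raw image into the bottom picture, where diseased crowns stand out.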
These three indexes are just some of the ones that we found to work, but there are more, and several are compatible with low-cost drones. These results therefore bring much-needed hope to those who are currently fighting against this disease on foot.
And the big point is...?
Oak forests play a vital role across North America in providing ecosystem services like habitat, climate regulation, clean air and water, and erosion control. These forests are threatened by diseases like oak wilt, which has spread across the continent, severely affecting red oaks. Our current ways to detect oak wilt across the landscape rely on crews of experts who survey the forests from the ground. You can imagine how much this limits our ability to detect oak wilt across a continent, so an approach that can do this from the air will make the job much easier. This work also serves as a proof of concept for attempting something similar with current and upcoming satellites that carry sensors similar to the ones used here.
The actual paper for the nerds: