How To Do Reinforcement Learning in Real Life: Example datasets from the reinforcement learning experiment. Top Left: Benign samples; Top Right: Malignant samples; Bottom Left: No calcifications; Bottom Right: Calcifications.

Reinforcement learning is a key branch of machine learning that finds applications in many fields. At its core, reinforcement learning focuses on modelling intelligent agents as algorithms. These agents in turn aim to make decisions in their environment(s) that maximise cumulative reward.
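To make "cumulative reward" concrete, here is a minimal Python sketch. The function name `cumulative_reward` and the optional discount factor `gamma` are illustrative assumptions, not something taken from the project discussed in this article. With `gamma = 1.0`, the result is simply the plain sum of all rewards the agent collected.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Return the (optionally discounted) sum of a sequence of rewards.

    gamma is a hypothetical discount factor: 1.0 means every reward counts
    equally, values below 1.0 make later rewards count less.
    """
    total = 0.0
    # Walk backwards so each step applies the discount once:
    # G_t = r_t + gamma * G_(t+1)
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# An agent that was rewarded on three of four trials.
print(cumulative_reward([1, 0, 1, 1]))       # plain sum of the rewards
print(cumulative_reward([1, 0, 1, 1], 0.5))  # later rewards count less
```

An agent that "maximises cumulative reward" is one that picks its decisions so that this number ends up as large as possible.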

Reinforcement learning is an actively growing field where mathematics and computer science integrate to produce practical results. The growth of computational power over the years has certainly helped the cause of reinforcement learning.

However, while researching this topic recently, I came across a science project that bucked the trend and followed a unique approach. In this project, the researchers chose to carry out reinforcement learning not with computer science, but with behavioural psychology.

That’s right. We are talking about applying reinforcement learning algorithms to real-life intelligent agents. If you think about it, such an approach goes back to the roots of learning algorithms and epistemology. In this article, I aim to cover the details of this unique project and touch upon what implications such an approach might have on the further development of reinforcement learning as a field.


Project Background: Identifying Breast Cancer

When it comes to identifying cancers in general, pathology and radiology are two fields that play a key role. New automated technology is constantly being developed in both fields.

In order to test such new technology, the medical device manufacturers need access to expensive pathologists and radiologists. Such experts supervise and accompany the development process of the latest technology until it meets the stringent norms to be classified as market-ready.

A team of researchers set out to see if non-experts could be trained using reinforcement learning methods to produce comparable results in pathology and radiology. For this purpose, they chose histological and mammogram images related to cases of breast cancer.

To put it simply, they aimed to see if it was possible to train non-experts using reinforcement learning into experts at (visually) detecting breast cancer cases. This pretty much wraps up the project background.


The Reinforcement Learning Procedure

The researchers first grouped the real-life non-expert agents into two groups. Group 1 received normal images at different levels of magnification, whereas group 2 received hue- and brightness-balanced monochrome images at different levels of magnification.

For the first few days, all agents were trained using picture slides that were shown to them on a touchscreen. The screen offered two choices, benign or malignant (in simpler terms: not cancerous or cancerous). Each time an agent got the visual analysis right, the agent was rewarded (more on the reward mechanism later). Each time an agent got the visual analysis wrong, the same slide was flashed once more.
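The training rule described above (reward a correct answer, repeat the slide after a wrong one) can be sketched as a small simulation. This is only an illustration under stated assumptions: `ToyAgent`, `run_session`, the `max_repeats` safety cap, the exploration rate `epsilon`, and the randomly labelled slides are all hypothetical and are not the researchers' setup. The toy agent also simply memorises slide IDs, which is far cruder than whatever the real agents did.

```python
import random

LABELS = ("benign", "malignant")


class ToyAgent:
    """Stand-in learner that keeps a running score per (slide, answer) pair."""

    def __init__(self, epsilon=0.05, seed=0):
        self.scores = {}
        self.epsilon = epsilon  # hypothetical chance of answering at random
        self.rng = random.Random(seed)

    def choose(self, image_id):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(LABELS)
        # Otherwise pick the best-scoring answer, breaking ties at random.
        best = max(self.scores.get((image_id, lab), 0.0) for lab in LABELS)
        tied = [lab for lab in LABELS
                if self.scores.get((image_id, lab), 0.0) == best]
        return self.rng.choice(tied)

    def update(self, image_id, label, reward):
        # A reward raises the score of the chosen answer; no reward lowers it.
        key = (image_id, label)
        self.scores[key] = self.scores.get(key, 0.0) + (1.0 if reward else -1.0)


def run_session(agent, slides, max_repeats=10):
    """Run one session and return the first-attempt success rate.

    slides is a list of (image_id, true_label) pairs. max_repeats is a
    hypothetical cap so the loop always terminates.
    """
    first_try_correct = 0
    for image_id, true_label in slides:
        for attempt in range(max_repeats):
            answer = agent.choose(image_id)
            reward = 1 if answer == true_label else 0
            agent.update(image_id, answer, reward)
            if reward:
                if attempt == 0:
                    first_try_correct += 1
                break  # correct: reward given, move on to the next slide
            # wrong: no reward, and the same slide is shown again
    return first_try_correct / len(slides)


# Made-up data: 100 slides with random labels, shown on 5 consecutive "days".
rng = random.Random(1)
slides = [(i, rng.choice(LABELS)) for i in range(100)]
agent = ToyAgent()
accuracies = [run_session(agent, slides) for _day in range(5)]
print([round(a, 2) for a in accuracies])
```

In this sketch the first session hovers around chance level and later sessions score much higher, because the only feedback the agent ever receives is the presence or absence of a reward.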

How To Do Reinforcement Learning in Real Life: Examples showing results post training. On the left, it is seen that the agents show logarithmic improvement from day 1 to day 15. On the right, it is seen that at each magnification level (4x, 10x, and 20x), the success rate is lower on rotated datasets as compared to original datasets.
Results post training with breast histopathology samples (Image Credit: Levenson et al)

Once all agents showed logarithmic improvement in the visual analysis of the histological and mammogram images, they were tested using variations (rotations) of the same image set. From this phase onward, all agents were rewarded regardless of whether they got the analysis correct or wrong. This was done to stop the ‘learning phase’ and study the ‘performance phase’.

Finally, each agent was tested on a novel image set that was not part of the training image set. The novel image set was considered to be significantly difficult and was prepared under the supervision of experts.

In addition to this general procedure, the researchers tested further subtleties such as the effects of image treatment (brightness-/hue-) levels and image compression.

The Results: Novice to Expert in 15 Days?

All agents involved started with a success rate of around 50% (on par with mere chance). By the fifteenth day, they achieved a success rate of around 85%. This accomplishment did not exactly put them on par with experts but was good enough to put them in a comparable range.

Further testing (beyond the first 15 days) revealed that the agents were able to perform comparably to the experts in both pathology and radiology tasks.

The only situations where there was a significant performance dip were with novel mammogram datasets and when image compression was involved.

Based on these results, can we conclude that reinforcement learning can be employed to train cheaper non-experts into experts for the purpose of developing medical technology? Before we answer that question, there is a key detail that I have deliberately kept from you until now. We need to look at this key piece of information before we can make any conclusions.


The Catch

Remember when I told you that I would share more information about how the agents were rewarded later in the article? Well, it is time for that now. Each time an agent got the analysis right, the agent was rewarded with seeds. That’s right; seeds! But why?

That is because these real-life intelligent agents were not human beings. They were pigeons. You might be fuming at me for having kept this key piece of information from you until now. But I deliberately chose to do so to demonstrate our distorted associations with the word ‘intelligence’.

We tend to pair the word with familiar adjectives, as in ‘human intelligence’ or ‘artificial intelligence’. It seldom occurs to us that there also exist other forms of intelligence such as ‘animal intelligence’ or ‘pigeon intelligence’.

The researchers chose pigeons specifically for this project because their visual processing systems are comparable to those of humans. At the end of the project, the results showed that reinforcement learning could be used to train pigeon agents to detect cancer with a success rate comparable to that of human experts.

How To Do Reinforcement Learning in Real Life: The agent’s training environment. A pigeon is seen sitting inside a box with an open end (to enable us to see what happens inside). The pigeon is looking at a touch screen where the data sample is shown at the centre. On the left is a blue rectangular patch, and on the right is a yellow rectangular patch.
The agent’s training environment (Image Credit: Levenson et al)

What’s more, when the researchers averaged the results from all agents (an approach they called “flock sourcing”), the success rate moved even closer to that of the human experts.
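Why would pooling answers help? A rough way to see it is to compute how often a majority vote is right when each voter is right on its own with some fixed probability. The sketch below makes strong simplifying assumptions: the function name `majority_accuracy` is hypothetical, a majority vote stands in for the averaging the researchers actually performed, each agent is taken to be correct with probability 0.85 (the approximate figure quoted earlier), and the agents' errors are treated as independent. Real birds trained on the same slides would likely make correlated mistakes, so the real gain would be smaller.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent agents answers correctly.

    p is each agent's individual probability of being correct.
    n must be odd so that a tie is impossible.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of agents to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of getting `need` or more correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(need, n + 1))


for n in (1, 3, 5, 9):
    print(n, round(majority_accuracy(0.85, n), 3))
```

Under these assumptions, even a handful of moderately accurate agents can outvote their individual mistakes, which is the intuition behind pooling a flock.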

What Does this Mean for the Future of Reinforcement Learning?

With extensive mathematical and computational research in the field of machine learning and reinforcement learning, there is no doubt that humanity is taking advantage of the latest computational capabilities available.

However, what this science project shows us is that there are other processing options available to us. The computers used to model intelligent agents need not be digital. We could use nature’s diverse manifestations of intelligent agents (within ethical bounds, of course) to further develop practical solutions for our advanced day-to-day problems using methods such as reinforcement learning.

Although the researchers in this science project managed to establish the significance of pigeons as potential agents for reinforcement learning, they did not get into the mechanics of what exactly enables pigeons to not just memorize training datasets, but generalize learning to unseen novel datasets as well.

Understanding how natural agents learn (in the context of specific datasets) could be an area of research for the future.

To conclude, it is both refreshing and promising to see such novel approaches to reinforcement learning beyond the fields of mathematics and computer science.


Credit/Source: Levenson et al. (scientific research article).

I hope you found this article interesting and useful. If you’d like to get notified when interesting content gets published here, consider subscribing.
