Every day in life, you are faced with situations where you have to choose between favourite experiences and trying new ones. This challenge is so subtly woven into the fabric of human existence that it goes almost undetected.
Even before you begin your day, you are choosing between getting up at the alarm tone (a new experience) and hitting the snooze button (a favoured experience); you choose to hit snooze. Once you get up, you quickly evaluate whether you'd like to stick with your usual porridge breakfast (a favoured experience) or invest time in a lavish breakfast (a new experience).
You figure that the lavish breakfast comes at the cost of time; time you would need to make it to work punctually. So, you stick with your usual breakfast.
After you arrive at work, you are greeted with a technical problem. You've solved this problem before, so you proceed to tackle it using your tried and trusted approach. At this point, your boss pops over and insists that you solve it using a novel approach that is supposedly quicker. Reluctantly, you agree and try the novel approach. It works, and you solve the issue faster.
After work, you are torn between trying a new restaurant in town and just ordering pizza and chilling at home while streaming your favourite series. I could go on and on, but you get the picture.
In this essay, we will answer the question of how to choose between favourite experiences and trying new ones with the help of science and logic. At the same time, we will not venture deeply into technicalities or overcomplicate things for ourselves. With this in mind, let us begin.
Choose Between Favourite Experiences and Trying New Ones: Beyond Daily Life
To begin, let us cover three hypothetical examples of situations where this problem occurs beyond just daily-life decisions.
Imagine that you are an undergraduate at a university. You have a rough idea of which subjects you like and in which direction you'd like to start your career. But you are not entirely sure. So, you start trying out a few niche courses that catch your eye. After a month of trying out a bunch of courses, you find yourself choosing between which ones to stick to and which ones to ditch. If you think that's a tricky conundrum, here's another one:
Imagine that you have been together with your high-school sweetheart for over a decade. You really value this person and the comfort they bring to your life. With this person, you can just be you! However, there is this tingling feeling inside of you. It says that you have no real "benchmark" experience in romantic relationships in life. It asks:
"What if there is a better potential partner you are missing out on?"
Here's one more:
Imagine that you are an eighty-year-old. Your beloved family has plans to come over to your place for a cozy evening together. However, you receive a phone call informing you that you have won a once-in-a-lifetime opportunity to spend time with your favourite musician on the same evening. You sit there pondering.
As you can see, the problem we are dealing with here is fractal in nature. Once you learn to spot it, you start seeing it everywhere. It is integral to the process of human decision-making. So, what does science have to say about this?
The Explore and Exploit Problem
In computer science and logic, this problem is formally known as the "explore and exploit" problem. In natural language, the words "explore" and "exploit" have loaded meanings. However, in the world of computer science and logic, these words carry neutral meanings that are relevant only to the problem and its solution.
In this context, exploration refers to the process of gathering new information that can be used later. Exploitation refers to the process of using already gathered information to arrive at a preferred and known result.
Over the years, computer scientists and mathematicians alike have tried their hand at this problem to arrive at the optimal balance between taking risks and relishing what we already know is good. They haven't solved the problem for humanity entirely. However, they have solved enough specific problems that fields such as digital product design and clinical trials have profited immensely from the resulting algorithms.
Let us go ahead and cover a couple of these approaches.
Win-Stay and Lose-Shift
In the 1950s, a mathematician named Herbert Robbins came up with an "approximate" solution to the "explore/exploit" problem. His approach came to be known as the "Win-Stay and Lose-Shift" strategy, and involved the following simple rules:
1. If a choice gives you a positive experience, make the same choice the next time around.
2. If a choice gives you a negative experience, choose the alternative next time around.
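To make these two rules concrete, here is a minimal sketch in Python. It is not Robbins' original formulation, just a simulation in which hypothetical options deliver positive experiences as random coin flips with made-up success probabilities:

```python
import random

def win_stay_lose_shift(success_probs, rounds=1000, seed=42):
    """Simulate the win-stay and lose-shift rule over a list of options.

    success_probs holds the (hypothetical) chance that each option yields
    a positive experience, a stand-in for 'a good meal' or 'a solved bug'.
    """
    rng = random.Random(seed)
    choice = rng.randrange(len(success_probs))  # start with an arbitrary option
    wins = 0
    for _ in range(rounds):
        won = rng.random() < success_probs[choice]
        wins += won
        if not won:
            # Lose-shift: one bad experience pushes us to the alternative,
            # no matter how many good experiences this option gave us before.
            choice = (choice + 1) % len(success_probs)
        # Win-stay: after a positive experience, we simply keep the same choice.
    return wins / rounds

# Option 0 is the reliable favourite; option 1 is the risky newcomer.
print(win_stay_lose_shift([0.8, 0.4]))  # fraction of rounds with a positive experience
```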
Robbins went on to mathematically show that this approach performs better than chance. That is indeed good news. However, there are a couple of issues with this approach. To understand them, let us consider a hypothetical example.
Imagine that you are choosing between visiting your favourite restaurant and trying out a new fancy restaurant in town. You have had hundreds of positive experiences with your favourite restaurant in the past. One fine evening, you get served a poor meal at your favourite restaurant. According to Robbins' algorithm, you should choose not to go to your favourite restaurant because of the poor experience you just had there.
The issue here is that the "win-stay and lose-shift" strategy is memory-less. It does not take into account the number of good experiences you have had with a particular choice in the past. Furthermore, it comes across as a plain greedy "gold-digging" strategy with ethical baggage. Surely we could do better than this, right?
Threshold Algorithms to Choose Between Favourite Experiences and Trying New Ones
Gittins Index
While working on the problem of determining which chemical compound (from a given set) is most likely to have the desired effect against a disease, the mathematician John Gittins came up with an ingenious algorithm. He used the now-famous view that positive experiences today are worth more than the same experiences in the future.
For instance, you care more about what you will be having for dinner today than what you will be having for dinner three years from now. In economics and finance, this concept is known as "future value discounting".
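As a rough numerical illustration (the discount factor of 0.9 below is an arbitrary choice, not a value Gittins prescribed), such discounting is usually modelled by multiplying a future reward by a factor between 0 and 1, raised to the power of how far away the reward is:

```python
def present_value(reward, steps_in_future, discount=0.9):
    """Value today of a reward received some number of steps from now."""
    return reward * discount ** steps_in_future

print(present_value(100, 0))  # dinner tonight: 100.0
print(present_value(100, 3))  # the same dinner three steps from now: 72.9
```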
Gittins then imagined that when it comes to explore/exploit decisions, he could treat the cost of switching to the other side as a "bribe". He essentially developed an algorithm that answers the following question:
Given a known favoured experience and an unknown novel experience, what is the maximum bribe a rational player would reject in favour of the favoured experience (or the minimum bribe that would cause a shift of loyalty)?
The result was a table of "Dynamic Allocation Indices" for a given array of situations and options. The "Dynamic Allocation Index" later came to be famously known as the "Gittins Index". Although this was an effective approach, it was computationally intensive, which made it impractical for dynamic situations.
Upper Confidence Bound
Following Gittins' lead, many other algorithms started popping up. A particularly effective one was the "Regret and Optimism" or "Upper Confidence Bound" algorithm, which computes an optimistic index (an upper confidence bound) for each option and guarantees that the regret from suboptimal choices grows only logarithmically over time.
Such an algorithm explores a lot during the beginning of a given time interval, and then exploits a lot more towards the end of that interval. With all this said and done, this field continues to develop and thrive even today, especially in the domains of machine learning and artificial intelligence.
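As an illustration, here is a minimal sketch of the best-known variant of this idea, UCB1 (the formula and the simulated options below are standard textbook choices, not taken from the essay's sources). Each option's score is its average reward plus an optimism bonus that shrinks the more often the option has been tried, which is what produces the "explore early, exploit later" behaviour described above:

```python
import math
import random

def ucb1(success_probs, rounds=1000, seed=7):
    """Run the UCB1 rule on simulated options with hypothetical success rates."""
    rng = random.Random(seed)
    n_options = len(success_probs)
    counts = [0] * n_options    # how often each option has been tried
    totals = [0.0] * n_options  # sum of rewards collected per option
    for t in range(1, rounds + 1):
        if t <= n_options:
            choice = t - 1  # try every option once to get started
        else:
            # Pick the option with the highest mean reward plus optimism bonus.
            choice = max(
                range(n_options),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < success_probs[choice] else 0.0
        counts[choice] += 1
        totals[choice] += reward
    return counts

# Three hypothetical restaurants whose true quality the algorithm does not know.
# Most of the 1000 visits should end up at the best one.
print(ucb1([0.7, 0.5, 0.3]))
```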
But how do we take these rigid and case-specific algorithms and come up with solutions to real-life day-to-day problems?
Real-Life Observations of Explore/Exploit
Imagine that you are visiting a new town or have just moved there. You would instinctively try out new places to build a model of what's good and what's not. Such a model would come in handy during your time there in the future.
Now imagine that you are leaving a town where you have lived for the past five years. You would instinctively avoid trying new places. Instead, you would revisit your favourite places and relish the nostalgia and memories you have made over the years there.
Do you notice the difference between the two situations? To drive home the point, let us revisit the hypothetical example where you are an eighty-year-old choosing between a cozy evening with your family and a date with your favourite musician. You are very likely to choose your family over the musician.
However, imagine that the same option were presented to you as a teenager. It is a no-brainer: you'd ditch your family and go hang out with your music idol (been there; done that).
How to Really Choose Between Favourite Experiences and Trying New Ones?
The key here is to note the time interval relevant to your decision, and your position relative to the limits of this time interval. If you are relatively near the beginning of the time interval (for example, an undergrad trying new courses), explore as much as possible.
If you are relatively near the end of the time interval (for example, an eighty-year-old choosing family over new experiences), exploit the information you have accumulated as much as possible.
The challenge is when you are somewhere around the mid-point of this time interval. In such a case, you are neither here nor there, and choices can be difficult. But then again, they don't call it a mid-life crisis for nothing. In such cases, turning to more complex algorithms might be the only alternative.
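One crude way to encode this rule of thumb in code (this is just an illustration of the heuristic, not one of the formal algorithms above; the options and numbers are made up): explore with a probability equal to the fraction of the interval still remaining, and otherwise exploit the best experience gathered so far.

```python
import random

def choose(favourites, time_elapsed, horizon, rng=random):
    """Explore early in a time interval and exploit late, probabilistically.

    favourites maps each known option to its average experience so far;
    time_elapsed and horizon locate us within the relevant time interval.
    """
    remaining_fraction = max(0.0, 1.0 - time_elapsed / horizon)
    if rng.random() < remaining_fraction:
        return "explore: try something new"
    # Exploit: fall back on the best experience gathered so far.
    return "exploit: " + max(favourites, key=favourites.get)

# An undergrad one month into a 48-month degree will mostly explore;
# the same call near the end will mostly return the known favourite.
favourites = {"usual restaurant": 0.9, "new fancy place": 0.6}
print(choose(favourites, time_elapsed=1, horizon=48))
print(choose(favourites, time_elapsed=47, horizon=48))
```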
If behavioural psychology experiments are to be trusted (refer to the scientific article linked in the references section below), human beings tend to over-explore (and under-exploit) compared to the optimal strategy.
We are all seemingly more curious than we ought to be. Knowing this, would you be any less curious? I would not. Why? Because no gain is worth being less human; being less "me".
References and credit: Brian Christian, Tom Griffiths, Amos Tversky and Ward Edwards (scientific article).
Further reading that might interest you: How To Really Avoid P-Value Hacking In Statistics? and How To Perfectly Predict Improbable Events.