Suppose that you would like to trade a certain type of stock. You have analyzed the financial
market and found that, at any given time step, the stock is always at one of 4 different levels
(Level 1 to Level 4). Each level has its own buy price and sell price for a single stock; these are
listed below, together with the probabilities of transitioning to each level at the next time
step. Initially, you have 1 unit of money and 0 stocks, and the stock is at Level 2. Your goal is
to maximize the amount of money and the number of stocks that you own. At each time step you
choose one of 3 possible actions: buying 1 stock (if you have enough money to pay the buy price
at the current level), selling 1 stock (if you own at least 1 stock), or doing nothing (waiting
for the next time step).
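A minimal Python sketch of the state and action spaces described above. It assumes that a state is the triple (money, stocks, level) and that "enough money" means money at least equal to the buy price at the current level. The names `State`, `BUY_PRICE` and `legal_actions` are hypothetical; the buy prices are copied from the per-level prices listed below.

```python
from typing import NamedTuple

# Buy price of one stock at each level (from the per-level price list).
BUY_PRICE = {1: 2, 2: 1, 3: 1, 4: 1}


class State(NamedTuple):
    money: int   # units of money held
    stocks: int  # number of stocks held
    level: int   # current market level, 1..4


def legal_actions(s: State) -> list:
    """Return the actions allowed in state s (assumed feasibility rules)."""
    actions = ["wait"]                 # doing nothing is always allowed
    if s.money >= BUY_PRICE[s.level]:  # can pay the current buy price
        actions.append("buy")
    if s.stocks >= 1:                  # owns at least one stock
        actions.append("sell")
    return actions


start = State(money=1, stocks=0, level=2)
print(legal_actions(start))  # ['wait', 'buy']
```

In the initial state the trader can only wait or buy, since there is nothing to sell yet.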
Li denotes the current level and P(Lj|Li) the probability that the stock is at Level j at the
next time step; each row of probabilities sums to 1.

Level   Buy price   Sell price   P(L1|Li)   P(L2|Li)   P(L3|Li)   P(L4|Li)
1       2           1            0.3        0.4        0.2        0.1
2       1           1            0.2        0.3        0.4        0.1
3       1           2            0.1        0.3        0.4        0.2
4       1           2            0.1        0.2        0.4        0.3
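The transition probabilities can be encoded as a row-stochastic matrix and propagated forward to see how the market level evolves from the initial Level 2. The listed probabilities depend only on the current level, so this ignores the trader's actions. This is a small Python sketch; the dictionary layout and the function name `step_distribution` are illustrative choices, not part of the provided framework.

```python
import math

# TRANSITION[i][j] = P(L_j | L_i): row = current level, column = next level.
TRANSITION = {
    1: {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1},
    2: {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.1},
    3: {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2},
    4: {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3},
}

# Every row must be a probability distribution over the next level.
for level, row in TRANSITION.items():
    assert math.isclose(sum(row.values()), 1.0), level


def step_distribution(dist):
    """Propagate a distribution over levels one time step forward."""
    out = {j: 0.0 for j in TRANSITION}
    for i, p_i in dist.items():
        for j, p_ij in TRANSITION[i].items():
            out[j] += p_i * p_ij
    return out


d = {1: 0.0, 2: 1.0, 3: 0.0, 4: 0.0}  # the market starts at Level 2
for t in range(1, 3):
    d = step_distribution(d)
    print(t, {j: round(p, 3) for j, p in d.items()})
```

After one step the distribution is just the Level 2 row. After two steps it is approximately (0.17, 0.31, 0.36, 0.16) for Levels 1 to 4.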
Model this scenario as a Markov Decision Process (MDP) according to the additional
information given below and, using the software framework provided to you, determine what a
rational trader should optimally do in every state.
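One standard way to compute an optimal policy for a finite MDP is value iteration. The sketch below is plain Python and does not use the software framework mentioned above, whose interface is not shown here. The additional modelling information is not reproduced in this section. The reward (the cash flow of each trade), the discount factor `GAMMA` and the caps `MAX_MONEY` and `MAX_STOCKS` are therefore hypothetical assumptions, and all function names are illustrative.

```python
# Value iteration for the stock-trading MDP (illustrative sketch).
# HYPOTHETICAL assumptions, not taken from the assignment text:
#   - reward = cash flow of the action (-buy price, +sell price, 0 for wait)
#   - discount factor GAMMA over an infinite horizon
#   - money and stocks are capped so that the state space is finite
GAMMA = 0.9
MAX_MONEY = 5
MAX_STOCKS = 5
ACTIONS = ("wait", "buy", "sell")

BUY = {1: 2, 2: 1, 3: 1, 4: 1}    # buy price per level
SELL = {1: 1, 2: 1, 3: 2, 4: 2}   # sell price per level
TRANSITION = {                    # TRANSITION[i][j] = P(L_j | L_i)
    1: {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1},
    2: {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.1},
    3: {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2},
    4: {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3},
}

# A state is (money, stocks, level).
STATES = [(m, s, l) for m in range(MAX_MONEY + 1)
          for s in range(MAX_STOCKS + 1) for l in TRANSITION]


def effect(state, action):
    """Return (reward, new_money, new_stocks), or None if the action is illegal."""
    m, s, l = state
    if action == "buy":
        if m < BUY[l] or s >= MAX_STOCKS:
            return None
        return -BUY[l], m - BUY[l], s + 1
    if action == "sell":
        if s < 1 or m + SELL[l] > MAX_MONEY:
            return None
        return SELL[l], m + SELL[l], s - 1
    return 0, m, s  # "wait" is always legal


def legal(state):
    return [a for a in ACTIONS if effect(state, a) is not None]


def q_value(V, state, action):
    """Immediate reward plus discounted expected value over the next level."""
    reward, m2, s2 = effect(state, action)
    level = state[2]
    expected = sum(p * V[(m2, s2, nxt)] for nxt, p in TRANSITION[level].items())
    return reward + GAMMA * expected


def value_iteration(tol=1e-8):
    """Apply Bellman optimality backups until the values stop changing."""
    V = {st: 0.0 for st in STATES}
    while True:
        new_V = {st: max(q_value(V, st, a) for a in legal(st)) for st in STATES}
        delta = max(abs(new_V[st] - V[st]) for st in STATES)
        V = new_V
        if delta < tol:
            return V


V = value_iteration()
# Greedy policy with respect to the converged values.
policy = {st: max(legal(st), key=lambda a: q_value(V, st, a)) for st in STATES}
print("start state (1, 0, 2):", policy[(1, 0, 2)], round(V[(1, 0, 2)], 3))
```

The caps make the state space finite but distort behaviour near the boundaries. For example, selling is disallowed when it would push money above `MAX_MONEY`. The resulting policy should therefore be re-derived once the actual reward and horizon from the assignment are plugged in.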