Dynamic Programming

October 18th, 2014

In this short post, I sketch the derivation of the dynamic programming principle in discrete time.

Optimization problem

Intertemporal program: \[ V_{t}(x_{t},W_{t})= \underset{(\pmb{\pi}_{[t,T]},\pmb{C}_{[t,T]})}{\text{max}} \; E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})]\] \[\text{s.t.}\] \[\; W_{k+1}=Y_{k+1}+(W_{k}-C_{k})\pi_{k}'R_{k+1},\] \[\; C_{k} \geq 0,\] \[\; W_{T} \geq 0.\]

Additive decomposition

We’ll define: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})=E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] where I have packed all decision variables over \([t,T]\) into the notation \(\pmb{\alpha}_{[t,T]}\): \(\pmb{\alpha}_{[t,T]}=(C_{[t,T]},\pmb{\pi}_{[t,T]})\) and similarly \(\pmb{\alpha_{t}}=(C_{t},\pmb{\pi_{t}})\).
We have: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, \delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha}_{[t+1,T]})].\]
The latter equation is the starting point of everything.

You cannot beat backward induction

We have: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\},\] \[V_{t}(x_{t},W_{t}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\]
This can be paraphrased as saying that carrying out the program \(\underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1},w_{t+1})]\}\), if feasible, is necessarily optimal.

You can do it (1)

It remains to be shown that the value of the above program can be arbitrarily approached through an adequate choice of commands. For any \(\varepsilon>0\) , for each \((x_{t+1},W_{t+1})\) we can find a command \(\pmb{\alpha^{\varepsilon}}_{[t+1,T]}\) such that: \[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\geq V_{t+1}(x_{t+1},W_{t+1})-\varepsilon.\] The dependence of the command on the state variable is ommitted.
But then we can build a command¹ \(\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]}=(\pmb{\alpha}_{t},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\) for which:

\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})=\] \[\delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})].\] \[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]-\varepsilon.\]

You can do it (2)

This being true for any \(\pmb{\alpha}_{t}=(C_{t},\pmb{\pi}_{t})\), we have:

\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon,\] \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon.\]

and since \(\varepsilon>0\) is arbitrary: \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\]

This complete the proof of the dynamic programming principle.

InvestmentMath