Dynamic Programming
In this short post, I sketch the derivation of the dynamic programming principle in discrete time.
Optimization problem
- Intertemporal program: \[ V_{t}(x_{t},W_{t})= \underset{(\pmb{\pi}_{[t,T]},\pmb{C}_{[t,T]})}{\text{max}} \; E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})]\] \[\text{s.t.}\] \[\; W_{k+1}=Y_{k+1}+(W_{k}-C_{k})\pi_{k}'R_{k+1},\] \[\; C_{k} \geq 0,\] \[\; W_{T} \geq 0.\]
Additive decomposition
We’ll define: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})=E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] where I have packed all decision variables over \([t,T]\) into the notation \(\pmb{\alpha}_{[t,T]}\): \(\pmb{\alpha}_{[t,T]}=(C_{[t,T]},\pmb{\pi}_{[t,T]})\) and similarly \(\pmb{\alpha_{t}}=(C_{t},\pmb{\pi_{t}})\).
We have: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, \delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha}_{[t+1,T]})].\]
The latter equation is the starting point of everything.
You cannot beat backward induction
We have: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\},\] \[V_{t}(x_{t},W_{t}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\]
This can be paraphrased as saying that carrying out the program \(\underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1},w_{t+1})]\}\), if feasible, is necessarily optimal.
You can do it (1)
It remains to be shown that the value of the above program can be arbitrarily approached through an adequate choice of commands. For any \(\varepsilon>0\) , for each \((x_{t+1},W_{t+1})\) we can find a command \(\pmb{\alpha^{\varepsilon}}_{[t+1,T]}\) such that: \[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\geq V_{t+1}(x_{t+1},W_{t+1})-\varepsilon.\] The dependence of the command on the state variable is ommitted.
But then we can build a command1 \(\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]}=(\pmb{\alpha}_{t},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\) for which:
\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})=\] \[\delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})].\] \[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]-\varepsilon.\]
You can do it (2)
This being true for any \(\pmb{\alpha}_{t}=(C_{t},\pmb{\pi}_{t})\), we have:
\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon,\] \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon.\]
and since \(\varepsilon>0\) is arbitrary: \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(c_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\]
- This complete the proof of the dynamic programming principle.
Links
Important technical details are ommitted here. Indeed, one has to select an \(\varepsilon\)-optimal command for each value of the state variable, uniformly with respect to the state variable. One cannot hope to do that without strong assumptions regarding the data of the optimization problem.↩︎