In this short post, I sketch the derivation of the dynamic programming principle in discrete time.


Optimization problem

  • Intertemporal program: \[ V_{t}(x_{t},W_{t})= \underset{(\pmb{\pi}_{[t,T]},\pmb{C}_{[t,T]})}{\text{max}} \; E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})]\] \[\text{s.t.}\] \[\; W_{k+1}=Y_{k+1}+(W_{k}-C_{k})\pmb{\pi}_{k}'\pmb{R}_{k+1},\] \[\; C_{k} \geq 0,\] \[\; W_{T} \geq 0.\] Here \(W_{t}\) denotes wealth, \(C_{t}\) consumption, \(\pmb{\pi}_{t}\) the vector of portfolio weights, \(Y_{t}\) income, \(\pmb{R}_{t}\) the vector of gross asset returns, \(\delta\) the discount factor, \(u_{k}\) the period utility, and \(x_{t}\) the exogenous state variables. A minimal numerical instance is sketched below.
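
To fix ideas, here is a minimal numerical instance of this program, sketched in Python. Every concrete choice (CRRA utility, one riskless and one risky asset with i.i.d. lognormal returns, no income so \(Y\equiv 0\) and the exogenous state \(x_{t}\) is trivial, and all parameter values) is an illustrative assumption, not part of the derivation.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the derivation):
T, delta, gamma = 5, 0.96, 3.0   # horizon, discount factor, risk aversion
Rf = 1.02                        # gross riskless return

def u(C):
    """CRRA period utility u_k(C) = C**(1 - gamma) / (1 - gamma)."""
    return C ** (1.0 - gamma) / (1.0 - gamma)

def draw_returns(rng, n):
    """i.i.d. draws of the risky gross return R_{k+1} (so x_t is trivial)."""
    return rng.lognormal(mean=0.04, sigma=0.18, size=n)

def wealth_next(W, C, pi, R):
    """Budget constraint W_{k+1} = Y_{k+1} + (W_k - C_k) * pi_k' R_{k+1},
    with Y = 0 and pi the weight on the single risky asset."""
    return (W - C) * ((1.0 - pi) * Rf + pi * R)
```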

Additive decomposition

  • We’ll define: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})=E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] where I have packed all decision variables over \([t,T]\) into the notation \(\pmb{\alpha}_{[t,T]}\): \(\pmb{\alpha}_{[t,T]}=(\pmb{C}_{[t,T]},\pmb{\pi}_{[t,T]})\) and similarly \(\pmb{\alpha}_{t}=(C_{t},\pmb{\pi}_{t})\).

  • We have, by the law of iterated expectations (\(E_{t}[\,\cdot\,]=E_{t}[E_{t+1}[\,\cdot\,]]\)): \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, E_{t}[\sum_{k=t}^{T} \delta^{k}u_{k}(C_{k})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]})= \, \delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha}_{[t+1,T]})].\]

  • The latter equation is the starting point of everything; a quick Monte Carlo check of this recursion is sketched below.
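
Continuing the toy setup above, the recursion can be checked numerically: the sketch below evaluates the criterion \(J\) for a fixed feedback rule by applying exactly this decomposition, with \(E_{t}\) approximated by Monte Carlo. The rule `policy(k, W) -> (C, pi)` is a hypothetical stand-in.

```python
def J(t, W, policy, rng, n_mc=2000):
    """Fixed-policy criterion, computed through the recursion
    J(t, W) = delta**t * u(C_t) + E_t[ J(t+1, W_{t+1}) ].
    Recursive calls use one draw each, so the outer average runs
    n_mc simulated paths of the budget constraint."""
    C, pi = policy(t, W)
    if t == T:                                   # horizon: last utility flow
        return delta ** T * u(C)
    R = draw_returns(rng, n_mc)
    W1 = wealth_next(W, C, pi, R)
    cont = np.mean([J(t + 1, w, policy, rng, n_mc=1) for w in W1])
    return delta ** t * u(C) + cont

# Example: a hypothetical fixed-fraction rule (consume 30%, invest 50% risky).
myopic = lambda k, W: (0.3 * W, 0.5)
print(J(0, 1.0, myopic, np.random.default_rng(0)))
```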

You cannot beat backward induction

  • Since \(J(t+1,x_{t+1},W_{t+1},\pmb{\alpha}_{[t+1,T]})\leq V_{t+1}(x_{t+1},W_{t+1})\) by definition of the value function, we have: \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})],\] \[J(t,x_{t},W_{t},\pmb{\alpha}_{[t,T]}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\},\] \[V_{t}(x_{t},W_{t}) \leq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\},\] where the last line follows by taking the maximum over \(\pmb{\alpha}_{[t,T]}\) on the left.

  • This can be paraphrased as saying that carrying out the program \(\underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}\), if feasible, is necessarily optimal. A brute-force version of this one-step program is sketched below.
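
Still in the toy setup, the one-step program on the right-hand side can be sketched by brute force over illustrative grids of commands, for any continuation value function `V_next` (assumed to accept arrays):

```python
def bellman_rhs(t, W, V_next, rng, n_mc=2000):
    """max over (C, pi) of delta**t * u(C) + E_t[ V_next(W_{t+1}) ],
    searched over coarse illustrative grids; E_t is a Monte Carlo average."""
    R = draw_returns(rng, n_mc)
    best = -np.inf
    for C in np.linspace(0.05, 0.95, 19) * W:    # enforce 0 < C_t < W_t
        for pi in np.linspace(0.0, 1.0, 11):     # long-only weights
            val = delta ** t * u(C) + np.mean(V_next(wealth_next(W, C, pi, R)))
            best = max(best, val)
    return best
```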

You can do it (1)

  • It remains to be shown that the value of the above program can be approached arbitrarily closely through an adequate choice of commands. For any \(\varepsilon>0\), for each \((x_{t+1},W_{t+1})\) we can find a command \(\pmb{\alpha^{\varepsilon}}_{[t+1,T]}\) such that: \[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\geq V_{t+1}(x_{t+1},W_{t+1})-\varepsilon.\] The dependence of the command on the state variable is omitted.

  • But then we can build a command¹ \(\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]}=(\pmb{\alpha}_{t},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})\) for which:

\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})= \, \delta^{t}u_{t}(C_{t})+E_{t}[J(t+1,x_{t+1},W_{t+1},\pmb{\alpha^{\varepsilon}}_{[t+1,T]})],\] \[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]-\varepsilon.\] A code sketch of this splicing is given below.
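
In code, the spliced command is nothing more than a piecewise-in-time policy; a sketch, with the dependence on the state carried by the feedback form as in the text:

```python
def splice(t, alpha_t, alpha_eps):
    """Build hat{alpha}^eps = (alpha_t, alpha^eps_[t+1,T]): play alpha_t
    at date t, then switch to the eps-optimal continuation policy."""
    def policy(k, W):
        return alpha_t(k, W) if k == t else alpha_eps(k, W)
    return policy
```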

You can do it (2)

This being true for any \(\pmb{\alpha}_{t}=(C_{t},\pmb{\pi}_{t})\), and in particular for an \(\pmb{\alpha}_{t}\) attaining the maximum below, we have:

\[J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon,\] \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}-\varepsilon,\] where the second line uses \(V_{t}(x_{t},W_{t})\geq J(t,x_{t},W_{t},\pmb{\hat{\alpha}^{\varepsilon}}_{[t,T]})\), by definition of the value function.

Since \(\varepsilon>0\) is arbitrary: \[V_{t}(x_{t},W_{t})\geq \, \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\]

  • Combined with the reverse inequality obtained above, this completes the proof of the dynamic programming principle: \[V_{t}(x_{t},W_{t})= \underset{\pmb{\alpha}_{t}}{\text{max}}\{\delta^{t}u_{t}(C_{t})+E_{t}[V_{t+1}(x_{t+1},W_{t+1})]\}.\] The backward-induction solver it licenses is sketched below.
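
As a closing illustration (still under the toy assumptions), the principle is what licenses computing \(V\) by backward induction: start from the terminal date, where it is optimal to consume all wealth, and apply the one-step program of the earlier sketch date by date on a wealth grid, with linear interpolation between grid points.

```python
def solve_backward(rng, W_grid=np.linspace(0.1, 4.0, 40)):
    """V_t on a wealth grid via the dynamic programming principle
    V_t(W) = max { delta**t * u(C) + E_t[ V_{t+1}(W_{t+1}) ] },
    starting from V_T(W) = delta**T * u(W) (consume everything at T)."""
    V = [None] * (T + 1)
    V[T] = delta ** T * u(W_grid)
    for t in range(T - 1, -1, -1):
        # Interpolate next period's values; np.interp clamps off-grid wealth.
        V_next = lambda w, v=V[t + 1]: np.interp(w, W_grid, v)
        V[t] = np.array([bellman_rhs(t, W, V_next, rng) for W in W_grid])
    return V

V = solve_backward(np.random.default_rng(0))  # V[0] approximates V_0 on the grid
```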

  1. Important technical details are omitted here. Indeed, one has to select an \(\varepsilon\)-optimal command for each value of the state variable, uniformly with respect to the state variable. One cannot hope to do that without strong assumptions regarding the data of the optimization problem.