TODO
- sample T from a random uniform distribution, as with Hurst: T ∼ U(100, 550)
- prioritized experience replay
- instead of 100, ..., 550 for T, do 2k
- run inference with each model on all other T's
- finish rewrite in C
- sweep hyperparams in C
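
A minimal sketch of the T ∼ U(100, 550) item, assuming T is an integer length and both endpoints are inclusive (the note does not say either way). The names `T_MIN`, `T_MAX`, and `sample_T` are hypothetical and only for illustration; they are not taken from the existing code.

```python
import random

# Bounds taken from the note; treating them as inclusive integers is an assumption.
T_MIN, T_MAX = 100, 550


def sample_T(rng: random.Random) -> int:
    """Draw one integer T uniformly from [T_MIN, T_MAX]."""
    return rng.randint(T_MIN, T_MAX)


# Fixed seed only so the sketch is reproducible.
rng = random.Random(0)
lengths = [sample_T(rng) for _ in range(5)]
print(lengths)
```

If T should instead be continuous, `rng.uniform(T_MIN, T_MAX)` would be the analogous call.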