Jegyzetek kincstára

2025.08.27 meeting notes

Apr 20, 2026

Where are we now?

Where are we at with the code

it works?

Where are we at with the project

The training algorithm is in Python
The environment is in C
The config is in .ini
Custom fbm generation is in C; this is probably the biggest pain point

Experiments

fbm worked with forced contrarian reward: https://wandb.ai/leonardotoffalini-e-tv-s-lor-nd-university/pufferlib/runs/s2zmd7tl?nw=nwuserleonardotoffalini
fbm worked with delta riskless reward: https://wandb.ai/leonardotoffalini-e-tv-s-lor-nd-university/pufferlib/runs/2gpqre2a?nw=nwuserleonardotoffalini
there is something funky with liquidation which I don't understand:
- https://wandb.ai/leonardotoffalini-e-tv-s-lor-nd-university/pufferlib/runs/5bkt2t7g?nw=nwuserleonardotoffalini
- https://wandb.ai/leonardotoffalini-e-tv-s-lor-nd-university/pufferlib/runs/i0pof87e?nw=nwuserleonardotoffalini

DOING

generate a static dataset of fbm trajectories with Python, then sample from this dataset in the C env
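A minimal sketch of the dataset generation step, assuming unit-variance fractional Gaussian noise increments sampled via a Cholesky factor of their covariance. The function names, the Hurst value, and the file layout (two int32 values for the shape, then row-major float32 paths) are hypothetical choices for illustration, not what the project uses.

```python
import numpy as np


def fgn_covariance(n_steps, hurst):
    """Covariance matrix of unit-variance fractional Gaussian noise."""
    k = np.arange(n_steps, dtype=np.float64)
    # Autocovariance at lag k: 0.5 * (|k+1|^2H - 2|k|^2H + |k-1|^2H)
    gamma = 0.5 * (
        (k + 1) ** (2 * hurst)
        - 2 * k ** (2 * hurst)
        + np.abs(k - 1) ** (2 * hurst)
    )
    lags = np.abs(np.arange(n_steps)[:, None] - np.arange(n_steps)[None, :])
    return gamma[lags]


def sample_fbm(n_paths, n_steps, hurst, seed=0):
    """Return an (n_paths, n_steps + 1) float32 array of fbm paths starting at 0."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(fgn_covariance(n_steps, hurst))
    # Correlate i.i.d. normals into fGn increments, then integrate them.
    increments = rng.standard_normal((n_paths, n_steps)) @ chol.T
    paths = np.cumsum(increments, axis=1)
    start = np.zeros((n_paths, 1))
    return np.concatenate([start, paths], axis=1).astype(np.float32)


def save_dataset(path, paths):
    """Write a flat binary file: int32 (n_paths, n_points), then float32 data."""
    header = np.array(paths.shape, dtype=np.int32)
    with open(path, "wb") as f:
        f.write(header.tobytes())
        f.write(np.ascontiguousarray(paths, dtype=np.float32).tobytes())


if __name__ == "__main__":
    dataset = sample_fbm(n_paths=1024, n_steps=256, hurst=0.7)
    save_dataset("fbm_paths.bin", dataset)
```

With this layout the C env would only need to read the two header integers and then index into the float buffer at a random row, so no fbm generation happens on the C side.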

TODO

increase action space from $\{-1, 0, 1\}$ to $\{-10, \dots, 0, \dots, 10\}$
- results: https://wandb.ai/leonardotoffalini-e-tv-s-lor-nd-university/pufferlib?nw=nwuserleonardotoffalini
instead of delta riskless reward, use terminal riskless reward and see if it helps with liquidation
instead of returning the pre-terminal reward in the last step, try returning the terminal reward
if something useful works, run a sweep on it
rerun the experiments that returned positive results with a higher total_steps, because it was dubious whether they had flattened out
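For the larger action space, a sketch of the index mapping, assuming the policy emits a non-negative discrete index and the env converts it to a signed value. The name `action_to_signed` and the `max_units` parameter are hypothetical; the actual env is in C and may handle this differently.

```python
def action_to_signed(action_index, max_units=10):
    """Map a discrete index in [0, 2 * max_units] to a value in [-max_units, max_units]."""
    n_actions = 2 * max_units + 1
    if not 0 <= action_index < n_actions:
        raise ValueError(f"action_index must be in [0, {n_actions - 1}]")
    # Shifting by max_units puts the "do nothing" action at the middle index.
    return action_index - max_units
```

Setting `max_units=1` gives back the current three actions -1, 0, 1, so the same mapping covers both the old and the new action space.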
