

Open source isn’t really applicable to LLMs IMO.
There are open weights (the model itself), available training data, and other nuances.
They actually went a step further and provided a very thorough breakdown of the training process, which means others could similarly train models from scratch with their own training data. Hugging Face seems to be doing just that as well: https://huggingface.co/blog/open-r1
A VPN introduces a new party who can harvest your data. It doesn’t avoid IP tracking, it just shifts it from your ISP to another entity.
You have to trust that your VPN provider’s claims of no logging/tracking are accurate. You can usually get fairly confident with research, but it’s never 100%.