• msage@programming.dev
    4 hours ago

    Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

    The real benefit of R1 is Mixture of Experts: the model is split into smaller expert sub-networks, and a router picks only a few of them for each token, meaning you don’t need the entire model to be active all the time, just parts of it.

    Meaning it uses fewer resources during training and general usage. For example, instead of running all of its roughly 671 billion parameters all the time, it activates about 37 billion for any given token, and you can get away with using something like 2% of the hardware used by the competition.
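
To make the routing idea concrete, here is a minimal toy sketch of top-k expert routing, not R1's actual implementation. Everything in it is an assumption for illustration: the names `top_k_route` and `moe_forward` are hypothetical, the "experts" are trivial scaling functions, and the gate scores are random stand-ins for what a learned router would produce per token.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the mixing weights sum to 1.
    m = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k routed experts and mix their outputs by gate weight."""
    routed = top_k_route(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in routed)
    return output, [i for i, _ in routed]


# Toy setup (hypothetical): 8 "experts", each just scales its input.
NUM_EXPERTS = 8
experts = [lambda x, s=s: x * s for s in range(1, NUM_EXPERTS + 1)]

# Stand-in for a learned router: random scores instead of a trained gate.
rng = random.Random(0)
gate_scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

y, used = moe_forward(3.0, experts, gate_scores, k=2)
active_fraction = len(used) / NUM_EXPERTS
print(f"experts used: {used}, active fraction: {active_fraction:.0%}")
```

Only 2 of the 8 expert functions get called per input here, which is the compute saving being described. Note that the toy still keeps all 8 experts around; what shrinks is the work done per token.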