• Aceticon@lemmy.dbzer0.com
    2 days ago

    It varies massively depending on the kind of ML.

    For example, things like voice generation or object recognition can absolutely be done with entirely legit training datasets. For voice generation, you can literally pay a bunch of people to read some texts and train an engine on the recordings. For object recognition, the work is mainly tagging what’s in the images, on top of a ton of easily made images of things - a researcher can literally go around taking photos to make their dataset.

    Image generation, on the other hand, not so much - you can only go so far with plain photos a researcher can take on the street, so these models tend to rely a lot on the artistic work of people who never authorized its use for training. And LLMs clearly cannot be done without scraping billions of pieces of actual work from billions of people.

    Of course, what we tend to talk about here when we say “AI” is LLMs, which are IMHO the worst of the bunch in this respect.