I wonder if it’s possible to bring public opinion into the error function - find weights for ChatGPT such that the next token is predicted correctly but also such that the overall output falls within the public average opinion… But then - is that a “good enough” metric?
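For what it’s worth, here’s a rough sketch of what that kind of error function could look like, assuming you could somehow score an output’s opinion as a single number. Everything here is hypothetical: `output_opinion`, `public_mean` and `weight` are made-up names, and nothing suggests ChatGPT is actually trained this way.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct next token."""
    return -math.log(probs[target_index])

def combined_loss(probs, target_index, output_opinion, public_mean, weight=0.5):
    """Next-token loss plus a penalty for drifting from the public average.

    output_opinion and public_mean are hypothetical scalar scores in [-1, 1];
    how to measure them at all is the open question.
    """
    token_loss = cross_entropy(probs, target_index)
    opinion_penalty = (output_opinion - public_mean) ** 2
    return token_loss + weight * opinion_penalty

# Toy numbers, purely illustrative: same token prediction, different opinion drift.
probs = [0.1, 0.7, 0.2]
aligned = combined_loss(probs, 1, output_opinion=0.2, public_mean=0.2)
drifted = combined_loss(probs, 1, output_opinion=0.9, public_mean=0.2)
print(aligned, drifted)
```

The second output gets the next token just as right but is penalized more, which is exactly why the “good enough” question matters: the penalty is only as meaningful as the public average it is measured against.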
The usual way to control for algorithmic bias is to add human-developed layers that counteract the bias you pick up when you ingest large datasets for training. But that’s extremely labor-intensive. I’ve seen some interesting hypotheticals where algorithms designed specifically to identify bias are used to tune layers with custom weighting, to try to pull bias back down to acceptable levels, but even then we’ll probably need to watch how this changes language about the groups the bias affects.
I think the trouble with human oversight is that it’s still going to keep whatever bias the overseer has.
AI is programmed by humans or trained on human data. Either we’re dealing in extremes where it’s impossible to not have bias (which is important framing for measuring bias) or we’re talking about how to minimize bias, not eliminate it.
I don’t see how misaligning to public opinion = bias.
The public is already hugely biased; we surrender the general education of the entire adult population to news media, social media, and the entertainment industry, all of which sway public perception for their own financial and political gain.
Tbh the “public” as a mass entity is going to be more wrong than a language model; I see who y’all vote for, I wouldn’t trust you with anything 🫠
Bias shouldn’t exist in a language model. Human beings continue to complicate reality because of boredom.
I’m lost here, what are you trying to say?
Given a model is given a complete set of human data, there should be equal amounts of leaning in all directions. Therefore, no bias. If bias is found, then the data set is incomplete and/or the people creating the model are only feeding it their own selection of data. A perfect AI is like Switzerland.
“Given a model is given a complete set of human data, there should be equal amounts of leaning in all directions.”
This assumes that humans are equally likely to lean in all directions, which is a false assumption. Voices are present only to the extent that they are socially permissible. Voices are unequally represented depending on access. Some voices are absent because historically they were suppressed or persecuted. Even if it were a perfect data set of all human voices that ever existed, it would reflect the societies we create and the behavior we select for.
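To put toy numbers on that: a minimal sketch, assuming each group has a single “lean” score and contributes text in proportion to its volume. The groups and figures below are invented purely for illustration, not measurements of anything.

```python
def average_lean(groups):
    """Volume-weighted mean lean of a corpus.

    groups: list of (lean, volume) pairs, with lean in [-1, 1] and
    volume being how much text that group contributes (hypothetical units).
    """
    total = sum(volume for _, volume in groups)
    return sum(lean * volume for lean, volume in groups) / total

# Two opposite leanings, equally represented: the corpus averages out.
equal = [(-1.0, 100), (1.0, 100)]
# Same two leanings, but one side has far less access to publishing.
unequal = [(-1.0, 20), (1.0, 180)]

print(average_lean(equal))
print(average_lean(unequal))
```

Both corpora are “complete” in the sense that every voice that got recorded is included, yet only the first one balances out; the second leans toward whoever got to write more.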
“Particularly underrepresented groups include Mormons and those over 65…”
What a disaster! I hope someone gets on that ASAP! /s
@Gaywallet I’m coming to think that expecting models to produce human-like values and underlying representations is a mistake, and we should recognize them as cognition tools which are entirely possible to misuse.
Why? LLMs get worse at tasks as you attempt to train them with RLHF, and those with access to the base models will use them without filtering for a significant intelligence-at-scale advantage. They’ll give the masses the moralized, literally dumber version.