Elon Musk filed a lawsuit in San Francisco's Superior Court accusing OpenAI and its CEO, Sam Altman, of betraying the startup's initial commitment to openness, the betterment of society, and lack of profit as a motive. Among other things, Musk's 35-page complaint argues that OpenAI has violated its original deal to share its GPT large language models with Microsoft, which stated that the software giant would lose access to new LLMs once OpenAI had achieved AGI. According to the complaint, OpenAI reached that epoch-shifting moment a year ago with GPT-4, its most powerful model to date.

Musk, who cofounded OpenAI but left in 2018, is at least as entitled as anyone to come up with his own definition of AGI. His complaint describes it as "a general purpose artificial intelligence system - a machine having intelligence for a wide variety of tasks like a human." That does sound like GPT-4 as I, a mere layperson, experience it in ChatGPT Plus.

But Musk's declaration that the AGI era is already upon us is hardly the consensus among AI scientists. Even those who think it's not far off predict arrival dates that are at least a few years away. And GPT-4 falls well short of meeting OpenAI's own explanation of the term: "A highly autonomous system that outperforms humans at most economically valuable work."

Consider the evidence:

GPT-4 isn't remotely autonomous; indeed, it does its best work when humans provide plenty of hand-holding in the form of detailed prompts.

The world is still in the process of figuring out what tasks GPT-4 can do, and we frequently overrate its competence.

That's not even getting into the fact that OpenAI's reference to "most economically valuable work" suggests that true AGI may involve not just software but also sophisticated robotics that don't exist yet.

To guess when OpenAI (or a rival such as Google, Anthropic, Meta, Mistral, or Perplexity) might reach AGI, as OpenAI defines it, is to expect that it'll be an obvious moment in time. But OpenAI's definition, like all the others, is squishy and difficult to put to a conclusive test. To riff on Supreme Court Justice Potter Stewart's famous comment about pornography, maybe we'll know it when we see it. At the moment, however, I'm convinced that obsessing over AGI's existence or nonexistence is counterproductive.

The whole notion of AGI is predicated on the assumption that AI started out dumber than a human but could someday match or exceed our level of thinking. Already, though, generative AI is different than human intelligence: far closer to omniscient than any individual flesh-and-blood thinker, yet also preternaturally gullible and prone to blurring fact and fiction in ways that don't map to common human frailties. That's because it's a predictive engine, trained to string together words without truly understanding them. If its present trajectory of simulated brilliance mixed with boneheadedness continues, it might wander off in a direction far afield from most definitions of AGI.

Even if the world lands on a new, more inclusive definition of AGI, it may be hard to prove whether a particular LLM has attained it. Musk's lawsuit cites proof points of GPT-4's reasoning power, such as its scoring in the 90th percentile on the Uniform Bar Exam for lawyers and the 99th percentile on the GRE Verbal Assessment. That it can do so is astounding. But acing tests is not synonymous with performing useful work. And even if it were, who gets to decide how many tests an LLM must pass before it's achieved AGI rather than just bobbled somewhere in its vicinity?

For decades, the Turing Test (which a computer would pass by fooling a human into thinking that it, too, was human) was computer science's beloved thought experiment for determining when AI had gotten real. Strangely enough, it's useless as a tool for assessing today's LLM-based chatbots. That is not because they know too little to fake humanity convincingly, or can't express it glibly enough, but because they betray their artificiality by being so good at churning out endless wordage on more topics than any human knows. AGI could end up in a similar predicament: a benchmark, devised by humans, that's rendered obsolete by the technology it was meant to measure.

DID YOU HEAR THE ONE ABOUT THE "MAC CAR?"

Last week, Apple's long, expensive quest to build an autonomous EV entered its rearview-mirror phase, a sad fate my colleague Jared Newman blamed on the company's sometimes counterproductive pursuit of perfection. Wondering what an Apple car would be like has been an obsession for techies since 2012, when news broke that Steve Jobs had toyed with getting into the automobile business even before there was an iPhone. Or maybe it started in 2008, when reports of a meeting between Steve Jobs and Volkswagen's CEO led to wild speculation about an "iCar."

Or how about 1998? According to Snopes, that's when a joke involving cars designed by software companies began spreading like crabgrass across the internet, eventually evolving into an urban legend involving a Bill Gates keynote and a General Motors press release. Along with a Microsoft car that crashed twice a day and occasionally needed its engine replaced for no apparent reason, it mentioned a "Mac car" that "was powered by the sun, was reliable, five times as fast, twice as easy to drive - but would only run on 5% of the roads."

  • General_Effort@lemmy.world
    7 months ago

    Huh. An emotional subject, indeed. I didn't think merely pointing it out would be enough to trigger you. Sorry for causing you distress. I'm just not good at picking up emotional cues.

    • GiveMemes@jlai.lu
      7 months ago

      We can do this all day. The subject isn't emotional for me at all. Perhaps you're projecting your own insecurities about the debate onto me?

      Like I really don't understand why you won't make a point and instead keep acting like an aloof teenager.

      • General_Effort@lemmy.world
        7 months ago

        If I were still a teenager, I would not have worried about causing anyone distress. I've had many exchanges with people about matters that touch on the religious or spiritual. I've come to understand some things. Some people, if they stop voicing the "right" opinion, they will be disowned by their families and shunned by their communities. Other people have specific ideas about life after death. To them, if anything contradicts these ideas, then it's like they learn that their relatives are dead and they themselves will die soon. To me, all this is just interesting. It seems cruel to expose others to this kind of threat and emotional distress while I'm just sitting here all comfortable. I'm sure it took me way longer than others to understand that.

        I don't know what your situation is. You could have told me not to worry but instead responded rather emotionally. I don't know what to make of that.

        But you want a point. I guess I can do that.


        We need to step back and ask how we know things. In science, it's all about experiments. You try things out. It's not quite as straightforward as it seems but we don't need the details. Another way to know things is a legal system. If you want to know whose property something is, science cannot help you. In case of doubt, you have to go to court and get a judgment. There are lots of other ways but we don't need to bother.

        Obscenity is not a matter for science. There is no experiment which can determine if something is or isn't obscene. The courts decide and they use no uniform standard.

        If reason is like obscenity, then it is for the courts or the lawmakers to decide.

        • GiveMemes@jlai.lu
          7 months ago

          I really just don't get why somebody would get emotional over an argument like this but to each their own I suppose. The reason for the emotionality of my reply is rather simply stated: I still don't believe you had any intent to spare anybody 'emotional distress' and were trying to remain aloof and, honestly, rather cunty, by bringing up something literally everybody even mildly interested in AI knows all about as if it's the end all be all of understanding the potential of thinking arising from a machine. On top of that, you purposefully haven't engaged with any of the points directly refuting the things you've said. Honestly, some of the emotionality comes from when I remember being like you, thinking I knew everything, and whenever somebody would hold me to my words I'd do something along the lines of what you're doing (engaging in argumentative discussion dishonestly in order to maintain the appearance of 'winning' when I really should have been learning more and changing my mind instead of bringing up the same tired pop-culture "smart people" bs.)

          Anyway,

          My point wasn't about obscenity. It's about the nebulousness of something like reason, and the Turing test isn't scientific in the first place, so I'm really not sure where you got all this 'science vs law' bs from.

          The point wasn't that reason is like obscenity, but that I can clearly see, from the way that we train LLMs, that they aren't reasoning in any form, rather using values that have been derived over time from the training data fed in and the 'reward' system used to get the right answers over time. An LLM is no more than a complicated calculator, controlled in many ways by the humans that train it, just as with any form of machine learning. In other words, I "know it when I see it."

          I've read some studies on 'game states' which is the closest that AI scientists have come to anything resembling reason, but even in a model that played the relatively simple game of Othello, the metric they were testing the AI (which was trained on data of legal Othello boardstates) against to 'prove' that it was 'thinking' (creating game states) was that it was doing better at choosing legal moves than random chance. Another reason it might have been doing better than random chance? Oh yeah… the training data full of legal boardstates. And when the AI was trained on less data? Oh? Would you look at that? The margin by which it beats random chance falls drastically. Almost like the LLM has no fucking clue what's going on and it's just matching boardstates… indexing. It doesn't understand the rules of Othello; it's just matching piece placement locations with the legal boardstates it was trained on. A human trained on even a few hundred (vs thousands) of such boardstates could likely start to reason out the rules of the game quite easily.

          I'm not even against AI or anything, but to call the machine learning that we have now anything close to true, thinking AI is just foolish talk.

          • General_Effort@lemmy.world
            7 months ago

            'science vs law'

            There is no versus. These are examples of how we know things. There are other ways of knowing. I chose these because they were already brought up. You brought up obscenity as a matter of law, and I alluded to Turing.


            The "Turing Test" comes from a scientific mindset. Methodology has evolved since then, and Turing was a mathematician; so perhaps not the best at designing experiments. It has features we would expect today: It is controlled and it is blinded. Today, we'd also want a sample size big enough to apply statistics.

            We could apply this thinking to "obscenity". For example, we take a bunch of images and have people rate them as obscene or not. This could be a way for sociologists to learn something about community standards. We could also correlate the results to the subjects' cultural background, age, education and so on. One could also measure, e.g., physiological arousal.

            However, knowing statistically what community members consider obscene is not the same as knowing what is legally obscene (or religiously). If we define obscenity as something that is considered obscene by a certain percentage of a community, then such an experiment would give the answer to what is obscene.

            Turing was interested in the question of whether machines can think. We can approach this experimentally. We let a machine perform a task that is agreed to require thinking. Humans perform the same task as a control. Then we look for differences. This is basically how a typical medical trial works.

            Scientifically, the only value of such an experiment would be sociological. It could probe how people construe "thinking". Learning the results of such an experiment may change how people construe thinking, which is just how it goes in social science.

            In practice, we get methodological problems. We get effects from unblinding, for example. People might form an opinion on which is the machine and which is the human, and then be guided by bias. When that happens, we can no longer draw conclusions about "thinking". In practice, the test always becomes a test of whether the machine can successfully pass as a human and not whether it can think. Ideally, we want to isolate a single variable. The only factor that should make a difference is whether thinking took place.

            Philosophically, one can also see problems. The implied assumption is that "thinking" is a function. If a laptop is playing music, we could not be confident that it was streaming. It might be playing a file, have a radio receiver, … Some people might say that "thinking" requires some component unknown to science, like a soul. If a soulless entity (such as a machine or animal) were to perform the same task, they would just be computing or reacting to stimuli.


            So, you've brought up a number of things. Saying that an LLM is just a complicated calculator might be saying that some (non-physical?) component is missing.

            What the paragraph on Othello is saying is not quite clear to me. Training leading to better performance is consistent with reason?

            I think some issues need to be examined a bit more closely. You are interested in whether machines can reason, right? Is that a question that can be answered empirically, i.e., through data, facts, observations and experiment? There must be some observable difference between an entity or being using reason and one that does not.

            Perhaps citizenship is a better analogy than obscenity. Citizenship is not a matter of science, yet a legal system can clearly establish the answer. It might be sufficient to inspect documentation. Establishing ethnicity is more difficult. In many cultures, ethnicity and citizenship are connected, but there often is no authoritative way to establish someone's ethnicity. There even may be no consensus on which observable features are necessary or sufficient.

            Basically, what are we looking for?

            • GiveMemes@jlai.lu
              7 months ago

              Isn't this basically just what my comment about the edge of the knowable was and you snarkily replied with the Turing Test?

              Like go watch one of the videos I linked if you haven't. I think they'd be really interesting to you, especially the first one.

              I agree with you tho. What are we looking for is the question to ask. By that same notion, I can say with certainty for myself that what we have doesn't reason, but I can't elaborate on what it might take to make up something that does. Just as with obscenity in that famous SC case.

              To elaborate on the Othello point:

              They tested the LLM with a probe and changed a board piece. They then checked whether the AI's predicted probability distribution changed in response, to 'prove' that it was creating world representations of the board. The problem is, and this is what makes it kinda fallacious thinking by the study authors, that if you change the input data of course the output data is going to change. That's just a result of training the AI on different legal boardstates, as the moves that are made have a direct effect on the placement of the pieces.

              Furthermore, they showed that it outperformed random chance at predicting legal moves, but that's just the way that training AI works. An LLM is better at predicting the next word than random chance as a result of its training.

              If you don't really get what I'm talking about here I recommend this video: https://m.youtube.com/watch?v=wjZofJX0v4M&vl=en

              • General_Effort@lemmy.world
                7 months ago

                Isn't this basically just what my comment about the edge of the knowable was

                No. Not even close.

                We know what obscenity is. A court will tell you if something is obscene. End of story.

                The problem with the SC quote is that it is at odds with the rule of law. The meaning of the law must be known. It can't be whatever some judge feels like. US courts use so-called tests to determine - with as much objectivity as possible - if something is meant by a statute or not. Currently, the "Miller test" is used for obscenity.

                No true Scotsman is not an illustration of the edge of the knowable but of irrationality.

                To elaborate on the Othello point:

                I don't see what point you are trying to make. A bit of googling leads to this: https://thegradient.pub/othello/

                Is that what you are writing about? You are trying to show that the conclusions are unwarranted? What do you think that would imply?

                • GiveMemes@jlai.lu
                  7 months ago

                  The legal system has nothing to do with understanding and everything to do with arbitrarily assigned human bullshit (just like the Turing test). While law tends to be rational, it's notoriously shit as a way of understanding the universe. (Live in a fascist country? Well, the law's the law.) I really regret trying to use that quote as an example because you've latched onto it like a bulldog and simply can't let go.

                  Science is the only way by which we can advance our understanding of the universe. There are cases of unknowable questions in which people use philosophy or religion to try and fill the gap, but they still never actually know, just think.

                  That wasn't the exact study I was referencing, but it is actually better at explaining some of the related concepts both in analogy and in their discussion (a discussion in which they admit that what they think their findings indicate and what their findings actually indicate could be two different things).

                  But to conclude that the multidimensional set of vectors is somehow mapping the board out, just because the output data changes when you change part of the input data (even counterfactual input data in which the computer hasn't seen that move before, as it's illegal), is another huge leap. Of course the data changes, as the patterns change, and the GPT has internalized the patterns in its training data, just as it internalizes syntax and rules of language.

                  I don't think that it really has any meaningful impact if they were incorrect, but if they are correct it could mean that AI is somehow creating a representation of data within itself, which really also wouldn't surprise me.

                  I guess I was more arguing against the guy trying to quote the study at me in the first place than the study itself, though I do have my issues with their analogy bc it's simply clownish to compare a crow to a mathematical construct purposely created to internalize the rules and syntax of language.

                  Also that journal has a high schooler on its editorial board and has no name for itself… not exactly The Journal of Machine Learning Research lol

                  • General_Effort@lemmy.world
                    7 months ago

                    I really regret trying to use that quote as an example

                    As an example of the unknowable? Perhaps you could elaborate what you feel to be unknowable.

                    Science is the only way by which we can advance our understanding of the universe.

                    I'm actually surprised to read that from you. It doesn't really square with your fairly dismissive attitude toward empiricism.


                    Apparently, you are sure that GPTs can't reason. However, you don't know what reason means. So, IDK how you could possibly know whether anything or anyone is capable of reason.