So I signed up for a free month of their crap because I wanted to test whether it can solve novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across a river using a boat that can carry you and up to two other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only two things that seem new are that the old variants are no longer novel, and that it is no longer limited to producing incorrect solutions: now it can also incorrectly claim that the solution is impossible.
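
For the record, the variant is easy: three crossings (duck and carrot over, duck back, duck and potato over). Here is a minimal brute-force sketch that confirms it; the state encoding and names are mine, not anything from those transcripts, just a BFS over who is on which bank that rejects any state where the duck is on a bank without you.

    from collections import deque
    from itertools import combinations

    ITEMS = ("duck", "carrot", "potato")

    def valid(state):
        # state = (you, duck, carrot, potato), each 0 = near bank, 1 = far bank
        # the duck runs away if it is on a bank you are not on
        return state[1] == state[0]

    def solve():
        start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, plan = frontier.popleft()
            if state == goal:
                return plan
            you = state[0]
            # indices of items currently on your bank
            here = [i for i, pos in enumerate(state[1:]) if pos == you]
            for n in range(3):  # the boat carries 0, 1, or 2 items per crossing
                for cargo in combinations(here, n):
                    nxt = list(state)
                    nxt[0] = 1 - you
                    for i in cargo:
                        nxt[i + 1] = 1 - you
                    nxt = tuple(nxt)
                    if valid(nxt) and nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, plan + [cargo]))

    for step, cargo in enumerate(solve(), 1):
        print("crossing", step, ":", ", ".join(ITEMS[i] for i in cargo) or "nothing")

Running it prints the three-crossing plan, so "impossible" is simply wrong.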

I think chain-of-thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem (either online, or perhaps in a problem-solution pair generated to augment the training data).

But it outputs quasi-reasoning to pretend that it is actually solving the problem live.