When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype

e8d79@discuss.tchncs.de · 2 days ago

When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype

Oniononon@sopuli.xyz · 5 hours ago

Ai may not be able to do the puzzles but they’re capable of manipulating humans into doing them for them.

SilverShark@lemmy.world · 8 hours ago

When will this end… when will the big news be about things like this, like an AI not being able to beat a chess game from the 70s…

The biggest thing that these trends and over hyped things like AI and Blockchain has shown me is that really the vast majority of people talking about themes really have no idea about anything said themes when it comes down to it.

It would be great, but this really affects people who actually do the work and have to put up with this.

Clent@lemmy.dbzer0.com · 11 hours ago

The Tower of Hanoi is a classic game with three pegs and multiple discs, in which you need to move all the discs on the left peg to the right peg, never stacking a larger disc on top of a smaller one. With practice, though, a bright (and patient) seven-year-old can do it.

What Apple found was that leading generative models could barely do seven discs, getting less than 80% accuracy, and pretty much can’t get scenarios with eight discs correct at all.

It’s funny because I created a scheme program to do this as a college course assignemnt many years ago.

It was the first of many increasingly complex assignments in an AI course. This was first because it has very basic logic requirements.

Alpha71@lemmy.world · 9 hours ago

My question is: when AI finally crashes and burns, what happens to Nvidia?

Communist@lemmy.frozeninferno.xyz · 7 hours ago

Not much, llm’s and other gen ai crashing and burning change nothing about the fact that ML in general is doing crazy shit. Protein folding getting solved in our lifetimes is insane and there are still way more legit applications, nvidia will be fine.

Spacehooks@reddthat.com · 8 hours ago

Id imagine nothing but we all get cheaper GPUs on a overstock sale for a bit.

yesman@lemmy.world · 14 hours ago

LLMs are just the average of internet-accessible writings and videos plus whatever custom training data provided.

That causes two problems. One, the proliferation of AI content is reducing the average quality of internet-accessible data. It’s also actively smoothing out the creative works of humans; lowering the average again.

4am@lemm.ee · 2 days ago

Why did anyone think that a LLM would be able to solve logic or math problems?

They’re literally autocomplete. Like, 100% autocomplete that is based on an enormous statistical model. They don’t think, they don’t reason, they don’t compute. They lay words out in the most likely order.

To be fair it’s pretty amazing they can do that from a user prompt - but it’s not doing whatever it is that our brains do. It’s not a brain. It’s not “intelligent”. LLMs are machine learning algorithms but they are not AI.

It’s a fucking hornswoggle, always has been 🔫🧑‍🚀

fodor@lemmy.zip · 9 hours ago

Many people in the world, they don’t know the difference between an expert system and an LLM. Or, to phrase it a different way, many people think that AI is equivalent to generative AI.

I think that’s largely a result of marketing bullshit and terrible reporting. Of course it would be good if people could educate themselves, but to some degree we expect that the newspaper won’t totally fail us, and then when it does, people just don’t realize they got played.

On a personal note, I’m a teacher, and some of my colleagues are furious that our students are using grammar checkers because they think grammar checkers are AI, and they think grammar checkers were invented in the last 3 years. It’s really wild because some of these colleagues are otherwise smart people who I’m certain have personal experience with Microsoft Word 20 years ago, but they’ve blocked it out of their mind, because somehow they’re afraid that all AI is evil.

WanderingThoughts@europe.pub · edit-2 1 day ago

They got very good results with just making the model bigger and train it on more data. It started doing stuff that was not programmed in the thing at all, like writing songs and having conversations, the sort of thing nobody expected an autocomplete to do. The reasoning was that if they keep making it bigger and feed it even more days, that the line would keep going up. The the fanboys believed it, investors believed it and many business leaders believed it. Until they ran out of data and datacenters.

lime!@feddit.nu · 16 hours ago

it’s such a weird stretch, honestly. songs and conversations are not different to predictive text, it’s just more of it. expecting it to do logic after ingesting more text is like expecting a chicken to lay kinder eggs just because you feed it more.

WanderingThoughts@europe.pub · 5 hours ago

It helped that this advanced autocorrect could get high scores on many exams at university level. That might also mean the exams don’t test logic and reasoning as well as the teachers think they do.

Kogasa@programming.dev · edit-2 10 hours ago

Not necessarily do logic, but mimic it, like it can mimic coherent writing and basic conversation despite only being a statistical token muncher. The hope is that there’s sufficient information in the syntax to model the semantics, in which case a sufficiently complex and well-trained model of the syntax is also an effective model of the semantics. This apparently holds up well for general language tasks, meaning “what we mean” is well-modeled by “how we say it.” It’s plausible, at face value, that rigorous argumentation is also a good candidate, which would give language models some way of mimicking logic by talking through a problem. It’s just not very good in practice right now. Maybe a better language model could do better, maybe not for a reasonable cost.

Daniel Quinn@lemmy.ca · 2 days ago

Because that’s how they’re marketed and hyped. “The next version of ChatGPT will be smarter than a Nobel laureate” etc. This article is an indictment of the claims these companies make.

ChapulinColorado@lemmy.world · 2 days ago

So fraud. It would be nice to get another FTX verdict at the very least. It could make those shit CEOs thinking twice before lying to peoples faces if it means years in prison.

Optional@lemmy.world · 2 days ago

In this administration? heh.

ignirtoq@fedia.io · 2 days ago

My running theory is that human evolution developed a heuristic in our brains that associates language sophistication with general intelligence, and especially with humanity. The very fact that LLMs are so good at composing sophisticated sentences triggers this heuristic and makes people anthropomorphize them far more than other kinds of AI, so they ascribe more capability to them than evidence justifies.

I actually think this may explain some earlier reporting of some weird behavior of AI researchers as well. I seem to recall reports of Google researchers believing they had created sentient AI (a quick search produced this article). The researcher was fooled by his own AI not because he drank the Koolaid, but because he fell prey to this neural heuristic that’s in all of us.

joel_feila@lemmy.world · 10 hours ago

Yeah it has a name. The more you talk the more people Believe you are smart. It partly based on the tendency to believe what we hear first and then we check if it is.

Optional@lemmy.world · 2 days ago

I think you’re right about that.

It didn’t help that The Average Person has just shy of absolutely zero understanding of how computers work despite using them mostly all day every day.

Put the two together and it’s a grifter’s dream.

Aceticon@lemmy.dbzer0.com · edit-2 1 day ago

IMHO, if one’s approach to the world is just - take it as it is and go with it - then probabilistic parrots creating the perceived elements of reality will work on that person because that’s what they use to decide what to do next, but if one has an analytical approach to the world - wanting to figure out what’s behind the façade to understand it and predict what might happen - then one will spot that the “logic” behind the façades created by the probabilistic parrots is segmented into little pieces of logic which are do not matched to the other little pieces of logic and do not add up to a greater building of logic (phrases are logic because all phrases have an inherent logic in how they are put together which is general, but the choice of which phrases get used in a higher logic which is far more varied than the logic inherent in phrases, so LLMs lose consistency at that level because the training material goes in a lot more directions at that level than it goes at the level of how phrases are put together).

null_dot@lemmy.dbzer0.com · 1 day ago

I don’t think the mechanisms of evolution are necessarily involved.

We’re just not used to interacting with this type of pseudo intelligence.

ignirtoq@fedia.io · 1 day ago

My point is that this kind of pseudo intelligence has never existed on Earth before, so evolution has had free reign to use language sophistication as a proxy for humanity and intelligence without encountering anything that would put selective pressure against this heuristic.

Human language is old. Way older than the written word. Our brains have evolved specialized regions for language processing, so evolution has clearly had time to operate while language has existed.

And LLMs are not the first sophisticated AI that’s been around. We’ve had AI for decades, and really good AI for a while. But people don’t anthropomorphize other kinds of AI nearly as much as LLMs. Sure, they ascribe some human like intelligence to any sophisticated technology, and some people in history have claimed some technology or another is alive/sentient. But with LLMs we’re seeing a larger portion of the population believing that that we haven’t seen in human behavior before.

null_dot@lemmy.dbzer0.com · 17 hours ago

so evolution has had free reign to use language sophistication as a proxy for humanity and intelligence

my point is, evolution doesn’t need to be involved in this paradigm. it could just be something children learn - this thing talks and is therefore more interactive than this other thing that doesn’t talk.

Additionally, at the time in pre-history when assessing the intelligence of something could determine your life or death and thereby ability to reproduce, language may not have been a great indicator of intelligence. For example, if you encounter a band of whatever hominid encroaching on your territory, there may not be a lot of talking. You would know they were intelligent because they might have clothing or tools, but it’s likely nothing would be said before the spears started to be thrown.

some_guy@lemmy.sdf.org · 2 days ago

If you’re not yet familiar with Ed Zitron, I think you’d enjoy either his newsletter or his podcast (or both).

Goodmorningsunshine@lemmy.world · 2 days ago

And the obscene levels of water waste when we were already facing a future of scarcity. Can we please stop destroying economies, ecologies, and lives for this now?

6nk06@sh.itjust.works · 2 days ago

But how will you be able to auto-complete this “2 sentences long email” to your team at work without killing humanity?

tuhriel@infosec.pub · 1 day ago

So…in the future it is “this e-mail could have been a meeting”

jsomae@lemmy.ml · 1 day ago

Admittedly, I also can’t solve Towers of Hanoi.

YesButActuallyMaybe@lemmy.ca · 1 day ago

First you take the Shrek and the cabbage, you leave Shrek and Return for the Wolf. You leave the cabbage and take the Wolf. Now you take Shrek to the cabbage and bring both to the wolf. Now you row into the sunset, your work here is done.

some_guy@lemmy.sdf.org · 2 days ago

The tide is turning. I can’t wait to see it all come crashing down.

Ledericas@lemm.ee · 1 day ago

it will once all the VC funds dries up, and the companies desparetely staving off that debt by enshittifying that service.

Kairos@lemmy.today · 2 days ago

The time for that was two years ago actually

Tartas1995@discuss.tchncs.de · edit-2 2 days ago

Hey for anyone who doesn’t know how to solve tower of Hanoi, there is a simple algorithm.

1.  |.  |
2.  |.  |
3.  |.  |

Let’s say, we want to move the center rod.

Count the stack of disks that you need to move: e.g. 3

If it is even, start with placing the first disk on the spot that you don’t want to move the tower to. If it is odd, start with placing the first disk on the spot that you want to move the tower to.


|.  |.  |
2.  |.  |
3.  1.  |

|.  |.  |
|.  |.  |
3.  1.  2


|.  |.   |
|.  |.   1
3.  |.   2


|.   |.  |
|.   |.  1
|.   3.  2

Now the 2 stack is basically a new Hanoi tower.

That tower is even and we start with placing the first disk on the spot that we don’t want to land on


|.  |.  |
|.  |.  |
1.  3.  2


|.  |.  |
|.  2.  |
1.  3.  |

|.  1.  |
|.  2.  |
|.  3.  |

And we solved the tower. It is that easy

ZDL@lazysoci.al · 2 days ago

Now do it for a stack of 12.

Tartas1995@discuss.tchncs.de · 1 day ago

It works the same way.

1  |  |
2  |  |
3  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  |
12 |  |

|  |  |
2  |  |
3  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  |
12 |  1

|  |  |
|  |  |
3  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  |
12 2  1

|  |  |
|  |  |
3  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 1  |
12 2  |

|  |  |
|  |  |
|  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 1  |
12 2 3

|  |  |
|  |  |
1  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  |
12 2  3

|  |  |
|  |  |
1  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  2
12 |  3

|  |  |
|  |  |
|  |  |
4  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  1
11 |  2
12 |  3

|  |  |
|  |  |
|  |  |
|  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  1
11 |  2
12 4  3

|  |  |
|  |  |
|  |  |
|  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 1  2
12 4  3


|  |  |
|  |  |
|  |  |
2  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 1  |
12 4  3

|  |  |
|  |  |
1  |  |
2  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 |  |
12 4  3

|  |  |
|  |  |
1  |  |
2  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 |  |
11 3  |
12 4  |

|  |  |
|  |  |
|  |  |
|  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  |  |
10 2  |
11 3  |
12 4  1

|  |  |
|  |  |
|  |  |
|  |  |
5  |  |
6  |  |
7  |  |
8  |  |
9  1  |
10 2  |
11 3  |
12 4  |

|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
6  |  |
7  |  |
8  |  |
9  1  |
10 2  |
11 3  |
12 4  5

And so on… As you can see, when there was a 3 stack in the right pole, and moved the 4, for the solution to create space for the 5, we move the 3 stack as if it is just a 3 tower and we will end up with a 4 stack tower, allowing us to move the 5 and now we need to move the 4 stack on the 5. As the 4 stack is even, we would start by moving the 1 to the left stack, placing the 2 on the 5 and then the 1 on the 2, creating a 2 stack, now we can move the 3 on the left pole, now we solve the 2 stack, as it is even, we move the 1 to the 4 and the 2 on the 3 and then the 1 on the 2, creating a 3 stack and allowing us to move the 4 onto the 5. Now we solve the 3 stack onto the 4. It is odd, so we solve as we solve the previous 3 tower.

A bigger tower is just solving a 1 smaller tower basically twice.

So for solving 12, you solve 11 and move the 12 to the spot, to solve 11 again. To solve 11, you solve 10 and move the 11 and solve 10 again…

ZDL@lazysoci.al · 1 day ago

I was mostly hoping to have you print out all 1000 or so moves as a prank. :D

Kogasa@programming.dev · edit-2 8 hours ago

1000 or so moves

Ok, this nerd-snipes me a bit. The real point of the Towers of Hanoi problem, at least if you’re doing it in a discrete math class, is to count exactly the number of moves for n discs. Call this number f(n).

To make this post easier to read, we will label the discs from 1 to n, with 1 being the bottom disc and n being the top. We will also label the pegs A, B, and C, with A being the starting peg and C being the target peg. This lets us describe moves in a compact notation: 3: A -> C means disc 3 moves from A to C.

For n=0, we vacuously need 0 moves to win.

For n=1, we need just 1 move: 1:A->C.

For n=2, we can do it in 3 moves, so f(2) = 3. The sequence is:

2: A->B # unblock the next move
1: A->C # move 1 to its final location
2: B->C # move 2 to its final location

Now suppose we know f(n) for n between 1 and N, with N >= 2. For n=N+1, we can move N+1: A -> C and attempt to use our strategy for moving the remaining N discs from A to C, shuffling N+1 around as needed. Since it’s the smallest disc, we can always move it to any pillar, and any time we want to move something to the pillar it’s on, it must be moved first. Let’s see how that plays out for N=2:

3: A->C # move the N+1 disc
2: A->B # start the N-disc strategy
3: C->B # unblock the next step
1: A->C # next step of N-disc strategy
3: B->A # unblock the next step
2: B->C # last step of N-disc strategy
3: A->C # finish the N+1-disc problem

We hazard a guess that every step of the N-disc strategy is preceded by a move of the N+1 disc, plus one final move. In other words, f(N+1) = 2f(N) + 1. Some careful analysis will justify this guess.

Now we have a recurrence relation for f(n), but how to solve it? A classical technique is “guess the formula magically, then prove it by induction.” It’s certainly doable here if you compute a few values by hand:

f(2) = 3

f(3) = 2(3) + 1 = 7

f(4) = 2(7) + 1 = 15

f(5) = 2(15) + 1 = 31

f(6) = 2(31) + 1 = 63

You’ll probably see it right away: f(n) = 2^n - 1. Indeed, we can prove this by induction: the base case n=2 holds as f(2) = 3 = 2^(2) - 1, and if f(n) = 2^n - 1 then f(n+1) = 2f(n) + 1 = 2(2^n - 1) + 1 = (2^(n+1) - 2) + 1 = 2^(n+1) - 1 as desired.

We may conclude f(12) = 2^(12) - 1 = 4095. In other words, with 12 discs, exactly 4095 moves are required.

–

Bonus: an alternative to the “guess and prove” technique that is generalizable to a broad class of recurrence relations. The technique is called “generating functions.” Given the sequence f(n), we consider the formal power series

F(x) = f(0) + f(1)x + f(2)x^(2) + … + f(n)x^(n) + …

This F(x) is the “ordinary generating function” for the sequence f(n). Strictly speaking, it may not be a well-defined function of x since we haven’t made any assumptions about convergence, but we will assume (for now, and prove later) that the set of such formal power series behaves algebraically much like we expect. Namely, given the recurrence relation above, we can write:

F(x) - f(0) - f(1)x = f(2)x^(2) + f(3)x^(3) + f(4)x^(4) + … + f(n)x^n + …

= (2f(1) + 1)x^(2) + (2f(2) + 1)x^(3) + … + (2f(n-1) + 1)x^(n) + …

= 2x(f(1)x + f(2)x^(2) + … + f(n)x^(n)) + (x^(2) + x^(3) + … + x^(n) + …)

= 2x(F(x) - f(0)) + x^(2)/(1-x)

In our case, we have f(0) = 0, f(1) = 1, f(2) = 3 so we can write more succinctly:

F(x) - x = 2xF(x) + x^(2)/(1-x)

Solving for F,

F(x)(1 - 2x) = x + x^(2)/(1-x)

= x(1 + x/(1-x))

F(x) = x(1 + x/(1-x))/(1 - 2x)

= x/(2x^(2) - 3x + 1)

Ok, great. We’ve found that our generating function, convergence notwithstanding, is that rational function. We can use partial fraction decomposition to write it as

F(x) = 1/(1 - 2x) - 1/(1-x)

which has the advantage of telling us exactly how to compute the coefficients of the Taylor series for F(x). Namely,

1/(1-x) = 1 + x + x^(2) + … + x^(n) + …

1/(1 - 2x) = 1 + 2x + 4x^(2) + … + 2^(n) x^(n) + …

So F(x) = (1-1) + (2-1)x + (4-1)x^(2) + … + (2^(n)-1)x(n) + …

The nth coefficient of the Taylor series for F about 0 is 2^(n)-1, and by the definition of F as the ordinary generating function for f(n), we have f(n) = 2^(n) - 1. (The rigorous justification for ignoring convergence here still needs to be done; for now, this can be seen as a useful magic trick.)

LordOfLocksley@lemmy.world · 2 days ago

Anyone got a version of the article that doesn’t require me paying them so they won’t track me across the Internet?

teft@lemmy.world · 2 days ago

A research paper by Apple has taken the tech world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably. Some are shocked by it, some are not. The well-known venture capitalist Josh Wolfe went so far as to post on X that “Apple [had] just GaryMarcus’d LLM reasoning ability” – coining a new verb (and a compliment to me), referring to “the act of critically exposing or debunking the overhyped capabilities of artificial intelligence … by highlighting their limitations in reasoning, understanding, or general intelligence”.

Apple did this by showing that leading models such as ChatGPT, Claude and Deepseek may “look smart – but when complexity rises, they collapse”. In short, these models are very good at a kind of pattern recognition, but often fail when they encounter novelty that forces them beyond the limits of their training, despite being, as the paper notes, “explicitly designed for reasoning tasks”.

As discussed later, there is a loose end that the paper doesn’t tie up, but on the whole, its force is undeniable. So much so that LLM advocates are already partly conceding the blow while hinting at, or at least hoping for, happier futures ahead.

In many ways the paper echoes and amplifies an argument that I have been making since 1998: neural networks of various kinds can generalise within a distribution of data they are exposed to, but their generalisations tend to break down beyond that distribution. A simple example of this is that I once trained an older model to solve a very basic mathematical equation using only even-numbered training data. The model was able to generalise a little bit: solve for even numbers it hadn’t seen before, but unable to do so for problems where the answer was an odd number.

More than a quarter of a century later, when a task is close to the training data, these systems work pretty well. But as they stray further away from that data, they often break down, as they did in the Apple paper’s more stringent tests. Such limits arguably remain the single most important serious weakness in LLMs.

The hope, as always, has been that “scaling” the models by making them bigger, would solve these problems. The new Apple paper resoundingly rebuts these hopes. They challenged some of the latest, greatest, most expensive models with classic puzzles, such as the Tower of Hanoi – and found that deep problems lingered. Combined with numerous hugely expensive failures in efforts to build GPT-5 level systems, this is very bad news.

The Tower of Hanoi is a classic game with three pegs and multiple discs, in which you need to move all the discs on the left peg to the right peg, never stacking a larger disc on top of a smaller one. With practice, though, a bright (and patient) seven-year-old can do it.

What Apple found was that leading generative models could barely do seven discs, getting less than 80% accuracy, and pretty much can’t get scenarios with eight discs correct at all. It is truly embarrassing that LLMs cannot reliably solve Hanoi.

And, as the paper’s co-lead-author Iman Mirzadeh told me via DM, “it’s not just about ‘solving’ the puzzle. We have an experiment where we give the solution algorithm to the model, and [the model still failed] … based on what we observe from their thoughts, their process is not logical and intelligent”.

The new paper also echoes and amplifies several arguments that Arizona State University computer scientist Subbarao Kambhampati has been making about the newly popular LRMs. He has observed that people tend to anthropomorphise these systems, to assume they use something resembling “steps a human might take when solving a challenging problem”. And he has previously shown that in fact they have the same kind of problem that Apple documents.

If you can’t use a billion-dollar AI system to solve a problem that Herb Simon (one of the actual godfathers of AI) solved with classical (but out of fashion) AI techniques in 1957, the chances that models such as Claude or o3 are going to reach artificial general intelligence (AGI) seem truly remote.

So what’s the loose thread that I warn you about? Well, humans aren’t perfect either. On a puzzle like Hanoi, ordinary humans actually have a bunch of (well-known) limits that somewhat parallel what the Apple team discovered. Many (not all) humans screw up on versions of the Tower of Hanoi with eight discs.

But look, that’s why we invented computers, and for that matter calculators: to reliably compute solutions to large, tedious problems. AGI shouldn’t be about perfectly replicating a human, it should be about combining the best of both worlds; human adaptiveness with computational brute force and reliability. We don’t want an AGI that fails to “carry the one” in basic arithmetic just because sometimes humans do.

Whenever people ask me why I actually like AI (contrary to the widespread myth that I am against it), and think that future forms of AI (though not necessarily generative AI systems such as LLMs) may ultimately be of great benefit to humanity, I point to the advances in science and technology we might make if we could combine the causal reasoning abilities of our best scientists with the sheer compute power of modern digital computers.

What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that these LLMs that have generated so much hype are no substitute for good, well-specified conventional algorithms. (They also can’t play chess as well as conventional algorithms, can’t fold proteins like special-purpose neurosymbolic hybrids, can’t run databases as well as conventional databases, etc.)

What this means for business is that you can’t simply drop o3 or Claude into some complex problem and expect them to work reliably. What it means for society is that we can never fully trust generative AI; its outputs are just too hit-or-miss.

One of the most striking findings in the new paper was that an LLM may well work in an easy test set (such as Hanoi with four discs) and seduce you into thinking it has built a proper, generalisable solution when it has not.

To be sure, LLMs will continue to have their uses, especially for coding and brainstorming and writing, with humans in the loop.

But anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves.

This essay was adapted from Gary Marcus’s newsletter, Marcus on AI

Gary Marcus is a professor emeritus at New York University, the founder of two AI companies, and the author of six books, including Taming Silicon Valley

Cheradenine@sh.itjust.works · 2 days ago

https://archive.vn/YUdhb

Are you on Safari? Firefox based browsers don’t seem to have this issue on Android, Windows, Linux (even Arch BTW)

LordOfLocksley@lemmy.world · 2 days ago

Phone browser

4am@lemm.ee · 2 days ago

Cool, that narrows it down to “all of the above”

can@sh.itjust.works · 2 days ago

So the default? But which phone?

LordOfLocksley@lemmy.world · 2 days ago

Samsung A52

can@sh.itjust.works · 2 days ago

So Samsung Internet? Have you tried installing Firefox Mobile?

LordOfLocksley@lemmy.world · 2 days ago

I have not. I’ll give it a shot

can@sh.itjust.works · 2 days ago

It’s worth it for uBlock Origin alone

Mearuu@kbin.melroy.org · 2 days ago

You can visit all the websites without tracking if you use https://mullvad.net/en/browser. There are other options such extensions for Firefox but I think Mullvad browser offers even more protection.

ZDL@lazysoci.al · 2 days ago

So what you’re saying is you don’t want to actually answer the question asked.

Check.

zr0@lemmy.dbzer0.com · 2 days ago

“When the car does not fly, it is time to rethink the car-hype”

Hawk@lemmy.dbzer0.com · 2 days ago

Cars are not marketed and expected to fly.

zr0@lemmy.dbzer0.com · 2 days ago

Oh, and Large Language Models say where exactly, that they are good at solving mathematical problems?

Tartas1995@discuss.tchncs.de · 5 hours ago

Some of the tested models were specifically “reasoning” LLM models. Please tell me that the “reasoning” model is not intended to be used for “reasoning”. Please.

pulsewidth@lemmy.world · 2 days ago

Here’s Microsoft advertising Copilot “makes solving maths problems a breeze” with an entire how-to article.

https://www.microsoft.com/en-us/microsoft-copilot/for-individuals/do-more-with-ai/learning-and-education/how-to-use-ai-for-math-calculations

zr0@lemmy.dbzer0.com · 2 days ago

That’s not the LLM solving the problems. The LLM understands the user request and has the ability to use a math solver. Additionally: “You can also integrate Copilot with the Wolfram Alpha plug-in to gain access to vast computational, mathematical, and scientific knowledge.”

pulsewidth@lemmy.world · 2 days ago

Cool, and anyone can just access Wolfram Alpha directly - as they’ve been able to for like 15 years - and make fewer mistakes than they would by using it via an LLM.

The LLM (Copilot) is being advertised as a maths tutor. It’s not good at maths. It doesnt understand anything about maths. All it can do is send your query through to Wolfram Alpha and then spit it back to you.

Hype.

When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype

When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype

When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype | Gary Marcus