Is this some meme?
Non-thinking prediction models can't count the r's in strawberry due to the nature of tokenization.
However, OpenAI o1 and DeepSeek R1 can both reliably do it correctly.
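A toy sketch of the tokenization point above. The subword split and the token IDs below are invented for illustration (real BPE tokenizers produce their own pieces), but they show why plain code can count letters while a token-based model has no direct character-level view:

```python
# Toy illustration: why counting letters is hard for a token-based model.
# The split below is made up; real BPE tokenizers produce their own
# subword pieces, which also rarely align with single characters.

word = "strawberry"

# Plain code sees characters directly and gets the right answer:
assert word.count("r") == 3

# A language model never sees individual characters. It sees opaque
# token IDs for subword chunks, something like:
illustrative_tokens = ["str", "aw", "berry"]
assert "".join(illustrative_tokens) == word

# From the model's point of view the input is just ID numbers
# (these IDs are invented), so "how many r's?" has no direct
# character-level signal to count over:
fake_vocab = {"str": 496, "aw": 672, "berry": 15717}
token_ids = [fake_vocab[t] for t in illustrative_tokens]
print(token_ids)
```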
No. It literally cannot count the number of R letters in strawberry. It says 2, there are 3. ChatGPT had this problem, but it seems it is fixed. However if you say “are you sure?” It says 2 again.
Ask ChatGPT to make an image of a cat without a tail. Impossible. Odd, I know, but one of those weird AI issues
I mean, I tested it out, even though I am sure you're trolling me, and DeepSeek correctly counts the R's.
Not trolling you at all:
https://lemmy.world/comment/14735060
Because there aren’t enough pictures of tail-less cats out there to train on.
It’s literally impossible for it to give you a cat with no tail because it can’t find enough to copy and ends up regurgitating cats with tails.
Same for a glass of water spilling over: it can't show you an overfilled glass of water because there aren't enough pictures available for it to copy.
This is why telling a chatbot to generate a picture for you will never be a real replacement for an artist who can draw what you ask them to.
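The "copying" framing gets corrected later in the thread, but the data-imbalance point underneath it holds up. A minimal sketch, with made-up label counts, of how a model trained on skewed data defaults to the majority case:

```python
from collections import Counter

# Hypothetical label counts mimicking a scraped image dataset:
# images captioned "cat" overwhelmingly show cats WITH tails.
training_labels = ["cat with tail"] * 950 + ["cat without tail"] * 50

def predict(labels):
    """A trivially simple 'model': predict the most common label seen."""
    return Counter(labels).most_common(1)[0][0]

print(predict(training_labels))  # -> 'cat with tail'
```

Real generative models are vastly more sophisticated than a majority vote, but the rare case being drowned out by the common one is the same failure mode.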
Not really it’s supposed to understand what a tail is, what a cat is, and which part of the cat is the tail. That’s how the “brain” behind AI works
It searches the internet for cats without tails and then generates an image from a summary of what it finds, which contains more cats with tails than without.
That’s how this Machine Learning progam works
It doesn’t search the internet for cats, it is pre-trained on a large set of labelled images and learns how to predict images from labels. The fact that there are lots of cats (most of which have tails) and not many examples of things “with no tail” is pretty much why it doesn’t work, though.
Unrelated to the convo, but for those who'd like a visual on how LLMs work: https://bbycroft.net/llm
And where did it happen to find all those pictures of cats?
It’s not the “where” specifically I’m correcting, it’s the “when.” The model is trained, then the query is run against the trained model. The query doesn’t involve any kind of internet search.
And I care about “how” it works and “what” data it uses because I don’t have to walk on eggshells to preserve the sanctity of an autocomplete software
You need to curb your pathetic ego and really think hard about how feeding the open internet to an ML program with a LLM slapped onto it is actually any more useful than the sum of its parts.
That isn’t at all how something like a diffusion based model works actually.
So what training data does it use?
They found data to train it that isn’t just the open internet?
Regardless of training data, it isn't matching to anything it's found and squigglying shit up or whatever was implied. Diffusion models are trained to iteratively convert noise into an image, conditioned on the text prompt and the current iteration's features. This is why they take multiple passes, and why the generated image seems to transform over the steps from an undifferentiated soup of shape and color into something recognizable.

My point was that they aren't doing some search across the web, either externally or via internal storage of scraped training data, to "match" your prompt to something. They are iterating from a starting point of static noise, through multiple passes, to a "finished" image, where each pass's transformation of the image is a complex, dynamic probabilistic function built from the training data, but not directly mapping to it in any way we'd consider a match.
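A toy 1-D analogue of that iterative refinement. This is not a real diffusion model: a real one learns a neural network to predict the noise at each step, whereas this sketch cheats by handing the loop the clean target directly. It only shows the "noise becomes an image over many small passes" shape of the process:

```python
import random

random.seed(0)

# Stand-in for a clean "image": four pixel values.
target = [0.2, 0.9, 0.4, 0.7]

# Start from pure noise, as a diffusion sampler does.
x = [random.gauss(0, 1) for _ in target]

def distance(a, b):
    """Squared error between two 'images'."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

start_error = distance(x, target)

# Many small refinement passes; each nudges x a little toward the
# clean image (a real model's network supplies this direction instead).
for step in range(50):
    x = [xi + 0.1 * (ti - xi) for xi, ti in zip(x, target)]

assert distance(x, target) < start_error  # the passes converged on the image
print([round(v, 2) for v in x])
```

The key property the toy shares with the real thing: no step looks anything up; each pass is just a function applied to the current noisy state.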
Oh ok so training data doesn’t matter?
It can generate any requested image without ever being trained?
Or does data not matter when it makes your argument invalid?
Tell me how moving the bar proves that AI is more intelligent than the sum of its parts?
Oh, that’s another good test. It definitely failed.
There are lots of Manx photos though.
Manx images: https://duckduckgo.com/?q=manx&iax=images&ia=images
so… with all the supposed reasoning stuff they can do, and supposed “extrapolation of knowledge” they cannot figure out that a tail is part of a cat, and which part it is.
The “reasoning” models and the image generation models are not the same technology and shouldn’t be compared against the same baseline.
The "reasoning" you are seeing is it finding human conversations online and summarizing them.
I’m not seeing any reasoning, that was the point of my comment. That’s why I said “supposed”