original drawn by fanketchup

Imagine you show a picture to the AI. Instead of “seeing” it the way you do, the AI turns the picture into a long list of numbers that captures its patterns, colors, shapes, and other features.
This list is called an embedding. It’s like a “fingerprint” of the image.
The LLM itself doesn’t look at the raw pixels; it only reads that fingerprint and works from there.
So if the fingerprint didn’t record something clearly (like exactly how many fingers there are), the LLM has no way to recover it and can only guess.
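
To make the fingerprint idea concrete, here is a rough toy sketch in Python. It is not how a real vision model works: real encoders are learned neural networks, and `toy_embed`, `draw_fingers`, and `count_fingers` are made-up names for this illustration only. The "embedding" here is just the average brightness of each 4x4 tile, which is enough to show two pictures with different finger counts ending up with the same fingerprint.

```python
# Toy illustration only: a hand-written "embedding" that averages pixels.
# Real image encoders are learned networks; this just shows information loss.

def toy_embed(image, block=4):
    """Summarize an image (rows of 0/1 pixels) as one average per block x block tile."""
    height, width = len(image), len(image[0])
    embedding = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [image[r][c]
                    for r in range(top, min(top + block, height))
                    for c in range(left, min(left + block, width))]
            embedding.append(sum(tile) / len(tile))
    return embedding

def draw_fingers(columns, height=4, width=8):
    """Draw vertical bars ("fingers") at the given column indexes."""
    return [[1 if c in columns else 0 for c in range(width)] for _ in range(height)]

def count_fingers(image):
    """Count separate bars by counting runs of 1s in the top row of raw pixels."""
    row = image[0]
    return sum(1 for i, px in enumerate(row) if px == 1 and (i == 0 or row[i - 1] == 0))

two_thick = draw_fingers({0, 1, 4, 5})  # two fingers, each 2 pixels wide
four_thin = draw_fingers({0, 2, 4, 6})  # four fingers, each 1 pixel wide

# The raw pixels tell the two pictures apart...
print(count_fingers(two_thick), count_fingers(four_thin))
# ...but the fingerprints are identical, so anything reading only them cannot.
print(toy_embed(two_thick) == toy_embed(four_thin))
```

Anything that only ever sees the output of `toy_embed` gets the same numbers for both pictures, so the best it can do with the finger count is guess. Real embeddings keep far more detail than this, but the same kind of loss can still happen for fine details.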

Artist's commentary

LLM, please listen to the question!

How many fingers did the user put in?