This is an idea I’ve been toying with for a bit. A ton of media includes unimportant information that doesn’t need to be stored pixel-perfect. Storing large portions of the image data as text would save substantial amounts of storage, and as on-device image generation becomes commonplace, digital memories will become the main way people capture the world around them. I think this will inevitably be the next form of media capture (photography and video). It won’t replace other methods/formats, but I could see phone cameras saving images as digital memories by default to save on storage.


You misunderstand. You take a picture of, say, your dad at a family reunion, and in the background the rest of your family is just milling around. That’s not the subject, and so the AI model saves it as “people doing stuff” or whatever. When you load that photograph, the people in the background will be generated, and they won’t be your family.
This is all beside the fact that the AI may decide your subject is different from what you think it is.
This is just an extremely unreliable form of data compression, and extremely unnecessary. Phones and cameras can currently save hundreds or thousands of photographs locally, and cloud storage can save millions for free, and even more for extremely cheap. You’re solving a non-existent problem by shoehorning AI image generation in where it’s not needed.
Imagine going through a photo album, and each time the image is different. Instead of enjoying the photos, you’re looking for what the AI changed this time.
What you think of as important may change over time as well. With the solution as written, you’d need to decide what the “subject” is at compress time, but what if you later realise that’s the last ever photo of grandma? Or what if the AI decides that you were wearing different shoes than you actually were? Worst case, you need to rely on some detail in a photo later, like to absolve you of a crime.