If you want to make the most out of a world increasingly filled with AI tools, here’s a habit to develop: start taking screenshots. Lots of screenshots. Of anything and everything. Because for all the talk of voice modes, omnipresent cameras, and the multimodal future of everything, there might be no more valuable digital behavior than to press the buttons and save what you’re looking at.
Screenshots are the most universal method of capturing digital information. You can capture anything — well, almost anything, thanks a lot, Netflix! — with a few clicks, and save and share it to almost any device, app, or person. “It’s this portable data format,” says Johnny Bree, the founder of the digital storage app Fabric. “There’s nothing else that’s quite so portable that you can move between any piece of software.”
A screenshot contains a lot of information, like its source, contents, and even the time of day in the corner of the screen. Most of all, it sends a crucial and complex signal: it says, “I care about this.” We have countless new AI tools that aim to watch the world, our lives, and everything, and try to make sense of it all for us. These tools are mostly crap for lots of reasons, but chiefly because AI is pretty good at knowing what things are and rubbish at knowing whether they matter. A screenshot assigns value and tells the system it needs to pay attention.
Screenshots also put you, the user, in control in an important way. “If I give you access to all of my emails, all my WhatsApps, everything, there’s a lot of noise,” says Mattias Deserti, the head of smartphone marketing at Nothing. There’s simply no reason to save every email you receive or every webpage you visit — and that’s to say nothing of the privacy implications. “So what if, instead, you were able to start training the system yourself, feeding the system the information you want the system to know about you?” Rather than a tool like Microsoft Recall, which asks for unlimited access to everything, starting with screenshots lets you pick what you share.
Until now, screenshots have been a fairly blunt instrument. You snap one, and it gets saved to your camera roll, where it probably languishes, forgotten, until the end of time. (And don’t get me started on all the screenshots I take by accident, mostly of my lockscreen.) At best, you might be able to search for some text inside the image. But it’s more likely that you’ll just have to scroll until you find it again.
The first step in making screenshots more useful is to figure out what’s actually in them. This is, at first blush, not terribly complicated: optical character recognition technology has long done a good job of spotting text on a page. AI models take that one step further, so you can either search the title or just “movies” to find all your digital snaps of posters, Fandango results, TikTok recommendations, and more. “We use an OCR model,” says Shenaz Zack, a product manager at Google and part of the team behind the Pixel Screenshots app. “Then we use an entity-detection model, and then Gemini to understand the actual context of the screen.”
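To make that three-stage flow concrete, here is a minimal Python sketch of how OCR, entity detection, and a context step might be chained. It is not Google's implementation: every function name is hypothetical, the "OCR" stage takes a string instead of pixels, and the keyword rules only stand in for what a model like Gemini would infer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ScreenshotRecord:
    text: str                                     # stage 1 output: text read off the image
    entities: dict = field(default_factory=dict)  # stage 2 output: dates, links, and so on
    category: str = "unknown"                     # stage 3 output: what the screen is about


def run_ocr(image_text: str) -> str:
    # Stand-in for an OCR model (assumption): a real one would take pixels.
    # Here we only collapse whitespace in text that is already extracted.
    return " ".join(image_text.split())


def detect_entities(text: str) -> dict:
    # Stand-in for an entity-detection model, using two simple patterns.
    return {
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
        "urls": re.findall(r"https?://\S+", text),
    }


def classify_context(text: str, entities: dict) -> str:
    # Stand-in for the language-model step; these keyword rules are illustrative only.
    lowered = text.lower()
    if "showtimes" in lowered or "now playing" in lowered:
        return "movies"
    if entities["dates"] and "tickets" in lowered:
        return "event"
    return "unknown"


def index_screenshot(image_text: str) -> ScreenshotRecord:
    # Run the three stages in order and keep every intermediate result.
    text = run_ocr(image_text)
    entities = detect_entities(text)
    return ScreenshotRecord(text, entities, classify_context(text, entities))
```

Keeping the category alongside the raw text is what would let a search for "movies" surface a screenshot that never contains that word.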
See, there’s far more to a screenshot than just the text inside. The right AI model should be able to tell that it came from WhatsApp, just by the specific green color. It should be able to identify a website by its header logo or understand when you’re saving a Spotify song name, a Yelp handyman review, or an Amazon listing. Armed with this information, a screenshot app might begin to automatically organize all those images for you. And even that is just the beginning.
With everything I’ve described so far, all we’ve really created is a very good app for looking at your screenshots, which no one really thinks is a good idea because it would be just one more thing to check — or forget to check. Where it gets vastly more interesting is when your device or app can actually start to use the screenshots on your behalf, to help you actually remember what you captured or even use that information to get stuff done.
In Nothing’s new Essential Space app, for instance, the app can generate reminders based on stuff you save. If you take a screenshot of a concert you’d like to go to, it can remind you that it’s coming up automatically. Pixel Screenshots is pushing the idea even further: if you save a concert listing, your Pixel phone can prompt you to listen to that band the next time you open Spotify. If you screenshot an ID card or a boarding pass, it might ask you to put it in the Wallet app. The idea, Zack says, is to think of screenshots as an input system for everything else.
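One way to picture screenshots as an input system is a small dispatcher that maps what a screenshot contains to a suggested next step. The Python sketch below is an assumption-laden illustration, not how Essential Space or Pixel Screenshots work: the category labels, the `suggest_action` function, and the one-day reminder lead time are all made up for the example.

```python
from datetime import date, timedelta
from typing import Optional


def suggest_action(category: str, event_date: Optional[date] = None) -> str:
    # Hypothetical category labels; a real app would get these from its own classifier.
    if category == "concert" and event_date is not None:
        # Remind the day before; the lead time is an arbitrary choice for this sketch.
        remind_on = event_date - timedelta(days=1)
        return f"remind on {remind_on.isoformat()}"
    if category in ("id_card", "boarding_pass"):
        return "offer to add to wallet"
    # Most screenshots should trigger nothing at all.
    return "no action"
```

The default branch matters as much as the others: a system that nags about every meme you save would recreate the noise problem screenshots are supposed to solve.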
Mike Choi, an indie developer, built an app called Camp in part to help him make use of his own screenshots. He began to work on turning every screenshot into a “card,” with the salient information stored alongside the picture. “You have a screenshot, and at the bottom there’s a button, and it flips the card over,” he says. “It shows you a map, if it was a location; a preview of a song, if it’s a song. The idea was, given an infinite pool of different types of screenshots, can AI just generate the perfect UI for that category on the fly?”
If all this sounds familiar, it’s because there’s another term for what’s going on here: it’s called agentic AI. Every company in tech seems to be working on ways to use AI to accomplish things on your behalf. It’s just that, in this case, you don’t have to write long prompts or chat back and forth with an assistant. You just take a screenshot and let the system go to work. “You’re building a knowledge base, when today that knowledge base is confined to your gallery and nothing happens with it,” Deserti says. He’s excited to get to the point where you screenshot a concert date, and Essential Space automatically prompts you to buy tickets when they go on sale.
Making sense of screenshots isn’t always so straightforward, though. Some you want to keep forever, like the ID card you might need often; other things, like a concert poster or a parking pass, have extremely limited shelf lives. For that matter, how is an app supposed to distinguish between the parking pass you use every day at work and the one you used once at the airport and never need again? Some of the screenshots on my phone were sent to me on WhatsApp; others I grabbed from Instagram memes to send to friends. No one’s camera roll should ever be fully held against them, and the same goes for screenshots. Lots of these screenshot apps are looking for ways to prompt you to add a note, or organize things yourself, in order to provide some additional helpful information to the system. But it’s hard work to do that without ruining what makes screenshots so seamless and easy in the first place.
One way to begin to solve this problem, to make screenshots even more automatically useful, is to collect some additional context from your device. This is where companies like Google and Nothing have an advantage: because they make the device, they can see everything that’s happening when you take a screenshot. If you grab a screenshot from your web browser, they can also store the link you were looking at. They can also see your physical location or note the time and the weather. Sometimes this is all useful, but sometimes it’s nonsense; the more data they collect, the more these apps risk running into the same noise problem that screenshots helped solve in the first place.
But the input system works. We all take screenshots, all the time, and we’re used to taking them as a way to put a marker on so many kinds of useful information. Getting access to that kind of relevant, personalized data is the hardest thing about building a great AI assistant. The future of computing is certainly multimodal, including cameras, microphones, and sensors of all kinds. But the first best way to use AI might be one screenshot at a time.