Apple research paper reveals AI that understands visual elements

Categories : Blog

Estimated reading time: 1 minute

    Researchers at Apple have reportedly developed a new AI system called ReALM (Reference Resolution As Language Modeling) that can read and understand visual elements, essentially allowing it to decipher what is shown on a screen.

    The research paper suggests that the new model reconstructs the screen using “parsed on-screen entities” and their locations in a textual layout. This essentially captures the visual layout of the on-screen page, and according to the researchers, a model specifically fine-tuned for this approach can outperform even GPT-4 and lead to more natural, intuitive interactions.
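To make the idea concrete, here is a minimal sketch (not Apple's actual implementation) of what rendering “parsed on-screen entities” into a textual layout might look like. The entity names, coordinates, and row-bucketing heuristic are all invented for illustration.

```python
# Illustrative sketch: convert parsed on-screen entities (text plus
# bounding-box coordinates) into a plain-text layout that preserves
# the screen's rough spatial arrangement. All values are hypothetical.

def render_screen_as_text(entities):
    """Sort entities top-to-bottom, then left-to-right, and emit one
    text line per visual row."""
    rows = {}
    for e in entities:
        row_key = e["y"] // 10  # bucket y-coordinates into coarse rows
        rows.setdefault(row_key, []).append(e)
    lines = []
    for _, row in sorted(rows.items()):
        row.sort(key=lambda e: e["x"])  # left-to-right within a row
        lines.append("  ".join(e["text"] for e in row))
    return "\n".join(lines)

# A toy "screen" with four entities at made-up positions.
screen = [
    {"text": "Call", "x": 0, "y": 40},
    {"text": "Contact: Jane Doe", "x": 0, "y": 10},
    {"text": "555-0123", "x": 60, "y": 10},
    {"text": "Message", "x": 40, "y": 40},
]
print(render_screen_as_text(screen))
```

A text-only representation like this can then be fed to a language model, which is the key move that lets a purely textual model reason about references such as “call that number.”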

    “Being able to understand context, including references, is essential for a conversational assistant,” reads the research paper. “Enabling the user to issue queries about what they see on their screen is a crucial step in ensuring a true hands-free experience in voice assistants.” The development could one day make its way to Siri, helping it become more conversational and “true hands-free.”

    While it is unlikely that we’ll hear more about ReALM itself this year, we should learn more about AI-related developments, including features coming to Siri, at WWDC 2024 on June 10th.

    Read more about ReALM here.

    Image credit: Shutterstock

    Source: Apple Via: VentureBeat
