Hackathon Showcase

VLA gen UI

A team of CTU AI graduate and undergraduate researchers specializing in computer vision, C++ mixed-reality algorithms, and open-source generative AI prototyping.

2 members

Project Description

VLA (vision-language-action) models are robot policies that execute plain-language tasks such as “pick up a cube”. Why should we be limited to text descriptions of tasks? What if the model itself provided the UI for the actions it can perform? That’s what we are building.

This combines pi0.5 as the VLA model, Gemini Embodied Reasoning 1.6 to analyse the scene and identify actions, and finally a regular Gemini model to draw the UI that controls these actions.
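The flow described above can be sketched roughly as follows. This is only an illustrative outline, not the project's actual code: every name here (`Action`, `identify_actions`, `build_ui`, `on_click`, `fake_policy`) is hypothetical, and the three model calls are replaced by simple local stand-ins so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One task the robot could perform in the current scene."""
    label: str   # short text shown on the UI control
    prompt: str  # plain-language instruction passed to the VLA policy


def identify_actions(scene_objects: List[str]) -> List[Action]:
    """Hypothetical stand-in for the scene-analysis step (the ER model)."""
    return [
        Action(label=f"Pick up {obj}", prompt=f"pick up the {obj}")
        for obj in scene_objects
    ]


def build_ui(actions: List[Action]) -> List[dict]:
    """Hypothetical stand-in for the UI-drawing step: one button per action."""
    return [
        {"type": "button", "text": a.label, "prompt": a.prompt}
        for a in actions
    ]


def on_click(button: dict, policy: Callable[[str], str]) -> str:
    """Forward the clicked button's prompt to the VLA policy."""
    return policy(button["prompt"])


def fake_policy(prompt: str) -> str:
    """Placeholder for the VLA policy (pi0.5 in the project)."""
    return f"executing: {prompt}"


if __name__ == "__main__":
    ui = build_ui(identify_actions(["cube", "bottle"]))
    print(on_click(ui[0], fake_policy))
```

The point of the split is that the user never types a task description: the scene-analysis step proposes the actions, the UI step turns them into controls, and each control carries the plain-language prompt that the policy ultimately receives.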

What you see is a working app running on a server GPU. I am unaware of anything similar in research or in practice, and it wouldn’t have been possible a few weeks ago, before the new Gemini ER model.

Prior Work

We started from scratch, with only our prior experience to build on.

Team

Ondřej Baštař

Viktorie Valdmanová

Products & Tools

AI Tinkerers, Google DeepMind, Google ER model, MuJoCo, Tensor Ventures

Social Posts

https://x.com/OndrejBastar/status/2053163663184281831
