Multi-modal apps via deepmind AI studio are VERY under-hyped. With just a single prompt, I built this app in <10 min: it records me flipping through records and outputs every artist + album shown. Video used to be one of the hardest things to work with; now it's a prompt.
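For those who are curious, this is the prompt I used: "Create an app that takes a video of a person flipping through their record collection and extracts the album name and artist name for each album. You can do this by taking the video, first extracting the frames that show different vinyl records, and then having a vision model analyze those frames to extract the information."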
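For anyone who wants to see the two steps in that prompt spelled out, here is a rough sketch. This is not the code the app generated. It assumes frames are already decoded into flat lists of grayscale pixel values, uses a simple mean-difference threshold as a stand-in for "frames that show different records", and takes the vision model as a hypothetical `describe_frame` callable that you would wire up to whatever model you use. The names `select_distinct_frames`, `extract_albums`, and the default threshold of 30 are all made up for illustration.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Frame = Sequence[int]      # assumed: flattened grayscale pixels, 0-255
Album = Tuple[str, str]    # (artist, album)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_distinct_frames(frames: Iterable[Frame], threshold: float = 30.0) -> List[Frame]:
    """Step 1: keep a frame only when it differs enough from the last kept frame."""
    kept: List[Frame] = []
    for frame in frames:
        if not kept or mean_abs_diff(kept[-1], frame) > threshold:
            kept.append(frame)
    return kept


def extract_albums(
    frames: Iterable[Frame],
    describe_frame: Callable[[Frame], Optional[Album]],  # hypothetical vision-model wrapper
    threshold: float = 30.0,
) -> List[Album]:
    """Step 2: ask the vision model about each distinct frame, dropping repeats."""
    seen = set()
    albums: List[Album] = []
    for frame in select_distinct_frames(frames, threshold):
        result = describe_frame(frame)  # None means no record was recognized
        if result is not None and result not in seen:
            seen.add(result)
            albums.append(result)
    return albums
```

In a real app you would decode the video into frames first and have `describe_frame` send each kept frame to a vision model; the threshold would likely need tuning per video.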