Wow. I’ve been out of action for a little over a week, thanks to a week off from my day job, bestest pals visiting from England, and a nasty upper respiratory infection picked up somewhere along the way (hurray - first time I’ve had anything more than a mild cold since COVID started). Wow because it’s been a heck of an interesting ten days or so in the GenAI space.
OpenAI launched GPT-4o, and said about it:
GPT-4o is our newest flagship model that provides GPT-4-level intelligence but is much faster and improves on its capabilities across text, voice, and vision.
I’ve been trying it out on my phone, and although I have no doubt that it has some formidable capabilities, I’m disappointed that it can’t hold a continuous conversation. I don’t think that was promised, but still - the joy of using Pi makes me want to see that as almost table stakes with big version upgrades from the top tier chatbots.
There’s also been a whole lot of news about Apple and (very strongly) rumored partnerships with OpenAI, with Google, and maybe even with both. I get the vibe from several tech journalists covering these rumors that for them this would mean Game Over in terms of the best GenAI experience on a smartphone. My reaction absolutely wants to channel Demi Moore in A Few Good Men:
I strenuously object, strongly disagree. Apple has been way behind Google for many years in AI. Siri vs Google Assistant has always been no contest. Now they are still way behind Google and also way behind OpenAI, Meta, Amazon, and probably a few others. A partner-level ChatGPT will have to offer something very special for it to suddenly be better than the Pixel 8 Pro line - which already has its own Gemini model built tightly into the Pixel 8 Pro hardware and software. And Android users can already run ChatGPT, Pi, Poe, Microsoft’s Copilot, and others. They’ve been available for quite a while.
Anyway, on to my utterly unbiased Android user reaction to Google I/O this week and its wall-to-wall AI announcements. I have quickly scanned quite a few posts that run down those AI highlights, but honestly I need to do more reading and video watching and trying things out with Gemini on my Pixel 8 Pro. So just a few quick thoughts on a few of the things that caught my eye the most:
Gems - “Gemini GPTs”? Android Authority describes these like so:
Gemini will soon allow users to create custom Gemini-powered AI assistants with varying personalities. Google calls these chatbots “Gems” and they can be tuned in a way to help you with specific tasks.
I’m not excited about this news. I thought ChatGPT’s GPTs sounded great at first, but after experimenting with them a lot, I found very few to be interesting or useful.
Gemini Pro upgrades and new features: Some of these definitely feel exciting. Starting with “Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window …”. Gemini 1.5 Pro is what runs on the Pixel 8 Pro, and a graphic rather than words brings this home:
Working on an AI Agent Project: This is the most exciting bit of news to me, though its arrival doesn’t have a stated timeframe yet. For a long while now I’ve been seeing people who I consider leading thinkers on GenAI (and AI in general) say that AI agents, and ACI (artificial capable intelligence), are the next big step for GenAI tools. The idea is, at least in part, that AI agents will be able to complete complex sets of tasks for us with minimal guidance. Last August I wrote a little on a similar line of thinking from Carl Pei on the need for a new metaphor for smartphones, which I think fits nicely with the AI agents future. Money quote from him on this:
I think it needs to slowly augment away the apps. Today, we’re using some really simple, mindless-scrolling apps, right? What if we wanted to accomplish more complicated tasks like 3D modeling or photo editing, or I don’t know what? It’s actually quite difficult to learn how to use these new apps. Maybe we can just tell the phone what we need to do, and it would use those apps for us without the apps even being visible in the foreground. Right. I think that could be enough utility to transition to a new metaphor.
Google’s project is called Project Astra, and here’s a slice of their intro for it:
As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, we’ve always wanted to develop universal AI agents that can be helpful in everyday life. That’s why today, we’re sharing our progress in building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).
To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.
The short demo video for this is very impressive, features a Golden Retriever and a tiger stuffed animal, and is well worth a watch in this Gemini updates post: https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/#gemini-model-updates
Which GenAI news is getting you excited these days? Lots of it, none of it, fed up with hearing about it?
Here’s an interesting comparison piece: https://flip.it/_QEFCd
I tend to be wary of these previews and announcements till we see how they really perform. Copilot is buggy and the interface needs work.
I’m most excited about vision and the fact that it seems to be free. And Claude launched in Europe - with limits, though. I could only do 5 prompts then had to wait hours to continue, so there’s that.
Exciting times.