Sundar Pichai, CEO of Google, says the Gemini era of AI has begun. Gemini, Google's latest large language model, was teased at the I/O developer conference in June and is now public. According to Pichai and Google DeepMind CEO Demis Hassabis, it is a major advance in AI models, one that will touch almost all of Google's products. "Our ability to refine a single underlying technology and see its benefits ripple across all our products is truly powerful," Pichai says.
Gemini is not a single model but several. Gemini Nano, a lightweight version, runs natively and offline on Android smartphones. The beefier Gemini Pro will power many of Google's AI services and becomes Bard's backbone starting today. Gemini Ultra, Google's most powerful LLM, is aimed at data centers and enterprise applications.
Google is launching the model in several ways at once: Gemini Pro now powers Bard, and Gemini Nano brings new features to Pixel 8 Pro users. Gemini Ultra arrives next year. Developers and enterprises can access Gemini Pro through Google Generative AI Studio or Vertex AI on Google Cloud starting December 13. For now, Gemini is available only in English, with other languages expected to follow. But Pichai said the model will eventually be integrated into Google's search engine, ad products, Chrome browser, and more around the world. Gemini, in other words, is Google's future.
OpenAI debuted ChatGPT a year and a week ago, and the company and its product instantly became AI landmarks. Google, which created much of the foundational technology behind the current AI boom and has called itself an "AI-first" organization for nearly a decade, was caught embarrassingly off guard by ChatGPT's success and OpenAI's rapid rise. It is now ready to fight back.
Google has been testing Gemini against OpenAI's GPT-4 across 32 benchmarks spanning areas such as language understanding and code generation, and Hassabis says the company likes its chances in the comparison. "We're substantially ahead on 30 out of 32 benchmarks," he says with a smile.
On those benchmarks (where the margins are typically slim), Gemini's biggest edge is its video and audio comprehension. Multimodality was always part of the Gemini plan: rather than training separate image and audio models, as OpenAI did with DALL-E and Whisper, Google built a single multimodal model from the start. "We've always been interested in very, very general systems," says Hassabis. He wants to combine all of those modes, collecting as much data as possible from any number of inputs and senses and delivering equally varied responses.
The simplest Gemini models are text-in, text-out, whereas Gemini Ultra can handle images, video, and audio. And "it's going to get even more general than that," says Hassabis: there are still things like action and touch, closer to robotics. He says Gemini will gain more senses, awareness, accuracy, and grounding over time. "These models just sort of understand better about the world." Hallucinations, biases, and other problems persist in these models; as they learn more, Hassabis believes, they'll get better.
Benchmarks are just benchmarks, though; Gemini's real test will come from everyday users who want to brainstorm, look up information, write code, and more. Google seems to see coding as a key use case: it says AlphaCode 2, its latest code-generating system, outperforms 85 percent of coding competition participants, up from 50 percent for the original AlphaCode. But Pichai says the model will improve almost everything it touches.
Also crucial to Google: Gemini is more efficient. Trained on Google's tensor processing units, it is faster and cheaper to run than the company's previous model, PaLM. Alongside the new model, Google is releasing the TPU v5p, a version of its processor built for training and running large-scale models in data centers.
According to Pichai and Hassabis, the Gemini launch is both a beginning and a step forward. Gemini is the model Google has been building toward for years, and it may have been in the making well before OpenAI and ChatGPT took the world by storm.
Google, which declared a "code red" after ChatGPT's launch and has been playing catch-up ever since, appears to be sticking to its "bold and responsible" stance. Hassabis and Pichai say they won't move too fast just to keep up, especially as we get closer to artificial general intelligence, a self-improving, smarter-than-human AI that could revolutionize the world. "AGI will change things," Hassabis says. "I think we should be cautious with this active technology. Cautious, but optimistic."
Google says it has tested and red-teamed Gemini internally and externally to make sure it is safe and responsible. Pichai notes that data security and reliability are essential for enterprise-first offerings, where most generative AI money is currently being made. Hassabis acknowledges that releasing a cutting-edge AI system will surface challenges and attack vectors no one could have imagined. "That's why you have to release things," he says, "to see and learn." That's also why Google is releasing Ultra gradually; Hassabis describes it as a controlled beta, a "safer experimentation zone" for its most powerful and unfettered model. If Gemini has a marriage-wrecking alter ego, Google is trying to find it first.
Pichai and other Google executives have praised AI for years; Pichai has often predicted that AI will transform civilization more profoundly than fire or electricity did. This first-generation Gemini model may not change the world. It may only help Google catch up to OpenAI in the race to build great generative AI. In the worst case, Bard stays boring and ChatGPT wins. But Pichai, Hassabis, and everyone else at Google appear to believe this is the start of something big. The web made Google a tech powerhouse; Gemini could be even bigger.