With the launch of its new generative AI platform, Gemini, earlier this month, Google is trying to make a splash. But while Gemini appears promising in some respects, it falls short in others. So what is Gemini? What can it be used for? And how does it compare with the competition?
We’ve created this helpful guide, which we’ll update when new Gemini models and features are launched, to make it simpler to stay up to date with the most recent Gemini advancements.
What is Gemini?
DeepMind and Google Research, Google’s AI research labs, are responsible for developing Gemini, the company’s much-anticipated next-generation generative AI model family.
There are three types available:
Gemini Ultra, the flagship model; Gemini Pro, a "lite" version of the flagship; and Gemini Nano, a smaller "distilled" model that runs on smartphones like the Pixel 8 Pro.
Every Gemini model was trained to be "natively multimodal," that is, able to work with and use more than just text. The models were pre-trained and fine-tuned on a wide variety of audio, images, videos, codebases, and text in a range of languages.
This sets Gemini apart from models such as LaMDA, Google's large language model, which was trained exclusively on text data. Unlike LaMDA, which can only understand and generate text (essays, email drafts and so on), Gemini models can handle other modalities. Their grasp of audio, images and the rest is still limited, but it's better than nothing.
What’s the difference between Bard and Gemini?
Google, demonstrating once again that it lacks a knack for branding, didn't make it clear from the outset that Gemini is separate and distinct from Bard. Bard is simply an interface through which certain Gemini models can be accessed; think of it as an app or client for Gemini and other generative AI models. Gemini, by contrast, is a family of models, not an app or front end. There's no standalone Gemini experience, and there probably never will be. In OpenAI terms, Bard is comparable to ChatGPT, the company's popular conversational AI app, while Gemini is comparable to the language model that powers ChatGPT, GPT-3.5 or GPT-4.
Incidentally, Gemini is entirely separate from Imagen-2, a text-to-image model that may or may not fit into the company's broader AI strategy. Rest assured, you're not the only one confused by this!
What can Gemini do?
The multimodal nature of the Gemini models allows them to theoretically be used for a variety of activities, such as creating artwork, labeling photos and videos, and transcribing speech. Although not all of these features have made it to market yet (more on that later), Google promises to include them all and more at some point in the not-too-distant future.
Naturally, it's a little hard to take the company at its word.
Google badly underdelivered with the original Bard launch. More recently, it ruffled feathers when a video purporting to demonstrate Gemini's capabilities turned out to be heavily doctored and essentially aspirational. To the tech giant's credit, Gemini is available in some form today, albeit a limited one.
However, if Google is telling the truth, the following are the capabilities that the various Gemini model tiers will have when they launch:
Gemini Ultra, the "foundation" model that underpins the others, is currently available only to a "select set" of users across a handful of Google apps and services. That won't change until later this year, when Google's largest model launches more broadly. Most of the information about Ultra comes from Google-led product demos, so it's best taken with a grain of salt.
Google says that Gemini Ultra can be used to help with things like physics homework, solving problems step by step on a worksheet and pointing out possible mistakes in already filled-in answers. It can also be applied to tasks such as identifying scientific papers relevant to a particular problem, extracting information from those papers and "updating" a chart by generating the formulas needed to recreate it with more recent data.
As alluded to earlier, Gemini Ultra can theoretically handle image generation. But that capability won't make it into the productized version of the model at launch, Google says, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feeding prompts to an image generator (like DALL-E 3, in ChatGPT's case), Gemini outputs images "natively," without an intermediary step.
Unlike Gemini Ultra, Gemini Pro is available to the public today. Confusingly, though, its capabilities vary depending on where it's used.
In Bard, where Gemini Pro first launched in text-only form, Google claims that the model's reasoning, planning and understanding capabilities improve on those of LaMDA. An independent study by researchers at Carnegie Mellon and BerriAI found that Gemini Pro can indeed handle longer and more complex reasoning chains than OpenAI's GPT-3.5.
But the study also found that, like all large language models, Gemini Pro struggles with math problems involving multiple digits, and users have found plenty of examples of flawed reasoning and outright mistakes. It made a number of factual errors on simple queries, such as who won the most recent Oscars. Google has promised improvements, but when they'll arrive is unclear.
Vertex AI, Google’s fully managed AI development platform, offers Gemini Pro via API as well. Vertex AI takes text as input and produces text as output. Gemini Pro Vision is an extra endpoint that can process text and imagery, including images and videos, and produce text that is similar to OpenAI’s GPT-4 with Vision model.
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases through a fine-tuning or "grounding" process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.
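For a concrete sense of what "text in, text out" versus the Vision endpoint looks like in practice, here's a minimal sketch using the preview generative-models module of the Vertex AI Python SDK as it shipped at Gemini's launch. The project ID, region and image URI are placeholders, and the SDK is imported inside the functions so the sketch can be read (and the file loaded) without the package installed; treat it as an illustration, not Google's reference code.

```python
def summarize_with_gemini_pro(project_id: str, text: str) -> str:
    """Sketch: text in, text out through the Gemini Pro endpoint on Vertex AI.

    Assumes the Vertex AI Python SDK (google-cloud-aiplatform) and a GCP
    project with Vertex AI enabled; imports are local so the file loads
    even without the SDK installed.
    """
    import vertexai
    from vertexai.preview.generative_models import GenerativeModel

    vertexai.init(project=project_id, location="us-central1")  # placeholder region
    model = GenerativeModel("gemini-pro")
    response = model.generate_content(f"Summarize in two sentences:\n\n{text}")
    return response.text


def describe_image_with_gemini_pro_vision(project_id: str, image_uri: str) -> str:
    """Sketch: text plus an image in, text out through Gemini Pro Vision."""
    import vertexai
    from vertexai.preview.generative_models import GenerativeModel, Part

    vertexai.init(project=project_id, location="us-central1")
    model = GenerativeModel("gemini-pro-vision")
    response = model.generate_content([
        Part.from_uri(image_uri, mime_type="image/jpeg"),  # placeholder URI
        "Describe this image in one sentence.",
    ])
    return response.text
```

Running either function for real requires Google Cloud credentials and a project with the Vertex AI API enabled.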
Sometime in "early 2024," Vertex customers will also be able to use Gemini Pro to power custom-built conversational voice and chat agents, including chatbots. Gemini Pro will additionally be able to drive Vertex AI's search summarization, recommendation and answer generation features, drawing on documents across modalities (e.g., PDFs, images) and sources (e.g., OneDrive, Salesforce) to answer user queries.
In AI Studio, Google's web-based tool for app and platform developers, there are workflows for creating freeform, structured and chat prompts with Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output's creative range, supply examples to guide tone and style, and tune the safety settings.
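As a rough illustration of how those knobs map onto code, here's a sketch using the Google AI Python SDK that pairs with AI Studio (the `google-generativeai` package); the API key, temperature and safety values are placeholder assumptions, and the import is local so the sketch can be read without the SDK installed.

```python
def generate_with_settings(api_key: str, prompt: str) -> str:
    """Sketch: call Gemini Pro with an explicit temperature and safety settings.

    Assumes the google-generativeai SDK that accompanies AI Studio; imported
    inside the function so this file loads even without it installed.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)  # key obtained from AI Studio
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(
            temperature=0.2,        # low temperature: narrower, more predictable output
            max_output_tokens=256,  # cap the length of the reply
        ),
        # Example of loosening one safety filter to block only high-risk content.
        safety_settings={"HARASSMENT": "BLOCK_ONLY_HIGH"},
    )
    return response.text
```

The same temperature and safety controls are exposed as sliders in the AI Studio UI; the exported code is essentially this call with your chosen values filled in.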
A much smaller version of the Gemini Pro and Ultra models, Gemini Nano is efficient enough to run tasks directly on (some) phones rather than sending them to a server. So far, it powers two features on the Pixel 8 Pro: Summarize in the Recorder app and Smart Reply in Gboard.
The Recorder app lets users record and transcribe audio at the push of a button, and Gemini provides summaries of recorded conversations, interviews, presentations and other clips. Users can get these summaries even without a signal or Wi-Fi connection, and in a nod to privacy, no data leaves their phone in the process.
Gemini Nano is also available in Gboard, Google's keyboard app, as a developer preview. There, it powers Smart Reply, a feature that suggests the next thing you might want to say in a messaging app conversation. The feature initially works only with WhatsApp, but Google says it will come to more apps in 2024.
Is Gemini better than GPT-4 from OpenAI?
How the Gemini family truly stacks up won't be clear until Google releases Ultra later this year, but the company has claimed improvements over the current state of the art, which usually means OpenAI's GPT-4.
Google has repeatedly touted Gemini's benchmark performance, claiming that Gemini Ultra achieves state-of-the-art results on "30 of the 32 widely used academic benchmarks used in large language model research and development." The company says Gemini Pro, meanwhile, is more capable than GPT-3.5 at tasks such as summarizing content, brainstorming and writing.
Leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear only marginally better than those of OpenAI's corresponding models. And, as mentioned, some early impressions haven't been great: users and academics have pointed out that Gemini Pro frequently gets basic facts wrong, struggles with translations and offers poor coding suggestions.
What is the cost of Gemini?
You can use Gemini Pro for free in Bard, AI Studio, and Vertex AI for the time being.
Once Gemini Pro exits preview in Vertex, however, the model will cost $0.0025 per character for input, while output will cost $0.00005 per character. Vertex customers are billed per 1,000 characters (about 140 to 250 words) and, for models like Gemini Pro Vision, per image ($0.0025).
Say a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5. Generating an article of a similar length, meanwhile, would cost $0.10.
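That arithmetic can be checked with a quick back-of-the-envelope script. The per-character rates are the preview prices quoted above (actual Vertex billing is metered per 1,000 characters); the function name is just illustrative.

```python
# Quoted Vertex AI preview rates for Gemini Pro.
INPUT_RATE = 0.0025     # dollars per input character
OUTPUT_RATE = 0.00005   # dollars per output character

def gemini_pro_cost(input_chars: int, output_chars: int) -> float:
    """Rough cost in dollars of one Gemini Pro call at the quoted rates."""
    return input_chars * INPUT_RATE + output_chars * OUTPUT_RATE

# Summarizing a 2,000-character (~500-word) article: the input side dominates.
print(round(gemini_pro_cost(input_chars=2000, output_chars=0), 2))   # 5.0, i.e. $5
# Generating a 2,000-character article from a short prompt: far cheaper.
print(round(gemini_pro_cost(input_chars=0, output_chars=2000), 2))   # 0.1, i.e. $0.10
```

The lopsided input/output rates are why, at these prices, summarization works out roughly 50 times more expensive than generation for the same number of characters.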
Where can I try Gemini?
Bard is the easiest place to try Gemini Pro. A fine-tuned version of Pro is currently answering text-based Bard queries in English in the U.S., with other languages and supported countries to follow.
Through an API, Gemini Pro can also be accessed in preview in Vertex AI. For the time being, the API is available for free “within limits” and covers 38 languages and locations, including Europe. It also has capabilities like filtering and chat functionality.
Elsewhere, Gemini Pro can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots, then get API keys to use them in their apps, or export the code to a more fully featured IDE.
In the coming weeks, Duet AI for Developers, Google's suite of AI-powered tools for code generation and completion, will begin using a Gemini model. And around the same time, in early 2024, Google plans to bring Gemini models to Chrome dev tools and its Firebase mobile development platform.
Down the line, Gemini Nano will come to devices beyond the Pixel 8 Pro. Developers who want to use the model in their Android apps can sign up for a sneak peek.