After showing impressive efficiency with Gemma 3, running powerful AI on a single GPU, Google has pushed the boundaries even further with Gemma 3n. This new release brings state-of-the-art AI to mobile and edge devices, using minimal memory while delivering fast, multimodal performance. In this article, we’ll explore what makes Gemma 3n so powerful, how it works under the hood with innovations like Per-Layer Embeddings (PLE) and MatFormer architecture, and how to access Gemma 3n easily using Google AI Studio. If you’re a developer looking to build fast, smart, and lightweight AI apps, this is your starting point.
Gemma 3 showed us that powerful AI models can run efficiently, even on a single GPU, while outperforming larger models like DeepSeek V3 in chatbot Elo scores with significantly less compute. Now, Google has taken things further with Gemma 3n, designed to bring state-of-the-art performance to even smaller, on-device environments like mobile phones and edge devices.
To make this possible, Google partnered with hardware leaders like Qualcomm, MediaTek, and Samsung System LSI, introducing a new on-device AI architecture that powers fast, private, and multimodal AI experiences. The “n” in Gemma 3n stands for nano, reflecting its small size yet powerful capabilities.
This new architecture is built on two key innovations: Per-Layer Embeddings (PLE), which keep a large share of parameters out of the model's working memory, and the MatFormer architecture, which nests smaller sub-models inside a larger one.
Together, these innovations make Gemma 3n efficient enough to run high-performance, multimodal AI on low-resource devices.
When Gemma 3n models run, Per-Layer Embedding (PLE) parameters are used to generate data that improves the performance of each model layer. As each layer executes, its PLE data can be produced independently, outside the model's working memory, cached to fast storage, and then incorporated into the inference process. By keeping PLE parameters out of the model's memory space, this approach lowers resource usage without sacrificing the quality of the model's responses.
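To make the mechanism more concrete, here is a minimal, purely illustrative Python sketch of the per-layer caching idea. This is not Gemma 3n's actual implementation: the load_ple_from_cache helper, the cache directory, and the random data are hypothetical stand-ins; the point is only that per-layer data can live in fast storage and be pulled in one layer at a time instead of staying resident in accelerator memory.

import numpy as np
from pathlib import Path

CACHE_DIR = Path("ple_cache")  # hypothetical fast local storage for per-layer data
CACHE_DIR.mkdir(exist_ok=True)

def load_ple_from_cache(layer_idx: int, dim: int) -> np.ndarray:
    """Load (or lazily create) the per-layer embedding for a single layer.

    In this toy version the "embedding" is random data persisted to disk;
    a real model would store learned PLE parameters instead.
    """
    path = CACHE_DIR / f"layer_{layer_idx}.npy"
    if not path.exists():
        np.save(path, np.random.randn(dim).astype(np.float32))
    return np.load(path)

def run_layer(hidden: np.ndarray, layer_idx: int) -> np.ndarray:
    # Pull the PLE data for just this layer; it never has to be resident
    # for all layers at once, which is what shrinks the memory footprint.
    ple = load_ple_from_cache(layer_idx, dim=hidden.shape[-1])
    return hidden + ple  # stand-in for the layer's actual computation

hidden = np.zeros(256, dtype=np.float32)
for layer_idx in range(4):
    hidden = run_layer(hidden, layer_idx)
print(hidden.shape)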
Gemma 3n models are labeled with parameter counts like E2B and E4B, which refer to their effective parameter usage, a value lower than their total number of parameters. The "E" prefix signifies that these models can operate with a reduced set of active parameters, thanks to the flexible parameter technology built into Gemma 3n, which lets them run more efficiently on lower-resource devices.
These models organize their parameters into four key categories: text, visual, audio, and per-layer embedding (PLE) parameters. For instance, while the E2B model normally loads over 5 billion parameters during standard execution, it can reduce its active memory footprint to just 1.91 billion parameters by using parameter skipping and PLE caching, as shown in the following image:
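As a rough back-of-envelope illustration (the parameter counts come from the figures above; the 2-bytes-per-parameter assumption is ours, corresponding to fp16/bf16 weights), the difference in weight memory looks roughly like this:

BYTES_PER_PARAM = 2  # assuming fp16/bf16 weights; quantized formats would be smaller

total_params = 5.0e9       # "over 5 billion" parameters loaded in standard execution
effective_params = 1.91e9  # active footprint with parameter skipping + PLE caching

print(f"standard:  ~{total_params * BYTES_PER_PARAM / 1e9:.1f} GB of weights")
print(f"effective: ~{effective_params * BYTES_PER_PARAM / 1e9:.1f} GB of weights")
# standard:  ~10.0 GB of weights
# effective: ~3.8 GB of weights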
Gemma 3n is fine-tuned for on-device tasks, which lets the model interact with its environment and lets users interact with applications naturally. On mobile, Gemma 3n responds about 1.5 times faster than Gemma 3 4B, making the user experience noticeably more fluid by cutting down on generation latency.
Gemma 3n ships with a smaller sub-model nested inside it, a unique 2-in-1 MatFormer design. This lets users dynamically trade off quality and speed as needed, without managing a separate model; everything happens within the same memory footprint.
Gemma 3n models use the Matryoshka Transformer, or MatFormer, architecture, in which smaller models are nested inside a bigger one. When responding to queries, inference can run on the nested sub-models alone, without activating the enclosing model's parameters. Running only the smaller, core model inside a MatFormer lowers the model's energy footprint, response time, and compute cost. In Gemma 3n, the E2B model's parameters are contained within the E4B model, and the architecture also lets you mix and match configurations to assemble models at sizes between 2B and 4B.
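Here is a minimal, illustrative sketch of the nesting idea behind MatFormer (again, not Gemma 3n's real implementation, and the layer sizes are arbitrary toy values): the smaller model's weights are a prefix slice of the larger model's, so choosing a width at inference time selects a sub-model without loading anything extra.

import numpy as np

rng = np.random.default_rng(0)
D_MODEL, HIDDEN_FULL, HIDDEN_SMALL = 64, 256, 128  # toy sizes, not Gemma 3n's

# One set of feed-forward weights; the "small" model is just the first HIDDEN_SMALL units.
w_in = rng.standard_normal((D_MODEL, HIDDEN_FULL)).astype(np.float32)
w_out = rng.standard_normal((HIDDEN_FULL, D_MODEL)).astype(np.float32)

def ffn(x: np.ndarray, hidden_width: int) -> np.ndarray:
    """Run the feed-forward block using only the first `hidden_width` hidden units."""
    h = np.maximum(x @ w_in[:, :hidden_width], 0.0)  # ReLU over the selected slice
    return h @ w_out[:hidden_width, :]

x = rng.standard_normal(D_MODEL).astype(np.float32)
fast = ffn(x, HIDDEN_SMALL)  # smaller nested model: cheaper, lower latency
full = ffn(x, HIDDEN_FULL)   # full model: more compute, higher quality
print(fast.shape, full.shape)

Because the small configuration is literally a slice of the larger one's weights, switching between them requires no separate model in memory, which is the 2-in-1 behavior described above.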
The Gemma 3n preview is available in Google AI Studio, through the Google GenAI SDK, and via MediaPipe (on Hugging Face and Kaggle). Here, we will access Gemma 3n using Google AI Studio.
!pip install google-genai
Step 8: Use Colab Secrets to store your GEMINI_API_KEY, and enable notebook access for it as well.
from google.colab import userdata
import os

# Read the key from Colab Secrets and expose it to the SDK as an environment variable
os.environ["GEMINI_API_KEY"] = userdata.get('GEMINI_API_KEY')
import os

from google import genai
from google.genai import types


def generate():
    # Create a client authenticated with the key stored earlier
    client = genai.Client(
        api_key=os.environ.get("GEMINI_API_KEY"),
    )

    # Instruction-tuned Gemma 3n model with 4B effective parameters
    model = "gemma-3n-e4b-it"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text="""Anu is a girl. She has three brothers. Each of her brothers has the same two sisters. How many sisters does Anu have?"""),
            ],
        ),
    ]
    generate_content_config = types.GenerateContentConfig(
        response_mime_type="text/plain",
    )

    # Stream the response and print chunks as they arrive
    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        print(chunk.text, end="")


if __name__ == "__main__":
    generate()
Output:
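Since Gemma 3n is multimodal, you can also try sending an image alongside text with the same SDK. The snippet below is a small sketch, assuming a local file named photo.jpg exists and that the preview model accepts image input through this API; file name and prompt are placeholders.

import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Read a local image (hypothetical file name) and send it together with a text prompt
with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemma-3n-e4b-it",
    contents=[
        types.Content(
            role="user",
            parts=[
                types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
                types.Part.from_text(text="Describe this image in one sentence."),
            ],
        ),
    ],
)
print(response.text)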
Gemma 3n is a big leap for AI on small devices. It runs powerful models with less memory and faster speeds. Thanks to PLE and MatFormer, it is both efficient and capable. It works with text, images, audio, and even video, all on-device. Google has made it easy for developers to test and use Gemma 3n through Google AI Studio. If you're building mobile or edge AI apps, Gemma 3n is definitely worth exploring. Check out Google AI Edge to run Gemma 3n locally.