Gradio with Vertex AI Agent Engine

January 24, 2026

Stop thinking of agents as just chat boxes. Learn how to integrate agents deployed on Vertex AI Agent Engine into any custom application, starting with a local Gradio app. This guide walks you through the manual integration process, highlighting how agents function as flexible APIs that can bring intelligence to every corner of your product.

Introduction

In this post, we'll explore how to consume an agent deployed on Vertex AI Agent Engine through a custom local Gradio app.

Currently, because things move so fast and libraries are still stabilizing, it's difficult to get an end-to-end solution working through LLM-generated code (Gemini, ChatGPT, Claude).

Anyway, I think it's important to understand how agent integration actually works within an application. There's a lot of misunderstanding regarding the different ways to test and deploy agents—and what they actually are, but that's a story for another time.

At a high level, an agent exposes an API, wrapped by an SDK that simplifies interactions (session management, persistence, and so on). Agents don't need to live in a chat input box.

Once you learn how to manually integrate and call an agent through Gradio, you'll be able to integrate your agents anywhere.

I believe the real power of agents will be realized through custom integrations. You won't even realize that when you're clicking a button, an agent/LLM is actually thinking through your request instead of just executing deterministic functions.

Intelligence will complement our apps and enable us to do things that previously required a massive amount of business logic. Instead, we will leverage LLMs, tools, and system instructions to define the input, output, and rules.

Let's get started.

Setup

We need to set up a new project. These days, I typically use uv. If you don't have it installed yet, follow these installation steps.

Additionally, make sure you are authenticated with Google Cloud locally by running gcloud auth application-default login.

Terminal
$ uv init gradio-agent-engine
$ cd gradio-agent-engine
$ uv add "google-cloud-aiplatform>=1.132.0" "gradio>=6.2.0" "python-dotenv>=1.2.1"

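Since libraries are constantly evolving, I suggest you stay close to the versions provided above. Note that >= only sets a minimum version; if a newer release breaks something, replace >= with == to pin these exact versions. You can copy this into your terminal and everything should be set up properly.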

Implementation

We'll need to create a .env file to hold our environment variables.

I am assuming you already have an agent deployed to Vertex AI Agent Engine.

If so, define the following environment variables using your agent's deployment settings. Navigate to Vertex AI → Agent Engine and copy the Resource name, from which you can easily extract all required settings: Project ID, Location, and the Deployed Agent ID.
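If you prefer not to split the Resource name by hand, the extraction can be scripted. This is a minimal sketch, assuming the name follows the projects/.../locations/.../reasoningEngines/... layout that the SDK call later in this post builds. parse_resource_name is a hypothetical helper, not part of the Vertex AI SDK, and the example values are placeholders. Depending on where you copy the name from, the project segment may be a numeric project number rather than the project ID.

```python
import re

# Assumed layout (matches the resource name built later in app.py):
# projects/<project>/locations/<location>/reasoningEngines/<agent id>
RESOURCE_NAME_PATTERN = re.compile(
    r"^projects/(?P<project_id>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<agent_id>[^/]+)$"
)


def parse_resource_name(resource_name: str) -> dict[str, str]:
    """Split an Agent Engine resource name into the three .env values."""
    match = RESOURCE_NAME_PATTERN.match(resource_name.strip())
    if match is None:
        raise ValueError(f"Unexpected resource name format: {resource_name!r}")
    return {
        "PROJECT_ID": match["project_id"],
        "LOCATION": match["location"],
        "AGENT_ID": match["agent_id"],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only
    example = "projects/my-project/locations/us-central1/reasoningEngines/1234567890"
    for key, value in parse_resource_name(example).items():
        print(f'{key}="{value}"')
```

Run as a script, it prints lines you can paste straight into the .env file.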

.env
txt
PROJECT_ID=""
LOCATION=""
AGENT_ID=""

Create a new file app.py. Add the following imports and load the environment variables using python-dotenv:

app.py
python
import os
import uuid
import gradio as gr
from dotenv import load_dotenv
import vertexai
from vertexai import agent_engines

load_dotenv()

PROJECT_ID = os.getenv("PROJECT_ID")
LOCATION = os.getenv("LOCATION")
DEFAULT_AGENT_ID = os.getenv("AGENT_ID")

vertexai.init(project=PROJECT_ID, location=LOCATION)

First, we need to set up the Vertex AI SDK using the init function.
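Because the .env template starts with empty values, os.getenv can return an empty string, and the mistake only surfaces later as an SDK error. Here is a minimal sketch of a fail-fast check you could run before calling init. missing_settings and require_settings are hypothetical helper names, not part of python-dotenv or the Vertex AI SDK.

```python
import os

REQUIRED_VARS = ("PROJECT_ID", "LOCATION", "AGENT_ID")


def missing_settings(env=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]


def require_settings(env=None) -> None:
    """Raise a readable error if any required variable is missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
```

Calling require_settings() right after load_dotenv() stops the app with a clear message instead of a confusing failure further down.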

Next, we need an object to manage our session variables.

app.py
python
class AppSession:
    """
    Holds the state for a single user's browser session.
    """

    def __init__(self, agent_id=DEFAULT_AGENT_ID):
        self.user_id = str(uuid.uuid4())
        self.agent_id = agent_id
        self.remote_app = None
        self.session_id = None

        if self.agent_id:
            self.setup_agent(self.agent_id)

    def setup_agent(self, agent_id: str):
        self.agent_id = agent_id
        try:
            resource_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/reasoningEngines/{agent_id}"
            self.remote_app = agent_engines.get(resource_name)
        except Exception as e:
            print(f"Failed to load agent: {e}")
            self.remote_app = None

    async def create_session(self):
        """Creates a new session with the Agent Engine."""
        if not self.remote_app:
            return "Error: Agent not loaded."

        try:
            # Create session via the SDK
            agent_session = await self.remote_app.async_create_session(
                user_id=self.user_id
            )

            # Handle different return types from the SDK (dict vs object)
            if isinstance(agent_session, dict):
                self.session_id = agent_session.get("id")
            else:
                self.session_id = getattr(agent_session, "id", "unknown_id")

            return self.session_id
        except Exception as e:
            return f"Error creating session: {e}"

A unique user_id is generated for each AppSession instance, that is, once per browser session. The remote app is then retrieved with agent_engines.get, using a resource name built from the Project ID, Location, and Agent ID.

Then, we use our remote app (deployed on Agent Engine) to create a new session. We are making a network call here, which is why this function is asynchronous. The server, not the client, is responsible for building the session.

Once we have the session, we’ll hold on to the session_id, which is required for every subsequent call to the agent during that specific session.
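Since every later call depends on the session_id, it can help to isolate how it is read from the response. The sketch below assumes, as the class above does, that the SDK may return either a dict or an object with an id attribute. extract_session_id is a hypothetical helper, and unlike the class above it returns None rather than a placeholder string when no ID is present.

```python
from typing import Any, Optional


def extract_session_id(agent_session: Any) -> Optional[str]:
    """Return the session ID from a dict or attribute-style response."""
    if isinstance(agent_session, dict):
        session_id = agent_session.get("id")
    else:
        session_id = getattr(agent_session, "id", None)
    # Treat a missing or empty ID as "no session" instead of storing a placeholder
    return str(session_id) if session_id else None
```

Returning None keeps the "if not app_session.session_id" check in the chat handler meaningful, because a failed lookup never looks like a valid ID.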

Finally, to complete our integration with the Gradio chat interface, we need a function to manage session creation (on the fly), sending user messages, and receiving agent responses.

app.py
python
async def agent_chat_response(message, history, app_session):
    """
    Main chat handler.
    Note: 'app_session' is passed in via additional_inputs
    """
    if not app_session or not app_session.remote_app:
        yield "Please ensure the Agent ID is loaded correctly."
        return

    # If no session exists yet, create one on the fly
    if not app_session.session_id:
        result = await app_session.create_session()
        if not app_session.session_id:
            # create_session returns an error message when it fails
            yield str(result)
            return

    try:
        full_response = ""

        # Using the streaming query from the SDK
        async for event in app_session.remote_app.async_stream_query(
            user_id=app_session.user_id,
            session_id=app_session.session_id,
            message=message,
        ):
            # Parse the specific response structure of Agent Engine
            # Note: Verify this structure matches your specific SDK version (Preview vs GA)
            is_text_part = (
                "content" in event
                and "parts" in event["content"]
                and event["content"]["parts"]
                and "text" in event["content"]["parts"][0]
            )

            if is_text_part:
                text_chunk = event["content"]["parts"][0]["text"]
                if text_chunk:
                    full_response += text_chunk
                    # Yield the accumulated text so the chat bubble grows as chunks arrive
                    yield full_response

    except Exception as ex:
        yield f"Error during generation: {ex}"
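The handler above only reads the first part of each event, so an event whose first part is not text is skipped entirely. Here is a minimal sketch of a more tolerant parser, assuming events are plain dicts shaped like the ones handled above. extract_text is a hypothetical helper, not an SDK function, and the function_call key in the comment is just an example of a non-text part your agent might emit.

```python
from typing import Any


def extract_text(event: Any) -> str:
    """Concatenate the text of every text part in one streamed event."""
    if not isinstance(event, dict):
        return ""
    content = event.get("content")
    if not isinstance(content, dict):
        return ""
    chunks = []
    for part in content.get("parts") or []:
        # Parts without text (for example a function_call part) are skipped
        if isinstance(part, dict) and part.get("text"):
            chunks.append(part["text"])
    return "".join(chunks)
```

Inside the streaming loop, the is_text_part check could then shrink to a single call: append extract_text(event) to full_response and yield whenever the result is non-empty.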

You can then plug this function into a regular Gradio app (or any other front end). The following is an example:

app.py
python
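with gr.Blocks(theme=gr.themes.Soft(), title="Vertex AI Agent Chat") as app:
    session_state = gr.State()

    gr.Markdown("# Demo: Agent Engine on Vertex AI")

    with gr.Row():
        agent_id_input = gr.Textbox(
            label="Agent ID", value=DEFAULT_AGENT_ID, interactive=True, scale=4
        )
        load_btn = gr.Button("Load Agent", variant="secondary", scale=1)

    with gr.Row():
        session_info = gr.Textbox(
            label="Current Session ID",
            value="Not initialized",
            interactive=False,
            scale=4,
        )
        reset_btn = gr.Button("New Session", variant="primary", scale=1)

    chatbot = gr.Chatbot(height=500, label="Agent Chat")

    chat_interface = gr.ChatInterface(
        fn=agent_chat_response,
        additional_inputs=[session_state],
        chatbot=chatbot,
        textbox=gr.Textbox(placeholder="Ask the agent...", container=False, scale=7),
    )

    def load_agent(agent_id):
        # Build a fresh AppSession (new user_id, no session yet) for this Agent ID
        return AppSession(agent_id), "Not initialized"

    async def new_session(app_session):
        # Ask Agent Engine for a new session and display its ID (or the error)
        if not app_session:
            return "Error: Agent not loaded."
        return await app_session.create_session()

    # Wire the buttons, and give each browser session its own AppSession on page load
    load_btn.click(load_agent, inputs=[agent_id_input], outputs=[session_state, session_info])
    reset_btn.click(new_session, inputs=[session_state], outputs=[session_info])
    app.load(lambda: AppSession(), outputs=[session_state])

if __name__ == "__main__":
    app.launch()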

You can start your local application with uv run app.py, then open the local URL that Gradio prints in the terminal.

[Screenshot: Gradio app]

Conclusion

We walked through a complete example of how to integrate an agent deployed on Vertex AI Agent Engine with a local Gradio app.

The main goal was to show in detail how the SDK interacts with a deployed agent: to look under the hood and realize that a deployed agent is an API, nothing more.

And since an agent is an API, it shouldn't be confined to just a chat box.
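To make that concrete, here is a minimal sketch of a non-chat call: a helper that drains the event stream and returns one string, which a button handler or background job could use directly. It assumes the same dict-shaped events as the chat handler in this post. collect_reply and fake_stream are hypothetical names, and fake_stream only stands in for remote_app.async_stream_query(...) so the example runs without a deployed agent.

```python
import asyncio
from typing import Any, AsyncIterator


async def collect_reply(events: AsyncIterator[dict[str, Any]]) -> str:
    """Drain a stream of agent events and return the full text as one string."""
    chunks = []
    async for event in events:
        for part in (event.get("content") or {}).get("parts") or []:
            if part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks)


async def fake_stream():
    """Stand-in for remote_app.async_stream_query(...), for local testing only."""
    yield {"content": {"parts": [{"text": "Refund "}]}}
    yield {"content": {"parts": [{"text": "approved."}]}}


if __name__ == "__main__":
    print(asyncio.run(collect_reply(fake_stream())))
```

In a real app, you would pass the async_stream_query call in place of fake_stream() and show the returned string wherever the feature needs it, with no chat window involved.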

So, go ahead: think about where in your website or app a non-deterministic, truly intelligent action could improve the user experience—or enable a completely unique feature that was either impossible before or simply required too much work.
