Conversation

@AryanBagade

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

pytest results:
35 passed, 71 warnings in 3.90s

Added 3 new tests:

  • test_native_audio_model_forces_audio_modality - Verifies native-audio models force AUDIO modality
  • test_non_native_audio_model_keeps_requested_modality - Verifies regular models keep requested modality
  • test_agent_without_model_attribute - Verifies BaseAgent (no model attr) doesn't crash

Manual End-to-End (E2E) Tests:
To test manually:

  1. Create an agent with model="gemini-live-2.5-flash-native-audio"
  2. Connect to /run_live WebSocket with modalities=["TEXT"]
  3. Verify the modality is forced to ["AUDIO"] instead of failing

Full E2E testing was not possible, as it requires access to the gemini-live-2.5-flash-native-audio model, which is not publicly available. A rough sketch of what the client side of this check could look like is included below for reference.
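The following is a hypothetical client sketch for steps 2-3 above, not part of this PR: the server address, query parameters, and session id are assumptions based on how adk web is typically run locally, and it has not been executed against the restricted native-audio model.

  import asyncio
  import json

  import websockets


  async def check_forced_modality():
    # Assumed local adk web address and query parameters; adjust to match
    # the actual /run_live signature in your environment.
    uri = (
        "ws://localhost:8000/run_live"
        "?app_name=my_native_audio_app"  # hypothetical app name
        "&user_id=test_user"
        "&session_id=test_session"  # assumes the session already exists
        "&modalities=TEXT"  # deliberately request TEXT
    )
    async with websockets.connect(uri) as ws:
      # With the fix, the connection should stay open and stream events
      # (audio responses) instead of erroring out on the TEXT modality.
      first_event = await ws.recv()
      print(json.loads(first_event))


  asyncio.run(check_forced_modality())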

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • [ ] I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Added a hasattr() check to safely handle BaseAgent instances that don't have a model attribute (an improvement over the issue's proposed solution).
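For reference, a minimal sketch of that guard (simplified; the variable names stand in for the ones used in the /run_live handler):

  # Fall back to an empty string when the agent exposes no usable model name
  # (e.g. a BaseAgent), so the substring check below cannot raise.
  model_name = (
      root_agent.model
      if hasattr(root_agent, "model") and isinstance(root_agent.model, str)
      else ""
  )
  if "native-audio" in model_name:
    modalities = ["AUDIO"]  # native-audio models only support AUDIO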

@gemini-code-assist
Contributor

Summary of Changes

Hello @AryanBagade, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the /run_live endpoint did not correctly handle modality requests for 'native-audio' models, causing those requests to fail. The changes introduce logic to automatically enforce the AUDIO modality for these specialized models, ensuring they operate as intended. Additionally, the update improves resilience by gracefully handling agents that do not have a model attribute.

Highlights

  • Modality Enforcement for Native-Audio Models: The /run_live WebSocket endpoint now explicitly checks if an agent uses a 'native-audio' model. If detected, the response modality is automatically forced to AUDIO, ensuring compatibility and preventing errors for models that exclusively support audio.
  • Robust Handling of Agent Models: Improved robustness by adding a hasattr() check when accessing the model attribute of an agent. This prevents crashes for BaseAgent instances or other agents that might not define a model attribute.
  • New Unit Tests: Three new unit tests have been added to validate the new modality forcing logic: one for native-audio models, one for regular models, and one for agents without a model attribute, ensuring comprehensive coverage.


@adk-bot added the live label ([Component] This issue is related to live, voice and video chat) on Jan 22, 2026
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request correctly forces the AUDIO modality for native-audio models in the /run_live endpoint. The implementation is sound and includes a safe check for the model attribute. The new unit tests cover the different cases for native-audio, non-native, and agents without a model attribute. My main feedback is regarding the structure of the tests, which currently test implementation details and involve code duplication. I've suggested a refactoring that would improve test quality and maintainability.

Comment on lines 1415 to 1551
def test_native_audio_model_forces_audio_modality():
  """Test that native-audio models force AUDIO modality in run_live endpoint."""
  from google.adk.cli.adk_web_server import AdkWebServer

  # Create an LlmAgent with a native-audio model
  native_audio_agent = LlmAgent(
      name="native_audio_agent",
      model="gemini-live-2.5-flash-native-audio",
  )

  class NativeAudioAgentLoader:

    def load_agent(self, app_name):
      return native_audio_agent

    def list_agents(self):
      return ["test_native_audio_app"]

  session_service = AsyncMock()
  session = Session(
      id="session_id",
      app_name="test_native_audio_app",
      user_id="user",
      state={},
      events=[],
  )
  session_service.get_session.return_value = session

  adk_web_server = AdkWebServer(
      agent_loader=NativeAudioAgentLoader(),
      session_service=session_service,
      memory_service=MagicMock(),
      artifact_service=MagicMock(),
      credential_service=MagicMock(),
      eval_sets_manager=MagicMock(),
      eval_set_results_manager=MagicMock(),
      agents_dir=".",
  )

  # Verify the _get_root_agent method works correctly
  root_agent = adk_web_server._get_root_agent(native_audio_agent)
  assert root_agent.model == "gemini-live-2.5-flash-native-audio"
  assert "native-audio" in root_agent.model

  # Verify the model name detection logic
  model_name = (
      root_agent.model
      if hasattr(root_agent, "model") and isinstance(root_agent.model, str)
      else ""
  )
  assert "native-audio" in model_name

  # Test with App wrapping the agent
  app_agent = App(name="test_app", root_agent=native_audio_agent)
  root_agent_from_app = adk_web_server._get_root_agent(app_agent)
  assert root_agent_from_app.model == "gemini-live-2.5-flash-native-audio"


def test_non_native_audio_model_keeps_requested_modality():
  """Test that non-native-audio models keep the requested modality."""
  from google.adk.cli.adk_web_server import AdkWebServer

  # Create an LlmAgent with a regular model (not native-audio)
  regular_agent = LlmAgent(
      name="regular_agent",
      model="gemini-2.5-flash",
  )

  class RegularAgentLoader:

    def load_agent(self, app_name):
      return regular_agent

    def list_agents(self):
      return ["test_regular_app"]

  adk_web_server = AdkWebServer(
      agent_loader=RegularAgentLoader(),
      session_service=MagicMock(),
      memory_service=MagicMock(),
      artifact_service=MagicMock(),
      credential_service=MagicMock(),
      eval_sets_manager=MagicMock(),
      eval_set_results_manager=MagicMock(),
      agents_dir=".",
  )

  root_agent = adk_web_server._get_root_agent(regular_agent)
  model_name = (
      root_agent.model
      if hasattr(root_agent, "model") and isinstance(root_agent.model, str)
      else ""
  )

  # For regular models, the modality should NOT be forced to AUDIO
  assert "native-audio" not in model_name


def test_agent_without_model_attribute():
  """Test handling of agents without model attribute (BaseAgent)."""
  from google.adk.cli.adk_web_server import AdkWebServer

  # Create a BaseAgent (which doesn't have a model attribute)
  base_agent = DummyAgent(name="base_agent")

  class BaseAgentLoader:

    def load_agent(self, app_name):
      return base_agent

    def list_agents(self):
      return ["test_base_app"]

  adk_web_server = AdkWebServer(
      agent_loader=BaseAgentLoader(),
      session_service=MagicMock(),
      memory_service=MagicMock(),
      artifact_service=MagicMock(),
      credential_service=MagicMock(),
      eval_sets_manager=MagicMock(),
      eval_set_results_manager=MagicMock(),
      agents_dir=".",
  )

  root_agent = adk_web_server._get_root_agent(base_agent)

  # BaseAgent (DummyAgent) doesn't have a model attribute
  model_name = (
      root_agent.model
      if hasattr(root_agent, "model") and isinstance(root_agent.model, str)
      else ""
  )

  # Should default to empty string when no model attribute
  assert model_name == ""
  assert "native-audio" not in model_name

Contributor


Severity: high

These new tests are a good start, but they have a couple of issues that make them less effective and harder to maintain:

  1. They don't test the actual behavior. For example, test_native_audio_model_forces_audio_modality asserts that the model name contains native-audio, but it doesn't verify that the response_modalities are actually forced to ['AUDIO']. The tests are verifying implementation details rather than the behavior described by their names.

  2. There is significant code duplication. The logic to extract the model_name is copied from adk_web_server.py and then duplicated across all three new tests. This makes the tests brittle; if the implementation in adk_web_server.py changes, these tests will need to be updated in multiple places, or worse, they might pass while the actual logic is broken.

A better approach would be to refactor the logic for determining the effective_modalities in adk_web_server.py into its own helper method. This would make the logic in run_agent_live cleaner and allow you to test the behavior directly and without duplication.

For example, you could add a method to AdkWebServer (this would be outside the current diff, in adk_web_server.py):

def _get_effective_modalities(self, app_name: str, requested_modalities: list[str]) -> list[str]:
    """Determines the effective modalities, forcing AUDIO for native-audio models."""
    agent_or_app = self.agent_loader.load_agent(app_name)
    root_agent = self._get_root_agent(agent_or_app)
    model = getattr(root_agent, "model", None)
    model_name = model if isinstance(model, str) else ""
    if "native-audio" in model_name:
        return ["AUDIO"]
    return requested_modalities

Then your tests could be simplified to something like this, which is much more direct and robust:

def test_native_audio_model_forces_audio_modality():
  # ... setup adk_web_server with NativeAudioAgentLoader ...
  modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
  assert modalities == ["AUDIO"]

def test_non_native_audio_model_keeps_requested_modality():
  # ... setup adk_web_server with RegularAgentLoader ...
  modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
  assert modalities == ["TEXT"]

def test_agent_without_model_attribute():
  # ... setup adk_web_server with BaseAgentLoader ...
  modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
  assert modalities == ["TEXT"]

This would make the tests much stronger and easier to maintain.

Author


Got it, will fix this!

@AryanBagade
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request correctly forces the AUDIO modality for native-audio models by adding a check in _get_effective_modalities. The implementation is sound and includes comprehensive unit tests for various scenarios.

My main feedback is to refactor _get_effective_modalities to improve performance by avoiding redundant agent loading. The agent is currently loaded twice per request in run_agent_live. I've provided suggestions to pass the already-loaded agent object to the new method. This will make the implementation more efficient.

Comment on lines 553 to 575
  def _get_effective_modalities(
      self, app_name: str, requested_modalities: List[str]
  ) -> List[str]:
    """Determines effective modalities, forcing AUDIO for native-audio models.

    Native-audio models only support AUDIO modality. This method detects
    native-audio models by checking if the model name contains "native-audio"
    and forces AUDIO modality for those models.

    Args:
      app_name: The name of the application/agent.
      requested_modalities: The modalities requested by the client.

    Returns:
      The effective modalities to use.
    """
    agent_or_app = self.agent_loader.load_agent(app_name)
    root_agent = self._get_root_agent(agent_or_app)
    model = getattr(root_agent, "model", None)
    model_name = model if isinstance(model, str) else ""
    if "native-audio" in model_name:
      return ["AUDIO"]
    return requested_modalities
Contributor


Severity: medium

To improve efficiency, this method can be refactored to accept a root_agent: BaseAgent object directly instead of app_name: str. This avoids reloading the agent, as the root_agent is already available in the run_agent_live function after get_runner_async is called. Reloading the agent can be an expensive operation.

  def _get_effective_modalities(
      self, root_agent: BaseAgent, requested_modalities: List[str]
  ) -> List[str]:
    """Determines effective modalities, forcing AUDIO for native-audio models.

    Native-audio models only support AUDIO modality. This method detects
    native-audio models by checking if the model name contains "native-audio"
    and forces AUDIO modality for those models.

    Args:
      root_agent: The root agent of the application.
      requested_modalities: The modalities requested by the client.

    Returns:
      The effective modalities to use.
    """
    model = getattr(root_agent, "model", None)
    model_name = model if isinstance(model, str) else ""
    if "native-audio" in model_name:
      return ["AUDIO"]
    return requested_modalities

Comment on lines 1679 to 1682
        effective_modalities = self._get_effective_modalities(
            app_name, modalities
        )
        run_config = RunConfig(response_modalities=effective_modalities)
Contributor


Severity: medium

Following the suggested refactoring of _get_effective_modalities to accept a root_agent object, you should update this call site to pass runner.app.root_agent. This avoids loading the agent twice.

Note that you will also need to update the corresponding unit tests for _get_effective_modalities to reflect this change.

        effective_modalities = self._get_effective_modalities(
            runner.app.root_agent, modalities
        )
        run_config = RunConfig(response_modalities=effective_modalities)

@AryanBagade
Author

/gemini review

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request correctly identifies and fixes an issue where native-audio models could be used with incorrect modalities in /run_live. The approach of adding a _get_effective_modalities helper function is clean and isolates the logic well. The accompanying unit tests are thorough and cover the intended fix for various scenarios.

My main feedback is on the implementation of _get_effective_modalities, which doesn't fully handle all possible types for the model attribute on an agent, potentially leading to the fix not being applied in all cases. I've left a specific suggestion to make this more robust.

Overall, this is a good fix that improves the robustness of the web server.

Comment on lines +569 to +570
model = getattr(root_agent, "model", None)
model_name = model if isinstance(model, str) else ""
Contributor


Severity: high

The current logic for extracting the model name only handles the case where the model attribute is a string. The LlmAgent.model attribute can also be a BaseLlm object, in which case isinstance(model, str) would be false, model_name would become an empty string, and the check for "native-audio" would fail.

To make this more robust, you should also handle the case where model is an object (like BaseLlm) that has a model string attribute. It would also be beneficial to add a test case for an LlmAgent initialized with a BaseLlm object to ensure full coverage.

Suggested change (before → after):

  # Before
  model = getattr(root_agent, "model", None)
  model_name = model if isinstance(model, str) else ""

  # After
  model = getattr(root_agent, "model", None)
  model_name = ""
  if isinstance(model, str):
    model_name = model
  elif hasattr(model, "model") and isinstance(getattr(model, "model"), str):
    model_name = getattr(model, "model")


Development

Successfully merging this pull request may close these issues.

adk web: Native-audio models fail when TEXT modality is requested
