-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fix: Force AUDIO modality for native-audio models in /run_live (#4206) #4232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Force AUDIO modality for native-audio models in /run_live (#4206) #4232
Conversation
…e#4206) Signed-off-by: Aryan Bagade <[email protected]>
Summary of ChangesHello @AryanBagade, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where specific 'native-audio' models were not correctly handling modality requests in the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly forces the AUDIO modality for native-audio models in the /run_live endpoint. The implementation is sound and includes a safe check for the model attribute. The new unit tests cover the different cases for native-audio, non-native, and agents without a model attribute. My main feedback is regarding the structure of the tests, which currently test implementation details and involve code duplication. I've suggested a refactoring that would improve test quality and maintainability.
| def test_native_audio_model_forces_audio_modality(): | ||
| """Test that native-audio models force AUDIO modality in run_live endpoint.""" | ||
| from google.adk.cli.adk_web_server import AdkWebServer | ||
|
|
||
| # Create an LlmAgent with a native-audio model | ||
| native_audio_agent = LlmAgent( | ||
| name="native_audio_agent", | ||
| model="gemini-live-2.5-flash-native-audio", | ||
| ) | ||
|
|
||
| class NativeAudioAgentLoader: | ||
|
|
||
| def load_agent(self, app_name): | ||
| return native_audio_agent | ||
|
|
||
| def list_agents(self): | ||
| return ["test_native_audio_app"] | ||
|
|
||
| session_service = AsyncMock() | ||
| session = Session( | ||
| id="session_id", | ||
| app_name="test_native_audio_app", | ||
| user_id="user", | ||
| state={}, | ||
| events=[], | ||
| ) | ||
| session_service.get_session.return_value = session | ||
|
|
||
| adk_web_server = AdkWebServer( | ||
| agent_loader=NativeAudioAgentLoader(), | ||
| session_service=session_service, | ||
| memory_service=MagicMock(), | ||
| artifact_service=MagicMock(), | ||
| credential_service=MagicMock(), | ||
| eval_sets_manager=MagicMock(), | ||
| eval_set_results_manager=MagicMock(), | ||
| agents_dir=".", | ||
| ) | ||
|
|
||
| # Verify the _get_root_agent method works correctly | ||
| root_agent = adk_web_server._get_root_agent(native_audio_agent) | ||
| assert root_agent.model == "gemini-live-2.5-flash-native-audio" | ||
| assert "native-audio" in root_agent.model | ||
|
|
||
| # Verify the model name detection logic | ||
| model_name = ( | ||
| root_agent.model | ||
| if hasattr(root_agent, "model") and isinstance(root_agent.model, str) | ||
| else "" | ||
| ) | ||
| assert "native-audio" in model_name | ||
|
|
||
| # Test with App wrapping the agent | ||
| app_agent = App(name="test_app", root_agent=native_audio_agent) | ||
| root_agent_from_app = adk_web_server._get_root_agent(app_agent) | ||
| assert root_agent_from_app.model == "gemini-live-2.5-flash-native-audio" | ||
|
|
||
|
|
||
| def test_non_native_audio_model_keeps_requested_modality(): | ||
| """Test that non-native-audio models keep the requested modality.""" | ||
| from google.adk.cli.adk_web_server import AdkWebServer | ||
|
|
||
| # Create an LlmAgent with a regular model (not native-audio) | ||
| regular_agent = LlmAgent( | ||
| name="regular_agent", | ||
| model="gemini-2.5-flash", | ||
| ) | ||
|
|
||
| class RegularAgentLoader: | ||
|
|
||
| def load_agent(self, app_name): | ||
| return regular_agent | ||
|
|
||
| def list_agents(self): | ||
| return ["test_regular_app"] | ||
|
|
||
| adk_web_server = AdkWebServer( | ||
| agent_loader=RegularAgentLoader(), | ||
| session_service=MagicMock(), | ||
| memory_service=MagicMock(), | ||
| artifact_service=MagicMock(), | ||
| credential_service=MagicMock(), | ||
| eval_sets_manager=MagicMock(), | ||
| eval_set_results_manager=MagicMock(), | ||
| agents_dir=".", | ||
| ) | ||
|
|
||
| root_agent = adk_web_server._get_root_agent(regular_agent) | ||
| model_name = ( | ||
| root_agent.model | ||
| if hasattr(root_agent, "model") and isinstance(root_agent.model, str) | ||
| else "" | ||
| ) | ||
|
|
||
| # For regular models, the modality should NOT be forced to AUDIO | ||
| assert "native-audio" not in model_name | ||
|
|
||
|
|
||
| def test_agent_without_model_attribute(): | ||
| """Test handling of agents without model attribute (BaseAgent).""" | ||
| from google.adk.cli.adk_web_server import AdkWebServer | ||
|
|
||
| # Create a BaseAgent (which doesn't have a model attribute) | ||
| base_agent = DummyAgent(name="base_agent") | ||
|
|
||
| class BaseAgentLoader: | ||
|
|
||
| def load_agent(self, app_name): | ||
| return base_agent | ||
|
|
||
| def list_agents(self): | ||
| return ["test_base_app"] | ||
|
|
||
| adk_web_server = AdkWebServer( | ||
| agent_loader=BaseAgentLoader(), | ||
| session_service=MagicMock(), | ||
| memory_service=MagicMock(), | ||
| artifact_service=MagicMock(), | ||
| credential_service=MagicMock(), | ||
| eval_sets_manager=MagicMock(), | ||
| eval_set_results_manager=MagicMock(), | ||
| agents_dir=".", | ||
| ) | ||
|
|
||
| root_agent = adk_web_server._get_root_agent(base_agent) | ||
|
|
||
| # BaseAgent (DummyAgent) doesn't have a model attribute | ||
| model_name = ( | ||
| root_agent.model | ||
| if hasattr(root_agent, "model") and isinstance(root_agent.model, str) | ||
| else "" | ||
| ) | ||
|
|
||
| # Should default to empty string when no model attribute | ||
| assert model_name == "" | ||
| assert "native-audio" not in model_name | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new tests are a good start, but they have a couple of issues that make them less effective and harder to maintain:
-
They don't test the actual behavior. For example,
test_native_audio_model_forces_audio_modalityasserts that the model name containsnative-audio, but it doesn't verify that theresponse_modalitiesare actually forced to['AUDIO']. The tests are verifying implementation details rather than the behavior described by their names. -
There is significant code duplication. The logic to extract the
model_nameis copied fromadk_web_server.pyand then duplicated across all three new tests. This makes the tests brittle; if the implementation inadk_web_server.pychanges, these tests will need to be updated in multiple places, or worse, they might pass while the actual logic is broken.
A better approach would be to refactor the logic for determining the effective_modalities in adk_web_server.py into its own helper method. This would make the logic in run_agent_live cleaner and allow you to test the behavior directly and without duplication.
For example, you could add a method to AdkWebServer (this would be outside the current diff, in adk_web_server.py):
def _get_effective_modalities(self, app_name: str, requested_modalities: list[str]) -> list[str]:
"""Determines the effective modalities, forcing AUDIO for native-audio models."""
agent_or_app = self.agent_loader.load_agent(app_name)
root_agent = self._get_root_agent(agent_or_app)
model = getattr(root_agent, "model", None)
model_name = model if isinstance(model, str) else ""
if "native-audio" in model_name:
return ["AUDIO"]
return requested_modalitiesThen your tests could be simplified to something like this, which is much more direct and robust:
def test_native_audio_model_forces_audio_modality():
# ... setup adk_web_server with NativeAudioAgentLoader ...
modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
assert modalities == ["AUDIO"]
def test_non_native_audio_model_keeps_requested_modality():
# ... setup adk_web_server with RegularAgentLoader ...
modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
assert modalities == ["TEXT"]
def test_agent_without_model_attribute():
# ... setup adk_web_server with BaseAgentLoader ...
modalities = adk_web_server._get_effective_modalities("test_app", ["TEXT"])
assert modalities == ["TEXT"]This would make the tests much stronger and easier to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, will fix this!
Signed-off-by: Aryan Bagade <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly forces the AUDIO modality for native-audio models by adding a check in _get_effective_modalities. The implementation is sound and includes comprehensive unit tests for various scenarios.
My main feedback is to refactor _get_effective_modalities to improve performance by avoiding redundant agent loading. The agent is currently loaded twice per request in run_agent_live. I've provided suggestions to pass the already-loaded agent object to the new method. This will make the implementation more efficient.
| def _get_effective_modalities( | ||
| self, app_name: str, requested_modalities: List[str] | ||
| ) -> List[str]: | ||
| """Determines effective modalities, forcing AUDIO for native-audio models. | ||
| Native-audio models only support AUDIO modality. This method detects | ||
| native-audio models by checking if the model name contains "native-audio" | ||
| and forces AUDIO modality for those models. | ||
| Args: | ||
| app_name: The name of the application/agent. | ||
| requested_modalities: The modalities requested by the client. | ||
| Returns: | ||
| The effective modalities to use. | ||
| """ | ||
| agent_or_app = self.agent_loader.load_agent(app_name) | ||
| root_agent = self._get_root_agent(agent_or_app) | ||
| model = getattr(root_agent, "model", None) | ||
| model_name = model if isinstance(model, str) else "" | ||
| if "native-audio" in model_name: | ||
| return ["AUDIO"] | ||
| return requested_modalities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve efficiency, this method can be refactored to accept a root_agent: BaseAgent object directly instead of app_name: str. This avoids reloading the agent, as the root_agent is already available in the run_agent_live function after get_runner_async is called. Reloading the agent can be an expensive operation.
def _get_effective_modalities(
self, root_agent: BaseAgent, requested_modalities: List[str]
) -> List[str]:
"""Determines effective modalities, forcing AUDIO for native-audio models.
Native-audio models only support AUDIO modality. This method detects
native-audio models by checking if the model name contains "native-audio"
and forces AUDIO modality for those models.
Args:
root_agent: The root agent of the application.
requested_modalities: The modalities requested by the client.
Returns:
The effective modalities to use.
"""
model = getattr(root_agent, "model", None)
model_name = model if isinstance(model, str) else ""
if "native-audio" in model_name:
return ["AUDIO"]
return requested_modalities| effective_modalities = self._get_effective_modalities( | ||
| app_name, modalities | ||
| ) | ||
| run_config = RunConfig(response_modalities=effective_modalities) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the suggested refactoring of _get_effective_modalities to accept a root_agent object, you should update this call site to pass runner.app.root_agent. This avoids loading the agent twice.
Note that you will also need to update the corresponding unit tests for _get_effective_modalities to reflect this change.
effective_modalities = self._get_effective_modalities(
runner.app.root_agent, modalities
)
run_config = RunConfig(response_modalities=effective_modalities)Signed-off-by: Aryan Bagade <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly identifies and fixes an issue where native-audio models could be used with incorrect modalities in /run_live. The approach of adding a _get_effective_modalities helper function is clean and isolates the logic well. The accompanying unit tests are thorough and cover the intended fix for various scenarios.
My main feedback is on the implementation of _get_effective_modalities, which doesn't fully handle all possible types for the model attribute on an agent, potentially leading to the fix not being applied in all cases. I've left a specific suggestion to make this more robust.
Overall, this is a good fix that improves the robustness of the web server.
| model = getattr(root_agent, "model", None) | ||
| model_name = model if isinstance(model, str) else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic for extracting the model name only handles the case where the model attribute is a string. The LlmAgent.model attribute can also be a BaseLlm object, in which case isinstance(model, str) would be false, model_name would become an empty string, and the check for "native-audio" would fail.
To make this more robust, you should also handle the case where model is an object (like BaseLlm) that has a model string attribute. It would also be beneficial to add a test case for an LlmAgent initialized with a BaseLlm object to ensure full coverage.
| model = getattr(root_agent, "model", None) | |
| model_name = model if isinstance(model, str) else "" | |
| model = getattr(root_agent, "model", None) | |
| model_name = "" | |
| if isinstance(model, str): | |
| model_name = model | |
| elif hasattr(model, "model") and isinstance(getattr(model, "model"), str): | |
| model_name = getattr(model, "model") |
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
Testing Plan
Unit Tests:
pytest results:
35 passed, 71 warnings in 3.90s
Added 3 new tests:
test_native_audio_model_forces_audio_modality- Verifies native-audio models force AUDIO modalitytest_non_native_audio_model_keeps_requested_modality- Verifies regular models keep requested modalitytest_agent_without_model_attribute- Verifies BaseAgent (no model attr) doesn't crashManual End-to-End (E2E) Tests:
To test manually:
model="gemini-live-2.5-flash-native-audio"/run_liveWebSocket withmodalities=["TEXT"]["AUDIO"]instead of failingFull E2E testing was not possible as it requires access to
gemini-live-2.5-flash-native-audiomodel which is not publicly available.Checklist
Additional context
Added
hasattr()check to safely handleBaseAgentinstances that don't have amodelattribute (improvement over issue's proposed solution).