
[agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging, diagnostic logging#46364

Open
RaviPidaparthi wants to merge 13 commits into main from feature/spec-compliance-error-shapes

@RaviPidaparthi
Member

@RaviPidaparthi RaviPidaparthi commented Apr 17, 2026

What

Spec compliance improvements for azure-ai-agentserver-responses v1.0.0b2.

Changes

Error shapes

  • Error code field now uses spec-compliant values: "invalid_request_error" for 400/404, "server_error" for 500.
  • Deleted-resource errors return HTTP 404 (was 400).
  • Cancel terminal-state message updated to "Cannot cancel a response in terminal state.".
  • SSE replay rejection messages use spec-compliant wording.
  • Foundry storage errors are now caught explicitly and mapped to HTTP status codes (not found → 404, bad request → 400, other API errors → 500).
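A minimal sketch of the status-to-`error.code` mapping and the post-delete 404 rule above. Only the code values and the "deleted returns 404" behaviour come from this PR; the envelope shape (`{"error": {"code", "message"}}`), the `build_error_body` helper, and the not-found message wording are assumptions. The names `not_found_response()` and `deleted_response()` mirror the commit notes, but these bodies are illustrative, not the package's code.

```python
from typing import Any

# error.code values from this PR; other statuses are out of scope for the sketch.
_ERROR_CODES = {
    400: "invalid_request_error",
    404: "invalid_request_error",
    500: "server_error",
}


def build_error_body(status: int, message: str) -> dict[str, Any]:
    """Hypothetical helper returning an error envelope for the given status."""
    if status not in _ERROR_CODES:
        raise ValueError(f"no error.code mapping for HTTP {status}")
    return {"error": {"code": _ERROR_CODES[status], "message": message}}


def not_found_response(response_id: str) -> tuple[int, dict[str, Any]]:
    # Placeholder wording; the PR only says the 404 message contains the ID.
    return 404, build_error_body(404, f"Response with id '{response_id}' not found.")


def deleted_response(response_id: str) -> tuple[int, dict[str, Any]]:
    # Deleted resources are reported exactly like missing ones (404, not 400).
    return not_found_response(response_id)
```

Delegating `deleted_response()` to `not_found_response()` keeps a single source of truth, so a deleted ID and a never-created ID produce byte-identical responses.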

Eager eviction

  • Terminal responses (completed, failed, cancelled, incomplete) are immediately evicted from in-memory runtime state after persistence.
  • Subsequent GET/DELETE/Cancel operations fall through to the provider (storage) path.
  • store=false responses are also evicted; since nothing was persisted, subsequent operations return 404.
  • Adds try_evict() and mark_deleted() to _RuntimeState.

Chat isolation enforcement

  • When a response is created with x-agent-chat-isolation-key, all subsequent operations must include the same key.
  • Mismatched or missing keys return an indistinguishable 404.
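A sketch of the isolation check, assuming the stored key is `None` when the response was created without `x-agent-chat-isolation-key` (the backward-compatible case noted in the commit log). Both helper names are hypothetical, and using `hmac.compare_digest` is a choice of this sketch rather than something the PR states; it avoids leaking key prefixes through comparison timing.

```python
import hmac
from typing import Optional


def isolation_allows(stored_key: Optional[str], request_key: Optional[str]) -> bool:
    """Return True if a request carrying request_key may access the response."""
    if stored_key is None:
        # Created without a key: no enforcement (backward-compatible).
        return True
    if request_key is None:
        return False
    # Constant-time comparison so mismatches do not leak timing information.
    return hmac.compare_digest(request_key.encode("utf-8"), stored_key.encode("utf-8"))


def lookup_status(exists: bool, stored_key: Optional[str], request_key: Optional[str]) -> int:
    """Hypothetical handler decision: unknown IDs and key mismatches look identical."""
    if not exists or not isolation_allows(stored_key, request_key):
        return 404
    return 200
```

Folding "missing" and "wrong key" into the same 404 branch is what makes the responses indistinguishable: a caller cannot probe whether an ID exists under someone else's key.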

Malformed ID validation

  • All endpoints reject malformed response IDs with HTTP 400 before touching storage.
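A minimal sketch of the pre-storage ID check. The `resp_` prefix and the 16-character minimum are placeholders chosen for illustration; the real rules come from the package's `IdGenerator`, which this PR does not spell out. The error messages are also hypothetical.

```python
from typing import Any, Optional

# Placeholder rules for illustration only.
EXPECTED_PREFIX = "resp_"
MIN_LENGTH = 16


def validate_response_id(response_id: Any) -> Optional[str]:
    """Return an error message if response_id is malformed, else None."""
    if not isinstance(response_id, str) or not response_id.startswith(EXPECTED_PREFIX):
        return f"Malformed response ID: expected prefix {EXPECTED_PREFIX!r}."
    if len(response_id) < MIN_LENGTH:
        return f"Malformed response ID: must be at least {MIN_LENGTH} characters."
    return None


def validate_create_body(body: dict) -> Optional[str]:
    """previous_response_id in a POST body gets the same check when present."""
    previous = body.get("previous_response_id")
    return None if previous is None else validate_response_id(previous)
```

A handler would return HTTP 400 with `invalid_request_error` whenever either function returns a message, before any storage lookup happens.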

Storage logging

  • FoundryStorageLoggingPolicy: an Azure Core pipeline policy that logs Foundry storage HTTP calls.

Diagnostic logging

  • InboundRequestLoggingMiddleware: pure-ASGI middleware that logs every inbound HTTP request (method, path, status, duration, correlation headers, OTel trace ID). Requests with status >= 400 are logged at WARNING; query strings are excluded.
  • Handler-level INFO logs at all 5 endpoints (create, get, delete, cancel, input_items) with response ID, status, and output count.
  • Orchestrator handler invocation log with handler function name and response ID.

Tests

  • 846 tests pass, 1 skipped.
  • New test files: test_chat_isolation_enforcement.py, test_malformed_id_validation.py, test_eager_eviction.py, test_inbound_request_logging.py.

Align error payloads with the container-spec behaviour contract:

Error code compliance:
- error.code uses 'invalid_request_error' for 400/404 (was 'invalid_request',
  'not_found', 'invalid_mode')
- error.code uses 'server_error' for 500 (was 'internal_error')
- RequestValidationError default code updated to 'invalid_request_error'

Post-delete behaviour (spec alignment with .NET PR #58252):
- GET, input_items, and second DELETE on deleted responses now return 404
  (was 400)
- deleted_response() factory now delegates to not_found_response()

Cancel/SSE message alignment:
- Cancel incomplete: 'Cannot cancel a response in terminal state.'
  (was 'Cannot cancel an incomplete response.')
- SSE replay non-bg: 'This response cannot be streamed because it was not
  created with background=true.'
- SSE replay non-stream: '...stream=true.'

Storage error propagation:
- FoundryStorageError subclasses now explicitly caught in GET, cancel, and
  input_items handlers instead of being swallowed by broad except clauses
- FoundryResourceNotFoundError -> 404, FoundryBadRequestError -> 400,
  FoundryApiError -> error_response (500)
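The mapping above can be sketched as follows. The exception classes are stand-ins named after the ones in the commit message (their real hierarchy may differ), and `handle_get` is a hypothetical handler that shows why the storage errors are caught before the broad `except` clause.

```python
from typing import Callable


# Hypothetical stand-ins for the Foundry storage exception hierarchy.
class FoundryStorageError(Exception):
    pass


class FoundryResourceNotFoundError(FoundryStorageError):
    pass


class FoundryBadRequestError(FoundryStorageError):
    pass


class FoundryApiError(FoundryStorageError):
    pass


def status_for_storage_error(exc: FoundryStorageError) -> int:
    """Map a storage failure to the HTTP status the handler should return."""
    if isinstance(exc, FoundryResourceNotFoundError):
        return 404
    if isinstance(exc, FoundryBadRequestError):
        return 400
    # FoundryApiError and any other storage failure surface as 500.
    return 500


def handle_get(fetch: Callable[[], str]) -> tuple[int, str]:
    try:
        return 200, fetch()
    except FoundryStorageError as exc:
        # Caught before the broad clause so it maps to 404/400/500 correctly.
        return status_for_storage_error(exc), str(exc)
    except Exception:
        # Anything else is an unexpected server error.
        return 500, "internal error"
```

Checking the specific subclasses first means the mapping stays correct even if `FoundryApiError` later turns out to be a base class of the other two.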

Storage call logging:
- FoundryStorageLoggingPolicy: per-retry pipeline policy logging method, URI,
  status code, duration (ms), and correlation headers at the
  azure.ai.agentserver logger
- Replaces built-in HttpLoggingPolicy to avoid double-logging
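The per-attempt logging idea can be sketched without depending on azure-core. The `on_request`/`on_response` hook names echo azure-core's `SansIOHTTPPolicy`, but the request and response classes here are plain stand-ins, and the correlation-header list and INFO level are assumptions. In an azure-core pipeline, a policy placed after the retry policy runs once per attempt, which is presumably what "per-retry" means here.

```python
import logging
import time
from dataclasses import dataclass, field

logger = logging.getLogger("azure.ai.agentserver")

# Assumed correlation headers; the real policy's list is not given in the PR.
CORRELATION_HEADERS = ("x-request-id", "x-ms-client-request-id")


@dataclass
class FakeRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    status_code: int
    headers: dict = field(default_factory=dict)


class StorageLoggingPolicySketch:
    """Logs one line per HTTP attempt: method, URI, status, duration, correlation IDs."""

    def on_request(self, request: FakeRequest) -> None:
        # Stash the start time on the per-attempt context.
        request.context["start"] = time.perf_counter()

    def on_response(self, request: FakeRequest, response: FakeResponse) -> None:
        elapsed_ms = (time.perf_counter() - request.context["start"]) * 1000
        correlation = {}
        for name in CORRELATION_HEADERS:
            # Prefer the server's value, fall back to what the client sent.
            value = response.headers.get(name) or request.headers.get(name)
            if value:
                correlation[name] = value
        logger.info(
            "Foundry storage %s %s -> %d (%.1f ms) %s",
            request.method, request.url, response.status_code, elapsed_ms, correlation,
        )
```

Owning the log line in one policy is also why the built-in `HttpLoggingPolicy` is replaced rather than kept alongside it: two policies would log every call twice.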

Tests:
- Added error.code assertions to all existing error tests across
  cancel, delete, get, create, and input_items endpoint tests
- Updated post-delete tests from expecting 400 to 404
- Added new tests: SSE replay unknown ID, 404 message contains ID,
  500 error body shape, SSE replay message variants
- Added FoundryStorageLoggingPolicy unit tests (4 tests)
- 791 tests passing

Version bumped to 1.0.0b2.
Copilot AI review requested due to automatic review settings April 17, 2026 03:39
@github-actions github-actions bot added the Hosted Agents sdk/agentserver/* label Apr 17, 2026
Contributor

Copilot AI left a comment

Pull request overview

Aligns azure-ai-agentserver-responses behavior and error payload shapes with the container-spec contract (mirroring the .NET implementation), including post-delete 404 semantics and improved Foundry storage observability.

Changes:

  • Standardizes error.code values (invalid_request_error for 400/404, server_error for 500) and updates related endpoint behaviors/messages.
  • Changes post-delete behavior so GET/input_items/second DELETE return HTTP 404 and routes deleted responses through not_found_response().
  • Introduces FoundryStorageLoggingPolicy and wires it into the Foundry storage pipeline; expands handler mapping for Foundry storage errors.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

  • sdk/agentserver/azure-ai-agentserver-responses/tests/unit/test_foundry_logging_policy.py: Adds unit coverage for the new Foundry storage logging policy.
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_input_items_endpoint.py: Updates error envelope assertions and post-delete expectations (400 → 404).
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_get_endpoint.py: Adds/updates 404 error-shape and SSE replay rejection message assertions.
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_delete_endpoint.py: Updates delete-related error-code assertions and post-delete GET expectation (400 → 404).
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_cross_api_e2e.py: Adds assertions for spec-aligned cancel error message/code.
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_create_endpoint.py: Adds error.code assertions and validates 500 error envelope shape.
  • sdk/agentserver/azure-ai-agentserver-responses/tests/contract/test_cancel_endpoint.py: Extends test helper to assert error.code and updates expected messages.
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_provider.py: Replaces built-in HTTP logging policy with FoundryStorageLoggingPolicy in the async pipeline.
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/store/_foundry_logging_policy.py: Adds the custom per-retry logging policy implementation.
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/models/errors.py: Updates RequestValidationError default code.
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_validation.py: Updates error code mapping and makes deleted responses return 404 via not_found_response().
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/hosting/_endpoint_handler.py: Updates server error code shape, SSE replay rejection messages, and expands Foundry error handling.
  • sdk/agentserver/azure-ai-agentserver-responses/azure/ai/agentserver/responses/_version.py: Bumps package version to 1.0.0b2.
  • sdk/agentserver/azure-ai-agentserver-responses/CHANGELOG.md: Documents breaking changes and new logging policy for 1.0.0b2.

Chat isolation key enforcement:
- Store chat_isolation_key on ResponseExecution and _RuntimeState
- Enforce key matching on GET, DELETE, Cancel, and InputItems endpoints
- Mismatched/missing keys return indistinguishable 404
- Backward-compatible: no enforcement when created without a key

Malformed ID validation:
- All endpoints reject malformed response_id path params (wrong prefix,
  too short) with 400 before touching storage
- previous_response_id in POST body also validated
- Update existing tests using fake IDs to use well-formed IdGenerator IDs

14 chat isolation tests + 19 malformed ID tests (33 new, 824 total)
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, post-delete 404, storage logging [agentserver-responses] Spec compliance: error shapes, chat isolation, malformed ID validation, storage logging Apr 17, 2026
Port eager eviction from .NET PR #58252. After a response reaches a
terminal state (completed, failed, cancelled, incomplete), the in-memory
record is removed from RuntimeState so that subsequent GET, DELETE,
Cancel, and SSE replay requests fall through to the durable storage
provider.

Key changes:
- RuntimeState.try_evict(): removes terminal records while preserving
  chat isolation keys for provider-fallback enforcement
- RuntimeState.mark_deleted(): supports DELETE provider fallback
- Eviction wired into all 5 orchestrator terminal paths
  (bg non-stream, sync, bg+stream Path A, non-bg stream Path B, cancel)
- Provider fallback paths added to handle_get, handle_delete,
  handle_cancel for evicted responses
- B1 background check in cancel provider fallback (matches .NET)
- Cancel idempotency: cancelled responses return 200 via provider
- B2 stream/background checks in SSE replay provider fallback
- background + stream mode flags stamped on all persisted responses
- SSE events saved for replay after eviction (including fallback events)
- store=false cancel returns 404 (matching .NET)
- SSE datetime serialization fix in _build_sse_frame
- 9 new eager eviction unit tests
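The cancel provider-fallback rules listed above can be sketched as a single decision function. The terminal-state message is quoted from this PR; the B1 rejection message, the 400 status for both rejections, the order of the checks, and all names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})
TERMINAL_MESSAGE = "Cannot cancel a response in terminal state."
# Placeholder: the PR does not quote the B1 (background) rejection message.
NOT_BACKGROUND_MESSAGE = "Only background responses can be cancelled."


@dataclass
class StoredResponse:
    status: str
    background: bool


def cancel_from_provider(stored: Optional[StoredResponse]) -> tuple[int, str]:
    """Hypothetical cancel decision for a response already evicted from memory."""
    if stored is None:
        # Unknown ID, or store=false: nothing persisted to fall back to.
        return 404, "not found"
    if not stored.background:
        return 400, NOT_BACKGROUND_MESSAGE
    if stored.status == "cancelled":
        # Idempotent: a second cancel of a cancelled response still succeeds.
        return 200, "cancelled"
    if stored.status in TERMINAL_STATUSES:
        return 400, TERMINAL_MESSAGE
    # Eviction only happens at terminal states, so this should not occur.
    raise ValueError(f"unexpected non-terminal status {stored.status!r}")
```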
…sted responses

The stream flag is not part of the ResponseObject contract and should
not be persisted.  After eager eviction, the server cannot distinguish
bg+non-stream from bg+stream-with-expired-TTL, so the SSE replay
fallback now uses a combined error message matching .NET's
SseReplayResult:

  'This response cannot be streamed because it was not created with
   stream=true or the stream TTL has expired.'

Added TODO documenting the deliberate spec violation — the container
spec prescribes distinct error messages but the provider doesn't carry
enough context to distinguish the two cases.
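A sketch of the SSE replay fallback decision described above. Both message strings are taken from this PR; the function name, its inputs, and the tuple return shape are hypothetical. The point it illustrates is that, with no persisted `stream` flag, the only signals left after eviction are `background` and whether replay events were saved.

```python
from typing import Optional, Sequence

NOT_BACKGROUND = (
    "This response cannot be streamed because it was not created with background=true."
)
# Combined message: after eviction the server cannot tell bg+non-stream apart
# from bg+stream whose replay TTL has expired.
NOT_STREAM_OR_EXPIRED = (
    "This response cannot be streamed because it was not created with "
    "stream=true or the stream TTL has expired."
)


def sse_replay_fallback(background: bool, saved_events: Optional[Sequence[str]]):
    """Return ("replay", events) or ("error", message) for an evicted response."""
    if not background:
        return "error", NOT_BACKGROUND
    if not saved_events:
        # No saved events: either never streamed, or the TTL has expired.
        return "error", NOT_STREAM_OR_EXPIRED
    return "replay", list(saved_events)
```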
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, chat isolation, malformed ID validation, storage logging [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging Apr 17, 2026
…andler diagnostic logging

- Add InboundRequestLoggingMiddleware (pure ASGI): logs method, path, status,
  duration, correlation headers (x-request-id, x-ms-client-request-id), and
  OTel trace ID. Status >= 400 → WARNING; exceptions → forced 500 WARNING.
  Query strings are excluded from logs.
- Add INFO-level handler diagnostic logs to all 5 endpoints: create (params),
  get (entry + retrieval), delete (entry + success), cancel (entry + success),
  input_items (entry).
- Add orchestrator handler invocation log with handler function name.
- Wire middleware in ResponsesAgentServerHost via add_middleware().
- 13 new contract tests for middleware and handler logging.
- Update CHANGELOG.md with logging features.

Matches .NET PR #58274 (InboundRequestLoggingMiddleware + handler logging).
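A pure-ASGI sketch of the middleware behaviour listed above: one log line per HTTP request, WARNING for status >= 400, a forced 500 when the app raises, and no query string. It is not the package's `InboundRequestLoggingMiddleware`; the class name, the `trace_id_provider` hook (standing in for an OTel trace-ID lookup), and the log format are hypothetical.

```python
import logging
import time

logger = logging.getLogger("azure.ai.agentserver")

# ASGI header names are lowercase bytes.
CORRELATION_HEADERS = (b"x-request-id", b"x-ms-client-request-id")


class InboundRequestLoggingSketch:
    """Pure-ASGI middleware logging one line per HTTP request (illustrative)."""

    def __init__(self, app, trace_id_provider=None):
        self.app = app
        # Hypothetical hook, e.g. a function returning the current OTel trace ID.
        self.trace_id_provider = trace_id_provider

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = 500  # assumed until the app sends response headers
        headers = dict(scope.get("headers", []))
        correlation = {
            name.decode(): headers[name].decode()
            for name in CORRELATION_HEADERS
            if name in headers
        }

        async def send_wrapper(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        except Exception:
            status = 500  # exceptions are logged as a forced 500
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            trace_id = self.trace_id_provider() if self.trace_id_provider else None
            level = logging.WARNING if status >= 400 else logging.INFO
            # scope["path"] has no query string; scope["query_string"] is never logged.
            logger.log(level, "%s %s -> %d (%.1f ms) correlation=%s trace_id=%s",
                       scope["method"], scope["path"], status, elapsed_ms,
                       correlation, trace_id)
```

Because it depends only on the ASGI `scope`/`receive`/`send` contract, a class like this can wrap any ASGI app, which fits the later move of the middleware into azure-ai-agentserver-core for all protocol hosts.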
@RaviPidaparthi RaviPidaparthi changed the title [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging [agentserver-responses] Spec compliance: error shapes, eager eviction, chat isolation, storage logging, diagnostic logging Apr 17, 2026
return None


class InboundRequestLoggingMiddleware:
Member

Should we move this to core?

The azure-ai-agentserver-core package was bumped to 2.0.0b1, but the
githubcopilot package still had the old <1.0.0b18 upper bound, which
caused the Analyze dependencies CI gate to fail.
Moves the middleware from azure-ai-agentserver-responses to
azure-ai-agentserver-core so all protocol hosts get consistent
inbound request logging automatically.

- Created _middleware.py in core with the middleware class
- Wired into AgentServerHost.__init__ middleware list
- Exported from core __init__.py
- Removed explicit add_middleware() call from ResponsesAgentServerHost
- Updated CHANGELOG to reflect the move

Addresses review feedback from @ankitbko.
- core 2.0.0b2: Added InboundRequestLoggingMiddleware, CHANGELOG updated
- invocations 1.0.0b2: Core dep bumped to >=2.0.0b2, CHANGELOG updated
- responses: Core dep bumped to >=2.0.0b2

Labels

Hosted Agents sdk/agentserver/*

Projects

None yet

4 participants