
feat: Tabular SK analysis — pagination, auto-trim, return_columns forwarding, and 100 K handoff limit (#893) #894

Open

vivche wants to merge 14 commits into microsoft:Development from vivche:feature/tabular-sk-enhancements-ui

Conversation

@vivche (Contributor) commented May 8, 2026

Description

Summary

This PR delivers five enhancements to the Tabular Semantic Kernel analysis pipeline that together prevent silent data truncation on large Excel / CSV files and give the LLM finer control over what data it retrieves per tool call.


Change 1 — Pagination (start_row / max_rows / has_more / next_start_row)

Every analysis tool in TabularProcessingPlugin now accepts:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `start_row` | str | `"0"` | Zero-based first row to return |
| `max_rows` | str | `"100"` | Maximum rows to return in one call |

Every tool response now includes:

| Field | Description |
|---|---|
| `has_more` | `true` when rows were truncated by the page limit |
| `next_start_row` | Value to pass as `start_row` on the next call |
| `total_matched` | Total rows that matched the filter (when computable) |

The LLM is instructed (via the @kernel_function docstrings) to call the same tool again with start_row = next_start_row when has_more is true.
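
In code terms, the contract looks roughly like the sketch below. This is a hedged illustration, not the PR's implementation: `slice_page` is a hypothetical helper, and the real tools are `@kernel_function` methods taking string parameters.

```python
# Illustrative sketch of the pagination contract (slice_page is a
# hypothetical helper; the PR's tools are @kernel_function methods).
import pandas as pd

def slice_page(df: pd.DataFrame, start_row: str = "0", max_rows: str = "100") -> dict:
    start, limit = int(start_row), int(max_rows)
    page = df.iloc[start:start + limit]
    has_more = start + limit < len(df)
    return {
        "rows": page.to_dict(orient="records"),
        "has_more": has_more,
        "next_start_row": str(start + limit) if has_more else None,
        "total_matched": len(df),
    }

# How a caller (or the LLM) pages through a large result:
df = pd.DataFrame({"id": range(250)})
resp = slice_page(df)
while resp["has_more"]:
    resp = slice_page(df, start_row=resp["next_start_row"])
```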


Change 2 — Auto-trim (_auto_trim_df_for_output)

A new private helper estimates the serialised JSON size from a 20-row sample. When the estimated output would exceed the 50 K-char per-tool budget and return_columns was not supplied by the caller:

  1. Phase 1 — Column drop: columns are dropped heaviest-first (by average serialised length) until the estimate fits within budget, preserving at least one column.
  2. Phase 2 — Row truncation: if the estimate is still over budget after all but one column is retained, rows are truncated so that has_more = True fires and pagination picks up the remainder.

A trimmed_columns_excluded field in the tool response lists any dropped columns so the LLM knows what was omitted.
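
For intuition, here is a minimal sketch of the two-phase trim; the helper name matches the PR, but the body is an assumption reconstructed from this description, not the actual code.

```python
# Hedged sketch of _auto_trim_df_for_output: estimate serialized size from a
# 20-row sample, drop heaviest columns first, then truncate rows as a last resort.
import pandas as pd

def _auto_trim_df_for_output(df: pd.DataFrame, max_chars: int = 50_000,
                             sample_rows: int = 20):
    sample = df.head(sample_rows)
    # Average serialized length per column (value plus key), heaviest first.
    col_cost = {c: sample[c].astype(str).str.len().mean() + len(c)
                for c in df.columns}

    def estimate(frame: pd.DataFrame) -> int:
        per_row = sum(col_cost[c] for c in frame.columns) or 1
        return int(per_row * len(frame))

    kept, dropped = df, []
    # Phase 1: drop the heaviest columns until the estimate fits, keeping >= 1.
    for col in sorted(col_cost, key=col_cost.get, reverse=True):
        if estimate(kept) <= max_chars or len(kept.columns) == 1:
            break
        kept = kept.drop(columns=[col])
        dropped.append(col)
    # Phase 2: still over budget -> truncate rows; pagination resumes the rest.
    if estimate(kept) > max_chars:
        per_row = max(1, int(sum(col_cost[c] for c in kept.columns)))
        kept = kept.head(max(1, max_chars // per_row))
    return kept, dropped  # dropped feeds trimmed_columns_excluded
```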


Change 3 — return_columns forwarding

The existing return_columns parameter was already accepted by some tools but not threaded through to the DataFrame slice. This PR ensures it is applied consistently across all eleven analysis tools and that _auto_trim_df_for_output is skipped whenever return_columns is supplied (the caller has already projected to the columns they need).
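
Conceptually the interaction is simple (illustrative names, reusing the `_auto_trim_df_for_output` sketch above):

```python
# Illustrative: return_columns projects the frame and bypasses auto-trim.
def project(df, return_columns=None, max_chars=50_000):
    if return_columns:
        cols = [c for c in return_columns if c in df.columns]
        return df[cols], []  # caller already chose its columns; no auto-trim
    return _auto_trim_df_for_output(df, max_chars)  # otherwise trim to budget
```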


Change 4 — Handoff limit raised to 100 K

max_handoff_chars in route_backend_chats.py was raised from 24_000 to 100_000. A WARNING-level log_event is emitted when truncation still occurs, including the original and truncated lengths, so operators can detect and investigate cases where even 100 K is insufficient.
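
A sketch of the guard, using stdlib logging as a stand-in for log_event (whose real signature in route_backend_chats.py may differ):

```python
import logging

MAX_HANDOFF_CHARS = 100_000  # raised from 24_000

def truncate_handoff(text: str) -> str:
    if len(text) > MAX_HANDOFF_CHARS:
        # Emit original and truncated lengths so operators can investigate.
        logging.warning("Tabular handoff truncated: original=%d chars, kept=%d chars",
                        len(text), MAX_HANDOFF_CHARS)
        return text[:MAX_HANDOFF_CHARS]
    return text
```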


Change 5 — Elapsed timing logging

Each SK tabular analysis invocation now logs the wall-clock elapsed time at INFO level, making it straightforward to correlate slow responses with large files or high row counts.
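
For illustration, the timing could be captured with a wrapper like the one below; per the commit notes, the PR actually records an `_analysis_start_time` and logs per-attempt elapsed times inline rather than using a helper.

```python
import logging
import time

def run_with_timing(analysis_fn, *args, **kwargs):
    start = time.monotonic()
    try:
        return analysis_fn(*args, **kwargs)
    finally:
        # INFO-level wall-clock timing for correlating slow responses
        # with large files or high row counts.
        logging.info("SK tabular analysis finished in %.2f s",
                     time.monotonic() - start)
```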


Files Changed

| File | Change |
|---|---|
| `application/single_app/semantic_kernel_plugins/tabular_processing_plugin.py` | Pagination params & response fields on all 11 analysis tools; `_auto_trim_df_for_output`; `return_columns` forwarding |
| `application/single_app/route_backend_chats.py` | `max_handoff_chars` 24 K → 100 K; truncation WARNING log; elapsed timing |
| `docs/explanation/feature/v0.241.008/TABULAR_SK_PAGINATION_AUTOTRIM_AND_HANDOFF_TRUNCATION.md` | Fix documentation |

Testing

  • Tested with an Excel file containing over 1,000 rows, using gpt-5.1 models.
  • Pagination confirmed: second filter_rows call with start_row = next_start_row returned the correct continuation page.
  • Auto-trim confirmed: wide freetext columns were dropped on oversized results; trimmed_columns_excluded listed the omitted columns in the response.
  • return_columns confirmed: tool response contained only the requested columns with no auto-trim applied.
  • Handoff limit confirmed: analysis results previously truncated at 24 K now pass through at up to 100 K; truncation WARNING fires only for pathological cases.
  • Elapsed timing log lines confirmed in dev server output.

Chen, Vivien and others added 14 commits March 4, 2026 23:28
…ntent

- Rename all spec files to use sample_ prefix for consistency
- Convert asset spec files from .json to .yaml format
- Sync spec content (security, components, paths) with Cosmos DB source of truth
- Replace hardcoded dev222288 instance URL with YOUR-INSTANCE placeholder
- Add missing sysparm_input_display_value param to incident update operation
- Expand updateIncident requestBody with full field set and descriptions
- Alphabetize Incident and Asset schema properties across all specs
- Update SERVICENOW_ASSET_MANAGEMENT_SETUP.md links to new filenames/paths
…agent prompts

- Replace passive 'extract the INSTANCE part' guidance with explicit step-by-step
  instructions for deriving the real instance subdomain from the plugin base URL
- Add SELF-CHECK rule: never output 'INSTANCE', 'YOUR-INSTANCE', or 'yourinstance'
  as placeholder text in any displayed URL
- Rewrite HOW TO EXTRACT section to use generic [instance-name] pattern (template-friendly)
  with dev222288 kept only as a concrete example
- Apply consistent wording to both servicenow_agent_instructions.txt and
  servicenow_kb_management_agent_instructions.txt
… stats queries

- Switch all stats date filters from sys_created_on to opened_at to match portal counts
- Add YEAR QUERY RULE: named year always uses YYYY-MM-DD range, never 'This year'
- Add two-call pattern for breakdowns: grand total + grouped (handles null-category records)
- Replace ASCII bar chart with markdown table (avoids rendering as solid black boxes)
- Add (No category) row handling for blank groupby_value responses
- Enforce exact table structure: single Total row, no split totals
- Fix text search to always query both short_description AND description fields
- Update OpenAPI spec to match: opened_at for stats, dual-field text search, sysparm_count required=true
- Add FIELD RULE explaining opened_at vs sys_created_on distinction
Fix 1: run_tabular_sk_analysis always used the default Azure OpenAI endpoint
from app settings, causing DeploymentNotFound 404 when the active chat model
was resolved via the multi-endpoint feature. Added gpt_endpoint,
gpt_api_version_override, gpt_auth, and gpt_provider override parameters to
run_tabular_sk_analysis and run_tabular_analysis_with_multi_file_support, and
updated all four call sites to pass multi-endpoint credentials when active.

Fix 2: Semantic Kernel's KernelArguments type-coercion on Python 3.13 raises
FunctionExecutionException for Optional[str] parameters typed as
'typing.Optional[str]'. Updated all Optional[str] kernel function parameters
in tabular_processing_plugin.py to use 'str | None' union syntax, which is
natively supported on Python 3.13 (see the before/after sketch at the end of this commit list).
…3_OPTIONAL_FIX.md

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
…rd, elapsed logging (v0.241.009)

- Add _auto_trim_df_for_output(max_chars=50K): two-phase trim (drop heavy cols, truncate rows)
- _filter_rows_across_sheets: add return_columns/start_row, remove capacity cap, add note/has_more fields
- _query_tabular_data_across_sheets: same pattern as filter_rows cross-sheet
- filter_rows public: add return_columns + start_row params, single-sheet pagination + auto-trim
- query_tabular_data public: same as filter_rows
- route_backend_chats: import time, _analysis_start_time, per-attempt elapsed, 20K->100K truncation guard, total_elapsed_seconds at exit points
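
As a reference for the Fix 2 commit above, the Optional[str] change reduces to an annotation swap. The sketch below is simplified: the real parameters live on `@kernel_function` methods of TabularProcessingPlugin.

```python
from typing import Optional

# Before: on Python 3.13, Semantic Kernel's KernelArguments coercion raised
# FunctionExecutionException for parameters annotated with typing.Optional.
def filter_rows_before(return_columns: Optional[str] = None) -> str:
    ...

# After: PEP 604 union syntax, which the coercion handles natively.
def filter_rows_after(return_columns: str | None = None) -> str:
    ...
```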
