From e325b5dbdceb06a52e076cc270f202f563cf2838 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 25 Mar 2026 15:08:28 +0000
Subject: [PATCH] docs: replace dynamic sitemap with static sitemap and enrich docstrings

Removes the sphinx_sitemap extension and its configuration in docs/conf.py.
Adds a static docs/sitemap.xml with the core URLs requested for Google Search
indexing, and copies it to the root using html_extra_path.

Also enriches index.rst, user_guide/index.rst, reference/index.rst,
bigframes/pandas/__init__.py, bigframes/bigquery/__init__.py, and
bigframes/bigquery/ai.py with targeted keywords emphasizing use-cases for data
scientists, data engineers, and data analysts. This addresses potential "thin"
content concerns by making the pages more informative and relevant.

Co-authored-by: tswast <247555+tswast@users.noreply.github.com>
---
 bigframes/bigquery/__init__.py | 28 ++++++++++++++--------------
 bigframes/bigquery/ai.py       | 34 ++++++++++++++++++----------------
 bigframes/pandas/__init__.py   | 28 +++++++++++++++-------------
 docs/conf.py                   | 12 +-----------
 docs/index.rst                 | 12 ++++++------
 docs/reference/index.rst       | 14 +++++++++++---
 docs/sitemap.xml               | 21 +++++++++++++++++++++
 docs/user_guide/index.rst      |  4 ++++
 8 files changed, 90 insertions(+), 63 deletions(-)
 create mode 100644 docs/sitemap.xml

diff --git a/bigframes/bigquery/__init__.py b/bigframes/bigquery/__init__.py
index f083887045..64338f4b0c 100644
--- a/bigframes/bigquery/__init__.py
+++ b/bigframes/bigquery/__init__.py
@@ -16,30 +16,30 @@
 Access BigQuery-specific operations and namespaces within BigQuery DataFrames.
 
 This module provides specialized functions and sub-modules that expose BigQuery's
-advanced capabilities to DataFrames and Series. It acts as a bridge between the
-pandas-compatible API and the full power of BigQuery SQL.
+advanced analytics capabilities directly to DataFrames and Series. Designed for data scientists,
+data engineers, and data analysts, it acts as a bridge between the intuitive
+pandas-compatible API and the massive scale and power of BigQuery SQL.
 
 Key sub-modules include:
 
-* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, BQML).
-* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations.
-* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables.
+* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, LLMs, BQML) for AI developers and data scientists.
+* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations for building scalable ML pipelines.
+* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables, essential for handling unstructured data like images and PDFs.
 
-This module also provides direct access to optimized BigQuery functions for:
+This module also provides direct access to optimized BigQuery functions tailored for data engineering and advanced analytics workflows:
 
 * **JSON Processing:** High-performance functions like ``json_extract``, ``json_value``,
-  and ``parse_json`` for handling semi-structured data.
+  and ``parse_json`` for transforming semi-structured log data.
 * **Geospatial Analysis:** Comprehensive geographic functions such as ``st_area``,
-  ``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions).
+  ``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions) to unlock location-based insights.
 * **Array Operations:** Tools for working with BigQuery arrays, including ``array_agg``
-  and ``array_length``.
+  and ``array_length``, handling nested repeated fields efficiently.
 * **Vector Search:** Integration with BigQuery's vector search and indexing
-  capabilities for high-dimensional data.
-* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets for
-  advanced operations not yet directly mapped in the API.
+  capabilities for high-dimensional data, semantic search, and RAG architectures.
+* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets, giving data engineers an escape hatch for complex, custom BigQuery operations.
 
-By using these functions, you can leverage BigQuery's high-performance engine for
-domain-specific tasks while maintaining a Python-centric development experience.
+By using these functions, data professionals can leverage BigQuery's distributed compute engine for
+domain-specific tasks at petabyte scale, while maintaining a productive Python-centric development experience.
 
 For the full list of BigQuery standard SQL functions, see:
 https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference
diff --git a/bigframes/bigquery/ai.py b/bigframes/bigquery/ai.py
index 25a7df7781..4450a52062 100644
--- a/bigframes/bigquery/ai.py
+++ b/bigframes/bigquery/ai.py
@@ -15,28 +15,30 @@
 """
 Integrate BigQuery built-in AI functions into your BigQuery DataFrames workflow.
 
-The ``bigframes.bigquery.ai`` module provides a Pythonic interface to leverage BigQuery ML's
-generative AI and predictive functions directly on BigQuery DataFrames and Series objects.
-These functions enable you to perform advanced AI tasks at scale without moving data
-out of BigQuery.
+The ``bigframes.bigquery.ai`` module provides a powerful, Pythonic interface for data scientists
+and data engineers to leverage BigQuery ML's Generative AI, Large Language Models (LLMs),
+and predictive functions directly on big data via BigQuery DataFrames and Series objects.
+These functions enable AI developers to construct scalable MLOps pipelines and perform advanced AI
+tasks—such as automated text generation and semantic search—without moving data out of BigQuery's
+secure perimeter.
 
-Key capabilities include:
+Key capabilities for AI workflows include:
 
-* **Generative AI:** Use :func:`bigframes.bigquery.ai.generate` (Gemini) to
-  perform text analysis, translation, or
-  content generation. Specialized versions like
+* **Generative AI & LLMs (Gemini):** Use :func:`bigframes.bigquery.ai.generate`
+  to orchestrate Gemini models for text analysis, translation, summarization, or
+  content generation directly on big data. Specialized versions like
   :func:`~bigframes.bigquery.ai.generate_bool`,
   :func:`~bigframes.bigquery.ai.generate_int`, and
   :func:`~bigframes.bigquery.ai.generate_double` are available for structured
-  outputs.
-* **Embeddings:** Generate vector embeddings for text using
-  :func:`~bigframes.bigquery.ai.generate_embedding`, which are essential for
-  semantic search and retrieval-augmented generation (RAG) workflows.
-* **Classification and Scoring:** Apply machine learning models to your data for
-  predictive tasks with :func:`~bigframes.bigquery.ai.classify` and
-  :func:`~bigframes.bigquery.ai.score`.
+  outputs, perfect for data pipelines.
+* **Embeddings & Semantic Search:** Generate vector embeddings for text using
+  :func:`~bigframes.bigquery.ai.generate_embedding`. Essential for modern data science,
+  enabling robust semantic search and Retrieval-Augmented Generation (RAG) architectures.
+* **Classification and Scoring:** Apply robust machine learning models to your data for
+  predictive analytics with :func:`~bigframes.bigquery.ai.classify` and
+  :func:`~bigframes.bigquery.ai.score`, accelerating the time-to-insight for data analysts.
 * **Forecasting:** Predict future values in time-series data using
-  :func:`~bigframes.bigquery.ai.forecast`.
+  :func:`~bigframes.bigquery.ai.forecast` for advanced analytics and business intelligence.
 
 **Example usage:**
 
diff --git a/bigframes/pandas/__init__.py b/bigframes/pandas/__init__.py
index 4db900e776..7ddd05bdb8 100644
--- a/bigframes/pandas/__init__.py
+++ b/bigframes/pandas/__init__.py
@@ -17,24 +17,26 @@
 
 **BigQuery DataFrames** provides a Pythonic DataFrame and machine learning (ML)
 API powered by the BigQuery engine. The ``bigframes.pandas`` module implements a large
-subset of the pandas API, allowing you to perform large-scale data analysis
-using familiar pandas syntax while the computations are executed in the cloud.
+subset of the pandas API, allowing you to perform large-scale data analysis,
+data engineering, and AI/ML workflows using familiar pandas syntax while the computations
+are seamlessly executed in the cloud.
 
-**Key Features:**
+**Key Features for Data Scientists, Data Engineers, and Data Analysts:**
 
-* **Petabyte-Scale Scalability:** Handle datasets that exceed local memory by
-  offloading computation to the BigQuery distributed engine.
+* **Petabyte-Scale Scalability:** Handle huge datasets that exceed local memory limits by
+  offloading big data computation directly to the BigQuery distributed engine.
 * **Pandas Compatibility:** Use common pandas methods like
   :func:`~bigframes.pandas.DataFrame.groupby`,
   :func:`~bigframes.pandas.DataFrame.merge`,
   :func:`~bigframes.pandas.DataFrame.pivot_table`, and more on BigQuery-backed
-  :class:`~bigframes.pandas.DataFrame` objects.
+  :class:`~bigframes.pandas.DataFrame` objects without rewriting existing pandas pipelines.
 * **Direct BigQuery Integration:** Read from and write to BigQuery tables and
   queries with :func:`bigframes.pandas.read_gbq` and
-  :func:`bigframes.pandas.DataFrame.to_gbq`.
-* **User-defined Functions (UDFs):** Effortlessly deploy Python functions
-  functions using the :func:`bigframes.pandas.remote_function` and
-  :func:`bigframes.pandas.udf` decorators.
+  :func:`bigframes.pandas.DataFrame.to_gbq`. Perfect for data engineers constructing scalable ETL pipelines.
+* **Seamless AI and Machine Learning:** Rapidly train models or use Generative AI (like Gemini) directly on large datasets, reducing data movement and time-to-insight for data scientists.
+* **User-defined Functions (UDFs):** Effortlessly deploy custom Python functions
+  using the :func:`bigframes.pandas.remote_function` and
+  :func:`bigframes.pandas.udf` decorators for custom business logic.
 * **Data Ingestion:** Support for various formats including CSV, Parquet, JSON,
   and Arrow via :func:`bigframes.pandas.read_csv`,
   :func:`bigframes.pandas.read_parquet`, etc., which are automatically uploaded
@@ -66,9 +68,9 @@
 
     >>> local_df = top_names.to_pandas()  # doctest: +SKIP
 
-BigQuery DataFrames is designed for data scientists and analysts who need the
-power of BigQuery with the ease of use of pandas. It eliminates the "data
-movement bottleneck" by keeping your data in BigQuery for processing.
+BigQuery DataFrames is designed for data scientists, data engineers, and data analysts who need the
+power of BigQuery's distributed compute with the ease of use of pandas. It eliminates the "data
+movement bottleneck" by keeping your big data within BigQuery for secure, scalable processing.
 """
 
 from __future__ import annotations
diff --git a/docs/conf.py b/docs/conf.py
index b518ac074f..fddfad2594 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -58,7 +58,6 @@
     "sphinx.ext.napoleon",
     "sphinx.ext.todo",
     "sphinx.ext.viewcode",
-    "sphinx_sitemap",
     "myst_nb",
 ]
 
@@ -199,7 +198,7 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-# html_extra_path = []
+html_extra_path = ["sitemap.xml"]
 
 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
 # using the given strftime format.
@@ -259,15 +258,6 @@
 # Output file base name for HTML help builder.
 htmlhelp_basename = "bigframes-doc"
 
-# https://sphinx-sitemap.readthedocs.io/en/latest/getting-started.html#usage
-html_baseurl = "https://dataframes.bigquery.dev/"
-sitemap_locales = [None]
-
-# We don't have any immediate plans to translate the API reference, so omit the
-# language from the URLs.
-# https://sphinx-sitemap.readthedocs.io/en/latest/advanced-configuration.html#configuration-customizing-url-scheme
-sitemap_url_scheme = "{link}"
-
 # -- Options for warnings ------------------------------------------------------
 
 
diff --git a/docs/index.rst b/docs/index.rst
index 19b05bc1b6..8825699b95 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -4,9 +4,9 @@ Scalable Python Data Analysis with BigQuery DataFrames (BigFrames)
 ==================================================================
 
 .. meta::
-   :description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine.
+   :description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine. Designed for data scientists, data engineers, and data analysts.
 
-**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science workflow. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows you to analyze and model massive datasets where they live—directly in **BigQuery**.
+**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science and data engineering workflows. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows data scientists, data engineers, and data analysts to analyze, transform, and model massive datasets where they live—directly in **BigQuery**.
 
 Why Choose BigQuery DataFrames?
 -------------------------------
@@ -15,17 +15,17 @@ BigFrames eliminates the "data movement bottleneck." Instead of downloading larg
 
 * **Petabyte-Scale Scalability:** Effortlessly process datasets that far exceed local memory limits.
 * **Familiar Python Ecosystem:** Use the same ``read_gbq``, ``groupby``, ``merge``, and ``pivot_table`` functions you already know from pandas.
-* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration.
+* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration for generative AI workflows and MLOps.
 * **Enterprise-Grade Security:** Maintain data governance and security by keeping your data within the BigQuery perimeter.
 * **Hybrid Flexibility:** Easily move between distributed BigQuery processing and local pandas analysis with ``to_pandas()``.
 
 Core Components of BigFrames
 ----------------------------
 
-BigQuery DataFrames is organized into specialized modules designed for the modern data stack:
+BigQuery DataFrames is organized into specialized modules designed for the modern data stack, empowering big data analytics, AI/ML pipelines, and data engineering:
 
-1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation.
-2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule.
+1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation for data analysts.
+2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule for data engineers and AI developers.
 
 
 Quickstart: Scalable Data Analysis in Seconds
diff --git a/docs/reference/index.rst b/docs/reference/index.rst
index 0de668c4fa..ce3547ccb4 100644
--- a/docs/reference/index.rst
+++ b/docs/reference/index.rst
@@ -1,8 +1,15 @@
 API Reference
 =============
 
-Refer to these pages for details about the public objects in the ``bigframes``
-packages.
+The **BigQuery DataFrames (BigFrames) API Reference** documents the pandas-compatible and scikit-learn-compatible Python interfaces powered by BigQuery's distributed compute engine.
+
+Designed to support the modern data stack, these APIs empower:
+
+* **Data Analysts** to write familiar pandas code for scalable data exploration, cleaning, and aggregation without hitting memory limits.
+* **Data Engineers** to build robust big data pipelines, leveraging advanced geospatial, array, and JSON functions native to BigQuery.
+* **Data Scientists** to train, evaluate, and deploy machine learning models directly on BigQuery using the ML modules, or integrate Generative AI via BigQuery ML and Gemini.
+
+Use this reference to discover the classes, methods, and functions that make up the BigQuery DataFrames ecosystem.
 
 .. autosummary::
     :toctree: api
@@ -33,7 +40,8 @@ ML APIs
 ~~~~~~~
 
 BigQuery DataFrames provides many machine learning modules, inspired by
-scikit-learn.
+scikit-learn, enabling data scientists to quickly build, train, and deploy models
+on large datasets natively within BigQuery.
 
 .. autosummary::
 
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
new file mode 100644
index 0000000000..a4af19b204
--- /dev/null
+++ b/docs/sitemap.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+  <url>
+    <loc>https://dataframes.bigquery.dev/</loc>
+  </url>
+  <url>
+    <loc>https://dataframes.bigquery.dev/user_guide/index.html</loc>
+  </url>
+  <url>
+    <loc>https://dataframes.bigquery.dev/reference/index.html</loc>
+  </url>
+  <url>
+    <loc>https://dataframes.bigquery.dev/reference/api/bigframes.pandas.html</loc>
+  </url>
+  <url>
+    <loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.html</loc>
+  </url>
+  <url>
+    <loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.ai.html</loc>
+  </url>
+</urlset>
diff --git a/docs/user_guide/index.rst b/docs/user_guide/index.rst
index af09616e05..ce0585bfbc 100644
--- a/docs/user_guide/index.rst
+++ b/docs/user_guide/index.rst
@@ -1,6 +1,10 @@
 User Guide
 **********
 
+Welcome to the BigQuery DataFrames User Guide! This guide is designed to help data scientists, data engineers, and data analysts build scalable data pipelines, perform advanced analytics, and train machine learning models using BigQuery's distributed compute power, all while staying within the familiar pandas and scikit-learn Python ecosystem.
+
+Whether you're exploring big data, deploying an AI model, integrating with LLMs like Gemini, or architecting robust data engineering workflows, these tutorials and notebooks will provide the practical foundations you need.
+
 .. include:: ../README.rst
 
 .. toctree::
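Because html_extra_path copies docs/sitemap.xml into the build output verbatim, Sphinx does not validate its contents, so a mistyped URL would ship silently. The script below is a minimal standalone check that assumes Python 3.9+ and uses only the standard library. It rebuilds the same six <loc> entries under the sitemaps.org 0.9 namespace and runs a few basic checks: the root element, empty or duplicate entries, and URLs outside the site. SITEMAP_NS, BASE_URL, PAGES, build_sitemap, sitemap_locs, and check_sitemap are hypothetical names. They are not part of bigframes or Sphinx.

```python
"""Minimal sketch: rebuild and sanity-check the static docs/sitemap.xml.

All names below are hypothetical helpers, not part of bigframes or Sphinx.
Requires Python 3.9+ (for ET.indent) and only the standard library.
"""
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://dataframes.bigquery.dev/"

# Paths relative to BASE_URL, mirroring the six URLs in docs/sitemap.xml.
PAGES = [
    "",
    "user_guide/index.html",
    "reference/index.html",
    "reference/api/bigframes.pandas.html",
    "reference/api/bigframes.bigquery.html",
    "reference/api/bigframes.bigquery.ai.html",
]


def build_sitemap(base_url: str, pages: list[str]) -> str:
    """Return sitemap XML with one <url><loc> entry per page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = base_url + page
    ET.indent(urlset)  # two-space indentation, one tag per line
    # default_namespace emits xmlns="..." on the root instead of an ns0: prefix.
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> value in document order, with whitespace stripped."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", {"sm": SITEMAP_NS})
    return [(loc.text or "").strip() for loc in locs]


def check_sitemap(xml_text: str, base_url: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    locs = sitemap_locs(xml_text)
    if not locs:
        problems.append("no <loc> entries found")
    if len(locs) != len(set(locs)):
        problems.append("duplicate <loc> entries")
    for loc in locs:
        if not loc.startswith(base_url):
            problems.append(f"outside {base_url}: {loc}")
    return problems


if __name__ == "__main__":
    sitemap = build_sitemap(BASE_URL, PAGES)
    print(sitemap)
    print(check_sitemap(sitemap, BASE_URL))
```

When run directly, the script prints the generated 21-line XML followed by an empty problem list. You can also pass the committed file's text to check_sitemap, because sitemap_locs strips whitespace around each <loc> value. This patch deliberately keeps a hand-written file instead of a generator, so treat build_sitemap as optional tooling rather than a replacement for sphinx_sitemap.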