28 changes: 14 additions & 14 deletions bigframes/bigquery/__init__.py
@@ -16,30 +16,30 @@
 Access BigQuery-specific operations and namespaces within BigQuery DataFrames.
 This module provides specialized functions and sub-modules that expose BigQuery's
-advanced capabilities to DataFrames and Series. It acts as a bridge between the
-pandas-compatible API and the full power of BigQuery SQL.
+advanced analytics capabilities directly to DataFrames and Series. Designed for data scientists,
+data engineers, and data analysts, it acts as a bridge between the intuitive
+pandas-compatible API and the massive scale and power of BigQuery SQL.
 Key sub-modules include:
-* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, BQML).
-* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations.
-* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables.
+* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, LLMs, BQML) for AI developers and data scientists.
+* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations for building scalable ML pipelines.
+* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables, essential for handling unstructured data like images and PDFs.
-This module also provides direct access to optimized BigQuery functions for:
+This module also provides direct access to optimized BigQuery functions tailored for data engineering and advanced analytics workflows:
 * **JSON Processing:** High-performance functions like ``json_extract``, ``json_value``,
-and ``parse_json`` for handling semi-structured data.
+and ``parse_json`` for transforming semi-structured data such as logs.
 * **Geospatial Analysis:** Comprehensive geographic functions such as ``st_area``,
-``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions).
+``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions) to unlock location-based insights.
 * **Array Operations:** Tools for working with BigQuery arrays, including ``array_agg``
-and ``array_length``.
+and ``array_length``, for efficiently handling nested and repeated fields.
 * **Vector Search:** Integration with BigQuery's vector search and indexing
-capabilities for high-dimensional data.
-* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets for
-advanced operations not yet directly mapped in the API.
+capabilities for high-dimensional data, semantic search, and RAG architectures.
+* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets, giving data engineers an escape hatch for BigQuery operations not yet directly mapped in the API.
-By using these functions, you can leverage BigQuery's high-performance engine for
-domain-specific tasks while maintaining a Python-centric development experience.
+By using these functions, data professionals can leverage BigQuery's distributed compute engine for
+domain-specific tasks at petabyte scale, while maintaining a productive Python-centric development experience.
 For the full list of BigQuery standard SQL functions, see:
 https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference
34 changes: 18 additions & 16 deletions bigframes/bigquery/ai.py
@@ -15,28 +15,30 @@
 """
 Integrate BigQuery built-in AI functions into your BigQuery DataFrames workflow.
-The ``bigframes.bigquery.ai`` module provides a Pythonic interface to leverage BigQuery ML's
-generative AI and predictive functions directly on BigQuery DataFrames and Series objects.
-These functions enable you to perform advanced AI tasks at scale without moving data
-out of BigQuery.
+The ``bigframes.bigquery.ai`` module provides a powerful, Pythonic interface for data scientists
+and data engineers to leverage BigQuery ML's Generative AI, Large Language Models (LLMs),
+and predictive functions directly on big data via BigQuery DataFrames and Series objects.
+These functions enable AI developers to construct scalable MLOps pipelines and perform advanced AI
+tasks—such as automated text generation and semantic search—without moving data out of BigQuery's
+secure perimeter.
-Key capabilities include:
+Key capabilities for AI workflows include:
-* **Generative AI:** Use :func:`bigframes.bigquery.ai.generate` (Gemini) to
-perform text analysis, translation, or
-content generation. Specialized versions like
+* **Generative AI & LLMs (Gemini):** Use :func:`bigframes.bigquery.ai.generate`
+to orchestrate Gemini models for text analysis, translation, summarization, or
+content generation directly on big data. Specialized versions like
 :func:`~bigframes.bigquery.ai.generate_bool`,
 :func:`~bigframes.bigquery.ai.generate_int`, and
 :func:`~bigframes.bigquery.ai.generate_double` are available for structured
-outputs.
-* **Embeddings:** Generate vector embeddings for text using
-:func:`~bigframes.bigquery.ai.generate_embedding`, which are essential for
-semantic search and retrieval-augmented generation (RAG) workflows.
-* **Classification and Scoring:** Apply machine learning models to your data for
-predictive tasks with :func:`~bigframes.bigquery.ai.classify` and
-:func:`~bigframes.bigquery.ai.score`.
+outputs that are easy to consume in data pipelines.
+* **Embeddings & Semantic Search:** Generate vector embeddings for text using
+:func:`~bigframes.bigquery.ai.generate_embedding`. Embeddings are essential for modern data science,
+enabling semantic search and Retrieval-Augmented Generation (RAG) architectures.
+* **Classification and Scoring:** Apply robust machine learning models to your data for
+predictive analytics with :func:`~bigframes.bigquery.ai.classify` and
+:func:`~bigframes.bigquery.ai.score`, accelerating the time-to-insight for data analysts.
 * **Forecasting:** Predict future values in time-series data using
-:func:`~bigframes.bigquery.ai.forecast`.
+:func:`~bigframes.bigquery.ai.forecast` for advanced analytics and business intelligence.
 **Example usage:**
28 changes: 15 additions & 13 deletions bigframes/pandas/__init__.py
@@ -17,24 +17,26 @@
 **BigQuery DataFrames** provides a Pythonic DataFrame and machine learning (ML) API
 powered by the BigQuery engine. The ``bigframes.pandas`` module implements a large
-subset of the pandas API, allowing you to perform large-scale data analysis
-using familiar pandas syntax while the computations are executed in the cloud.
+subset of the pandas API, allowing you to perform large-scale data analysis,
+data engineering, and AI/ML workflows using familiar pandas syntax while the computations
+are seamlessly executed in the cloud.
-**Key Features:**
+**Key Features for Data Scientists, Data Engineers, and Data Analysts:**
-* **Petabyte-Scale Scalability:** Handle datasets that exceed local memory by
-offloading computation to the BigQuery distributed engine.
+* **Petabyte-Scale Scalability:** Handle huge datasets that exceed local memory limits by
+offloading big data computation directly to the BigQuery distributed engine.
 * **Pandas Compatibility:** Use common pandas methods like
 :func:`~bigframes.pandas.DataFrame.groupby`,
 :func:`~bigframes.pandas.DataFrame.merge`,
 :func:`~bigframes.pandas.DataFrame.pivot_table`, and more on BigQuery-backed
-:class:`~bigframes.pandas.DataFrame` objects.
+:class:`~bigframes.pandas.DataFrame` objects, often with minimal changes to existing pandas code.
 * **Direct BigQuery Integration:** Read from and write to BigQuery tables and
 queries with :func:`bigframes.pandas.read_gbq` and
-:func:`bigframes.pandas.DataFrame.to_gbq`.
-* **User-defined Functions (UDFs):** Effortlessly deploy Python functions
-functions using the :func:`bigframes.pandas.remote_function` and
-:func:`bigframes.pandas.udf` decorators.
+:func:`bigframes.pandas.DataFrame.to_gbq`. Data engineers can use these to build scalable ETL pipelines.
+* **Seamless AI and Machine Learning:** Rapidly train models or use Generative AI (like Gemini) directly on large datasets, reducing data movement and time-to-insight for data scientists.
+* **User-defined Functions (UDFs):** Effortlessly deploy custom Python functions
+using the :func:`bigframes.pandas.remote_function` and
+:func:`bigframes.pandas.udf` decorators for custom business logic.
 * **Data Ingestion:** Support for various formats including CSV, Parquet, JSON,
 and Arrow via :func:`bigframes.pandas.read_csv`,
 :func:`bigframes.pandas.read_parquet`, etc., which are automatically uploaded
@@ -66,9 +68,9 @@
 >>> local_df = top_names.to_pandas() # doctest: +SKIP
-BigQuery DataFrames is designed for data scientists and analysts who need the
-power of BigQuery with the ease of use of pandas. It eliminates the "data
-movement bottleneck" by keeping your data in BigQuery for processing.
+BigQuery DataFrames is designed for data scientists, data engineers, and data analysts who need the
+power of BigQuery's distributed compute with the ease of use of pandas. It eliminates the "data
+movement bottleneck" by keeping your big data within BigQuery for secure, scalable processing.
 """

 from __future__ import annotations
12 changes: 1 addition & 11 deletions docs/conf.py
@@ -58,7 +58,6 @@
 "sphinx.ext.napoleon",
 "sphinx.ext.todo",
 "sphinx.ext.viewcode",
-"sphinx_sitemap",
 "myst_nb",
 ]

@@ -199,7 +198,7 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-# html_extra_path = []
+html_extra_path = ["sitemap.xml"]

 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
 # using the given strftime format.
@@ -259,15 +258,6 @@
 # Output file base name for HTML help builder.
 htmlhelp_basename = "bigframes-doc"

-# https://sphinx-sitemap.readthedocs.io/en/latest/getting-started.html#usage
-html_baseurl = "https://dataframes.bigquery.dev/"
-sitemap_locales = [None]
-
-# We don't have any immediate plans to translate the API reference, so omit the
-# language from the URLs.
-# https://sphinx-sitemap.readthedocs.io/en/latest/advanced-configuration.html#configuration-customizing-url-scheme
-sitemap_url_scheme = "{link}"
-
 # -- Options for warnings ------------------------------------------------------


12 changes: 6 additions & 6 deletions docs/index.rst
@@ -4,9 +4,9 @@ Scalable Python Data Analysis with BigQuery DataFrames (BigFrames)
 ==================================================================

 .. meta::
-:description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine.
+:description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine. Designed for data scientists, data engineers, and data analysts.

-**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science workflow. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows you to analyze and model massive datasets where they live—directly in **BigQuery**.
+**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science and data engineering workflows. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows data scientists, data engineers, and data analysts to analyze, transform, and model massive datasets where they live—directly in **BigQuery**.

 Why Choose BigQuery DataFrames?
 -------------------------------
@@ -15,17 +15,17 @@ BigFrames eliminates the "data movement bottleneck." Instead of downloading larg

 * **Petabyte-Scale Scalability:** Effortlessly process datasets that far exceed local memory limits.
 * **Familiar Python Ecosystem:** Use the same ``read_gbq``, ``groupby``, ``merge``, and ``pivot_table`` functions you already know from pandas.
-* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration.
+* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration for generative AI workflows and MLOps.
 * **Enterprise-Grade Security:** Maintain data governance and security by keeping your data within the BigQuery perimeter.
 * **Hybrid Flexibility:** Easily move between distributed BigQuery processing and local pandas analysis with ``to_pandas()``.

 Core Components of BigFrames
 ----------------------------

-BigQuery DataFrames is organized into specialized modules designed for the modern data stack:
+BigQuery DataFrames is organized into specialized modules designed for the modern data stack, empowering big data analytics, AI/ML pipelines, and data engineering:

-1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation.
-2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule.
+1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation for data analysts.
+2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule for data engineers and AI developers.


 Quickstart: Scalable Data Analysis in Seconds
14 changes: 11 additions & 3 deletions docs/reference/index.rst
@@ -1,8 +1,15 @@
 API Reference
 =============

-Refer to these pages for details about the public objects in the ``bigframes``
-packages.
+The **BigQuery DataFrames (BigFrames) API Reference** documents the pandas-compatible and scikit-learn-compatible Python interfaces powered by BigQuery's distributed compute engine.
+
+Designed to support the modern data stack, these APIs empower:
+
+* **Data Analysts** to write familiar pandas code for scalable data exploration, cleaning, and aggregation without hitting memory limits.
+* **Data Engineers** to build robust big data pipelines, leveraging advanced geospatial, array, and JSON functions native to BigQuery.
+* **Data Scientists** to train, evaluate, and deploy machine learning models directly on BigQuery using the ML modules, or integrate Generative AI via BigQuery ML and Gemini.
+
+Use this reference to discover the classes, methods, and functions that make up the BigQuery DataFrames ecosystem.

 .. autosummary::
 :toctree: api
@@ -34,7 +41,8 @@ ML APIs
 ~~~~~~~

 BigQuery DataFrames provides many machine learning modules, inspired by
-scikit-learn.
+scikit-learn, enabling data scientists to quickly build, train, and deploy models
+on large datasets natively within BigQuery.


 .. autosummary::
21 changes: 21 additions & 0 deletions docs/sitemap.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+<url>
+<loc>https://dataframes.bigquery.dev/</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/user_guide/index.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/index.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.pandas.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.ai.html</loc>
+</url>
+</urlset>
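Because this change drops the ``sphinx_sitemap`` extension and instead copies the static ``docs/sitemap.xml`` into the build root via ``html_extra_path``, new pages are no longer added to the sitemap automatically. Below is a minimal sketch, not part of this PR, of a stdlib-only Python helper that could regenerate the file from a list of page paths. The script itself, the ``PAGES`` list, and the ``build_sitemap``/``sitemap_locs`` names are hypothetical; ``BASE_URL`` reuses the ``html_baseurl`` value from the removed ``conf.py`` lines, and ``ET.indent`` requires Python 3.9 or later.

```python
# Hypothetical helper (not part of this PR): rebuild docs/sitemap.xml from a
# list of page paths, producing the same <urlset>/<url>/<loc> layout.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Same value the removed conf.py lines used for html_baseurl.
BASE_URL = "https://dataframes.bigquery.dev/"

# Page paths relative to BASE_URL; "" is the site root. Mirrors docs/sitemap.xml.
PAGES = [
    "",
    "user_guide/index.html",
    "reference/index.html",
    "reference/api/bigframes.pandas.html",
    "reference/api/bigframes.bigquery.html",
    "reference/api/bigframes.bigquery.ai.html",
]


def build_sitemap(base_url: str, pages: list[str]) -> str:
    """Return sitemap XML text with one <url><loc> entry per page."""
    # Setting xmlns as a plain attribute keeps the output free of ns0: prefixes.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url + page
    ET.indent(urlset, space="  ")  # pretty-print in place; needs Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def sitemap_locs(xml_text: str) -> list[str]:
    """Parse sitemap XML and return its <loc> values in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # After parsing, every element lives in the sitemap namespace.
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


if __name__ == "__main__":
    # e.g. python make_sitemap.py > docs/sitemap.xml  (script name is hypothetical)
    print(build_sitemap(BASE_URL, PAGES), end="")
```

With the ``PAGES`` list above it emits the same six ``<loc>`` entries as ``docs/sitemap.xml``, so adding a page to the sitemap becomes a one-line change to the list instead of hand-editing XML.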
4 changes: 4 additions & 0 deletions docs/user_guide/index.rst
@@ -1,6 +1,10 @@
 User Guide
 **********

+Welcome to the BigQuery DataFrames User Guide! This guide is designed to help data scientists, data engineers, and data analysts build scalable data pipelines, perform advanced analytics, and train machine learning models using BigQuery's distributed compute power, all while staying within the familiar pandas and scikit-learn Python ecosystem.
+
+Whether you're exploring big data, deploying an AI model, integrating with LLMs like Gemini, or architecting robust data engineering workflows, these tutorials and notebooks will provide the practical foundations you need.
+
 .. include:: ../README.rst

 .. toctree::