fix: Respect remote function config changes even if logic unchanged #2512
TrevorBergeron wants to merge 13 commits into main from
Conversation
```python
cloud_function_memory_mib=cloud_function_memory_mib,
cloud_function_cpus=cloud_function_cpus,
cloud_function_ingress_settings=cloud_function_ingress_settings,
bq_metadata=bqrf_metadata,
```
Reviewer:
Does this mean we don't have to annotate the BQ Functions we create with virtual types in the metadata anymore? If so, probably worth a mini design if only so you can get the proper attribution for this simplification, but also to share the technique with the team.
TrevorBergeron:
Nah, I'm just using the signature object now to capture everything about type mappings, including virtual ones.
bigframes/functions/_utils.py
```python
    session_id: str | None = None,
):
    """Get a name for the bigframes managed function for the given user defined function."""
    # TODO: Move over to logic used by remote functions
```
Reviewer:
I would like some more context behind this TODO. Could you file a bug and put some more information there?
TrevorBergeron:
Created bug 495508827, moved the TODO, and added a bug reference.
```python
    routine: bigquery.Routine, session: bigframes.Session
) -> BigqueryCallableRoutine:
    udf_def = _routine_as_udf_def(routine)
    override_type = _get_output_type_override(routine)
```
Reviewer:
I suspect this removal also means we don't need to look at the routine's metadata anymore, right?
Side thought: I wonder if this means we don't support timedelta/duration in remote functions?
TrevorBergeron:
I'm just encapsulating more in the udf_def. If anything, the overall available metadata is richer now.
```python
# Protocol version 4 is available in python version 3.4 and above
# https://docs.python.org/3/library/pickle.html#data-stream-format
_pickle_protocol_version = 4
logger = logging.getLogger(__name__)
```
Reviewer:
Nit: I've recently been trying to follow a different pattern in some of my projects where there is one logger instance per package (e.g. bigframes) rather than per-module, as the per-module version makes applying filters much more annoying.
TrevorBergeron:
Mind if we do this as a separate PR? I'm probably already doing too much in this one, and this was the pre-existing logger config.
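For readers unfamiliar with the per-package pattern the reviewer suggests, here is a minimal, hypothetical sketch (the `bigframes` logger name matches the package, but the handler and filter setup below are illustrative, not the project's actual config). The key detail is that a filter attached to a handler on the package-root logger applies to records propagated from every child module logger, whereas per-module filters would each need to be configured separately:

```python
import logging

# Collect messages in a list so the effect of the shared filter is visible.
records: list[str] = []

class ListHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        records.append(record.getMessage())

handler = ListHandler()
# One filter on the package-root handler covers all modules. (A filter
# attached to the logger itself would NOT apply to propagated records.)
handler.addFilter(lambda record: record.levelno >= logging.INFO)

pkg_logger = logging.getLogger("bigframes")
pkg_logger.setLevel(logging.DEBUG)
pkg_logger.addHandler(handler)

# Module code keeps the familiar getLogger(__name__) spelling; the dotted
# name makes it a child of the "bigframes" logger, so it propagates upward.
module_logger = logging.getLogger("bigframes.functions._utils")
module_logger.debug("dropped by the shared filter")
module_logger.info("kept")
```

Note the subtlety that motivates attaching the filter to the handler rather than the logger: logger-level filters only screen records logged directly on that logger, not records propagating up from children.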
```python
def __post_init__(self):
    assert isinstance(self.name, str)
    assert isinstance(self.dtype, (DirectScalarType, RowSeriesInputFieldV1))
```
Reviewer:
Let's add some comments for why this is necessary. If it's for the type checker, I'm surprised the class attributes above weren't sufficient.
TrevorBergeron:
The dataclass field annotations aren't actually checked at runtime. This is why people use tools like pydantic to actually validate their records. The validation here is just a convenience to catch runtime issues at construction time, rather than hitting a confusing missing-attribute error later.
```python
inputs: tuple[UdfArg, ...] = dataclasses.field()
output: DirectScalarType | VirtualListTypeV1

def __post_init__(self):
```
Reviewer:
Same here. Docs for why this is necessary.
TrevorBergeron:
It's not for any special reason, just runtime validation to catch things static type checking doesn't.
bigframes/functions/udf_def.py
```python
def protocol_metadata(self) -> str | None:
    import bigframes.functions._utils

    # TODO: The output field itself should handle this, to handle protocol versioning.
```
TrevorBergeron:
Removing this comment; it's not a functional requirement right now, just speculation about a refactor.
bigframes/functions/udf_def.py
```python
    def_copy, protocol=_pickle_protocol_version
)

hash_val = hashlib.md5()
```
Reviewer:
Same here: crc32c, or at least a TODO and bug to use a better hash.
TrevorBergeron:
Switched to crc32c.
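The hashing scheme being discussed is the heart of the PR: the full function definition (logic plus config) is pickled and hashed, so a config-only change produces a different artifact name. Here is a rough sketch under stated assumptions: `FakeUdfDef` and `def_hash` are invented names, and stdlib `zlib.crc32` stands in for the crc32c checksum the PR actually adopted (crc32c usually comes from the `google-crc32c` package; the polynomial differs but the idea is identical).

```python
import dataclasses
import pickle
import zlib

# Protocol 4 mirrors the pinned _pickle_protocol_version in the diff above.
_PICKLE_PROTOCOL = 4

@dataclasses.dataclass(frozen=True)
class FakeUdfDef:
    """Hypothetical stand-in for the PR's pickled udf_def payload."""
    source: str
    memory_mib: int

def def_hash(udf_def: FakeUdfDef) -> str:
    # Hash the pickled definition; any field change (logic OR config)
    # changes the digest, and therefore the derived artifact name.
    data = pickle.dumps(udf_def, protocol=_PICKLE_PROTOCOL)
    return format(zlib.crc32(data), "08x")

# Same logic, different cloud function config -> different hash, which is
# exactly the behavior the PR title promises to respect.
base = def_hash(FakeUdfDef(source="return x + 1", memory_mib=512))
bumped = def_hash(FakeUdfDef(source="return x + 1", memory_mib=1024))
```

Note that crc32/crc32c are fast checksums rather than cryptographic hashes; for cache-key derivation that is generally an acceptable trade-off, which seems to be the reviewer's point in preferring it over md5.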
```python
@dataclasses.dataclass(frozen=True)
class CloudRunFunctionConfig:
```
Reviewer:
Thoughts on how to keep this in sync with the arguments we add to our decorators?
TrevorBergeron:
The function that creates the artifact uses only this struct (along with a name), so that is the forcing mechanism: anything that needs to be reflected in the artifact must first be a field in this struct.
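The "forcing mechanism" described above can be sketched as follows. The field names and the `create_artifact` helper are hypothetical (the real `CloudRunFunctionConfig` lives in the PR); the point is the shape of the API: because artifact creation accepts only the struct plus a name, a new decorator option cannot influence the artifact until it becomes a field, and becoming a field automatically puts it into whatever key is derived from the struct.

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class CloudRunFunctionConfig:
    # Hypothetical subset of fields mirroring the decorator arguments.
    memory_mib: int = 512
    cpus: int = 1
    ingress_settings: str = "all"

def create_artifact(name: str, config: CloudRunFunctionConfig) -> str:
    # Taking *only* (name, config) means every knob that should affect
    # the deployed artifact is forced through the struct, so config-only
    # changes always produce a distinct artifact identifier.
    return f"{name}:{dataclasses.astuple(config)}"
```

For example, bumping `cpus` without touching the function body still yields a different artifact string, which is the bug the PR title fixes.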
bigframes/operations/array_ops.py
```python
@dataclasses.dataclass(frozen=True)
class ArrayMapOp(base_ops.UnaryOp):
    name: typing.ClassVar[str] = "array_map"
    # TODO: Generalize to chained expressions
```
TrevorBergeron:
Created bug 495513753.
Force-pushed a67973b to 4f982fa