Depthwise Conv2d performance degrades at non-mod-16 spatial dimensions #3324

Depthwise Conv2d throughput degrades significantly when spatial dimensions are not divisible by 16: latency is up to 94% higher in the measurements below. This surfaces in multi-stage vision encoders, where repeated 2x downsampling produces non-mod-16 intermediate feature maps.

## Repro

[repro_depthwise_mod16.py](https://github.com/user-attachments/files/26258190/repro_depthwise_mod16.py)

## Results

| Channels | Mod-16 res | Other res | Mod-16 ms | Other ms | Gap (Other vs. Mod-16) | Other res mod 16 |
|-:|-|-|-:|-:|-:|-:|
| 96 | 256x256 | 240x240 | 0.52 | 0.50 | -5% | 0 |
| 192 | 128x128 | 120x120 | 0.47 | 0.45 | -5% | 8 |
| 384 | 64x64 | 60x60 | 0.43 | 0.83 | +94% | 12 |
| 768 | 32x32 | 30x30 | 0.37 | 0.55 | +47% | 14 |
| 1536 | 16x16 | 15x15 | 0.37 | 0.47 | +27% | 15 |


When the "other" dimension is 240 (still divisible by 16), there is no penalty. At 120 (remainder 8) there is also no measurable penalty in these runs. The gap appears at 60 and below, where the remainders are 12, 14, and 15, and compounds across stages to produce ~53% end-to-end throughput loss through a 5-stage encoder.
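To illustrate how a mod-16 input still ends up with non-mod-16 feature maps, here is a minimal sketch of the side lengths through a 5-stage encoder. It assumes each stage halves the side with floor division; `downsample_chain` is a hypothetical helper, not part of the attached repro script.

```python
def downsample_chain(side, stages):
    """Return the side length at each stage, halving (floor) between stages."""
    sides = []
    for _ in range(stages):
        sides.append(side)
        side //= 2
    return sides


# Compare a power-of-two input with the 240x240 input from the table.
for start in (256, 240):
    chain = downsample_chain(start, 5)
    # Pair each side length with its remainder mod 16.
    print(start, [(s, s % 16) for s in chain])
```

For the 240 input, the remainders reproduce the last column of the Results table, while the 256 input stays divisible by 16 at every stage.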

## Expected behavior

Depthwise Conv2d throughput should scale proportionally with pixel count regardless of whether spatial dimensions are divisible by 16.
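As a rough illustration of what proportional scaling would imply, the sketch below scales each mod-16 latency by the pixel-count ratio and compares it with the measured value. It uses the rounded timings from the Results table, and reading "proportional" as a strict linear scaling in pixel count is an assumption made here; `expected_other_ms` is a hypothetical helper.

```python
# (channels, mod16_side, other_side, mod16_ms, other_ms), copied from the Results table.
ROWS = [
    (96, 256, 240, 0.52, 0.50),
    (192, 128, 120, 0.47, 0.45),
    (384, 64, 60, 0.43, 0.83),
    (768, 32, 30, 0.37, 0.55),
    (1536, 16, 15, 0.37, 0.47),
]


def expected_other_ms(mod16_side, other_side, mod16_ms):
    """Latency at the other resolution if time scaled linearly with pixel count."""
    pixel_ratio = (other_side / mod16_side) ** 2
    return mod16_ms * pixel_ratio


for channels, m_side, o_side, m_ms, o_ms in ROWS:
    exp = expected_other_ms(m_side, o_side, m_ms)
    print(
        f"{channels:>5} ch  {o_side}x{o_side}: "
        f"expected {exp:.2f} ms, measured {o_ms:.2f} ms, "
        f"measured/expected {o_ms / exp:.2f}x"
    )
```

Every "other" side is 15/16 of its mod-16 counterpart, so the pixel ratio is the same (about 0.88) in each row; the measured latencies at 60x60 and below sit well above that scaled estimate.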
