This repository was archived by the owner on Mar 20, 2023. It is now read-only.

ERROR - Error response from daemon: Head "https://mcr.microsoft.com/v2/azure-batch/shipyard/manifests/3.9.1-cargo": dial tcp 204.79.197.219:443: i/o timeout #376

@AndyPengYong

Problem Description

2022-06-08T05:12:34,397626166+0000 - DEBUG - Pulling Docker Image: mcr.microsoft.com/azure-batch/shipyard:3.9.1-cargo (fallback: 0)
2022-06-08T05:13:04,792495588+0000 - ERROR - Error response from daemon: Head "https://mcr.microsoft.com/v2/azure-batch/shipyard/manifests/3.9.1-cargo": dial tcp 204.79.197.219:443: i/o timeout
2022-06-08T05:13:04,794030532+0000 - ERROR - No fallback registry specified, terminating
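The two timestamps above show that the ERROR line arrives roughly 30 seconds after the pull begins. As a minimal sketch for checking this on other nodes, the following Python snippet measures that gap from log lines in this format. The helper names `parse_ts` and `pull_durations` are hypothetical and are not part of Batch Shipyard; the sample log is the excerpt from this report.

```python
from datetime import datetime

# Excerpt copied from the problem description above.
LOG = """\
2022-06-08T05:12:34,397626166+0000 - DEBUG - Pulling Docker Image: mcr.microsoft.com/azure-batch/shipyard:3.9.1-cargo (fallback: 0)
2022-06-08T05:13:04,792495588+0000 - ERROR - Error response from daemon: Head "https://mcr.microsoft.com/v2/azure-batch/shipyard/manifests/3.9.1-cargo": dial tcp 204.79.197.219:443: i/o timeout
"""

PULL_PREFIX = "Pulling Docker Image: "


def parse_ts(stamp):
    """Parse a timestamp such as 2022-06-08T05:12:34,397626166+0000."""
    if "," in stamp:
        head, rest = stamp.split(",", 1)
        frac, offset = rest[:-5], rest[-5:]
        # datetime stores microseconds, so keep only the first six digits.
        return datetime.strptime(
            f"{head}.{frac[:6]}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z"
        )
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")


def pull_durations(log):
    """Return (image, seconds) for each pull that is followed by an ERROR line."""
    results = []
    pending = None  # (image, start time) of the most recent pull
    for line in log.splitlines():
        parts = line.split(" - ", 2)
        if len(parts) != 3:
            continue  # not a "timestamp - LEVEL - message" line
        stamp, level, message = parts
        if message.startswith(PULL_PREFIX):
            image = message[len(PULL_PREFIX):].split(" ")[0]
            pending = (image, parse_ts(stamp))
        elif level == "ERROR" and pending is not None:
            image, started = pending
            results.append((image, (parse_ts(stamp) - started).total_seconds()))
            pending = None
    return results


if __name__ == "__main__":
    for image, seconds in pull_durations(LOG):
        print(f"{image}: failed after {seconds:.1f} s")
```

This sketch assumes every parsed line starts with a timestamp in one of the two forms seen in this log; it is meant for the short excerpt above rather than for the full start task output.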

Batch Shipyard Version

3.9.1

Batch Pool Configuration

pool_specification:
  id: scrum-pool
  vm_configuration:
    platform_image:
      offer: centos-container
      publisher: microsoft-azure-batch
      sku: 7-8
  vm_count:
    dedicated: 1
    low_priority: 0
  vm_size: Standard_A2_v2
  reboot_on_start_task_failed: true

Additional Logs

Linux 15f0febd29bf4e55b151a692b61a1eac000000 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
2022-06-08T05:12:25,187903573+0000 - WARNING - Unknown DISTRIB_CODENAME.
2022-06-08T05:12:25,202975095+0000 - INFO - Prep start
Configuration:
Custom image: 0
Native mode: 1
OS Distribution: centos 7
Batch Shipyard version: 3.9.1
Blobxfer version: 1.9.4
Singularity version:
User mountpoint:
Mount path: /mnt/resource/batch/tasks/mounts
Batch Insights: 0
Prometheus: NE=, CA=,
Network optimization: 0
Encryption cert thumbprint:
Install Kata Containers: 0
Default container runtime: runc
Install BeeGFS BeeOND: 0
Storage cluster mounts (1):
Custom mount:
Install LIS:
GPU:
GPU ignore warnings:
Azure Blob: 0
Azure File: 0
GlusterFS on compute: 0
HPN-SSH: 0
Enable Azure Batch group for Docker access:
Fallback registry:
Docker image preload delay: 0
Cascade via container: 1
Concurrent source downloads: 10
Block on images: #
Singularity decryption certs:

2022-06-08T05:12:25,265201338+0000 - INFO - Ephemeral disk discovered as /dev/sdb
2022-06-08T05:12:25,281373591+0000 - DEBUG - lsblk: /dev/sdb1 8:17 0 20G 0 part /mnt/resource
2022-06-08T05:12:25,298301166+0000 - INFO - ephemeral: /dev/sdb1 (encrypted=0 user=/mnt/resource)
2022-06-08T05:12:25,370942501+0000 - INFO - VmSize=standard_a2_v2 RDMA=0
2022-06-08T05:12:25,374938513+0000 - INFO - LIS installation not required
2022-06-08T05:12:25,376557558+0000 - INFO - No singularity decryption certificates defined
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2022-06-08 05:10:04 UTC; 2min 20s ago
Docs: https://docs.docker.com
Main PID: 1164 (dockerd)
Tasks: 12
Memory: 843.6M
CGroup: /system.slice/docker.service
└─1164 /usr/bin/dockerd -H fd:// --containerd /var/run/containerd/containerd.sock

Jun 08 05:09:59 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:09:59.844055023Z" level=info msg="scheme "unix" not registered, fallback to default scheme" module=grpc
Jun 08 05:09:59 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:09:59.844076823Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 }] }" module=grpc
Jun 08 05:09:59 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:09:59.844087822Z" level=info msg="ClientConn switching balancer to "pick_first"" module=grpc
Jun 08 05:10:00 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:00.549123795Z" level=info msg="Loading containers: start."
Jun 08 05:10:02 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:02.622634623Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jun 08 05:10:03 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:03.637093010Z" level=info msg="Loading containers: done."
Jun 08 05:10:04 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:04.350500980Z" level=info msg="Docker daemon" commit=847da184ad5048b27f5bdf9d53d070f731b43180 graphdriver(s)=overlay2 version=20.10.11+azure-3
Jun 08 05:10:04 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:04.351261171Z" level=info msg="Daemon has completed initialization"
Jun 08 05:10:04 15f0febd29bf4e55b151a692b61a1eac000000 systemd[1]: Started Docker Application Container Engine.
Jun 08 05:10:04 15f0febd29bf4e55b151a692b61a1eac000000 dockerd[1164]: time="2022-06-08T05:10:04.707034818Z" level=info msg="API listen on /var/run/docker.sock"
Client:
Version: 20.10.11+azure-3
API version: 1.41
Go version: go1.16.12
Git commit: dea9396e184290f638ea873c76db7c80efd5a1d2
Built: Wed Nov 17 23:49:46 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server:
Engine:
Version: 20.10.11+azure-3
API version: 1.41 (minimum version 1.12)
Go version: go1.16.12
Git commit: 847da184ad5048b27f5bdf9d53d070f731b43180
Built: Thu Nov 18 00:21:59 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12+azure-1
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc:
Version: 1.0.3
GitCommit: f46b6ba2c9314cfc8caae24a32ec5fe9ef1059fe
docker-init:
Version: 0.19.0
GitCommit:
Client:
Context: default
Debug Mode: false

Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 2
Server Version: 20.10.11+azure-3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc version: f46b6ba2c9314cfc8caae24a32ec5fe9ef1059fe
init version:
Kernel Version: 3.10.0-1127.19.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.701GiB
Name: 15f0febd29bf4e55b151a692b61a1eac000000
ID: 5NBN:SBAR:2RJ4:4N7S:MW57:D6OV:YS2L:LIOZ:BXZ7:QMWC:J5AU:KNJJ
Docker Root Dir: /mnt/resource/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

2022-06-08T05:12:25,770292688+0000 - INFO - Docker root dir within ephemeral temp disk: /mnt/resource/docker
2022-06-08T05:12:25,771988035+0000 - INFO - Checking for Nvidia Hardware
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
2022-06-08T05:12:26,119040957+0000 - INFO - No Nvidia card(s) detected!
2022-06-08T05:12:26+0000 - DEBUG - Logging into 1 Docker registry servers...
2022-06-08T05:12:26+0000 - DEBUG - Logging into Docker registry: scrumsalesdockerprd.azurecr.io with user: scrumsalesdockerprd
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /mnt/resource/batch/tasks/startup/wd/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
2022-06-08T05:12:26+0000 - INFO - Docker registry logins completed.
2022-06-08T05:12:26+0000 - WARNING - No Singularity registry servers found.
2022-06-08T05:12:26,348386782+0000 - DEBUG - VM size standard_a2_v2 does not have IB RDMA
2022-06-08T05:12:26,349782721+0000 - DEBUG - Not an RDMA capable VM size, skipping IB detection/setup
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:22:48:e8:38:ff brd ff:ff:ff:ff:ff:ff
inet 10.6.3.23/25 brd 10.6.3.127 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::222:48ff:fee8:38ff/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:bf:43:8c:ee brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
2022-06-08T05:12:26,354430051+0000 - INFO - Batch Insights disabled.
2022-06-08T05:12:26,355991295+0000 - INFO - Prometheus node exporter disabled.
2022-06-08T05:12:26,360116311+0000 - INFO - Prometheus cAdvisor disabled.
2022-06-08T05:12:26,361768457+0000 - DEBUG - Pulling Docker Image: mcr.microsoft.com/blobxfer:1.9.4 (fallback: 0)
1.9.4: Pulling from blobxfer
89d9c30c1d48: Pulling fs layer
6de18253c5d3: Pulling fs layer
89d9c30c1d48: Verifying Checksum
89d9c30c1d48: Download complete
6de18253c5d3: Verifying Checksum
6de18253c5d3: Download complete
89d9c30c1d48: Pull complete
6de18253c5d3: Pull complete
Digest: sha256:94192812382de05b77d8766720cf4c22cc84fd15c86a838a043366fa2047af83
Status: Downloaded newer image for mcr.microsoft.com/blobxfer:1.9.4
mcr.microsoft.com/blobxfer:1.9.4
2022-06-08T05:12:34,397626166+0000 - DEBUG - Pulling Docker Image: mcr.microsoft.com/azure-batch/shipyard:3.9.1-cargo (fallback: 0)
2022-06-08T05:13:04,792495588+0000 - ERROR - Error response from daemon: Head "https://mcr.microsoft.com/v2/azure-batch/shipyard/manifests/3.9.1-cargo": dial tcp 204.79.197.219:443: i/o timeout
2022-06-08T05:13:04,794030532+0000 - ERROR - No fallback registry specified, terminating

Additional Comments

This issue occurs about once a month.
We added the following parameter to the pool.yaml file, but it did not work as expected:
reboot_on_start_task_failed: true
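
Because the failure is intermittent, one possible workaround outside Batch Shipyard is to retry the pull with exponential backoff. The following is a minimal sketch only: it does not describe how Batch Shipyard's node preparation handles pulls, the helper names `retry` and `docker_pull` are hypothetical, and the attempt count and delays are assumed values.

```python
import subprocess
import time


def retry(action, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        if action():
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


def docker_pull(image):
    """Return True if `docker pull IMAGE` exits with status 0."""
    return subprocess.run(["docker", "pull", image]).returncode == 0


if __name__ == "__main__":
    # Image name taken from the failing log line in this report.
    image = "mcr.microsoft.com/azure-batch/shipyard:3.9.1-cargo"
    succeeded = retry(lambda: docker_pull(image))
    raise SystemExit(0 if succeeded else 1)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting. Whether a retry would have succeeded in this case is unknown; the log only shows a single attempt.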
