add a script to feed fastpath from failed measurements s3 bucket #404
Conversation
Ansible Run Output 🤖

| | |
| --- | --- |
| Pusher | @aagbsn |
| Action | pull_request |
| Working Directory | |
| Workflow | .github/workflows/check_ansible.yml |
| Last updated | Wed, 15 Apr 2026 12:36:48 GMT |
```python
# Configuration from environment (set these in your shell)
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")  # required if not using IAM role/profile
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")  # required if not using IAM role/profile
```
The source bucket and destination bucket are different and require two different access keys, so these parameters should be separated.
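One way to separate the two credential sets is to read them from distinct environment variables per bucket. The `SRC_`/`DST_` variable names below are illustrative, not from the PR:

```python
import os

def load_bucket_credentials():
    """Read separate credentials for the source and destination buckets.

    The SRC_/DST_ prefixes are an assumption for illustration; pick whatever
    naming convention fits the deployment.
    """
    src = {
        "aws_access_key_id": os.getenv("SRC_AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.getenv("SRC_AWS_SECRET_ACCESS_KEY"),
    }
    dst = {
        "aws_access_key_id": os.getenv("DST_AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.getenv("DST_AWS_SECRET_ACCESS_KEY"),
    }
    return src, dst
```

Each dict can then be splatted into its own `boto3.client("s3", **creds)` call so the two buckets never share a key.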
```python
p = Path(local_path)
msmt_id = p.stem
with p.open("r", encoding="utf-8") as f:
    data = json.load(f)
```
You should use ujson here since it's significantly faster than the stock python json parser.
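Since `ujson` mirrors the stdlib `loads`/`dumps` API, swapping it in is a one-line change; a guarded import keeps the script working where `ujson` is not installed (the fallback is an addition here, not in the PR):

```python
try:
    import ujson as json  # C implementation, significantly faster, same loads/dumps API
except ImportError:
    import json  # fall back to the stdlib parser if ujson is unavailable

def parse_measurement(raw: str) -> dict:
    """Parse a raw measurement body into a dict."""
    return json.loads(raw)
```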
```python
print("S3_BUCKET_NAME environment variable is required.")
return
s3 = get_s3_client()
for prefix, subs, objs in walk(s3, BUCKET_NAME, ""):
```
You should support a dry-run mode which doesn't actually do any copy or submission, to make sure that everything is working as intended.
```python
s3 = get_s3_client()
for prefix, subs, objs in walk(s3, BUCKET_NAME, ""):
    print(f"PREFIX: {prefix} subdirs={len(subs)} objects={len(objs)}")
    with ThreadPoolExecutor(max_workers=50) as _exe:
```
I wouldn't run this inside of a thread pool to avoid concurrency issues. I think we are OK with this not going too fast.
```python
content = data.get('content')
endpoint = f"{FASTPATH_API}/{msmt_id}"
try:
    resp = requests.post(endpoint, json=content, timeout=30)
```
You need to post the full payload (including the `format` and `content` keys), not just the value of `content`.
Because of that you don't actually need to parse the JSON body; you can just treat it as a binary blob.
You should not be using the `json` option of `requests`, since it re-serializes the JSON, which can lead to the bytes being sent not matching the hash embedded in the measurement_uid: https://github.com/ooni/backend/blob/master/ooniapi/services/ooniprobe/src/ooniprobe/routers/reports.py#L162
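The mismatch is easy to demonstrate with the stdlib alone: round-tripping the body through a JSON parser changes whitespace, so any hash over the original bytes stops matching. (The endpoint URL in the comment is a placeholder.)

```python
import hashlib
import json

# Raw measurement body exactly as stored in S3 (note the double space).
raw = b'{"format": "json",  "content": {"test_name": "web_connectivity"}}'

# requests.post(url, json=...) would effectively send this instead:
reserialized = json.dumps(json.loads(raw)).encode()

# Same JSON value, different bytes, so a hash over the body no longer matches.
assert hashlib.sha256(raw).hexdigest() != hashlib.sha256(reserialized).hexdigest()

# The fix: post the original bytes untouched, e.g.
#   requests.post(endpoint, data=raw,
#                 headers={"Content-Type": "application/json"}, timeout=30)
```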
```python
def process_postcan(s3, bucket, key, local_path):
    try:
        print("Downloading", key)
        s3.download_file(bucket, key, local_path)
```
You don't actually need to download this to a local file and re-read it; you can just stream it from S3 directly into the POST request.
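A minimal sketch of that streaming path, with the S3 client and the POST function injected so nothing here depends on the PR's wiring: `boto3`'s `get_object` returns a file-like `Body`, and `requests` accepts a file-like object for `data` and streams it out. The function name and parameters are illustrative:

```python
def stream_postcan(s3, bucket: str, key: str, endpoint: str, post):
    """Stream an S3 object straight into an HTTP POST without touching disk.

    `s3` is any client exposing boto3's get_object interface; `post` is a
    callable like requests.post. This is a sketch, not the PR's actual code.
    """
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    # The file-like Body is handed to the HTTP client unread; the client
    # streams it out, so the full object never needs to sit in memory or on disk.
    return post(endpoint, data=body,
                headers={"Content-Type": "application/json"}, timeout=30)
```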
This script reads from the ooniprobe-failed-reports bucket and submits the measurements to fastpath.