
add a script to feed fastpath from failed measurements s3 bucket#404

Draft
aagbsn wants to merge 1 commit into main from add_failed_measurements_fastpath_feeder

Conversation

@aagbsn
Contributor

@aagbsn aagbsn commented Apr 15, 2026

This script reads failed reports from the ooniprobe-failed-reports S3 bucket and submits them to fastpath.

@aagbsn aagbsn marked this pull request as draft April 15, 2026 12:35
@github-actions

Ansible Run Output 🤖

Ansible Playbook Recap 🔍



Ansible playbook output 📖 success


$ ansible-playbook playbook.yml --check --diff -i ../tf/modules/ansible_inventory/inventories/inventory-dev.ini
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[ERROR]: the role 'geerlingguy.docker' was not found in /home/runner/work/devops/devops/ansible/roles:/home/runner/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/home/runner/work/devops/devops/ansible
Origin: /home/runner/work/devops/devops/ansible/deploy-testlists.yml:16:7

14         node_exporter_host: "0.0.0.0"
15         node_exporter_options: ""
16     - role: geerlingguy.docker
         ^ column 7

Pusher @aagbsn
Action pull_request
Working Directory
Workflow .github/workflows/check_ansible.yml
Last updated Wed, 15 Apr 2026 12:36:48 GMT


# Configuration from environment (set these in your shell)
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") # required if not using IAM role/profile
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") # required if not using IAM role/profile
Member

The source bucket and destination bucket are different and require two different access keys, so these parameters should be separated.
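A minimal sketch of the suggested split, assuming environment-variable names prefixed per bucket (the `SRC_`/`DST_` names are illustrative, not the script's actual API):

```python
import os

# Hypothetical split: one credential pair for the source (failed-reports)
# bucket and another for the destination bucket.
SRC_AWS_ACCESS_KEY_ID = os.getenv("SRC_AWS_ACCESS_KEY_ID")
SRC_AWS_SECRET_ACCESS_KEY = os.getenv("SRC_AWS_SECRET_ACCESS_KEY")
DST_AWS_ACCESS_KEY_ID = os.getenv("DST_AWS_ACCESS_KEY_ID")
DST_AWS_SECRET_ACCESS_KEY = os.getenv("DST_AWS_SECRET_ACCESS_KEY")
```

Each pair would then be passed to its own boto3 client, so neither bucket needs credentials scoped beyond its role.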

p = Path(local_path)
msmt_id = p.stem
with p.open("r", encoding="utf-8") as f:
    data = json.load(f)
Member

You should use ujson here since it's significantly faster than the stock Python json parser.
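The swap is nearly drop-in, since ujson mirrors the stdlib's `load`/`loads` interface; a sketch that falls back to the stdlib when ujson isn't installed:

```python
# Prefer the C-based ujson parser; fall back to the stdlib if it's absent.
try:
    import ujson as json
except ImportError:
    import json

payload = json.loads('{"format": "json", "content": {"ok": true}}')
```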

    print("S3_BUCKET_NAME environment variable is required.")
    return
s3 = get_s3_client()
for prefix, subs, objs in walk(s3, BUCKET_NAME, ""):
Member

You should support a dry-run mode which doesn't actually do any copy or submission, to make sure that everything is working as intended.
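One way to wire this up, sketched with argparse (the `--dry-run` flag name and `maybe_submit` helper are assumptions for illustration): when the flag is set, the bucket walk still runs and logs every intended action, but copies and submissions are skipped.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true",
                    help="log intended actions without copying or submitting")
args = parser.parse_args(["--dry-run"])  # simulated CLI invocation

def maybe_submit(msmt_id, dry_run):
    # Guard every side effect behind the flag.
    if dry_run:
        print(f"[dry-run] would submit {msmt_id}")
        return False
    # Real download + POST would happen here.
    return True
```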

s3 = get_s3_client()
for prefix, subs, objs in walk(s3, BUCKET_NAME, ""):
    print(f"PREFIX: {prefix} subdirs={len(subs)} objects={len(objs)}")
    with ThreadPoolExecutor(max_workers=50) as _exe:
Member

I wouldn't run this inside of a thread pool to avoid concurrency issues. I think we are OK with this not going too fast.
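The sequential alternative is a plain loop, which trades throughput for predictable ordering and no shared-state races (a sketch; `process_objects` and `handler` are illustrative names):

```python
# Process each object in turn instead of dispatching to a 50-worker pool.
def process_objects(objs, handler):
    results = []
    for obj in objs:
        results.append(handler(obj))
    return results
```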

content = data.get('content')
endpoint = f"{FASTPATH_API}/{msmt_id}"
try:
    resp = requests.post(endpoint, json=content, timeout=30)
Member

You need to post the full content of the payload (including the format and content keys), not just the content of content.

Because of that you don't actually need to parse the JSON body, you can just treat it as a binary blob.

Member

You should not be using the json option of requests, since this is going to re-serialize the JSON, which can lead to what's being sent not matching the hash of what's inside of the measurement_uid: https://github.com/ooni/backend/blob/master/ooniapi/services/ooniprobe/src/ooniprobe/routers/reports.py#L162
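A small demonstration of why `json=` is unsafe here: the bytes stored in S3 were hashed to derive the measurement_uid, and re-serializing the parsed object need not reproduce those bytes (separators, key order, unicode escapes can all differ). The fix is to post the original bytes untouched via `data=` (the `raw` value below is an illustrative payload, and the commented `requests.post` call is a sketch):

```python
import json

raw = b'{"format":"json","content":{"k":1}}'  # bytes as stored, no spaces
reserialized = json.dumps(json.loads(raw)).encode()
assert reserialized != raw  # json.dumps inserts ": " and ", " separators

# Sketch of the fix (not executed here): send the stored bytes verbatim.
# requests.post(endpoint, data=raw,
#               headers={"Content-Type": "application/json"}, timeout=30)
```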

def process_postcan(s3, bucket, key, local_path):
    try:
        print("Downloading", key)
        s3.download_file(bucket, key, local_path)
Member

You don't actually need to download this to a local file and re-read it; you can stream it from S3 directly into the POST request.
