
Backfill Pre-192 Court Sessions #2107

@Mephistic

Summary

The earliest general court in MAPLE is the 192nd, but the MA Legislature API apparently supports general courts going back 20 years. A more complete backlog would help build confidence in MAPLE itself, and a few features we've wanted could take advantage of this older data (top of mind: tracking bills introduced across multiple sessions for a more accurate historical activity log).

In theory, this is as simple as expanding the generalCourt constants to include the lower court numbers (starting with 191 would be a good test) and running all of the scrapers against each new general court. We only scrape the current court's data on an ongoing basis, so these backfills for older courts should only need to be run once each on DEV and PROD.
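As a rough sketch of what expanding the constants could look like (the `GeneralCourt` shape and the `generalCourt` and `courtsToBackfill` helpers are hypothetical and not taken from the MAPLE codebase; the year mapping assumes each general court spans two calendar years, with the 192nd covering 2021-2022):

```typescript
// Hypothetical shape; the real generalCourt constants in MAPLE may differ.
interface GeneralCourt {
  number: number
  firstYear: number
  secondYear: number
}

// Assumption: each court spans two years and the 192nd is 2021-2022.
function generalCourt(courtNumber: number): GeneralCourt {
  const firstYear = 2021 + (courtNumber - 192) * 2
  return { number: courtNumber, firstYear, secondYear: firstYear + 1 }
}

// Lists the courts to backfill, newest first, so 191 is tried before older ones.
function courtsToBackfill(oldest: number, earliestExisting: number): GeneralCourt[] {
  const courts: GeneralCourt[] = []
  for (let n = earliestExisting - 1; n >= oldest; n--) {
    courts.push(generalCourt(n))
  }
  return courts
}

// Example: start with a single test court before going further back.
const firstBatch = courtsToBackfill(191, 192)
console.log(firstBatch)
```

Under the same assumption, the 1993/1994 session would correspond to court 178, which gives an upper bound on how many courts a full backfill would cover.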

The one exception is the scrapeHearings function: it uses AssemblyAI by default, so we should do that backfill separately (once the other data, such as bills and committees, is available). We should probably disable the AssemblyAI call when run via the runScrapers script to avoid accidentally breaking something or racking up charges (though that functionality doesn't have to be part of this ticket).
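One possible way to guard the AssemblyAI call is a small decision function that the hearing scraper consults before transcribing. This is only a sketch: `HearingScrapeOptions`, `invokedBy`, `transcribe`, and `shouldTranscribe` are hypothetical names, not part of the existing scrapeHearings or runScrapers code.

```typescript
// Hypothetical: how the scraper was started.
type Invocation = "scheduled" | "runScrapers"

// Hypothetical options; not MAPLE's actual scraper interface.
interface HearingScrapeOptions {
  invokedBy: Invocation
  // Explicit override; when omitted, the default depends on invokedBy.
  transcribe?: boolean
}

// Returns true only when a paid transcription call should be made.
function shouldTranscribe(options: HearingScrapeOptions): boolean {
  // An explicit choice always wins.
  if (options.transcribe !== undefined) return options.transcribe
  // Otherwise keep the existing default for scheduled runs,
  // but skip transcription for manual runScrapers invocations.
  return options.invokedBy !== "runScrapers"
}

console.log(shouldTranscribe({ invokedBy: "runScrapers" }))
```

With this shape, a backfill through runScrapers would skip transcription unless someone opts in explicitly, while the ongoing scheduled scrape keeps its current behavior.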

Success Criteria

  • Investigate how far back the MA Legislature API's data goes (The API supports back to the 1993/1994 session, but it's not clear how much data is actually available in the format we expect going that far back.)
  • Add the additional general courts to the generalCourt constants
  • Run all of our scrapers against each of the new courts
    • As a warning, some of the scrapers are paced to scrape all resources over a longer period of time (e.g. the bills scraper is paced to scrape 8,000 bills over 24 hours to avoid API limits) - these may need to run overnight.
  • Avoid scraping the hearing video for transcripts until further notice - we definitely want this, but want to be careful about how we spend our remaining AssemblyAI credits. (Kimin has a script for this we should revisit when we're ready.)
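To make the pacing warning concrete, the arithmetic behind "8,000 bills over 24 hours" can be sketched as below. The helper names are hypothetical, and the sketch assumes requests are spaced evenly across the window, which may not match how the bills scraper actually schedules its work.

```typescript
// Milliseconds between requests if resourceCount items are spread evenly
// over windowHours.
function pacingIntervalMs(resourceCount: number, windowHours: number): number {
  return (windowHours * 60 * 60 * 1000) / resourceCount
}

// Hours needed to scrape resourceCount items at a fixed interval.
function hoursToScrape(resourceCount: number, intervalMs: number): number {
  return (resourceCount * intervalMs) / (60 * 60 * 1000)
}

// 8,000 bills over 24 hours works out to one request every 10.8 seconds.
const billIntervalMs = pacingIntervalMs(8000, 24)
console.log(billIntervalMs / 1000)

// At that pace, one court's bills take a full day.
console.log(hoursToScrape(8000, billIntervalMs))
```

If the courts are backfilled one after another at this pace, the total time grows by roughly a day per court for the bills scraper alone.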

Labels

Ready for Development, backend, data
