| CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Improve solr + nginx performance monitoring #11156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve solr + nginx performance monitoring #11156
Conversation
6064271 to
5a76175
Compare
|
Patch deployed to monitor frequency of this issue. |
62da713 to
f988fba
Compare
a136444 to
dde5b2e
Compare
dde5b2e to
652b10b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances monitoring capabilities for Solr and Nginx performance with more accurate metrics and query tracking. It adds specific monitoring for homepage book display and switches from fixed-size log sampling to time-based analysis for better accuracy.
- Introduces comprehensive Solr log monitoring with query labeling and performance tracking
- Refactors Nginx monitoring to use time-based sampling (
obfi_previous_minute) instead of fixed entry counts - Adds homepage monitoring to track book display issues and query labeling throughout the search system
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/monitoring/solr_logs_monitor.py | New comprehensive Solr log parser and monitoring system |
| scripts/monitoring/utils.sh | Refactored to use obfi_previous_minute for more accurate Nginx metrics |
| scripts/monitoring/monitor.py | Added Solr and homepage monitoring jobs |
| openlibrary/plugins/worksearch/code.py | Added query labeling system for tracking different search types |
| scripts/deployment/deploy.sh | Extended deployment to include Solr servers |
| compose.production.yaml | Added monitoring profile for Solr servers |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| def safe_parse_log_entry(log_line: str) -> RequestLogEntry | SolrLogEntry | None: | ||
| try: | ||
| return parse_log_entry(log_line) | ||
| except Exception as e: # noqa: BLE001 |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bare Exception catch is too broad and could hide important errors. Consider catching specific exceptions like ValueError or ParseError that are expected from parse_log_entry.
| except Exception as e: # noqa: BLE001 | |
| except (ValueError, ParseError) as e: |
708d172 to
3b7d1e4
Compare
3b7d1e4 to
ca92ed4
Compare
| fields: str = '*', | ||
| facet: bool = True, | ||
| spellcheck_count: int | None = None, | ||
| query_label: QueryLabel = 'UNLABELLED', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNLABELLED_WORK_SEARCH may help us at least differentiate between e.g. author, list, subject searches.
Closes #10290
Closes #10239
When this is 0, it means the homepage is empty.
Adds a
query_labeloption to a few spots we make solr queries that we can use to monitor certain types of queriesAdd ol-solr0/ol-solr1 to our deploy flow. This lets these monitoring changes stay up-to-date
Switch nginx monitoring to use
obfi_previous_minuteinstead of using the last 17500 entries. This results in more accurate numbers and lets use get absolute values, so we can now easily see if e.g a certain bot has seen a spike in traffic, as opposed to a bot looks like it has a spike in traffic because everything else went down.The flat line is the before since we were sampling a constant value. Now we get the actual previous minute in full.
Technical
Testing
https://grafana.us.archive.org/d/000000176/open-library-dev?orgId=1&refresh=1m&from=1755279851212&to=1755290651214&viewPanel=41
Screenshot
Stakeholders