Automating Google Index Monitoring: What I Learned from Building a Free Python SEO Tool

Ryan McCain
March 8, 2025
5:47 pm
These are some things I learned (best practices) while creating my Norzer Google Index Bot Python script, which checks the indexing status of URLs from a website's XML sitemap using the Google Search Console API. It sends an email report with the results, categorizing URLs as Indexed, Not Indexed (Issues), or Excluded. Consider this a "lessons learned" write-up or a brain dump.

Secure Storage of Credentials

Storing API credentials and tokens securely is paramount. Avoid hardcoding sensitive keys or embedding them directly in the code or a public repository (blog.gitguardian.com). Instead, use external configuration with proper safeguards. For example, credentials can be placed in a config.ini file or a .env file that is not tracked by version control (blog.gitguardian.com). Ensure this file is included in your .gitignore and has restricted file permissions (only accessible by the owner) to prevent unauthorized access. As a best practice, consider using environment variables for credentials at runtime, which adds security and flexibility (blog.gitguardian.com). This way, even if the code is shared, the secrets remain outside the codebase.

Recommendations:

External Config Files: Keep API keys, client secrets, and tokens in a config file (like config.ini) or use environment variables, rather than in the script. The script should read these at runtime.
Do Not Commit Secrets: Never commit your config with credentials to source control (blog.gitguardian.com). If using a public repository, double-check that no secrets are exposed.
File Permissions: Protect the configuration file by setting strict permissions (e.g., on Linux chmod 600 config.ini). This ensures only the owner/service user can read it.
Environment Variables Option: For added security, allow the script to fetch credentials from environment variables (e.g., os.getenv("API_KEY")). Environment variables are often recommended for securing credentials in Python apps (medium.com). Tools like python-dotenv can load a .env file for user convenience.
Use Minimal Scope: If the Google credentials involve OAuth scopes or service accounts, use the least privileged scope needed. For instance, if only the Indexing API is used, avoid enabling broader scopes than necessary. This limits impact if a credential leaks.

By following these steps, credentials remain out of the code, reducing the risk of accidental exposure and aligning with security best practices.
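Here is a minimal sketch of the "environment variable first, config file second" idea, using only the standard library. The function name load_secret, the [google] section, and the client_secret key are assumptions for illustration, not the names used in the actual bot.

```python
import configparser
import os
import stat
from pathlib import Path


def load_secret(name, config_path="config.ini", section="google"):
    """Return a secret from the environment, falling back to config.ini."""
    # Environment variables take priority so secrets can stay out of files entirely.
    value = os.getenv(name.upper())
    if value:
        return value

    path = Path(config_path)
    if not path.exists():
        raise RuntimeError(f"{name} is not set and {config_path} was not found")

    # On POSIX systems, warn if group or other users have any access to the file.
    if os.name == "posix":
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            print(f"Warning: {config_path} is accessible by other users; run chmod 600")

    parser = configparser.ConfigParser()
    parser.read(path)
    try:
        return parser[section][name.lower()]
    except KeyError:
        raise RuntimeError(f"{name} is missing from [{section}] in {config_path}") from None
```

Calling load_secret("client_secret") would read CLIENT_SECRET from the environment if it is set, and otherwise look for client_secret under [google] in config.ini.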

API Rate Limit Handling

Respecting Google’s API rate limits is critical for stability. The Google Indexing API (used for indexing requests) has strict quotas: by default around 200 URL publish requests per day and up to 380 requests per minute across all endpoints (web.swipeinsight.app). Exceeding these limits can lead to errors or temporary bans. To avoid hitting the limits, implement throttling in your script: limit the number of requests per second and spread out large batches over time.

Recommendations:

Understand Quotas: Familiarize yourself with Google’s quotas for the APIs you use (e.g., Indexing API or Search Console API). For example, the Indexing API allows roughly 200 new URL submissions per day by default (web.swipeinsight.app). Design the bot so it never sends more than the allowed number of requests in a given period.
Configure Throttling: Build in a delay or sleep between API calls. For instance, if 200/day is the cap, you might space calls by a few seconds or minutes, and avoid rapid-fire bursts. A simple approach is using time.sleep() after each request or after a batch of requests. Make the delay configurable via config.ini so users can adjust based on their quota and needs.
Monitor Rate Limit Errors: Google’s APIs typically return HTTP 429 (Too Many Requests) if you hit the rate limit. If the bot receives a 429, it should pause and retry after a backoff period (see the next item on backoff). Logging these events is helpful to alert the user that the rate limit was hit.
Use Exponential Backoff: When a rate limit is encountered, implement an exponential backoff strategy rather than immediate retries (discuss.ai.google.dev). This means waiting progressively longer (e.g., 1s, then 2s, 4s, etc.) before retrying the request. Backoff helps ensure you don't hammer the API when it is asking you to slow down.
Batch and Queue Requests: If you have a large list of URLs to index, consider batching them in a queue and processing in chunks that respect the per-minute limit. This prevents the script from overrunning the API.

By handling rate limits proactively, the bot avoids triggering errors or bans. It will operate more smoothly within Google's allowed usage, which improves its reliability.
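The throttling and batching ideas above can be sketched roughly like this. Throttle and chunked are hypothetical helper names, and the interval would normally come from config.ini rather than being hardcoded.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (a simple client-side rate limit)."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last_call = None

    def wait(self):
        """Sleep just long enough to keep calls at least min_interval apart."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In practice you would call throttle.wait() right before each API request and loop over chunked(urls, batch_size) so a large sitemap is worked through in pieces. This version is meant for a single thread; a multi-threaded bot would need a lock around wait().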

Parallel Processing Efficiency

Processing multiple URLs in parallel can significantly speed up indexing for large batches, but it must be done carefully. Python offers ways to do concurrent requests (e.g., using threading or asyncio), but naive parallelism can conflict with rate limits and thread safety. If using the official Google API Python client, note that it is built on httplib2, which is not thread-safe (googleapis.github.io). This means that each thread should use its own HTTP connection object when making API calls to avoid issues.

Recommendations:

Use Thread Pools Judiciously: Instead of launching a very high number of threads, use a thread pool or async pool with a moderate size (for example, 5–10 threads at most, depending on quotas). This provides concurrency without overwhelming the API or your system. Python’s concurrent.futures.ThreadPoolExecutor or asyncio can help manage this easily.
Ensure Thread Safety: If your script uses a Google API client library in threads, give each thread its own authenticated HTTP client instance (googleapis.github.io). This prevents cross-thread data corruption or authentication mix-ups. Alternatively, use the requests library with a separate Session per thread, which avoids sharing connection state between threads.
Controlled Concurrency: Tie the number of parallel workers to the API limits. For example, if only ~2 requests per second are allowed, having 5 threads starting requests simultaneously may be fine, but 50 threads would likely cause failures. You can implement a simple semaphore or rate limiter that only allows a certain number of concurrent API calls.
Avoid Global State: When processing in parallel, avoid modifying global structures from multiple threads without locks. Instead, gather results from threads in a thread-safe queue or use higher-level concurrency primitives. This will improve stability and prevent race conditions.
Configurable Concurrency: Make the degree of parallelism configurable via config.ini (e.g., a “threads” setting). Hobbyist users can then dial it down if they encounter issues or if running on a low-power machine, or dial it up (within safe limits) to speed up processing.

By leveraging parallel processing carefully, the bot can index URLs faster while still playing nicely with Google’s rules. It’s a balance between efficiency and caution: a bit of concurrency for speed, but not so much that it causes errors.
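A rough sketch of that balance: a small thread pool, one client per thread, and results collected in the main thread so nothing shared is mutated by the workers. The check_url and make_client callables are placeholders you would supply (for example, a function that builds an authenticated API client); they are not part of any Google library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

_thread_local = threading.local()


def get_client(make_client):
    """Return this thread's own client, creating it on first use."""
    if not hasattr(_thread_local, "client"):
        _thread_local.client = make_client()
    return _thread_local.client


def check_all(urls, check_url, make_client, max_workers=5):
    """Check URLs in a bounded thread pool; return (results, failures) dicts."""
    results, failures = {}, {}

    def task(url):
        # Each worker thread reuses its own client rather than sharing one.
        return check_url(get_client(make_client), url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, url): url for url in urls}
        # Results are gathered in the main thread only, so no locks are needed.
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                failures[url] = exc
    return results, failures
```

The max_workers value is the knob to expose in config.ini. A single failing URL ends up in failures instead of stopping the whole run.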

Robust Error Handling

A resilient script should handle errors gracefully and provide informative feedback. Google Search Console APIs can occasionally return transient errors (like HTTP 500 Internal Errors) even if requests are valid. These often indicate a temporary backend issue on Google’s side (stackoverflow.com), so your bot should catch them and retry rather than crashing. Implementing retries with exponential backoff is a best practice for both 500 errors and rate-limit errors (discuss.ai.google.dev). This means if an API call fails with a 500 or similar, the script waits a short time and tries again, increasing the wait time on each retry attempt.

Recommendations:

Catch Exceptions: Wrap API calls in try/except blocks to catch exceptions (e.g., network errors, HTTP errors thrown by libraries). This prevents the entire script from stopping on a single failure. For Google’s Python client, catch HttpError; for raw HTTP requests, check response.status_code.
Handle Specific Status Codes: Implement logic for different error types:
- 500 Internal Server Error: Assume it’s a temporary issue. Log a warning and retry the request after a delay. Often a second attempt will succeed if the issue was momentary. Use exponential backoff for subsequent retries (e.g., wait 1s, then 2s, then 4s) (stackoverflow.com). Limit the number of retries (for example, 3 attempts) to avoid infinite loops if the error persists.
- 429 Too Many Requests / Quota Errors: These indicate you’ve hit a limit. In this case, log an error about the rate limit and wait (perhaps a minute or more) before retrying. The backoff strategy is important here (discuss.ai.google.dev): each retry wait time should grow (with some random jitter) to give the API time to reset the quota window.
- 401 Unauthorized / 403 Forbidden (Permission Errors): If the API returns 401 or 403, it could be an authentication issue (bad credentials or missing API access). In this case, retries won’t help; the script should log a clear error and stop or skip those URLs after informing the user to check credentials or API enablement.
- Network Timeouts: If a request times out or there’s no response, treat it similar to a 500 – log it and retry with backoff. Sometimes a slow network or hiccup can cause transient failures.
Logging Errors: Every error (especially those causing a retry) should be logged with details (timestamp, URL or action that failed, and the error message). This helps the user or developer later diagnose issues. For instance, if a particular URL consistently returns a 500, it might indicate a problem with that URL or the API’s handling of it.
Skip or Continue on Failure: The script should continue processing remaining URLs even if one fails. Perhaps maintain a list of failed URLs to report at the end. This way one problematic URL doesn’t prevent the rest from being indexed.
User Notification: If certain errors require user intervention (like invalid credentials or exceeding daily quota), make sure the script outputs a clear message (and perhaps writes to a log) so the user knows what to fix or when to try again (e.g., “Daily quota reached, try again tomorrow”).

By anticipating and handling errors, the bot becomes much more stable. It will recover from transient Google errors automatically, politely back off when needed, and inform the user of any persistent problems. This leads to a smoother experience, especially for non-technical users who just want the tool to work.
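Here is one way the retry logic could look, as a sketch rather than the bot's actual code. ApiError is a hypothetical stand-in for whatever your HTTP layer raises; with Google's Python client you would catch HttpError and read its status instead. The set of retryable codes is also an assumption you may want to adjust.

```python
import random
import time

RETRYABLE = {429, 500, 503}  # assumed set of transient status codes


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(exc):
    """Transient HTTP statuses and network hiccups are worth retrying."""
    if isinstance(exc, ApiError):
        return exc.status in RETRYABLE
    return isinstance(exc, (TimeoutError, ConnectionError))


def call_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # 401/403 and other non-transient errors are re-raised immediately,
            # as is the last failure once the attempts are used up.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            # Delays grow 1s, 2s, 4s, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter exists mainly so the function can be tested without real waiting. For 429 responses you would likely pass a larger base_delay, since quota windows take longer to reset than a momentary 500.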

Performance and Logging Improvements

Even for a hobbyist script, paying attention to performance and logging can enhance user experience. Performance in this context means the script runs efficiently without wasting resources or time. Simple tweaks can make a difference, especially when processing many URLs. Logging is equally important for maintainability – it provides transparency into what the bot is doing and aids in debugging issues or verifying that tasks completed successfully.

Performance Tips:

Reuse HTTP Sessions: If using the requests library or similar for API calls, use a requests.Session to keep connections alive. This avoids the overhead of establishing a new TCP/SSL connection for each request, making the indexing calls faster and less resource-intensive.
Avoid Redundant Operations: If the script reads input URLs from a file or database, load them once and reuse the list, rather than re-reading multiple times. Similarly, only fetch auth tokens when necessary and cache them in memory (for example, if using OAuth, retrieve the token once and reuse until expiration).
Optimize Data Structures: Use appropriate data structures for lookups (e.g. a set to track processed URLs and avoid duplicates). While these optimizations might be minor, they can help when the URL list is large.
Parallelism with Caution: As discussed, some parallel processing can improve throughput. Just ensure it’s tuned (no oversubscription of threads) and that your system has enough bandwidth. This ties back into efficiency—doing more in less time without breaking things.
Test in Small Batches: For performance tuning, test the script with a smaller batch of URLs first. This will give you an idea of how quickly it processes and any bottlenecks, before scaling up to hundreds of URLs.
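Two of the tips above (de-duplicating with a set and caching the auth token) can be sketched in a few lines. TokenCache and its fetch_token callable are hypothetical; the callable is assumed to return a (token, lifetime_in_seconds) pair, which is not how any particular Google library exposes it.

```python
import time


def unique_urls(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    seen = set()
    ordered = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


class TokenCache:
    """Fetch a token once and reuse it until shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=60):
        self._fetch = fetch_token  # hypothetical: returns (token, lifetime_seconds)
        self._margin = margin_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, refreshing it when it is close to expiry."""
        if self._token is None or time.monotonic() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = time.monotonic() + lifetime
        return self._token
```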

Logging Best Practices:

Use Python’s Logging Module: Instead of using only print statements, integrate the built-in logging module for better flexibility. This allows setting levels (INFO, DEBUG, ERROR) and easily redirecting output to a file if needed. For example, important events (start/finish of indexing, any errors) can be logged at INFO or ERROR level, while detailed debug information (like raw API responses or stack traces) can be logged at DEBUG level and turned on when troubleshooting.
Log to File (Optionally): Provide an option in config.ini to enable file logging. Small business users might appreciate an indexbot.log file that records what happened during a run. Use rotating logs or timestamped files to prevent a single log from growing indefinitely.
No Sensitive Data in Logs: Be careful not to log full credentials or tokens. If you log requests, strip out authorization headers or any secrets. Logs can be viewed by others or left on disk, so they should not become another point of leakage.
Success/Failure Summary: At the end of a run, output a brief summary – e.g., “10 URLs processed, 8 indexed successfully, 2 failed.” Perhaps even list failed URLs or the reason. This user-friendly touch lets the operator quickly see the outcome without digging through logs.
Verbose Mode: Consider a verbose or debug mode toggle. In normal mode, the script can print minimal but useful info (like “Indexing URL X... Success/Failed”). In debug mode (enabled via config), it can print more internal details. This keeps the default output clean for casual users, but allows deeper insight when needed for troubleshooting.

Improving performance ensures the bot runs smoothly even on low-power machines or with large URL lists, and good logging provides confidence and transparency in what the script is doing, which is especially helpful for maintainability.
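As a sketch of the logging setup described above, using only the standard library: the logger name "indexbot", the size limits, and the bearer-token pattern are all assumptions, and the redaction filter only catches that one token format.

```python
import logging
import re
from logging.handlers import RotatingFileHandler

# Assumed pattern: masks OAuth-style bearer tokens; extend it for other secrets.
_TOKEN_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9._\-]+")


class RedactSecrets(logging.Filter):
    """Mask bearer tokens before a record is written anywhere."""

    def filter(self, record):
        record.msg = _TOKEN_RE.sub(r"\1[REDACTED]", record.getMessage())
        record.args = ()  # message is already fully formatted
        return True


def setup_logging(log_file=None, debug=False):
    """Log to the console, and optionally to a rotating file."""
    logger = logging.getLogger("indexbot")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3))
    for handler in handlers:
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        handler.addFilter(RedactSecrets())
        logger.addHandler(handler)
    return logger


def summarize(results, failures):
    """Build the one-line end-of-run summary."""
    total = len(results) + len(failures)
    return f"{total} URLs processed, {len(results)} succeeded, {len(failures)} failed"
```

The debug flag maps to the verbose mode mentioned above, and log_file would come from the optional file-logging setting in config.ini.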

Maintainability and User-Friendliness

To make sure the Google Index Bot remains useful for hobbyists and small business owners, prioritize simplicity in configuration and clarity in code. A maintainable script is one that can be easily updated or fixed by its user or other contributors down the line. User-friendliness means non-developers should be able to use and even tweak it with minimal friction.

Recommendations:

Clear Configuration Settings: Keep the config.ini well-organized and well-documented (you can include comments explaining each setting). Use intuitive names for options (e.g., api_key, site_url, max_threads, log_level). This way, users can adjust behavior without touching the code. Secure values like credentials have to be in there (or in env vars), but other tunables (delays, number of retries, etc.) can also be made configurable.
Documentation: Provide a README or usage guide (even if just as comments at the top of the script). It should include how to obtain the Google credentials, how to fill out the config.ini, and how to run the script. For small business owners unfamiliar with development, step-by-step setup instructions are invaluable.
Code Organization: Structure the Python script into logical functions or sections (for example: one function for loading config, one for authenticating with Google, one for submitting a URL to index, etc.). This modular approach makes it easier to maintain or modify parts of the code. It also improves readability – someone skimming the code can understand the high-level flow.
In-Line Comments: Write brief comments in the code especially for non-obvious logic. For instance, if you implement exponential backoff or multi-threading, a short comment on why you’re doing something will help the next person (or your future self) understand the reasoning.
Graceful Shutdown: Ensure the script can exit cleanly. For example, if the user hits Ctrl+C to stop, catch the keyboard interrupt and print a summary or a friendly message rather than a long stack trace. This polish makes the tool feel more robust and user-friendly.
Dependencies and Updates: Limit external dependencies to what's necessary. If you use the Google API Python client, that's fine – just mention it in requirements. For a hobby script, fewer dependencies means easier setup. Also, keep an eye on updates from Google (APIs can change). If Google offers a new official method (for example, an official URL indexing endpoint or updated library), plan to upgrade accordingly and document any changes for users.

By keeping the tool simple in configuration and clear in execution, hobbyists and small business users will find it accessible. They can run it without deep technical knowledge and trust that it’s doing its job securely and reliably. Plus, if something goes wrong, the combination of good logging, error handling, and documentation will make it much easier to diagnose and fix.
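To make the configuration and graceful-shutdown points concrete, here is a small sketch. The [bot] section, the default values, and the process_url callable are assumptions for illustration; the option names follow the examples mentioned above.

```python
import configparser

# Assumed defaults; a real config.ini would override only what the user changes.
DEFAULTS = {
    "max_threads": "5",
    "delay_seconds": "1.0",
    "max_retries": "3",
    "log_level": "INFO",
}


def load_settings(path="config.ini"):
    """Read tunables from the [bot] section, falling back to DEFAULTS."""
    parser = configparser.ConfigParser()
    parser.read_dict({"bot": DEFAULTS})
    parser.read(path)  # a missing file is skipped, leaving the defaults in place
    bot = parser["bot"]
    return {
        "max_threads": bot.getint("max_threads"),
        "delay_seconds": bot.getfloat("delay_seconds"),
        "max_retries": bot.getint("max_retries"),
        "log_level": bot.get("log_level").upper(),
    }


def run(urls, process_url):
    """Process URLs in order; on Ctrl+C, print a short message instead of a traceback."""
    done = []
    try:
        for url in urls:
            process_url(url)
            done.append(url)
    except KeyboardInterrupt:
        print(f"\nStopped by user after {len(done)} of {len(urls)} URLs.")
    return done
```

Keeping the defaults in one dictionary means the README can document them in one place, and a user who never edits config.ini still gets sensible behavior.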

Hopefully, this helps someone else!

Ryan McCain

Ryan is the founder of Norzer and a lifelong tech prodigy with over 25 years of experience in IT. Animal lover, experimental home cook, avid reader, and loyal Boston sports fan (a holdover from my IBM days in the city). I also make a point to hit the gym and knock out 10,000 steps every day because sometimes the best ideas come while moving. Lately, I’ve been hacking something else: food. I find creative ways to cut calories without sacrificing flavor.
