How to Check Website Categories in Bulk with the Symantec Site Review Tool?

Are you a security professional or analyst who needs to quickly categorize a large list of websites? Or maybe you simply want to analyze a bunch of domains for your own research. Manually checking each website’s category on Symantec’s Site Review tool can be extremely tedious and time-consuming.

To help with this, we’ve created a handy Python script that automates categorizing websites in bulk using Symantec’s database. In this post, we’ll explain what the Site Review tool is, how our script works, what’s required to run it, and how to run it step by step.

A Short Note About the Symantec Site Review Tool


Symantec’s free Site Review tool lets you manually check the category of any website URL against its database. It can identify over 60 content categories, such as gambling, hate speech, botnets, and malware. This helps security teams classify and filter websites during investigations.

The tool is very useful but has capacity limits. It is designed for manual one-off checks rather than bulk automated submissions. Submitting over 500 URLs in quick succession can trigger CAPTCHAs that block further automated requests.

We strongly recommend against submitting thousands of URLs automatically in a short time span; doing so pushes Site Review well beyond its intended use. Instead, use our script to categorize websites in batches of fewer than 100 URLs, and let sufficient time pass between batches when working through a larger list. Used reasonably, the tool can benefit your website categorization workflow, but be mindful of its limitations and don’t overload this free resource.
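To make the batching advice concrete, here is a minimal sketch of splitting a URL list into chunks of fewer than 100 and pausing between them. It is not part of domain_categorization.py; the helper names and the 600-second default pause are hypothetical choices, since the guidance above does not specify an exact waiting period.

```python
import time


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(urls, process_batch, size=99, pause_seconds=600):
    """Call process_batch on each chunk, sleeping between chunks.

    size=99 keeps every batch under 100 URLs; pause_seconds is a
    hypothetical default, so tune it to your own situation.
    """
    chunks = list(batches(urls, size))
    for index, chunk in enumerate(chunks):
        process_batch(chunk)
        # Only pause if another batch is still waiting.
        if index < len(chunks) - 1:
            time.sleep(pause_seconds)
```

For example, 250 URLs would be processed as batches of 99, 99, and 52, with a pause after each of the first two.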

About The domain_categorization.py Bulk Website Categorization Script

To help automate Symantec’s Site Review tool for bulk checks, we built a Python script called domain_categorization.py. It submits multiple URLs to the Site Review website, parses the category results, and compiles everything into an easy-to-analyze text file.


Here is a high-level overview of how it works:

The script starts by opening the list of URLs you want to check from a file called domains.txt. It then launches a Chrome browser using Selenium, programmatically navigating to the Site Review tool webpage.
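Here is a minimal sketch of those first two actions: reading domains.txt and opening the Site Review page in Chrome. It is not the actual domain_categorization.py code; the helper names load_domains and start_browser are hypothetical, and the Site Review address (https://sitereview.bluecoat.com/) is an assumption you should confirm in your browser.

```python
from pathlib import Path

# Assumed address of Symantec's Site Review tool; confirm before use.
SITE_REVIEW_URL = "https://sitereview.bluecoat.com/"


def load_domains(path="domains.txt"):
    """Return the non-empty, whitespace-stripped lines of the input file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def start_browser():
    """Launch Chrome through Selenium and open the Site Review page."""
    # Imported here so load_domains() can be used without Selenium installed.
    from selenium import webdriver

    # Uses ChromeDriver from PATH (or Selenium Manager in Selenium 4.6+).
    driver = webdriver.Chrome()
    driver.get(SITE_REVIEW_URL)
    return driver
```

A caller would typically run urls = load_domains() followed by driver = start_browser(), then loop over urls.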

It takes the first URL in the list, types it into the search box on the Site Review website, and presses Enter to submit the site for categorization. As it processes each URL, it checks for a CAPTCHA pop-up so that any blocked URL can be set aside instead of being categorized.
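The submit-and-check step could be sketched as follows, assuming a Selenium WebDriver such as the one returned by webdriver.Chrome(). The search box ID "txtUrl" is hypothetical (inspect the live page to find the real element), and detecting a CAPTCHA by looking for the word "captcha" in the page HTML is a simple heuristic, not how domain_categorization.py is confirmed to work.

```python
def looks_like_captcha(page_source):
    """Heuristic: report True if the page HTML mentions a CAPTCHA."""
    return "captcha" in page_source.lower()


def submit_url(driver, url, search_box_id="txtUrl"):
    """Type a URL into the Site Review search box and press Enter.

    search_box_id is a hypothetical element ID. Returns False if the
    resulting page appears to be a CAPTCHA, True otherwise.
    """
    # Imported here so looks_like_captcha() works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    box = driver.find_element(By.ID, search_box_id)
    box.clear()
    box.send_keys(url + Keys.ENTER)
    # A real script should wait for the page to update (for example with
    # WebDriverWait) before inspecting it.
    return not looks_like_captcha(driver.page_source)
```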

Once a URL is categorized, the script extracts the identified categories and domain name. It then writes them, along with the original URL, to the results.txt output file, and repeats this process for every URL in the input list.
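Writing each result as one line could look like the sketch below. The article does not show the exact layout of results.txt, so the tab-separated format and the helper names format_result and append_line are assumptions for illustration.

```python
from pathlib import Path


def format_result(url, domain, categories):
    """Build one tab-separated line: URL, domain, comma-joined categories."""
    return f"{url}\t{domain}\t{', '.join(categories)}"


def append_line(path, line):
    """Append a single line to a text file, creating the file if needed."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

For example, append_line("results.txt", format_result("https://example.com", "example.com", ["Technology/Internet"])) adds one line per URL. Appending after every lookup means results already collected survive if the run stops partway.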

Any URLs that hit a CAPTCHA are logged separately into a captcha.txt file for manual rechecking later.
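When you come back to recheck blocked URLs, a small helper like this hypothetical load_retry_list can turn captcha.txt into a clean retry list. It assumes captcha.txt holds one URL per line; blank lines and repeats are dropped while the original order is kept.

```python
from pathlib import Path


def load_retry_list(path="captcha.txt"):
    """Read URLs logged after a CAPTCHA, skipping blanks and duplicates."""
    file = Path(path)
    if not file.exists():
        return []  # no URLs were blocked
    seen = set()
    retry = []
    for line in file.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            retry.append(url)
    return retry
```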

Overall, this automated interaction with Site Review’s website lets you feed in a bulk list of URLs and efficiently get website categories parsed into a text file, saving a great deal of manual lookup time.
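The overall loop described above can be summarized in a short, browser-free sketch. Here check_url is a hypothetical callable standing in for the Selenium lookup: it returns (domain, categories) on success or None when a CAPTCHA blocked the request. Passing it in as a parameter lets you test the loop with a fake checker.

```python
def categorize_all(urls, check_url):
    """Run check_url on every URL and split successes from CAPTCHA hits."""
    results = []       # (url, domain, categories) tuples destined for results.txt
    captcha_hits = []  # URLs destined for captcha.txt
    for url in urls:
        outcome = check_url(url)
        if outcome is None:
            captcha_hits.append(url)
        else:
            domain, categories = outcome
            results.append((url, domain, categories))
    return results, captcha_hits
```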

Here are its key features:

  • Bulk URL Processing: Parses a text file of URLs and checks categories for each through the Site Review tool website.

  • CAPTCHA Handling: Automatically detects and handles CAPTCHAs if they appear, logging any blocked URLs.

  • Result Logging: Stores categorized URL data in a results.txt file for easy analysis, and logs CAPTCHA-blocked URLs separately in captcha.txt.

Overall, it makes checking long lists of websites a quick and largely hands-off process, as long as you work in modest batches.

Prerequisites to Run the Bulk Website Checker Script

Before running the domain_categorization.py script to categorize websites, you need to set up:


Python Environment: Python 3.x must be installed on your computer to run the .py script. If you don’t already have it, download the latest 3.x release from https://www.python.org/downloads/.

For detailed installation steps, see our guide: Step-by-Step Procedure to Install Python on Windows.

Selenium Module: The script imports Selenium to automate interaction with a browser. Install it via pip by running:

pip install selenium

ChromeDriver: ChromeDriver allows Selenium to control Google Chrome. Download the driver from https://chromedriver.chromium.org/downloads and add its executable to your system PATH. Make sure the ChromeDriver version matches your installed Chrome version. If you use Selenium 4.6 or later, its built-in Selenium Manager can also download a matching driver automatically when none is found on PATH.
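To confirm that ChromeDriver is actually on your PATH, a quick check with Python’s standard shutil.which is enough. The helper name find_chromedriver is hypothetical; it returns the driver’s full path, or None if no executable named chromedriver is found.

```python
import shutil


def find_chromedriver():
    """Return the full path of chromedriver on PATH, or None if absent."""
    return shutil.which("chromedriver")
```

Running print(find_chromedriver() or "ChromeDriver is not on PATH") from the script’s directory shows the result at a glance. A None result is not necessarily a problem on Selenium 4.6+, where Selenium Manager can fetch the driver itself.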

Input File: Have your list of URLs ready in a plain text file called domains.txt. Put one URL per line in this file for the script to iterate through.

Once Python, Selenium, ChromeDriver, and the input URL list are ready, you can move on to running the website category checker script! Let us know if any of the prerequisites are unclear.

Step-by-Step Guide to Checking Websites in Bulk

Once the prerequisites are set up, you are ready to use the script to categorize lists of websites in bulk.

Follow this streamlined process:

Step 1: Populate Input File

Add the full URLs you want to check, one on each line, into a text file called domains.txt. This serves as the input list that the script will iterate through.


Step 2: Execute the Script

Open a terminal or command prompt, navigate to the script’s directory, and run:

python domain_categorization.py

This starts the Selenium-driven Chrome session, which checks each URL through Symantec’s Site Review tool. On macOS and Linux, you may need to run python3 instead of python.


Step 3: Review Outputs

As the script runs, your categorized URL results will be compiled line by line in the results.txt file. Any URLs blocked by a CAPTCHA mid-process will be logged in captcha.txt so you can retry them later.
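After a run, a quick tally of both output files tells you how much still needs a retry. This is a hypothetical helper, assuming each categorized or blocked URL occupies one non-empty line in its file; a missing file counts as zero.

```python
from pathlib import Path


def count_lines(path):
    """Count non-empty lines in a text file; a missing file counts as zero."""
    file = Path(path)
    if not file.exists():
        return 0
    text = file.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


def summarize(results_path="results.txt", captcha_path="captcha.txt"):
    """Return a one-line summary of categorized versus blocked URLs."""
    done = count_lines(results_path)
    blocked = count_lines(captcha_path)
    return f"{done} categorized, {blocked} blocked by CAPTCHA"
```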


And that’s it! Sit back and let the tool scrape through your website list automatically. The heavy lifting of submissions and parsing is handled programmatically to save you headaches.


Let us know if you have any other questions about getting set up! We’re also happy to help you create Python scripts like this for other uses. Feel free to leave a comment.
