This project gathers data about the usage of HTTP response security headers into a SQLite database, so that statistics can be generated in a later step.
# Download the MAJESTIC Top 1 million sites CSV file
$ wget https://downloads.majestic.com/majestic_million.csv
# Transform the downloaded file into an input source that uses the same format
# as the CISCO Top 1 million sites CSV file
$ cat majestic_million.csv | awk -F "," 'NR>1 {print $1 "," $3}' > data/input.csv
$ rm majestic_million.csv
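For readers who prefer to stay in Python, the awk step above can be sketched roughly as follows. This is only an illustration and is not part of the project's scripts: the function name `majestic_to_input` is hypothetical, and the sample header row is illustrative. The only things taken from the command above are that the first line is skipped and that columns 1 and 3 are kept.

```python
import csv
import io


def majestic_to_input(src, dst):
    """Write 'column 1,column 3' for every row of src except the header row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # skip the header row, like NR>1 in awk
    for row in reader:
        if len(row) >= 3:  # ignore short or empty lines
            writer.writerow([row[0], row[2]])


# Illustrative sample; the real file is majestic_million.csv.
sample = io.StringIO(
    "GlobalRank,TldRank,Domain,TLD\n"
    "1,1,google.com,com\n"
    "2,2,facebook.com,com\n"
)
out = io.StringIO()
majestic_to_input(sample, out)
print(out.getvalue(), end="")
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module recommends.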
Scripts
Note
They are all stored in the scripts folder and are written in Python 3.x.
gather_data: Script that gathers information about HTTP security header usage into a SQLite database, based on the "MAJESTIC Top 1 million sites CSV file" data source.
generate_stats_md_file: Script that uses the gathered data to generate or update the markdown file stats, which contains mermaid pie charts with different statistics about HTTP security header usage (⚠️ not used anymore).
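As a rough idea of what storing security headers in SQLite can look like, here is a minimal sketch. It does not reproduce the actual gather_data script: the table name `header_usage`, its columns, the `SECURITY_HEADERS` selection, and the function names are all assumptions made for illustration. The response headers are passed in as a plain dictionary so the example runs without network access.

```python
import sqlite3

# Assumed selection of security headers; the real script may track others.
SECURITY_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def create_schema(conn):
    """Create the hypothetical table holding one row per (domain, header)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS header_usage ("
        "domain TEXT NOT NULL, header TEXT NOT NULL, value TEXT, "
        "PRIMARY KEY (domain, header))"
    )


def record_headers(conn, domain, headers):
    """Store the security headers found in `headers`; return how many were stored."""
    wanted = {name.lower() for name in SECURITY_HEADERS}
    rows = [
        (domain, name.lower(), value)
        for name, value in headers.items()
        if name.lower() in wanted  # header names are case-insensitive
    ]
    conn.executemany("INSERT OR REPLACE INTO header_usage VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # a real run would use a file on disk
create_schema(conn)
stored = record_headers(
    conn,
    "example.com",
    {
        "Server": "nginx",
        "X-Frame-Options": "DENY",
        "Strict-Transport-Security": "max-age=31536000",
    },
)
print(stored)
```

Keying the table on `(domain, header)` means a monthly re-run overwrites the previous value for a site instead of accumulating duplicates, which is one possible design choice among several.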
input.csv: MAJESTIC Top 1 million sites list, formatted as one ranking,domain entry per line.
data.db: SQLite database with information about HTTP security header usage.
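To illustrate how statistics could be derived from such a database and rendered as a mermaid pie chart, here is a small sketch. The table `header_usage` and its columns are hypothetical, as is the `mermaid_pie` helper; the actual layout of data.db is not described here and may differ.

```python
import sqlite3


def mermaid_pie(title, counts):
    """Return the body of a mermaid pie chart for (label, value) pairs."""
    lines = [f"pie title {title}"]
    for label, value in counts:
        lines.append(f'    "{label}" : {value}')
    return "\n".join(lines)


# Hypothetical table and sample rows, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE header_usage (domain TEXT, header TEXT)")
conn.executemany(
    "INSERT INTO header_usage VALUES (?, ?)",
    [
        ("a.example", "x-frame-options"),
        ("b.example", "x-frame-options"),
        ("a.example", "referrer-policy"),
    ],
)

# Count how many distinct domains send each header.
counts = conn.execute(
    "SELECT header, COUNT(DISTINCT domain) AS n FROM header_usage "
    "GROUP BY header ORDER BY n DESC, header"
).fetchall()

chart = mermaid_pie("Header usage", counts)
print(chart)
```

The returned text would be placed inside a mermaid code fence in a markdown file for GitHub to render it as a chart.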
Data and statistics update
Note
Only the first 150000 entries of the CSV data source are used, so that processing fits within the time limit allowed for a GitHub Actions workflow on the free tier.
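Limiting the input to the first entries can be done without loading the whole file, for example with `itertools.islice`. The sketch below assumes the ranking,domain line format of input.csv; the function name `read_domains` and the constant `MAX_ENTRIES` are hypothetical and not taken from the project's scripts.

```python
import csv
import io
from itertools import islice

MAX_ENTRIES = 150_000  # limit stated in the note above


def read_domains(src, limit=MAX_ENTRIES):
    """Return the domain column of at most `limit` leading rows of src."""
    reader = csv.reader(src)
    # islice stops reading after `limit` rows, so the rest is never parsed.
    return [row[1] for row in islice(reader, limit) if len(row) >= 2]


# Small in-memory sample standing in for data/input.csv.
sample = io.StringIO("1,google.com\n2,facebook.com\n3,youtube.com\n")
domains = read_domains(sample, limit=2)
print(domains)
```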
The update is scheduled as follows:
On the first day of every month, the database is updated via this workflow.
On the fifth day of every month, the statistics data is updated via this workflow.