This page is part of the SiteOne Crawler documentation and gives an overview of converting entire websites to markdown.
- Open markdown version of crawler.siteone.io - this webpage is based on Starlight.
- The Markdown version was generated by the specific command below.
- For better performance, some parts of the page (DOM elements) have been removed with `--markdown-exclude-selector`.
- Using `--ignore-regex`, URLs pointing to HTML reports and example exports were excluded from crawling, so they remain in the markdown only as absolute URLs.
- I added the `--disable-*` options here only to avoid downloading these types of files unnecessarily. They do not affect the output markdown content.
./crawler \
--url=https://crawler.siteone.io/ \
--ignore-regex='/^.*\/html\//' \
--ignore-regex='/^.*\/examples\-exports\//' \
--markdown-export-dir=tmp/crawler.siteone.io/ \
--markdown-exclude-selector='header' \
--markdown-exclude-selector='starlight-theme-select' \
--markdown-exclude-selector='.isMobile' \
--markdown-exclude-selector='#starlight__on-this-page--mobile' \
--markdown-exclude-selector='.social-icons' \
--disable-styles --disable-javascript --disable-fonts
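To see which URLs the two `--ignore-regex` patterns above would catch, the sketch below applies their bodies to a few sample URLs with `grep -E`. It is an approximation only: the crawler takes PCRE patterns with `/` delimiters, while `grep -E` uses POSIX extended syntax without delimiters, and the three URLs are hypothetical examples, not paths confirmed to exist on the site.

```shell
# Hypothetical URLs, used only to illustrate the two patterns.
urls='https://crawler.siteone.io/html/2024-08-23/report.html
https://crawler.siteone.io/examples-exports/sitemap.xml
https://crawler.siteone.io/features/markdown-export/'

# Bodies of the two --ignore-regex patterns, merged into one alternation
# and written without the PCRE delimiters and escaped slashes.
pattern='^.*/(html|examples-exports)/'

ignored=$(printf '%s\n' "$urls" | grep -E "$pattern")
kept=$(printf '%s\n' "$urls" | grep -Ev "$pattern")

printf 'ignored:\n%s\n\nkept:\n%s\n' "$ignored" "$kept"
```

Note that a URL ending in `report.html` is matched here only because of the `/html/` path segment, not because of its file extension.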
- Open markdown version of react.dev.
- The Markdown version was generated by the specific command below. For better performance, some parts of the page (DOM elements) have been removed.
- I used `--markdown-disable-images` so that images are removed from the markdown.
- I used `--disable-all-assets` here only to avoid downloading assets (JS, CSS, etc.) unnecessarily. It does not affect the output markdown content.
./crawler \
--url=https://react.dev/ \
--markdown-export-dir=tmp/react.dev/ \
--markdown-disable-images \
--disable-all-assets
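One way to check the effect of `--markdown-disable-images` is to count Markdown image references in the export directory. The following is a minimal sketch, assuming the export consists of `.md` files under the given directory; `count_image_refs` is a hypothetical helper, not part of the crawler, and the demo runs against a temporary directory with made-up files.

```shell
# Hypothetical helper: count Markdown image references (![alt](src))
# in all .md files under the directory given as $1.
count_image_refs() {
  grep -rEo --include='*.md' '!\[[^]]*\]\([^)]*\)' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Demo on a temporary directory with two made-up files.
demo=$(mktemp -d)
printf '# Title\n\n![logo](logo.png)\n' > "$demo/with-image.md"
printf '# Title\n\nText only.\n' > "$demo/text-only.md"

count_image_refs "$demo"
```

Run against `tmp/react.dev/` after the command above, the count would be expected to be `0` if every image was removed, though this sketch only detects the inline `![alt](src)` form.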
- Open markdown version of docs.astro.build - this webpage is based on Starlight.
- The Markdown version was generated by the specific command below. For better performance, some parts of the page (DOM elements) have been removed.
- I added the `--disable-*` options here only to avoid downloading these types of files unnecessarily. They do not affect the output markdown content.
./crawler \
--url=https://docs.astro.build/ \
--markdown-export-dir=tmp/docs.astro.build/ \
--markdown-exclude-selector='header' \
--markdown-exclude-selector='starlight-theme-select' \
--markdown-exclude-selector='.isMobile' \
--markdown-exclude-selector='#starlight__on-this-page--mobile' \
--markdown-exclude-selector='.social-icons' \
--disable-styles --disable-javascript --disable-fonts
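Since both Starlight-based sites on this page use the same selectors, the shared part of the command could be kept in one place. The sketch below is one possible way to do that in POSIX shell; `starlight_export_cmd` is a hypothetical wrapper, it uses only the options shown above, and it prints the arguments instead of running the crawler.

```shell
# Hypothetical wrapper: builds the crawler command line for a Starlight-based site.
# $1 = site URL, $2 = markdown export directory.
starlight_export_cmd() {
  url=$1
  out_dir=$2
  set -- ./crawler "--url=$url" "--markdown-export-dir=$out_dir"
  # Selectors taken from the commands on this page.
  for selector in 'header' 'starlight-theme-select' '.isMobile' \
                  '#starlight__on-this-page--mobile' '.social-icons'; do
    set -- "$@" "--markdown-exclude-selector=$selector"
  done
  set -- "$@" --disable-styles --disable-javascript --disable-fonts
  # Print one argument per line; replace this line with "$@" to actually run it.
  printf '%s\n' "$@"
}

starlight_export_cmd https://docs.astro.build/ tmp/docs.astro.build/
```

Building the argument list with `set --` keeps each selector as a single argument, so characters such as `#` and `.` need no further escaping once they are quoted in the list.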