You can use Node-RED, which does not require much coding, if you're not an experienced coder or want to get it done more quickly. Of course, it will take some time to get familiar with it.
Look at the meta tags: they contain some information about the page or the site, including keywords that describe the site. Crawling is another method for extracting the data you are interested in from the target site.
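As a rough illustration of reading the meta tags, here is a minimal Python sketch that uses only the standard library. The names `MetaTagParser`, `extract_meta` and `SAMPLE_HTML` are made up for this example, and the sample page is invented; a real page would first have to be downloaded.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags that carry both a name and a content attribute
        # are of interest; tags such as <meta charset="..."> are skipped.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping lowercase meta names to their content."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Invented sample page, standing in for a downloaded document.
SAMPLE_HTML = """
<html><head>
<title>Example</title>
<meta charset="utf-8">
<meta name="Description" content="A page about web scraping.">
<meta name="keywords" content="scraping, crawling, metadata" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE_HTML).items():
        print(f"{name}: {content}")
```

Not every site fills in these tags, so treat the result as optional data.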
You can use wget and other Linux tools. Here is an example: https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
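
If you would rather not chain grep and cut, the same idea (pulling the URLs out of an HTML file) can be sketched in Python with the standard library. This is not the method from the linked answer, only a comparable approach; `LinkParser`, `extract_links`, the sample markup and the base URL are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value:
                # Resolve relative links against the page's base URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in html, as absolute URLs."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


# Invented sample markup and base URL, for illustration only.
SAMPLE_PAGE = """
<a href="/about">About</a>
<a href="page.html">Page</a>
<a href="https://other.example.org/x">Other</a>
<a name="anchor">No href</a>
"""
BASE_URL = "https://example.com/docs/"

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE, BASE_URL):
        print(link)
```

To combine this with wget, you could save the page first (for example with `wget -O page.html <url>`), read the file in Python, and pass its contents to `extract_links`.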