I am not sure why privacy is a consideration in your context. If the websites are constructed correctly, there is little, if any, possibility of a crawler reaching personal content, because of the limits imposed by authentication and session storage. In short, your crawler will not be able to access personal data: it cannot authenticate as a specific individual, so it cannot establish a session that grants access to that individual's data. HTTrack is probably the oldest and most widely used crawler. Data mining typically involves access to an API or a database, which a crawler typically won't have.
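To make the point concrete, here is a minimal toy sketch (all names hypothetical, not a real server) of why an unauthenticated crawler never sees session-protected data: the server checks for a valid session token before serving anything personal, and a crawler has no credentials with which to obtain one.

```python
# Toy model of a session-gated site. VALID_SESSIONS stands in for the
# server-side session store; a real site would use cookies over HTTP.
VALID_SESSIONS = {"s3cr3t-token": "alice"}

def serve(path, session_token=None):
    """Toy request handler: public pages are open, /account needs a session."""
    if path == "/public":
        return 200, "Product catalogue"
    if path == "/account":
        user = VALID_SESSIONS.get(session_token)
        if user is None:
            return 302, "/login"  # no authenticated session: redirect to login
        return 200, f"Personal data for {user}"
    return 404, ""

# A crawler holds no session token, so it only ever sees public pages:
print(serve("/public"))                   # (200, 'Product catalogue')
print(serve("/account"))                  # (302, '/login')
print(serve("/account", "s3cr3t-token"))  # (200, 'Personal data for alice')
```

The crawler following links blindly gets bounced to the login page every time; only a client that already holds a user's credentials can reach the personal data.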
There should be some respect for people and for the personal data they have not chosen to share with third parties. There is also copyright and intellectual property in the content itself. These and other issues are what make privacy on the web vital.
Again, a web crawler cannot obtain personal information unless the website is flawed or is intentionally displaying it. Copyright and intellectual property are irrelevant to a web crawler, as it will access and retrieve anything publicly available. I have twenty years of experience building privacy-related applications. No web crawler exists that can authenticate as a user and retrieve their personal information without access to their credentials. Also, a web crawler is essentially "stupid": it cannot differentiate one piece of data from another (e.g. a company name versus the name of an individual versus a product name). The real issue would be defining a way to filter privacy-sensitive information from static content, and that filter would change from one site to the next.
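A rough sketch of the filtering problem described above (regex patterns and sample text are my own illustration, not a production filter): structured identifiers such as email addresses and phone numbers can be pattern-matched, but a bare personal name is indistinguishable from a product or company name, which is exactly why the filter has to be tuned per site.

```python
import re

# Crude patterns for *structured* PII only; anything subtler needs
# site-specific rules, because a crawler cannot tell names apart.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text):
    """Replace structured PII with placeholders; personal names slip through."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text

page = "Contact Jane Doe at jane@example.com or +1 555 123 4567."
print(redact(page))
# 'Jane Doe' survives untouched: to the filter it looks no different
# from 'Acme Widget'.
```

This is the best a generic crawler-side filter can do without per-site knowledge of which fields hold personal data.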
I understand the main point now from my reading; I was not familiar with data privacy as a whole. However, some crawled data can be treated as marketing data, and other bots can generate fake websites or accounts on the web to game search or recommender systems.