If you have a data mining or machine learning based model, it is best to test it on the spam email dataset from the UCI Machine Learning Repository...
How different are the datasets of phishing emails and phishing web pages from each other? I mean, are they conceptually different? Are the red flags of a phishing email very different from those of a phishing web page?
In my view, the attacker first generates a phishing URL and distributes it through email or other communication channels, hoping that the user clicks the link. Hence, the attacker usually uses social engineering in the email to lure victims. You can detect a phishing email from its frequent terms and its hyperlinks.
A phishing website, however, is visually similar to the genuine website; the differences lie mainly in the input fields and hyperlinks.
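To make the email-side signals above (frequent terms and hyperlinks) concrete, here is a minimal Python sketch using only the standard library. The term list, the function name `extract_email_features`, and the choice of "URL with a raw IP address as host" as a hyperlink signal are assumptions for illustration, not features taken from any particular dataset or paper.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical term list for illustration only; in practice the frequent
# terms would be learned from labelled phishing and legitimate emails.
SUSPICIOUS_TERMS = {"verify", "account", "password", "urgent", "suspended"}

# Simple URL matcher: http(s) followed by non-space, non-quote characters.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# Host that is a dotted-quad IP address instead of a domain name.
IP_HOST_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_email_features(body):
    """Return simple term-based and hyperlink-based counts for an email body."""
    words = re.findall(r"[a-z]+", body.lower())
    counts = Counter(words)
    urls = URL_RE.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        # How many times any of the listed terms occur in the body.
        "suspicious_term_count": sum(counts[t] for t in SUSPICIOUS_TERMS),
        # Number of hyperlinks found in the body.
        "url_count": len(urls),
        # Number of hyperlinks whose host is a raw IP address.
        "ip_host_count": sum(1 for h in hosts if IP_HOST_RE.match(h)),
    }
```

The resulting dictionary could be fed to any classifier as a feature vector. For phishing web pages, the features would instead come from the page itself (for example its form fields and link targets), which is why the two kinds of dataset are usually built separately.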
We created a phishing email dataset for the 1st anti-phishing shared task; it is available on request from me. The proceedings of the shared task are at: http://ceur-ws.org/Vol-2124/