IBM Modeler is platform agnostic and supports archiving of stored procedures. Depending on how robust your machine is, how large the restructured array can become is limited only by contiguous memory. As long as you do not change any of the data plumbing nomenclature, it is quite possible to automate a steady production stream that extracts data from a relatively dirty source, transforms it into a normalized format, and then loads the transformed extract into a repository where the second phase of scientific data mining operations can occur: discovery, analysis, and dissemination of findings. The apex difficulty is building the ideal model, of course. To do that, you have to spend a lot of time shredding electrons with silicon, with a human brain fully immersed in the informatics stream. ETL is just the initial phase of the data mining process and is more data engineering and data plumbing, but without it, data scientists would have a much more difficult task of discovering new knowledge from existing data.
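To make the extract-transform-load chain above concrete, here is a minimal sketch of such a pipeline in plain Python. The file name, column names (name, value), and the SQLite target table are my own assumptions for illustration; this is not the workflow of any particular tool mentioned in this thread.

```python
import csv
import sqlite3

# Hypothetical source file and target database, for illustration only.
SOURCE_CSV = "raw_measurements.csv"   # "dirty" source: blanks, stray whitespace, bad values
TARGET_DB = "warehouse.db"            # repository for the downstream mining phase

def extract(path):
    """Read raw rows from the source file."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Normalize each row: trim whitespace, lowercase names, drop incomplete records."""
    for row in rows:
        name = (row.get("name") or "").strip().lower()
        raw_value = (row.get("value") or "").strip()
        if not name or not raw_value:
            continue                  # discard rows missing required fields
        try:
            value = float(raw_value)
        except ValueError:
            continue                  # discard rows with non-numeric measurements
        yield (name, value)

def load(records, db_path):
    """Write the cleaned records into the repository table."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS measurements (name TEXT, value REAL)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)", records)

if __name__ == "__main__":
    load(transform(extract(SOURCE_CSV)), TARGET_DB)
```

Once the data sits in the repository in a normalized form, the discovery and analysis phase can query it directly without re-cleaning the source each time.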
In Oracle, the built-in tool for ETL is Warehouse Builder, but it is neither user friendly nor flexible. Much more user friendly is Oracle Data Integrator, dedicated to Oracle (but not only), which can be downloaded from the Oracle website and is free for scientific and educational use.
In MS SQL it is Integration Services, part of MS Data Tools (later MS Business Intelligence). It is a very flexible, fast, and useful tool for the ETL process. It can be downloaded from the DreamSpark MS store (later MSDN). If you have no experience with this platform, please choose the 2012 version (the newest version, 2014, is more difficult to install).
It is a pity that Pentaho is no longer free. It is really a very good tool.
ETL is not only the first step of Data Mining. More broadly, it is the beginning of analytical processing: Data Warehousing and Data Mining.