How about giving each student (or group) a large, dirty dataset, then asking them to cleanse it (using Pig and Hive) and store the result in a datastore (HBase, Cassandra, and so on, or even plain HDFS) for further use, such as running queries and analysis?
You can add more complexity by specifying how the Hadoop cluster should be set up, i.e. whether a single machine suffices or they should use a real distributed environment.
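If you need to produce the dirty dataset yourself, here is a minimal sketch of one way to do it in Python. Everything in it is an assumption made for illustration: the column names, the list of defect types, and the helper names `make_dirty_rows` and `to_csv` are hypothetical, and a real assignment would likely use a much larger and messier source.

```python
import csv
import io
import random

FIELDS = ["id", "name", "age", "city"]  # hypothetical schema


def make_dirty_rows(n, seed=0):
    """Return at least n synthetic records with deliberately injected defects."""
    rng = random.Random(seed)  # seeded so every student gets the same data
    cities = ["Berlin", "Paris", "Tokyo"]
    rows = []
    for i in range(n):
        row = {
            "id": str(i),
            "name": f"user{i}",
            "age": str(rng.randint(18, 80)),
            "city": rng.choice(cities),
        }
        defect = rng.choice(["none", "missing", "whitespace", "bad_type", "duplicate"])
        if defect == "missing":
            row["age"] = ""  # missing value
        elif defect == "whitespace":
            row["city"] = "  " + row["city"].upper() + " "  # stray spaces, odd casing
        elif defect == "bad_type":
            row["age"] = "n/a"  # non-numeric value in a numeric column
        rows.append(row)
        if defect == "duplicate":
            rows.append(dict(row))  # exact duplicate record
    return rows


def to_csv(rows):
    """Serialize the records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(make_dirty_rows(10)), end="")
```

The students' Pig and Hive scripts would then have to drop the duplicates, trim and normalize the city names, and decide what to do with missing or non-numeric ages before loading the result into the chosen datastore.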