3 Credit Hours
Students will walk through hands-on examples with the Hadoop framework and the Spark cluster computing framework. They will learn the Hadoop tools and technologies used to manage big data on a cluster with HDFS and MapReduce. Students will learn how to write programs that analyze data on Hadoop, store and query data sets, design a Hadoop ecosystem, and handle streaming data in real time. They will also learn how Spark can outperform Hadoop MapReduce, especially for iterative algorithms. Interactive shell features are covered as well. Because students will install and run these open-source big data tools themselves, students taking the course remotely must meet certain hardware requirements. Consult the instructors for details.