Moving Data [Apache Spark]

So I decided to use a real-world example and run some transformations against it. I chose http://dumps.wikimedia.org because it offers nicely sized files, large enough to really show the advantages Apache Spark brings to the table. My system has Apache Spark 1.6.0 and Scala 2.10.5 installed.

Let's do this. First, open the Spark shell:

spark-shell

Next, create an SQLContext:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sc is the existing SparkContext that spark-shell creates

Next, import the following packages into the shell session:

import sqlContext.implicits._
import org.apache.spark.sql._

Now you can load the data from the "pagecounts-20151222" file into a Resilient Distributed Dataset (RDD). RDDs support transformations and actions; for example, the first() action returns the first element in the RDD. Load the file into a new RDD:

val wikiHits = sc.textFile("/home/osboxes/Downloads/pagecounts-20151222")

Do some ...
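For readers who want to run the same steps outside spark-shell, here is a minimal standalone sketch. It assumes spark-core and spark-sql 1.6.x for Scala 2.10 are on the classpath, matching the setup described above. It also assumes each pagecounts line has four space-separated fields (project, page title, view count, bytes), which is the layout of the Wikimedia pagecounts dumps as I understand it. The names PageCount, parseLine and WikiHitsSketch, and the "pagecounts" table, are hypothetical and exist only for illustration. This is not the author's code. It shows the textFile/first() steps followed by one possible transformation into a DataFrame queried with Spark SQL.

```scala
// Minimal sketch, assuming spark-core and spark-sql 1.6.x (Scala 2.10)
// are on the classpath. PageCount, parseLine and WikiHitsSketch are
// hypothetical names used only for illustration.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed layout of one pagecounts line: project, page title, views, bytes.
// Defined at top level so Spark can derive a schema for toDF().
case class PageCount(project: String, title: String, views: Long, bytes: Long)

object WikiHitsSketch {

  // Parse one space-separated line; malformed lines become None.
  def parseLine(line: String): Option[PageCount] = line.split(" ") match {
    case Array(project, title, views, bytes) =>
      try Some(PageCount(project, title, views.toLong, bytes.toLong))
      catch { case _: NumberFormatException => None }
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // In spark-shell, sc already exists; a standalone app builds its own.
    val conf = new SparkConf().setAppName("WikiHitsSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Path to the pagecounts file, passed as the first argument.
    val wikiHits = sc.textFile(args(0))

    // Action: brings the first line of the file back to the driver.
    println(wikiHits.first())

    // Transformation (lazy): parse each line, dropping malformed ones.
    val parsed = wikiHits.flatMap(line => parseLine(line).toList)

    // Turn the RDD into a DataFrame and expose it to Spark SQL.
    val df = parsed.toDF()
    df.registerTempTable("pagecounts")
    sqlContext.sql(
      "SELECT title, views FROM pagecounts WHERE project = 'en' ORDER BY views DESC LIMIT 10"
    ).show()

    sc.stop()
  }
}
```

Packaged as a jar, it could be launched with something like spark-submit --class WikiHitsSketch app.jar /home/osboxes/Downloads/pagecounts-20151222. Keeping the parsing in a plain function returning Option makes it easy to test without a cluster, and flatMap silently drops lines that do not fit the assumed four-field layout.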