Talk on Apache Spark Internals/RDDs

I gave a talk at ScalaBay on how the RDD abstraction in Apache Spark supports a diverse range of distributed, big data use cases in one cohesive package. The talk starts with RDD usage in the Spark API and then shows how the same RDD interface is reused in Spark SQL, GraphX, MLlib and Spark Streaming to create a unified ecosystem.
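To give a flavor of the idea, here is a minimal sketch in plain Python. It is a toy model and not Spark's actual API: the names `ToyRDD`, `ParallelCollection` and `MappedRDD` are hypothetical, made up for this illustration. It assumes only the general shape of the abstraction, namely that a dataset is a set of partitions plus a way to compute each one, and that transformations lazily wrap a parent dataset.

```python
from typing import Any, Callable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class)."""

    def num_partitions(self) -> int:
        raise NotImplementedError

    def compute(self, index: int) -> Iterator[Any]:
        """Produce the elements of one partition."""
        raise NotImplementedError

    # Transformations are lazy: they only wrap this dataset in a new one.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return MappedRDD(self, lambda it: (f(x) for x in it))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return MappedRDD(self, lambda it: (x for x in it if pred(x)))

    # An action forces evaluation, one partition at a time.
    def collect(self) -> List[Any]:
        out: List[Any] = []
        for i in range(self.num_partitions()):
            out.extend(self.compute(i))
        return out


class ParallelCollection(ToyRDD):
    """Source dataset: an in-memory list split into contiguous partitions."""

    def __init__(self, data: List[Any], n: int) -> None:
        size = max(1, -(-len(data) // n))  # ceiling division
        self._parts = [data[i * size:(i + 1) * size] for i in range(n)]

    def num_partitions(self) -> int:
        return len(self._parts)

    def compute(self, index: int) -> Iterator[Any]:
        return iter(self._parts[index])


class MappedRDD(ToyRDD):
    """Derived dataset: remembers its parent and a per-partition function."""

    def __init__(self, parent: ToyRDD,
                 fn: Callable[[Iterator[Any]], Iterator[Any]]) -> None:
        self._parent = parent
        self._fn = fn

    def num_partitions(self) -> int:
        return self._parent.num_partitions()

    def compute(self, index: int) -> Iterator[Any]:
        # Pull from the parent partition only when asked to compute.
        return self._fn(self._parent.compute(index))


numbers = ParallelCollection(list(range(1, 7)), 3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())
```

Because every derived dataset exposes the same two methods, anything that can describe itself as "partitions plus a compute function" plugs into the same machinery. That is the property the higher-level libraries build on, though the real interface carries more than this sketch shows.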

This was my first time giving the talk, and I worked out many of the specifics in real time, so I apologize for the hesitation in my tone.

If you are interested in learning more about Spark internals, check out the talks by two Spark committers, Reynold Xin and Aaron Davidson. Some of the content in my talk is lifted from Reynold’s.

 