Talk on Apache Spark Internals/RDDs

I gave a talk at ScalaBay on how the RDD abstraction in Apache Spark supports diverse distributed, big-data use cases in one cohesive package. The talk starts with RDD usage in the Spark API and shows how the same RDD interface is leveraged in Spark SQL, GraphX, MLlib, and Spark Streaming to create a unified ecosystem.

This was my first time giving the talk, and I worked out many of the specifics in real time, so I apologize for the hesitation in my delivery.

If you are interested in learning more about Spark internals, check out talks by two Spark committers: Reynold Xin and Aaron Davidson. Some of the content in my talk is lifted from Reynold’s.

