Talk on Apache Spark Internals/RDDs

I gave a talk at ScalaBay on how the RDD abstraction in Apache Spark supports diverse, distributed big data use cases within one cohesive package. The talk starts with RDD usage in the core Spark API and then shows how the same RDD interface is reused by Spark SQL, GraphX, MLlib and Spark Streaming to form a unified ecosystem.
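To make the "one interface, many libraries" idea concrete, here is a toy sketch in Python. It is not Spark's API. It is a hypothetical model loosely following the shape of Spark's RDD class, where each RDD exposes a set of partitions and a compute function that lazily produces one partition's elements from its parent. All names (`ToyRDD`, `ParallelCollection`, `MappedRDD`, `FilteredRDD`, `sql_like_where`) are invented for illustration. Transformations only record lineage, and nothing runs until the `collect` action. The `sql_like_where` helper stands in for a higher-level library that is, underneath, just another transformation on the same interface.

```python
from typing import Callable, Iterator, List


class ToyRDD:
    """Hypothetical, simplified RDD model: a number of partitions plus a
    compute() that lazily yields the elements of one partition."""

    def num_partitions(self) -> int:
        raise NotImplementedError

    def compute(self, split: int) -> Iterator:
        raise NotImplementedError

    # Transformations build new RDDs that remember their parent (lineage).
    def map(self, f: Callable) -> "ToyRDD":
        return MappedRDD(self, f)

    def filter(self, pred: Callable) -> "ToyRDD":
        return FilteredRDD(self, pred)

    # Action: evaluate every partition in order and concatenate the results.
    def collect(self) -> List:
        out: List = []
        for split in range(self.num_partitions()):
            out.extend(self.compute(split))
        return out


class ParallelCollection(ToyRDD):
    """Source RDD: splits an in-memory sequence into roughly equal slices."""

    def __init__(self, data, slices: int) -> None:
        self.data = list(data)
        self.slices = slices

    def num_partitions(self) -> int:
        return self.slices

    def compute(self, split: int) -> Iterator:
        n = len(self.data)
        start = split * n // self.slices
        end = (split + 1) * n // self.slices
        return iter(self.data[start:end])


class MappedRDD(ToyRDD):
    """Narrow dependency: partition i is computed from parent partition i."""

    def __init__(self, parent: ToyRDD, f: Callable) -> None:
        self.parent, self.f = parent, f

    def num_partitions(self) -> int:
        return self.parent.num_partitions()

    def compute(self, split: int) -> Iterator:
        return map(self.f, self.parent.compute(split))


class FilteredRDD(ToyRDD):
    """Narrow dependency that keeps only elements matching a predicate."""

    def __init__(self, parent: ToyRDD, pred: Callable) -> None:
        self.parent, self.pred = parent, pred

    def num_partitions(self) -> int:
        return self.parent.num_partitions()

    def compute(self, split: int) -> Iterator:
        return filter(self.pred, self.parent.compute(split))


def sql_like_where(rows: ToyRDD, column: str, value) -> ToyRDD:
    """Hypothetical 'library' operator: a WHERE clause that is simply a
    filter transformation on the same RDD interface."""
    return rows.filter(lambda row: row[column] == value)


if __name__ == "__main__":
    nums = ParallelCollection(range(10), slices=3)
    print(nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect())
    # prints [0, 4, 16, 36, 64]

    people = ParallelCollection(
        [{"name": "a", "city": "SF"}, {"name": "b", "city": "NY"}], slices=2
    )
    print(sql_like_where(people, "city", "SF").map(lambda r: r["name"]).collect())
    # prints ['a']
```

The design point the sketch tries to capture is that a library only needs to produce objects satisfying the same partitions-plus-compute contract, so its operators compose freely with plain map and filter calls.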

This was my first time giving this talk, and I worked out many of the specifics in real time, so I apologize for any hesitation in my delivery.

If you are interested in learning more about Spark internals, check out talks by two Spark committers, Reynold Xin and Aaron Davidson. Some of the content in my talk is lifted from Reynold’s.

 
