
The Spark Streaming example code is available in the accompanying project repository. And yes, the project's name might now be a bit misleading.

Spark Streaming is a sub-project of Apache Spark. Spark is a batch processing platform similar to Apache Hadoop, and Spark Streaming is a real-time processing tool that runs on top of the Spark engine. In terms of use cases, Spark Streaming is closely related to Apache Storm, which is arguably today's most popular real-time processing platform for Big Data. P. Taylor Goetz of HortonWorks shared a slide deck titled Apache Storm and Spark Streaming Compared, in which he compares the two platforms and also covers the question of when and why to choose one over the other. Here's my personal, very brief comparison: Storm has higher industry adoption and better production stability compared to Spark Streaming. Spark, on the other hand, has a more expressive, higher-level API than Storm, which is arguably more pleasant to use, at least if you write your Spark applications in Scala (I prefer the Spark API, too). But don't take my word for it; please do check out the talks/decks above yourself. Both Spark and Storm are top-level Apache projects, and vendors have begun to integrate either or both tools into their commercial offerings.

Excursus: Machines, cores, executors, tasks, and receivers in Spark

The subsequent sections of this article talk a lot about parallelism in Spark and in Kafka. You need an understanding of some Spark terminology to be able to follow the discussion in those sections.

A Spark cluster contains one or more worker nodes aka slave machines (a simplified view; I exclude pieces like cluster managers).

An executor is a process launched for an application on a worker node, which runs tasks and keeps data in memory. An executor has a certain number of cores aka "slots" available to run tasks assigned to it.

A task is a unit of work that will be sent to one executor. That is, it runs (part of) the actual computation of your application. The SparkContext sends those tasks for the executors to run.

A receiver is run within an executor as a long-running task. Each receiver is responsible for exactly one so-called input DStream (e.g. an input stream for reading from Kafka), and each receiver – and thus input DStream – occupies one core/slot.

An input DStream is a special DStream that connects Spark Streaming to external data sources for reading input data. For each external data source (e.g. Kafka) you need one such input DStream implementation. Once Spark Streaming is "connected" to an external data source via such input DStreams, any subsequent DStream transformations will create "normal" DStreams.
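To make the relationship between input DStreams, receivers, and cores concrete, here is a minimal sketch. It assumes Spark Streaming 1.x with the spark-streaming-kafka artifact on the classpath; the ZooKeeper address, consumer group, and topic name are placeholder values, not ones from this article:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    // Ask for 4 local cores: each Kafka receiver below occupies one
    // core/slot as a long-running task, so with 2 receivers only 2
    // cores remain for the actual processing tasks.
    val conf = new SparkConf().setMaster("local[4]").setAppName("kafka-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each call to createStream yields one input DStream backed by one
    // receiver. "zookeeper1:2181", "my-group", and "my-topic" are
    // made-up placeholder values.
    val numInputDStreams = 2
    val kafkaDStreams = (1 to numInputDStreams).map { _ =>
      KafkaUtils.createStream(ssc, "zookeeper1:2181", "my-group", Map("my-topic" -> 1))
    }

    // Union the input DStreams; subsequent transformations create
    // "normal" DStreams whose tasks run in the remaining slots.
    val unified = ssc.union(kafkaDStreams)
    unified.map { case (_, message) => message }.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that if you requested only 2 cores here (`local[2]`), both would be consumed by the receivers and no slots would be left to process the received data, which is a classic Spark Streaming pitfall.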
