spark streaming 5: InputDStream
This post looks at the inheritance hierarchy of InputDStream. All input streams are driven through the interface of the abstract class InputDStream. Pay particular attention to ReceiverInputDStream: most of the time it is the base class we extend, because it is the one that (more easily) lets the work of receiving data be spread across the workers, which better fits the distributed-computing model.
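Concretely, a receiver-based stream is usually built by subclassing Receiver and handing it to ssc.receiverStream, which wraps it in a ReceiverInputDStream and ships it to a worker. Below is a minimal sketch along the lines of the official custom-receiver example; the socket wiring and the CustomReceiver name are illustrative, not part of the quoted source:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Illustrative receiver: runs on a worker node and reads lines from a socket.
class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

  def onStart(): Unit = {
    // Receiving must not block onStart(), so do it on a separate thread.
    new Thread("Custom Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = { /* the receive thread exits once isStopped() is true */ }

  private def receive(): Unit = {
    val socket = new java.net.Socket(host, port)
    val reader = new java.io.BufferedReader(
      new java.io.InputStreamReader(socket.getInputStream, "UTF-8"))
    var line = reader.readLine()
    while (!isStopped() && line != null) {
      store(line) // hand the record to Spark's block generator
      line = reader.readLine()
    }
    reader.close()
    socket.close()
    restart("Trying to connect again")
  }
}

// Wiring it up returns a ReceiverInputDStream[String]:
// val custom = ssc.receiverStream(new CustomReceiver("localhost", 9999))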
All input streams save the data they receive into Spark memory as blocks at a fixed interval. Unlike Spark Core, though, Spark Streaming by default serializes the objects before caching them in memory.
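For example, socketTextStream defaults to the serialized, replicated StorageLevel.MEMORY_AND_DISK_SER_2, and the interval at which a receiver groups buffered records into blocks is spark.streaming.blockInterval (200ms by default). A minimal sketch; the app name, host, and port are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("input-dstream-demo")
  // How often the receiver turns buffered records into a block (default 200ms).
  .set("spark.streaming.blockInterval", "200ms")
val ssc = new StreamingContext(conf, Seconds(2))

// The default storage level is the serialized MEMORY_AND_DISK_SER_2;
// it can be overridden per input stream:
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_ONLY_SER)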
/**
 * This is the abstract base class for all input streams. This class provides methods
 * start() and stop() which are called by the Spark Streaming system to start and stop
 * receiving data. Input streams that can generate RDDs from new data by running a
 * service/thread only on the driver node (that is, without running a receiver on worker
 * nodes) can be implemented by directly inheriting this InputDStream. For example,
 * FileInputDStream, a subclass of InputDStream, monitors a HDFS directory from the driver
 * for new files and generates RDDs with the new files. For implementing input streams
 * that require running a receiver on the worker nodes, use
 * [[org.apache.spark.streaming.dstream.ReceiverInputDStream]] as the parent class.
 *
 * @param ssc_ Streaming context that will execute this input stream
 */
abstract class InputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
  extends DStream[T](ssc_) {

  private[streaming] var lastValidTime: Time = null

  ssc.graph.addInputStream(this)
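FileInputDStream is the driver-only case in action: no receiver is started, the driver simply lists the directory once per batch. A usage sketch, reusing the ssc from the earlier snippet (the HDFS path is a placeholder):

// FileInputDStream extends InputDStream directly: the driver polls the
// directory each batch interval and builds an RDD from newly appeared files.
val files = ssc.textFileStream("hdfs://namenode:8020/user/logs")
files.foreachRDD { rdd =>
  println(s"new records this batch: ${rdd.count()}")
}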
/**
 * Abstract class for defining any [[org.apache.spark.streaming.dstream.InputDStream]]
 * that has to start a receiver on worker nodes to receive external data.
 *
 * @param ssc_ Streaming context that will execute this input stream
 * @tparam T Class type of the object of this stream
 */
abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
  extends InputDStream[T](ssc_) {

  /** Keeps all received blocks information */
  private lazy val receivedBlockInfo = new HashMap[Time, Array[ReceivedBlockInfo]]

  /** This is an unique identifier for the network input stream. */
  val id = ssc.getNewReceiverStreamId()

  /**
   * Gets the receiver object that will be sent to the worker nodes
   * to receive data. This method needs to be defined by any specific implementation
   * of a NetworkInputDStream.
   */
  def getReceiver(): Receiver[T]

In the end, every batch comes back as a BlockRDD:

  /** Ask ReceiverInputTracker for received data blocks and generates RDDs with them. */
  override def compute(validTime: Time): Option[RDD[T]] = {
    // If this is called for any time before the start time of the context,
    // then this returns an empty RDD. This may happen when recovering from a
    // master failure
    if (validTime >= graph.startTime) {
      val blockInfo = ssc.scheduler.receiverTracker.getReceivedBlockInfo(id)
      receivedBlockInfo(validTime) = blockInfo
      val blockIds = blockInfo.map(_.blockId.asInstanceOf[BlockId])
      Some(new BlockRDD[T](ssc.sc, blockIds))
    } else {
      Some(new BlockRDD[T](ssc.sc, Array[BlockId]()))
    }
  }
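To see this from the outside, assuming the lines stream from the socket sketch above: each batch of a receiver-backed stream materializes as a BlockRDD built from the block IDs the ReceiverTracker reported for this stream's id (or an empty BlockRDD for times before the context started):

// Each batch of `lines` is a BlockRDD over the blocks the receiver stored.
lines.foreachRDD { rdd =>
  println(rdd.getClass.getSimpleName) // typically "BlockRDD"
}
ssc.start()
ssc.awaitTermination()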
Reposted from: https://www.cnblogs.com/zwCHAN/p/4275348.html