[Figure: Azure IoT reference architecture. IP-capable devices (Windows/Linux), low-power devices (RTOS) behind field gateways, and legacy devices (custom protocols) connect via cloud gateways to a device broker / device management layer. Downstream services: stream processing (Stream Analytics), visualization and analysis (Power BI), query and search (Azure Search), time-series databases, parallel data processing (Azure Data Lake Analytics), machine learning (Machine Learning / Revolution R Enterprise), document stores and a DWH, device notifications (Notification Hubs), dashboards (Azure App Services), a Machine Learning API, and Azure Active Directory as the authentication backbone.]
[Figure: An event broker's partitions 1..n fanned out to Consumer Groups A, B, and C. Each consumer group independently reads every partition; within a group, workers 1..n each register one callback per partition (callback for partition 1, partition 2, ..., partition 6).]
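The fan-out in the diagram can be sketched generically (a hedged illustration with hypothetical names, not the real Event Hubs SDK): every consumer group independently receives every event, and within a group each event is dispatched to the single callback registered for its partition.

```python
# Hedged sketch of the consumer-group fan-out (hypothetical structure,
# not the real Event Hubs SDK): each consumer group independently receives
# every event, and within a group the event is dispatched to the single
# callback registered for its partition.
events = [(1, "e1"), (2, "e2"), (1, "e3")]  # (partition, payload)

received = {"A": [], "B": [], "C": []}

def make_callback(group):
    # One callback per (group, partition); here all partitions of a group
    # share the same handler body for brevity.
    return lambda payload: received[group].append(payload)

callbacks = {group: {p: make_callback(group) for p in (1, 2)}
             for group in received}

for partition, payload in events:
    for group, handlers in callbacks.items():
        handlers[partition](payload)  # invoke that partition's callback
```

Each group ends up with the full event sequence, which is exactly why consumer groups give independent readers their own view of the same partitions.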
[Figure: Backpressure feedback through an intermediary broker.]
Transformation: Meaning
map(func): Return a new DStream by passing each element of the source DStream through a function func.
flatMap(func): Similar to map, but each input item can be mapped to 0 or more output items.
filter(func): Return a new DStream by selecting only the records of the source DStream on which func returns true.
repartition(numPartitions): Changes the level of parallelism in this DStream by creating more or fewer partitions.
union(otherStream): Return a new DStream that contains the union of the elements in the source DStream and otherDStream.
count(): Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.
reduce(func): Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative so that it can be computed in parallel.
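The per-element semantics above can be illustrated with plain-Python analogues (a hedged sketch: a real DStream applies these per micro-batch RDD, distributed across workers, but the element-level behavior is the same).

```python
from functools import reduce

# Plain-Python analogues of the per-element DStream semantics; one
# micro-batch is modeled as an ordinary list.
batch = [1, 2, 3, 4]
lines = ["to be", "or not"]

mapped = [x * 2 for x in batch]                      # map(func)
words = [w for line in lines for w in line.split()]  # flatMap(func): 0..n outputs per input
evens = [x for x in batch if x % 2 == 0]             # filter(func)
combined = batch + [5]                               # union(otherStream), within one batch
n = len(batch)                                       # count()
total = reduce(lambda a, b: a + b, batch)            # reduce(func); func must be associative
```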
countByValue(): When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.
reduceByKey(func, [numTasks]): When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. Note: By default, this uses Spark's default number of parallel tasks (2 for local mode; in cluster mode the number is determined by the config property spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks.
join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
cogroup(otherStream, [numTasks]): When called on a DStream of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.
transform(func): Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
updateStateByKey(func): Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.
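The keyed and stateful transformations can be sketched the same way (a hedged plain-Python analogue of one micro-batch; real DStreams repeat this per RDD, partitioned by key across workers).

```python
from collections import Counter, defaultdict

# countByValue(): frequency of each element within one batch.
elems = ["a", "b", "a"]
freq = Counter(elems)

# reduceByKey(func): aggregate values per key; func = addition here.
batch = [("k1", 1), ("k2", 5), ("k1", 3)]
by_key = defaultdict(int)
for k, v in batch:
    by_key[k] += v

# join(otherStream): (K, (V, W)) pairs for each matching key.
other = [("k1", "x")]
joined = [(k, (v, w)) for k, v in batch
          for k2, w in other if k == k2]

# updateStateByKey(func): merge the batch's new values into state
# carried over from earlier batches.
state = {"k1": 10}  # prior state for k1
for k, v in batch:
    state[k] = state.get(k, 0) + v
```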
Transformation: Description
Map: Takes one element and produces one element. For example, a map function that doubles the values of the input stream.
FlatMap: Takes one element and produces zero, one, or more elements. For example, a flatMap function that splits sentences into words.
Filter: Evaluates a boolean function for each element and retains those for which the function returns true. For example, a filter that filters out zero values.
KeyBy: Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
Reduce: A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
Fold: A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
Aggregations: Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (the same applies to max and maxBy).
Window: Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
WindowAll: Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds).
Window Apply: Applies a general function to the window as a whole, for example a function that manually sums the elements of a window.
Window Reduce: Applies a functional reduce function to the window and returns the reduced value.
Window Fold: Applies a functional fold function to the window and returns the folded value.
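The KeyBy-plus-rolling-Reduce behavior can be sketched in plain Python (a hedged analogue: Flink distributes keys via hash partitioning, but per key the semantics are as below).

```python
# Hedged sketch of KeyBy + rolling Reduce on a keyed stream: each element
# is combined with the last reduced value for its key, and every
# intermediate result is emitted downstream.
stream = [("sensor1", 3), ("sensor2", 7), ("sensor1", 5)]

reduced_state = {}  # last reduced value per key (the KeyBy partitioning)
emitted = []
for key, value in stream:
    reduced_state[key] = reduced_state.get(key, 0) + value  # reduce = sum
    emitted.append((key, reduced_state[key]))
```

Note that the rolling reduce emits a value for every input element, not one value per key at the end, which is what distinguishes it from a batch-style aggregation.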
Aggregations on windows: Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (the same applies to max and maxBy).
Union: Union of two or more data streams, creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself, you will get each element twice in the resulting stream.
Window Join: Joins two data streams on a given key and a common window.
Window CoGroup: Cogroups two data streams on a given key and a common window.
Connect: "Connects" two data streams, retaining their types and allowing for shared state between the two streams.
CoMap, CoFlatMap: Similar to map and flatMap, but on a connected data stream.
Split: Splits the stream into two or more streams according to some criterion.
Select: Selects one or more streams from a split stream.
Iterate: Creates a "feedback" loop in the flow by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. For example, an iteration body can be applied to a stream continuously, with elements greater than 0 sent back to the feedback channel and the remaining elements forwarded downstream.
Extract Timestamps: Extracts timestamps from records in order to work with windows that use event-time semantics.
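The min vs. minBy distinction is easy to miss; a hedged plain-Python sketch over one window's contents makes it concrete (min keeps only the minimum of the chosen field, minBy keeps the whole element holding that minimum).

```python
# min vs. minBy on one window's contents: min tracks only the minimum
# of the chosen field, while minBy keeps the whole element that holds
# that minimum (same idea for max / maxBy).
window = [("a", 4), ("b", 1), ("c", 9)]  # (id, value) records in the window

min_value = min(v for _, v in window)          # min on the value field
min_record = min(window, key=lambda r: r[1])   # minBy: the full record
```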