Flink的SideOutputSplit分流怎么实现

这篇文章主要讲解了“Flink的SideOutputSplit分流怎么实现”,文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习“Flink的SideOutputSplit分流怎么实现”吧!

目前成都创新互联公司已为上千多家的企业提供了网站建设、域名、雅安服务器托管绵阳服务器托管、企业网站设计、汶上网站维护等服务,公司将坚持客户导向、应用为本的策略,正道将秉承"和谐、参与、激情"的文化,与客户和合作伙伴齐心协力一起成长,共同发展。

版本说明:

环境: Windiws

Scala: 2.11.8

Flink :1.10.1

大部分的DataStream API的算子的输出是单一输出,也就是某种数据类型的流。

除了split算子,可以将一条流分成多条流,这些流的数据类型也都相同。

process function的side outputs功能可以产生多条流(Flink 1.9版本之后推荐此种方案),并且这些流的数据类型可以不一样。一个side output可以定义为OutputTag[X]对象,X是输出流的数据类型。process function可以通过Context对象发射一个事件到一个或者多个side outputs。

import org.apache.flink.streaming.api.scala.{DataStream, OutputTag, StreamExecutionEnvironment}
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

/**
  *
  * @param deviceNo    设备号
  * @param timestamp   时间戳
  * @param temperature 温度
  */
case class SensorReading(deviceNo: String, timestamp: Long, temperature: Double)

object SensorReadingSplitStreaming {
    def main(args: Array[String]): Unit = {
        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        //设置时间语义  时间发生时间
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
        env.setParallelism(1)


        val socketSource: DataStream[String] = env.readTextFile("D:\\tmp\\file1.txt")

        val mapStream: DataStream[SensorReading] = socketSource
            .map(data => {
                val split: Array[String] = data.split(",")
                SensorReading(split(0).trim, split(1).trim.toLong, split(2).trim.toDouble)
            })

        //对数据流进行分流处理
        val tmpStageStream: DataStream[SensorReading] = mapStream.process(new TempStageProcess())

        tmpStageStream.print("main");
        val lowStream: DataStream[(String, Double)] = tmpStageStream.getSideOutput(new OutputTag[(String, Double)]("low-tmp"))
        val highStream: DataStream[(String, Double)] = tmpStageStream.getSideOutput(new OutputTag[(String, Double)]("high-tmp"))
        lowStream.print("low")
        highStream.print("high")
        env.execute()
    }

}


class TempStageProcess() extends ProcessFunction[SensorReading, SensorReading] {
    // 定义侧输出流
    lazy val lowTmp: OutputTag[(String, Double)] = new OutputTag[(String, Double)]("low-tmp");
    lazy val HighTmp: OutputTag[(String, Double)] = new OutputTag[(String, Double)]("high-tmp");

    //处理数据
    override def processElement(value: SensorReading, context: ProcessFunction[SensorReading, SensorReading]#Context, collector: Collector[SensorReading]): Unit = {
        if (value.temperature < 10) {
            context.output(lowTmp, (value.deviceNo, value.temperature))
        } else if (value.temperature > 70) {
            context.output(HighTmp, (value.deviceNo, value.temperature))
        } else {
            collector.collect(value)
        }
    }
}  //测试文件内容如下↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓

/*
设备8,1610035289736,84.3
设备5,1610035371758,38.8
设备5,1610035458637,60.2
设备1,1610035543127,10.2
设备7,1610035623427,51.6
设备5,1610035705302,20.1
设备5,1610035787387,12.9
设备7,1610035877019,88.2
设备6,1610035960537,33.5
设备7,1610036043040,63.0
设备5,1610036125179,64.5
设备6,1610036214972,30.2
设备5,1610036296542,56.5
设备7,1610036377999,29.7
设备6,1610036467523,59.4
设备4,1610036557446,71.1
设备5,1610036641100,28.2
设备2,1610036725803,88.8
设备8,1610036808041,73.5
设备1,1610036897060,18.0
设备7,1610036980127,14.9
设备2,1610037069523,47.4
设备4,1610037154507,59.5
设备5,1610037235099,35.0
设备6,1610037317868,76.4
设备2,1610037403367,10.0
设备2,1610037484177,18.5
设备4,1610037571384,98.7
设备5,1610037653666,95.6
设备6,1610037735520,32.6
设备6,1610037823906,83.3
设备3,1610037913756,29.1
设备7,1610037994980,74.6
设备6,1610038081606,22.2
设备3,1610038163043,10.4
设备5,1610038244717,56.9
设备3,1610038326227,64.8
设备4,1610038411053,65.0
设备8,1610038500538,93.2
设备8,1610038583924,76.2
设备1,1610038670150,42.1
设备5,1610038756839,35.1
设备3,1610038840180,75.9
设备3,1610038929751,83.4
设备7,1610039019422,24.1
设备3,1610039101778,85.0
设备8,1610039183077,45.6
设备3,1610039264498,79.5
设备1,1610039351600,44.4
设备8,1610039434187,73.3
设备3,1610039518048,77.9
设备7,1610039598556,9.79
设备4,1610039679144,19.0
设备2,1610039761967,56.1
设备3,1610039847823,88.2
设备6,1610039933024,77.4
设备7,1610040014212,14.4
设备4,1610040101627,98.2
设备8,1610040182379,85.0
设备6,1610040265210,61.8
设备2,1610040345769,48.0
设备3,1610040432855,19.9
设备4,1610040515943,30.9
设备4,1610040601373,51.7
设备1,1610040681803,29.7
设备8,1610040770779,31.6
设备3,1610040851986,67.1
设备4,1610040941421,93.2
设备7,1610041022836,37.2
设备8,1610041105401,84.6
设备6,1610041189301,19.2
设备4,1610041270735,99.0
设备4,1610041354109,77.0
设备5,1610041435113,49.7
设备1,1610041521773,74.2
设备8,1610041603035,42.2
设备3,1610041687230,87.1
设备1,1610041767985,82.7
设备3,1610041848130,0.59
设备4,1610041933021,7.38
设备2,1610042016080,28.9
设备2,1610042103229,99.2
设备2,1610042190222,42.2
设备3,1610042277841,12.0
设备7,1610042364076,93.5
设备7,1610042444652,10.5
设备4,1610042530461,68.5
设备1,1610042615421,78.2
设备3,1610042702219,18.5
设备6,1610042787478,64.8
设备5,1610042874301,6.34
设备2,1610042956073,65.6
设备8,1610043038793,10.6
设备8,1610043122971,30.3
设备7,1610043203810,17.5
设备8,1610043291566,83.8
设备5,1610043373188,30.5
设备2,1610043456107,84.7
设备1,1610043545998,53.4
设备3,1610043627174,97.4

 */

输出结果:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/.m2/repository/ch/qos/logback/logback-classic/1.2.0/logback-classic-1.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17:19:42,659 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation      - No hostname could be resolved for the IP address 127.0.0.1, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
17:19:42,725 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation      - No hostname could be resolved for the IP address 127.0.0.1, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
17:19:43,088 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file environment variable 'log.file' is not set.
17:19:43,089 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (fallback keys: [{key=jobmanager.web.log.path, isDeprecated=true}])'.
high> (设备8,84.3)
main> SensorReading(设备5,1610035371758,38.8)
main> SensorReading(设备5,1610035458637,60.2)
main> SensorReading(设备1,1610035543127,10.2)
main> SensorReading(设备7,1610035623427,51.6)
main> SensorReading(设备5,1610035705302,20.1)
main> SensorReading(设备5,1610035787387,12.9)
high> (设备7,88.2)
main> SensorReading(设备6,1610035960537,33.5)
main> SensorReading(设备7,1610036043040,63.0)
main> SensorReading(设备5,1610036125179,64.5)
main> SensorReading(设备6,1610036214972,30.2)
main> SensorReading(设备5,1610036296542,56.5)
main> SensorReading(设备7,1610036377999,29.7)
main> SensorReading(设备6,1610036467523,59.4)
high> (设备4,71.1)
main> SensorReading(设备5,1610036641100,28.2)
high> (设备2,88.8)
high> (设备8,73.5)
main> SensorReading(设备1,1610036897060,18.0)
main> SensorReading(设备7,1610036980127,14.9)
main> SensorReading(设备2,1610037069523,47.4)
main> SensorReading(设备4,1610037154507,59.5)
main> SensorReading(设备5,1610037235099,35.0)
high> (设备6,76.4)
main> SensorReading(设备2,1610037403367,10.0)
main> SensorReading(设备2,1610037484177,18.5)
high> (设备4,98.7)
high> (设备5,95.6)
main> SensorReading(设备6,1610037735520,32.6)
high> (设备6,83.3)
main> SensorReading(设备3,1610037913756,29.1)
high> (设备7,74.6)
main> SensorReading(设备6,1610038081606,22.2)
main> SensorReading(设备3,1610038163043,10.4)
main> SensorReading(设备5,1610038244717,56.9)
main> SensorReading(设备3,1610038326227,64.8)
main> SensorReading(设备4,1610038411053,65.0)
high> (设备8,93.2)
high> (设备8,76.2)
main> SensorReading(设备1,1610038670150,42.1)
main> SensorReading(设备5,1610038756839,35.1)
high> (设备3,75.9)
high> (设备3,83.4)
main> SensorReading(设备7,1610039019422,24.1)
high> (设备3,85.0)
main> SensorReading(设备8,1610039183077,45.6)
high> (设备3,79.5)
main> SensorReading(设备1,1610039351600,44.4)
high> (设备8,73.3)
high> (设备3,77.9)
low> (设备7,9.79)
main> SensorReading(设备4,1610039679144,19.0)
main> SensorReading(设备2,1610039761967,56.1)
high> (设备3,88.2)
high> (设备6,77.4)
main> SensorReading(设备7,1610040014212,14.4)
high> (设备4,98.2)
high> (设备8,85.0)
main> SensorReading(设备6,1610040265210,61.8)
main> SensorReading(设备2,1610040345769,48.0)
main> SensorReading(设备3,1610040432855,19.9)
main> SensorReading(设备4,1610040515943,30.9)
main> SensorReading(设备4,1610040601373,51.7)
main> SensorReading(设备1,1610040681803,29.7)
main> SensorReading(设备8,1610040770779,31.6)
main> SensorReading(设备3,1610040851986,67.1)
high> (设备4,93.2)
main> SensorReading(设备7,1610041022836,37.2)
high> (设备8,84.6)
main> SensorReading(设备6,1610041189301,19.2)
high> (设备4,99.0)
high> (设备4,77.0)
main> SensorReading(设备5,1610041435113,49.7)
high> (设备1,74.2)
main> SensorReading(设备8,1610041603035,42.2)
high> (设备3,87.1)
high> (设备1,82.7)
low> (设备3,0.59)
low> (设备4,7.38)
main> SensorReading(设备2,1610042016080,28.9)
high> (设备2,99.2)
main> SensorReading(设备2,1610042190222,42.2)
main> SensorReading(设备3,1610042277841,12.0)
high> (设备7,93.5)
main> SensorReading(设备7,1610042444652,10.5)
main> SensorReading(设备4,1610042530461,68.5)
high> (设备1,78.2)
main> SensorReading(设备3,1610042702219,18.5)
main> SensorReading(设备6,1610042787478,64.8)
low> (设备5,6.34)
main> SensorReading(设备2,1610042956073,65.6)
main> SensorReading(设备8,1610043038793,10.6)
main> SensorReading(设备8,1610043122971,30.3)
main> SensorReading(设备7,1610043203810,17.5)
high> (设备8,83.8)
main> SensorReading(设备5,1610043373188,30.5)
high> (设备2,84.7)
main> SensorReading(设备1,1610043545998,53.4)
high> (设备3,97.4)

Process finished with exit code 0

感谢各位的阅读,以上就是“Flink的SideOutputSplit分流怎么实现”的内容了,经过本文的学习后,相信大家对Flink的SideOutputSplit分流怎么实现这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是创新互联,小编将为大家推送更多相关知识点的文章,欢迎关注!


文章标题:Flink的SideOutputSplit分流怎么实现
标题网址:http://ybzwz.com/article/gdhisi.html