How to debug and troubleshoot an IDEA WordCount jar submitted to Spark

This article walks through building a WordCount jar in IDEA, submitting it to Spark, and debugging the errors that come up along the way.

Based on:

macOS

Spark 2.4.3

(Spark runs in standalone mode; reference blog: http://blog.itpub.net/69908925/viewspace-2644303/)

Scala 2.12.8

IDEA 2019

1 In IDEA: File > Project Structure > Libraries > Scala SDK

Select version 2.11.12.

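The version selected here must match the Scala version that Spark runs on. The default is the locally installed Scala 2.12.8, and a jar built with it fails with a main-class error when run on Spark.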

2 Create a new project and add the dependencies to pom.xml

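<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ny.service</groupId>
    <artifactId>scala517</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <useUniqueVersions>false</useUniqueVersions>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass>com.ny.service.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>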

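For scala-library, choose the Scala version that ships with Spark, 2.11.12, which is also the latest version this Spark build supports.

For org.apache.spark, likewise choose the 2.11 artifact (spark-core_2.11).

Otherwise the job fails in the main class with this error: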

19/05/16 10:52:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60010 (size: 22.9 KB, free: 366.3 MB)

19/05/16 10:52:03 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:18

Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp

at com.nyc.WordCount$.main(WordCount.scala:24)

at com.nyc.WordCount.main(WordCount.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
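
The trace above is consistent with a jar compiled against Scala 2.12 running on a Scala 2.11 runtime, since the scala/runtime/java8 function classes belong to the 2.12 library. The rule from this step can be sketched as a small check: the _X.Y suffix of the Spark artifact should equal the major.minor part of the Scala version Spark ships with. The helper name same_scala_binary is hypothetical and only illustrates the comparison.

```shell
# Hypothetical helper (not part of Spark or Maven): succeeds only when the
# artifact's _X.Y suffix equals the major.minor part of the Scala version.
same_scala_binary() {
  suffix="${1##*_}"                          # spark-core_2.11 -> 2.11
  binary="$(echo "$2" | cut -d. -f1,2)"      # 2.11.12 -> 2.11
  [ "$suffix" = "$binary" ]
}

# Versions taken from this article: the pom uses spark-core_2.11.
same_scala_binary spark-core_2.11 2.11.12 && echo "match"      # prints "match"
same_scala_binary spark-core_2.11 2.12.8  || echo "mismatch"   # prints "mismatch"
```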

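How to check the Scala version used by Spark

Go to the jars directory under the Spark installation; the Scala version appears in the names of the scala-* jars: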

/usr/local/opt/spark-2.4.3/jars
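
As a minimal sketch of that lookup, the function below reads the version out of the scala-library jar file name. It assumes the distribution names the jar scala-library-<version>.jar, and the function name scala_version_of and the SPARK_HOME default are illustrative choices, not something Spark provides.

```shell
# Assumption: SPARK_HOME points at the Spark installation used in this article.
SPARK_HOME="${SPARK_HOME:-/usr/local/opt/spark-2.4.3}"

# Hypothetical helper: print the version embedded in the scala-library jar
# name found under <dir>/jars. Prints nothing if no such jar exists.
scala_version_of() {
  ls "$1/jars" 2>/dev/null | sed -n 's/^scala-library-\(.*\)\.jar$/\1/p'
}

scala_version_of "$SPARK_HOME"
```

The printed value is the one to select for the Scala SDK in step 1 and for scala-library in pom.xml.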

3 WordCount test script

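package com.ny.service

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1 Create the configuration (the master URL is supplied by spark-submit)
    val conf = new SparkConf().setAppName("wc")
    // 2 Create the SparkContext sc
    val sc = new SparkContext(conf)
    // 3 Processing logic
    // Read the input file (args(0) is the input path)
    val lines = sc.textFile(args(0))
    // Split each line on spaces and flatten the result into words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) pair
    val k2v = words.map((_, 1))
    // Sum the counts for each word
    val results = k2v.reduceByKey(_ + _)
    // Save the results (args(1) is the output directory)
    results.saveAsTextFile(args(1))
    // 4 Stop the SparkContext
    sc.stop()
  }
}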
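
When debugging, it can help to have an independent count to compare the Spark output against. The sketch below computes the same (word,count) pairs locally with awk. The function name word_count is hypothetical, and awk splits on runs of whitespace, so results can differ from split(" ") when a line contains consecutive spaces or tabs.

```shell
# Hypothetical helper: count words in a local text file and print them in the
# same (word,count) form that saveAsTextFile writes, sorted by word.
# Usage: word_count /usr/local/opt/spark-2.4.3/test/1test
word_count() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "(%s,%d)\n", w, count[w] }' "$1" |
    LC_ALL=C sort
}
```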

4 Package

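Move the jar into the Spark home directory and rename it to wc.jar. Because Spark runs in standalone mode here, no Hadoop cluster is started.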

nancylulululu:spark-2.4.3 nancy$ mv /Users/nancy/IdeaProjects/scala517/target/original-scala517-1.0.jar wc.jar 

5 Run with spark-submit

bin/spark-submit \
--class com.ny.service.WordCount \
--master spark://localhost:7077 \
./wc.jar \
file:///usr/local/opt/spark-2.4.3/test/1test \
file:///usr/local/opt/spark-2.4.3/test/out

If the data lives on Hadoop, replace the file:// paths with HDFS paths.
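
One common failure when rerunning the job is that saveAsTextFile refuses to write into an output directory that already exists. A minimal pre-submit check for local paths might look like the sketch below. The function name preflight is hypothetical, it takes plain filesystem paths without the file:// prefix, and it does not apply to HDFS paths.

```shell
# Hypothetical helper: check local input and output paths before spark-submit.
# $1 = input file, $2 = output directory (must not exist yet).
preflight() {
  [ -e "$1" ] || { echo "missing input: $1"; return 1; }
  [ ! -e "$2" ] || { echo "output already exists: $2"; return 1; }
  echo "ok"
}

# Usage with the paths from this article:
# preflight /usr/local/opt/spark-2.4.3/test/1test /usr/local/opt/spark-2.4.3/test/out
```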

View the result files:

nancylulululu:out nancy$ ls
_SUCCESS    part-00000    part-00001
nancylulululu:out nancy$ cat part-00000
(scala,2)
(hive,1)
(MySQL,1)
(hello,5)
(java,2)
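
Because the result is split across several part files, a single sorted view is often easier to read. The sketch below merges every part-* file in the output directory and lists words by descending count. The function name top_words is hypothetical, and it assumes each line has the (word,count) form shown above.

```shell
# Hypothetical helper: merge all part files in an output directory and print
# "count word" lines, highest count first (ties ordered by word).
# Usage: top_words /usr/local/opt/spark-2.4.3/test/out
top_words() {
  cat "$1"/part-* |
    sed 's/^(\(.*\),\([0-9][0-9]*\))$/\2 \1/' |
    LC_ALL=C sort -k1,1nr -k2,2
}
```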
