SparkSQL使用beeline访问hive仓库
一、添加hive-site.xml
在$SPARK_HOME/conf下添加hive-site.xml的配置文件,目的是能正常访问hive的元数据
丹寨网站建设公司成都创新互联公司,丹寨网站设计制作,有大型网站制作公司丰富经验。已为丹寨上千余家提供企业网站建设服务。企业网站搭建\成都外贸网站建设要多少钱,请找那个售后服务好的丹寨做网站的公司定做!
vim hive-site.xml
javax.jdo.option.ConnectionURL
jdbc:MySQL://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionDriverName
com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName
root
javax.jdo.option.ConnectionPassword
123456
hive.cli.print.header
true
hive.cli.print.current.db
true
注意:在节点上不需要部署hive,只要是你可以连接到hive的元数据就可以!
二、启动thriftserver服务
[hadoop@hadoop003 spark]$ ./sbin/start-thriftserver.sh --jars ~/softwares/mysql-connector-java-5.1.47.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2,
logging to /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out
检查日志,确认thriftserver服务正常启动
[hadoop@hadoop003 spark]$ tail -50f /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out
19/05/21 09:39:14 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:15 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:15 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/05/21 09:39:15 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added admin role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added public role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_all_databases
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_all_databases
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_functions: db=default pat=*
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/73df82dd-1fd3-4dd5-97f1-680d53bd44bc_resources
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc/_tmp_space.db
19/05/21 09:39:16 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/home/hadoop/app/spark-2.4.2-bin-hadoop-2.6.0-cdh6.7.0/spark-warehouse
19/05/21 09:39:16 INFO session.SessionManager: Operation log root directory is created: /tmp/hadoop/operation_logs
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread pool size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread wait queue size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread keepalive time: 10 seconds
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: CLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: HiveServer2 is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:CLIService is started.
19/05/21 09:39:16 INFO metastore.ObjectStore: ObjectStore, initialize called
19/05/21 09:39:16 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/05/21 09:39:16 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:16 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: get_databases: default
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_databases: default
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:HiveServer2 is started.
19/05/21 09:39:16 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a7c72af{/sqlserver,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e24bab6{/sqlserver/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@391e2a{/sqlserver/session,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@641cea11{/sqlserver/session/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO thrift.ThriftCLIService:
Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads#标志启动成功
三、启动beeline
[hadoop@hadoop003 spark]$ ./bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop
Connecting to jdbc:hive2://localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Supplied authorities: localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Resolved authority: localhost:10000
19/05/21 09:46:19 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.4.2)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> select * from student.student limit 5;
+---------+-----------+-----------------+--------------------------------------------+--+
| stu_id | stu_name | stu_phone_num | stu_email |
+---------+-----------+-----------------+--------------------------------------------+--+
| 1 | Burke | 1-300-746-8446 | ullamcorper.velit.in@ametnullaDonec.co.uk |
| 2 | Kamal | 1-668-571-5046 | pede.Suspendisse@interdumenim.edu |
| 3 | Olga | 1-956-311-1686 | Aenean.eget.metus@dictumcursusNunc.edu |
| 4 | Belle | 1-246-894-6340 | vitae.aliquet.nec@neque.co.uk |
| 5 | Trevor | 1-300-527-4967 | dapibus.id@acturpisegestas.net |
+---------+-----------+-----------------+--------------------------------------------+--+
5 rows selected (3.275 seconds)
0: jdbc:hive2://localhost:10000>
启动成功
四、注意
1、最好在spark/bin目录下启动beeline
因为如果你启动sparkbeeline的机器还部署了hive,恰巧你的hive环境变量正好在spark环境变量之前,那么很可能启动的是hive的beeline
比如:
[hadoop@hadoop003 spark]$ beeline
ls: cannot access /home/hadoop/app/spark/lib/spark-assembly-*.jar: No such file or directory
which: no hbase in (/home/hadoop/app/hive/bin:/home/hadoop/app/spark/bin:/home/hadoop/app/hadoop-2.6.0-cdh6.7.0//bin:/home/hadoop/app/hadoop-2.6.0-cdh6.7.0//sbin:/home/hadoop/app/zookeeper/bin:/usr/java/jdk1.8.0_131/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin)
Beeline version 1.1.0-cdh6.7.0 by Apache Hive # 这不就是hive么
beeline>
此时你查看下环境变量
[hadoop@hadoop003 spark]$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
#####JAVA_HOME#####
export JAVA_HOME=/usr/java/jdk1.8.0_131
####ZOOKEEPER_HOME####
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
#####HADOOP_HOME######
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh6.7.0/
export SPARK_HOME=/home/hadoop/app/spark
#####HIVE_HOME#####
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$HIVE_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
果然如果不指定beeline路径就会优先使用hive的beeline
本文题目:SparkSQL使用beeline访问hive仓库
文章路径:http://ybzwz.com/article/gesecd.html