apache spark - SparkSQL, Thrift Server and Tableau
I am wondering if there is a way to make a SparkSQL table in sqlContext directly visible to other processes, for example Tableau.
I did some research on the Thrift Server, but didn't find a specific explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write to a Hive table in my Spark program?
When I use Beeline to check the tables on the Thrift Server, there's a field isTemptable. Does anyone know what it means? I'm guessing it marks a temp table in the sqlContext of the Thrift Server, because I read that the Thrift Server is a Spark driver program and cached tables are visible through multiple programs. My confusion here is: if the Thrift Server is the driver program, where are the workers?
To summarize:
- Where should I write my DataFrame, or the tables in sqlContext, to? Which method should I use (like dataframe.write.mode(SaveMode.Append).saveAsTable())?
- Can the default settings be used for the Thrift Server, or are changes necessary?
Thanks!
I assume you've moved on by now, but for anyone who comes across this answer: the Thrift Server acts as a broker between a JDBC connection and SparkSQL.
Once you've got Thrift running (see the Spark docs for a basic intro), you connect to it over JDBC using the Hive JDBC drivers, and it in turn relays your SQL queries to Spark using a HiveContext.
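For instance, here is a minimal sketch of such a JDBC client in Scala, assuming the Thrift Server is running on the default localhost:10000 and the Hive JDBC driver is on the classpath (the host, port, and user name are illustrative):

    import java.sql.DriverManager

    // Load the Hive JDBC driver and open a connection to the Thrift Server
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    val stmt = conn.createStatement()

    // Any SQL sent over this connection is executed by Spark
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    conn.close()

Tableau connects the same way under the hood: it speaks JDBC/ODBC to the Thrift Server using the Hive drivers.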
If you have a full Hive metastore up and running, you should be able to see your Hive tables in the JDBC client immediately; otherwise, you can create tables on demand by running commands like these in the JDBC client:
    CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
    CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
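As for the saveAsTable part of the question: writing a DataFrame as a persistent (non-temporary) table from your Spark program also makes it visible through the Thrift Server, as long as both point at the same Hive metastore. A minimal sketch, assuming Spark 1.x with an existing SparkContext sc; the table name and path are illustrative:

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    // Use a HiveContext so tables are registered in the shared Hive metastore
    val sqlContext = new HiveContext(sc)
    val df = sqlContext.read.parquet("/path/to/parquetfile")

    // Persist the DataFrame as a metastore table that the Thrift Server can see
    df.write.mode(SaveMode.Append).saveAsTable("data1")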
Hope this helps a little.