Apache Spark - SparkSQL, Thrift Server and Tableau


I am wondering if there is a way to make a SparkSQL table in a sqlContext directly visible to other processes, for example Tableau.

I did some research on the Thrift server, but I didn't find any specific explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write to a Hive table in my Spark program?

When I use Beeline to check the tables on the Thrift server, there's a field isTempTable. What does it mean? I'm guessing it marks a temp table in the sqlContext of the Thrift server, because I read that the Thrift server is a Spark driver program and that its cached tables are visible to multiple programs. My confusion here is: if it is a driver program, where are the workers?

To summarize,

  1. Where should I write my DataFrame, or the tables in the sqlContext, to? Which method should I use (something like dataFrame.write.mode(SaveMode.Append).saveAsTable())?
  2. Should the default settings be used for the Thrift server, or are changes necessary?

Thanks

I assume you've moved on by now, but for anyone who comes across this answer: the Thrift server is effectively a broker between a JDBC connection and SparkSQL.

Once you've got Thrift running (see the Spark docs for a basic intro), you connect over JDBC using the Hive JDBC drivers to Thrift, which in turn relays your SQL queries to Spark using a HiveContext.
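As a rough illustration of that connection step, the sketch below builds a Beeline command line that sends one statement to the Thrift server over JDBC. The helper name `beeline_command` is hypothetical, and the host and port are assumptions: 10000 is the usual default Thrift port, but your server may be configured differently.

```python
import shlex


def beeline_command(host="localhost", port=10000, sql="SHOW TABLES;"):
    """Build the argument list for running one SQL statement through Beeline.

    host and port are assumptions; adjust them to match your Thrift server.
    """
    # The Thrift server is reached through a HiveServer2-style JDBC URL.
    url = f"jdbc:hive2://{host}:{port}"
    # -u passes the JDBC URL, -e passes a single statement to execute.
    return ["beeline", "-u", url, "-e", sql]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(beeline_command()))
```

The same JDBC URL is what a client such as Tableau would be pointed at, through whichever Hive or Spark SQL driver it uses.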

If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately. Otherwise, you can create tables on demand by running commands like these in your JDBC client:

    CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
    CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
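If you have many files to register, statements of this shape can be generated rather than typed by hand. This is only a minimal sketch: the helper `create_table_sql` and the `DATA_SOURCES` map are hypothetical names, and the data source class names are simply the two used in the statements above. The output is meant to be pasted into, or sent through, your JDBC client.

```python
# Short format names mapped to the data source classes from the statements above.
DATA_SOURCES = {
    "parquet": "org.apache.spark.sql.parquet",
    "json": "org.apache.spark.sql.json",
}


def create_table_sql(table, fmt, path):
    """Return a CREATE TABLE ... USING ... OPTIONS statement for one file."""
    # Keep table names to plain identifiers so the statement stays well-formed.
    if not table.isidentifier():
        raise ValueError(f"not a simple table name: {table!r}")
    # The path is wrapped in double quotes, so it must not contain one itself.
    if '"' in path:
        raise ValueError("path must not contain a double quote")
    if fmt not in DATA_SOURCES:
        raise ValueError(f"unknown format: {fmt!r}")
    return f'CREATE TABLE {table} USING {DATA_SOURCES[fmt]} OPTIONS (path "{path}");'


if __name__ == "__main__":
    print(create_table_sql("data1", "parquet", "/path/to/parquetfile"))
    print(create_table_sql("data2", "json", "/path/to/jsonfile"))
```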

Hope this helps a little.

