TestHiveContext (Spark 1.1.1 JavaDoc)

Object
- org.apache.spark.sql.SQLContext
  - org.apache.spark.sql.hive.HiveContext
    - org.apache.spark.sql.hive.test.TestHiveContext

All Implemented Interfaces:

java.io.Serializable, Logging

Direct Known Subclasses:

TestHive
```
public class TestHiveContext
extends HiveContext
```
A locally running test instance of Spark's Hive execution engine.
Data from testTables is loaded automatically whenever a query is run over those tables. Calling reset deletes all tables and other state in the database, leaving the database in a "clean" state.
TestHive is a singleton object version of this class, because instantiating multiple copies of the Hive metastore seems to lead to non-deterministic failures. Therefore, test cases that rely on TestHive must be run one at a time (serialized), never concurrently.
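
One way to run test cases one at a time against a shared singleton is to guard it with a single lock. The sketch below is illustrative only: `SharedTestEnv` is a hypothetical stand-in for a singleton such as TestHive, not Spark code, and its `createTable` and `reset` methods merely mimic the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a shared, stateful test singleton. */
final class SharedTestEnv {
    static final SharedTestEnv INSTANCE = new SharedTestEnv();
    private static final ReentrantLock LOCK = new ReentrantLock();

    private final List<String> tables = new ArrayList<>();
    private int maxTablesSeen = 0;

    private SharedTestEnv() {}

    /** Runs one test case with exclusive access, then restores a clean state. */
    static void runSerialized(Runnable testCase) {
        LOCK.lock();
        try {
            testCase.run();
        } finally {
            INSTANCE.reset();
            LOCK.unlock();
        }
    }

    void createTable(String name) {
        tables.add(name);
        maxTablesSeen = Math.max(maxTablesSeen, tables.size());
    }

    void reset() { tables.clear(); }

    int maxTablesSeen() { return maxTablesSeen; }
}

final class SerializedTestsDemo {
    /** Submits n test cases from 4 threads and returns the peak table count. */
    static int runAll(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < n; i++) {
            final String table = "t" + i;
            pool.submit(() -> SharedTestEnv.runSerialized(
                () -> SharedTestEnv.INSTANCE.createTable(table)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return SharedTestEnv.INSTANCE.maxTablesSeen();
    }

    public static void main(String[] args) {
        // Every case starts from a clean environment, so the peak is one table.
        System.out.println(runAll(20));
    }
}
```

Because each case holds the lock for its whole duration and resets the environment before releasing it, no case ever observes tables left behind by another.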

See Also:
Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`org.apache.spark.sql.SQLConf.Deprecated$`
`class`	`TestHiveContext.QueryExecution` Override QueryExecution with special debug workflow.
`class`	`TestHiveContext.TestTable`

Constructor Summary

Constructors
Constructor and Description
`TestHiveContext(SparkContext sc)`

Method Summary

Methods
Modifier and Type	Method and Description
`String`	`AUTO_BROADCASTJOIN_THRESHOLD()`
`int`	`autoBroadcastJoinThreshold()` Upper bound on the sizes (in bytes) of the tables qualified for the auto conversion to a broadcast value during the physical executions of join operations.
`boolean`	`cacheTables()`
`void`	`clear()`
`String`	`CODEGEN_ENABLED()`
`boolean`	`codegenEnabled()` When set to true, Spark SQL will use the Scala compiler at runtime to generate custom bytecode that evaluates expressions found in queries.
`String`	`COLUMN_BATCH_SIZE()`
`int`	`columnBatchSize()` The number of rows that will be
`String`	`COMPRESS_CACHED()`
`String`	`DEFAULT_SIZE_IN_BYTES()`
`long`	`defaultSizeInBytes()` The default size in bytes to assign to a logical operator's estimation statistics.
`scala.util.matching.Regex`	`describedTable()`
`String`	`DIALECT()`
`TestHiveContext.QueryExecution`	`executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)`
`scala.collection.immutable.Map<String,String>`	`getAllConfs()` Return all the configuration properties that have been set (i.e. not the default).
`String`	`getConf(String key)` Return the value of Spark SQL configuration property for the given key.
`String`	`getConf(String key, String defaultValue)` Return the value of Spark SQL configuration property for the given key.
`java.io.File`	`getHiveFile(String path)`
`scala.Option<java.io.File>`	`hiveDevHome()` The location of the Hive source code.
`java.io.File`	`hiveFilesTemp()`
`scala.Option<java.io.File>`	`hiveHome()` The location of the compiled Hive distribution.
`scala.collection.Seq<TestHiveContext.TestTable>`	`hiveQTestUtilTables()`
`java.io.File`	`inRepoTests()`
`boolean`	`isParquetBinaryAsString()` When set to true, we always treat byte arrays in Parquet files as strings.
`void`	`loadTestTable(String name)`
`String`	`metastorePath()`
`String`	`PARQUET_BINARY_AS_STRING()`
`String`	`PARQUET_CACHE_METADATA()`
`String`	`PARQUET_COMPRESSION()`
`String`	`parquetCompressionCodec()` The compression codec for writing to a Parquet file.
`<T> void`	`registerFunction(String name, scala.Function1<?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$1)` registerFunction 1-22 were generated by this script
`<T> void`	`registerFunction(String name, scala.Function10<?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$10)`
`<T> void`	`registerFunction(String name, scala.Function11<?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$11)`
`<T> void`	`registerFunction(String name, scala.Function12<?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$12)`
`<T> void`	`registerFunction(String name, scala.Function13<?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$13)`
`<T> void`	`registerFunction(String name, scala.Function14<?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$14)`
`<T> void`	`registerFunction(String name, scala.Function15<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$15)`
`<T> void`	`registerFunction(String name, scala.Function16<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$16)`
`<T> void`	`registerFunction(String name, scala.Function17<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$17)`
`<T> void`	`registerFunction(String name, scala.Function18<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$18)`
`<T> void`	`registerFunction(String name, scala.Function19<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$19)`
`<T> void`	`registerFunction(String name, scala.Function2<?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$2)`
`<T> void`	`registerFunction(String name, scala.Function20<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$20)`
`<T> void`	`registerFunction(String name, scala.Function21<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$21)`
`<T> void`	`registerFunction(String name, scala.Function22<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$22)`
`<T> void`	`registerFunction(String name, scala.Function3<?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$3)`
`<T> void`	`registerFunction(String name, scala.Function4<?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$4)`
`<T> void`	`registerFunction(String name, scala.Function5<?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$5)`
`<T> void`	`registerFunction(String name, scala.Function6<?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$6)`
`<T> void`	`registerFunction(String name, scala.Function7<?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$7)`
`<T> void`	`registerFunction(String name, scala.Function8<?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$8)`
`<T> void`	`registerFunction(String name, scala.Function9<?,?,?,?,?,?,?,?,?,T> func, scala.reflect.api.TypeTags.TypeTag<T> evidence$9)`
`void`	`registerPython(String name, byte[] command, java.util.Map<String,String> envVars, java.util.List<String> pythonIncludes, String pythonExec, Accumulator<java.util.List<byte[]>> accumulator, String stringDataType)`
`scala.collection.mutable.HashMap<String,TestHiveContext.TestTable>`	`registerTestTable(TestHiveContext.TestTable testTable)`
`void`	`reset()` Resets the test instance by deleting any tables that have been created.
`scala.collection.Seq<String>`	`runSqlHive(String sql)` Runs the specified SQL query using Hive.
`void`	`setConf(java.util.Properties props)` Set Spark SQL configuration properties.
`java.util.Map<String,String>`	`settings()` Only a low degree of contention is expected for conf, thus NOT using ConcurrentHashMap.
`String`	`SHUFFLE_PARTITIONS()`
`scala.collection.mutable.HashMap<String,TestHiveContext.TestTable>`	`testTables()` A list of test tables and the DDL required to initialize them.
`java.io.File`	`testTempDir()`
`String`	`THRIFTSERVER_POOL()`
`boolean`	`useCompression()` When true, tables cached using in-memory columnar caching will be compressed.
`String`	`warehousePath()`

Methods inherited from class org.apache.spark.sql.hive.HiveContext
analyze, createTable, hivePlanner, hiveql, hql, setConf, sql

Methods inherited from class org.apache.spark.sql.SQLContext
applySchema, cacheTable, createParquetFile, createSchemaRDD, isCached, jsonFile, jsonFile, jsonFile, jsonRDD, jsonRDD, jsonRDD, logicalPlanToSparkQuery, parquetFile, registerRDDAsTable, sparkContext, table, uncacheTable

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.Logging
initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

Constructor Detail

TestHiveContext

public TestHiveContext(SparkContext sc)

Method Detail

warehousePath
```
public String warehousePath()
```

metastorePath
```
public String metastorePath()
```

testTempDir
```
public java.io.File testTempDir()
```

hiveHome
```
public scala.Option<java.io.File> hiveHome()
```
The location of the compiled Hive distribution.

hiveDevHome

public scala.Option<java.io.File> hiveDevHome()

The location of the Hive source code.

runSqlHive
```
public scala.collection.Seq<String> runSqlHive(String sql)
```
Description copied from class: HiveContext

Runs the specified SQL query using Hive.

executePlan

public TestHiveContext.QueryExecution executePlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan)

hiveFilesTemp
```
public java.io.File hiveFilesTemp()
```

inRepoTests
```
public java.io.File inRepoTests()
```

getHiveFile

public java.io.File getHiveFile(String path)

describedTable

public scala.util.matching.Regex describedTable()

testTables
```
public scala.collection.mutable.HashMap<String,TestHiveContext.TestTable> testTables()
```
A list of test tables and the DDL required to initialize them. A test table is loaded on demand when a query is run against it.
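
The on-demand loading described here can be pictured with a small model. This is a sketch under stated assumptions, not Spark's implementation: `LazyTestTables` and its `query` and `executed` methods are hypothetical, and the caller names the referenced tables explicitly, whereas a real context would have to discover them from the query itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of on-demand test-table loading; not Spark code. */
final class LazyTestTables {
    // Table name -> setup commands (the DDL required to initialize it).
    private final Map<String, List<String>> testTables = new HashMap<>();
    private final Set<String> loaded = new HashSet<>();
    private final List<String> executed = new ArrayList<>();

    void registerTestTable(String name, List<String> commands) {
        testTables.put(name, commands);
    }

    /** Runs the setup commands for a registered table, the first time only. */
    void loadTestTable(String name) {
        List<String> commands = testTables.get(name);
        if (commands == null || !loaded.add(name)) {
            return; // unknown table, or already loaded
        }
        executed.addAll(commands);
    }

    /** Loads every referenced table, then records the query as executed. */
    void query(String sql, String... referencedTables) {
        for (String table : referencedTables) {
            loadTestTable(table);
        }
        executed.add(sql);
    }

    /** Forgets loaded tables so that the next query reloads them. */
    void reset() {
        loaded.clear();
        executed.clear();
    }

    List<String> executed() { return executed; }

    public static void main(String[] args) {
        LazyTestTables ctx = new LazyTestTables();
        ctx.registerTestTable("src",
            Arrays.asList("CREATE TABLE src (key INT, value STRING)"));
        ctx.query("SELECT * FROM src", "src");
        ctx.query("SELECT key FROM src", "src");
        // The DDL ran once and both queries ran: prints 3.
        System.out.println(ctx.executed().size());
    }
}
```

Tracking loaded names in a set keeps repeated queries cheap, and clearing that set in `reset` is what makes the next query trigger a fresh load.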

registerTestTable

public scala.collection.mutable.HashMap<String,TestHiveContext.TestTable> registerTestTable(TestHiveContext.TestTable testTable)

hiveQTestUtilTables

public scala.collection.Seq<TestHiveContext.TestTable> hiveQTestUtilTables()

cacheTables
```
public boolean cacheTables()
```

loadTestTable

public void loadTestTable(String name)

reset
```
public void reset()
```
Resets the test instance by deleting any tables that have been created. TODO: also clear out UDFs, views, etc.

COMPRESS_CACHED
```
public String COMPRESS_CACHED()
```

COLUMN_BATCH_SIZE
```
public String COLUMN_BATCH_SIZE()
```

AUTO_BROADCASTJOIN_THRESHOLD

public String AUTO_BROADCASTJOIN_THRESHOLD()

DEFAULT_SIZE_IN_BYTES
```
public String DEFAULT_SIZE_IN_BYTES()
```

SHUFFLE_PARTITIONS
```
public String SHUFFLE_PARTITIONS()
```

CODEGEN_ENABLED
```
public String CODEGEN_ENABLED()
```

DIALECT
```
public String DIALECT()
```

PARQUET_BINARY_AS_STRING

public String PARQUET_BINARY_AS_STRING()

PARQUET_CACHE_METADATA

public String PARQUET_CACHE_METADATA()

PARQUET_COMPRESSION
```
public String PARQUET_COMPRESSION()
```

THRIFTSERVER_POOL
```
public String THRIFTSERVER_POOL()
```

settings
```
public java.util.Map<String,String> settings()
```
Only a low degree of contention is expected for conf, thus NOT using ConcurrentHashMap.

useCompression
```
public boolean useCompression()
```
When true, tables cached using in-memory columnar caching will be compressed.

parquetCompressionCodec
```
public String parquetCompressionCodec()
```
The compression codec for writing to a Parquet file.

columnBatchSize
```
public int columnBatchSize()
```
The number of rows that will be

codegenEnabled
```
public boolean codegenEnabled()
```
When set to true, Spark SQL will use the Scala compiler at runtime to generate custom bytecode that evaluates expressions found in queries. In general this custom code runs much faster than interpreted evaluation, but there are significant start-up costs due to compilation. As a result codegen is only beneficial when queries run for a long time, or when the same expressions are used multiple times.
Defaults to false as this feature is currently experimental.

autoBroadcastJoinThreshold
```
public int autoBroadcastJoinThreshold()
```
Upper bound on the sizes (in bytes) of the tables qualified for the auto conversion to a broadcast value during the physical executions of join operations. Setting this to -1 effectively disables auto conversion.
Hive setting: hive.auto.convert.join.noconditionaltask.size, whose default value is also 10000.

defaultSizeInBytes
```
public long defaultSizeInBytes()
```
The default size in bytes to assign to a logical operator's estimation statistics. By default, it is set to a larger value than autoBroadcastJoinThreshold, hence any logical operator without a properly implemented estimation of this statistic will not be incorrectly broadcasted in joins.
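
The relationship between autoBroadcastJoinThreshold and defaultSizeInBytes can be illustrated with a short sketch. It is a model only: this page does not state the exact comparison Spark performs, so the `threshold > 0 && size <= threshold` test and the `threshold + 1` fallback below are assumptions, and `BroadcastJoinCheck` is a hypothetical name.

```java
/** Hypothetical model of a size-based broadcast-join check; not Spark code. */
final class BroadcastJoinCheck {
    /** True when a table of the given size qualifies for broadcasting. */
    static boolean qualifies(long sizeInBytes, int threshold) {
        // Assumption: any non-positive threshold (for example -1) disables
        // auto conversion.
        return threshold > 0 && sizeInBytes <= threshold;
    }

    /** A fallback estimate strictly larger than the threshold. */
    static long defaultSizeInBytes(int threshold) {
        return (long) threshold + 1;
    }

    public static void main(String[] args) {
        int threshold = 10000; // the default value quoted above
        System.out.println(qualifies(4096, threshold)); // true
        System.out.println(qualifies(4096, -1));        // false
        // An operator with no real size estimate falls back to the default
        // size, which never qualifies.
        System.out.println(
            qualifies(defaultSizeInBytes(threshold), threshold)); // false
    }
}
```

Choosing a fallback just above the threshold means an operator of unknown size is treated as too large to broadcast, which is the safe direction to err in.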

isParquetBinaryAsString
```
public boolean isParquetBinaryAsString()
```
When set to true, we always treat byte arrays in Parquet files as strings.

setConf

public void setConf(java.util.Properties props)

Set Spark SQL configuration properties.

getConf
```
public String getConf(String key)
```
Return the value of Spark SQL configuration property for the given key.

getConf
```
public String getConf(String key,
             String defaultValue)
```
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.

getAllConfs
```
public scala.collection.immutable.Map<String,String> getAllConfs()
```
Return all the configuration properties that have been set (i.e. not the default). This creates a new copy of the config properties in the form of a Map.
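
The accessor semantics above (lookup, lookup with a default, and a copied snapshot) can be sketched over a synchronized map. This is a minimal model, not Spark's code: `ConfSketch` is a hypothetical class, and throwing `NoSuchElementException` for an unset key in the one-argument `getConf` is an assumption, since this page does not say what happens in that case.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Properties;

/** Hypothetical model of the configuration accessors; not Spark code. */
final class ConfSketch {
    // A synchronized HashMap rather than a ConcurrentHashMap, on the stated
    // expectation that contention is low.
    private final Map<String, String> settings =
        Collections.synchronizedMap(new HashMap<String, String>());

    void setConf(Properties props) {
        for (String key : props.stringPropertyNames()) {
            settings.put(key, props.getProperty(key));
        }
    }

    /** Assumption: an unset key is reported with an exception. */
    String getConf(String key) {
        String value = settings.get(key);
        if (value == null) {
            throw new NoSuchElementException(key);
        }
        return value;
    }

    /** Returns defaultValue when the key has not been set. */
    String getConf(String key, String defaultValue) {
        String value = settings.get(key);
        return value != null ? value : defaultValue;
    }

    /** Returns a copy, so later changes do not affect the result. */
    Map<String, String> getAllConfs() {
        // Copying iterates the map, which requires holding its lock.
        synchronized (settings) {
            return Collections.unmodifiableMap(new HashMap<>(settings));
        }
    }

    void clear() { settings.clear(); }
}
```

A caller that takes a snapshot with `getAllConfs` and then calls `clear` still sees the earlier entries in the snapshot, which is the practical effect of returning a new copy.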

clear
```
public void clear()
```

registerPython

public void registerPython(String name,
                  byte[] command,
                  java.util.Map<String,String> envVars,
                  java.util.List<String> pythonIncludes,
                  String pythonExec,
                  Accumulator<java.util.List<byte[]>> accumulator,
                  String stringDataType)

registerFunction
```
public <T> void registerFunction(String name,
                        scala.Function1<?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
```
registerFunction 1-22 were generated by this script:

    (1 to 22).map { x =>
      val types = (1 to x).map(x => "_").reduce(_ + ", " + _)
      s"""
        def registerFunction[T: TypeTag](name: String, func: Function$x[$types, T]): Unit = {
          def builder(e: Seq[Expression]) = ScalaUdf(func, ScalaReflection.schemaFor(typeTag[T]).dataType, e)
          functionRegistry.registerFunction(name, builder)
        }
      """
    }
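
For readers more comfortable with Java, the same generator idea can be mirrored as follows. This is an illustrative port, not part of Spark: `RegisterFunctionGen` and `generate` are hypothetical names, and the whitespace of the emitted Scala text is simplified relative to the script quoted above.

```java
import java.util.Collections;

/** Java port of the generator idea; class and method names are hypothetical. */
final class RegisterFunctionGen {
    /** Builds the Scala source text for the overload taking arity arguments. */
    static String generate(int arity) {
        // One "_" wildcard per argument type, e.g. "_, _, _" for arity 3.
        String types = String.join(", ", Collections.nCopies(arity, "_"));
        return "def registerFunction[T: TypeTag](name: String, func: Function"
            + arity + "[" + types + ", T]): Unit = {\n"
            + "  def builder(e: Seq[Expression]) = "
            + "ScalaUdf(func, ScalaReflection.schemaFor(typeTag[T]).dataType, e)\n"
            + "  functionRegistry.registerFunction(name, builder)\n"
            + "}\n";
    }

    public static void main(String[] args) {
        // Emit all 22 overloads, matching the range used by the script.
        for (int arity = 1; arity <= 22; arity++) {
            System.out.println(generate(arity));
        }
    }
}
```

Generating the overloads from one template explains why the 22 signatures listed on this page differ only in the arity of the `scala.FunctionN` parameter.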

registerFunction

public <T> void registerFunction(String name,
                        scala.Function2<?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$2)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function3<?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$3)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function4<?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$4)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function5<?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$5)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function6<?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$6)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function7<?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$7)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function8<?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$8)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function9<?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$9)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function10<?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$10)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function11<?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$11)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function12<?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$12)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function13<?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$13)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function14<?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$14)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function15<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$15)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function16<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$16)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function17<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$17)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function18<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$18)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function19<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$19)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function20<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$20)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function21<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$21)

registerFunction

public <T> void registerFunction(String name,
                        scala.Function22<?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,T> func,
                        scala.reflect.api.TypeTags.TypeTag<T> evidence$22)
