What is Shark?
- Shark is a tool developed for people with a database background, giving them access to Spark's MLlib capabilities through a Hive-like SQL interface.
- Shark helps data users run Hive on Spark, offering compatibility with the Hive metastore, queries, and data.
- As in Hive, Shark queries are written in a SQL-like language called HiveQL, which Shark translates into Spark Directed Acyclic Graphs (DAGs) that are executed on the Hadoop cluster.
- More complex queries are supported through User Defined Functions (UDFs) that can be written in Java and referenced by a HiveQL query.
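
To illustrate the UDF point above, here is a minimal sketch of the kind of Java logic such a function might wrap. The class name `NormalizeEmail`, the function name `normalize_email`, the JAR path, and the `users` table are all hypothetical. To keep the snippet compilable on its own, it does not extend Hive's `org.apache.hadoop.hive.ql.exec.UDF` base class, which a real classic-style UDF would be expected to do.

```java
// Sketch only: in a real deployment this class would be public, extend
// org.apache.hadoop.hive.ql.exec.UDF, and be packaged in a JAR available
// to the cluster. That dependency is omitted so the snippet stands alone.
class NormalizeEmail {
    // Classic Hive UDFs expose their logic through a method named "evaluate".
    public String evaluate(String raw) {
        if (raw == null) {
            return null; // SQL NULL in, NULL out
        }
        // Trim surrounding whitespace and lower-case in a locale-independent way.
        return raw.trim().toLowerCase(java.util.Locale.ROOT);
    }
}

// Hypothetical registration and use from HiveQL (all names are assumptions):
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_email AS 'NormalizeEmail';
//   SELECT normalize_email(email) FROM users;
```

Once registered, the function is called in a query like any built-in function, so the query itself stays plain HiveQL while the custom logic lives in Java.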