What is Cloudera Impala?

Cloudera’s Impala

Impala was among the first engines to bring interactive SQL querying on Hadoop to the public, in April 2013. Impala comes with a number of interesting features:

Impala can query many file formats, such as Parquet, Avro, Text, RCFile and SequenceFile;
Impala supports data stored in HDFS, Apache HBase and Amazon S3;
Impala supports multiple compression codecs:
- Snappy (recommended for its effective balance between compression ratio and decompression speed),
- Gzip (recommended when the highest level of compression is needed),
- Deflate (not supported for text files), Bzip2, LZO (for text files only);
Impala provides security through:
- authorization based on Sentry (OS user ID): defining which users are allowed to access which resources, and what operations they are allowed to perform,
- authentication based on Kerberos, plus the ability to specify an Active Directory username/password: how Impala verifies the identity of users to confirm that they are allowed to exercise the privileges assigned to them,
- auditing: what operations were attempted, and whether they succeeded or not, making it possible to track down suspicious activity; audit data are collected by Cloudera Manager;
Impala supports SSL network encryption between Impala and client programs, and between the Impala-related daemons running on different nodes in the cluster;
Impala allows the use of UDFs (user-defined functions) and UDAFs (user-defined aggregate functions);
Impala automatically orders joins so that they run as efficiently as possible;
Impala allows admission control – prioritization and queueing of queries within Impala;
Impala allows multi-user concurrent queries;
Impala caches frequently accessed data in memory;
Impala computes statistics (with COMPUTE STATS);
Impala provides window functions (aggregates with an OVER (PARTITION BY ...) clause, RANK, LEAD, LAG, NTILE, and so on) for more advanced SQL analytic capabilities (since version 2.0);
Impala allows joins and aggregations to use disk (since version 2.0) – operations can spill to disk if their internal state exceeds the aggregate memory size;
Impala allows subqueries inside WHERE clauses;
Impala allows incremental statistics – statistics are computed only on new or changed data, which makes statistics computation faster;
Impala enables queries on complex nested structures including maps, structs and arrays;
Impala enables merging (MERGE) of updates into existing tables;
Impala enables some OLAP functions (ROLLUP, CUBE, GROUPING SETS);
Impala allows inserts and updates into HBase.
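
To make the window functions mentioned above more concrete, here is a minimal sketch of RANK, LAG and an aggregate with an OVER clause. It is only an illustration under stated assumptions: it runs against Python's built-in SQLite (version 3.25 or later) as a stand-in, not against an Impala cluster, and the `sales` table, its columns and its rows are hypothetical. The window clauses themselves are standard SQL, so a very similar query should be accepted by Impala 2.0 and later.

```python
import sqlite3

# Assumption: SQLite (3.25+) is used as a local stand-in for Impala.
# The "sales" table and its data are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100),
        ("east", 2, 150),
        ("east", 3, 120),
        ("west", 1, 200),
        ("west", 2, 180),
    ],
)

query = """
SELECT region,
       month,
       amount,
       -- rank of each month's amount within its region, highest first
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       -- amount of the previous month in the same region (NULL for the first)
       LAG(amount) OVER (PARTITION BY region ORDER BY month) AS prev_amount,
       -- aggregate computed over the whole partition, repeated on every row
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales
ORDER BY region, month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike GROUP BY, the query keeps one output row per input row: each row carries its own rank, the previous month's amount and the total of its region side by side.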
