IBM BigInsights for Apache Hadoop

IBM® BigInsights™ for Apache™ Hadoop® adds advanced analytics tools and enterprise services to the Hadoop ecosystem to enhance and streamline workflows for the software architect, data scientist, and business analyst. With these enterprise-grade extensions to Hadoop, BigInsights delivers a robust process for analyzing massive volumes of structured and unstructured data from a variety of sources.

BigInsights includes the following features:

Advanced Engines

Big SQL: a massively parallel processing SQL engine that deploys directly on the physical Hadoop Distributed File System (HDFS) cluster

Big R: integrates R with BigInsights, providing distributed functionality that abstracts some of the MapReduce complexity

AQL: a rule-based SQL-like language for processing unstructured data

     

Visualization & Exploration

BigSheets: a spreadsheet-like visualization tool that facilitates the analysis of large volumes of distributed data by generating MapReduce jobs behind the scenes to retrieve and process the necessary data
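
To give a sense of the kind of work that tools such as BigSheets and Big R hide from the user, the following is a minimal, single-process sketch of the map, shuffle, and reduce pattern. It is not BigInsights code and does not use any BigInsights or Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair per word, as a mapper would for each input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases in one process (hypothetical helper)."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Illustrative input only; a real job would read distributed data from HDFS.
    rows = ["big data on hadoop", "big sql on hadoop"]
    print(word_count(rows))
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the framework handles the grouping between them; the sketch only shows the logical data flow.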
     

Development Tools

Eclipse: tooling for developing BigInsights applications, including a workflow user interface and an extensive text analytics IDE
 

Workload Optimization

Integrated installer and enhanced security

Splittable text compression and large-scale text indexing

Adaptive MapReduce enhancements to balance workload across Map tasks

Flexible scheduler, distributed coordination via ZooKeeper, and coordination of MapReduce jobs via Oozie

 

Administration & Security

Web-based administration console to view the HDFS file system; monitor workflows, jobs, and storage; and run applications

Security features include authentication, role-based access to data and console, and a reverse proxy that provides access to the cluster
     

Open Source Components

Pig: a high-level programming language and runtime environment for Hadoop

Jaql: a high-level JSON-based query language that supports SQL

Hive: a data warehouse infrastructure designed to support batch queries and analysis of files managed by Hadoop

HBase: a column-oriented data storage environment designed to support large, sparsely populated tables in Hadoop

Flume: a facility for collecting and loading data into Hadoop

Lucene: a text search and indexing technology

Avro: a data serialization technology

ZooKeeper: a coordination service for distributed applications

Oozie: a workflow/job orchestration technology

Users can explore IBM Open Platform with Apache Hadoop, Big SQL, and other analytics on Hadoop by installing trial versions of BigInsights: IBM BigInsights Quick Start Edition for Non-Production Environment or BigInsights on Cloud.