Warp 10 and Hadoop

The ties between Warp 10 and Hadoop go beyond the distributed version of Warp 10, which relies on HBase.

Time series analysis can be done interactively only up to a point: once the volume of series or datapoints becomes too large for a single server, it has to move elsewhere. From then on, only tools such as Pig, Spark or Flink can handle the load, because they are distributed and process data in parallel.

These tools share common ground: they were initially built for Hadoop and, as such, still rely on Hadoop Input and Output Formats for reading and writing data.

Warp 10 provides two Hadoop InputFormats and one OutputFormat. With them, those tools, and any other tool able to use Hadoop Input/Output Formats, can read data from and write data to Warp 10, and can use WarpScript to process input from any Hadoop-compatible data source.

The Warp10InputFormat enables reading data from a Warp 10 Storage Engine, whether distributed or standalone.

The Warp10OutputFormat enables storing Geo Time Series into a Warp 10 Storage Engine instance, again, distributed or standalone.

The WarpScriptInputFormat is a powerful tool that wraps any other Hadoop InputFormat and processes the records it reads on the fly with WarpScript code before returning them to the reading process.
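
The wrapping idea behind the WarpScriptInputFormat can be illustrated with a small stand-alone sketch. This is not the Warp 10 or Hadoop API: `WrappingReader`, `to_celsius` and the sample records are hypothetical names invented for the illustration, a plain Python function stands in for the WarpScript code, and the choice to let a transform emit zero or more records per input record is an assumption of the sketch.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A (key, value) pair, similar to what a record reader hands to its consumer.
Record = Tuple[str, str]


class WrappingReader:
    """Hypothetical stand-in for an input format that wraps another one."""

    def __init__(
        self,
        inner: Iterable[Record],
        transform: Callable[[Record], Iterable[Record]],
    ) -> None:
        self._inner = inner          # the wrapped source of records
        self._transform = transform  # stands in for the WarpScript code

    def __iter__(self) -> Iterator[Record]:
        # Records are transformed lazily, one at a time, as the consumer pulls them.
        for record in self._inner:
            yield from self._transform(record)


def to_celsius(record: Record) -> Iterable[Record]:
    """Example transform: drop malformed values, convert Fahrenheit to Celsius."""
    key, value = record
    try:
        fahrenheit = float(value)
    except ValueError:
        return []  # assumption of this sketch: a transform may emit no record
    return [(key, f"{(fahrenheit - 32) * 5 / 9:.1f}")]


# Hypothetical raw records, as an arbitrary wrapped source might produce them.
raw = [("sensor-1", "212"), ("sensor-2", "n/a"), ("sensor-3", "32")]
converted = list(WrappingReader(raw, to_celsius))
```

The consumer iterates over `WrappingReader` exactly as it would over the wrapped source, and only ever sees the transformed records. That is the property that lets a wrapping InputFormat slot in wherever a regular one is expected.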