Loading

Reading data from Warp 10

As part of the integration of WarpScript in Pig, a new Load Function is provided which enables you to read data from the Warp 10 Storage Engine.

The load function produces tuples with two elements: a field named id of type chararray, and a field named wrapper of type bytearray that holds a wrapped (encoded) Geo Time Series. The wrapper field can be decoded using UNWRAP or UNWRAPENCODER.

Using the Warp10 LoadFunc

The fully qualified class name of the Warp 10 LoadFunc is io.warp10.pig.Warp10LoadFunc. The syntax for using it to read data from Warp 10 is:

REGISTER warp10-pig.jar;

A = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('suffix') AS (id:chararray, wrapper: bytearray);

This loads data from Warp 10 into a Pig relation named A. The Warp10LoadFunc instance determines what to load from parameters which must have been set beforehand with the Pig SET command. For each key listed in the table below, the instance considers both the key itself and the same key followed by a dot and the suffix passed to the constructor. The suffixed form takes precedence: in the example above, warp10.splits.token.suffix takes precedence over warp10.splits.token.
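As a minimal sketch of the precedence rule just described, the script below sets an unsuffixed parameter and two suffixed overrides, then creates two Warp10LoadFunc instances. The suffixes a and b, the token values and the selectors are hypothetical placeholders, and the suffixed key names are formed by appending a dot and the suffix to a key from the table, following the warp10.splits.token.suffix example. It assumes the remaining required parameters (such as warp10.splits.endpoint) have been set elsewhere.

```pig
REGISTER warp10-pig.jar;

-- Unsuffixed parameter: considered by every Warp10LoadFunc instance
-- unless a suffixed version exists for that instance.
SET warp10.splits.token 'SHARED_READ_TOKEN';

-- Suffixed parameters (hypothetical suffixes 'a' and 'b'): each one is
-- only considered by the instance created with the matching suffix,
-- and takes precedence over the unsuffixed key.
SET warp10.splits.selector.a 'class.a{label1~regexp1}';
SET warp10.splits.selector.b 'class.b{label1~regexp1}';
SET warp10.splits.token.b 'OTHER_READ_TOKEN';

-- Instance 'a' uses the selector for 'a' and falls back to the shared token.
A = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('a') AS (id:chararray, wrapper:bytearray);

-- Instance 'b' uses the selector for 'b' and its own token.
B = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('b') AS (id:chararray, wrapper:bytearray);
```

Using one suffix per LOAD statement in this way would let a single script read different sets of Geo Time Series while sharing common settings.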

The supported parameters are:

| Key | Description |
| --- | --- |
| warp10.splits.endpoint | URL of the endpoint to access for retrieving splits. Typically http://HOST:PORT/api/v0/splits |
| warp10.fetcher.fallbacks | Comma-separated list of hosts which can serve as fallback fetchers in case one of the fetchers defined for a split is unavailable |
| warp10.fetcher.fallbacksonly | Boolean indicating whether to use the fetchers ('false') or only the fallbacks ('true'). Set to 'true' when retrieving data from a standalone Warp 10 |
| warp10.fetcher.protocol | Protocol to use when talking to the fetchers. Defaults to http |
| warp10.fetcher.port | Port to use when talking to the fetchers. Defaults to 8881 |
| warp10.fetcher.path | URL path of the fetcher endpoint. Defaults to /api/v0/sfetch |
| warp10.splits.selector | Geo Time Series selector to use to retrieve the list of GTS, for example class{label1~regexp1} |
| warp10.splits.token | Token to use for retrieving the list of Geo Time Series and later their datapoints |
| warp10.http.connect.timeout | Connection timeout for the splits and sfetch endpoints. Defaults to 10000 ms |
| warp10.http.read.timeout | Read timeout for the splits and sfetch endpoints. Defaults to 10000 ms |
| warp10.fetch.now | Timestamp to use as the now parameter for the datapoints retrieval |
| warp10.fetch.timespan | Timespan to use for the datapoints retrieval |
| warp10.max.combined.splits | Maximum number of splits to combine in a single split. Each original split corresponds to a single Geo Time Series. Leave unset to let the InputFormat infer a suitable number |
| warp10.max.splits | Maximum number of splits to produce. The InputFormat will combine the individual splits to produce that many splits |
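To show how these parameters fit together, here is a hedged sketch of a script that reads from a standalone Warp 10 instance. HOST, PORT, READ_TOKEN, NOW and TIMESPAN are placeholders to replace with your own values. The expected format and time unit of warp10.fetch.now and warp10.fetch.timespan are not specified here, so check them against your platform configuration. Parameters that are not set keep the defaults listed in the table.

```pig
REGISTER warp10-pig.jar;

-- Where to retrieve the list of splits (placeholder host and port).
SET warp10.splits.endpoint 'http://HOST:PORT/api/v0/splits';

-- Standalone instance: use only the fallback fetchers, as the table suggests.
SET warp10.fetcher.fallbacks 'HOST';
SET warp10.fetcher.fallbacksonly 'true';
-- Fetcher port; 8881 is used when this parameter is not set.
SET warp10.fetcher.port 'PORT';

-- Which Geo Time Series to read, and the token granting access to them.
SET warp10.splits.selector 'class{label1~regexp1}';
SET warp10.splits.token 'READ_TOKEN';

-- Time range of the datapoints to retrieve (placeholder values).
SET warp10.fetch.now 'NOW';
SET warp10.fetch.timespan 'TIMESPAN';

-- Unsuffixed parameters are considered because no suffixed
-- versions (ending in '.suffix') have been set.
A = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('suffix') AS (id:chararray, wrapper:bytearray);
```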