Reading data from Warp 10
As part of the integration of WarpScript in Pig, a new Load Function is provided which enables you to read data from the Warp 10 Storage Engine.
The load function produces tuples with two elements: a field named `id` of type `chararray`, and a wrapped Geo Time Series encoded in the field `wrapper` of type `bytearray`. The `wrapper` field can be decoded using `UNWRAP` or `UNWRAPENCODER`.
Using the Warp10 LoadFunc
The fully qualified class name of the Warp 10 LoadFunc is `io.warp10.pig.Warp10LoadFunc`. The syntax for using it to read data from Warp 10 is:
REGISTER warp10-pig.jar;
A = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('suffix') AS (id:chararray, wrapper: bytearray);
This loads data from Warp 10 into a Pig relation named `A`. The `Warp10LoadFunc` instance determines what to load from parameters which must have been set using the Pig `SET` command. The parameters considered are those named in the table below, or the same names followed by the specified suffix. The suffixed form takes precedence: in the example above, `warp10.splits.token.suffix` takes precedence over `warp10.splits.token`.
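The suffix mechanism can be illustrated with two loads in the same script, each reading its own selector. This is a minimal sketch, not a complete job: the suffixes `temp` and `hum`, the selectors, the token and the `HOST:PORT` values are hypothetical placeholders, and it assumes that a parameter without a suffixed variant falls back to its unsuffixed value, as the precedence rule above implies. A real job would likely also need other parameters from the list of supported parameters.

```pig
REGISTER warp10-pig.jar;

-- Shared settings, set once without a suffix (placeholder values).
SET warp10.splits.endpoint 'http://HOST:PORT/api/v0/splits';
SET warp10.splits.token 'READ_TOKEN';

-- Per-load selectors (hypothetical class names). The trailing part of each
-- key matches the suffix passed to the Warp10LoadFunc constructor.
SET warp10.splits.selector.temp 'temperature{}';
SET warp10.splits.selector.hum 'humidity{}';

-- Each instance looks up its suffixed parameters first.
T = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('temp') AS (id:chararray, wrapper:bytearray);
H = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('hum') AS (id:chararray, wrapper:bytearray);
```

Using a distinct suffix per load keeps the settings of one `LOAD` from affecting another in the same script.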
The supported parameters are:
Key | Description |
---|---|
warp10.splits.endpoint | URL of the endpoint to access for retrieving splits. Typically http://HOST:PORT/api/v0/splits |
warp10.fetcher.fallbacks | Comma separated list of hosts which can serve as fallback fetchers in case one of the fetchers defined for a split is unavailable |
warp10.fetcher.fallbacksonly | Boolean indicating whether to use the fetchers ('false') or only the fallbacks ('true'). Set to 'true' when retrieving data from a standalone Warp 10 |
warp10.fetcher.protocol | Protocol to use when talking to the fetchers, defaults to http |
warp10.fetcher.port | Port to use when talking to the fetchers, defaults to 8881 |
warp10.fetcher.path | URL path of the fetcher endpoint, defaults to /api/v0/sfetch |
warp10.splits.selector | Geo Time Series selector to use to retrieve the list of GTS, for example class{label1~regexp1} |
warp10.splits.token | Token to use for retrieving the list of Geo Time Series and later their datapoints |
warp10.http.connect.timeout | Connection timeout to the splits and sfetch endpoints. Defaults to 10000 ms |
warp10.http.read.timeout | Read timeout for the splits and sfetch endpoints, also defaults to 10000 ms |
warp10.fetch.now | Timestamp to use as the now parameter for the datapoints retrieval |
warp10.fetch.timespan | Timespan to use for the datapoints retrieval |
warp10.max.combined.splits | Maximum number of splits to combine in a single split. Each original split corresponds to a single Geo Time Series. Leave unset to let the InputFormat infer a suitable number |
warp10.max.splits | Maximum number of splits to produce. The InputFormat will combine the individual splits to produce at most that many splits |
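
Putting the parameters together, a script reading from a standalone Warp 10 might look like the sketch below. All values are placeholders to replace with your own: `HOST`, `PORT`, `READ_TOKEN`, `NOW_TIMESTAMP` and `TIMESPAN` are not real settings, and the expected time unit for the last two depends on your Warp 10 configuration. The parameters are set without a suffix here; a suffixed variant such as `warp10.splits.token.suffix` would take precedence if it were also set.

```pig
REGISTER warp10-pig.jar;

-- Where to retrieve the splits and which Geo Time Series to select.
SET warp10.splits.endpoint 'http://HOST:PORT/api/v0/splits';
SET warp10.splits.selector 'class{label1~regexp1}';
SET warp10.splits.token 'READ_TOKEN';

-- Time window of the datapoints to retrieve (placeholder values).
SET warp10.fetch.now 'NOW_TIMESTAMP';
SET warp10.fetch.timespan 'TIMESPAN';

-- Standalone instance: ignore per-split fetchers, use only the fallback host.
SET warp10.fetcher.fallbacks 'HOST';
SET warp10.fetcher.fallbacksonly 'true';

-- Optional: cap the number of splits produced by the InputFormat.
SET warp10.max.splits '10';

A = LOAD 'foo' USING io.warp10.pig.Warp10LoadFunc('suffix') AS (id:chararray, wrapper:bytearray);

-- Print the schema of the loaded relation.
DESCRIBE A;
```

The fetcher protocol, port and path, as well as both HTTP timeouts, are left at their documented defaults in this sketch.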