High Availability using datalog

[Figure: datalog_replication]

Intro

The Warp 10 standalone version can be made highly available via a feature called datalog, which replicates data from one instance to another.

Using datalog, one can configure multiple standalone Warp 10 instances to replicate data to each other, allowing for smart architectures combining in-memory and disk-based instances.

Overview

When the datalog feature is enabled, any update, meta or delete request is logged in a datalog file in a specific directory. Those files contain all the information needed to replay the request coherently on a remote Warp 10 instance, including the token used for the request and the request payload itself, such as the datapoints, GTS selectors or attributes. The files are named so they can be replayed in order, thus guaranteeing consistency between the original data and its replicas.

The datalog files are handled by a datalog forwarder which reads them in the order in which they were produced and forwards them to a designated remote Warp 10 instance. Once forwarded, the files are moved to the datalog.forwarder.dstdir directory. Since this directory grows over time, consider enabling the datalog.forwarder.deleteforwarded key so forwarded files are deleted instead of being kept, and likewise datalog.forwarder.deleteignored for files which were ignored.
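As an illustration only, and assuming these cleanup keys follow the same per-forwarder '.name' suffix convention as the other forwarder keys shown in the configuration below, enabling them could look like this:

// Delete datalog files once they have been forwarded or ignored
datalog.forwarder.deleteforwarded.instance-B = true
datalog.forwarder.deleteignored.instance-B = true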

Each Warp 10 instance participating in the datalog chain must have a unique id, and specific ids can be excluded from replication by a forwarder. This mechanism avoids loops in the replication chain while still allowing cross or circular replication between instances.

[Figure: datalog_ring]

Note that datalog is a feature unique to the standalone version of Warp 10; distributed Warp 10 instances can only be defined as datalog forwarder targets.

[Figure: datalog_dist]

Configuration

Datalog is configured by setting the datalog properties in the Warp 10 configuration file. See the configuration template for a description of those properties.

Here is an example of a datalog configuration for 3 servers using cross replication. This sample covers a single instance (instance-A); the other instances must adapt the ids and endpoints accordingly.

/////////////////////////////////////////////////////////////////////////////////////////
//
// D A T A L O G
//
/////////////////////////////////////////////////////////////////////////////////////////

//
// Datalogging directory. If set, every data modification action (UPDATE, META, DELETE) will produce
// a file in this directory with the timestamp, the token used and the action type. These files can then
// be used to update another instance of Warp 10
//
datalog.dir = ${standalone.home}/datalog

//
// Set datalog.sync to true to force a call to fsync upon closing each datalog file, thus ensuring
// the file blocks are written to the underlying device. This defaults to false if the property is not set.
//
datalog.sync = true

//
// Unique id for this datalog instance.
//
datalog.id = instance-A

//
// Set this property to 'false' to skip logging forwarded requests, or to 'true' if you want to log them
// so they can be forwarded to an additional hop.
//
datalog.logforwarded = true

//
// Comma separated list of datalog forwarders. Configuration of each forwarder is done via datalog configuration
// keys suffixed with '.name' (e.g. '.instance-B', '.instance-C'), except for datalog.psk which is common to all forwarders.
//
datalog.forwarders = instance-B, instance-C

//
// Directory where datalog files to forward reside. If this property and 'datalog.forwarder.dstdir' are set, then
// the DatalogForwarder daemon will run.
// When running multiple datalog forwarders, their srcdir directories MUST all be on the same device as the
// 'datalog.dir' directory, as hard links are used to make the data available to the forwarders.
//
datalog.forwarder.srcdir.instance-B = ${datalog.dir}/instance-B
datalog.forwarder.srcdir.instance-C = ${datalog.dir}/instance-C

//
// Directory where forwarded datalog files will be moved. MUST be on the same device as datalog.forwarder.srcdir
//
datalog.forwarder.dstdir.instance-B = ${datalog.dir}_done/instance-B
datalog.forwarder.dstdir.instance-C = ${datalog.dir}_done/instance-C

//
// Comma separated list of datalog ids which should not be forwarded. This is used to avoid loops.
//
datalog.forwarder.ignored.instance-B = ${datalog.forwarders}
datalog.forwarder.ignored.instance-C = ${datalog.forwarders}

//
// Endpoint to use when forwarding datalog UPDATE requests.
//
datalog.forwarder.endpoint.update.instance-B = http://host-instance-B:port-instance-B/api/v0/update
datalog.forwarder.endpoint.update.instance-C = http://host-instance-C:port-instance-C/api/v0/update

//
// Endpoint to use when forwarding datalog META requests.
//
datalog.forwarder.endpoint.meta.instance-B = http://host-instance-B:port-instance-B/api/v0/meta
datalog.forwarder.endpoint.meta.instance-C = http://host-instance-C:port-instance-C/api/v0/meta

//
// Endpoint to use when forwarding datalog DELETE requests.
//
datalog.forwarder.endpoint.delete.instance-B = http://host-instance-B:port-instance-B/api/v0/delete
datalog.forwarder.endpoint.delete.instance-C = http://host-instance-C:port-instance-C/api/v0/delete

Setting up replication to multiple targets

The usual way of replicating data from one instance of Warp 10 to multiple remote instances is to configure datalog on each one of them by specifying the remote instance so they form a chain or cycle. This works well and is the simplest architecture; however, if one instance of the chain is down, the instances further down the chain will not receive replicated data and will lag behind until the failed instance is back up.
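For instance, in a ring A -> B -> C -> A, each instance declares only the next instance as its forwarder and ignores the id of that target so data never loops back to its origin. The following is only a sketch for instance-A, reusing the key layout of the cross-replication example above; hosts and ports are placeholders to adapt:

datalog.id = instance-A
datalog.logforwarded = true
datalog.forwarders = instance-B
datalog.forwarder.srcdir.instance-B = ${datalog.dir}/instance-B
datalog.forwarder.dstdir.instance-B = ${datalog.dir}_done/instance-B
// Do not forward data originating from the target instance back to it
datalog.forwarder.ignored.instance-B = instance-B
datalog.forwarder.endpoint.update.instance-B = http://host-instance-B:port-instance-B/api/v0/update
datalog.forwarder.endpoint.meta.instance-B = http://host-instance-B:port-instance-B/api/v0/meta
datalog.forwarder.endpoint.delete.instance-B = http://host-instance-B:port-instance-B/api/v0/delete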

If you wish to mitigate this situation, you can set up an architecture where one datalog instance replicates data to multiple remote instances. Each forwarder runs in its own JVM with its own configuration file. The source directory of each forwarder must be different and MUST reside on the same device as the datalog directory of the primary instance, but no forwarder should use the datalog directory itself as its source directory. An external script periodically scans the datalog directory and creates hard links for each file into the source directory of each forwarder, so each forwarder can operate independently; a sketch of such a script is shown below.
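Below is a minimal sketch of such a script in Python, assuming a primary datalog directory of /opt/warp10/datalog and two forwarders whose source directories are /opt/warp10/datalog-instance-B and /opt/warp10/datalog-instance-C (all paths hypothetical). It removes a file from the datalog directory once every forwarder has its own hard link, so each file is fanned out exactly once; a production version would also need to make sure it never links a file the Warp 10 instance is still writing.

#!/usr/bin/env python3
#
# Sketch of a fan-out script creating hard links of datalog files into the
# source directory of each forwarder. All paths are placeholders to adapt to
# your own 'datalog.dir' and 'datalog.forwarder.srcdir' settings.
#
import os
import time

DATALOG_DIR = '/opt/warp10/datalog'          # datalog.dir of the primary instance
FORWARDER_SRCDIRS = [                        # srcdir of each forwarder, same device as DATALOG_DIR
    '/opt/warp10/datalog-instance-B',
    '/opt/warp10/datalog-instance-C',
]
SCAN_PERIOD = 10                             # seconds between two scans

while True:
    for name in sorted(os.listdir(DATALOG_DIR)):
        src = os.path.join(DATALOG_DIR, name)
        if not os.path.isfile(src):
            continue
        linked_everywhere = True
        for srcdir in FORWARDER_SRCDIRS:
            dst = os.path.join(srcdir, name)
            if not os.path.exists(dst):
                try:
                    # Hard link, hence the same-device requirement
                    os.link(src, dst)
                except OSError:
                    linked_everywhere = False
        # Remove the original once every forwarder has its own link,
        # so the file is fanned out exactly once
        if linked_everywhere:
            os.remove(src)
    time.sleep(SCAN_PERIOD)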