Import vector data
For more information about how to import the vector data of Lindorm (HBase Enhanced Edition) into Data Lake Analytics (DLA), see Quick start.
Import raster data
- Pipeline technology
The pipeline model uses the extract, transform, and load (ETL) technology developed by DLA Ganos based on the GeoTrellis open source project for fast loading, processing, and importing of raster data.
This model consists of several functional modules, such as Load, Transform, and Save. The pipeline model is typically represented by JSON objects. Primary JSON objects are called pipelines. A pipeline is an array of steps to be executed. Other JSON objects are called stage objects. The pipeline process and related parameter settings for the import operation of DLA Ganos are defined in a JSON object. The following example shows a simple JSON script:
[
  {
    "uri" : "URI of OSS resources",
    "type" : "singleband.spatial.read.oss"
  },
  {
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.tile-to-layout"
  },
  {
    "crs" : "EPSG:3857",
    "scheme" : {
      "crs" : "epsg:3857",
      "tileSize" : 256,
      "resolutionThreshold" : 0.1
    },
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.buffered-reproject"
  },
  {
    "end_zoom" : 0,
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.pyramid"
  },
  {
    "name" : "mask",
    "uri" : "oss://geotrellis-test/colingw/pipeline/",
    "key_index_method" : {
      "type" : "zorder"
    },
    "scheme" : {
      "crs" : "epsg:3857",
      "tileSize" : 256,
      "resolutionThreshold" : 0.1
    },
    "type" : "singleband.spatial.write"
  }
]
- Import process
- Import related dependencies.
import geotrellis.layer._
import geotrellis.spark.pipeline._
import geotrellis.spark.pipeline.json._
import geotrellis.spark._
import geotrellis.spark.store.kryo.KryoRegistrator
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.{Failure, Try}
- Initialize the Serverless Spark environment.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("Spark Tiler")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[KryoRegistrator].getName)
conf.set("spark.kryoserializer.buffer.max", "2047m")
implicit val sc: SparkContext = new SparkContext(conf)
- Define the pipeline JSON description. The following example shows a simple pipeline model, which is defined based on the following information:
- URI and load driver of the imported file
- Data layout scheme (tile-to-layout)
- Data conversion and projection
- Data writing address (Lindorm)
val pipeline: String =
  """
    |[
    |  {
    |    "uri" : "URI of OSS resources",
    |    "time_tag" : "TIFFTAG_DATETIME",
    |    "time_format" : "yyyy:MM:dd HH:mm:ss",
    |    "type" : "singleband.spatial.read.hadoop"
    |  },
    |  {
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.tile-to-layout"
    |  },
    |  {
    |    "crs" : "EPSG:3857",
    |    "scheme" : {
    |      "crs" : "EPSG:3857",
    |      "tileSize" : 256,
    |      "resolutionThreshold" : 0.1
    |    },
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.buffered-reproject"
    |  },
    |  {
    |    "end_zoom" : 0,
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.pyramid"
    |  },
    |  {
    |    "name" : "srtm",
    |    "uri" : "hbase://localhost:2181?master=localhost&attributes=attributes&layers=srtm-tms-layers",
    |    "pyramid" : true,
    |    "key_index_method" : {
    |      "type" : "zorder"
    |    },
    |    "scheme" : {
    |      "tileCols" : 256,
    |      "tileRows" : 256
    |    },
    |    "type" : "singleband.spatial.write"
    |  }
    |]
  """.stripMargin
- Run the pipeline model.
// Parse the JSON description into a list of pipeline expressions.
val list: List[PipelineExpr] = pipeline.pipelineExpr match {
  case Right(r) => r
  case Left(e) => throw e
}
// Run the pipeline model.
val erasedNode = list.erasedNode
Try {
  erasedNode.eval[Stream[(Int, TileLayerRDD[SpatialKey])]]
} match {
  case Failure(e) => println(s"Pipeline run failed: ${e.getMessage}"); throw e
  case _ =>
}
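The pipeline description used in this procedure follows a simple linear shape: one read step, several transform steps, and one write step. The following sketch shows one way to sanity-check that ordering before a job is submitted. It is plain Scala that inspects only the "type" strings and does not call GeoTrellis or DLA Ganos. The object name PipelineOrderCheck and its helpers are hypothetical, and the one-read, one-write rule is an assumption taken from the examples in this topic, not a rule stated by the product.

```scala
object PipelineOrderCheck {
  // Classify a step by the operation segment of its "type" string.
  // Hypothetical helper: it only inspects the string.
  def kind(stepType: String): String = {
    val parts = stepType.split('.')
    if (parts.contains("read")) "read"
    else if (parts.contains("transform")) "transform"
    else if (parts.contains("write")) "write"
    else "unknown"
  }

  // Assumed rule: one read step, then zero or more transform steps,
  // then one write step.
  def isWellOrdered(stepTypes: Seq[String]): Boolean = {
    val kinds = stepTypes.map(kind)
    kinds.length >= 2 &&
      kinds.head == "read" &&
      kinds.last == "write" &&
      kinds.slice(1, kinds.length - 1).forall(_ == "transform")
  }

  // The "type" values of the sample pipeline in this procedure, in order.
  val example: Seq[String] = Seq(
    "singleband.spatial.read.hadoop",
    "singleband.spatial.transform.tile-to-layout",
    "singleband.spatial.transform.buffered-reproject",
    "singleband.spatial.transform.pyramid",
    "singleband.spatial.write"
  )
}
```

Calling PipelineOrderCheck.isWellOrdered(PipelineOrderCheck.example) returns true, and the same list in reverse order is rejected.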
Sample configuration file
Data loading objects
{
  "uri" : "{oss | file | hdfs | ...}://...",
  "time_tag" : "TIFFTAG_DATETIME", // optional field
  "time_format" : "yyyy:MM:dd HH:mm:ss", // optional field
  "type" : "{singleband | multiband}.{spatial | temporal}.read.{oss | hadoop}"
}
The following table describes the parameters.
Key | Value |
---|---|
uri | The URI of the raster data source. |
time_tag | Optional. The name of the time tag in the metadata of a dataset. |
time_format | Optional. The format of the time tag value, such as yyyy:MM:dd HH:mm:ss. |
type | The type of the operation. |
Note: Data can be read from OSS or from any file system supported by Hadoop by using the Hadoop API.
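The "type" value of a data loading object is built from three choices: the band model, the key kind, and the read backend. The following sketch splits such a string into those parts with a regular expression. The object ReadType and the case class Parsed are hypothetical names used for illustration only, and the accepted values are limited to the ones listed in the template in this section.

```scala
object ReadType {
  // Components of a read "type" string such as "singleband.spatial.read.oss".
  final case class Parsed(bands: String, keyKind: String, backend: String)

  // Mirrors the template {singleband | multiband}.{spatial | temporal}.read.{oss | hadoop}.
  private val Pattern =
    """(singleband|multiband)\.(spatial|temporal)\.read\.(oss|hadoop)""".r

  // Returns Some(Parsed(...)) when the whole string matches the template, else None.
  def parse(stepType: String): Option[Parsed] = stepType match {
    case Pattern(bands, keyKind, backend) => Some(Parsed(bands, keyKind, backend))
    case _ => None
  }
}
```

For example, ReadType.parse("singleband.spatial.read.oss") yields the parts singleband, spatial, and oss, whereas a string with an unlisted backend yields None.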
Data writing objects
{
  "name" : "layerName",
  "uri" : "{oss | file | hdfs | ...}://...",
  "key_index_method" : {
    "type" : "{zorder | hilbert}",
    "temporal_resolution": 1 // optional, if set - temporal index is used
  },
  "scheme" : {
    "crs" : "epsg:3857",
    "tileSize" : 256,
    "resolutionThreshold" : 0.1
  },
  "type" : "{singleband | multiband}.{spatial | temporal}.write"
}
The following table describes the parameters.
Key | Value |
---|---|
uri | The URI of the destination to which the raster data is written. |
name | The name of a layer. |
key_index_method | The key index method used to generate indexes from a spatial key. |
key_index_method.type | The type of space-filling curve. Valid values: zorder, row-major, and hilbert. |
key_index_method.temporal_resolution | Optional. The time resolution, in milliseconds. If this parameter is set, a temporal index is used. |
scheme | The scheme of the specified layout. |
scheme.crs | The CRS parameters of the specified scheme. |
scheme.tileSize | The tile size of the layout scheme. |
scheme.resolutionThreshold | Optional. The resolution of the user-defined layout scheme. |
Note: Data can be written to OSS or to any file system supported by Hadoop by using the Hadoop API.
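To illustrate what a zorder key index does, the following sketch interleaves the bits of a tile's column and row into a single number, so that tiles that are close in space tend to receive close index values. This is a generic Z-order (Morton) encoding written for illustration. The object name ZOrderSketch is hypothetical, the 16-bit limit per coordinate is an assumption, and the code is not the implementation used by DLA Ganos or GeoTrellis.

```scala
object ZOrderSketch {
  // Interleave the low 16 bits of col and row:
  // col bits go to even positions, row bits to odd positions.
  def index(col: Int, row: Int): Long = {
    var result = 0L
    var bit = 0
    while (bit < 16) {
      result |= ((col >> bit) & 1).toLong << (2 * bit)
      result |= ((row >> bit) & 1).toLong << (2 * bit + 1)
      bit += 1
    }
    result
  }
}
```

With this encoding, the four tiles (0,0), (1,0), (0,1), and (1,1) map to the indexes 0, 1, 2, and 3, which is the characteristic Z-shaped traversal.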
Data conversion objects
- Tile To Layout
{
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.tile-to-layout"
}
Note: This example demonstrates how to convert the RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] model to the RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] model.
The following table describes the parameters.
Key | Options |
---|---|
resample_method | The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos. |
- ReTile To Layout
{
  "layout_definition": {
    "extent": [0, 0, 1, 1],
    "tileLayout": {
      "layoutCols": 1,
      "layoutRows": 1,
      "tileCols": 1,
      "tileRows": 1
    }
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.retile-to-layout"
}
Note: This example demonstrates how to retile the RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object based on user-defined layout definition rules.
- Buffered Reproject
{
  "crs" : "EPSG:3857",
  "scheme" : {
    "crs" : "epsg:3857",
    "tileSize" : 256,
    "resolutionThreshold" : 0.1
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.buffered-reproject"
}
Note: This example demonstrates how to convert the RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object to the required CRS data tiles based on the setting of the layout scheme parameter.
The following table describes the parameters.
Key | Options |
---|---|
crs | The CRS parameters of the specified scheme. |
tileSize | The tile size of the layout scheme. |
resolutionThreshold | Optional. The user-defined resolution of the layout scheme. |
resample_method | The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos. |
- Per Tile Reproject
{
  "crs" : "EPSG:3857",
  "scheme" : {
    "crs" : "epsg:3857",
    "tileSize" : 256,
    "resolutionThreshold" : 0.1
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.per-tile-reproject"
}
Note: This example demonstrates how to convert the RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] object to the required CRS data tiles based on the setting of the layout scheme parameter.
The following table describes the parameters.
Key | Options |
---|---|
scheme | The scheme of the specified layout. |
scheme.crs | The CRS parameters of the specified scheme. |
scheme.tileSize | The tile size of the layout scheme. |
scheme.resolutionThreshold | Optional. The user-defined resolution of the layout scheme. |
resample_method | The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos. |
- Pyramid
{
  "end_zoom" : 0,
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.pyramid"
}
Note: This example demonstrates how to create a pyramid within the range specified by end_zoom for the RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object. The return type is Stream[RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})]].
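The effect of the Pyramid step can be pictured as a sequence of levels, one per zoom, that runs from the finest level down to end_zoom. The following sketch lists those levels for a tile grid, assuming that each step down halves the number of tile columns and rows (rounded up). It is plain Scala with a hypothetical object name PyramidSketch and a hypothetical Level case class. It models only the level bookkeeping and does not resample any raster data.

```scala
object PyramidSketch {
  // One pyramid level: the zoom number and the tile grid size at that zoom.
  final case class Level(zoom: Int, layoutCols: Int, layoutRows: Int)

  // Start from the finest level and halve the grid (rounding up) at each step
  // until endZoom is reached. Requires startZoom >= endZoom.
  def levels(startZoom: Int, cols: Int, rows: Int, endZoom: Int): List[Level] = {
    require(startZoom >= endZoom, "startZoom must not be below endZoom")
    (0 to (startZoom - endZoom)).toList.map { step =>
      val factor = 1 << step
      Level(
        zoom = startZoom - step,
        layoutCols = math.max(1, (cols + factor - 1) / factor),
        layoutRows = math.max(1, (rows + factor - 1) / factor)
      )
    }
  }
}
```

For an 8 x 8 grid at zoom 3 with end_zoom set to 0, the sketch produces four levels with 8, 4, 2, and 1 tile columns.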
Layout scheme
DLA Ganos supports two layout schemes:
- ZoomedLayoutScheme
This scheme is used to build a Tile Map Service (TMS) pyramid.
Important: If ZoomedLayoutScheme is used, the world extent must be derived from the CRS to build the TMS pyramid. In this case, the input raster may be resampled to match the resolution of the TMS zoom level.
- FloatingLayoutScheme
This scheme is used to match the original resolution of the input raster.
Important: If FloatingLayoutScheme is used, the native resolution and extent of the input raster are detected, and the raster is partitioned based on the specified tile size without resampling.
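The difference between the two schemes can be made concrete with a little arithmetic. The following sketch assumes EPSG:3857 and a tile size of 256. For the zoomed case, it computes the ground resolution of a TMS zoom level from the world width and picks the first zoom level that is at least as fine as a given cell size. For the floating case, it counts how many tiles are needed to cover a raster at its native resolution. The object name LayoutSchemeSketch and all function names are hypothetical, and the zoom selection is a simplification that ignores resolutionThreshold.

```scala
object LayoutSchemeSketch {
  // Width of the EPSG:3857 world extent in metres (2 * pi * equatorial radius).
  val WorldWidth3857: Double = 2 * math.Pi * 6378137.0

  // Zoomed scheme: ground resolution (metres per pixel) of a TMS zoom level.
  def tmsResolution(zoom: Int, tileSize: Int = 256): Double =
    WorldWidth3857 / (tileSize.toDouble * (1L << zoom))

  // Zoomed scheme, simplified: the lowest zoom whose resolution is at least
  // as fine as cellSize. resolutionThreshold is deliberately ignored here.
  def zoomFor(cellSize: Double, tileSize: Int = 256, maxZoom: Int = 30): Int =
    (0 to maxZoom).find(z => tmsResolution(z, tileSize) <= cellSize).getOrElse(maxZoom)

  // Floating scheme: keep the native resolution and count the tiles
  // (columns, rows) needed to cover the raster.
  def floatingTileCounts(rasterCols: Int, rasterRows: Int, tileSize: Int = 256): (Int, Int) =
    ((rasterCols + tileSize - 1) / tileSize, (rasterRows + tileSize - 1) / tileSize)
}
```

Under these assumptions, a raster with 30 m cells falls on zoom level 13 in the zoomed scheme, which implies resampling to about 19.1 m per pixel, whereas the floating scheme keeps the 30 m cells and splits a 1000 x 600 pixel raster into 4 x 3 tiles.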