Import raster data

  1. Pipeline technology

    The pipeline model is an extract, transform, and load (ETL) technology that DLA Ganos developed based on the open source GeoTrellis project to quickly load, process, and import raster data. For more information, see https://pdal.io/pipeline.html.

    The model consists of several functional modules, such as Load, Transform, and Save. A pipeline is described in JSON. The top-level JSON array is the pipeline, which lists the steps to execute in order, and each JSON object in the array is a stage object. DLA Ganos defines the import process and its parameter settings in this JSON description. The following example shows a simple JSON script:
    [
      {
        "uri" : "URI of OSS resources",
        "type" : "singleband.spatial.read.oss"
      },
      {
        "resample_method" : "nearest-neighbor",
        "type" : "singleband.spatial.transform.tile-to-layout"
      },
      {
        "crs" : "EPSG:3857",
        "scheme" : {
          "crs" : "epsg:3857",
          "tileSize" : 256,
          "resolutionThreshold" : 0.1
        },
        "resample_method" : "nearest-neighbor",
        "type" : "singleband.spatial.transform.buffered-reproject"
      },
      {
        "end_zoom" : 0,
        "resample_method" : "nearest-neighbor",
        "type" : "singleband.spatial.transform.pyramid"
      },
      {
        "name" : "mask",
        "uri" : "oss://geotrellis-test/colingw/pipeline/",
        "key_index_method" : {
          "type" : "zorder"
        },
        "scheme" : {
          "crs" : "epsg:3857",
          "tileSize" : 256,
          "resolutionThreshold" : 0.1
        },
        "type" : "singleband.spatial.write"
      }
    ]
  2. Import process
    1. Import related dependencies.
      import geotrellis.layer._
      import geotrellis.spark.pipeline._
      import geotrellis.spark.pipeline.json._
      import geotrellis.spark._
      import geotrellis.spark.store.kryo.KryoRegistrator
      import org.apache.spark.{SparkConf, SparkContext}
      import scala.util.{Failure, Try}
    2. Initialize the Serverless Spark environment.
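        // local[*] runs Spark in local mode, which is convenient for testing.
        // Remove setMaster when the master URL is supplied at submission time,
        // because values set on SparkConf take precedence over submit options.
        val conf =
          new SparkConf()
            .setMaster("local[*]")
            .setAppName("Spark Tiler")
            // Serialize RDD records with Kryo and register the GeoTrellis classes.
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .set("spark.kryo.registrator", classOf[KryoRegistrator].getName)
            // Allow large tiles to be serialized (the value must stay below 2048m).
            .set("spark.kryoserializer.buffer.max", "2047m")
        implicit val sc: SparkContext = new SparkContext(conf)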
    3. Define the pipeline JSON description.
      The following example defines a simple pipeline model that specifies:
      • The URI and read driver of the file to import
      • The data layout scheme (tile-to-layout)
      • Data reprojection (buffered-reproject) and pyramid construction (pyramid)
      • The write destination (Lindorm, addressed by an hbase:// URI)
      The following code shows the detailed configuration of the pipeline model.
        val pipeline: String =
          """
            |[
            |  {
            |    "uri" : "URI of OSS resources",
            |    "time_tag" : "TIFFTAG_DATETIME",
            |    "time_format" : "yyyy:MM:dd HH:mm:ss",
            |    "type" : "singleband.spatial.read.hadoop"
            |  },
            |  {
            |    "resample_method" : "nearest-neighbor",
            |    "type" : "singleband.spatial.transform.tile-to-layout"
            |  },
            |  {
            |    "crs" : "EPSG:3857",
            |    "scheme" : {
            |      "crs" : "EPSG:3857",
            |      "tileSize" : 256,
            |      "resolutionThreshold":0.1
            |    },
            |    "resample_method" : "nearest-neighbor",
            |    "type" : "singleband.spatial.transform.buffered-reproject"
            |  },
            |  {
            |    "end_zoom" : 0,
            |    "resample_method" : "nearest-neighbor",
            |    "type" : "singleband.spatial.transform.pyramid"
            |  },
            |  {
            |    "name" : "srtm",
            |    "uri" : "hbase://localhost:2181?master=localhost&attributes=attributes&layers=srtm-tms-layers",
            |    "pyramid" : true,
            |    "key_index_method" : {
            |      "type" : "zorder"
            |    },
            |    "scheme" : {
            |      "tileCols" : 256,
            |      "tileRows" : 256
            |    },
            |    "type" : "singleband.spatial.write"
            |  }
            |]
          """.stripMargin
    4. Run the pipeline model.
      // Parse the JSON description of the pipeline into a list of pipeline expressions.
      val list: List[PipelineExpr] = pipeline.pipelineExpr match {
        case Right(r) => r
        case Left(e)  => throw e
      }

      // Convert the expressions into an untyped (erased) pipeline node and evaluate it.
      // The result is a stream of (zoom level, tile layer RDD) pairs.
      val erasedNode = list.erasedNode
      Try {
        erasedNode.eval[Stream[(Int, TileLayerRDD[SpatialKey])]]
      } match {
        case Failure(e) =>
          println(s"Pipeline run failed: ${e.getMessage}")
          throw e
        case _ =>
      }
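Each stage object's "type" value follows the pattern {singleband | multiband}.{spatial | temporal}.{read | transform | write}, with a driver or operation name appended for read and transform stages, as the examples above and the sample configuration below show. The following Scala sketch, which does not depend on GeoTrellis, splits such values into their parts and checks that a pipeline starts with a read stage and ends with a write stage. The StageType model, the parse and validate functions, and the read-first/write-last rule are assumptions made for illustration; they are not the DLA Ganos or GeoTrellis parser.

```scala
// Illustration only: not the DLA Ganos or GeoTrellis parser.
object StageTypes {
  // Hypothetical model of a stage "type" value, for example
  // "singleband.spatial.transform.pyramid" -> (singleband, spatial, transform, pyramid).
  final case class StageType(tile: String, key: String, op: String, detail: String)

  private val tileTypes = Set("singleband", "multiband")
  private val keyTypes  = Set("spatial", "temporal")

  def parse(value: String): Either[String, StageType] =
    value.split("\\.", 4).toList match {
      // read.<driver> and transform.<operation>
      case tile :: key :: op :: detail :: Nil
          if tileTypes(tile) && keyTypes(key) && (op == "read" || op == "transform") =>
        Right(StageType(tile, key, op, detail))
      // write has no fourth component, for example "singleband.spatial.write"
      case tile :: key :: "write" :: Nil if tileTypes(tile) && keyTypes(key) =>
        Right(StageType(tile, key, "write", ""))
      case _ =>
        Left(s"Unrecognized stage type: $value")
    }

  // Assumed rule: an import pipeline reads first and writes last.
  def validate(types: List[String]): Either[String, List[StageType]] = {
    val parsed = types.map(parse)
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None =>
        val stages = parsed.collect { case Right(stage) => stage }
        if (stages.headOption.exists(_.op == "read") && stages.lastOption.exists(_.op == "write"))
          Right(stages)
        else
          Left("A pipeline must start with a read stage and end with a write stage")
    }
  }

  // Stage types of the first example pipeline, in execution order.
  val exampleStages: List[String] = List(
    "singleband.spatial.read.oss",
    "singleband.spatial.transform.tile-to-layout",
    "singleband.spatial.transform.buffered-reproject",
    "singleband.spatial.transform.pyramid",
    "singleband.spatial.write"
  )
}
```

The sketch checks only the shape of each type string, not whether a given driver or transform name is actually supported.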

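The write stage in step 3 addresses Lindorm through an hbase:// URI whose query string carries the connection settings (master, attributes, and layers). Building that string from key-value pairs, instead of typing it by hand, avoids stray spaces and unescaped characters. The WriterUri object and its hbase function below are hypothetical helpers, and the sketch assumes that the component reading the URI accepts URL-encoded query values.

```scala
import java.net.URLEncoder

// Hypothetical helper: builds an hbase:// writer URI such as
// hbase://localhost:2181?master=localhost&attributes=attributes&layers=srtm-tms-layers
object WriterUri {
  def hbase(zookeeper: String, params: Seq[(String, String)]): String = {
    // URL-encode every key and value so that spaces or reserved characters
    // cannot end up unescaped in the query string (assumes the reader decodes them).
    val query = params
      .map { case (k, v) => URLEncoder.encode(k, "UTF-8") + "=" + URLEncoder.encode(v, "UTF-8") }
      .mkString("&")
    s"hbase://$zookeeper?$query"
  }
}
```

A Seq of pairs is used instead of a Map so that the parameters keep the order in which they are given; the resulting string can then be placed in the "uri" field of the write stage.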
Sample configuration file

Data loading objects
{
   "uri" : "{oss| file | hdfs | ...}://...",
   "time_tag" : "TIFFTAG_DATETIME", // optional field
   "time_format" : "yyyy:MM:dd HH:mm:ss", // optional field
   "type" : "{singleband | multiband}.{spatial | temporal}.read.{oss | hadoop}"
}
The following table describes the parameters.
Parameter     Description
uri           The URI of the raster data source.
time_tag      Optional. The name of the time tag in the metadata of the dataset.
time_format   Optional. The format of the time tag value, for example, yyyy:MM:dd HH:mm:ss.
type          The type of the operation.
Note: Data can be read by using the Hadoop API from OSS or from any file system that Hadoop supports.
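The time_tag and time_format parameters tell the reader where to find a timestamp in the dataset metadata and how to parse it. TIFFTAG_DATETIME values use the TIFF form YYYY:MM:DD HH:MM:SS, which matches the yyyy:MM:dd HH:mm:ss pattern in the example. The following sketch shows that parsing step with java.time; the TimeTag object is a hypothetical illustration, and DLA Ganos may parse the tag differently internally.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical illustration of the time_tag / time_format step:
// parse a TIFFTAG_DATETIME value with the pattern from "time_format".
object TimeTag {
  def parse(value: String, timeFormat: String): Option[LocalDateTime] =
    // An invalid pattern or a non-matching value yields None instead of an exception.
    Try(LocalDateTime.parse(value.trim, DateTimeFormatter.ofPattern(timeFormat))).toOption
}
```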
Data writing objects
{
   "name" : "layerName",
   "uri" : "{oss| file | hdfs | ...}://...",
   "key_index_method" : {
      "type" : "{zorder | hilbert}",
      "temporal_resolution": 1 // optional, if set - temporal index is used
   },
   "scheme" : {
      "crs" : "epsg:3857",
      "tileSize" : 256,
      "resolutionThreshold" : 0.1
   },
   "type" : "{singleband | multiband}.{spatial | temporal}.write"
}
The following table describes the parameters.
Parameter                              Description
uri                                    The URI of the storage location to which the layer is written.
name                                   The name of the layer.
key_index_method                       The key index method used to generate indexes from spatial keys.
key_index_method.type                  The type of space-filling curve. Valid values: zorder, row-major, and hilbert.
key_index_method.temporal_resolution   Optional. The time resolution, in milliseconds. If this parameter is set, a temporal index is used.
scheme                                 The layout scheme.
scheme.crs                             The CRS of the layout scheme.
scheme.tileSize                        The tile size of the layout scheme.
scheme.resolutionThreshold             Optional. The resolution of the user-defined layout scheme.
Note: Data can be written by using the Hadoop API to S3 or to any file system that Hadoop supports.
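The key_index_method parameter maps each spatial key (column, row) to a one-dimensional index along a space-filling curve, so that tiles that are close in space tend to be stored close together. The following sketch shows a generic Morton (Z-order) encoding that interleaves the bits of the column and row, plus a time bucket derived from temporal_resolution. The bit order and the bucketing rule are assumptions made for illustration and may not match the exact index that DLA Ganos writes; ZOrderIndex is a hypothetical name.

```scala
// Illustration only: a generic Morton (Z-order) index and a time bucket.
// The exact bit layout used by DLA Ganos may differ.
object ZOrderIndex {
  // Moves bit i of a non-negative 32-bit value to bit 2 * i of a Long.
  private def spread(v: Int): Long = {
    var x = v.toLong & 0xFFFFFFFFL
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFL
    x = (x | (x << 8)) & 0x00FF00FF00FF00FFL
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FL
    x = (x | (x << 2)) & 0x3333333333333333L
    x = (x | (x << 1)) & 0x5555555555555555L
    x
  }

  // Column bits go to even positions and row bits to odd positions.
  def index(col: Int, row: Int): Long = {
    require(col >= 0 && row >= 0, "spatial keys must be non-negative")
    spread(col) | (spread(row) << 1)
  }

  // Assumed meaning of temporal_resolution: timestamps that fall into the same
  // window of resolutionMillis milliseconds share one time bucket.
  def timeBucket(epochMillis: Long, resolutionMillis: Long): Long = {
    require(resolutionMillis > 0, "temporal_resolution must be positive")
    Math.floorDiv(epochMillis, resolutionMillis)
  }
}
```

With this encoding, the keys (0, 0), (1, 0), (0, 1), and (1, 1) receive the indexes 0 to 3, so a 2 x 2 block of tiles occupies one contiguous index range.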
Data conversion objects
  • Tile To Layout
    {
       "resample_method" : "nearest-neighbor",
       "type" : "{singleband | multiband}.{spatial | temporal}.transform.tile-to-layout"
    }
    Note: This stage converts an RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] object into an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object.
    The following table describes the parameters.
    Parameter         Description
    resample_method   The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos.
  • ReTile To Layout
    {
       "layout_definition": {
          "extent": [0, 0, 1, 1],
          "tileLayout": {
             "layoutCols": 1,
             "layoutRows": 1,
             "tileCols": 1,
             "tileRows": 1
          }
       },
       "resample_method" : "nearest-neighbor",
       "type" : "{singleband | multiband}.{spatial | temporal}.transform.retile-to-layout"
    }
    Note: This stage retiles an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object according to a user-defined layout definition (layout_definition).
  • Buffered Reproject
    {
       "crs" : "EPSG:3857",
       "scheme" : {
          "crs" : "epsg:3857",
          "tileSize" : 256,
          "resolutionThreshold" : 0.1
       },
       "resample_method" : "nearest-neighbor",
       "type" : "{singleband | multiband}.{spatial | temporal}.transform.buffered-reproject"
    }
    Note: This stage reprojects an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object into tiles in the required CRS, laid out according to the scheme parameter.
    The following table describes the parameters.
    Parameter                    Description
    crs                          The target CRS.
    scheme.crs                   The CRS of the layout scheme.
    scheme.tileSize              The tile size of the layout scheme.
    scheme.resolutionThreshold   Optional. The user-defined resolution of the layout scheme.
    resample_method              The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos.
  • Per Tile Reproject
    {
       "crs" : "EPSG:3857",
       "scheme" : {
          "crs" : "epsg:3857",
          "tileSize" : 256,
          "resolutionThreshold" : 0.1
       },
       "resample_method" : "nearest-neighbor",
       "type" : "{singleband | multiband}.{spatial | temporal}.transform.per-tile-reproject"
    }
    Note: This stage reprojects an RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] object tile by tile into tiles in the required CRS, laid out according to the scheme parameter.
    The following table describes the parameters.
    Parameter                    Description
    crs                          The target CRS.
    scheme                       The layout scheme.
    scheme.crs                   The CRS of the layout scheme.
    scheme.tileSize              The tile size of the layout scheme.
    scheme.resolutionThreshold   Optional. The user-defined resolution of the layout scheme.
    resample_method              The resampling method. Valid values: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, and lanczos.
  • Pyramid
    {
       "end_zoom" : 0,
       "resample_method" : "nearest-neighbor",
       "type" : "{singleband | multiband}.{spatial | temporal}.transform.pyramid"
    }
    Note: This stage builds a pyramid for an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] object, from the zoom level of the input layer down to the zoom level specified by end_zoom. The result is a stream that holds one layer per zoom level, such as the Stream[(Int, TileLayerRDD[SpatialKey])] evaluated in the import example.
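The Buffered Reproject and Per Tile Reproject stages in the examples reproject data to EPSG:3857 (Web Mercator). DLA Ganos performs the reprojection itself; the following sketch only shows the spherical Web Mercator formulas that convert EPSG:4326 longitude and latitude (in degrees) into EPSG:3857 coordinates (in meters), and how the extent of a tile can be projected. The WebMercator object is a hypothetical illustration, and latitudes are clamped to about ±85.0511°, where the projected world becomes square.

```scala
// Illustration only: spherical Web Mercator forward projection,
// EPSG:4326 (longitude, latitude in degrees) -> EPSG:3857 (x, y in meters).
object WebMercator {
  val Radius = 6378137.0             // WGS 84 semi-major axis, in meters
  val MaxLatitude = 85.0511287798066 // latitude at which y reaches the world edge

  def forward(lonDeg: Double, latDeg: Double): (Double, Double) = {
    // Clamp latitude: the projection diverges at the poles.
    val lat = math.max(-MaxLatitude, math.min(MaxLatitude, latDeg))
    val x = Radius * math.toRadians(lonDeg)
    val y = Radius * math.log(math.tan(math.Pi / 4 + math.toRadians(lat) / 2))
    (x, y)
  }

  // x depends only on longitude and y only on latitude, and both increase
  // monotonically, so projecting the lower-left and upper-right corners of a
  // lon/lat box gives the projected box (xmin, ymin, xmax, ymax).
  def forwardExtent(xmin: Double, ymin: Double, xmax: Double, ymax: Double): (Double, Double, Double, Double) = {
    val (x0, y0) = forward(xmin, ymin)
    val (x1, y1) = forward(xmax, ymax)
    (x0, y0, x1, y1)
  }
}
```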

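In a TMS pyramid in EPSG:3857, zoom level z is divided into 2^z x 2^z tiles, and the cell size halves with each step to a finer zoom level. The following sketch lists the levels that a pyramid from a base zoom level down to end_zoom would contain, with the cell size of each level. The PyramidLevels object and the baseZoom parameter are assumptions for illustration; in the actual stage, the base zoom level comes from the input layer.

```scala
// Illustration only: zoom levels of a TMS pyramid in EPSG:3857.
object PyramidLevels {
  final case class Level(zoom: Int, tilesPerAxis: Long, cellSize: Double)

  // Width of the EPSG:3857 world, in meters (2 * pi * 6378137).
  val WorldWidth: Double = 2 * math.Pi * 6378137.0

  // Levels from the base zoom down to endZoom, finest first.
  def levels(baseZoom: Int, endZoom: Int, tileSize: Int): List[Level] = {
    require(baseZoom >= endZoom && endZoom >= 0, "expected baseZoom >= endZoom >= 0")
    (baseZoom to endZoom by -1).toList.map { z =>
      val tiles = 1L << z // 2^z tiles along each axis
      Level(z, tiles, WorldWidth / (tileSize.toDouble * tiles)) // meters per cell
    }
  }
}
```

For example, PyramidLevels.levels(3, 0, 256) yields zoom levels 3, 2, 1, and 0 with 8, 4, 2, and 1 tiles per axis; at zoom level 0, one 256 x 256 tile covers the world at about 156543 m per cell.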
Layout scheme

DLA Ganos supports two layout schemes:
  • ZoomedLayoutScheme
    This scheme is used to build a Tile Map Service (TMS) pyramid.
    Notice: If ZoomedLayoutScheme is used, the world extent must be derived from the CRS to build TMS pyramids. As a result, the input raster may be resampled to match the resolution of a TMS zoom level.
  • FloatingLayoutScheme
    This scheme preserves the original resolution of the input raster.
    Notice: If FloatingLayoutScheme is used, the original resolution and extent of the input raster are detected, and the raster is partitioned into tiles of the specified size without resampling.
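The difference between the two schemes can be sketched in a few lines. For ZoomedLayoutScheme, the sketch computes the EPSG:3857 cell size of each TMS zoom level and picks the coarsest level whose cell size is not larger than the source cell size, which is why the input may be resampled. For FloatingLayoutScheme, it only counts how many tiles of the requested size are needed to cover the raster at its original resolution. The zoom-selection rule is an assumption (it ignores resolutionThreshold) and is not necessarily the rule DLA Ganos applies; LayoutSchemes and its functions are hypothetical names.

```scala
// Illustration only: how the two layout schemes differ.
object LayoutSchemes {
  // Width of the EPSG:3857 world, in meters.
  val WorldWidth: Double = 2 * math.Pi * 6378137.0

  // ZoomedLayoutScheme: cell size of TMS zoom level z for a given tile size.
  def zoomCellSize(zoom: Int, tileSize: Int): Double =
    WorldWidth / (tileSize.toDouble * (1L << zoom))

  // Assumed rule: the coarsest zoom whose cell size is not larger than the
  // source cell size (in meters). The raster is then resampled to that level.
  // resolutionThreshold is not modeled here.
  def pickZoom(sourceCellSize: Double, tileSize: Int, maxZoom: Int = 30): Int =
    (0 to maxZoom).find(z => zoomCellSize(z, tileSize) <= sourceCellSize).getOrElse(maxZoom)

  // FloatingLayoutScheme: keep the original resolution and only count the
  // tiles of tileCols x tileRows cells needed to cover the raster.
  def floatingGrid(rasterCols: Int, rasterRows: Int, tileCols: Int, tileRows: Int): (Int, Int) =
    ((rasterCols + tileCols - 1) / tileCols, (rasterRows + tileRows - 1) / tileRows)
}
```

Under this rule, a raster with a 30 m cell size in EPSG:3857 lands on zoom level 13 (about 19.1 m per cell), whereas FloatingLayoutScheme with 256 x 256 tiles cuts a 1000 x 600 raster into a 4 x 3 grid of tiles without resampling.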