This section is adopted from module 1, section 1 of Open Nighttime Lights. Please refer to it for more details and a video introduction of remote sensing.
Remote sensing is the science of identifying, observing, collecting, and measuring objects without coming into direct contact with them. This can be accomplished through many devices that carry sensors and capture the characteristics of Earth remotely. Sensors on board satellites also record the electromagnetic energy that is reflected or emitted from objects on Earth. There are two types of sensors:
In this exercise, we will primarily work with the Vegetation Continuous Fields (VCF) data provided by the Land Processes Distributed Active Archive Center (LP DAAC), a component of NASA’s Earth Observing System Data and Information System (EOSDIS). The MOD44B Version 6 VCF is a yearly representation of surface vegetation from 2000 to 2020 at 250 m resolution. Each pixel stores a percentage of three ground cover components: percent tree cover, percent non-tree cover, and percent bare.
The ground cover percentages are estimates from a machine learning model based on the combination of the Moderate Resolution Imaging Spectroradiometer (MODIS) data and other high resolution data from NASA and Google Earth. The machine learning model incorporates the visible bandwidth as well as other bandwidth such as brightness temperature (from MODIS bands 20, 31, 32).
The VCF data utilize thermal signatures and other correlates to distinguish forest and non-forest plantation, which is an improvement compared to the Normalized Differenced Vegetation Index (NDVI). For this use case, VCF also improves on the Global Forest Cover (GFC) data set, another source used to study deforestation, which only provides binary data points. GFC records baseline forest cover in the year 2000 and includes a binary indicator for the year of deforestation for each 30m × 30m pixel. If over 90% of the forest cover was lost in a pixel by a given year, then the pixel is marked as deforested, while a pixel is marked as reforested if the forest cover went from 0 in 2000 to a positive value. The VCF records continuous changes in percent of ground cover components, which provides more details than the GFC data.
In geospatial data analysis, data can be classified into two categories: raster and vector data. A graphic comparison between raster and vector data can be found in this page.
In this tutorial, we will use both vector data and raster data. Geospatial data in vector format are often stored in a shapefile format. Because the structure of points, lines, and polygons are different, each individual shapefile can only contain one vector type (all points, all lines, or all polygons). You will not find a mixture of point, line, and polygon objects in a single shapefile. Raster data, on the other hand, is stored in Tagged Image File Format (TIFF or TIF). A GeoTIFF is a TIFF file that follows a specific standard for structuring meta-data. The meta-data stored in a TIFF is called a tif tag and GeoTIFFs often contain tags including spatial extent, coordinate reference system, resolution, and number of layers. We will see examples in section 2.2.
More information and examples can be found in sections 3 & 4 of the Earth Analytics Course.
To get started with R, we provide instructions on how to download and install R on your computer. R is an open source software, which means users like you can also inspect, modify, and improve its source code.
RStudio is an integrated development environment for R. R provides the engine for running code, while RStudio is a user friendly control panel to perform various tasks. RStudio facilitates R code writing and debugging and provides tools for workspace management. RStudio can be downloaded from the RStudio IDE page.
There are numerous posts, tutorials, and courses on the internet. Once you have installed R and RStudio, pick any of the following resources to get familiar with R:
In order to perform data manipulation, we need to attach packages. The first step is downloading R packages from CRAN. For this exercise, we are going to use the package luna to download data from MODIS and the packages terra, tidyverse, raster, and sf for data manipulation. To download these packages, in the R or RStudio console, type the following code (you will need the remotes package in order to download luna):
install.packages(c("terra", "remotes", "tidyverse", "raster", "sf")) remotes::install_github("rspatial/luna")
We follow this tutorial to get MODIS data with luna. For details of the terra package, please refer to the package manuscript and this tutorial. If you are not familiar with the tidyverse workflow, please refer to the R for Data Science book we suggested in the previous section.
Note: If you receive the message ‘These packages have more recent versions available. It is recommended to update all of them. Which would you like to update?’, it is best to update all packages. If some of these updates fail (potentially due to a permission issue: ‘cannot remove prior installation of package
We now attach these packages.
library(terra) library(luna) library(tidyverse) library(raster) library(sf)
Troubleshooting Note: If you receive the message “package or namespace load failed”, it may be because of incompatible versions for dependent packages. If this is the case, you will see a message like: “namespace ‘dplyr’ 1.0.2 is already loaded, but >= 1.0.4 is required”. To proceed, you will need to manually update the package identified, which can be done using
install.packages() (for example:
install.packages("dplyr")). Once the update completes, try to re-attach the package which failed (for example:
library(tidyverse)). Continue repeating this process if you see further errors until you are able to successfully attach all packages.
Once the required packages have been attached, we can access VCF in R. We prefer using R for its ability to download large numbers of files and enable regular, automated updates. For other tools to access MODIS data, see this NASA website.
We can first use luna to check the list of data products available from MODIS. Since luna can also access data from the LANDSAT and SENTINEL platforms, we add
"^MOD|^MYD|^MCD" to narrow our scope to MODIS data. The printed results below list six products from MODIS.
MODIS.product = getProducts("^MOD|^MYD|^MCD") head(MODIS.product)
## provider concept_id short_name version ## 806 LPDAAC_ECS C1000000120-LPDAAC_ECS MOD44B 051 ## 1433 LPDAAC_ECS C1000000400-LPDAAC_ECS MCD43D10 006 ## 1447 LPDAAC_ECS C1000000401-LPDAAC_ECS MCD43D33 006 ## 1452 LPDAAC_ECS C1000000402-LPDAAC_ECS MCD43D45 006 ## 1454 LPDAAC_ECS C1000000403-LPDAAC_ECS MCD43D26 006 ## 1456 LPDAAC_ECS C1000000404-LPDAAC_ECS MCD43D49 006
The product name for VCF is MOD44B. We can use the function
productInfo to launch the information page of VCF.
We can query MODIS and only download a subset of the data. We need to specify the start and end dates and our area of interest (AOI). The date format is “yyyy-mm-dd”. Suppose here we want to subset data from 2010 to 2012.
start.date = "2010-01-01" end.date = "2012-12-31"
In order to subset your area of interest, you need to provide a “map” to
getModis(). This can be obtained from online databases such as the global administrative area database (GADM). You can download map data directly from GADM, as in this post, or you can use R to obtain GADM map data. We will use R below, which requires first installing the package
Geographic levels in GADM are defined as:
For our example, we are interested in India at the district level. We can download the map of India and its level 2 administrative areas with the following code:
india = geodata::gadm("India", level=2, path="./data")
The boundary data is downloaded to the path that you specified in the
path argument. The downloaded data through
gadm() will be in the PackedSpatVector class. If you want to convert it to another class (for example, the sf class, which is easier to work with in R), you can first read it using
readRDS(), then convert to a SpatVector via
vect() from the terra package, and finally convert it to a sf object.
india = readRDS("./data/gadm36_IND_2_pk.rds") %>% vect() %>% st_as_sf(india)
The map we downloaded is at the district level (level 2). Assume our AOI is the state of Odisha. Each row of the data represents a county in Odisha, and the geospatial information for each county is stored in the last column:
geometry. We can filter to obtain the boundaries for our AOI, which will return
aoi in vector format, stored as a data frame in R.
aoi = india %>% filter(NAME_1 == "Odisha") head(aoi)
## Simple feature collection with 6 features and 13 fields ## Geometry type: GEOMETRY ## Dimension: XY ## Bounding box: xmin: 82.63184 ymin: 20.14274 xmax: 87.48266 ymax: 21.97341 ## CRS: BOUNDCRS[ ## SOURCECRS[ ## GEOGCRS["unknown", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]], ## ID["EPSG",6326]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8901]], ## CS[ellipsoidal,2], ## AXIS["longitude",east, ## ORDER, ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]], ## AXIS["latitude",north, ## ORDER, ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]]]], ## TARGETCRS[ ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER, ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER, ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4326]]], ## ABRIDGEDTRANSFORMATION["Transformation from unknown to WGS84", ## METHOD["Geocentric translations (geog2D domain)", ## ID["EPSG",9603]], ## PARAMETER["X-axis translation",0, ## ID["EPSG",8605]], ## PARAMETER["Y-axis translation",0, ## ID["EPSG",8606]], ## PARAMETER["Z-axis translation",0, ## ID["EPSG",8607]]]] ## GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 NAME_2 ## 1 IND India IND.26_1 Odisha NA IND.26.1_1 Anugul ## 2 IND India IND.26_1 Odisha NA IND.26.2_1 Balangir ## 3 IND India IND.26_1 Odisha NA IND.26.3_1 Baleshwar ## 4 IND India IND.26_1 Odisha NA IND.26.4_1 Bargarh ## 5 IND India IND.26_1 Odisha NA IND.26.5_1 Bauda ## 6 IND India IND.26_1 Odisha NA IND.26.6_1 Bhadrak ## VARNAME_2 NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2 HASC_2 ## 1 NA NA District District NA IN.OR.AN ## 2 Balangir NA District District NA IN.OR.BL ## 3 Balasore, Baleswar NA District District NA IN.OR.BW ## 4 NA NA District District NA IN.OR.BR ## 5 NA NA District District NA IN.OR.BD ## 6 NA NA District District NA IN.OR.BH ## geometry ## 1 POLYGON ((84.74501 20.60569... ## 2 POLYGON ((82.81747 20.16052... ## 3 MULTIPOLYGON (((87.12389 21... ## 4 POLYGON ((83.29642 20.84884... ## 5 POLYGON ((84.78773 20.57517... ## 6 MULTIPOLYGON (((87.01651 20...
ggplot(data = aoi) + geom_sf()
Now that we have our AOI as well as time frame, we can filter the MODIS VCF data on these values and see what is available.
vcf.files = getModis("MOD44B", start.date, end.date, aoi, download = F) head(vcf.files)
##  "MOD44B.A2009065.h26v07.006.2017081034835.hdf" ##  "MOD44B.A2009065.h25v07.006.2017081034600.hdf" ##  "MOD44B.A2009065.h26v06.006.2017081034818.hdf" ##  "MOD44B.A2009065.h25v06.006.2017081034537.hdf" ##  "MOD44B.A2010065.h26v07.006.2017081045754.hdf" ##  "MOD44B.A2010065.h25v07.006.2017081045519.hdf"
The products we are going to download are tiled products. For details of tiled products, the tilling system, and the naming convention, please refer to the MODIS overview page here. In essence, we will be downloading grids of maps that cover our AOI.
To actually download these files from the NASA server, you will need a username and password. Please register on NASA Earth Data if you haven’t done so.
The following code will download the files. Replace the path value with the location on your computer where you would like to store these files. Replace the username and password values with your NASA Earth Data credentials from above. (Example:
getModis("MOD44B", start.date, end.date, aoi, download = T, path = "./data/VCFexample", username = "myusername", password = "mypwd")).
getModis("MOD44B", start.date, end.date, aoi, download = T, path = YourPathHere, username = YourNASAUserName, password = YourPassWord)
The data format from MODIS is HDF and may include sub-datasets. We can use
terra to read these files and create raster files. For example,
hdf.example = rast("./data/VCFexample/MOD44B.A2009065.h25v06.006.2017081034537.hdf") hdf.example
## class : SpatRaster ## dimensions : 4800, 4800, 7 (nrow, ncol, nlyr) ## resolution : 231.6564, 231.6564 (x, y) ## extent : 7783654, 8895604, 2223901, 3335852 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs ## sources : MOD44B.A2009065.h25v06.006.2017081034537.hdf:MOD44B_250m_GRID:Percent_Tree_Cover ## MOD44B.A2009065.h25v06.006.2017081034537.hdf:MOD44B_250m_GRID:Percent_NonTree_Vegetation ## MOD44B.A2009065.h25v06.006.2017081034537.hdf:MOD44B_250m_GRID:Percent_NonVegetated ## ... and 4 more source(s) ## names : MOD44~Cover, MOD44~ation, MOD44~tated, MOD44~ality, MOD44~er_SD, MOD44~ed_SD, ...
We can find basic information such as the coordinate reference system, number of cells, and resolution from the above output. There are 7 layers in each of the VCF tiled files. We are interested in the percent tree coverage layer.
##  "MOD44B_250m_GRID:Percent_Tree_Cover" ##  "MOD44B_250m_GRID:Percent_NonTree_Vegetation" ##  "MOD44B_250m_GRID:Percent_NonVegetated" ##  "MOD44B_250m_GRID:Quality" ##  "MOD44B_250m_GRID:Percent_Tree_Cover_SD" ##  "MOD44B_250m_GRID:Percent_NonVegetated_SD" ##  "MOD44B_250m_GRID:Cloud"
A quick plot of the data can be done with the
plotRBG() function. We will dive into data processing and basic operations in the next section.
plotRGB(hdf.example, stretch = "lin")