
Author: Aayush Malik

Data Science Associate

International Initiative for Impact Evaluation

New Delhi | London | Washington DC

Welcome

We welcome you to the 3ie tutorial on applications of remote sensing for impact evaluation. Our hope is that, equipped with both the theoretical considerations and the practical hands-on code, you will be in a better position to decide which application of remote sensing suits your work and how to implement it. Moreover, we believe that armed with this geospatial literacy in earth observation, you can generate evidence faster, more cost-effectively, and over larger areas. With extensive on-the-ground surveys limited by the COVID-19 pandemic and socio-political constraints, remote sensing can help you scale your work effectively.

This tutorial uses Python and Jupyter Notebooks. Python is a general purpose programming language that is used extensively in the fields of machine learning and data analytics. Jupyter Notebooks are executable notebooks that allow us to document our code using Markdown and run it in the same place. To reduce the access barrier, we are sharing this tutorial as a Google Colab Notebook, so that users can run the code and follow the tutorial without installing any software if they wish.

We provide general guidance on using this software. However, we believe that prior experience with basic programming will be helpful.

Tutorial Structure

The tutorial is divided into four modules. We recommend Module 1 for audiences who are not familiar with the concepts of earth observation using remote sensing. This may include commissioners of evaluations, as well as leadership staff who want to be informed consumers of earth observation services. The module introduces the core concepts, opportunities, and challenges associated with remote sensing. Modules 2, 3, and 4 are geared towards an audience with a fundamental familiarity with Python who want to use it for earth observation. The only prerequisite for Modules 2, 3, and 4 is familiarity with the basics of coding, in particular Python, although we believe that those who have worked with languages such as R or JavaScript will also find the tutorial easy to follow.

Hands-on exercises are provided throughout the tutorial to provide readers with ample opportunities for practicing what they are learning. Additional Resources are provided at the end of the tutorial for those who would like to delve deeper into the field of remote sensing and its use for causal inference.

Please note that this tutorial is geared towards an audience primarily interested in measuring agriculture and water outcomes using remote sensing. Further applications, such as population density estimation, urban sprawl measurement, object detection, land cover estimation, district- or block-level economic activity estimation, measurement of pollution indicators, species and biodiversity classification, city-wide solar power generation estimation, and post-flood waterlogging estimation, are outside the scope of the current tutorial, but could be developed if useful.

Tutorial Use-Cases

After completing this tutorial successfully, a learner will be able to do the following:

For instance, one may be interested in measuring the impact of an intervention promoting sustainable agriculture or checking deforestation. In such cases, counting the number of trees may not always yield measurable and accurate outcomes; however, measuring the greenness of vegetation around the intervention unit may give us valuable information. Such information can easily be quantified alongside other covariates of interest in an econometric model to investigate the causal effects of that intervention.

1. Introduction to Remote Sensing

1.1 What is Remote Sensing?

The term remote sensing is made up of two words: remote and sensing. Remote means being far away from the unit of observation (a field, building, or person), and sensing means using digital sensors to measure data across time and space. Thus, remote sensing is the science of collecting measurements on units without coming into contact with them. Satellite imagery is one form of remotely sensed data, and it is the form this tutorial focuses on.

mod1-remote_sensing.png

Source: NASA’s Goddard Space Flight Center. Remotely Sensing Our Planet. 2017. URL: https://svs.gsfc.nasa.gov/30892

1.1.1 Active and Passive Sensors

The sensors on board satellites can be classified into two main categories: passive and active.

  1. Passive Sensors: Passive sensors record the energy that is naturally reflected from the Earth's surface. For example, sunlight reflects differently off a desert region than off a water body. Passive sensors capture this differentiated energy reflection.

  2. Active Sensors: Active sensors provide their own energy source. For example, RADAR instruments on board a satellite send out electromagnetic waves and then capture the reflection.

Active-Vs-Passive-Remote-Sensing.png

SOURCE: Unmanned Aerial Vehicles (UAVs): A Survey on Civil Applications and Key Research Challenges, Sawalmeh, Ahmad, April 2018

1.1.2 Electromagnetic Spectrum

The electromagnetic (EM) spectrum is the range of all types of EM radiation that exist around us. As humans, our eyes are sensitive only to the visible portion of the EM spectrum, which we call the visible spectrum. However, other parts of the EM spectrum are of special use to remote sensing, especially infrared (IR) waves, alongside visible light.

Although most earth observation satellites rely on optical (visible and infrared) imagery, some satellites use RADAR. The major advantage of RADAR-based systems is their ability to generate an image through cloud cover, which is particularly helpful for regions with distinct wet and dry seasons or areas that receive rainfall year-round, such as tropical rainforests. For this tutorial, we will focus on optical imagery only.

ems.jpg

SOURCE: Imperial Encyclopedia Traveller, Marc Miller, July 2019

1.1.3 Fundamentals of Remote Sensing

The fundamental principle behind remote sensing is the differentiated reflection of EM waves by different objects. The sun's energy, which spans the full EM spectrum, is reflected differently by different objects on the planet. For example, healthy, lush green plants reflect the sun's energy differently from plants that are dull and under stress. The sensors on board satellites capture these differentiated reflections and use them to construct the satellite imagery we see. These data help us identify changes happening on the planet remotely.

reflection.png

SOURCE: Linking Student Performance in Massachusetts Elementary Schools with the "Greenness'' of School Surroundings Using Remote Sensing, PubMed, DOI

1.2 Why Remote Sensing

Evidence-based decision-making is becoming more and more common in policy, but the cost of generating evidence continues to increase. Added to this, some of the regions where development interventions are most needed may not always be accessible due to natural or social causes. Keeping these issues in mind, remote sensing promises to be a cost-effective tool for measuring the impact of development interventions in LMICs. Remote sensing is not a replacement for traditional surveys; rather, it is a complement that makes traditional data collection better, and it is especially useful when no other means of surveying is possible.

1.3 Benefits of Remote Sensing in Impact Evaluation

There are multiple benefits and applications of using remote sensing data for impact evaluation purposes.

  1. In contrast to ground-based data, remote sensing data allow us to capture larger areas at a relatively lower cost. For example, remote sensing was used for earthquake damage assessment in Italy, as highlighted here.

  2. Remote sensing allows us to ask more relevant questions where a geographic element plays an important role. For example, it may allow us to measure spillover effects, as shown in this study, which measured spatial spillover effects of a reclaimed mining subsided lake in China.

  3. Sampling bias is lower because data can be collected for harder-to-reach areas affected by geographic or political constraints. Thus, we can have a more credible identification strategy. For example, this study performs a drought evaluation in the northeast region of Thailand.

  4. Remote sensing data can be used to calculate various indices, such as vegetation, water, or pollution indices, to capture the effects of a treatment in an area remotely. For example, it has been used for detection and monitoring of marine pollution.

  5. Deep learning (making computers learn from huge amounts of data) and computer vision (making computers see and interpret images as humans do) methods allow us to target programs efficiently, identifying where to put our resources and who would benefit most from an intervention. For example, deep learning and remote sensing were used to locate the poor in Nigeria.

Note that deep learning may not always be needed for an analysis that uses satellite imagery. For example, making a land-use map from satellite imagery may not require deep learning computations at all. For motivated learners who are interested in knowing what is possible, we recommend this resource.

It is also worth mentioning that advanced approaches, such as acquiring high-resolution commercial imagery and running deep learning-based computer vision algorithms, have cost and human resource implications, and those resources may not always be readily available to organizations based in LMICs.

1.4 Challenges of Remote Sensing

  1. Ground Truthing: We may not always be sure that what remote sensing shows us is actually true, because it is "remotely sensed". For example, if benefits from a development programme are directed to households with straw roofs, then we need to ensure that the houses below the straw roofs are actually mud houses and have not been covered with straw roofing just to obtain the benefits. However, if we need to measure impact remotely and there is no chance of ground truthing, it is still better to apply remote sensing and measure at least what is possible.

  2. Cost Implications for High-Resolution Data: Because much of the freely available satellite data lacks the granularity needed to draw inferences on, say, smaller plots in LMICs, or to perform object detection, remote sensing can have cost and administrative implications. It is worth noting that geospatial analysis is a broad endeavour that encompasses many different activities. The mapping and the conversion of imagery to numbers within particular user-defined boundaries may be feasible within a few weeks, but extra time may be required for other, more involved work, e.g. spatial statistics.

  3. Crop Type and Plot Boundaries (Lack of Vector Datasets): For a plot- or small-area-level analysis, one needs a clear idea of what the plot boundaries are and then has to turn them into vector formats. Deep learning methods allow us to do so with reasonable accuracy, and this is expected to improve over time.

Note that at this point it is important to differentiate between raster and vector datasets. The satellite images you get from Google Earth Engine (GEE) are multi-spectral raster datasets: the more you zoom in, the more pixelated an image becomes. You may wish to restrict your retrieval of those images to your area of interest only. This is where vector datasets come into the picture. Geographical vector data come in various file formats, but the most commonly used is the Shapefile format developed by Esri. It is easy to find administrative shapefiles from the respective government body or private providers, but sometimes the administrative shapefile may not match the intervention area. For example, if your area of interest is a village, but the shapefiles you have are only at the district level, you may need to create your own shapefile, which may result in additional costs (person time and software). Vector boundaries also help you aggregate your data, for example, to answer a question such as: what is the average Normalized Difference Vegetation Index (NDVI) for the year 2020 in village X of Benin? To answer such questions, you require shapefiles for your intervention areas.

  4. Cloud Cover: It may happen that you cannot get a usable satellite image at all because of dense cloud cover in particular months. For example, it is almost impossible to retrieve optical satellite images during the monsoon months of July, August, and September in India. The same is true for other parts of the world with a distinct wet season (Kenya, for example). However, this disadvantage is addressed by RADAR-based satellites, whose RADAR waves penetrate clouds.

  5. Temporal Compatibility: Satellite sensors age over time and are replaced periodically. Thus, the same digital number does not necessarily mean the same level of light intensity across years and satellites. This is an issue especially with nighttime light data.

1.5 When NOT to use Remote Sensing

  1. Lack of Geographic Dependence: When our research question is not expected to be affected by geographical variation, remote sensing is not a wise choice. For example, the effect of training on staff performance in different offices across a country may not be affected by where the offices are located.

  2. Doing Remote Sensing ONLY: We advocate ground truthing before taking decisions based purely on remote analysis. Acting on remote analysis alone can be a socially, politically, and economically costly affair with far-reaching repercussions.

  3. Measuring Variables That Cannot Be Measured from Above: For all its merits, remote sensing may not help us accurately measure, say, the height of a building or the number of dwellings in an apartment block (although LiDAR remote sensing may help with that). Moreover, variables such as gender, social capital, and political participation cannot be measured by remote sensing.

1.6 Remote Sensing and Ethical Considerations

Remote sensing offers many possibilities for measuring development outcomes at a relatively lower cost and a faster speed. However, ethical considerations must be kept in mind so that no damage is done to the privacy and security of the people being observed. This section has been adapted from Ethical Considerations When Using Geospatial Technologies For Evidence Generation, and we recommend considering the following before applying remote sensing for geospatial analytics.

  1. Be clear, and write out specific examples of how remotely sensed data will be used for the analysis and where remote procedures will bring a benefit.
  2. Acquire consent where relevant and feasible. For example, if you are monitoring people's private fields and houses, it is better to ask them for their consent where possible.
  3. Maintain strict privacy of personally identifiable information; where that is not possible, all data must be de-identified, either by geomasking or by blurring.
  4. If capturing imagery using custom UAVs, inform the relevant government authorities in advance and discuss all necessary communication protocols beforehand.
  5. Consider the possibility of discrimination against people belonging to a particular region when doing predictive analysis.

1.7 Summary

Remotely sensed data can be used to augment the traditional methods of impact evaluation to generate evidence faster, at a relatively lower cost, and for a larger area. Publicly available data, along with tools such as Google Earth Engine, have lowered the barriers to using remote sensing data for impact evaluation. The subsequent chapters will allow you to download and use spatial data for your work using Google Earth Engine and Python.

1.8 Resolution in Satellite Imagery (Optional)

Satellite images can have different resolutions depending on which satellite they come from. Keeping in mind the intended use-case, we may need a particular image depending on its resolution and the purpose of the work. Below are the three types of resolution by which we can classify satellite images.

1.8.1 Spatial Resolution

Spatial resolution refers to the size of one pixel on the ground. In simple words, only objects further apart than the spatial resolution of a satellite can be discerned by that satellite. The spatial resolution of Sentinel-2 is 10 metres, which means the satellite's sensor cannot differentiate between objects less than 10 m apart. For some commercial remote sensing products from Maxar, the spatial resolution is 15 cm.

HD_15cm_Airport.jpg

SOURCE: Maxar Blog

It is important to keep in mind that higher-resolution images may be required for analyses that need granularity; for example, cattle monitoring via remote sensing requires very high-resolution imagery. In such cases, it is worthwhile investing in higher-resolution satellite imagery from commercial providers.

You can also check out this blog that shows how different resolution images look.

1_1efTZg67DtuhY46JOWSQfQ.jpeg

SOURCE: Medium Blog

1.8.2 Temporal Resolution

Temporal resolution, or revisit time, is the time it takes for a satellite to image the same area again. The temporal resolution of Sentinel-2 is 5 days at the equator, and for Landsat 8 it is 16 days at the equator. The revisit time is shorter at higher latitudes, which means that areas closer to the poles are imaged more often.

1.8.3 Spectral Resolution

Spectral resolution refers to the number of bands (ranges of frequencies in the electromagnetic spectrum used for satellite imagery) in a satellite image. Because satellite images contain images at multiple wavelengths, we are able to compute relevant indices for specific purposes. We will cover multispectral images in more detail when we learn to compute indices using Google Earth Engine. The spectral resolution of Landsat 8 is 9 bands, while that of Sentinel-2 is 13 bands.

mod1-res_tradeoffs.png

Source: Timothy A Warner, M Duane Nellis, and Giles M Foody. Remote sensing scale and data selection issues. The SAGE handbook of remote sensing, 2:568, 2009

There is an inherent tradeoff between spatial, spectral, and temporal resolution. Typically, the higher the spatial resolution, the lower the spectral and temporal resolutions; and the higher the temporal resolution, the lower the spatial and spectral resolutions.

2. Introduction to Technology Stack

There are two fundamental technologies we will be using for extracting, processing, analyzing, and exporting satellite spatial data (aka satellite images).

  1. Python - A general purpose programming language used extensively for data analysis and machine learning. The core of Python is small, and thousands of libraries can be used to augment the core functionality, making it a first choice for data scientists.
  2. Google Earth Engine - According to Google, "Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface."

The reader is assumed to have a basic familiarity with Python. In case the reader wants to have a refresher training before starting, one can refer to this link for a quick reference.

We will be using Jupyter Notebooks for working with satellite data and analysis. To install Jupyter Notebook, one can install the Anaconda Distribution; for more information, please refer to this link. Alternatively, one may use Google Colab to run the same notebooks in the cloud and share them with others on the team.

Jupyter Notebooks run by connecting to virtual machines (VM) that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage. This is necessary for Colab to be able to offer computational resources for free.

2.1 Introduction to Google Earth Engine

Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis that allows users to process a variety of geographical data at scale and handle large geographical datasets.

The main benefit of using Google Earth Engine is the saving in computational resources needed for large-scale analysis. In addition, it is free for non-commercial use, i.e. free for educational and research purposes.

GEE contains petabytes of ready-to-use satellite data. The platform can be accessed via two methods.

  1. The first is the web version, which uses JavaScript. To use it, go to code.earthengine.google.com.
  2. The second is the Python API, which lets us access the platform from a Jupyter Notebook.

We will use the second method for our work.

It is important to register for Google Earth Engine before using the platform. To register, please refer to this link.

Users can also explore features of Google Earth Engine, including browsing case studies and data catalog available.

2.1.1 Installation of Packages

Before proceeding, users are requested to install the following packages

  1. Earth Engine API using command pip install earthengine-api
  2. geemap using command pip install geemap

As a first step, we will show how to download and visualize a map from Google Earth Engine. Before using GEE, you need to authenticate your account; to do so, follow these steps.
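
A minimal sketch of that authentication flow (assuming the earthengine-api package from the installation step above; the helper name is ours):

```python
def init_earth_engine():
    """Authenticate (first run on a machine) and start an Earth Engine session."""
    import ee  # imported inside so the sketch can be read without the package installed
    try:
        # Succeeds if credentials are already stored on this machine
        ee.Initialize()
    except Exception:
        # Opens a browser window to sign in with your registered Google account
        ee.Authenticate()
        ee.Initialize()
```

Call init_earth_engine() at the start of every session; ee.Authenticate() itself only needs to succeed once per machine.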

2.2 Rendering your first Map

In this part of the module, we will learn how to import the necessary packages, authenticate, initialize, and render your first map using Google Earth Engine. We will visualize Addis Ababa, Ethiopia. Before you begin, please ensure you have registered for Google Earth Engine; that will allow you to authenticate your session and use GEE. Run the code below each time you start a session of this notebook.
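
A minimal sketch of such a cell (the Addis Ababa coordinates and zoom level are illustrative; geemap, an installed earthengine-api, and a registered account are assumed):

```python
def render_addis_map(zoom=10):
    """Return an interactive geemap map centred on Addis Ababa, Ethiopia."""
    import ee
    import geemap
    ee.Initialize()  # assumes you have already authenticated once
    # geemap expects the centre as [latitude, longitude]
    return geemap.Map(center=[8.98, 38.76], zoom=zoom)

# In a notebook, the returned Map object renders itself when it is the
# last expression in a cell:
# render_addis_map()
```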

Tip: To find the zoom level and coordinates, look at the browser address bar and copy the link. For example, in https://www.google.com/maps/place/Nagpur,+Maharashtra/@21.1612315,79.0024702,12z/ the coordinates appear after the '@' sign and the zoom level before the 'z'. Zoom level is not absolute; depending on screen size, different users may need different zoom levels.

Congratulations on your first map generation using Google Earth Engine and Python. You can explore the panel on the left hand side. In the coming tutorials, we will learn how to select a particular area of interest using custom shape files.

There are a couple of reasons why you may not get the desired output. The following checklist will help you troubleshoot the probable problems.

  1. Ensure that Python, Google Earth Engine, and geemap are properly installed.
  2. If you are using JupyterLab, please ensure node.js is installed beforehand. This will allow you to install ipyleaflet easily.
  3. If you are using Google Colab, please install the packages as highlighted earlier.

2.3 Map Rendering Advanced

In this part of the tutorial, you will learn how to find and import satellite images, draw or select a particular area of interest, import custom shape files, select the time period for which you want the image, and select bands from the satellite images. This will be a long tutorial, so take a small break, grab a cup of coffee, and let's go!

2.3.1 Finding and Importing Satellite Images on Google Earth Engine Catalog

The best way to find a satellite image of interest is to go to the Google Earth Engine Catalog and search for the required dataset. A user can go to this link to search for satellite imagery. Enter the name of the satellite in the search bar and the relevant results will be shown. What we need for our purposes is the unique ID associated with each image collection, which can be found in the description after clicking the relevant entry. The visuals below explain the whole process step by step.

  1. Open Google Earth Engine Catalog

ee-catalog.png

  2. Search for the Sentinel imagery

sentinel-search.png

  3. Click on the Sentinel-2 MSI: MultiSpectral Instrument, Level-1C panel and you will come across the following screen. We need the highlighted ID (shown in red).

sentinel-id.png

Thus, for Sentinel-2 MSI, the unique ID for the image collection is COPERNICUS/S2, and the code to access the image collection will be ee.ImageCollection("COPERNICUS/S2").

The final code for importing the image collection would be

# example code not to be run
image_col = ee.ImageCollection('COPERNICUS/S2') #image collection

In most of the cases, however, we are interested in getting data for a particular region and for specified dates. We will learn how to do this in the next sections.

2.3.2 Selecting an Area of Interest

To reduce computational intensity and make the processing faster, it is recommended to select an area of interest. Selecting that area requires knowledge of the latitude and longitude of a place. For example, if we need to select the boundaries of India, we may look at its northern and southern latitudes and its eastern and western longitudes: approximately 68.191522 E to 97.395130 E and 8.077273 N to 37.083957 N. Google Maps can be helpful for finding approximate latitude and longitude values. Earth Engine expects rectangle coordinates in the order [west longitude, south latitude, east longitude, north latitude], so our area of interest will be [68.191522, 8.077273, 97.395130, 37.083957]. This step may take a couple of minutes depending upon the size of the area of interest.

# example code not to be run
# [West long, South lat, East long, North lat]
aoi = ee.Geometry.Rectangle([68.191522, 8.077273, 97.395130, 37.083957])

At this point it is worth mentioning that selecting a rectangular area of interest is not the only possibility. We can select an area of interest as per the administrative boundaries as well, as discussed below. However, rectangular regions are preferred if we want to see the change not just in the administrative boundaries of a place, but also in the border areas. For example, we may be interested in seeing the outcome of an intervention at the political or geographical border of a place, such as border between sea and land or a border between two states of a country.

We can also use the Global Administrative Unit Layers 2015, provided by the Food and Agriculture Organization, as a built-in region selector in Google Earth Engine. For more information, refer to this page. For the list of names and spellings used in the FAO GAUL dataset, please go to this link.

2.3.3 Clipping Map to Custom Shape Files

# example code not to be run
# Get a feature collection of administrative boundaries.
states = ee.FeatureCollection('FAO/GAUL/2015/level2').select('ADM1_NAME')

# Filter the feature collection to subset Bihar
bihar = states.filter(ee.Filter.eq('ADM1_NAME', 'Bihar'))

image = ee.ImageCollection('COPERNICUS/S2_SR').filterDate('2020-01-01', '2020-12-31').median().clip(bihar)
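
To check the clip visually, a hypothetical helper along these lines can add the result to a geemap map (the helper name and the Sentinel-2 true-colour visualization parameters are illustrative):

```python
def show_clipped(image, region, zoom=7):
    """Display a clipped Sentinel-2 composite on an interactive geemap map."""
    import geemap
    # Illustrative true-colour parameters for Sentinel-2 surface reflectance
    vis = {'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 3000}
    m = geemap.Map()
    m.centerObject(region, zoom)  # pan and zoom to the clipped boundary
    m.addLayer(image, vis, 'clipped composite')
    return m

# Continuing the example above:
# show_clipped(image, bihar)
```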

One can also import custom shape files using Google Earth Engine Assets. It may happen that the shapefile one gets from FAO GAUL does not represent the correct political shape of a country. In that case, one may take the following steps to use a custom shape file.

  1. Go to code.earthengine.google.com and click on Assets

assets.png

  2. Click on New and select shape files. Ensure that your shapefile upload contains all the necessary ancillary files, i.e. the files with the extensions .cpg, .dbf, .prj, .shp, and .shx.

shapefile.png

  3. Upload the files and give the asset a name. This may take a few minutes; you can check the progress in the Tasks panel on the right side of your screen.

select-file.png

  4. Doing that will bring your assets into Google Earth Engine, from where you can use those files with the following command.
# example code not to be run
india = ee.FeatureCollection("users/amalik/indian_states")

ai_image = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_AER_AI").filterDate('2020-01-01', '2020-12-31').select('absorbing_aerosol_index').median().clip(india)

2.3.4 Selecting Image for a particular Time Period

To select an image for a specific time period, we need to specify the start date and the end date for the image collection.

For selecting an image collection, we specify satellite image source and the dates as shown below.

# example code not to be run
#SENTINEL 2 LEVEL 2A https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR
image_collection = ee.ImageCollection('COPERNICUS/S2_SR').filterDate('2020-01-01', '2020-12-31')

However, if we are looking for a single representative image for this time period, we can take the per-pixel median of the collection. To accomplish this, we add median() at the end of the previous line, and the resulting code looks like this:

# example code not to be run
image = ee.ImageCollection('COPERNICUS/S2_SR').filterDate('2020-01-01', '2020-12-31').median()

At this point, it is worth mentioning that the median is not the only value a researcher may be interested in. Depending upon the research question, they may need to look at the maximum, the minimum, or the variation between maximum and minimum, which requires the corresponding change in the code.
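
As a sketch of that change (the collection ID and dates follow the example above; the helper name is ours, and an authenticated session is assumed):

```python
def seasonal_composites(start='2020-01-01', end='2020-12-31'):
    """Return per-pixel max, min, and max-min range composites for Sentinel-2 in the window."""
    import ee
    col = ee.ImageCollection('COPERNICUS/S2_SR').filterDate(start, end)
    image_max = col.max()                        # brightest value each pixel saw
    image_min = col.min()                        # darkest value each pixel saw
    image_range = image_max.subtract(image_min)  # per-pixel variation
    return image_max, image_min, image_range
```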

Please note that the code does not correct errors arising from cloudy images. To increase the probability of retrieving only images with low cloud coverage, we can add a filter that keeps only those images whose cloud percentage is below a certain threshold. The code for that would be:

# example code not to be run
image = ee.ImageCollection('COPERNICUS/S2_SR').filterDate('2020-01-01', '2020-12-31').filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', 15)).median()

2.3.5 Selecting Bands from Multispectral Satellite Image

A satellite image is composed of many bands, which we call multispectral satellite bands. For example, Sentinel-2 has 13 bands. The visualization below helps to show what a multispectral satellite image looks like.

download.jpg

Source: Research Gate

Thus each pixel contains 13 values, and to get values for a particular band, we need to select that band by name. One can look at the Google Earth Engine image description to find band names. For example, the VIIRS Nighttime Lights data expose a band named avg_rad.

viirs_dnb.png
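
Selecting avg_rad can be sketched as follows (the monthly VIIRS dataset ID is the one listed in the GEE catalog; the helper name is ours, and an authenticated session is assumed):

```python
def monthly_nightlights(start='2020-01-01', end='2020-12-31'):
    """Median night-time radiance from the 'avg_rad' band of monthly VIIRS data."""
    import ee
    return (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG')
              .filterDate(start, end)
              .select('avg_rad')  # keep only the average radiance band
              .median())
```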

3. Introduction to Vegetation and Water Indices

3.1 Vegetation Indices

A Vegetation Index (VI) is a spectral transformation of two or more bands designed to enhance the contribution of vegetation properties and allow reliable spatial and temporal inter-comparisons of terrestrial photosynthetic activity and canopy structural variations.

Vegetation indices are remote sensing approaches used to quantify vegetation cover, vigor or biomass for each pixel in an image (Ouyang et al. 2010).

There are many Vegetation Indices (VIs), with many being functionally equivalent. Many of the indices make use of the inverse relationship between red and near-infrared reflectance associated with healthy green vegetation.

Vegetation indices have been used, among other applications, to monitor crop condition, estimate biomass, and assess drought.

It is worth noting that every vegetation index is affected by atmospheric and sensor effects, and thus also exhibits high variability and low repeatability or comparability (Huang et al., 2021).

A short article discussing various vegetation indices, their applications, and formulae is available here.

Tip: Index Database is a helpful resource for finding indices and the relevant formula for multiple satellite images.

3.1.1 Normalized Difference Vegetation Index

The Normalized Difference Vegetation Index (NDVI) is a simple indicator frequently used to monitor large-scale phenomena such as drought and agricultural production, to identify fire zones, and to find areas most susceptible to desertification.

This index takes values from -1.0 to 1.0 and essentially represents green vegetation. Negative values mainly come from clouds, water, and snow, while values close to zero mainly come from rocks and bare soil. Very small values (0.1 or less) correspond to barren areas of rock, sand, or snow. Moderate values (0.2 to 0.3) represent shrubs and meadows, while large values (0.6 to 0.8) indicate temperate and tropical forests. However, when using NDVI for crop monitoring, it is helpful to take the type of crop into account, as different crops have different reflectance.

In simple terms, NDVI is a measure of plant health based on how the plant reflects light at certain frequencies, because chlorophyll absorbs and reflects specific wavelengths.

According to Earth Observing System, "Chlorophyll (a health indicator) strongly absorbs visible light, and the cellular structure of the leaves strongly reflect near-infrared light. When the plant becomes dehydrated, sick, afflicted with disease, etc., the spongy layer deteriorates, and the plant absorbs more of the near-infrared light, rather than reflecting it. Thus, observing how NIR changes compared to red light provides an accurate indication of the presence of chlorophyll, which correlates with plant health."

$$NDVI = {(NIR - RED)\over (NIR + RED)}$$
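The formula above can be sketched with NumPy. The reflectance values below are illustrative, not from a real scene, and the interpretation thresholds in the loop are approximations of the ranges quoted earlier (gaps between the quoted ranges are assigned to the nearest class):

```python
import numpy as np

# Illustrative red and NIR reflectance values for four pixels.
red = np.array([0.30, 0.08, 0.05, 0.02])
nir = np.array([0.32, 0.15, 0.30, 0.55])

ndvi = (nir - red) / (nir + red)

# Approximate interpretation bands from the text above (crop-dependent in practice).
for value in ndvi:
    if value < 0.1:
        label = 'bare rock / sand / snow'
    elif value < 0.4:
        label = 'shrubs and meadows'
    else:
        label = 'dense vegetation'
    print(f'{value:.2f}: {label}')
```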

Some of the limitations of NDVI include its sensitivity to soil and atmospheric effects and its dependence on the time at which an image was taken. It is therefore recommended to ascertain whether your research question is best answered by NDVI or by another relevant index.

This link shows an example of land degradation assessment done using NDVI.

Some applications of NDVI include precision farming and biomass measurement. In forestry, foresters use NDVI to quantify forest supply and leaf area index. Furthermore, NASA states that NDVI is a good indicator of drought.

The dark red areas are water bodies, whereas the reddish areas are urban areas with low vegetation.

Bonus question: Where do you see an airport in the image above?

3.1.2 Green NDVI

The Green Normalized Difference Vegetation Index (GNDVI) estimates photosynthetic activity and is commonly used to determine water and nitrogen uptake into the plant canopy. It is a modification of NDVI that uses the NIR and GREEN bands to calculate the index. The benefit of GNDVI over NDVI is that the green band can capture a broader range of chlorophyll concentrations than the red band. This applies to mature plants and can be useful for monitoring crop yield late in the growing season.

GNDVI has been used as an indicator to estimate the amount of nitrogen available in plants, which affects plant growth and health.

$$GNDVI = {(NIR - GREEN)\over (NIR + GREEN)}$$

3.1.3 Enhanced Vegetation Index

EVI is similar to the Normalized Difference Vegetation Index (NDVI) and can be used to quantify vegetation greenness. The Enhanced Vegetation Index was developed by Liu and Huete to simultaneously correct NDVI for atmospheric influences and soil background signals, especially in areas of dense canopy. EVI ranges from -1 to 1, and for healthy vegetation it varies between 0.2 and 0.8. EVI has been found to perform well under high aerosol loads and biomass-burning conditions (Huete et al., 2002). According to Huete et al. (1997), the original EVI is

$$EVI = 2.5*{(NIR - RED)\over (NIR + 6*RED - 7.5*BLUE + 1)}$$

For Sentinel-2 MSI, the B(lue) band corresponds to band 2, NIR to band 8, and R(ed) to band 4.
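As a sketch, the formula can be written as a small function and evaluated on illustrative reflectance values (the coefficients 2.5, 6, 7.5, and 1 are those in the equation above; the input values are made up):

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients from the formula above."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)

# Illustrative surface-reflectance values for a vegetated pixel
# (for Sentinel-2 MSI: blue = B2, red = B4, NIR = B8).
print(evi(nir=0.45, red=0.08, blue=0.04))  # ~0.57, within the healthy-vegetation range
```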

It is worth mentioning that EVI is best used to analyze areas with large amounts of chlorophyll (such as rainforests), preferably with minimal topographic effects (i.e., not mountainous regions). Among the advantages of EVI are its high resolution and good spatial coverage over all terrains. It has also been used to identify drought-related stress across different landscapes.

3.2 Water Indices

Similar to the vegetation indices (VIs) used to measure vegetation vigor, researchers have developed water indices. These have mainly been used to identify urban water bodies, detect changes in water depth, and measure changes in basin area after rare events. A few of them are discussed below.

3.2.1 Normalized Difference Water Index

The NDWI makes use of reflected near-infrared radiation and visible green light to enhance the presence of water features while suppressing the contribution of soil and terrestrial vegetation. It has been suggested that the NDWI may also provide researchers with turbidity estimates of water bodies from remotely sensed digital data.

The Normalized Difference Water Index (NDWI) is used to differentiate water from dry land and is well suited to water body mapping. Water bodies reflect little and absorb strongly in the visible-to-infrared wavelength range. NDWI exploits this behavior using the green and near-infrared bands of remote sensing images, and it can enhance water information efficiently in most cases. However, it is sensitive to built-up land and often results in overestimated water bodies.

It has been found that water features may not be accurately extracted using NDWI, owing to spectral confusion between built-up land and water bodies: built-up land may also have positive values in the NDWI-derived image (Xu 2007). Therefore, Xu proposed a new method in 2006, the Modified NDWI, which is discussed in the next section.

$$NDWI = {(GREEN - NIR)\over (GREEN + NIR)}$$

For Sentinel-2, GREEN is band B3 and NIR is band B8:

$$NDWI = {(B3 - B8)\over (B3 + B8)}$$


Please note that NDWI may be confused with GNDVI, as both use the GREEN and NIR bands. In NDWI, NIR is subtracted from GREEN, whereas in GNDVI, GREEN is subtracted from NIR; the two indices therefore have the same magnitude but opposite signs.
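A quick numerical check of this relationship, using illustrative reflectance values for a water pixel (high green reflectance, very low NIR):

```python
def ndwi(green, nir):
    # NDWI: NIR subtracted from GREEN
    return (green - nir) / (green + nir)

def gndvi(nir, green):
    # GNDVI: GREEN subtracted from NIR
    return (nir - green) / (nir + green)

# Illustrative reflectance for a water pixel.
g, n = 0.12, 0.03
print(ndwi(g, n), gndvi(n, g))  # equal magnitude, opposite sign
```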

3.2.2 Modified NDWI

To overcome the shortcomings of NDWI, Xu proposed the modified NDWI (MNDWI) in 2006. According to Xu (2006), the MNDWI can enhance open water features while efficiently suppressing and even removing built‐up land noise as well as vegetation and soil noise. The enhanced water information using the NDWI is often mixed with built‐up land noise and the area of extracted water is thus overestimated. Accordingly, the MNDWI is more suitable for enhancing and extracting water information for a water region with a background dominated by built‐up land areas because of its advantage in reducing and even removing built‐up land noise over the NDWI.
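For completeness, the MNDWI formula follows the NDWI notation above, with Xu (2006) replacing the NIR band with a shortwave-infrared (SWIR) band (for Sentinel-2, these correspond to bands B3 and B11):

$$MNDWI = {(GREEN - SWIR)\over (GREEN + SWIR)}$$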

Please note that calculating MNDWI from Sentinel-2 imagery will require pansharpening, because the SWIR band is at a coarser resolution than the green band. For more information, users are requested to visit the following notebook.

3.3 Exporting Vegetation Indices as Covariates

In this section, we will learn how to export the calculated values as a CSV file that can be incorporated as covariates in an econometric model. The file generated in this process contains three things:

  1. The parameters/features/columns contained in the original shapefile that is used for setting the boundaries.
  2. The value of the index generated, which can be min, max, average, or median depending upon the Reducer function.
  3. The shape of the polygon which makes up the administrative boundaries.

This piece of code returns a CSV file saved in the Google Drive of the account used to register for Google Earth Engine. For more information, please refer to the official guide from Google Earth Engine here.

You can export a FeatureCollection as CSV, SHP (shapefile), GeoJSON, KML, KMZ, or TFRecord using Export.table.
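Once the CSV has been exported to Google Drive, it can be merged into an analysis dataset like any other covariate. The sketch below uses pandas; the file contents, column names, and districts are hypothetical stand-ins for illustration, with a reduced index value per polygon and the geometry column that Earth Engine adds to table exports ('.geo'):

```python
import io

import pandas as pd

# Stand-in for the exported CSV: one row per polygon, with the reduced index
# value and the '.geo' geometry column added by Earth Engine table exports.
exported_csv = io.StringIO(
    'district,median_evi,.geo\n'
    'Nagpur,0.41,"{""type"":""Polygon""}"\n'
    'Pune,0.35,"{""type"":""Polygon""}"\n'
)
evi = pd.read_csv(exported_csv)

# Hypothetical outcome data keyed on the same administrative unit.
outcomes = pd.DataFrame({'district': ['Nagpur', 'Pune'],
                         'yield_t_ha': [2.1, 1.8]})

# Drop the geometry and merge the index in as a covariate.
model_df = outcomes.merge(evi.drop(columns=['.geo']), on='district')
print(model_df)
```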

3.4 Complete example comparing vegetation before and after rains in the state of Maharashtra in India

This is an example of how vegetation indices can be used to observe the change in vegetation after an event. We explore how the state of Maharashtra is transformed by the monsoon rains, which last approximately from June to October and are instrumental to the agriculture of the state.

4. Hands-on Demo

We shall be taking the example of this study, which was funded by 3ie and undertaken by Rohini Pande and Anant Sudarshan. The study evaluated the impact of a set of reforms to the environmental clearance process in India, which included subjecting larger projects to scrutiny from regulators, independent experts, and the public. Mines that applied for clearance after the reforms experienced substantially shorter clearance times but were more likely to deforest illegally before receiving clearance. Public hearings did not significantly alter the costs or benefits of the clearance process. We got the data from the Harvard Dataverse repository maintained by 3ie.

The authors use remote sensing to capture the average, median, minimum, and maximum EVI in an area one km around the location of each mine. To accomplish this, we perform the following operations.

  1. Locate the mines on the map of India and clip an area one km around those locations.
  2. Calculate the EVI for those regions.
  3. Export the EVI values to be used as one of the covariates in the model.

The illustrative example below shows the code needed to export median EVI values. The same can be done for the minimum, average, and maximum by changing median to the relevant function in the given code.

evi_image = (ee.ImageCollection('COPERNICUS/S2_SR')
             .filterDate('2020-01-01', '2020-12-31')
             .median()
             .clip(mine_area))

Additional Resources

  1. World Bank’s Open Night Lights tutorial, World Bank
  2. Geo4Dev Training Modules, Geo4Dev
  3. Geospatial Analysis Guide
  4. Automating Geospatial Processes, Helsinki University, 2019
  5. Earth Data Science, Earth Lab at University of Colorado
  6. Remotely Sensed Data for Effective Data Collection
  7. USGS Earth Explorer
  8. Google Earth Engine
  9. GEEMAP Tutorials
  10. A guidebook on mapping poverty through data and artificial intelligence, Asian Development Bank, April 2021
  11. Newcomers Earth Observation Guide, European Space Agency
  12. Index Database
  13. Causal Inference with Spatial Data, Osaka University
  14. Ethical Considerations When Using Geospatial Technologies for Evidence Generation, UNICEF, September 2018
  15. GeoEthics Blog
  16. OpenGeoHub

Statement of Contribution

This tutorial was developed by Aayush Malik on behalf of the International Initiative for Impact Evaluation (3ie), with funding from the Foreign, Commonwealth and Development Office (FCDO), Government of the United Kingdom, through the A2012 agreement with CEDIL.

Developing a large technical resource requires support from many people working as a team. This tutorial is the collective effort of multiple people at the Data Innovations Group at the International Initiative for Impact Evaluation.

Douglas Glandon, Lead Evaluation Specialist, helped steer the project from the beginning by providing background information and work that 3ie had done so far on using remote sensing for impact evaluation. Moreover, he conceptualized the tutorial and provided substantial feedback on the content and user-friendliness of the code. Furthermore, he co-developed the section on benefits, applications, and challenges of remote sensing.

Jane Hammaker, Research Associate, acted as the Project Manager for this technical piece and was instrumental in taking care of the operational side by constantly staying in touch with FCDO, marketing the event to the right audience, communicating with the Communications Office at 3ie, and sharing feedback on the flow and structure of the tutorial.

Sayak Khatua, Evaluation Specialist, provided feedback on the tutorial, contributed to past work done by 3ie on the applications of big data for impact evaluation, and helped with the case-study development by sharing the data and checking the outputs of the case-study.

All the above members tested the code themselves and shared feedback on the output, its explanation, the graphical representations, and instances where the code did not work. I am grateful to all of them for their contributions.

It is worth mentioning that this tutorial follows the general structure of the Open Nighttime Lights tutorial. We suggest following that tutorial as well for a better understanding of how nighttime satellite imagery can be used in various forms of evaluation.