# INTRODUCTION¶

Much of the introduction was taken directly or adapted from the World Bank's Open Nighttime Lights Tutorial, which also provides guidance on using nighttime light data from satellites. The following Radiance Calibrated Nighttime Lights Tutorial differs from the Open Nighttime Lights Tutorial in focusing on data from sensors which were calibrated to avoid saturation of urban centers. For more details on Remote Sensing and the Nighttime Lights data, including videos, see the Open Nighttime Lights Tutorial.

This tutorial draws inspiration for the analyses of Radiance Calibrated Nighttime Lights from "Subways and Urban Growth: Evidence from Earth" (2018), by Marco Gonzalez-Navarro and Matthew Turner. However, it does not directly follow the methodology used in this paper for all calculations. As such, certain values may differ from those obtained by Gonzalez-Navarro and Turner when compared at a granular level.

## Introduction to Remote Sensing¶

This section is adopted from module 1, section 1 of Open Nighttime Lights.

Remote sensing is the science of identifying, observing, collecting, and measuring objects without coming into direct contact with them. This can be accomplished through many devices that carry sensors and capture the characteristics of Earth remotely. Sensors on board satellites also record the electromagnetic energy that is reflected or emitted from objects on Earth. There are two types of sensors:

• Passive sensors record the natural energy that is (naturally) reflected or emitted from the Earth's surface (e.g. sunlight, moonlight, city lights).
• Active sensors provide their own energy source for illumination (e.g. RADAR, LIDAR).

## Introduction to Radiance Calibrated Nighttime Light Data¶

This section is adapted from module 1, section 2 of Open Nighttime Lights.

### DMSP-OLS Overview¶

Sensors aboard Defense Meteorological Satellite Program (DMSP) satellites have been providing low-light imaging of the Earth's surface since the 1960s, with a digital archive established in 1992 at NOAA's National Centers for Environmental Information (NCEI), formerly the National Geophysical Data Center (NGDC). The Operational Linescan System (OLS) onboard the DMSP satellites (DMSP-OLS) provided the primary data on nighttime lights until the introduction of the Joint Polar Satellite System (JPSS) program, which officially launched its first satellite in 2017. This JPSS program uses the Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS-DNB) sensor, the first of which was launched aboard the S-NPP satellite in 2011 to serve as a transition between DMSP and JPSS. While the DMSP-OLS product was discontinued in 2014, the data stored in NOAA's archive still serves as the primary source of nighttime light data from 1992-2014, and as such is a valuable tool for academics and policy makers.

### DMSP-OLS Usage and Limitations¶

While advanced for its time, the OLS is now old technology and was essentially unchanged since 1976 with the start of the Block 5D series of DMSP satellites. The OLS VIS, the low-light imaging band, has only 6-bit quantization (its digital values range from 0-63). The gain of the instrument, which is a multiplier on the signal (analogous to a volume setting), is adjusted on-the-fly onboard the satellite via an algorithm that takes into account the amount of expected moonlight and sunlight for the time and location. The dynamic gain adjustment of the OLS allows the VIS sensor to produce meaningful imagery under a wide range of conditions, from full sunlight during the day to a dark night with the moon below the horizon. The OLS gain states are not recorded in the global OLS data that is relayed back to the ground stations, making any downstream calibration difficult to achieve. The OLS VIS band has no on-board calibration, so the 64 (0-63) values are only relative values based on the gain in which the data were recorded. The lack of on-board calibration also means the the DMSP-OLS annual stable lights series derived from multiple OLS sensors and different years (1992–2013) are not comparable directly without inter-calibration.

For nighttime data, only imagery from nights with no moonlight present was included. Under no moonlight conditions, the dynamic gain algorithm sets the VIS gain to its maximum value. Using only these data gave some assurance that the gain was constant (albeit unknown) and that it was reasonable to temporally average these VIS band values. The high gain setting also meant that dimmer lights could be detected in the composites. The downside is that when the gain is set to its maximum value, the VIS band, with its 6-bit radiometric resolution, saturates over bright urban centers. This means all values have the maximum value of 63 leaving the variability within urban cores unresolvable (Figure 1.1).

Figure 1.1 A subset of the F18 2013 stable lights composite over Chicago and surrounding communities illustrates the effect that saturation in the DMSP-OLS VIS band has on the information content in urban centers. The upper left image is a grayscale image where the horizontal and vertical red lines are the locations for the profile data plotted in the two charts. The upper right image is the same subset, but with saturated values colored red. Notice that most of the Chicago metropolitan area is saturated so no further detail is available using these data. The two charts are plots of transects through the image, showing the average DN value in the composite along horizontal and vertical tracks through Chicago. While there is variation in average DN values across the smaller towns, the larger urban cores are saturated.

Saturation occurs when pixels in bright areas, such as in city centers, reach the highest possible digital number (DN) value (i.e., 63) and no further details can be recognized.

### Radiance Calibrated Nighttime Light Data¶

To account for the saturation of city centers in the DMSP-OLS nighttime light data, a limited set of observations were also collected with lower gain settings, making the sensor less sensitive to light. While this makes the sensor less able to identify dim lights, for the purposes of studying lighting within city boundaries it is a more useful source than the standard DMSP-OLS data. To provide this additional detail, the data collected from the sensors at lower gain levels were merged with the primary nighttime light product to recover the variation within bright areas. Radiance calibrated data are available from 1996 to 2010 at intervals of 1 to 4 years.

Given that the DMSP-OLS sensors had differing levels of sensitivity, the gain levels at which the sensors were set are not consistent across different satellites. When merging the radiance calibrated data, this is addressed by using the highest level of gain as a reference and weighting the lower-level gain data accordingly. This product is unitless due to the variable gain settings and a degredation in the sensors over time, and thus is recommended only for use which doesn't require numeric radiance values. Due to the differences in sensors and gain levels over time, it is necessary to perform inter-annual calibration to compare the radiance calibrated data from different years.

# INTRODUCTION TO GEOSPATIAL DATA AND TOOLS¶

## Data Structure¶

In geospatial data analysis, data can be classified into two categories: raster and vector data. A graphic comparison between raster and vector data can be found in the World Bank Nighttime Lights Tutorial module 2, section 1.

• Raster data: Data stored in a raster format is arranged in a regular grid of cells, without storing the coordinates of each point (namely, a cell, or a pixel). The coordinates of the corner points and the spacing of the grid can be used to calculate (rather than to store) the coordinates of each location in the grid. Any given pixel in the grid stores one or more values (in one or more bands).
• Vector data: Data in a vector format is stored such that the X and Y coordinates are stored for each point. Data can be represented, for example, as points, lines and polygons. A point has only one coordinate (X and Y), a line has two or more coordinates, and a polygon is essentially a line that closes on itself to enclose a region. Polygons are usually used to represent the area and perimeter of continuous geographic features. Vector data stores features in their original resolution, without aggregation.

In this tutorial, we will use vector and raster data. Geospatial data in vector format are often stored in a shapefile, a popular format for storing vector data developed by ESRI. The shapefile format is actually composed of multiple individual files which make up the entire data. At a minimum, there will be 3 file types included with this geographic data (.shp, .shx, .dbf), but there are often other files included which store additional information. In order to be read and used as a whole, all file types must have the same name and be in the same folder. Because the structure of points, lines, and polygons are different, each shapefile can only contain one vector type (all points, all lines, or all polygons). You will not find a mixture of point, line, and polygon objects in a single shapefile, so in order to work with these different types in the same analysis, multiple shapefiles will need to be used and layered. For more details on shapefiles and file types, see this documentation.

Raster data, on the other hand, is stored in Tagged Image File Format (TIFF or TIF). A GeoTIFF is a TIFF file that follows a specific standard for structuring meta-data. The meta-data stored in a TIFF is called a tif tag and GeoTIFFs often contain tags including spatial extent, coordinate reference system, resolution, and number of layers.

More information and examples can be found in sections 3 & 4 of the Earth Analytics Course.

## Python and Google Earth Engine API¶

To perform our analysis, we'll obtain the Radiance Calibrated Nighttime Light data from Google Earth Engine. For necessary Python setup and an introduction to our use of the GEE Python API, see the World Bank Nighttime Light Tutorial, module 2 sections 2-5. In particular, before proceeding you will need to have jupyter and geemap installed on your machine, and you will need to apply for a Google Earth Engine account here. It may take a day or longer for your Google Earth Engine account to be granted access.

Two of the primary packages we'll be using, Pandas and GeoPandas, must be installed according to their installation instructions: Pandas Installation and GeoPandas Installation. If you're on Windows, GeoPandas installation can occasionally be temperamental - using an environment, as described in the World Bank Nighttime Lights Tutorial, can often circumvent any issues, but if you're still having problems, there are a number of guides online, such as this Practial Data Science guide or this Medium post by Nayane Maia, which provide installation help. Using Windows Subsystem for Linux (WSL) can also make use of tricky packages like GeoPandas easier.

## Accessing the Radiance Calibrated Nighttime Lights Data¶

While the Radiance Calibrated Nighttime Lights data can be downloaded directly as GeoTIFFs from the NOAA/NCEI Earth Observation Group site, it can be more convienent and efficient to use Google Earth Engine's API to create a standardized process to ingest these data, particularly if more than one file is desired for analysis. We'll follow the World Bank Nighttime Light Tutorial, module 2, section 6, though with the Radiance Calibrated data rather than the standard DMSP-OLS data. First, we'll initialize (or authenticate, for the first use) our connection to Google Earth Engine. Next, we can view a map of the area we'll be working with for our examples today: Istanbul, Turkey.

With our basemap centered over Istanbul, we can now collect the DMSP-OLS radiance calibrated nighttime light data through the Google Earth Engine API. We'll obtain this data as an ImageCollection. Each Image is a representation of raster data, with the values for one or many spectral bands provided for a particular time range over an area.

The Image Collection ID for this data is NOAA/DMSP-OLS/CALIBRATED_LIGHTS_V4 (for details, see the GEE Catalog Page), and we can filter for a specific year to limit the amount of data once we bring it in. See the NOAA page for these data to identify which satellites (F12, F12-F15, F14-F15, F15, F16) provided which years (1996, 1999, 2000, 2002, 2004, 2005, 2010) for more information. We can take the 2010 data as an example, which was collected with the F16 satellite.

Using getInfo(), we're able to see the contents of the ImageCollection, which can be helpful both in understanding what you're working with and in diagnosing any issues. The first chunk of information describes the ImageCollection as a whole, while further down under 'features', the individual Images and their respective bands are identified. For example, this range of data we selected contains two Images (IDs: NOAA/DMSP-OLS/CALIBRATED_LIGHTS_V4/F16_20100111-20101209_V4 and NOAA/DMSP-OLS/CALIBRATED_LIGHTS_V4/F16_20100111-20110731_V4), each with two bands: avg_vis and cf_cvg. This matches the 'Bands' listed in the GEE Catalog listing, so all indications are we are good to proceed.

A next step can be to visualize these radiance-calibrated nighttime lights over our area of interest.

NOTE: While this code will allow you to obtain the correct, calibrated nighttime lights data in the ImageCollection dmsp10_vis, depending on the platform you're using to run this Python notebook, the visualization may appear saturated over the city. This is a limitation when combining some of the visualization tools with some programs for running Python code. If this happens with your image, you can run the same script on Google Earth Engine's code editor, which will allow you to view the correct gradient of light.

We can also use a defined boundary map of Istanbul to build our map and apply the Radiance Calibrated Nighttime Light data, which will allow us to filter the light data for our area of interest rather than using the data for the entire world. This can improve performance by limiting the amount of data being processed. We'll use the Global Administrative Unit Layers (GAUL) data, which provides geographic information on multiple levels of administrative areas for most countries worldwide. We can access this data through the Google Earth Engine as well (see GEE documentation here). The feature collection identification we need is FAO/GAUL/2015/level1, and we'll filter for ADM1_NAME of Istanbul. (For more information on the countries and administrative levels provided by GAUL 2015, download the GAUL2015_Documentation.zip from the GAUL Catalog Page. WhatsNewGAUL2015.pdf contains a table of all countries identifying which countries and administrative levels are available.)

We now can use these boundaries to clip the Radiance Calibrated Nighttime Lights data to our area of interest.

# OPERATIONS AND ANALYSIS¶

## Create areas around city center¶

In order to perform analyses on a geographic area, it can often be helpful to define boundaries to create an area of interest. Using the example of Istanbul, we can create an area around the city center with a radius of 750 m to demonstrate.

To create a circular buffer around our city center point, we can use geopandas. Geopandas is a popular Python package for working with geographic data, and it will easily allow us to create circular areas around a given point. To start, we will create our city center from the latitude and longitude values above using geopandas.points_from_xy(). Since we'll be wanting to calculate distance from points, we'll need to learn a bit about Coordinate Reference Systems (CRS). For more information on Coordinate Reference Systems in Python, see this Towards Data Science tutorial by Abdishakur.

One of the most common Coordinate Reference Systems is World Geodetic System 1984 (WGS84), which has the code EPSG:4326. This system defines locations on a three-dimensional surface using degrees (latitude and longitude) and is what is used by GPS systems. We can initially set our Coordinate Reference System (CRS) to 4326, since we will ultimately want to view and combine our data using this system, and we can easily view the details of this CRS using geopandas.crs():

We see that the name of the CRS is WGS 84 and that the axis use latitude and longitude in degrees. The Area of Use demonstrates that this is a global coordinate system and shows the degree bounds we're familiar with for lat/lon.

However, if we want to calculate the distance from a given point in meters, we can't use this CRS, as it's measured in degrees. Instead, we need to choose a CRS which can correctly identify meters around our given area of interest, which is Istanbul. This will require using a projected coordinate system, which is defined on a two-dimensional surface and will allow us to use traditional distance measures. To search for the appropriate CRS for a location, you can use https://epsg.io/ - we find that EPSG:5637 will work for Turkey. We can project to this CRS easily with geopandas using to_crs(). For more information about projections, see this ESRI explainer.

Now when we check for information on our Coordinate Reference System, we find we're using a Projected CRS which measures distance in metres and is used for Turkey. We are now all set to create a buffer area around our point of 750 meters as an example.

To use this buffer area with our other geographic information when mapping, we'll finally convert it back to EPSG:4326 and confirm we see the expected CRS information.

We can try plotting our buffer area, and we will be given a circular area with longitude and latitude as the x and y coordinates.

To plot a map of Istanbul as a basemap we have a few options - to see examples which work well with Geopandas, you can look at the examples in the GeoPandas Example Gallery. To keep things consistent with our satellite data, we will convert the FAO GAUL 2015 area for Istanbul which we used above to a type GeoPandas recognizes and use this as our base layer. For more information on converting between geemap and geopandas, see this documentation.

We can then plot both on the same graph to view our area in context of the map of Istanbul. We'll use matplotlib.pyplot to plot our map with both the Istanbul layer and the buffer area layer. For more information on mapping and plotting with Geopandas, see the documentation.

## Calculate light intensity within buffer areas¶

Since we're interested in the light intensity in these city areas, we can use the DMSP-OLS values from the avg_vis band and calculate the mean light intensity for our given area. We will first need to convert our buffer area to an ee.FeatureCollection type and apply .geometry() in order to use this area to filter the DMSP-OLS range and subsequently calculate the mean and standard deviation using geemap. First, we'll convert the buffer to a type usable by geemap and plot the area to confirm everything went according to plan.

Success! This is what we were expecting to see, and now we want to bring our radiance calibrated nighttime lights values back into use. We'll use this boundary area to clip our full DMSP-OLS data from 2010 to only provide values for our area of interest, and then we can use Google Earth Images reduction functionality to find the mean and standard deviation of the light intensity in this area. For more information on this function and application, see the World Bank Nighttime Lights Tutorial or the Google Earth Image Documentation.

We now have statistics for the intensity of light within a 750m radius of the city center of Istanbul. If we want to compare how light intensity changes as we move away from the city center, as Gonzalez-Navarro and Turner do in Subways and Urban Growth: Evidence from Earth (2018), we can create these boundary areas for different, increasing radii around the city center. While we could manually do this for each desired radius, it would be more efficient to create a function which we can reuse for any radii or type of satellite images.

We now have the mean and standard deviation light intensity for a range of increasing areas around the city of Istanbul. If we want to visualize the complete set of areas for which we have data, we can plot all of our values from the ist_buffs table - the first object returned from our function. We can first plot only the buffer areas themselves:

Once again, it would be helpful to have a background map of Istanbul to put these areas in context. We can use the same process as above for our single 750m buffer area to instead plot all our areas over the map of Istanbul:

## Characterize City Centrality¶

We can next use our values of mean radiance in each layer to characterize the centrality of a city. In "Subways and Urban Growth: Evidence from Earth", Gonzalez-Navarro and Turner use the regression $$lny_i=A+Blnx_i+\epsilon_i$$ to create these centrality measures, where $y_i$ is the mean light intensity within an area, $x_i$ is the radius of the associated area, and B is the rate at which light decays when increasing the distance from the city center. A higher negative value for B indicates that the city density decreases more rapidly, which the authors identify as an indication of greater centralization.

We'll be using the package scikit-learn, a powerful machine learning package in Python, to perform the regression. To install sklearn, follow the installation guide.

We can follow Gonzalez-Navarro and Turner (2018) and interpret the intercept as the brightness at the city center. The slope is interpreted as the rate at which light decays from the center, which provides a measure of city centralization. Comparing these measures of brightness and centralization across cities can produce useful comparisons, as can comparing the same city over time. However, in order to use these radiance calibrated nighttime lights data from multiple years and satellites, we'll need to first perform inter-annual calibration.

# INTER-ANNUAL CALIBRATION¶

## Inter-annual Calibration of Radiance Calibrated Nighttime Lights Data¶

As mentioned in Section 1, the variation between sensors and across years makes these data incomparable when used directly. Instead, we must perform inter-annual calibration to bring all data files to a common scale which we can directly compare over time. The documentation for these data provide instructions on inter-annual calibration using the 2005 data as a reference set. All other data files will be adjusted according to the formula $$Y=Coeff_0+Coeff_1*X$$ where Y is the adjusted data, X is the original data, and the coefficients are provided unique to each data file in Table 3 of the documentation.

We can create a pandas dataframe in Python which contains these values and which we can use to adjust each of the datasets.

We can use the data from 2004 as an example of performing this calibration. We'll be adjusting these 2004 values according to the formula and coefficients provided above, ultimately producing an output image $$Y_{2004}=1.062+0.761*X_{2004}$$ where $X_{2004}$ is the raw image obtained of the 2004 satellite data. We can first follow the same process as above to obtain the 2004 radiance calibrated nighttime lights data.

To correct these 2004 values according to our equation, we'll use the GEE .expression() method to apply our formula to the ImageCollection data. We'll define a function calibrate_lights() which will cycle through all Images in an ImageCollection (though in the case of our dmsp_04 ImageCollection, only a single Image is present), apply the formula to the values in each Image, and return an ImageCollection of the same size with values calculated from the formula.

Now, let's visualize the differences between our original values and these new calibrated ones. We'll follow the same process described in the World Bank's Open Nighttime Lights Tutorial, Section 5.1.3.

While this works well for a single year's calibration, if we want to calibrate multiple years it would be easier to create a function which can use our coefs_df DataFrame of coefficients and automatically calibrate the ImageCollections for each year. See the World Bank's Open Night Lights module 5, section 1.4 for a function which will obtain and apply coefficients as well as clip the final Images.

# USAGE EXAMPLE¶

These radiance calibrated nighttime lights can be used to study a number of phenomena within city boundaries where the amount of lighting causes saturation in the standard DMSP-OLS nighttime lights data. In "Subways and Urban Growth: Evidence from Earth" (2018), by Gonzalez-Navarro and Turner, one of the questions the authors consider is the effect of subways on urban centralization. They measure urban centralization with the light gradient and city center brightness from the regression equation we utilized above: $$lny_i=A+Blnx_i+\epsilon_i$$ where $y_i$ is the mean light intensity within an area, $x_i$ is the radius of the associated area, and B is the rate at which light decays when increasing the distance from the city center.

Using A and B as their dependent variables, the authors perform a first difference regression of these light values on the number of subway stations to identify the extent of centralization in cities with larger subway networks. The independent variable in their dataset is Dlop, which is the difference in natural log of the number of subway stations in a city over 5-year periods. The dependent variables are Dlinear_b1, which is the difference in light gradient over 5-year periods, and Dlinear_b0, which is the difference in light intercept over 5-year periods. They use country GDP (specifically, the difference in natural log of a country's GDP, Dlgdp), population (specifically, the difference in natural log of a country's population, Dlcountry_pop), continent (continent), and 5-year groupings (quinquenial) as controls and cluster on urbancode, which uniquely identifies cities.

The authors created a comprehensive subway dataset to provide the primary independent variable in this analysis. This dataset contains information on station locations and years built for all subway stations worldwide. They make this data available for all users, and it can be downloaded here in the subway_census_v1 folder. For more information on this dataset and a tutorial on its use, see the Geo4Dev Subways Learning Module.

We can use the package statsmodels in order to run this regression. To install, you can use conda install statsmodels if you're using a conda environment or pip install statsmodels if not.

First, we'll read in the replication data, which you can download here as a 7Z folder. 7Z files are compressed using the open-source 7-Zip tool and typically require a third-party app to access. There are many apps available to open 7Z files, but an easy option for Windows users is to download the 7-Zip app directly. MacOS users can use a tool like the Unarchiver. For a more detailed description of working with 7Z files, see this tutorial for Windows or this guide for macOS.

Once we've read in the data, we'll create a pandas datetime version of the year field to use as one of our indices and include urbanname (the city name) as another index field.

Now, we can run our first regression using the light gradient, B, as the dependent variable to measure the effect of subway extent on city centralization.

The coefficient on the Dlop value, which represents the difference in the natural log of subway stations, is the value of interest in this regression: we find a coefficient of 0.0233, which indicates that as the number of subway stations in a city increases, the rate at which light decays becomes more positive. Since a light gradient is negative, with a large negative number indicating the light drops off quickly after leaving a city center, this positive coefficient provides evidence in favor of a smoother light gradient in cities with more extensive subway systems. This aligns with the results in Gonzalez-Navarro and Turner (Table 10, Panel a) and is interpreted as subways encouraging greater decentralization of cities.

We can next consider the same regression using the light intercept - that is, the brightness at a city's center - as the dependent variable.

We find a coefficient of -0.1966 for our variable of interest, Dlop, the difference in natural log of subway stations over 5-year periods. This indicates that as the number of subway stations in a city increases, the brightness of the city center itself decreases, which is further evidence of subways encouraging decentralization of a city.