
Modeling Population Density Using Land Cover Data: Yongzhong Tian, Tianxiang Yue, Lifen Zhu, Nicholas Clinton


Ecological Modelling 189 (2005) 72–88

Modeling population density using land cover data


Yongzhong Tian a,b,∗ , Tianxiang Yue a , Lifen Zhu c , Nicholas Clinton d
a State Key Laboratory of Resources and Environment Information System, Chinese Academy of Sciences, Beijing 100101, PR China
b School of Resources and Environment Sciences, Southwest China Normal University, Chongqing 400715, PR China
c Center for Chinese Agriculture Policy, Chinese Academy of Sciences, Beijing 100101, PR China
d International Institute for Earth System Sciences, Nanjing University, Nanjing 210093, PR China

Received 18 May 2004; received in revised form 16 February 2005; accepted 24 March 2005
Available online 16 June 2005

Abstract

This study investigates the correlation between land cover data and other factors that affect population distribution. The results
show that land cover data contain sufficient information to infer population distribution and can be used independently to model
the spatial pattern of population density in China. China’s population distribution model (CPDM) was developed based on land
cover data to calculate population density in China at 1 km resolution. For cells in rural areas, population probability coefficients
were calculated based on weighted linear models, the weights of land cover types being derived from multivariate regression
models and on a qualitative order of land types in 12 agro-ecological zones. For cells in urban areas, a power exponential decay
model based on city size and the distance from urban center was employed to calculate population probability coefficients. The
models were validated in sampled cells using ancillary population data. The validation shows the mean relative error of estimated
population to be 3.13 and 5.26% in rural and urban areas, respectively. Compared to existing models, the accuracy of CPDM is
much higher at cell, county and province scales.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Population density; Land cover; Distance decay model; Grid

∗ Corresponding author. Tel.: +86 1064889847.
E-mail address: tianyz@lreis.ac.cn (Y. Tian).

1. Introduction

Population density can be divided into human population density (Yue et al., 2003, 2005a) and wildlife population density (Miller et al., 2002; Sekimura et al., 2000; Kokko and Lindstroem, 1998). For both of them, land cover is a control variable (Yue et al., 2005b; Alexande and Shields, 2003; McCarthy and Lindenmayer, 1998). This paper focuses on modeling human population density on the basis of land cover data.

The distribution of human population has been identified as one of the key datasets required for improving understanding of human impacts on land and water resources. Human population distribution data will also improve projections of the environmental consequences that may be expected under varying models of economic growth (Clarke and Rhind, 1992; Elvidge et al., 1997). These predicted environmental

0304-3800/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.ecolmodel.2005.03.012
Y. Tian et al. / Ecological Modelling 189 (2005) 72–88 73

and economic data provide valuable information for decision-makers such as governments, enterprises and individuals. Human population density is one of the major indicators used to describe human population distribution. However, it is a general indicator, and consequently hides the internal variability of choropleth units (Zhang, 1997). The larger the size of the choropleth unit, the more generalized the data are (Demers, 1997). So human population density in an administrative region does not provide the spatially explicit details (Elvidge et al., 1997) necessary to describe the actual distribution of human population in the region (Demers, 1997). Usually, two mapping methods, dot distribution mapping and dasymetric mapping, are employed to improve the detail of human population distribution mapping. Dot mapping records the amount and location of human population by points, but the exact geographical location is not precise (Demers, 1997). Dasymetric mapping based on the idea of choropleth maps, invented by Wright in 1936, improves the quality of the original choropleth maps by dissolving the boundaries imposed by some smaller sub-areas (Wright, 1936; Demers, 1997). With the rapid increase in demand for high resolution population data and the introduction of new technology, such as geographic information systems (GIS) and remote sensing (RS), many recent studies use the digital simulation technology of dasymetric mapping to estimate raster based human population distributions.

Gridded Population of the World (GPW) and LandScan are frequently used global population datasets. GPW proportionally allocated total population to grid cells based on the assumption that population is distributed evenly over administrative units (Tobler et al., 1997). LandScan distributed census counts to 30 by 30 arc-second grid-cells based on probability coefficients calculated from road proximity, slope, land cover, and nighttime lights (Dobson et al., 2000, 2003; Dobson, 2003). In addition, Lo (2001) developed allometric growth models and linear regression models to model the non-agricultural population of China in 1997 using the nighttime lights data from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS). Sutton (1997) indicated the limitation of nighttime data in rural regions and estimated only the urban population of North America. Wang and Michel (1996) took advantage of a gravity model to simulate the urban population density. Liao and Sun (2003) spatialized census data in Qinghai–Tibet plateau based on five factors. Yang et al. (2002) studied the method of population spatialization and used it to create a simulated population grid of the Shandong province. Sutton et al. (2003) explored some theoretical and empirical efforts at estimating ambient population density and proposed a quantitative means for evaluating their validity. Yue et al. (2005a,b) used surface modelling of population distribution (SMPD) based on grid generation method to simulate the population density in 1 km cells in 1930, 1949, 2000 and 2015.

Although these studies focused on the factors that affect population distribution, the correlations and relative influence of those factors are rarely investigated. The inclusion of too many factors makes modeling more complicated, causes information redundancy, and magnifies the effect of redundant information in population density simulation (Zhang and Yang, 1992). Some studies did not fully consider the difference in model parameters between regions. Others ignored the difference of distribution pattern between rural and urban populations and used the same models in both rural and urban regions. In this study, we used China as a case study to address three issues:

1. Can land cover data be used to model population density on grid-cells independently?
2. Using this technique, what is the most effective method for creating a raster based population density surface?
3. What is the accuracy of the estimate?

2. Data sources

2.1. Data of population

The original census data comes from Chinese population by county in 2000 (Chinese Ministry of Public Security, 2001). These data are available as an attribute of the administrative polygons at the county level.

2.2. Data of land cover

The original land cover data comes from Data Center of Resources and Environment, Chinese Academy of Sciences (CAS). It is derived from Landsat Thematic Mapper (TM) images in 2000 according to the

Class System of Land use/Land cover in Remote Sense Mapping at 1:100,000 scale (Table 1; Liu, 1996; Liu and Buhe, 2000; Liu and Zhuang, 2003). In its original data format, the land cover data is an ArcInfo coverage. For this study, it was converted into 25 raster files in Environment Systems Research Institute (ESRI) Grid format at 1 km resolution using the cell-based encoding method of percentage breakdown. Each raster file represents a land cover type; the value of each grid-cell in the raster file is the percentage of the type of land cover in the grid-cell. Prior to this conversion, all the towns which have lower administrative grades than county seats were changed from ‘rural residential’ land cover type to ‘urban’ type. This step was necessary to standardize the labeling of towns and cities (as ‘urban’) for the purposes of population distribution modeling. In the original land cover data, areas of dense population (‘urban’ areas from the modeling perspective) are only designated as ‘urban’ if they have an administrative grade of county seat or higher.

Table 1
Land cover classification system and residential classification

Primary types          Secondary types                   Residential
Code  Name             Code  Name                        types
1     Farmland         11    Paddy field                 A
                       12    Non-irrigated field         A
2     Woodland         21    Forest                      A
                       22    Shrub                       A
                       23    Sparse woodland             A
                       24    Other woodland              A
3     Grassland        31    Dense grassland             A
                       32    Moderate dense grassland    A
                       33    Sparse grassland            A
4     Waters           41    River                       N
                       42    Lake                        N
                       43    Reservoir and pond          N
                       44    Glacier and snow            N
                       45    Beach                       N
                       46    Bottomland                  N
5     Built-up area    51    Urban area                  R
                       52    Rural residential area      R
                       53    Other built-up area         R
6     Unused land      61    Sandy land                  N
                       62    Gobi                        N
                       63    Saline-alkali land          N
                       64    Marsh                       N
                       65    Bare land                   N
                       66    Bare rock and gravel land   N
                       67    Other unused land           N

Note 1: N means exclusion area, R means residential area, A means non-residential area; R and A are habitable areas. Note 2: in this paper, the land cover types are denoted by a symbol composed of “land” and their codes. For example, land11 represents paddy field.

2.3. Digital elevation model (DEM)

The original DEM is the Chinese part of the GTOPO30 (global topography at 30-arcsecond resolution) dataset derived from U.S. Geological Survey’s (USGS) Earth Resources Observation System (EROS) Data Center. After reprojecting and resampling, it was converted into a 1 km resolution DEM of China. The slope data were derived from the DEM.

2.4. Temperatures

The temperature data originated from the observations of 636 climate stations of the National Weather Bureau. To get the ground temperatures of grid-cells, the geo-located temperatures were first converted to sea level equivalent values according to the altitude of observatory stations and the temperature lapse rate (6.4 °C per 1000 m). Then they were interpolated to 1 km resolution cells using Ordinary Kriging. Finally, the DEM was used to convert the interpolated temperature of raster cells at sea level back into that at ground level according to the temperature lapse rate and the altitude of cells.

Ancillary data: Other data include railways, highways, rivers and cities. They were derived from the database of Chinese resources and environment at 1:1,000,000 and 1:4,000,000 scales in ArcInfo coverage format.

All the data were integrated into ESRI ArcGIS in an Albers Equal Area Map Projection.

3. Can land cover be used independently to model population density in grid-cells?

Population geographers divide the factors that affect population distribution into two types (Zhang, 1997;

Table 2
Principal component analyses of factors affecting population distribution in China

Principal component             1          2          3         4         5
Eigenvalues               5190630    1079054     125633        14         7
Contribution percent        81.16      16.87       1.96         0         0
Eigenvectors
  DEM                     −0.3076     0.9512     0.0258    0.0032    0.0018
  Slope                   −0.0004     0.0007    −0.0004    0.2878   −0.9577
  Temperature              0.0013    −0.0034    −0.0002    0.9577    0.2878
  Cropland                 0.9497     0.3086    −0.0538   −0.0001   −0.0002
  Rural residential area   0.0591    −0.008      0.9982    0.0002   −0.0004
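
The contribution percentages in Table 2 are each eigenvalue's share of the eigenvalue total. A minimal Python sketch, using only the eigenvalues printed in the table (the function and variable names are illustrative and do not come from the paper), reproduces them:

```python
# Eigenvalues of the five principal components, copied from Table 2.
EIGENVALUES = [5190630, 1079054, 125633, 14, 7]


def contribution_percent(eigenvalues):
    """Return each eigenvalue's share of the eigenvalue total, in percent."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


shares = contribution_percent(EIGENVALUES)
for component, share in enumerate(shares, start=1):
    print(f"PC{component}: {share:.2f}%")
```

The first two components together account for about 98% of the total, which is why the discussion concentrates on them.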

Hu, 1983). One is natural factors, such as climate, elevation and slope, which are the basic factors of population distribution. The second type, which also plays a decisive role in population distribution, is socioeconomic factors, such as land cover, railway, road, and city location (Zhang, 1997). The most often used factors for modeling population distribution include elevation, slope, temperature, transportation line proximity, cities and land cover (Yue et al., 2003). However, most of these factors are closely related to land cover.

3.1. Correlations between land cover and other factors affecting Chinese population distribution

3.1.1. Land cover and natural factors

To examine the redundancy of information in the factors affecting population distribution, a principal component analysis based on 1 km² grid-cells was conducted on two types of land cover (farmland, rural residential area) and three main natural factors (elevation, slope and temperature). The analysis results (Table 2) show that the first principal component includes 81% of the information about population distribution. According to the load matrix, the factor highly correlated with the first principal component is farmland, the coefficient of which reaches 0.9497. It means that land cover data, especially farmland, includes most of the population distribution information. Comparatively, the other factors are far less important than land cover. Although the DEM is also highly correlated with the second principal component, which contains nearly 17% of the population distribution information, it may not be used as a variable to model population density because of its correlation with other factors (Table 3).

3.1.2. Buffer analyses of land cover and other factors

Buffer analyses of railways, highways, rivers and cities were conducted to detect the relationships between land cover and these factors. Because those factors are linear or point features, they cannot be directly used to analyze their correlation with land cover. For this study, buffer zones of main highways and rivers (10 zones), as well as railways and cities (20 zones), were built at 10 km intervals (Fig. 1), and average rural population density, farmland area, and rural residential area were summed in each zone. This established the relationships between population density and buffer distance, farmland and rural residential area (Table 4). The rural population density used here is derived from the choropleth map of rural population density by county unit, converted to a 1 km resolution grid.
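
The zoning step of the buffer analysis in Section 3.1.2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the cell tuples are hypothetical, the distance to the nearest feature is taken as already computed, and density is averaged per zone while the two areas are summed, which is one reading of the description above rather than the authors' confirmed procedure.

```python
from collections import defaultdict


def zone_index(distance_km, interval_km=10.0, n_zones=20):
    """Return the 0-based buffer zone for a distance, or None if outside all zones."""
    if distance_km < 0:
        return None
    index = int(distance_km // interval_km)
    return index if index < n_zones else None


def summarize_zones(cells, interval_km=10.0, n_zones=20):
    """Aggregate (distance_km, density, farmland, residential) cell tuples by zone."""
    totals = defaultdict(lambda: {"density_sum": 0.0, "farmland": 0.0,
                                  "residential": 0.0, "cells": 0})
    for distance_km, density, farmland, residential in cells:
        index = zone_index(distance_km, interval_km, n_zones)
        if index is None:
            continue  # cell lies beyond the outermost buffer zone
        zone = totals[index]
        zone["density_sum"] += density
        zone["farmland"] += farmland
        zone["residential"] += residential
        zone["cells"] += 1
    return {
        index: {
            "mean_density": zone["density_sum"] / zone["cells"],
            "farmland": zone["farmland"],
            "residential": zone["residential"],
        }
        for index, zone in sorted(totals.items())
    }


# Hypothetical cells: (distance to railway in km, rural density, farmland, residential)
SAMPLE_CELLS = [
    (3.0, 400.0, 50.0, 5.0),
    (8.0, 300.0, 40.0, 3.0),
    (12.0, 200.0, 30.0, 2.0),
    (250.0, 10.0, 1.0, 0.0),  # beyond 20 zones of 10 km, so ignored
]
zones = summarize_zones(SAMPLE_CELLS)
```

Calling `summarize_zones(cells, n_zones=10)` would correspond to the ten-zone setup used for highways and rivers.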
Table 3
Pearson correlation coefficients between the DEM and other factors affecting population distribution at three scales

Scale      Degrees of   Pearson correlation coefficients between the DEM and    Critical correlation coefficient
           freedom      Slope     Temperature   Cropland   Residential area     at the 0.01 significance level
Cell       9503010      0.3709    −0.7722       −0.4528    −0.2164              <0.1
County     2354         0.6546    −0.6542       −0.5617    −0.4363              <0.1
Province   31           0.5925    −0.68851      −0.6712    −0.5362              0.4426
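
A critical correlation coefficient such as those in Table 3 is tied to Student's t through the standard identity t = r·sqrt(df / (1 − r²)). The sketch below applies that identity to the province-scale row; the function and variable names are illustrative, and the paper does not state how its critical values were obtained.

```python
import math


def t_statistic(r, df):
    """Convert a Pearson correlation r with df degrees of freedom to a t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


# Province-scale values copied from Table 3.
PROVINCE_DF = 31
PROVINCE_CRITICAL_R = 0.4426
PROVINCE_R = {"slope": 0.5925, "temperature": -0.68851,
              "cropland": -0.6712, "residential": -0.5362}

critical_t = t_statistic(PROVINCE_CRITICAL_R, PROVINCE_DF)
significant = {name: abs(r) > PROVINCE_CRITICAL_R
               for name, r in PROVINCE_R.items()}
```

Here `critical_t` comes out at roughly 2.75, and every province-scale coefficient exceeds the critical value in absolute terms, consistent with the DEM being correlated with the other factors at the 0.01 level.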

Fig. 1. Main railways, highways, rivers and cities in China.

The results listed in Table 4 show that rural population density is significantly correlated with the distance from railways, highways, rivers and cities, indicating that these factors have important influence on population distribution. The results in Table 4 also show that both farmland and rural residential areas have high correlation with the average rural population density in the buffer zones of railways, highways, rivers and cities, suggesting that population density change with the distance from these factors can be derived from land cover data.

3.2. Influence of China’s population system and land system upon population distribution

Systems of land use and population location in China are the origin of relationships between population and land cover distribution evident in modern times. China was an agricultural country historically, and owning a piece of land has been the dream of Chinese peasants for generations. Land Revolution in the early 1950s allocated an average amount of land to peasants. Although collective farming was conducted

Table 4
Relationships of average rural population density with buffer distance, farmland area and rural residential area in buffer zones of some affecting factors

Factor      Item    Buffer distance (km)           Farmland (100 m²)           Rural residential area (100 m²)
Railway     Model   y = −124.57 ln(x1) + 704.77    y = 0.0845 x2 − 36.353      y = 0.8151 x3 + 33.62
            R²      0.9955                         0.9781                      0.984
Highway a   Model   y = −126.88 ln(x1) + 587.94    y = 0.0034 x2^1.3845        y = 0.9244 x3 + 9.9662
            R²      0.99                           0.9998                      0.9914
River a     Model   y = 342.96 x1^(−0.1648)        y = 0.1132 x2 − 80.349      y = 0.6852 x3 + 54.116
            R²      0.8902                         0.961                       0.9509
City        Model   y = 811.77 e^(−0.017 x1)       y = 2×10^(−7) x2^2.5898     y = 0.0889 x3^1.4396
            R²      0.9928                         0.9544                      0.9463

a Buffer analyses of highways and rivers employed ten buffer zones of 10 km intervals and railways and cities employed 20 zones.
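
To see how quickly the fitted density falls off with distance, the four buffer-distance models in Table 4 can be evaluated directly. The sketch below copies the coefficients from the table; the function names are illustrative, and it assumes x1 is the buffer distance in kilometres within the fitted range (up to 100 km for highways and rivers, 200 km for railways and cities). The logarithmic and power forms are undefined at zero distance.

```python
import math


def density_near_railway(distance_km):
    """Table 4, railway: y = -124.57 ln(x1) + 704.77."""
    return -124.57 * math.log(distance_km) + 704.77


def density_near_highway(distance_km):
    """Table 4, highway: y = -126.88 ln(x1) + 587.94."""
    return -126.88 * math.log(distance_km) + 587.94


def density_near_river(distance_km):
    """Table 4, river: y = 342.96 * x1 ** -0.1648."""
    return 342.96 * distance_km ** -0.1648


def density_near_city(distance_km):
    """Table 4, city: y = 811.77 * exp(-0.017 * x1)."""
    return 811.77 * math.exp(-0.017 * distance_km)


MODELS = {
    "railway": density_near_railway,
    "highway": density_near_highway,
    "river": density_near_river,
    "city": density_near_city,
}

for name, model in MODELS.items():
    values = ", ".join(f"{model(d):.1f}" for d in (10.0, 50.0, 100.0))
    print(f"{name:8s} at 10/50/100 km: {values}")
```

All four curves decrease monotonically with distance, which is the pattern the text describes as a significant correlation between rural population density and distance from these features.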

in the 1960s and 1970s, it had little influence on the clustering of population because the collective farming was employed in the smallest administrative unit (the farming team, Shengchandui, which usually covered less than 1 km² and had no more than 100 persons), and peasants participating in collective farming still resided in their original homes. From the 1980s, the Household Responsibility System of land use was implemented across the country. Peasants contracted and managed dispersed lands according to their familial networks. The government of China recently declared that the household responsibility system would continue and the land tenure would extend to 30 years. This shifted the contract term of land from short to long and further enhanced the spatial attachment between land and peasant. However, another Chinese system, the Household Registration System (hukou), divided the population into “agricultural” and “non-agricultural” sectors, with different privileges according to their residential status, and further reinforced the land-peasant bond. As noted by Zhong (2001), the system almost binds peasants to their lands.

Generally, the rural–urban migration now is much easier than in earlier times, and in some regions, peasants are encouraged to move into cities and towns. These people are registered as urban citizens and get the corresponding legal statuses and privileges, but at the same time, they lose their lands in rural regions and are not counted in the rural population any more. For most of the peasants who rush into cities, their intentions are to hunt for jobs, not to settle down. Before they can obtain the legal status and corresponding privileges as urban citizens, they will not give up their rights to contract rural land and, in any case, they would like to retain their rights to houses and lands in rural regions (Yang and Wang, 2002; Zhong, 2001). When farming is busy, they will go back to their villages. For this reason, the relaxation on reform of migration laws is unlikely to alter the close relationship between rural population and land in the near future.

In addition, it should also be noted that the progress of agricultural industrialization in China is slow and the agro-production mode is still predominantly at the pre-industrial stage in which labor is the most important productive factor. The distribution of population is in large part a relic of a more agrarian time (Zhong, 2001). Before the rise of large-scale mechanized agriculture, especially in the south, the proximity of labor to the farm is constrained to about 1 km, meaning that peasants are not able to live far from their lands.

3.3. Two problems caused by scale issues

3.3.1. Are the transportation lines, rivers and cities always necessary for modeling?

Population distribution is influenced by factors at all scales simultaneously; however, the intensity and manifestations of the influences from different scales are very different. The factors at a national scale, such as main roads, railways, rivers and cities, control the basic pattern of national population distribution. But at a county level, the roads between county sites, or towns, or even villages may be much more important for rural population distribution. It follows that the entrances of larger roads and the stations of railways are better represented as attractions for population by point features rather than line features. Lower-grade roads are very easy to access owing to their numerous entrances, making their “linear attractiveness” more obvious. Fig. 2 illustrates the difference of attraction, from a residential perspective, between an expressway and an ordinary highway in Chongqing. Clearly, if administrative units at the county scale were used as control areas in population simulation, the factors affecting population at the county scale would be more important and the factors at a national scale would be almost unnecessary. However, it is very difficult to get road network data in all counties. Furthermore, a series of tasks, such as determination of appropriate road types to be examined, establishment of function indices, function distances and distance decay manners of these roads for the purposes of population estimation, are problematic due to high place to place variability at smaller scales. The same questions would apply to railways, rivers and cities. However, land cover data, which has close correlation with those factors as discussed in the foregoing sections, can be easily obtained via remote sensing imagery.

3.3.2. Is the residential area of land cover sufficient for modeling?

The loss of information in each land type at different scales is rather different. Li and Zhuang (2002) studied land cover data at three scales and found that the area error of all land cover types became larger when the scale became smaller. For example, in Shandong

Fig. 2. Comparison of the point attraction of expressway and the linear attraction of ordinary highway for population in Beibei, Chongqing.

province of China, the area of urban and rural residential land at 1:100,000, compared with that at 1:10,000, decreased 10.4 and 15.1%, respectively. At the same time, paddy field and non-irrigated field increased 18.2 and 24.8%, respectively, while the difference of woodland and grassland between the two scales is very small. This indicates that small residential areas are “absorbed” by larger patches of farmland. In addition,

Fig. 3. Contrast of residential areas in three typical topographical regions at two scales.

after studying a single property using land cover data at 1 km resolution, Liu et al. (2001) found that more than 60% of residential areas were lost. Fig. 3 compares the residential areas in land cover maps at 1:100,000 used in this study with that at 1:10,000 in three typical topographical regions of China. The comparison shows that little of the residential area is lost in plains, while in mountains and hills, a greater number of residential areas are lost due to their scattered distribution. So, for China, a country mostly composed of mountains and hills, it is not enough to take into account only residential areas when modeling the population in grid-cells using land cover data.

4. Methods

4.1. Research approach

4.1.1. Modeling based on land cover

Land is a synthesis of many natural and social factors which have acted in concert for long periods of time (Zhang, 1997). As discussed above, because land cover is highly correlated with many factors affecting population distribution, it is a good proxy for estimating the characteristics of population distribution. This study determines the feasibility of modeling population density by means of land cover data.

4.1.2. Controlling total population at the county scale by rural area and urban area, respectively

Administrative polygons at the county scale are the minimum mapping units available at a countrywide scope at present, and therefore represent the best country level population data. Quantification of total population by rural area and urban area at the county scale therefore avoids the reallocation of population between different counties and between rural and urban areas. Population probability coefficients were normalized in rural and urban areas of counties in order to make use of the best national level population data.

4.1.3. Modeling rural area and urban area separately

Urban and rural areas were treated separately due to the difference of affecting factors between rural and urban areas on population distribution. This separation was necessary to avoid mistakes such as over- or under-estimation of population density within rural or urban grid-cells. Integrated models run this risk due to the over generalization of variables controlling population distribution when applied to disparate land types.

4.1.4. Zonal modelling

Significant geographic differences of natural, social, economic and historical factors have resulted in different characters of land use in different regions. Modeling in relatively homogeneous zones can reduce effects of these differences on population distribution. The model was implemented in homogenous zones in an effort to minimize population estimate error resulting from the geographic disparity in the factors affecting population.

4.2. Model

Consistent with the above analyses and research approach, the Chinese Population Distribution Model (CPDM) was developed for land cover data at 1 km resolution according to the following:

$$\mathrm{POP}_{ij} = P_{ir} \times \frac{V_{jr}}{\sum_{j=1}^{k} V_{jr}} + P_{iu} \times \frac{V_{ju}}{\sum_{j=1}^{k} V_{ju}} \qquad (1)$$

where $\mathrm{POP}_{ij}$ is the population of the jth cell in the ith county; $P_{ir}$ is the rural population in the ith county; $P_{iu}$ is the urban population in the ith county; $k$ is the number of cells in the ith county; $V_{jr}$ is the rural population probability coefficient of the jth cell in the ith county; $V_{ju}$ is the urban population probability coefficient of the jth cell in the ith county.

The first term of Eq. (1) is used to calculate the rural population in cells, and the second term is used to calculate urban population. From Eq. (1), it can be seen that a key point of the model is the determination of population probability coefficients of rural and urban cells, which are normalized by county. Thus the rural and urban population in each county will be distributed to the cells in the county based on their normalized rural and urban coefficients.

4.2.1. Probability coefficients of rural population

Land cover types are divided into two kinds in this study: exclusion areas and habitable areas (Table 1). The former, including all the secondary types of waters and unused land, are excluded as input variables for

modeling because they are not fit for habitation at least at present, which means their probability coefficients of population ought to be assigned zero. The latter, including all the other land cover types, also can be subclassified into residential and non-residential areas. Residential areas, including urban areas, rural residential areas and other built-up areas, are the essential variables for modeling because they are directly related to population distribution. Although non-residential areas, including farmland, woodland and grassland, are not areas used primarily for dwellings, they are nevertheless habitable and contain some scattered residences. They are likely designated “non-residential” due to the resolution of remotely sensed data, map accuracy, and the scale issues of the land cover data (Fig. 3). The loss of residence information differs significantly between southern and northern regions, plains and mountains and hills. For example, population is more centralized in the north and plains areas, whereas it is more scattered in the south, mountainous areas and hills. To address this issue and calculate accurate probability coefficients of cells, it is necessary to examine the residential information inherent to each type of land. For this purpose, the following process was adopted:

(1) Ecological zoning: Based on the mode of agricultural production, the productivity of farmland, heat, water, and landform, China can be classified into 12 agro-ecological zones (Fig. 4, Chen, 2001).
(2) Selecting variables: Univariate linear regression models were built between area of habitable land types and rural population by county in each ecological zone. If the model was significant, then the land type in the model was used as one of the variables to determine probability coefficients.
(3) Modeling: The following multivariate regression model was developed to calculate the weight of input variables in each ecological zone:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \qquad (2)$$

where $y$ is rural population; $x$ is the area of land cover type chosen in step 2; $\beta$ is the parameter. Although the chosen variables are significant in univariate regression, some of them might not be significant in multivariate regression. However, for a reasonable regression equation, only those significant variables should be used. Moreover, the collinearity between variables should be as small as possible (William, 2000; Zhang and

Fig. 4. Agro-ecological zones of China (from Chen, 2001).



Table 5
Stepwise regression coefficients of land cover types with population by county in each agro-ecological zone

Ecological   Land cover types
zone         Land 11   Land 12   Land 21   Land 22   Land 23   Land 24   Land 31   Land 32   Land 33   Land 52
1            0.00104   0         0         0         0         0         0         0         0         0.22361
2            0.02784   0.02784   0         0         0         0.00755   0         0         0         0.21002
3            0.03869   0.03112   0         0         0.00424   0         0         0         0         0.08826
4            0.07662   0.02550   0.00314   0         0.00266   0.00471   0         0         0         0.10240
5            0.01259   0.00448   0.00049   0         0         0         0.00053   0.00087   0         0.05307
6            0.01596   0.00776   0         0.00056   0.00022   0.00061   0         0         0         0.01610
7            0.02590   0.00418   0.00139   0         0         0         0         0         0         0.48514
8            0.06216   0.04468   0         0         0         0         0         0         0         0.17316
9            0.03566   0.03227   0         0.00236   0         0         0         0.00175   0.00262   0.22797
10           0.04991   0.02052   0         0         0         0         0         0         0         0.14774
11           0.01044   0.01044   0         0         0         0.00095   0         0         0         0.06393
12           0.01250   0.01021   0         0         0         0.00041   0         0         0         0.05540

Note: Italic numbers have been modified according to the qualitative order and the food productivity of each land type.
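
The coefficients in Table 5 serve as the weights $W_{mn}$ of the weighted linear model in Eq. (4), and the resulting cell coefficients are normalized by county as in the rural term of Eq. (1). The Python sketch below chains the two steps for zone 1. It is only an illustration: the three cells, their land cover values and the county population of 1000 are hypothetical, and the function names are not from the paper. Because Eq. (1) normalizes within a county, a common scale factor on the areas does not change the allocation.

```python
# Weights W_mn for agro-ecological zone 1, copied from Table 5.
# Land types not listed have a weight of zero in that zone.
ZONE1_WEIGHTS = {"land11": 0.00104, "land52": 0.22361}


def rural_coefficient(areas, weights):
    """Eq. (4): V_jr is the sum over land types of A_jn * W_mn for one cell."""
    return sum(area * weights.get(land_type, 0.0)
               for land_type, area in areas.items())


def allocate(county_population, coefficients):
    """Rural term of Eq. (1): split a county total in proportion to V_jr."""
    total = sum(coefficients)
    if total == 0:
        return [0.0 for _ in coefficients]
    return [county_population * value / total for value in coefficients]


# Hypothetical cells of one county: land cover share of each type per cell.
CELLS = [
    {"land11": 60.0, "land52": 5.0, "land41": 35.0},  # paddy, village, river
    {"land11": 90.0, "land21": 10.0},                 # paddy and forest
    {"land41": 100.0},                                # river only (exclusion)
]

coefficients = [rural_coefficient(cell, ZONE1_WEIGHTS) for cell in CELLS]
populations = allocate(1000.0, coefficients)  # hypothetical rural population
```

The cell containing rural residential land receives most of the county's rural population, the farmland-only cell a small share, and the water cell none, while the cell totals still sum to the county figure.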

Yang, 1992). The best solution for these problems is to calculate the parameters using multivariate stepwise regression. To ensure the credibility of the parameters, the critical significance level in the calculation was 0.10. Table 5 shows the regression coefficients of the chosen land cover types.
(4) Modifying coefficients: The outcome of this stepwise regression is derived from the statistical correlations between population and land, but it must also obey certain geographic rules. Usually, land with higher food productivity can support a larger population so it is logical to assign a higher coefficient to this type of land. According to the degree of correlation between agricultural land and rural population, the following qualitative order of weight for land types was built as expression (3), and those coefficients that did not conform to expression (3) were modified (Table 5) according to the food productivity of each land type (Cao et al., 1995; Chen, 2001).

$$\text{rural residential areas} \ge \text{paddy field} \ge \text{non-irrigated field} > \text{woodland and grassland} > \text{other land} \ge 0 \qquad (3)$$

(5) Calculating probability coefficients: The following weighted linear model was used to calculate the probability coefficients of rural population for each grid-cell:

$$V_{jr} = \sum_{n=1}^{10} A_{jn} W_{mn} \qquad (4)$$

where $A_{jn}$ is the area of the nth land type in the jth grid cell; $W_{mn}$ is the modified stepwise regression coefficient of the nth land type in the mth ecological zone shown in Table 5.

4.2.2. Probability coefficients of urban population

Generally, urban average population density is directly proportional to urban size. The larger the size, the larger the population density is (Ye, 2001). However, inner differences of urban population density also exist. Usually, population density decreases from the center toward the outside of town. Although the factors affecting urban population distribution are too many to be formulated by a simple mathematical equation (Zhang, 1997), as a general rule, the density is correlated with the size of the town and the distance from the center. That general rule can be expressed as the following equation:

$$V_L = f(S, D) \qquad (5)$$

where $V_L$ is the urban population density coefficient at site $L$, $S$ is the urban size, and $D$ is the distance of $L$ from the urban center.

As a model of spatial distribution of urban population, the exponential decay model (Clark, 1951) was

typical in early research. Thereafter, there appeared many other related models such as the Gauss model (Sherratt, 1960; Tanner, 1961), and the negative power-exponential model (Smeed, 1961). In recent years, fractal models for the decay of urban population density were proposed (Chen, 1999; Feng, 2002). As research progressed, some “unified” models were introduced such as the Newling model, a unification of the Clark model and Sherratt model (Newling, 1969), and the Gamma model, which unifies the Clark and Smeed models (Batty and Longley, 1994). The former models are special cases of the unified models (Shen, 2002).

The following equation is an often-used power-exponential model of spatial urban population distribution:

$$\rho(r) = \rho_0 \exp\left[-\left(\frac{r}{r_0}\right)^{\sigma}\right] \qquad (6)$$

where $\rho(r)$ is the population density at distance $r$ from the urban center; $\rho_0$ is the population density of the urban center; $r_0$ is the functional radius of the town; and $\sigma$ is a restriction parameter reflecting the tendency for spatial changes of information entropy in the urban geographic system. Although the model has been widely criticized, urban economists have demonstrated its theoretical justification (Wang and Michel, 1996).

It is very difficult to get the central population density of all towns in China. However, as discussed above, the population density has positive correlation with the size of the town, and the functional radius is also related to urban size. Therefore, by integrating Eqs. (5) and (6), a model based on the size of town and the distance from urban center was constructed. This model was used to calculate probability coefficients of urban population as follows:

$$V_{mn} = A_n \ln A_m \exp\left[-\left(\frac{r_n}{\sqrt{A_m/\pi}}\right)^{\sigma}\right] = A_n \ln A_m \exp\left[-\left(r_n \sqrt{\pi/A_m}\right)^{\sigma}\right] = A_n \ln A_m \exp\left(-1.9874\, r_n^{\sigma} A_m^{-0.5\sigma}\right) \qquad (7)$$

where $V_{mn}$ is the urban population probability coefficient of the $n$th grid-cell in the $m$th town, $A_n$ is the area of urban land in the $n$th grid cell, $A_m$ is the area of the $m$th town, $r_n$ is the distance from the $n$th cell to the urban center, and $\sigma$ is a parameter corresponding to the stage of development of the town.

In Eq. (7), the functional radius is derived from the radius of a circle with the same area as the town. In regard to Eq. (7), two other questions deserve mention. One is how to determine the centers of towns. If the polygons of towns are irregular, their centroids may fall outside of them. To solve this problem, the label points of the urban polygons, which are always located inside the polygons, were used as their centers. Another question relates to the value of $\sigma$. According to the theory of urban development, a city will experience ‘developing’, ‘developed’ and ‘old’ stages; each stage has different characteristics of spatial distribution of population density. For example, in conjunction with suburbanization, the population density of the urban center will decrease, which may cause a crateriform distribution of population density. Most Chinese cities, despite many years of building since the reform and opening, are still in the early developed stage; many middle and small towns are still in the developing stage. This means the value of $\sigma$ will not be high. To distinguish the difference of spatial structure caused by the different development stages, all the cities and towns in China were classified into three types (main cities, middle cities and small towns) according to their size and other socio-economic indicators such as gross domestic product (GDP) and population (Zhang, 1997; Zhou and Xu, 1997; Feng, 2002). The values of $\sigma$ for the three types are 1.43, 1.26 and 1.14, respectively, which are derived from the population density sampling of 112 urban cells.

After calculating the probability coefficients of rural and urban populations, Eq. (1) was implemented in ArcGIS software to create the simulated Chinese population density map (Fig. 5).

5. Validation and comparison

5.1. Validation

Samplings of population density were conducted in rural and urban cells to validate the simulation outcome. For rural population density, pure random sampling was applied. According to the formula for necessary sample size (Tian et al., 2003), when the sampling precision is more than 95%, allowable error is less than 7% and the sampling ratio is 0.5, the necessary sample size is 196. A random function was

Fig. 5. Simulated Chinese populations of 1 km2 grid-cells in 2000.
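
As a hedged illustration of how the urban coefficients behind the map in Fig. 5 could be evaluated, the sketch below computes the first form of Eq. (7) for a single grid cell. The function name, the dictionary keys, the example town and the units (km² for areas, km for distance) are assumptions made for illustration; only the formula and the three σ values come from the text. The constant 1.9874 in the last form of Eq. (7) equals π^0.6, which corresponds to σ = 1.2, so the sketch evaluates the functional-radius form directly rather than hard-coding that constant.

```python
import math

# Sigma values quoted in the text for the three settlement classes.
# The dictionary keys are hypothetical labels.
SIGMA_BY_CLASS = {"main_city": 1.43, "middle_city": 1.26, "small_town": 1.14}


def urban_coefficient(cell_urban_area, town_area, distance, sigma):
    """Evaluate the first form of Eq. (7) for one grid cell.

    cell_urban_area: A_n, urban land area inside the cell (assumed km^2)
    town_area:       A_m, total area of the town (assumed km^2)
    distance:        r_n, distance from the cell to the town centre (assumed km)
    sigma:           development-stage parameter
    """
    if town_area <= 0 or distance < 0:
        raise ValueError("town_area must be positive and distance non-negative")
    # Functional radius: radius of a circle with the same area as the town.
    functional_radius = math.sqrt(town_area / math.pi)
    decay = math.exp(-((distance / functional_radius) ** sigma))
    return cell_urban_area * math.log(town_area) * decay


# Hypothetical town of 50 km^2 treated as a middle city.
sigma = SIGMA_BY_CLASS["middle_city"]
for r in (0.0, 2.0, 4.0):
    print(r, round(urban_coefficient(1.0, 50.0, r, sigma), 4))
```

Note that ln A_m is positive only when A_m exceeds one area unit, so the choice of unit affects the sign and scale of the coefficient; the text does not state the unit used.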

introduced to select the 196 sample cells from a sorted list of all the cells in the grid of simulated population. For urban population, stratified sampling was applied and 112 cells from towns with different landforms and sizes were used to do the validation. Because it is very difficult to get the population in the 1 km² cells selected for sampling, the average population densities of villages or blocks within sampled cells were used to assess the accuracy of the simulation (Table 6).

The assessment shows that the accuracy of the simulation in rural areas is much higher than that in urban areas. The percentage difference usually is less than 5 in populous regions such as Huanghuaihai plains, Sichuan basin, and the northeast plain. In some regions

Table 6
Statistics of percentage difference between simulated population and sampled population in Landscan and CPDM

Regions     Models             Cells  Percentage difference               Mean absolute relative
                                      >3%    >5%    >7%    >10%   >15%    difference (%)
Rural area  CPDM      Number   196    94     78     69     46     32      3.13
                      %        100    47.96  39.80  35.20  23.47  16.33
            Landscan  Number   196    111    102    89     67     55      5.49
                      %        100    56.63  52.04  45.41  34.18  28.06
Urban area  CPDM      Number   112    62     58     46     39     26      5.26
                      %        100    55.36  51.79  41.07  34.82  23.21
            Landscan  Number   112    76     68     59     50     36      8.17
                      %        100    67.86  60.71  52.68  44.64  32.14
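
The validation arithmetic behind Table 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the sample-size function uses the standard simple-random-sampling formula n = z²p(1 − p)/e², which reproduces the figure of 196 for z = 1.96, p = 0.5 and e = 0.07, but whether it is the formula of Tian et al. (2003) is an assumption. The percentage difference is assumed to be |simulated − sampled| / sampled × 100, and the function names and example densities are hypothetical.

```python
import math


def necessary_sample_size(z_score, proportion, allowable_error):
    """Simple-random-sampling size for a proportion: n = z^2 p (1 - p) / e^2."""
    n = z_score ** 2 * proportion * (1.0 - proportion) / allowable_error ** 2
    # Round before taking the ceiling so float noise cannot add a spurious unit.
    return math.ceil(round(n, 6))


def difference_summary(simulated, sampled, thresholds=(3, 5, 7, 10, 15)):
    """Count cells whose absolute percentage difference exceeds each threshold.

    Sampled values must be positive, because they are used as the denominator.
    """
    diffs = [100.0 * abs(s - o) / o for s, o in zip(simulated, sampled)]
    counts = {t: sum(1 for d in diffs if d > t) for t in thresholds}
    shares = {t: round(100.0 * c / len(diffs), 2) for t, c in counts.items()}
    mean_abs_diff = sum(diffs) / len(diffs)
    return counts, shares, mean_abs_diff


print(necessary_sample_size(1.96, 0.5, 0.07))  # 196 under these assumptions

# Hypothetical densities (persons per km^2), not data from the paper.
counts, shares, mean_abs_diff = difference_summary(
    [104, 98, 120, 50], [100, 100, 100, 50]
)
print(counts, shares, mean_abs_diff)
```

The percentage rows of Table 6 are consistent with this kind of tally: for example, 94 of 196 rural cells is 47.96%.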

Table 7
Comparison of population error between Landscan and CPDM at county and province scales

Level     Model              Administrative  Error
                             unit            >3%    >5%    >7%    >10%   >15%   >20%   >50%
County    Landscan  Number   2317            1586   1199   852    502    236    138    22
                    %        100             67.37  50.93  36.19  21.33  10.03  5.86   0.93
          CPDM      Number   2317            0      0      0      0      0      0      0
Province  Landscan  Number   31              17     9      3      2      1      1      0
                    %        100             54.84  29.03  9.68   6.45   3.23   3.23   0.00
          CPDM      Number   31              0      0      0      0      0      0      0
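
The zero errors reported for CPDM in Table 7 follow from its dasymetric design, in which census counts are distributed to cells and the administrative total is held fixed. The sketch below shows one way such an allocation could work, assuming simple proportional allocation by probability coefficient; Eq. (1) of the paper is not reproduced in this section, so the exact form used by the authors may differ, and the function name and numbers are hypothetical.

```python
def allocate_population(census_total, coefficients):
    """Spread one census count over cells in proportion to their coefficients."""
    total = sum(coefficients)
    if total <= 0:
        raise ValueError("at least one cell needs a positive coefficient")
    return [census_total * v / total for v in coefficients]


# Hypothetical county stratum: 1000 rural residents spread over four cells.
cells = allocate_population(1000, [1.0, 3.0, 0.0, 6.0])
print(cells)
print(sum(cells))
```

Because every cell value is a share of the same census count, summing the cells back to the administrative unit returns that count, up to floating-point rounding.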

with sparse population, the percentage difference is very large, even greater than 100 in a few cells, but the absolute error of population is low, often less than one person. In urban areas, more than half of the sampling cells have more than 3% difference. For some cities, especially for those located in hills and mountains, the average difference is larger than that in other towns because of their complicated spatial structure. In this study, it was assumed that the towns have only one center, which is not accurate for towns with multi-centers or sub-centers. This could explain the discrepancy of urban population density. However, for population simulation at 1 km resolution, the population distribution in towns is not as important as that in rural areas. Moreover, 1 km resolution is insufficient to show the inner details of urban population distribution. Another reason for the differences between the sample and the estimate from the simulation is that the population density used for validation was from villages, which does not correspond exactly to the location of sampling cells.

5.2. Comparisons

The available population datasets at 1 km resolution in China include the Landscan of Dobson et al. (2003), the results from Lo (2001) and the results from Yue et al. (2003). However, the outcome of Yue has not been validated and its digital version is unavailable.

Fig. 6. Comparison of population density in Northeast of China.



Fig. 7. Geographic profiles of population density in five populous regions of China.



Lo estimated only the non-agricultural population of China, with mean relative error of population density (in 17 cities) of 40.84% (Lo, 2001). Landscan is the best available population dataset for comparison with the outcome of this study (CPDM).

To compare the population of CPDM with Landscan, Landscan was re-projected to the Albers projection. To avoid “losing people” when re-projecting Landscan, we first converted Landscan to points, re-projected the points, rasterized the points to a high resolution, and lastly resampled to the 1 km cell size.

The populations of CPDM and Landscan are compared at three scales (sampling cells, county, and province; Tables 6 and 7). Table 6 shows that the percentage difference of CPDM in sampled cells is lower than that of Landscan in both rural and urban areas. A closer examination shows that the cells with higher differences are mainly located along highways and in the suburban areas of cities. The validation of Landscan also shows that the population of cells near transportation lines is overestimated and the population of suburbs is underestimated because the coverage of cities in Landscan is likely a large underestimate. Table 7 shows that the population errors of 1586 counties and 17 provinces in Landscan are more than 3%, which are respectively 67.37 and 54.84% of the counties and provinces of China. Additionally, 21.33% of counties and 6.45% of provinces have population errors greater than 10% in Landscan. However, the errors in CPDM are zero at both county and province scales because of control of total population by county.

Fig. 6 demonstrates the difference between CPDM, Landscan and mean population density of counties in Northeast China. Clearly, CPDM and Landscan contain a much more detailed population distribution than that of the mean density map. It also can be seen that the population of Landscan is more concentrated along the transportation lines than that of CPDM, and the suburbs of cities have more population in CPDM than in Landscan.

Six pairs of population profiles of CPDM and Landscan are plotted in Fig. 7. This figure shows that the ranges of population density in profiles of CPDM are smaller than those of Landscan; many of the cells with high values in Landscan are highways. It also can be noted from Fig. 7 that the rural population of CPDM is slightly higher than that of Landscan, especially in Sichuan basin.

6. Discussion and conclusions

CPDM is a dasymetric interpolation model. Its key function is to distribute census counts to cells based on population probability coefficients. Although many factors affect population distribution (elevation, slope, and roads may all be used as input variables to calculate probability coefficients), it was found that land cover is the best choice for population distribution modeling, because it is highly correlated with many other factors and includes most of the population distribution information.

The raster files of land cover types used in this study were converted from vector land cover polygons using the percentage breakdown encoding method, to avoid the loss of land information in land cover data with a single property (type). The weights of land types were determined by their stepwise regression coefficient, and also controlled by a qualitative geographical rule. The control of total population by rural and urban area at the county scale not only prevented population from being reallocated, but also rendered some national scale factors, such as main roads and cities, unnecessary for inclusion in the model. The difference between the factors influencing population distributions in urban and rural areas was addressed through the use of different modeling algorithms for urban and rural areas. Zonal modeling was used to efficiently reduce the geographic difference of land cover and customize model parameters for local regions. Based on a distance decay function, the model of urban population coefficients was built as a function of urban size and distance from urban center, which greatly simplifies the calculation. Although CPDM can elucidate basic patterns of urban population distribution, some details of the distribution caused by multi-centers or sub-centers are not taken into consideration.

The validation and comparison of the CPDM output shows that the simulation based on land cover is feasible and the outcome has high accuracy. Compared with other models, CPDM has only two input variables, census data and land cover data. It reduces the redundancy of information and thus avoids the overuse of information in the factors affecting population distribution. At the same time, it improves the simulation accuracy and the calculation efficiency. The method presented in this study is applicable to other countries, especially agricultural countries.
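
The rural side of the model summarized above reduces to the weighted sum of Eq. (4), applied to the per-cell land type areas that percentage breakdown encoding provides. The following is a minimal sketch under stated assumptions: the function name, the three-type cell and the weight values are hypothetical and are not taken from Table 5.

```python
def rural_coefficient(areas, weights):
    """Eq. (4): V_jr = sum over land types n of A_jn * W_mn.

    areas:   area of each land type inside grid cell j
    weights: modified regression coefficients of the same land types
             for the agro-ecological zone m containing the cell
    """
    if len(areas) != len(weights):
        raise ValueError("areas and weights must list the same land types")
    return sum(a * w for a, w in zip(areas, weights))


# Hypothetical 1 km^2 cell split among three land types (values invented).
areas = [0.5, 0.25, 0.25]
weights = [0.02, 0.01, 0.0]
print(rural_coefficient(areas, weights))
```

Keeping the area of every land type in a cell, rather than a single dominant type, is what lets minor land types contribute to the coefficient.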

Further research is needed to assess whether improvements to the simulation are possible. For example, using a finer agro-ecological zoning may make the weights of land cover types more credible; more detailed information about urban spatial structure may improve the precision of urban population density estimates; and validating with the actual population in 1 km grid-cells can better depict the source of error in the model.

Acknowledgments

This work is supported by a Project of the National Natural Science Foundation of China (40371094) and the National Basic Research Priorities Program (2002CB412506) of the Ministry of Science and Technology of the People’s Republic of China. The authors would like to thank Dr. Mingkui Cao for taking time to review this paper. The authors also thank the anonymous reviewers for their comments and suggestions on the earlier version of the manuscript.

References

Alexander, R.R., Shields, D.W., 2003. Using land as a control variable in density-dependent bioeconomic models. Ecol. Model. 170, 193–201.
Batty, M., Longley, P.A., 1994. Fractal Cities: A Geometry of Form and Function. Academic Press, London.
Cao, M.K., Ma, S.J., Han, C.R., 1995. Potential productivity and human carrying capacity of agro-ecosystems: an analysis of food production potential of China. Agric. Syst. 47, 387–414.
Chen, B.M., 2001. Integrated Productivity and Carrying Capacity of Agricultural Resources in China. Weather Press, Beijing (in Chinese).
Chen, Y.G., 1999. The fractal model of urban population density. J. Xinyang Normal College 12, 60–64 (in Chinese).
Clark, C., 1951. Urban population densities. J. Roy. Stat. Soc. 114, 490–496.
Clarke, J.T., Rhind, D.W., 1992. Population data and global environmental change. Human Dimensions of Global Environmental Change Programme, Report No. 3. International Social Science Council, Paris.
Demers, M.N., 1997. Fundamentals of Geographic Information System. Wiley, New York.
Dobson, J.E., Bright, E.A., Coleman, P.R., Durfee, R.C., Worley, B.A., 2000. LandScan: a global population database for estimating populations at risk. Photogrammet. Eng. Remote Sens. 66, 849–857.
Dobson, J.E., 2003. Estimating Populations at Risk. In: Geographical Dimensions of Terrorism. Routledge, New York and London, pp. 161–167.
Dobson, J.E., Bright, E.A., Coleman, P.R., Bhaduri, B.L., 2003. LandScan2000: a new global population geography. In: Remotely-Sensed Cities. Taylor and Francis, London, pp. 267–279.
Elvidge, C.D., Baugh, K.E., Kihn, E.A., Kroehl, H.W., Davis, E.R., 1997. Mapping city lights with nighttime data from the DMSP operational linescan system. Photogrammet. Eng. Remote Sens. 63, 727–734.
Feng, J., 2002. Modeling the spatial distribution of urban population density and its evolution in Hangzhou. Geogr. Res. 21, 635–646 (in Chinese).
Hu, H.Y., 1983. Study on the Distribution of Population in China. East China Normal University Press, Shanghai (in Chinese).
Kokko, H., Lindstroem, J., 1998. Seasonal density dependence, timing of mortality, and sustainable harvesting. Ecol. Model. 110, 293–304.
Li, J., Zhuang, D.F., 2002. Analysis of feasible scales of spatial data. Acta Geogr. Sinica 57 (Suppl.), 52–59 (in Chinese).
Liao, S.B., Sun, J.L., 2003. GIS-based spatialization of population census data in Qinghai–Tibet plateau. Acta Geogr. Sinica 58, 25–33 (in Chinese).
Liu, J.Y., 1996. Study on the Micro Survey of Chinese Resources and Environment by Remote Sensing and its Dynamic. Weather Press, Beijing (in Chinese).
Liu, J.Y., Buhe, A.S., 2000. Study on spatial-temporal feature of modern land-use change in China: using remote sensing technique. Quaternary Sci. 3, 229–239 (in Chinese).
Liu, J.Y., Zhuang, D.F., 2003. Land cover classification of China: integrated analysis of AVHRR imagery and geographical data. Int. J. Remote Sens. 24, 2485–2500.
Liu, M.L., Tang, X.M., Liu, J.Y., 2001. Research on scale effect of spatial data of 1 km grids. J. Remote Sens. 5, 183–189 (in Chinese).
Lo, C.P., 2001. Modeling the population of China using DMSP operational linescan system night-time data. Photogrammet. Eng. Remote Sens. 67, 1037–1047.
McCarthy, M.A., Lindenmayer, D.B., 1998. Population density and movement data for predicting mating systems of arboreal marsupials. Ecol. Model. 109, 193–202.
Miller, D.H., Jensen, A.L., Hammill, J.H., 2002. Density dependent matrix model for gray wolf population projection. Ecol. Model. 151, 271–278.
Newling, B.E., 1969. The spatial variation of urban population densities. Geogr. Rev. 59, 242–252.
Sekimura, T., Roose, T., Li, B., Maini, P.K., Suzuki, J., Hara, T., 2000. The effect of population density on shoot morphology of herbs in relation to light capture by leaves. Ecol. Model. 128, 51–62.
Sutton, P., 1997. Modeling population density with night-time satellite imagery and GIS. Comput. Environ. Urban Syst. 21, 227–244.
Sutton, P.C., Elvidge, C., Obremski, T., 2003. Building and evaluating models to estimate ambient population density. Photogrammetr. Eng. Remote Sens. 69 (5), 545–553.
Shen, G.Q., 2002. Fractal dimension and fractal growth of urbanized areas. Int. J. Geogr. Information Sci. 16, 419–437.

Sherratt, G.G., 1960. A model for general urban growth. In: Management Sciences, Model and Techniques, Proceedings of the Sixth International Meeting of Institute of Management Sciences, vol. 2, Pergamon Press, Elmsford, NY, pp. 147–159.
Smeed, R.J., 1961. The Traffic Problem in Towns. Manchester Statistical Society, Manchester.
Tanner, J.C., 1961. Factors Affecting the Amount Travel. Road Research Technical Report No. 51. HMSO, London, pp. 15–220.
Tian, Y.Z., Qiu, D.C., Yang, Q.Y., Yin, W., 2003. An allocation model of urban land price monitoring sites. J. Southwest China Normal Univ. (Nat. Sci.) 28, 313–323 (in Chinese).
Tobler, W., Deichmann, U., Gottsegen, J., Maloy, K., 1997. World population in a grid of spherical quadrilaterals. Int. J. Population Geogr. 3, 203–225.
Wang, F.H., Michel, G.J., 1996. Simulating urban population density with a gravity-based model. Socio-Economic Plann. Sci. 30, 245–256.
William, H.G., 2000. Economics Analysis. Prentice-Hall, New Jersey.
Wright, J.K., 1936. A method of mapping densities of population with Cape Cod as an example. Geogr. Rev. 26, 103–110.
Yang, C.M., Wang, S., 2002. Study on the choice of path for agriculture system in China. Finance Res. 9, 12–22 (in Chinese).
Yang, X.H., Jiang, D., Wang, N.B., 2002. Method of pixelizing population data. Acta Geogr. Sinica 57 (Suppl.), 71–75 (in Chinese).
Ye, Y.X., 2001. City and optimization of land use in 21 century. China Land Sci. 15, 10–13 (in Chinese).
Yue, T.X., Wang, Y.A., Liu, J.Y., Chen, S.P., Qiu, D.S., Deng, X.Z., Liu, M.L., Tian, Y.Z., Su, B.P., 2005a. Surface modelling of human population distribution in China. Ecol. Model. 181 (4), 461–478.
Yue, T.X., Wang, Y.A., Liu, J.Y., Chen, S.P., Tian, Y.Z., Su, B.P., 2005b. MSPD scenarios of spatial distribution of human population in China. Population Environ. 26 (3), 207–228.
Yue, T.X., Wang, Y.A., Chen, S.P., Liu, J.Y., Qiu, D.S., Deng, X.Z., Liu, M.L., Tian, Y.Z., 2003. Numerical simulation of population distribution in China. Population Environ. 25 (3), 249–271.
Zhang, C., Yang, B.G., 1992. Basic of Quantitative Geography. Higher Education Press, Beijing (in Chinese).
Zhang, S.Y., 1997. China’s Population Geography. East China Normal University Press, Shanghai (in Chinese).
Zhong, D.J., 2001. Misplay of China: influence of public domain system and residence-registered system on land resources. New Econ. 3, 18–22 (in Chinese).
Zhou, C.S., Xu, X.Q., 1997. Population distribution and its change in Guangzhou city. Trop. Geogr. 17, 53–60 (in Chinese).
