Applying species distribution modeling for the conservation of Iberian protected invertebrates

Abstract. This article outlines the approaches to modeling the distribution of threatened invertebrates using data from atlases, museums and databases. Species Distribution Models (SDMs) are useful for estimating species’ ranges, identifying suitable habitats, and identifying the primary factors affecting species’ distributions. The study tackles the strategies used to obtain SDMs without reliable absence data while exploring their applications for conservation. I examine the conservation status of Copris species and Graellsia isabelae by delimiting their populations and exploring the effectiveness of protected areas. I show that the method of pseudo-absence selection strongly determines the model obtained, generating different model predictions along the gradient between potential and realized distributions. After assessing the effects of species’ traits and data characteristics on accuracy, I found that species are modeled more accurately when sample sizes are larger, no matter the technique used.

The rapid disappearance of habitats and species starkly contrasts the need to conserve biodiversity against our inability to inventory and protect all species individually. Knowledge about biodiversity remains insufficient because many species are still not described (the "Linnean Shortfall"; Brown and Lomolino 1998) and the distributions of described species often are inadequately defined (the "Wallacean Shortfall"; Lomolino 2004). It is therefore essential to identify threatened species and describe their distributions using approaches that overcome the time and budget constraints of systematic conservation planning. Araújo et al. (2007) demonstrated the need for additional protected areas for the effective conservation of the diversity of plants and vertebrates in the Iberian Peninsula. Preliminary data suggest that the existing network of reserves also would be ineffective in representing invertebrate species (Verdú and Galante 2009). Unfortunately, the conservation of invertebrates faces serious challenges due to their high diversity, complex life cycles and difficult taxonomy, among other factors (see New 1998).
Geographic Information Systems (GIS) have significantly advanced the conservation of endangered species because they allow us to delimit species' potential distributions (e.g. Hortal et al. 2005), to control their populations (e.g. Davies et al. 2005), to analyze their niche (Peterson et al. 2002), to design networks of protected areas (e.g. Pearce and Boyce 2006), and to forecast their future distributions (e.g. Hill et al. 2002). Together, the databases taken from atlases, museums and herbaria have emerged as a valuable source of species' occurrence records (e.g. Elith and Leathwick 2007). Unfortunately, these data from heterogeneous sources may contain errors or may have been obtained using a biased sampling procedure (Newbold 2010). Besides, they do not usually provide the reliable absences needed to build consistent predictive models (Anderson et al. 2003), so alternatives have been sought that generate models based only on presences (Hirzel et al. 2002, Pearce and Boyce 2006), sometimes employing pseudo-absences obtained in different ways (Zaniewski et al. 2002, Engler et al. 2004, Lobo et al. 2006, 2010). For my doctoral thesis, I evaluated the utility of SDMs for the conservation of threatened invertebrates in the Iberian Peninsula (Chefaoui 2010). The majority of the species studied here have been designated by the European Union as species of "community interest" requiring protection and conservation (Habitats Directive). I used presence-only data on Iberian threatened invertebrates obtained from museums, atlases and databases. I applied presence-only methods such as ENFA (Ecological Niche Factor Analysis) and MDE (Multi-Dimensional Niche Envelope), in addition to other methods that require presences and absences (here, pseudo-absences): GAM (Generalized Additive Models), GLM (Generalized Linear Models) and NNET (Neural Network Models).
I approached methodological issues concerning the difficulties associated with predicting the distribution of species when reliable absence data are not available, and explored the possibilities of SDMs as a tool for conservation of endangered and threatened Iberian invertebrates. In this respect, I explored the applications of SDM to estimate species ranges, identify suitable habitats and the primary factors affecting species' distribution in order to assess the conservation status of threatened invertebrates.
Dung beetle populations, which are in decline in the Iberian Peninsula, play a critical ecological role in extensive pasture ecosystems by recycling organic matter. We delimited the potential distribution of the two species of Copris (Coleoptera, Scarabaeidae) that inhabit the Iberian Peninsula using ENFA (Chefaoui et al. 2005). ENFA is a presence-only method that compares the environmental values of the localities where the species has been observed with the environmental values of the whole territory studied (Hirzel et al. 2002). We explored the environmental niche occupied by each species in a small region, the Community of Madrid (CM), to restrict the role of dispersal constraints, discriminating possible areas of co-occurrence and identifying the specific environmental characteristics of each species. We found that solar radiation and the presence of calcareous soils are critical to the presence of Copris hispanus, while Copris lunaris requires siliceous soils and high rainfall. Both Copris species are distributed along a geographic and environmental gradient from the Tajo basin (warmer, drier, with strong annual weather variations), where only C. hispanus is found, towards the mountain slopes of the Sistema Central (colder, higher rainfall), where C. lunaris predominates. The environmental niches of both species are distributed along a Dry-Mediterranean to Wet-Alpine axis, and overlap in areas of moderate temperature and precipitation in the north of CM.
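As a rough illustration of the comparison ENFA makes, the sketch below computes two scores for a single environmental variable, in the spirit of the univariate definitions given by Hirzel et al. (2002): a marginality score (how far the mean of the presence localities lies from the mean of the study area) and a specialization score (how narrow the species' range of values is relative to the area). The full ENFA is a multivariate factor analysis, which this sketch does not reproduce. The function names and the rainfall values are hypothetical and are not data from the Copris study.

```python
from statistics import mean, pstdev


def marginality(area_values, presence_values):
    """Absolute difference between the area mean and the presence mean,
    scaled by 1.96 standard deviations of the area values."""
    return abs(mean(area_values) - mean(presence_values)) / (1.96 * pstdev(area_values))


def specialization(area_values, presence_values):
    """Ratio of the area's standard deviation to that of the presences.
    Values above 1 indicate a narrower range than the study area offers."""
    return pstdev(area_values) / pstdev(presence_values)


# Hypothetical annual rainfall (mm) for all cells of a study area
area_rainfall = [400, 450, 500, 550, 600, 700, 800, 900, 1000, 1100]
# Hypothetical rainfall at the cells where the species was recorded
presence_rainfall = [800, 900, 1000, 1100]

m = marginality(area_rainfall, presence_rainfall)
s = specialization(area_rainfall, presence_rainfall)
print(f"marginality={m:.3f}, specialization={s:.3f}")
```

In this toy case the species occupies the wetter end of the gradient (marginality well above zero) and a narrower rainfall range than the area as a whole (specialization above 1).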
We also studied the degree of protection of key populations of C. hispanus and C. lunaris, making a proposal to improve their conservation. To evaluate the conservation status of Copris species, we took into account the size of protected sites as well as the values of habitat suitability in each protected natural site and in the Natura 2000 network. We found that Copris species were poorly conserved in the previous network of protected sites: for C. hispanus only two protected sites measured around 30 km², and for C. lunaris a single area measured 183 km². However, protection provided by Sites of Community Importance (SCIs) seems to improve the general conservation status of these species in CM because the area and connectivity of protected sites have been increased substantially.
Chefaoui and Lobo (2008) assessed the effects of pseudo-absences on model performance when reliable absence data are not available. We compared seven procedures for generating pseudo-absence data to be used in logistic regression models (GLM). These pseudo-absences were selected randomly or by means of presence-only methods (ENFA and MDE) to model the distribution of a threatened endemic Iberian moth species (Graellsia isabelae). Our purpose was to show that different forecasted distributions can be obtained depending on the method and the threshold used to select these pseudo-absences.
The results showed that the pseudo-absence selection method greatly influenced the percentage of explained variability, the scores of the accuracy measures and, most importantly, the predicted range size. As we extracted pseudo-absences from environmental regions further from the optimum established by presence data, the models obtained better accuracy scores, and over-prediction increased. Conversely, the profile techniques that generated wider unsuitable areas produced functions with lower percentages of explained deviance and poorer accuracy scores, but more restricted predictive distribution maps, similar to the observed distribution. The random selection of pseudo-absences generated the most constrained predictive distribution map.
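The two families of pseudo-absence selection compared here can be pictured with a minimal sketch on a single environmental variable. It is not the procedure used in Chefaoui and Lobo (2008): the min/max envelope is only a one-variable stand-in for a profile method such as MDE, and the `margin` parameter, the function names and the temperature values are all assumptions made for illustration.

```python
import random


def envelope(presences):
    """Min/max range of the presence values (a crude one-variable profile)."""
    return min(presences), max(presences)


def random_pseudo_absences(background, n, rng):
    """Draw pseudo-absences at random from all background cells."""
    return rng.sample(background, n)


def profile_pseudo_absences(background, presences, n, rng, margin=0.0):
    """Draw pseudo-absences only from cells outside the presence envelope.

    A larger `margin` widens the excluded band, pushing pseudo-absences
    further from the conditions where the species was recorded.
    """
    low, high = envelope(presences)
    candidates = [v for v in background if v < low - margin or v > high + margin]
    return rng.sample(candidates, n)


rng = random.Random(42)  # fixed seed so the draw is repeatable
background = [t / 10 for t in range(0, 301)]  # hypothetical temperatures, 0.0 to 30.0
presences = [12.0, 13.5, 14.0, 15.2, 16.8]    # hypothetical presence records

random_pa = random_pseudo_absences(background, 20, rng)
profile_pa = profile_pseudo_absences(background, presences, 20, rng, margin=2.0)
```

Fitting the same model to `presences` plus each pseudo-absence set in turn would then show how the choice of selection procedure, and of `margin`, shifts the predicted range.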
Based on results of the aforementioned work, we identified the environmental variables most relevant for explaining the distribution of Graellsia isabelae and assessed this species' conservation status (Chefaoui and Lobo 2007). We modeled the potential distribution of the insect by performing GLM with pseudo-absence data selected from an ENFA model. We found that the best predictor variables were summer precipitation (ranging from 1250 mm to 3250 mm), aridity, and mean elevation. This species prefers habitats with mid-range mountain conditions. With respect to host plants, the presence of G. isabelae was associated mainly with Pinus sylvestris and P. nigra.
Moreover, we found eight areas exclusively in the eastern Iberian territory, and a larger unoccupied habitat in the western Iberian Peninsula, indicating that this species is probably not in equilibrium with its environment because of historical factors (Chefaoui and Lobo 2007). We suggested that the current distribution of the species was associated with the dynamism of its host plants during glacial periods of the Holocene, when the forests of Pinus sylvestris decreased strongly in the northwestern part of the peninsula. After analyzing the connectivity and fragmentation of the eight populations delimited, as well as the degree of protection of G. isabelae within the SCIs, we found that the SCIs under protection did not seem sufficient to maintain current populations. Moreover, our study rejected the idea that the species was expanding its range due to reforestation. Because the conservation of G. isabelae depends on the forests of Pinus sylvestris and P. nigra located both inside and near SCIs, we suggested that the reintroduction of the species in these habitats could improve its conservation.
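One simple way to picture a degree-of-protection assessment is as an overlay of a habitat suitability map and a map of protected sites. The sketch below computes the share of suitable cells that fall inside protected sites. The grids, the suitability threshold and the function name are hypothetical, and the overlay ignores site size, connectivity and fragmentation, which the studies summarized here also considered.

```python
def protected_share(suitability, protected, threshold):
    """Fraction of suitable cells (suitability >= threshold) lying inside
    protected sites. Returns 0.0 when no cell reaches the threshold."""
    flags = [
        is_protected
        for suit_row, prot_row in zip(suitability, protected)
        for value, is_protected in zip(suit_row, prot_row)
        if value >= threshold
    ]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Hypothetical 3 x 4 habitat suitability grid (0 = unsuitable, 1 = optimal)
suitability = [
    [0.9, 0.8, 0.2, 0.1],
    [0.7, 0.6, 0.3, 0.1],
    [0.2, 0.5, 0.4, 0.0],
]
# Hypothetical mask of cells inside protected sites
protected = [
    [True, False, False, False],
    [True, True, False, False],
    [False, False, False, True],
]

share = protected_share(suitability, protected, threshold=0.5)
print(f"{share:.0%} of suitable cells are protected")
```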
To understand the limitations and possibilities of SDM techniques, we evaluated the effects of species' traits and data characteristics on the accuracy of SDMs for red-listed invertebrates (Chefaoui et al. 2011). We applied three SDM techniques (GAM, GLM and NNET) using pseudo-absences to model the distribution of 20 threatened Iberian invertebrates. We correlated the accuracy of the obtained models with several data characteristics and species' ecological traits. We examined two data characteristics, the amount of data (N) and the relative occurrence area (ROA), and both significantly affected the accuracy of the models. Greater AUC values and higher sensitivity scores were obtained for species with more than 200 records. In general, the species whose distributions were most accurately modeled were those with a greater sample size or smaller ROA. In addition, species related to habitats that are problematic to detect using GIS data, such as riparian or humid areas, seemed to be more difficult to predict.
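For readers unfamiliar with the two accuracy measures mentioned above, the following minimal sketch shows how they can be computed from model scores. AUC is calculated here through its rank interpretation: the probability that a randomly chosen presence receives a higher score than a randomly chosen (pseudo-)absence. The scores and the 0.5 threshold are hypothetical and are not results from the thesis.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence scores higher than a random
    (pseudo-)absence, with ties counted as half a win."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def sensitivity(presence_scores, threshold):
    """Share of known presences predicted as present at the threshold."""
    hits = sum(1 for p in presence_scores if p >= threshold)
    return hits / len(presence_scores)


# Hypothetical predicted suitabilities for held-out records
presence_scores = [0.9, 0.8, 0.7, 0.4]
pseudo_absence_scores = [0.6, 0.3, 0.2, 0.1]

model_auc = auc(presence_scores, pseudo_absence_scores)
model_sensitivity = sensitivity(presence_scores, threshold=0.5)
print(model_auc, model_sensitivity)
```

Because the absences are pseudo-absences rather than confirmed absences, an AUC computed this way measures how well the model separates presences from the selected background, which is one reason the selection procedure affects the accuracy scores.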

Summary
The performance of SDMs depends on the type of data and the characteristics of the species. Presence-only methods (ENFA and MDE) achieved worse validation results and overpredicted more than techniques using pseudo-absences. Nevertheless, presence-only methods can be very useful for obtaining pseudo-absences and discovering the environmental response of species. The method of pseudo-absence selection strongly determined the predicted range size, generating different model predictions along the gradient between potential and realized distributions. There is an added difficulty in obtaining predictions that closely approximate the realized distribution of species under non-equilibrium conditions, because both presence and absence data may be possible under similar environmental conditions. Irrespective of the approach used, species' distributions are modeled more accurately when sample sizes are larger. Species in habitats that are difficult to detect using GIS data, such as riparian species, thus may tend to be more difficult than most to predict.

Frontiers of Biogeography 3.3, 2011. © 2011 the authors; journal compilation © 2011 The International Biogeography Society

Availability of thesis
Printed and PDF copies are available in the Science Faculty Library, Universidad Autónoma de Madrid (http://biblioteca.uam.es/ciencias/). A PDF copy is also available on request from the author.