geo-perspective Recent advances in probabilistic species pool delineations

Abstract. A species pool is the set of species that could potentially colonize and establish within a community. The concept has been commonly used in biogeography since MacArthur and Wilson's early work on island biogeography. Despite this simple and appealing definition, the operational application of species pools is beset by a multitude of problems, which have often resulted in arbitrary decisions and workarounds when defining them. Two recently published papers address these operational problems and show ways of delineating species pools in a probabilistic fashion. In both papers, species pools were delineated using a process-based, mechanistic approach, which opens the door to a multitude of new applications in biogeography. Such applications include detecting the hidden signature of biotic interactions, disentangling the geographical structure of community assembly processes, and incorporating a temporal extent into species pools. Although similar in their conclusions, the two 'probabilistic approaches' differ in their implementation and definitions. Here I give a brief overview of the differences and similarities of both approaches, and identify the challenges and advantages in their application.


Introduction
The species pool is a common concept used in theoretical and applied studies in biogeography (Ricklefs 1987, Cornell and Harrison 2014). A species pool is the set of species that could potentially colonize and establish within a community (Zobel 1997, Zobel et al. 1998, Pärtel et al. 2011a, Lessard et al. 2012). Despite such an initially simple, appealing definition, the operational application of a species pool is beset by a multitude of problems, often resulting in arbitrary decisions, such as fixed geographical extents of regional pools, or thresholds on environmental parameters. These fixed extents and thresholds are often applied simply because we lack a quantitative mechanism to ascertain which species belong to the species pool and which do not. Does a spatial extent of a few km² surrounding a site capture all species which can potentially occur in it? Is an environmental threshold of 5°C sufficient to decide whether a species belongs to the species pool or not? Do we simply use all species for which we have data? These questions accumulate when researchers are faced with the need to delineate a species pool, and so does the associated problem of not having an empirical answer to many of them.
While carefully selected thresholds can be appropriate for some research questions (Lewis et al. 2016), arbitrary choices are generally undesirable, and can even have severe effects on the outcome of studies of community-assembly processes (Kraft et al. 2007, Kissling et al. 2012, Eiserhardt et al. 2013, Karger et al. 2015). The definition of a species pool itself suggests that in practice species pools should be treated probabilistically. This is an important point, which has been made by several authors before (e.g. Ewald 2002, Karlson & Cornell 2002, Graves & Rahbek 2005), but unfortunately it has often been ignored in biogeographical studies because of the practical problems that applying it would create.
Two recent papers (Lessard et al. 2016, Karger et al. 2016), however, take up the practical challenge of avoiding arbitrary decisions when delineating species pools. They highlight how species pools can be delineated in a probabilistic fashion, how they can be used to detect biotic interactions, disentangle the geographical structure of community-assembly processes (Karger et al. 2016), or investigate time-related processes of community assembly (Karger et al. 2016). Although similar in their conclusions, the two 'probabilistic approaches' differ in their implementation and definitions. These distinctions are important if probabilistic species pools are implemented in the future, and here I therefore give a short overview of the differences and similarities of both approaches.

opinions, perspectives & reviews
frontiers of biogeography 8.2, e30545, 2016

Figure 1. Comparison of a site-specific (black) and species-specific (red) approach. (a) A site-specific approach is based on a fixed distance (d) from a focal site to an occurrence, while (b) a species-specific approach is based on a dispersal vector. If d and the dispersal vector are the same, both approaches will result in the same probability for a species to be part of the species pool. If, however, two species with different dispersal abilities are considered (here shown as different dispersal kernels), the probabilities will differ.

Site-specific vs. species-specific approaches
Both approaches (Lessard et al. 2016, Karger et al. 2016) are probabilistic, but differ in the way the probabilities are calculated. The method used by Lessard et al. (2016) can be called a 'site-specific' approach, while that adopted by Karger et al. (2016) is a 'species-specific' one. A 'site-specific approach' should not be confused with a 'site-specific species pool'; species pools are always linked to a focal site, by definition. In the approach used by Lessard et al. (2016), all probabilities are calculated based on a directional vector originating at the site (Fig. 1a). To do so, Lessard et al. (2016) use the distance from a focal site to define probabilities of dispersal. The further away another site is, the lower the probability that a species occurring at this distant site is part of the species pool of the focal site. Therefore, all species at an equal distance from the focal site have the same probability of being part of the species pool.
The approach by Karger et al. (2016) differs in that it is 'species-specific'. Probabilities of dispersal (P_DD) are in this case given by a dispersal kernel centred on each occurrence of a species. In this approach, species which are at the same distance from a focal site (or focal unit, sensu Karger et al. 2016) do not necessarily have the same probability of being part of the species pool. Their probability instead depends on a species-specific dispersal rate, estimated using dispersal traits or simply set to a fixed distance per year.
Both approaches have their advantages and drawbacks. Karger et al.'s (2016) approach ideally needs very detailed information on dispersal rates for each species that could occur in a focal unit, information which is often not available. Even though dispersal rates are known for a few species, and are available via a few databases (e.g. the TRY and D3 databases for plant traits), information on actual dispersal rates is typically sketchy. A practical alternative, for the time being, is to estimate dispersal from related morphological or ecological traits (Tamme et al. 2014).
Using a fixed distance from a focal site, as Lessard et al. (2016) did, is therefore an easy workaround for this problem, and can certainly be justified in specific cases. In the case of Lessard et al. (2016), who studied hummingbirds, a rather homogeneous taxonomic group with comparable dispersal traits, equal dispersal probabilities between species might be appropriate. This becomes problematic, however, if such an approach is extended to more than one taxonomic group with heterogeneous dispersal traits. Mathematically, both approaches will result in the same probabilities if fixed distances are used (Fig. 1a). However, they differ as soon as different dispersal rates per species are assumed (Fig. 1b).
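
The contrast between the two calculations can be illustrated with a minimal sketch. It assumes a negative-exponential dispersal kernel and considers only the nearest occurrence of each species; both are simplifications chosen for illustration and are not taken from either paper. All function names, species names, and parameter values are hypothetical.

```python
import math


def dispersal_probability(distance_km, scale_km):
    """Negative-exponential kernel (an assumed form, not from either paper)."""
    return math.exp(-distance_km / scale_km)


def site_specific(distances_km, shared_scale_km):
    """One distance-decay scale for all species, measured from the focal site."""
    return {sp: dispersal_probability(d, shared_scale_km)
            for sp, d in distances_km.items()}


def species_specific(distances_km, scales_km):
    """A separate kernel scale per species (nearest occurrence only)."""
    return {sp: dispersal_probability(d, scales_km[sp])
            for sp, d in distances_km.items()}


# Two hypothetical species, both occurring 50 km from the focal site.
distances = {"good_disperser": 50.0, "poor_disperser": 50.0}

# Site-specific: equal distance implies equal probability.
site_probs = site_specific(distances, shared_scale_km=25.0)

# Species-specific with identical scales reproduces the site-specific result.
equal_scale_probs = species_specific(
    distances, {"good_disperser": 25.0, "poor_disperser": 25.0})

# Species-specific with different scales: probabilities now differ.
species_probs = species_specific(
    distances, {"good_disperser": 100.0, "poor_disperser": 5.0})

for name in distances:
    print(name, round(site_probs[name], 4), round(species_probs[name], 4))
```

With a shared scale the two species receive identical probabilities, whereas species-specific scales separate them even though both occur at the same distance from the focal site.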

Integrating phylogenies (Lessard et al. 2016) and time frames (Karger et al. 2016)
Despite several similarities, each publication also contains unique aspects. The integration of probabilistic species pools into an analysis of phylogenetic community composition is certainly the most innovative part of the Lessard et al. (2016) paper. They were able to show that shifting the environmental threshold applied to the species pool can severely influence the inferences that can be drawn from phylogenetic community patterns. This important contribution shows that we have to rethink a decade of analysing community phylogenetic patterns. Karger et al. (2016) did not mention phylogenies, but introduced explicit time frames for which species pools are defined. Although time only entered their equations via the rates of dispersal over time, a full integration into environmental and biotic factors could be achieved in a similar manner. The quantification of time-related processes such as exact dispersal rates, changes in environmental conditions and biotic interactions might pose the largest challenge in applying their approach in the future. However, it also opens up a wide range of possibilities to investigate the influence of time-related processes on the size and composition of species pools, which have so far been elusive.

Detecting biotic processes
Detecting biotic processes in ecological communities is at best challenging, as it usually requires detailed observational data for a large number of species, or a long-term experimental setup. A method to detect such biotic processes indirectly might therefore offer a neat way around the time-consuming exercise of measuring them directly (e.g. de Bello et al. 2012). Lessard et al. (2016) indirectly detected a signal of biotic processes in phylogenetic community patterns. By first accounting for environmental filtering, they constructed a null community in which the effects of environmental filtering are already accounted for. The subsequent analysis revealed patterns resembling those expected under competition. Although this procedure does not get around the common problem of inferring processes from patterns of phylogenetic relatedness, it does illustrate how species pools can be used to disentangle processes from each other in a comparably simple and practicable way.
The Karger et al. (2016) paper argues in a similar direction, although using a simulation approach based on a mechanistic niche model. By simulating differently nested species pools (a dispersal pool, a dispersal + environmentally filtered pool, and a dispersal + environment + biotically filtered pool), they show that increasing the constraints on species pools makes the pools increasingly resemble realized assemblages. The remaining discrepancies between realized communities and species pools are then attributed to the processes not included in the species pool delineation (e.g. if the species pool includes dispersal and environment, then the remaining variance can be attributed to biotic processes). This is a line of argument, found within both papers, that is similar to the concept of 'dark diversity' (Mokany and Paini 2011, Pärtel et al. 2011a, 2011b, Ronk et al. 2015): whatever difference remains between realized and simulated community structure can be attributed to the processes not included in a species pool delineation.
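
The nesting of pools can be sketched as follows. The sketch assumes that the filters act independently and are combined as a simple product of per-species probabilities; this is an illustrative assumption, not necessarily the exact formulation used in either paper. The function name, pool labels, and all values are hypothetical.

```python
def nested_pools(p_dispersal, p_environment, p_biotic):
    """Membership probabilities for three nested pools per species.

    Assumes independent filters combined as a product (illustrative only).
    """
    pools = {}
    for sp in p_dispersal:
        d = p_dispersal[sp]            # dispersal pool
        de = d * p_environment[sp]     # dispersal + environment
        deb = de * p_biotic[sp]        # dispersal + environment + biotic
        pools[sp] = {"D": d, "DxE": de, "DxExB": deb}
    return pools


# Hypothetical per-species probabilities for one focal unit.
p_disp = {"sp_a": 0.9, "sp_b": 0.4}
p_env = {"sp_a": 0.5, "sp_b": 0.8}
p_bio = {"sp_a": 1.0, "sp_b": 0.25}

pools = nested_pools(p_disp, p_env, p_bio)
for sp, probs in pools.items():
    print(sp, {k: round(v, 3) for k, v in probs.items()})
```

Under this assumption each additional filter can only lower, or leave unchanged, a species' probability of pool membership, which is why more constrained pools move closer to the realized assemblage.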

Geographical patterns of community assembly processes
Similar to the concepts used to detect biotic processes in a single assemblage, both papers also show the geographical distribution of different processes. From Lessard et al. (2016), a geographical distribution of biotic processes can be inferred from the differences in phylogenetic clustering after accounting for environmental filtering. Increasing the strength of, for example, temperature filtering on the species pool (going from the 0th quantile to the 99th quantile) shows that biotic processes are not independent of the elevational distribution of the sampled assemblages (Fig. 2a). Karger et al. (2016) showed the geographical distribution of assembly processes by highlighting the difference between differently filtered pools and realized community structure for a gridded dataset. They calculated the area under the receiver operating characteristic curve (AUC) to characterize where dispersal and environmentally filtered species pools of Ranunculaceae described realized assemblage structure to a large degree (Fig. 2b, orange). The remaining variance can then be attributed to processes not included in the species pool definition (arguably biotic processes) (Fig. 2b, blue), allowing, for the first time, the study of the geographical distribution of different assembly processes.
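
How an AUC value summarizes the match between a probabilistic pool and a realized assemblage can be sketched for a single grid cell. The rank-based (Mann-Whitney) formulation below is a standard way of computing AUC and is not necessarily the implementation used by Karger et al. (2016); the function name and the data are hypothetical.

```python
def auc(presence, probability):
    """Rank-based AUC: chance that a present species has a higher pool
    probability than an absent one (ties count as one half)."""
    present = [p for y, p in zip(presence, probability) if y == 1]
    absent = [p for y, p in zip(presence, probability) if y == 0]
    if not present or not absent:
        raise ValueError("need at least one present and one absent species")
    wins = 0.0
    for a in present:
        for b in absent:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(present) * len(absent))


# Hypothetical grid cell: observed presence/absence of six species and
# their probabilities of belonging to a dispersal + environment pool.
observed = [1, 1, 0, 1, 0, 0]
pool_probability = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]

cell_auc = auc(observed, pool_probability)
print(round(cell_auc, 3))
```

A value near 1 indicates that the processes included in the pool largely account for the realized assemblage in that cell, whereas a value near 0.5 suggests that processes left out of the delineation may be important there.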

Concluding remarks
The recurring problem of arbitrary choices in species pool delineations can, to a large degree, be avoided by using a functional (de Bello et al. 2012) and probabilistic species pool (Ewald 2002, Karger et al. 2016).
However, the application of such an approach is hindered mainly by limited data availability, and few applicable methods therefore exist so far. The benefits of probabilistic species pools, such as the detection of biotic processes, the mapping of their geographical distribution, and the possibility of creating time-dependent species pools, could easily outweigh the effort needed to delineate them.
An important distinction is that, although both the approaches presented here are 'probabilistic', they differ in the way they are calculated; in many cases the difference between 'site-specific' and 'species-specific' approaches can lead to different probabilities of species pool membership. One needs to decide which approach fits better into a given study system. The two papers focused on herein represent a way out of the myriad arbitrary ways to delineate species pools. Although still difficult to implement, the currently increasing amount of species-specific data calls for increasing application of probabilistic frameworks instead of the still commonly used arbitrary delineations of species pools.

Figure 2.
Two examples of how the elevational and geographical distribution of community assembly processes can be inferred from probabilistic species pools. (a) Applying a range of quantile thresholds to the probabilities used to include or exclude species from the pool shows where biotic processes are operational. The investigated hummingbird assemblages in the lowlands below 1000 m elevation do not seem to be affected by biotic processes, while those above 1000 m become phylogenetically more even (lower NRI) when the environmental filtering is increased (black circles). (b) Delineating different probabilistic species pools for Ranunculaceae in Germany and comparing them with realized communities can indicate the areas in which the processes not included in the species pool delineations might be important for community assembly (indicated by low AUC values, p/a ~ P = present/absent ~ Probability). Abbreviations: Ψ = probabilistic species pool; process subscripts: DRT = dispersal based on suitable dispersal pathways, DD = dispersal based on a fixed distance, ER = environmental suitability based on niche modelling, EB = environmental suitability based on Beals smoothing; x denotes combinations of factor groups.