A Bayesian hierarchical model of postlarval delta smelt entrainment: integrating transport, length composition, and sampling efficiency in estimates of loss

Hydrodynamic models have been used to estimate rates of ichthyoplankton transport across marine and estuarine environments and subsequent geographic isolation of a portion of the population (i.e., entrainment). Combining simulated data from hydrodynamic models with data from fish populations can provide more information, including estimates of regional abundance. Entrainment of postlarval delta smelt (Hypomesus transpacificus), a threatened species endemic to California’s Sacramento–San Joaquin Delta, caused by water export operations, was modeled using a Bayesian hierarchical model. The model was fit using data spanning years 1995–2015 from multiple sources: hydrodynamic particle tracking, fish length composition, mark–recapture, and count data from entrainment monitoring. Estimates of the entrainment of postlarval delta smelt ranged from 10 (SD = 23) in May 2006 to 561 791 (SD = 246 423) in May 2002. A simulation study indicated that all model parameters were estimable, but errors in transport data led to biased estimates of entrainment. Using only single data sources rather than integration through hierarchical modeling would have underestimated uncertainty in entrainment estimates or resulted in bias if transport, survival, or sampling efficiency were unaccounted for.

Entrainment is a physical process whereby groups are advected and spatially redistributed through the 37 action of moving water, resulting in geographic isolation of the entrained portion of the population.

38
Hydrodynamic variation can have profound effects on entrainment of ichthyoplankton, and models that 39 do not account for these hydrodynamics may result in severely biased estimates of entrainment (Blumberg 40 et al. 2004;White et al. 2010). Water movement may be natural (Lough and Manning 2001) or the result 41 of water diversion for human use, such as power generation (Kelso and Milburn 1979) or water extraction D r a f t 60 recapture studies often yield information about survival and sampling efficiency. However, such data are 61 rarely integrated with transport data in models of entrainment.

63
Integrated data analysis is a useful technique for combining information from multiple sources to allow 64 estimation of parameters not identifiable from a single data source (Besbeas et al. 2002), and such 65 analyses are commonly used to model fish population dynamics and assess the status of fish stocks 66 (Methot and Wetzel 2013). Integration of particle tracking data with a population dynamics model of 67 abundance, however, is uncommon. Most studies using particle tracking data to model entrainment do not 68 attempt to estimate abundance but are designed to validate hypothesized interactions between source 69 location, advection, and behavior (Miller 2007); thus, particle tracking data are compared to monitoring 70 data in correlative analyses rather than integrated analyses within a single model (Baumann et al. 2006;71 Hinckley et al. 2016). Hierarchical modeling is one approach to integrate disparate data types, through use 72 of a multi-level structure (Rochette et al. 2013). Importantly, a hierarchy can facilitate greater statistical 73 power through reduced parameterization and partitioning of errors. A hierarchical approach to modeling 74 abundance of entrained populations can combine independent forms of data in order to model fish 75 population and observation dynamics, such as transport probabilities, survival, length or age composition, 76 and sampling efficiency.

78
Previous applications of particle tracking data have generally sought to model spatial structure in two or 79 three dimensions, because the initial distribution, or origin, of fish is a critical factor (Heimbuch et al. 84 are very small relative to the volume of the system, and the effect of water diversion on system 85 hydrodynamics is minimal. Conversely, water extraction from smaller riverine systems for irrigation may D r a f t 86 represent a much higher volume relative to the system and have a greater impact on hydrodynamics, 87 potentially pulling ichthyoplankton along a simplified linear route. For instance, as much as 35% of 88 inflow to the Sacramento-San Joaquin Delta is exported during April-June (CSWR 1999), and exports 89 rates are often sufficient to reverse the net flows in the two rivers conveying water to the pumps, leading 90 to entrainment events. If the probability of transport from a region of low escapement to sampling 91 locations, e.g., the water diversions, is known, then a simplified 1-dimensional transport model may

272
Sub-processes were modeled as a sequence of probabilities, the product of which was the probability that 273 a fish passing through the source region was observed at the fish facilities. Our primary goal was to 274 estimate the number of postlarval delta smelt at risk of entrainment, either because they entered the 275 facility intake channels or because they were in the South Delta and therefore subject to higher risk of 276 mortality or isolation from the population. A general overview of the model for each sub-process is 277 below, but detailed mathematical descriptions of each sub-process model may be found in Appendix B.

279
Sub-process 1: Transport of fish to facilities. Transport probabilities in year t and month v from TR qtvs 280 the source region q to one of four sinks s (Table 1), or final particle locations (labeled SWP, CVP, LRZ,

281
or IRZ), were estimated from observed particle tracking data. We used a Dirichlet-multinomial model to 282 describe the transport sub-process, where the Dirichlet component accounted for state process variability 283 and the multinomial component accounted for observation error in the particle tracking model data. As 284 previously stated, particle tracking data were generated for three potential source regions (Fig. 1), but 285 only one source region was selected to represent delta smelt transport in a single month. Separate models 286 were fit to each source region's particle tracking data, but only a subset of the probabilities estimated for 287 each source region were applied to delta smelt.

289
Sub-process 2: Survival. During transport from the source to sink regions and before sampling at fish 290 facilities, fish were exposed to predation, starvation and thermally-induced mortality, also known as pre-291 screen loss (Gingras 1997;Castillo et al. 2012). No estimates of South Delta mortality exist, but we D r a f t 292 hypothesized that mortality during transport to the SWP and CVP was a function of transport time.

293
Longer transport times resulted in greater exposure to predation and higher mortality. Mortality of fish 294 transported to the LRZ was not modeled, and all fish passing through the source region and transported to 295 the IRZ were assumed to be removed from the population. Survival probability for the source SV1 qtv 296 region selected as the source for a year-month was applied to all fish arriving at the SWP and CVP.

298
Additional mortality is experienced by fish going to the SWP. Before entering the SWP intake channel,

299
fish cross a large body of water, Clifton Court Forebay, where they are vulnerable to predation. SV2 tv1 300 was the probability of a delta smelt surviving the journey from the Forebay radial gates (where the 301 forebay connects to Old River) to the SWP intake channel in a given year and month. Unlike the other 302 sub-processes in the entrainment model, this sub-process was not directly observed or informed by any 303 data source. Because CVP does not have a forebay, we assumed fish transported to CVP were not subject 304 to this source of mortality and set SV2 tv2 = 1.

306
Sub-process 3: Population length structure. We modeled the length structure, or length frequency 307 distribution, of the postlarval delta smelt population so we could use this information in the next sub-308 process, which addresses size selectivity of the louvers. As in the transport sub-process, we used a D r a f t : Sampling efficiency at fish facilities. The sampling efficiency sub-process handled 317 different aspects of how fish facility gear captured entrained fish and was divided into two sampling 318 probabilities. The fish facility louvers elicit an avoidance behavior that cause entrained fish to move into 319 bypass channels that divert them into holding tanks at the fish facilities. Louvers are spaced 320 approximately 20 mm apart. Delta smelt can easily pass through them. We assumed the louvers were size 321 selective, and that the probability of a delta smelt being diverted increases as fish size increases.

322
Conditional probabilities that a fish in the length sample from the fish facilities is of length l (denoted

323
) depended on population length structure and the length-based selectivity of the louvers .

401
For the simple multinomial model, which assumed known total recoveries at SWP and CVP and known 402 values, maximum likelihood estimates of were unbiased, as expected, and were relatively precise TR S 403 for the given sets of values (Table 2). Assuming that transport was the only source of variation in the 404 number of fish arriving at SWP and CVP, estimability of seems a reasonable assumption. This S 405 conclusion was reinforced by the second set of more complicated simulations that is discussed next.

407
The second set of simulations explored estimability of entrainment transport, survival, population length, 408 sampling efficiency and selectivity parameters in the more complex Bayesian hierarchical model. All

409
parameters, including the number of entrained fish, appeared to be estimable using the integrated data 410 analysis and Bayesian hierarchical model described here. We concluded that parameters were estimable if 411 z-scores were between -2 and 2, posterior contraction values were greater than 0, and the 95% coverage 412 of the true value was greater than 0.80 (Table 3) 422 limited information to estimate these values, and posterior distributions for these parameters should be 423 checked to be sure they moved away from the prior distribution.

425
Under the level of error we simulated in transport data, coverage of most parameters was similar to when 426 no additional error was simulated, but some posterior means of transport parameters appeared to be 427 negatively biased (Table 3). The distribution of z-scores for transport parameters included more negative 428 values when additional transport error was simulated; negative bias in transport parameters was sufficient

439
Periods with the lowest entrainment estimates were associated with 0s in counts at fish facilities.  year effect on IRZ transport for the Lower Old and Middle rivers source region was not significantly 457 different from 0. Lack of significance was inferred from 95% credible interval coverage of posterior 458 distributions of 0. The model appeared to fit SWP, CVP, and LRZ particle tracking data with little error 459 ( Fig. D6), but model fit to IRZ particle tracking data was worse in May and June than in April. In 460 contrast to SWP and CVP data, IRZ transport data were not anchored by observed counts of fish.

461
Bayesian P-values for particle tracking data were between 0.49 and 0.71 for all sink-month combinations 462 (Table 5). Joint posterior correlation of transport parameters with other model parameters was minimal, 463 but some correlation was evident between LRZ and IRZ intercept and slope parameters (R 2 < 0.58).

465
Effective multinomial samples size for the model using particle tracking data converged on an estimate of 466 225, using the iterative method of McAllister and Ianelli (1997). All analyses were therefore performed D r a f t 467 with fixed at 225. Posterior means and coefficients of variation of were sensitive to the value of PT E

468
. When was set equal to 4,000, posterior means decreased 18 to 24% on average (  (Table 6). However, the magnitude of the decrease declined from April to June, which makes 485 sense because the estimated values of increase over this period. did not appear to be sensitive to the louver selectivity model ( ; Table 6).

510
Simulation testing indicated that biases could arise when all transport, survival and efficiency errors were 511 simultaneous estimated, at least when survival data were missing, which is the current situation for 512 postlarval delta smelt. As was stated previously, the mark-recapture model was fit externally, and

545
The inclusion of multiple data sets, in an integrated analysis, meant that information about the abundance 546 of entrained fish came from multiple sources, and integration of mechanistic transport data from 547 hydrodynamics modeling, direct information about sampling efficiency from mark-recapture studies, and 548 observed counts at fish facilities (sometimes viewed as an index of entrainment) leveraged those data 549 sources against each other. The transport data from particle tracking, for example, provided a measure of 550 movement not contained in other data sets, while counts from SWP and CVP fish facilities adjusted the 551 particle tracking-based transport probabilities and scaled the entrainment estimates. In a sense, the fitted 552 model may be viewed as a calibration of the model provided by particle tracking, and lack of fit, such as 553 the larger May and June particle tracking residuals for the Low and Indirect Risk Zones, may represent 554 instances where particle tracking data was not representative of fish transport. Goodness of fit testing on 555 multiple data sets, within the integrated framework provided a framework to assess those deficiencies and 556 update entrainment estimates should new information become available. Even with hierarchical modeling, 557 some ambiguity remains for delta smelt entrainment because some critical processes were poorly 558 informed by available data (e.g. no survival data). While these data insufficiencies compromised delta 559 smelt estimates, they also demonstrated the benefits of integrating the estimation of several model 560 components and provided a basis to prioritize future data collection.

562
Modeling the spatial structure of entrainment

655
Using a time series approach and more highly resolved transport information, a spatially-explicit model (a 656 special case of the first spatial structure setting mentioned above) could estimate spatially stratified 657 abundance for the entire population, then apply strata-specific transport rates to sampling locations. The

658
drawback to a spatially-explicit approach is that starting population distributions may be highly uncertain from which unknown behaviors may modify transport over distances of more than 80 km during the 664 process of entrainment. In the single-region approach we applied, transport assumptions were applied 665 over shorter distances of up to 35 km during low outflow and less than 15 km at high outflow.

667
An alternative for modeling transport using particle tracking data would make use of information about here. The residual pattern of more negative CVP residuals in June suggested a conflict between counts 702 from fish facilities and transport data provided by particle tracking. We expected that as fish progressed to 703 later lifestages, behavioral development would decouple observed transport in counts at fish facilities 704 from hydrodynamics and observed transport in particle tracking data. Further, simulation testing indicated 705 that the precision of entrainment estimates in the Bayesian hierarchical framework were sensitive to errors 706 in transport data; the estimates were only as good as the information provided by particle tracking data.

707
The high standard deviation associated with modeled postlarval delta smelt survival may be symptomatic   elevated mortality or what that elevated mortality may be. Although particle tracking information 727 accounts for variation in transport due to flow, exports, and tidal cycles, information was not retained 728 about the length of time particle were in the South Delta before returning to the Low Risk Zone.

730
Mark-recapture studies could also help to refine the model of length-based louver selectivity. Sensitivity 731 analyses, however, suggested that delta smelt entrainment estimates were not sensitive to selectivity. Low

756
Entrainment of ichthyoplankton through aquatic systems is a complex function of system hydrodynamics, 757 but sophisticated biophysical models coupling hydrodynamics and particle tracking can provide 758 information about transport rates. Bayesian hierarchical modeling provides a useful tool for integrating 759 models for a sequence of events with disparate datasets that partially inform dynamics of the sequence.

760
With these tools and admittedly strong assumptions about how entrained fish were routed through the 761 system, we demonstrated the possibility of estimating entrainment as the regional abundance of a pelagic 762 postlarval fish while accounting for complex but common sources of variation.

764
While past models of entrainment have focused on more complex transport dynamics in marine and 765 estuarine systems, water extraction from a riverine system with more or less linear flow dynamics 766 presents opportunities to make simplifying assumptions about the dimensionality of transport. It is 767 particularly convenient to assume that entrainment follows a linear route through a single region,   D r a f t  D r a f t D r a f t  D r a f t

Sacramento-San Joaquin Delta and San Francisco Estuary
D r a f t  D r a f t Table A5. Mark-recapture data reported by Sutphin and Svoboda (2016)  D r a f t Intercept and slope were assumed to be similar among months and source regions, for the same sink; TR therefore, all intercept and slope were modeled as normally distributed random effects of a single TR sink-specific intercept and slope.
Let represent the number of particles in the particle tracking model that were released in source m PT qtvs region q in year t and month that reached sink by the end of the month. We assumed these counts followed a multinomial distribution with total effective number of particles and probability vector PT TR qtv· (B4) .

PT qtv·~M ultinomial( PT , TR qtv· )
The number of particles released in a source region in the particle tracking model (4,000) was an arbitrarily large number. As multinomial variance scales with sample size, the effect of particle tracking data on parameter estimates could be made arbitrarily large. To remove this arbitrary consequence, we derived a single effective multinomial sample size using the method described by McAllister and PT Ianelli (1997). for the source region selected as the source for a year-month was applied to all fish arriving at the SV1 qtv SWP and CVP.
The probability of a delta smelt surviving the journey from the Forebay radial gates (where the forebay connects to Old River) to the SWP intake channel in a given year and month was treated as a logit-

SV2 tv1
normal random variable: thus, the survival probability increased or decreased with time (and age). CVP does not have a forebay, so set was set to 1.

SV2 tv2
Sub-process 3: Population length structure. Let represent the population length frequency for length tv class l. We modeled these quantities using a Dirichlet random variable with month-length specific parameters . For l in the set (1, 2, … 5), let the vector vl (B7) .

LN tv·~D irichlet( LN v· )
By using this structure, we assumed that length distributions within the same month were similar across years.
D r a f t We modeled the vector of adjusted counts with a multinomial distribution 20MM tv· (B8) .
20MM tv·~M ultinomial( 20MM tv , LN tv· ) We adjusted the length data from the 20mm Survey to account for gear selectivity (Mitchell et al. 2019).
Gear selectivity uncertainty was found to be a relatively minor source of variability in 20mm Survey data For the effective sample sizes we used the total number of tows, conducted by the 20mm Survey 20MM tv in which the length data were collected. We scaled the adjusted counts so they summed to .

20MM tv
Length samples of fish populations may exhibit overdispersion due to the tendency of fish to aggregate by length or age. The standard approach to address this overdispersion when estimating population proportion at length, from multinomially distributed length samples, is to set the effective multinomial sample size to the number of tows (McAllister and Ianelli 1997). Note that this approach was different from the approach used to derive , for which information such as number of tows does not exist.

PT
Sub-process 4: Sampling efficiency at fish facilities. We first estimated selectivity probability-at-length with fish facility length data using a logistic function with an asymptotic maximum value of one.

SEL
Given that the louvers are similar for the SWP and CVP, we used the same length-based louver selectivity model for both facilities, but estimated facility-specific maximum efficiencies. Let represent the D r a f t number of delta smelt in length class l that were diverted and measured at fish facilities. We assumed these counts followed a multinomial distribution with effective sample size and probability vector We used a logistic function to describe louver selectivity: where l takes the values 1, 2, 3, 4, and 5 (representing the five length bins), controls the steepness of 0 the curve, and is the l value at which = 0.5. We assumed fish in the fifth length bin had the 1 SEL highest selectivity probability relative to the other bins (i.e., ), which allowed us to solve for SEL 5 = 0.999 analytically: . The parameter , was estimated during model fitting.
We defined the length-specific probabilities of fish being diverted to the fish facilities, or overall sampling efficiency, of fish facility f for diverting length-l delta smelt as where the fraction on the right hand side is a facility-specific scaling factor between 0 and 1 and were maximum sampling efficiencies. We estimated year-month-facility specific using a Bayesian model ftv fit to the mark-recapture experiment data sets. These models were fit externally to the hierarchical model

Simulation model and data
In the first set of simulations, independence was assumed among the individual fish comprising , and D r a f t Given known values for and , is clearly estimable, e.g., is a method of moments estimate. and posterior contraction comparing the prior to posterior standard deviation D r a f t (B18) . tv = 1 -sd(posterior S tv )/sd(prior S tv ) Additional simulations focused on the effect of errors in the particle tracking model based probabilities of movement from the source region to the four sinks on posterior distributions of key parameters. Particle tracking data were assumed to represent observations of the transport process, with some unknown level of additional noise induced by the disconnect between simulated hydrodynamics and true delta smelt transport rates. To assess model performance with noisy transport information, two sets of simulations were performed, one set in which new particle tracking data were simulated from true transport rates and one set in which particle tracking data were simulated from true transport parameters, times random lognormal error. That is, Eq. 1 was modified to (B19) , TR qtv·~D irichlet( TR qtv· × tv· ) where error was normally distributed with mean 0 and standard deviation 0.3.

Model fitting and diagnostics
While a weakly informative uniform prior distribution was derived for , uninformative priors were used for all other parameters. Dirichlet parameters α were given Gamma(0.01,0.01) priors, and all logistic regression priors β, η, and γ were drawn from Normal(0, ) distributions. A Normal(0, ) 2 6 2 6 distribution induced the desired Uniform(0,1) distribution on inverse logistic transformation. Hyperparameters for the linear models of LRZ and IRZ transport probability (Eq. C1 and C2) were assigned Normal(0, ) priors. Penalized complexity prior distributions, Exponential(4.6), were assigned to all 20 D r a f t Gelman and Rubin's diagnostic (Gelman and Rubin 1992). Model convergence was assumed if trace plots showed that all chains were producing samples of parameters with similar posterior distributions that did not shift with additional samples and if Gelman and Rubin's statistic was less than 1.05 for all parameters.
Correlation between parameters was assessed by calculating coefficients of determination (R 2 ) for all pairwise combinations of parameter posteriors. Residual plots of the fish facility counts, population length structure, and fish facility sampling efficiency models were explored for evidence of model fit and bias.
A graphical posterior predictive check was performed and Bayesian P-values were calculated for each data component (PT,m 20mm ,m FF ,and obs). Each data component was replicated from the fitted model, and Bayesian P-values were based on a comparison of goodness of fit to replicated and observed data.