Sample conditional age-at-length data

Samples conditional age-at-length (CAAL) data from the expected values of length proportions and of age proportions (conditional on length) from the operating model (OM), and writes the samples to file for use by the estimation model (EM).

sample_calcomp(
  dat_list,
  exp_vals_list,
  outfile = NULL,
  fleets,
  years,
  Nsamp_lengths,
  Nsamp_ages,
  method = "simple_random",
  ESS_lengths = NULL,
  ESS_ages = NULL,
  lcomps_sampled = FALSE,
  ...
)

Arguments

dat_list: A Stock Synthesis data list object as read in by SS_readdat. When reading it in, be sure to use the section argument to specify which section of the data file you want to work with: section = 1 reads in the input values used to run the model, and section = 2 reads in the expected values generated given all the input to the OM. section = 3 is not used within ss3sim, but it provides bootstrapped data sets that have been sampled internally within SS.
exp_vals_list: This is a data list containing all expected values. It should not be modified by previous sampling functions to contain sampled data.
outfile: A character string specifying the file name to use when writing the information to the disk. The string must include the proper file extension. No file is written using the default value of NULL, which leads to increased speed because writing the file takes time and computing resources.
fleets: *A vector of integers specifying which fleets to include. The order of the fleets pertains to the input order of other arguments. An entry of fleets=NULL leads to zero samples for any fleet.
years: *A list the same length as fleets giving the years as numeric vectors. If no fleet collected samples, leave the value as years=NULL.
Nsamp_lengths: A numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. If no fleet collected samples, specify Nsamp_lengths = NULL. Specifically, for sample_calcomp, Nsamp_lengths denotes the total number of length samples for a given year and fleet across all length bins that can be used to then sample the conditional age at length samples. Nsamp_lengths must be greater than or equal to Nsamp_ages.
Nsamp_ages: A numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. If no fleet collected samples, specify Nsamp_ages = NULL. Specifically, for sample_calcomp, Nsamp_ages denotes the total number of conditional age at length samples for a given year and fleet across all length bins. Nsamp_ages must be less than or equal to Nsamp_lengths.
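The recycling rule for single values and the constraint between the two sample sizes can be sketched in base R. This is a minimal illustration, not ss3sim's internal code; expand_nsamp() is a hypothetical helper and the sample sizes are made up.

```r
# Hypothetical helper (not part of ss3sim): recycle per-fleet sample sizes
# so that each fleet has one value per sampled year.
expand_nsamp <- function(nsamp, years) {
  stopifnot(length(nsamp) == length(years))
  Map(function(n, yrs) {
    if (length(n) == 1) {
      rep(n, length(yrs))          # single value: repeat for every year
    } else {
      stopifnot(length(n) == length(yrs))
      n                            # one value per year already supplied
    }
  }, nsamp, years)
}

years <- list(c(2000, 2005, 2010), c(2001, 2002))
Nsamp_lengths <- expand_nsamp(list(100, c(50, 80)), years)
Nsamp_ages <- expand_nsamp(list(25, c(50, 20)), years)

# Ages can only be taken from fish that were measured for length,
# so each year's age sample may not exceed its length sample.
ok <- mapply(function(nl, na) all(na <= nl), Nsamp_lengths, Nsamp_ages)
stopifnot(all(ok))
```

Note that the second fleet's first year ages all 50 lengthed fish, which the "less than or equal to" constraint allows.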
method: The method used to sample ages from the lengths. Options are "simple_random" and "length_stratified". In "simple_random" (the default option), the fish aged are randomly sampled from all lengthed fish, so the number aged in each length bin is generally not equal. In "length_stratified", an equal number of fish are aged from each length bin.
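As a rough sketch of how the two methods differ, assume illustrative counts of lengthed fish per bin. Simple random aging draws fish from all lengthed fish regardless of bin, while a length-stratified scheme targets a fixed number per bin. The per-bin target n_per_bin and the rule of capping it at the number of fish measured in a bin are assumptions made here for illustration, not documented ss3sim behavior.

```r
set.seed(1)
# Number of fish measured in each length bin (illustrative values).
lengthed <- c("20" = 3, "25" = 12, "30" = 20, "35" = 9, "40" = 2)
n_age <- 15

# simple_random: one label per lengthed fish, then age a random subset
# of all fish, ignoring which bin each fish came from.
fish_bins <- rep(names(lengthed), times = lengthed)
aged_sr <- table(factor(sample(fish_bins, n_age), levels = names(lengthed)))

# length_stratified (sketch): an equal target per bin, capped by the
# number of fish actually measured in that bin (assumed rule).
n_per_bin <- 3
aged_ls <- pmin(lengthed, n_per_bin)

aged_sr
aged_ls
```

Under the capping assumption the stratified total (14) falls short of 15 because the largest bin held only 2 fish.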
ESS_lengths: The final effective sample size (ESS) associated with the simulated length data generated for conditional age at length samples. The ESS is not used to generate the simulated data but can be used as an input sample size in subsequent models that estimate population parameters or status. The default, NULL, leads to the true (internally calculated) effective sample size being used, which is Nsamp_lengths for the multinomial case. ESS_lengths should be a numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. Note that the dimensions of ESS_lengths must be compatible with the dimensions of Nsamp_lengths.
ESS_ages: The final effective sample size (ESS) associated with the simulated conditional age at length data. The ESS is not used to generate the simulated data but can be used as an input sample size in subsequent models that estimate population parameters or status. The default, NULL, leads to the true (internally calculated) effective sample size being used, which is Nsamp_ages for the multinomial case. ESS_ages should be a numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. Note that the dimensions of ESS_ages must be compatible with the dimensions of Nsamp_ages. The input value is apportioned among the conditional age at length bins in the same way as Nsamp_ages, so the resulting per-bin values can be fractional.
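One plausible reading of "apportioned ... in the same way as Nsamp_ages" is a proportional split of the total ESS across length bins according to the number of fish aged in each bin. The sketch below assumes that rule and uses illustrative counts; it shows why the per-bin values can be fractional. It is not taken from the ss3sim source.

```r
# Fish aged per length bin for one fleet and year (illustrative values).
aged_per_bin <- c("20" = 2, "25" = 5, "30" = 6, "35" = 2)
ESS_ages <- 10  # total effective sample size supplied for this fleet and year

# Assumed rule: split the total ESS in the same proportions as the aged fish.
ess_per_bin <- ESS_ages * aged_per_bin / sum(aged_per_bin)
ess_per_bin
```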
lcomps_sampled: Have the marginal length comps already been sampled and included in dat_list[["lencomp"]]? If FALSE, dat_list[["lencomp"]] contains expected values.
...: Any argument you want to be a column in the new data frame of composition data. All extra arguments should be named columns in data. Each argument needs to be a list of length length(fleets). Or, you can use a single value that will be repeated for each combination of fleet, year, ... in your data.

Value

A modified .dat file is written if !is.null(outfile). A list object containing the modified .dat file is returned invisibly.

Details

There are many steps needed to sample CAAL data because ages are not independent of lengths. CAAL data are located in the .dat file alongside the age compositions, with the added complexity of one row per length bin. Each row represents the observed age distribution for a length bin, conditioned on the fish lengths that were observed in the length compositions. The age distribution will be truncated for older or younger fish. Often, many rows will be empty because no fish in that length bin were observed; these empty rows are not needed in the .dat file.

The sampling process includes the following steps:

1. Lengths are sampled based on the desired number of lengths, $N$. $N$ is the maximum number of fish that could be aged.
2. Those lengths are binned to create a length distribution, i.e., the number of fish in each length bin.
3. Ages are sampled from the fish that contributed to the length distribution. Several strategies are possible for sampling ages from those fish:
   (1) age all fish,
   (2) take a random subset of fish independent of length bin, or
   (3) take a fixed number of fish from each length bin.
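The three steps, using option (2) for aging, can be sketched in base R for a single fleet and year. This is an illustration under assumed inputs (p_len, N_len and N_age are made up), not the ss3sim implementation; lengths are drawn from a multinomial distribution, matching the multinomial case referred to for the sample sizes.

```r
set.seed(42)
# Expected length proportions from the OM for one fleet and year (illustrative).
len_bins <- c(20, 25, 30, 35, 40)
p_len <- c(0.10, 0.25, 0.35, 0.20, 0.10)
N_len <- 50  # plays the role of Nsamp_lengths
N_age <- 20  # plays the role of Nsamp_ages; must not exceed N_len

# Steps 1 and 2: sample N_len lengths and bin them (one multinomial draw).
lengthed <- as.vector(rmultinom(1, size = N_len, prob = p_len))

# Step 3, option (2): age a random subset of the lengthed fish. Each fish is
# one element of a vector of bin indices, so sampling without replacement can
# never age more fish in a bin than were measured there.
fish <- rep(seq_along(len_bins), times = lengthed)
aged <- tabulate(sample(fish, N_age), nbins = length(len_bins))

data.frame(Lbin = len_bins, lengthed = lengthed, aged = aged)
```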

ss3sim can currently only handle randomly sampling ages from lengthed fish. Future versions could include the last option; please contact the developers if you are interested in helping facilitate this.

Note that the overall total sample size for all CAAL bins is specified by the user in Nsamp_ages for the given fleet and year. These sample sizes and the expected values of age proportions (conditional on length) are used to sample realistic age proportions. If all fish are aged, then no resampling is performed. If no fish are aged for a row of age proportions in the conditional age at length data, then that row is discarded. If not all fish are aged, then a new sample size must be drawn. This new sample size must be less than or equal to the number of fish that were sampled for their length, and it is used to draw ages randomly from the expected values. If we consider all rows for a fleet and year (one for each length bin), then their sum is the sample size for the CAAL data.

However, if the CAAL sample size is less than the length sample size, only a subset of the lengthed fish is aged. We accomplish this in the code by sampling without replacement from a vector of length bins in which each bin appears as many times as there are fish in it. This ensures realistic sampling, because no more fish can be aged from a bin than were measured in it. If option (3) above were implemented, a different strategy would be needed: for instance, if the user wants 10 fish from each length bin but only 5 fish were observed, what should be done? A value of NULL for fleets deletes the CAAL data but not the marginal age data.
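Once the number of aged fish in each length bin is known, ages can be drawn within each bin from the expected age proportions conditional on length, and bins with no aged fish dropped. A minimal base R sketch, assuming a multinomial draw within each bin; caal_exp and aged are made-up inputs, not values from an actual OM.

```r
set.seed(7)
# Expected age proportions conditional on length (rows = length bins,
# columns = ages); illustrative values, each row sums to 1.
caal_exp <- rbind(
  "20" = c(0.70, 0.25, 0.05, 0.00),
  "25" = c(0.30, 0.50, 0.15, 0.05),
  "30" = c(0.05, 0.40, 0.40, 0.15),
  "35" = c(0.00, 0.10, 0.50, 0.40)
)
colnames(caal_exp) <- paste0("a", 1:4)

# Number of fish aged in each length bin (e.g., from the subsetting step).
aged <- c(3, 0, 6, 2)

# Draw ages within each non-empty length bin; empty bins produce no row.
keep <- aged > 0
caal_obs <- t(vapply(which(keep), function(i) {
  rmultinom(1, size = aged[i], prob = caal_exp[i, ])[, 1]
}, numeric(ncol(caal_exp))))
rownames(caal_obs) <- rownames(caal_exp)[keep]
caal_obs
```

Each row sums to the number of fish aged in that bin, so the rows together sum to the CAAL sample size for the fleet and year.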

When Dirichlet sampling is used for length compositions, the number of fish observed will be real-valued rather than whole fish. One cannot simply multiply by the length composition sample size to get whole numbers, because the values are real and rounding or truncating them would be unsatisfactory. Currently, the function simply draws a multinomial sample of the specified size (Nsamp) from the length compositions. However, this does not guarantee that fewer fish are aged than lengthed. If you are specifying a small number of fish to age relative to the number lengthed, this may be acceptable. However, we discourage the use of Dirichlet length samples with CAAL data as currently implemented.
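The workaround can be sketched as follows: a Dirichlet draw produces real-valued length proportions, and a multinomial draw of Nsamp fish from those proportions turns them into whole-fish counts. The Dirichlet parameterization (a hypothetical concentration conc times the expected proportions) and all values are assumptions for illustration; the Dirichlet draw uses the standard normalized-gamma construction rather than any ss3sim function.

```r
set.seed(3)
p_len <- c(0.10, 0.25, 0.35, 0.20, 0.10)  # expected length proportions
conc <- 30                                 # hypothetical Dirichlet concentration

# Dirichlet draw: independent gamma variates normalized to sum to 1.
g <- rgamma(length(p_len), shape = conc * p_len, rate = 1)
dir_props <- g / sum(g)  # real-valued proportions, not whole fish

# Multinomial draw of Nsamp fish from those proportions gives whole counts.
Nsamp <- 50
lengthed <- as.vector(rmultinom(1, size = Nsamp, prob = dir_props))
lengthed
```

The integer counts come from a second random draw, so they need not match the Dirichlet proportions that end up in the length composition, which is why the number aged is not guaranteed to stay below the number lengthed.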

Note that this function cannot handle all types of CAAL sampling. This function requires that there be a row of CAAL data for each length data bin (for each year and fleet that sampling is specified to be performed), where Lbin_lo and Lbin_hi are the same value. Note also that this sampling procedure represents simple random sampling for CAAL, where (1) lengths are sampled randomly, (2) fish are lengthed and placed into bins, and (3) a subset of lengthed fish are aged, where a constant proportion of fish from each length bin is selected for aging. This does not represent length-stratified sampling, where a subset of lengthed fish are aged and a constant number from each length bin is selected for aging, although such data could also be put into a Stock Synthesis model as CAAL.

Author

Cole Monnahan, Kotaro Ono
