Samples conditional age-at-length (CAAL) data using the expected length proportions and the expected age proportions (conditional on length) from the operating model (OM), and writes the samples to file for use by the estimation model (EM).
sample_calcomp(
dat_list,
exp_vals_list,
outfile = NULL,
fleets,
years,
Nsamp_lengths,
Nsamp_ages,
method = "simple_random",
ESS_lengths = NULL,
ESS_ages = NULL,
lcomps_sampled = FALSE,
...
)
A Stock Synthesis data list object as read in from SS_readdat.
Be sure to correctly specify which section of the data file you want to work with when reading it in using the section argument, where section = 1 reads in the input values used to run the model and section = 2 reads in the expected values generated given all the input to the OM. section = 3 is not used within ss3sim, but this section provides bootstrapped data sets that have been sampled internally within SS.
This is a data list containing all expected values. It should not be modified by previous sampling functions to contain sampled data.
A character string specifying the file name to use when writing the information to the disk. The string must include the proper file extension. No file is written using the default value of NULL, which leads to increased speed because writing the file takes time and computing resources.
*A vector of integers specifying which fleets to include. The order of the fleets pertains to the input order of other arguments. An entry of fleets=NULL leads to zero samples for any fleet.
*A list of the same length as fleets giving the years as numeric vectors. If no fleet collected samples, set years=NULL.
A numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. If no fleet collected samples, specify Nsamp_lengths = NULL. Specifically, for sample_calcomp, Nsamp_lengths denotes the total number of length samples for a given year and fleet across all length bins that can be used to then sample the conditional age at length samples. Nsamp_lengths must be greater than or equal to Nsamp_ages.
A numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. If no fleet collected samples, specify Nsamp_ages = NULL. Specifically, for sample_calcomp, Nsamp_ages denotes the total number of conditional age at length samples for a given year and fleet across all length bins. Nsamp_ages must be less than or equal to Nsamp_lengths.
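As a minimal sketch of how these arguments line up, the following base R code builds the per-fleet lists, repeats single values across years, and checks that no more fish are aged than lengthed. The fleet numbers, years, and sample sizes are made up for illustration, and `expand_nsamp` is a hypothetical helper, not part of ss3sim.

```r
# Hypothetical inputs: two fleets, each with its own sampled years.
fleets <- c(1, 2)
years <- list(c(2001, 2002, 2003), c(2002, 2003))
Nsamp_lengths <- list(100, c(50, 60))  # single value or one value per year
Nsamp_ages <- list(c(20, 30, 40), 25)

# Hypothetical helper: repeat single values so each fleet has one value per year.
expand_nsamp <- function(nsamp, years) {
  Map(function(n, yrs) {
    if (length(n) == 1) {
      rep(n, length(yrs))
    } else {
      stopifnot(length(n) == length(yrs))
      n
    }
  }, nsamp, years)
}

lengths_by_year <- expand_nsamp(Nsamp_lengths, years)
ages_by_year <- expand_nsamp(Nsamp_ages, years)

# Each list must have one element per fleet.
stopifnot(
  length(years) == length(fleets),
  length(lengths_by_year) == length(fleets),
  length(ages_by_year) == length(fleets)
)

# No more fish can be aged than were lengthed in any fleet and year.
ok <- all(unlist(ages_by_year) <= unlist(lengths_by_year))
```

Running a check of this kind before calling the sampling function catches mismatched list lengths early, which is easier to diagnose than an error raised partway through sampling.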
The method used to sample ages from the lengths. Options are "simple_random" and "length_stratified". In "simple_random" (the default option), the fish aged are randomly sampled from the age bins, so the number sampled in each age bin is not equal. In "length_stratified", an equal number of fish are aged from each length bin.
The final effective sample size (ESS) associated with the simulated length data generated for conditional age at length samples. The ESS is not used to generate the simulated data but can be used as an input sample size in subsequent models that estimate population parameters or status. The default, NULL, leads to the true (internally calculated) effective sample size being used, which is Nsamp_lengths for the multinomial case. ESS_lengths should be a numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. Note that the dimensions of ESS_lengths must be compatible with the dimensions of Nsamp_lengths.
The final effective sample size (ESS) associated with the simulated conditional age at length data. The ESS is not used to generate the simulated data but can be used as an input sample size in subsequent models that estimate population parameters or status. The default, NULL, leads to the true (internally calculated) effective sample size being used, which is Nsamp_ages for the multinomial case. ESS_ages should be a numeric list of the same length as fleets. Either single values or vectors of the same length as the number of years can be passed through. Single values are repeated for all years. Note that the dimensions of ESS_ages must be compatible with the dimensions of Nsamp_ages. The input value will be apportioned among the conditional age at length bins in the same way as Nsamp_ages and therefore can be a fractional value.
Have marginal length comps already been sampled and are they included in dat_list[["lencomp"]]? If FALSE, expected values are present in dat_list[["lencomp"]].
Any argument you want to be a column in the new data frame of composition data. All extra arguments should be named columns in data. Each argument needs to be a list of length length(fleets). Or, you can use a single value that will be repeated for each combination of fleet, year, ... in your data.
A modified .dat file is written if !is.null(outfile). A list object containing the modified .dat file is returned invisibly.
There are many steps needed to sample CAAL data because
ages are not independent from lengths.
The data are located in the .dat file alongside age compositions.
CAAL have the added complexity of one line per length bin.
Thus, each row represents the observed age distribution for
a length bin conditioned on the fish lengths that were observed in the length compositions.
The age distribution will be truncated for older or younger fish.
Often, many rows will be empty because no fish of that length bin were observed.
These empty rows are not needed in the .dat file.
The sampling process includes the following steps:
Lengths are sampled based on the desired number of lengths, $N$. $N$ is the maximum number of fish that could be aged.
Those lengths are binned to create a length distribution, i.e., numbers of fish in each length bin.
Ages are sampled from fish that contributed to the length distribution. Several strategies are possible for sampling ages from those fish:
(1) age all fish,
(2) take a random subset of fish independent of length bin, or
(3) take a fixed number of fish from each length bin.
ss3sim can currently only handle randomly sampling ages from lengthed fish. Future versions could include the last option; please contact the developers if you are interested in helping facilitate this.
Note that the overall total sample size for all CAAL bins is specified by the user for the given fleet and year in Nsamp_ages.
These sample sizes and the expected values of age proportions
(conditional on length) are used to sample for realistic age proportions.
If all fish are aged,
then no resampling is performed.
If no fish are aged for a row of age proportions in conditional age at length data,
then that row is discarded.
If not all fish are aged,
then a new sample size must be drawn.
This new sample size must be less than or equal to the number of fish that were sampled for their length.
This new sample size is used to draw ages randomly from the expected values.
If we consider all rows for a fleet and year (one for each length bin),
then the sum of those will be the sample size for the CAAL data.
However, if the CAAL sample size is less than the length sample size,
then only a subset of the lengthed fish is aged.
We accomplish this in the code by
sampling without replacement from a vector of length bins in which each bin is repeated once for every fish it contains.
This ensures realistic sampling.
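The steps above can be sketched in base R for a single fleet and year. This is an illustrative sketch under stated assumptions, not the ss3sim implementation: the bin definitions, expected proportions, and object names (`exp_len_props`, `exp_age_given_len`, and so on) are all made up for the example.

```r
set.seed(42)

# Hypothetical expected values for one fleet and year.
len_bins <- c(20, 30, 40, 50)
age_bins <- 1:3
exp_len_props <- c(0.1, 0.4, 0.4, 0.1)
# Rows are length bins, columns are ages; each row sums to 1.
exp_age_given_len <- rbind(
  c(0.8, 0.2, 0.0),
  c(0.5, 0.4, 0.1),
  c(0.1, 0.5, 0.4),
  c(0.0, 0.3, 0.7)
)
Nsamp_lengths <- 100
Nsamp_ages <- 30

# Sample lengths and bin them (multinomial case).
len_counts <- as.vector(rmultinom(1, size = Nsamp_lengths, prob = exp_len_props))

# One entry per lengthed fish, labelled by its length bin index,
# then choose which fish to age by sampling without replacement.
fish <- rep(seq_along(len_bins), times = len_counts)
aged <- fish[sample.int(length(fish), size = Nsamp_ages, replace = FALSE)]
aged_counts <- tabulate(aged, nbins = length(len_bins))

# Draw ages for each length bin from the conditional expected proportions.
caal <- t(vapply(seq_along(len_bins), function(i) {
  as.vector(rmultinom(1, size = aged_counts[i], prob = exp_age_given_len[i, ]))
}, integer(length(age_bins))))
dimnames(caal) <- list(len_bins, age_bins)

# Rows with no aged fish are not needed in the .dat file.
caal <- caal[rowSums(caal) > 0, , drop = FALSE]
```

Because the aged fish are drawn without replacement from the lengthed fish, the number aged in a length bin can never exceed the number lengthed in that bin, and the row totals of `caal` sum to `Nsamp_ages`.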
If the option (3) above were implemented,
a different strategy would need to be implemented.
For instance,
if the user wants 10 fish from each length bin but only 5 fish were observed,
it is unclear what should be done.
A value of NULL for fleets indicates that the CAAL data should be deleted but
the marginal age data should be kept.
When Dirichlet sampling is used for length compositions,
the number of fish observed will be real-valued and not whole fish.
One cannot simply multiply the proportions by the length composition sample size to get whole numbers because
the results are real-valued and
rounding or truncating them would be unsatisfactory.
Currently, the function simply draws a multinomial sample from the length compositions of specified size (Nsamp).
However, this does not guarantee that fewer fish are aged than lengthed.
If you are specifying a small number of fish to age relative to length,
then this might be alright.
However, we discourage the use of Dirichlet length samples when using CAAL data as currently implemented.
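The issue can be illustrated with a small base R sketch. Everything here is an assumption made for the example: the proportions, the sample sizes, and the Dirichlet concentration value are arbitrary, and ss3sim's own Dirichlet sampling may be parameterized differently.

```r
set.seed(1)

# Hypothetical expected length proportions and sample size.
exp_len_props <- c(0.1, 0.4, 0.4, 0.1)
Nsamp_lengths <- 20

# Base R has no Dirichlet sampler, so draw one from normalized gamma variates.
# The concentration value (here 5) is an arbitrary choice for illustration.
gam <- rgamma(length(exp_len_props), shape = 5 * exp_len_props)
dir_props <- gam / sum(gam)

# "Numbers of fish" implied by a Dirichlet sample are real-valued.
dir_fish <- dir_props * Nsamp_lengths

# A multinomial draw of aged fish from those proportions is whole-valued,
# but nothing stops a bin from receiving more aged fish than dir_fish implies.
Nsamp_ages <- 15
aged_counts <- as.vector(rmultinom(1, size = Nsamp_ages, prob = dir_props))
exceeds <- aged_counts > dir_fish
```

Any `TRUE` in `exceeds` marks a length bin where more fish were aged than were nominally lengthed. This becomes less likely as `Nsamp_ages` shrinks relative to `Nsamp_lengths`.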
Note that this function cannot handle all types of CAAL sampling. This function requires that there be a row of CAAL data for each length data bin (for each year and fleet that sampling is specified to be performed), where Lbin_lo and Lbin_hi are the same value. Note also that this sampling procedure represents simple random sampling for CAAL, where (1) lengths are sampled randomly, (2) fish are lengthed and placed into bins, and (3) a subset of lengthed fish are aged, where a constant proportion from each length bin are selected for aging. This does not represent length stratified sampling where a subset of lengthed fish are aged, and a constant number from each length bin is selected for aging, although these data could also be put into a Stock Synthesis model as CAAL.
Other sampling functions: clean_data(), sample_agecomp(), sample_catch(), sample_discard(), sample_index(), sample_lcomp(), sample_mlacomp(), sample_wtatage()