RGRDC
reheatfunq.coverings
Facilities to compute Random Global R-Disk Coverings (RGRDCs). An RGRDC is a derived product of a global point data set (e.g. a global heat flow database). The covering consists of sequentially generated disks of radius \(R\), randomly distributed over Earth under the following constraints:
- Within the set of points in the disk, no two points are closer to each other than the minimum distance \(d_\mathrm{min}\).
- No data point is part of a previous disk.
- The disk center is not contained within an optional exclusion polygon that represents a region of interest for local analysis.
- At least a minimum number of points remain in the disk.
The function random_global_R_disk_coverings() computes RGRDCs. It operates by iteratively drawing random disk centers on the sphere and testing whether all conditions can be met. After a maximum number of disk centers have been drawn, the algorithm terminates. The function is used in the following notebooks:
- jupyter/REHEATFUNQ/03-Gamma-Conjugate-Prior-Parameters.ipynb
- jupyter/REHEATFUNQ/A2-Goodness-of-Fit_R_and-Mixture-Distributions.ipynb
- jupyter/REHEATFUNQ/A6-Comparison-With-Other-Distributions.ipynb
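The rejection-sampling loop described above can be sketched in plain Python. This is a simplified illustration and not the library's implementation: the names great_circle, random_center, and sketch_coverings are hypothetical, distances are computed on a sphere rather than as geodesics, and both the exclusion-polygon test and the \(d_\mathrm{min}\) filter are omitted. It also assumes that points already assigned to an earlier disk are dropped from a proposed disk before the minimum count is checked.

```python
import math
import random


def great_circle(lon1, lat1, lon2, lat2, a=6378137.0):
    """Great-circle distance (in m) on a sphere of radius a; inputs in degrees."""
    l1, p1, l2, p2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * a * math.asin(min(1.0, math.sqrt(h)))


def random_center(rng):
    """Draw (lon, lat) in degrees, uniformly distributed on the sphere."""
    lon = rng.uniform(-180.0, 180.0)
    lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    return lon, lat


def sketch_coverings(points, R, min_points, max_draw=1000, seed=0):
    """points: list of (q, lon, lat). Returns a list of (center, indices)."""
    rng = random.Random(seed)
    used = set()
    disks = []
    for _ in range(max_draw):
        lon, lat = random_center(rng)
        # Assumption: an exclusion-polygon test on the center and the
        # d_min filter would be applied here; both are left out.
        idx = [i for i, (_, lo, la) in enumerate(points)
               if i not in used and great_circle(lon, lat, lo, la) <= R]
        if len(idx) < min_points:
            continue  # too few unused points remain: reject this disk
        used.update(idx)
        disks.append(((lon, lat), idx))
    return disks
```

Each accepted disk removes its points from further consideration, so later disks become progressively harder to accept and the loop ends once max_draw centers have been tried.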
The function conforming_data_selection() enforces the \(d_\mathrm{min}\) criterion within a set of heat flow measurements. It resolves violations of this criterion by iteratively dropping a random data point of a violating data point pair until no pair of data points violates the criterion.
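As a rough illustration of this conflict resolution, the following brute-force sketch keeps a boolean mask and removes one random member of a violating pair per iteration. The function name conforming_mask is hypothetical, the choice of which violating pair to resolve first (here: a random one) is an assumption, and the quadratic pair search is only suitable for small inputs.

```python
import math
import random


def conforming_mask(xy, dmin, seed=0):
    """Return a list of booleans; True marks the points that are kept."""
    rng = random.Random(seed)
    keep = [True] * len(xy)
    while True:
        # All pairs of retained points that are closer than dmin.
        pairs = [(i, j)
                 for i in range(len(xy)) if keep[i]
                 for j in range(i + 1, len(xy))
                 if keep[j] and math.dist(xy[i], xy[j]) < dmin]
        if not pairs:
            return keep
        # Assumption: resolve a randomly chosen conflict by dropping
        # one of its two points at random.
        i, j = rng.choice(pairs)
        keep[rng.choice((i, j))] = False
```

With a fixed seed the result is reproducible, which mirrors the role of the rng argument of conforming_data_selection().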
The function bootstrap_data_selection() creates a number of such conforming data selections using random decisions for each conflict.
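The bootstrap idea can be sketched by repeating a randomized selection B times and keeping each distinct result once. The helper names one_selection and bootstrap_selections are hypothetical, and sharing a single random generator across all repetitions is an assumption made for this sketch.

```python
import math
import random


def one_selection(xy, dmin, rng):
    """Indices of one conforming selection with random conflict resolution."""
    kept = list(range(len(xy)))
    while True:
        pairs = [(i, j) for n, i in enumerate(kept) for j in kept[n + 1:]
                 if math.dist(xy[i], xy[j]) < dmin]
        if not pairs:
            return tuple(kept)
        # Drop a random member of a random violating pair.
        kept.remove(rng.choice(rng.choice(pairs)))


def bootstrap_selections(xy, dmin, B, seed=0):
    """At most B distinct conforming index tuples."""
    rng = random.Random(seed)
    unique = {one_selection(xy, dmin, rng) for _ in range(B)}
    return sorted(unique)
```

Because identical selections collapse into one set entry, the number of returned selections can be smaller than B when the data contain only a few conflicts.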
- random_global_R_disk_coverings(R, min_points, hf, buffered_poly_xy, proj_str, N=10000, MAX_DRAW=100000, dmin=0.0, seed=982981, used_points=None, a=6378137.0)
Uses rejection sampling to draw a number of exclusive regional distributions.
- Parameters:
R (float) – Radius \(R\) of the RGRDC (in m).
min_points (int) – Minimum number of points within a distribution after all other conditions are met. If the number of data points is less, the proposed disk is rejected.
hf (array_like) – Array of heat flow data points of shape (N,3), where N is the number of data points. The second dimension must contain a tuple \((q_i, \lambda_i, \phi_i)\) for each data point, where \(q_i\) is the heat flow, \(\lambda_i\) the longitude in degrees, and \(\phi_i\) the latitude in degrees.
buffered_poly_xy (list[array_like]) – List of polygons which will reject disks if their centers fall within one of the polygons. Each element of the list must be a (M[i],2)-shaped numpy array where \(M[i]\) is the number of points composing the i-th polygon and the second dimension iterates the coordinates \(x\) and \(y\). The coordinates are interpreted within the coordinate system described by the proj_str parameter.
proj_str (str) – A PROJ string describing a projected coordinate system within which the polygons supplied in the buffered_poly_xy parameter are interpreted.
N (int, optional) – Target number of accepted disks. Might not be reached but can lead to an early exit. The default is high enough that MAX_DRAW is likely exhausted first.
MAX_DRAW (int, optional) – Maximum number of disk centers to generate. Might not be reached if N is small.
dmin (float, optional) – Minimum inter-point distance for the conforming selection criterion (in m).
seed (int, optional) – Seed passed to np.random.default_rng().
used_points (list[int], optional) – A list of data point indices that can be marked as used a priori.
a (float, optional) – Semi-major axis (in m) of the sphere used. This parameter is used for a scipy.spatial.KDTree-based fast data point query before computing geodesic distances between data points.
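One common way to use a Euclidean tree for such a prefilter is to map longitude and latitude to 3D Cartesian coordinates on a sphere of radius a and to convert the search radius into a chord length. Whether the library proceeds in exactly this way is an assumption. The helper names to_xyz and chord_length are hypothetical.

```python
import math


def to_xyz(lon, lat, a=6378137.0):
    """Cartesian coordinates (in m) of a point on a sphere of radius a."""
    lam, phi = math.radians(lon), math.radians(lat)
    return (a * math.cos(phi) * math.cos(lam),
            a * math.cos(phi) * math.sin(lam),
            a * math.sin(phi))


def chord_length(R, a=6378137.0):
    """Straight-line distance that corresponds to great-circle distance R."""
    return 2.0 * a * math.sin(R / (2.0 * a))
```

Points whose 3D distance to a disk center is at most chord_length(R) are exactly those within great-circle distance R on the sphere, so a ball query such as scipy.spatial.KDTree.query_ball_point on the to_xyz coordinates could serve as the fast first pass. Since the sphere only approximates the ellipsoid, the exact geodesic distance check mentioned in the parameter description is still needed afterwards.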
- Returns:
valid_points (list) – A list of \(v\) centroids of the accepted disks.
used_points (set) – A set of all points which are part of an accepted heat flow distribution.
distributions (list) – The list of \(v\) distributions, each a one-dimensional numpy array of sorted heat flow values.
lolas (list) – The list of data point coordinates corresponding to the heat flow data within distributions. Each is a two-dimensional numpy array in which the second dimension iterates a tuple \((\lambda,\phi)\) of geographic coordinates.
distribution_indices (list) – The list of index lists of the data points used in the distributions. Each is a one-dimensional array of integer indices into the input data set that compose the corresponding entry of distributions. The indices distribution_indices[i] are generally not in the same order as the heat flow values in distributions[i].
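The ordering caveat for distribution_indices can be illustrated with toy numbers. The values below are made up for this sketch and are not real measurements. The variable names are hypothetical stand-ins for one entry of each returned list.

```python
# Hypothetical heat flow column of the input array hf.
q = [70.0, 45.0, 60.0, 52.0]

# Hypothetical stand-in for one entry of distribution_indices (unsorted).
indices = [2, 0, 3]

# Stand-in for the matching entry of distributions: sorted heat flow values.
distribution = sorted(q[i] for i in indices)

# Sorting the indices by heat flow aligns them with the sorted values.
aligned = sorted(indices, key=lambda i: q[i])
```

Here q[indices[0]] is 60.0 while distribution[0] is 52.0, so the indices have to be re-sorted before pairing them with the sorted values. If a disk contains identical heat flow values, this alignment is not unique.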
- conforming_data_selection(const double[:, :] xy, double dmin_m, rng=128)
This method applies the spatial data filtering technique described in the paper, sub-sampling the data so that the minimum distance remains above dmin_m.
The selection process for non-conforming data pairs is stochastic but reproducible given an identical random number generator rng.
- Parameters:
xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).
dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).
rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.
- Returns:
mask – A mask filtering out non-conforming data points.
- Return type:
numpy.ndarray
- bootstrap_data_selection(const double[:, ::1] xy, double dmin_m, size_t B, rng=127)
Computes a set of bootstrap samples of heat flow data points conforming to the data selection criterion.
- Parameters:
xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).
dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).
B (int) – Number of bootstrap samples to draw.
rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.
- Returns:
subselections – A list of index arrays. Each index array lists the indices of a conforming data selection within the data array xy. The number of index arrays is at most B. Duplicate data selections are returned only once.
- Return type:
list