RGRDC

`reheatfunq.coverings`

Facilities to compute Random Global R-Disk Coverings (RGRDCs). An RGRDC is a derived product of a global point data set (e.g. a global heat flow database). The covering consists of sequentially generated disks of radius \(R\) that are randomly distributed over Earth under the following constraints:

From the set of points within the disk, no two points are closer than the minimum distance \(d_\mathrm{min}\) from each other.
No data point is part of a previous disk.
The disk center is not contained within an optional exclusion polygon that represents a region of interest for local analysis.
There are more than a minimum number of points remaining in the disk.

The function random_global_R_disk_coverings() computes RGRDCs. It iteratively draws random disk centers on the sphere and tests whether all conditions are met. The algorithm terminates once the maximum number of disk centers has been drawn. The function is used in the following notebooks:

The function conforming_data_selection() enforces the \(d_\mathrm{min}\) criterion within a set of heat flow measurements. It resolves violations of this criterion by iteratively dropping a random data point of a violating data point pair until no pair violates the criterion.

The function bootstrap_data_selection() creates a number of such conforming data selections using random decisions for each conflict.

random_global_R_disk_coverings(R, min_points, hf, buffered_poly_xy, proj_str, N=10000, MAX_DRAW=100000, dmin=0.0, seed=982981, used_points=None, a=6378137.0)

Uses rejection sampling to draw a number of exclusive regional distributions.

Parameters:

R (float) – Radius \(R\) of the RGRDC (in m).
min_points (int) – Minimum number of points within a distribution after all other conditions are met. If the number of data points is less, the proposed disk is rejected.
hf (array_like) – Array of heat flow data points of shape (N,3), where N is the number of data points. The second dimension must contain a tuple \((q_i, \lambda_i, \phi_i)\) for each data point, where \(q_i\) is the heat flow, \(\lambda_i\) the longitude in degrees, and \(\phi_i\) the latitude in degrees.
buffered_poly_xy (list[array_like]) – List of polygons which will reject disks if their centers fall within one of the polygons. Each element of the list must be a (M[i],2)-shaped numpy array, where \(M[i]\) is the number of points composing the i-th polygon and the second dimension iterates the coordinates \(x\) and \(y\). The coordinates are interpreted within the coordinate system described by the proj_str parameter.
proj_str (str) – A PROJ string describing a projected coordinate system within which the polygons supplied in the buffered_poly_xy parameter are interpreted.
N (int, optional) – Target number of accepted disks. This number might not be reached, but reaching it leads to an early exit. The default is high enough that MAX_DRAW is likely exhausted first.
MAX_DRAW (int, optional) – Maximum number of disk centers to generate. Might not be reached if N is small.
dmin (float, optional) – Minimum inter-point distance for the conforming selection criterion (in m).
seed (int, optional) – Seed passed to np.random.default_rng().
used_points (list[int], optional) – A list of data point indices that can be marked as used a priori.
a (float, optional) – Semi-major axis used as the radius of the sphere (in m). This parameter is used for a scipy.spatial.KDTree-based fast data point query before computing geodesic distances between data points.

Returns:

valid_points (list) – A list of \(v\) centroids of the accepted disks.
used_points (set) – A set of all points which are part of an accepted heat flow distribution.
distributions (list) – The list of \(v\) distributions, each a one-dimensional numpy array of sorted heat flow values.
lolas (list) – The list of data point coordinates corresponding to the heat flow data within distributions. Each is a two-dimensional numpy array in which the second dimension iterates a tuple \((\lambda,\phi)\) of geographic coordinates.
distribution_indices (list) – The list of index lists of the data points used in the distributions. Each is a one-dimensional array of integer indices into the input data set that compose the corresponding entry of distributions. The indices distribution_indices[i] are generally not in the same order as the heat flow values in distributions[i].
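
The rejection sampling described for this function can be illustrated with a short NumPy sketch. This is not the REHEATFUNQ implementation: the name sketch_disk_covering is hypothetical, distances are spherical great-circle distances rather than geodesics, the exclusion polygons and the \(d_\mathrm{min}\) filter are omitted, and the sketch assumes that a proposed disk is rejected as soon as it contains a point of a previously accepted disk. The heat flow values are synthetic.

```python
import numpy as np


def sketch_disk_covering(hf, R, min_points, max_draw=1000, seed=982981,
                         a=6378137.0):
    """Simplified rejection sampling of disks with mutually exclusive data.

    Assumptions of this sketch: spherical distances, no exclusion polygon,
    no d_min filtering, and rejection of any disk that contains a point of
    a previously accepted disk.
    """
    hf = np.asarray(hf, dtype=float)
    rng = np.random.default_rng(seed)
    lon, lat = np.deg2rad(hf[:, 1]), np.deg2rad(hf[:, 2])
    # Unit vectors of the data points on the sphere.
    xyz = np.column_stack((np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)))
    used = np.zeros(hf.shape[0], dtype=bool)
    centers, distributions, indices = [], [], []
    for _ in range(max_draw):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        c = rng.normal(size=3)
        c /= np.linalg.norm(c)
        # Great-circle distance (in m) from the proposed center to all points.
        dist = a * np.arccos(np.clip(xyz @ c, -1.0, 1.0))
        inside = np.flatnonzero(dist <= R)
        # Reject disks with too few points or with already used points.
        if inside.size < min_points or used[inside].any():
            continue
        used[inside] = True
        centers.append((np.rad2deg(np.arctan2(c[1], c[0])),
                        np.rad2deg(np.arcsin(np.clip(c[2], -1.0, 1.0)))))
        distributions.append(np.sort(hf[inside, 0]))  # sorted heat flow
        indices.append(inside)                        # ascending indices
    return centers, set(np.flatnonzero(used).tolist()), distributions, indices


# Synthetic data: 500 points uniformly distributed on the sphere.
rng = np.random.default_rng(1)
n = 500
hf = np.column_stack((rng.gamma(2.0, 30.0, n),
                      rng.uniform(-180.0, 180.0, n),
                      np.rad2deg(np.arcsin(rng.uniform(-1.0, 1.0, n)))))
centers, used, distributions, indices = sketch_disk_covering(
    hf, R=2.0e6, min_points=3)
```

As in the return values documented above, the sorted heat flow values and the index arrays of the sketch are ordered differently; sorting hf[indices[i], 0] recovers distributions[i].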

conforming_data_selection(const double[:, :] xy, double dmin_m, rng=128)

This method applies the spatial data filtering technique described in the paper, sub-sampling the data so that the minimum inter-point distance does not fall below dmin_m.

The selection among non-conforming data pairs is stochastic but reproducible given an identical random number generator rng.

Parameters:

xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).
dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).
rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

mask – A mask filtering out non-conforming data points.

Return type:

numpy.ndarray
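
The conflict resolution can be sketched in plain NumPy as follows. The function name sketch_conforming_selection is hypothetical, and the brute-force pairwise distance matrix as well as the way the violating pair is picked (uniformly among the remaining violating pairs) are assumptions of this sketch, not statements about the library's internals.

```python
import numpy as np


def sketch_conforming_selection(xy, dmin_m, rng=128):
    """Return a boolean mask of points that satisfy the d_min criterion."""
    xy = np.asarray(xy, dtype=float)
    rng = np.random.default_rng(rng)  # accepts a seed or a Generator
    mask = np.ones(xy.shape[0], dtype=bool)
    # Brute-force pairwise Euclidean distances (O(N^2) memory).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Index pairs (i < j) that are closer than dmin_m.
    i, j = np.nonzero(np.triu(d < dmin_m, k=1))
    while True:
        # Pairs in which both points are still selected violate the criterion.
        violating = np.flatnonzero(mask[i] & mask[j])
        if violating.size == 0:
            break
        k = rng.choice(violating)
        # Drop one of the two points of the chosen pair at random.
        mask[i[k] if rng.random() < 0.5 else j[k]] = False
    return mask


# Two conflicting pairs (points 0/1 and 2/3) and one isolated point.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0], [100.5, 0.0],
               [500.0, 500.0]])
mask = sketch_conforming_selection(xy, dmin_m=10.0)
```

With the same seed or an identically initialized generator, the sketch returns the same mask on every call.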

bootstrap_data_selection(const double[:, ::1] xy, double dmin_m, size_t B, rng=127)

Computes a set of bootstrap samples of heat flow data points conforming to the data selection criterion.

Parameters:

xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).
dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).
B (int) – Number of bootstrap samples to draw.
rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

subselections – A list of index arrays. Each index array lists the indices of a conforming data selection within the data array xy. The number of index arrays is at most B. Duplicate data selections are returned only once.

Return type:

list
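
The following sketch shows how repeated random conflict resolution and the removal of duplicate selections could fit together. The names sketch_bootstrap_selection and _one_selection are hypothetical, and the order in which conflicts are resolved (one pass over the violating pairs in random order) is an assumption of this sketch rather than the library's algorithm.

```python
import numpy as np


def _one_selection(pairs, n, rng):
    """Resolve all conflicts once; return the indices of the kept points."""
    keep = np.ones(n, dtype=bool)
    # Visit the violating pairs in random order.
    for k in rng.permutation(len(pairs)):
        i, j = pairs[k]
        if keep[i] and keep[j]:
            # Both points are still selected: drop one of them at random.
            keep[i if rng.random() < 0.5 else j] = False
    return np.flatnonzero(keep)


def sketch_bootstrap_selection(xy, dmin_m, B, rng=127):
    """Return at most B distinct conforming index arrays."""
    xy = np.asarray(xy, dtype=float)
    rng = np.random.default_rng(rng)  # accepts a seed or a Generator
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    pairs = np.argwhere(np.triu(d < dmin_m, k=1))  # (P,2) pairs with i < j
    unique = {}
    for _ in range(B):
        idx = _one_selection(pairs, xy.shape[0], rng)
        # The sorted index tuple serves as a key to detect duplicates.
        unique.setdefault(tuple(idx.tolist()), idx)
    return list(unique.values())


# Two conflicting pairs allow at most four distinct selections.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0], [100.5, 0.0],
               [500.0, 500.0]])
subselections = sketch_bootstrap_selection(xy, dmin_m=10.0, B=50)
```

Because duplicates are stored only once, the number of returned index arrays can be smaller than B, as noted for the return value above.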
