RGRDC

reheatfunq.coverings

Facilities to compute Random Global R-Disk Coverings (RGRDCs). An RGRDC is a derived product of a global point data set (e.g. a global heat flow database). The covering consists of sequentially generated disks of radius R, randomly distributed over Earth under the following constraints:

  1. From the set of points within the disk, no two points are closer than the minimum distance dmin from each other.

  2. No data point is part of a previous disk.

  3. The disk center is not contained within an optional exclusion polygon that represents a region of interest for local analysis.

  4. At least a minimum number of points remain in the disk once the preceding conditions have been applied.
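The constraints above can be sketched as a small rejection-sampling loop. This is a simplified illustration and not the library's implementation: it covers only constraints 2 and 4, it omits the dmin filtering and the exclusion polygon, and it uses a brute-force haversine search on a sphere. All function names (random_center, great_circle_distance, sketch_disk_covering) and the synthetic points are hypothetical.

```python
import math
import random

def random_center(rng):
    """Draw a point uniformly distributed on the sphere as (lon, lat) in degrees."""
    lon = rng.uniform(-180.0, 180.0)
    # Sampling sin(lat) uniformly yields an area-uniform distribution.
    lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    return lon, lat

def great_circle_distance(lon0, lat0, lon1, lat1, a=6378137.0):
    """Haversine distance (in m) on a sphere of radius a."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dlon = math.radians(lon1 - lon0)
    h = (math.sin((p1 - p0) / 2) ** 2
         + math.cos(p0) * math.cos(p1) * math.sin(dlon / 2) ** 2)
    return 2 * a * math.asin(min(1.0, math.sqrt(h)))

def sketch_disk_covering(points, R, min_points, n_target=100, max_draw=1000, seed=0):
    """points: list of (lon, lat). Returns (disk centers, index list per disk)."""
    rng = random.Random(seed)
    used = set()
    centers, disks = [], []
    for _ in range(max_draw):
        if len(centers) >= n_target:
            break  # early exit once the target number of disks is accepted
        lon, lat = random_center(rng)
        # Constraint 2: ignore points that already belong to an accepted disk.
        inside = [i for i, (plon, plat) in enumerate(points)
                  if i not in used
                  and great_circle_distance(lon, lat, plon, plat) <= R]
        # Constraint 4: reject the disk if too few points remain.
        if len(inside) < min_points:
            continue
        used.update(inside)
        centers.append((lon, lat))
        disks.append(inside)
    return centers, disks

# Synthetic, uniformly scattered points stand in for a global data set.
data_rng = random.Random(1)
points = [random_center(data_rng) for _ in range(2000)]
centers, disks = sketch_disk_covering(points, R=1.0e6, min_points=5)
```

Each accepted disk removes its points from the pool, so later disks in a densely covered area tend to be rejected more often.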

The function random_global_R_disk_coverings() computes RGRDCs. It operates by iteratively drawing random disk centers on the sphere and testing whether all conditions can be met. The algorithm terminates once a maximum number of disk centers has been drawn or the target number of disks has been accepted. The function is used in the following notebooks:

The function conforming_data_selection() can enforce the dmin criterion within a set of heat flow measurements. It resolves violations of this criterion by iteratively dropping a random data point from a violating pair until no pair of data points violates the criterion.
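The conflict resolution described above can be illustrated with a brute-force sketch in plain Python. It is not the library's code: the name sketch_conforming_selection is hypothetical, the pair search is quadratic, and the convention that True marks a kept point is an assumption of this sketch.

```python
import math
import random

def sketch_conforming_selection(xy, dmin, seed=0):
    """Return a boolean mask; kept points are pairwise at least dmin apart."""
    rng = random.Random(seed)
    keep = [True] * len(xy)
    while True:
        # Collect all remaining pairs that are closer than dmin.
        violating = [(i, j)
                     for i in range(len(xy)) if keep[i]
                     for j in range(i + 1, len(xy))
                     if keep[j] and math.dist(xy[i], xy[j]) < dmin]
        if not violating:
            return keep
        # Drop one randomly chosen point of one randomly chosen violating pair.
        i, j = rng.choice(violating)
        keep[rng.choice((i, j))] = False

# Two close pairs and one isolated point (projected coordinates in m).
xy = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0), (50.0, 50.0)]
mask = sketch_conforming_selection(xy, dmin=2.0, seed=3)
```

Which point of each close pair survives depends on the seed, but the number of surviving points in this example does not.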

The function bootstrap_data_selection() creates a number of such conforming data selections using random decisions for each conflict.


random_global_R_disk_coverings(R, min_points, hf, buffered_poly_xy, proj_str, N=10000, MAX_DRAW=100000, dmin=0.0, seed=982981, used_points=None, a=6378137.0)

Uses rejection sampling to draw a number of exclusive regional distributions.

Parameters:
  • R (float) – Radius R of the RGRDC (in m).

  • min_points (int) – Minimum number of points within a distribution after all other conditions are met. If the number of data points is less, the proposed disk is rejected.

  • hf (array_like) – Array of heat flow data points of shape (N,3), where N is the number of data points. The second dimension must contain a tuple (qi,λi,ϕi) for each data point, where qi is the heat flow, λi the longitude in degrees, and ϕi the latitude in degrees.

  • buffered_poly_xy (list[array_like]) – List of polygons which will reject disks if their centers fall within one of the polygons. Each element of the list must be a (M[i],2)-shaped numpy array where M[i] is the number of points composing the i-th polygon and the second dimension iterates the coordinates x and y. The coordinates are interpreted within the coordinate system described by the proj_str parameter.

  • proj_str (str) – A PROJ string describing a projected coordinate system within which the polygons supplied in the buffered_poly_xy parameter are interpreted.

  • N (int, optional) – Target number of accepted disks. It might not be reached, but reaching it leads to an early exit. The default is high enough that MAX_DRAW is likely to be exhausted first.

  • MAX_DRAW (int, optional) – Maximum number of disk centers to generate. Might not be reached if N is small.

  • dmin (float, optional) – Minimum inter-point distance for the conforming selection criterion (in m).

  • seed (int, optional) – Seed passed to np.random.default_rng().

  • used_points (list[int], optional) – A list of data point indices that can be marked as used a priori.

  • a (float, optional) – Semi-major axis, used as the radius of the sphere (in m). This parameter is used for a scipy.spatial.KDTree-based fast data point query before computing geodesic distances between data points.

Returns:

  • valid_points (list) – A list of v centroids of the accepted disks.

  • used_points (set) – A set of all points which are part of an accepted heat flow distribution.

  • distributions (list) – The list of v distributions, each a one-dimensional numpy array of sorted heat flow values.

  • lolas (list) – The list of data point coordinates corresponding to the heat flow data within distributions. Each is a two-dimensional numpy array in which the second dimension iterates a tuple (λ,ϕ) of geographic coordinates.

  • distribution_indices (list) – The list of index lists of the data points used in the distributions. Each is a one-dimensional array of integer indices into the input data set that compose the corresponding entry of distributions. The indices distribution_indices[i] are generally not in the same order as the heat flow values in distributions[i].
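Because distributions[i] is sorted while distribution_indices[i] generally is not, relating a sorted heat flow value back to its data point requires going through the indices. The following sketch uses made-up values that merely stand in for the input hf and for one entry of the returned lists; it does not call the library.

```python
# Hypothetical stand-ins: rows of (q, lon, lat) and one returned entry.
hf = [(68.0, 10.0, 50.0), (41.5, 10.2, 50.1), (95.0, 9.8, 49.9), (55.0, 10.1, 50.3)]
indices_i = [2, 0, 3]                 # plays the role of distribution_indices[i]
distribution_i = [55.0, 68.0, 95.0]   # plays the role of distributions[i]

# Sorting the indexed heat flow values recovers the sorted distribution.
recovered = sorted(hf[k][0] for k in indices_i)

# Sorting the indices by heat flow aligns them with the sorted values.
order = sorted(indices_i, key=lambda k: hf[k][0])
```

With tied heat flow values the alignment is only determined up to the order of the tied points.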

conforming_data_selection(const double[:, :] xy, double dmin_m, rng=128)

This method applies the spatial data filtering technique described in the paper, sub-sampling the data so that the minimum inter-point distance is not below dmin_m.

The selection process for non-conforming data pairs is stochastic but reproducible given an identical random number generator rng.

Parameters:
  • xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).

  • dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).

  • rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

mask – A mask filtering out non-conforming data points.

Return type:

numpy.ndarray

bootstrap_data_selection(const double[:, ::1] xy, double dmin_m, size_t B, rng=127)

Computes a set of bootstrap samples of heat flow data points conforming to the data selection criterion.

Parameters:
  • xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).

  • dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).

  • B (int) – Number of bootstrap samples to draw.

  • rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

subselections – A list of index arrays. Each index array lists the indices of a conforming data selection within the data array xy. The number of index arrays is at most B. Duplicate data selections are returned only once.

Return type:

list
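The deduplication behaviour described for subselections can be illustrated as follows. This is a sketch under stated assumptions rather than the library's implementation: the helper names sketch_selection and sketch_bootstrap_selection are hypothetical, the pair search is brute force, and selections are returned as sorted tuples of indices instead of index arrays.

```python
import math
import random

def sketch_selection(xy, dmin, rng):
    """Indices kept after randomly dropping one point per violating pair."""
    kept = list(range(len(xy)))
    while True:
        pairs = [(i, j) for n, i in enumerate(kept) for j in kept[n + 1:]
                 if math.dist(xy[i], xy[j]) < dmin]
        if not pairs:
            return tuple(kept)
        # Pick a violating pair at random, then one of its two points.
        kept.remove(rng.choice(rng.choice(pairs)))

def sketch_bootstrap_selection(xy, dmin, B, seed=0):
    """Draw B selections and return each distinct one only once."""
    rng = random.Random(seed)
    # The set collapses duplicates, so at most B selections remain.
    unique = {sketch_selection(xy, dmin, rng) for _ in range(B)}
    return sorted(unique)

# One close pair (indices 0 and 1) and two isolated points.
xy = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
subselections = sketch_bootstrap_selection(xy, dmin=2.0, B=20)
```

In this example only two distinct conforming selections exist, so the result is much shorter than B.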