RGRDC

reheatfunq.coverings

Facilities to compute Random Global R-Disk Coverings (RGRDCs). An RGRDC is a derived product of a global point data set (e.g. a global heat flow database). The covering consists of sequentially generated disks of radius \(R\), randomly distributed over Earth under the constraints that

  1. From the set of points within the disk, no two points are closer than the minimum distance \(d_\mathrm{min}\) from each other.

  2. No data point is part of a previous disk.

  3. The disk center is not contained within an optional exclusion polygon that represents a region of interest for local analysis.

  4. At least a minimum number of points remain in the disk.

The function random_global_R_disk_coverings() computes RGRDCs. It operates by iteratively drawing random disk centers on the sphere and testing whether all conditions can be met. After a maximum number of disk centers have been drawn, the algorithm terminates. The function is used in the following notebooks:
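The rejection loop described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the names sketch_disk_covering, random_center and great_circle_distance are hypothetical, only conditions 2 and 4 are enforced (the \(d_\mathrm{min}\) filter and the exclusion polygon are omitted), and distances are computed on a sphere with a brute-force scan.

```python
import math
import random

def random_center(rng):
    """Draw a point uniformly on the sphere; returns (lon, lat) in degrees."""
    lon = rng.uniform(-180.0, 180.0)
    # Sampling sin(lat) uniformly yields a uniform area density.
    lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    return lon, lat

def great_circle_distance(lon0, lat0, lon1, lat1, a=6378137.0):
    """Haversine distance (in m) on a sphere of radius a."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dlon = math.radians(lon1 - lon0)
    h = (math.sin(0.5 * (p1 - p0)) ** 2
         + math.cos(p0) * math.cos(p1) * math.sin(0.5 * dlon) ** 2)
    return 2.0 * a * math.asin(min(1.0, math.sqrt(h)))

def sketch_disk_covering(R, min_points, hf, N=10, MAX_DRAW=1000, seed=982981):
    """Simplified rejection sampling; hf holds (q, lon, lat) tuples."""
    rng = random.Random(seed)
    used = set()
    centers, index_lists = [], []
    for _ in range(MAX_DRAW):
        if len(centers) >= N:
            break  # early exit once the target number of disks is reached
        lon, lat = random_center(rng)
        # Points of previously accepted disks are not available any more.
        inside = [i for i, (q, lo, la) in enumerate(hf)
                  if i not in used
                  and great_circle_distance(lon, lat, lo, la) <= R]
        if len(inside) < min_points:
            continue  # reject the proposed disk
        centers.append((lon, lat))
        index_lists.append(inside)
        used.update(inside)
    return centers, index_lists
```

In this sketch, a draw is wasted whenever too few unused points fall inside the proposed disk, which is why the number of accepted disks is typically far smaller than MAX_DRAW.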

The function conforming_data_selection() can ensure the \(d_\mathrm{min}\) criterion within a set of heat flow measurements. It resolves conflicts with this criterion by iteratively dropping a random data point of a violating data point pair until no data point pair violates the criterion.
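The conflict resolution can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's code: sketch_conforming_selection is a hypothetical name, the order in which violating pairs are visited (here a uniformly random pair per iteration) is assumed, and the brute-force pair search is only practical for small inputs.

```python
import math
import random

def sketch_conforming_selection(xy, dmin_m, seed=128):
    """Return a boolean mask; kept points are pairwise at least dmin_m apart."""
    rng = random.Random(seed)
    keep = [True] * len(xy)
    while True:
        # Collect all remaining pairs that violate the distance criterion.
        pairs = [(i, j)
                 for i in range(len(xy)) if keep[i]
                 for j in range(i + 1, len(xy))
                 if keep[j] and math.dist(xy[i], xy[j]) < dmin_m]
        if not pairs:
            return keep
        # Pick one violating pair and drop one of its two points at random.
        i, j = rng.choice(pairs)
        keep[rng.choice((i, j))] = False
```

Because every random decision comes from the seeded generator, repeated calls with the same seed return the same mask.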

The function bootstrap_data_selection() creates a number of such conforming data selections using random decisions for each conflict.


random_global_R_disk_coverings(R, min_points, hf, buffered_poly_xy, proj_str, N=10000, MAX_DRAW=100000, dmin=0.0, seed=982981, used_points=None, a=6378137.0)

Uses rejection sampling to draw a number of exclusive regional distributions.

Parameters:
  • R (float) – Radius \(R\) of the RGRDC (in m).

  • min_points (int) – Minimum number of points within a distribution after all other conditions are met. If fewer data points remain, the proposed disk is rejected.

  • hf (array_like) – Array of heat flow data points of shape (N,3), where N is the number of data points. The second dimension must contain a tuple \((q_i, \lambda_i, \phi_i)\) for each data point, where \(q_i\) is the heat flow, \(\lambda_i\) the longitude in degrees, and \(\phi_i\) the latitude in degrees.

  • buffered_poly_xy (list[array_like]) – List of polygons which will reject disks if their centers fall within one of the polygons. Each element of the list must be a (M[i],2)-shaped numpy array, where \(M[i]\) is the number of points composing the \(i\)-th polygon and the second dimension iterates the coordinates \(x\) and \(y\). The coordinates are interpreted within the coordinate system described by the proj_str parameter.

  • proj_str (str) – A PROJ string describing a projected coordinate system within which the polygons supplied in the buffered_poly_xy parameter are interpreted.

  • N (int, optional) – Target number of accepted disks. The target might not be reached; if it is reached, the algorithm exits early. The default is high enough that MAX_DRAW is likely exhausted first.

  • MAX_DRAW (int, optional) – Maximum number of disk centers to generate. Might not be reached if N is small.

  • dmin (float, optional) – Minimum inter-point distance for the conforming selection criterion (in m).

  • seed (int, optional) – Seed passed to np.random.default_rng().

  • used_points (list[int], optional) – A list of data point indices that can be marked as used a priori.

  • a (float, optional) – Radius of the sphere used (in m). The default equals the WGS84 semi-major axis. This parameter is used for a scipy.spatial.KDTree-based fast data point query before computing geodesic distances between data points.
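One way such a sphere-based pre-query can work is to embed the points in 3D Cartesian space and compare straight-line (chord) distances, since a great-circle distance \(R\) on a sphere of radius \(a\) corresponds to the chord length \(2a\sin(R/2a)\). The sketch below illustrates this idea only; whether the library uses this particular embedding is an assumption, the function names are hypothetical, and a brute-force scan stands in for the KDTree.

```python
import math

def lonlat_to_xyz(lon_deg, lat_deg, a=6378137.0):
    """Cartesian coordinates (in m) of a point on a sphere of radius a."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (a * math.cos(lat) * math.cos(lon),
            a * math.cos(lat) * math.sin(lon),
            a * math.sin(lat))

def chord_length(R, a=6378137.0):
    """Chord length that corresponds to the great-circle distance R."""
    return 2.0 * a * math.sin(R / (2.0 * a))

def candidates_within(center, points, R, a=6378137.0):
    """Indices of (lon, lat) points within spherical distance R of center.

    Brute-force stand-in for a KD-tree ball query in Cartesian space.
    """
    c = lonlat_to_xyz(center[0], center[1], a=a)
    limit = chord_length(R, a)
    return [i for i, (lon, lat) in enumerate(points)
            if math.dist(c, lonlat_to_xyz(lon, lat, a=a)) <= limit]
```

Spherical and geodesic (ellipsoidal) distances differ slightly, so a pre-query of this kind would need some margin before the exact geodesic test is applied.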

Returns:

  • valid_points (list) – A list of \(v\) centroids of the accepted disks.

  • used_points (set) – A set of all points which are part of an accepted heat flow distribution.

  • distributions (list) – The list of \(v\) distributions, each a one-dimensional numpy array of sorted heat flow values.

  • lolas (list) – The list of data point coordinates corresponding to the heat flow data within distributions. Each is a two-dimensional numpy array in which the second dimension iterates a tuple \((\lambda,\phi)\) of geographic coordinates.

  • distribution_indices (list) – The list of index lists of the data points used in the distributions. Each is a one-dimensional array of integer indices into the input data set that compose the corresponding entry of distributions. The indices distribution_indices[i] are generally not in the same order as the heat flow values in distributions[i].
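Because the heat flow values are sorted but the indices are not, the two returned lists cannot be zipped directly. The following sketch shows one way to realign them, using made-up data in the (q, λ, φ) layout of hf and assuming ascending sort order; the variable names and values are purely illustrative.

```python
# Hypothetical input data: (heat flow, longitude, latitude) per point.
hf = [(62.0, 10.0, 50.0), (48.5, 10.1, 50.1),
      (71.3, 10.2, 49.9), (55.0, 10.3, 50.2)]

# Stand-ins for distribution_indices[i] and distributions[i].
indices = [2, 0, 3]
distribution = sorted(hf[k][0] for k in indices)

# Reorder the indices so that they follow the sorted heat flow values.
order = sorted(indices, key=lambda k: hf[k][0])
aligned = [hf[k][0] for k in order]
```

After reordering, aligned matches distribution element by element, so order[j] identifies the data point behind the j-th sorted value (ties aside).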

conforming_data_selection(const double[:, :] xy, double dmin_m, rng=128)

This method applies the spatial data filtering technique described in the paper, sub-sampling the data so that the minimum inter-point distance is at least dmin_m.

The selection among non-conforming data pairs is stochastic but reproducible given an identical random number generator rng.

Parameters:
  • xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).

  • dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).

  • rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

mask – A mask filtering out non-conforming data points.

Return type:

numpy.ndarray

bootstrap_data_selection(const double[:, ::1] xy, double dmin_m, size_t B, rng=127)

Computes a set of bootstrap samples of heat flow data points conforming to the data selection criterion.

Parameters:
  • xy (array_like) – (N,2) array of data points in a projected Euclidean coordinate system (in m).

  • dmin_m (float) – Minimum inter-point distance for the conforming selection criterion (in m).

  • B (int) – Number of bootstrap samples to draw.

  • rng (int | numpy.random.Generator) – A seed or random generator to draw from for reproducibility.

Returns:

subselections – A list of index arrays. Each index array lists the indices of a conforming data selection within the data array xy. The number of index arrays is at most B. Duplicate data selections are returned only once.

Return type:

list
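The de-duplication behaviour described for the return value can be sketched as follows. This is an illustration under assumptions rather than the library's implementation: both function names are hypothetical, the conflict resolution is a brute-force variant that drops a random point of a random violating pair, and index lists are returned instead of index arrays.

```python
import math
import random

def _conforming_indices(xy, dmin_m, rng):
    """One random conforming selection, returned as a sorted index tuple."""
    keep = list(range(len(xy)))
    while True:
        pairs = [(i, j) for n, i in enumerate(keep) for j in keep[n + 1:]
                 if math.dist(xy[i], xy[j]) < dmin_m]
        if not pairs:
            return tuple(keep)
        # Drop a random member of a random violating pair.
        keep.remove(rng.choice(rng.choice(pairs)))

def sketch_bootstrap_selection(xy, dmin_m, B, seed=127):
    """Draw B selections and return each distinct selection only once."""
    rng = random.Random(seed)
    seen = set()
    unique = []
    for _ in range(B):
        selection = _conforming_indices(xy, dmin_m, rng)
        if selection not in seen:
            seen.add(selection)
            unique.append(list(selection))
    return unique
```

With two independent conflicting pairs, for instance, only four distinct selections exist, so the result contains at most four index lists regardless of how large B is.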