PyDP

Algorithms

class pydp.algorithms.laplacian.BoundedMean(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedMean computes the average of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private average. All input values are normalized to their difference from the midpoint of the input range. This lets the sum of all input values be computed with half the sensitivity it would otherwise require, which gives better accuracy than dividing a noisy sum by a noisy count. The algorithm is taken from section 2.5.5 (algorithm 2.4) of the following book: https://books.google.com/books?id=WFttDQAAQBAJ&pg=PA24#v=onepage&q&f=false
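The midpoint normalization described above can be sketched in plain Python. This is an illustration under stated assumptions, not PyDP's implementation: the names `laplace_noise` and `dp_bounded_mean` are hypothetical, the even split of epsilon between the sum and the count is an assumption, and `lower < upper` is required.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_bounded_mean(values, lower, upper, epsilon, seed=None):
    """Illustrative midpoint-normalized DP mean (assumes lower < upper)."""
    rng = random.Random(seed)
    midpoint = lower + (upper - lower) / 2.0

    # Clamp each value into [lower, upper], then measure it from the midpoint.
    normalized_sum = sum(min(max(v, lower), upper) - midpoint for v in values)

    # Assumption: the budget is split evenly between the sum and the count.
    half_eps = epsilon / 2.0
    # A normalized value lies within (upper - lower) / 2 of zero, so the
    # sensitivity of the normalized sum is half the width of the range.
    sum_sensitivity = (upper - lower) / 2.0

    noisy_sum = normalized_sum + laplace_noise(sum_sensitivity / half_eps, rng)
    noisy_count = len(values) + laplace_noise(1.0 / half_eps, rng)

    # Guard against a tiny or negative noisy count, then undo the shift.
    mean = noisy_sum / max(noisy_count, 1.0) + midpoint
    return min(max(mean, lower), upper)
```

With a very large epsilon the noise becomes negligible and the result approaches the mean of the clamped inputs, which is a convenient sanity check.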

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.
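For intuition, the interval for plain Laplace noise has a closed form: if the noise has scale b, then P(|noise| <= t) = 1 - exp(-t / b), so t = -b * ln(1 - confidence_level). The sketch below is a minimal illustration of that formula; the function name `laplace_noise_interval` and its `scale` parameter are hypothetical and are not part of PyDP.

```python
import math


def laplace_noise_interval(confidence_level, scale):
    """Symmetric interval containing Laplace(0, scale) noise with the given probability."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    # Solve 1 - exp(-t / scale) = confidence_level for t.
    half_width = -scale * math.log(1.0 - confidence_level)
    return (-half_width, half_width)
```

For example, at a confidence level of 0.95 and scale 1.0 the half-width is ln(20), roughly 3.0, so about 95% of noise draws fall within three scale units of zero.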

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.
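The budget semantics above (a fraction in [0,1] is consumed per call, and the default call takes whatever remains) can be modeled with a small tracker. This is a hypothetical sketch for illustration only; `PrivacyBudget` is not a PyDP class, and the assumption that a call scales epsilon by the consumed fraction is mine.

```python
class PrivacyBudget:
    """Tracks the fraction of the privacy budget that is still unspent."""

    def __init__(self):
        self.remaining = 1.0

    def consume(self, fraction=None):
        """Spend `fraction` of the budget, or everything left when omitted."""
        if fraction is None:
            fraction = self.remaining
        if fraction <= 0.0 or fraction > self.remaining:
            raise ValueError("requested budget is not available")
        self.remaining -= fraction
        return fraction


def effective_epsilon(epsilon, budget, fraction=None):
    """Assumed relation: a call spends epsilon scaled by the consumed fraction."""
    return epsilon * budget.consume(fraction)
```

Under this model, calling with a fraction of 0.25 and then making a default call spends epsilon / 4 and then the remaining 3 * epsilon / 4, after which no budget is left.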

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.
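The serialize/merge pattern for distributed aggregation can be sketched without protocol buffers. The classes below are hypothetical stand-ins (`SumSummary` plays the role of the Summary proto, `PartialSum` the role of an algorithm instance) and do not reflect PyDP's actual data layout; noise is deliberately left out so that it would be added once, after all partial results are merged.

```python
from dataclasses import dataclass


@dataclass
class SumSummary:
    """Hypothetical stand-in for a serialized summary."""
    count: int = 0
    total: float = 0.0


class PartialSum:
    """Accumulates clamped values on one worker; no noise is added here."""

    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper
        self.count, self.total = 0, 0.0

    def add_entry(self, value):
        self.total += min(max(value, self.lower), self.upper)
        self.count += 1

    def serialize(self):
        # Snapshot the current state so it can be shipped elsewhere.
        return SumSummary(self.count, self.total)

    def merge(self, summary):
        # Both sides must use the same bounds for the merge to be meaningful.
        self.count += summary.count
        self.total += summary.total
```

A coordinator would create one `PartialSum` per shard, collect each shard's `serialize()` output, `merge` them into a single instance, and only then produce a noised result.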

class pydp.algorithms.laplacian.BoundedSum(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedSum computes the sum of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private sum, clamped between upper and lower values. Bounds can be manually set or privately inferred.
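A clamped, Laplace-noised sum with manually set bounds can be sketched as follows. This is an illustration rather than PyDP's code: `dp_bounded_sum` is a hypothetical name, bounds are assumed to be supplied by the caller (private bound inference is not shown), each user is assumed to contribute one value, and at least one bound must be non-zero.

```python
import random


def dp_bounded_sum(values, lower, upper, epsilon, seed=None):
    """Illustrative DP sum of values clamped into [lower, upper]."""
    rng = random.Random(seed)

    # Clamping bounds how much any single value can move the sum.
    clamped_sum = sum(min(max(v, lower), upper) for v in values)

    # Adding or removing one value changes the sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    scale = sensitivity / epsilon

    # Laplace(0, scale) noise as the difference of two exponentials.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clamped_sum + noise
```

Tighter bounds reduce the noise scale but clamp more of the data, so the choice of bounds trades bias against variance.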

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.BoundedStandardDeviation(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedStandardDeviation computes the standard deviation of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private standard deviation for values in the range [lower..upper]. Values outside of this range will be clamped so they lie in the range. The output will also be clamped between 0 and (upper - lower).

The implementation simply computes the bounded variance and takes the square root, which is differentially private by the post-processing theorem. It relies on the fact that the bounded variance algorithm guarantees that the output is non-negative.
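The post-processing step can be sketched in a few lines. The function below is hypothetical and takes a variance that has already been made differentially private; because it never touches the raw data, it spends no additional privacy budget.

```python
import math


def dp_std_from_variance(noisy_variance, lower, upper):
    """Turn an already-private variance into a private standard deviation."""
    # Guard against a negative input before taking the square root.
    std = math.sqrt(max(noisy_variance, 0.0))
    # The standard deviation of values in [lower, upper] cannot exceed the range.
    return min(std, upper - lower)
```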

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.BoundedVariance(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedVariance computes the variance of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private variance for values in the range [lower..upper]. Values outside of this range will be clamped so they lie in the range. The output will also be clamped between 0 and (upper - lower)^2. Since the result is guaranteed to be non-negative, this algorithm can be used to compute a differentially private standard deviation.

The algorithm uses O(1) memory and runs in O(n) time, where n is the size of the dataset, making it fast and efficient. The amount of noise added grows quadratically in (upper - lower) and decreases linearly in n, so it might not produce good results unless n >> (upper - lower)^2.

The algorithm is a variation of the algorithm for differentially private mean from “Differential Privacy: From Theory to Practice”, section 2.5.5: https://books.google.com/books?id=WFttDQAAQBAJ&pg=PA24#v=onepage&q&f=false
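One way to extend the noisy-mean idea to variance is to noise the count, the sum, and the sum of squares of midpoint-shifted values, then combine them as E[x^2] - E[x]^2. The sketch below follows that approach under explicit assumptions: `dp_bounded_variance` is a hypothetical name, the three-way even split of epsilon is an assumption, `lower < upper` is required, and PyDP's actual variant may differ in its details.

```python
import random


def dp_bounded_variance(values, lower, upper, epsilon, seed=None):
    """Illustrative DP variance from noisy count, sum and sum of squares."""
    rng = random.Random(seed)

    def laplace(scale):
        # Laplace(0, scale) noise as the difference of two exponentials.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    midpoint = lower + (upper - lower) / 2.0
    half_range = (upper - lower) / 2.0
    # Variance is unchanged by shifting, so work with midpoint-centered values.
    shifted = [min(max(v, lower), upper) - midpoint for v in values]

    # Assumption: epsilon is split evenly across the three noisy statistics.
    eps_each = epsilon / 3.0
    noisy_count = max(len(shifted) + laplace(1.0 / eps_each), 1.0)
    noisy_sum = sum(shifted) + laplace(half_range / eps_each)
    # Each squared term is at most half_range ** 2: noise grows quadratically.
    noisy_sum_sq = sum(x * x for x in shifted) + laplace(half_range ** 2 / eps_each)

    mean = noisy_sum / noisy_count
    variance = noisy_sum_sq / noisy_count - mean * mean
    return min(max(variance, 0.0), (upper - lower) ** 2)
```

The `half_range ** 2` noise scale on the sum of squares is where the quadratic dependence on (upper - lower) comes from, and dividing by the count is why the error shrinks as n grows.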

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Max(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Max computes the maximum value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Min(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Min computes the minimum value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Median(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Median computes the median value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Count(epsilon: float = 1.0, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Count computes the number of items in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Percentile(epsilon: float = 1.0, percentile: float = 0.0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, dtype: str = 'int')

Percentile finds the value in the dataset at the given percentile, in a differentially private manner.

add_entries(data: List[Union[int, float]]) None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float

Returns the delta set at initialization.

property epsilon: float

Returns the epsilon set at initialization.

property l0_sensitivity: float

Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float

Returns the linf_sensitivity set at initialization.

memory_used() float

Returns the memory currently used by the algorithm in bytes.

merge(summary)

Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

property percentile: float

Gets the percentile value that was set in the constructor.

privacy_budget_left() float

Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() None

Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

Numerical Mechanisms

class pydp.algorithms.numerical_mechanisms.NumericalMechanism

Base class for all (Ɛ, 𝛿)-differentially private additive noise numerical mechanisms.

add_noise(*args, **kwargs)

Overloaded function.

  1. add_noise(self: pydp.NumericalMechanism, result: int, privacy_budget: float) -> int

  2. add_noise(self: pydp.NumericalMechanism, result: int, privacy_budget: float) -> int

  3. add_noise(self: pydp.NumericalMechanism, result: float, privacy_budget: float) -> float

  4. add_noise(self: pydp.NumericalMechanism, result: int) -> int

  5. add_noise(self: pydp.NumericalMechanism, result: int) -> int

  6. add_noise(self: pydp.NumericalMechanism, result: float) -> float

property epsilon

The Ɛ of the numerical mechanism

memory_used(self: pydp.NumericalMechanism) int

noise_confidence_interval(self: pydp.NumericalMechanism, confidence_level: float, privacy_budget: float, noised_result: float) pydp.ConfidenceInterval

Returns the confidence interval, at the specified confidence level, of the noise that add_noise() would add with the specified privacy budget. If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].
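
For intuition, the interval can be worked out by hand for zero-mean Laplace noise. The sketch below is not PyDP code: `laplace_noise_interval` is a hypothetical helper, and it assumes the noise scale is b = sensitivity / (epsilon * privacy_budget). It returns the interval of the noise itself, not an interval centered on a noised result.

```python
import math


def laplace_noise_interval(confidence_level, epsilon, sensitivity=1.0, privacy_budget=1.0):
    """Symmetric interval [-t, t] that contains Laplace noise with the given probability.

    Assumes zero-mean Laplace noise with scale
    b = sensitivity / (epsilon * privacy_budget).
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    b = sensitivity / (epsilon * privacy_budget)
    # P(|X| <= t) = 1 - exp(-t / b), so t = -b * ln(1 - confidence_level).
    t = -b * math.log(1.0 - confidence_level)
    return (-t, t)


low, high = laplace_noise_interval(0.95, epsilon=1.0)
# Spending only half of the budget doubles the noise scale and the interval width.
_, high_half_budget = laplace_noise_interval(0.95, epsilon=1.0, privacy_budget=0.5)
```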

noised_value_above_threshold(self: pydp.NumericalMechanism, arg0: float, arg1: float) bool

Quickly determines if result with added noise is above certain threshold.

class pydp.algorithms.numerical_mechanisms.LaplaceMechanism

Bases: NumericalMechanism

property diversity

The diversity of the Laplace mechanism.

get_uniform_double(self: pydp.LaplaceMechanism) float

property sensitivity

The L1 sensitivity of the query.

class pydp.algorithms.numerical_mechanisms.GaussianMechanism

Bases: NumericalMechanism

property delta

The 𝛿 of the Gaussian mechanism.

property l2_sensitivity

The L2 sensitivity of the query.

property std

The standard deviation parameter of the Gaussian mechanism underlying distribution.
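
As a rough illustration of how the standard deviation relates to Ɛ, 𝛿 and the L2 sensitivity, the sketch below uses the classical textbook calibration of the Gaussian mechanism. This is an assumption made for illustration only: `classical_gaussian_std` is a hypothetical helper, and the value PyDP reports for `std` may be calibrated differently.

```python
import math


def classical_gaussian_std(epsilon, delta, l2_sensitivity):
    """Classical calibration: sigma = sqrt(2 * ln(1.25 / delta)) * l2_sensitivity / epsilon.

    This textbook bound is stated for 0 < epsilon < 1. It is an assumption
    here and may differ from the value PyDP reports.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


sigma = classical_gaussian_std(epsilon=0.5, delta=1e-5, l2_sensitivity=1.0)
sigma_double = classical_gaussian_std(epsilon=0.5, delta=1e-5, l2_sensitivity=2.0)
```

Under this calibration, the standard deviation grows linearly with the L2 sensitivity and shrinks as epsilon grows.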

Distributions

class pydp.distributions.GaussianDistribution

sample(self: pydp.GaussianDistribution, scale: float = 1.0) float

Samples the Gaussian with distribution Gauss(scale*stddev).

Parameters:

scale – A factor to scale stddev.

property stddev

Returns stddev

class pydp.distributions.LaplaceDistribution

Draws samples from the Laplacian distribution.

get_diversity(self: pydp.LaplaceDistribution) float

Returns the parameter defining this distribution, often labeled b.

get_uniform_double(self: pydp.LaplaceDistribution) float

Returns a uniform random integer in the range [0, 2^53).

sample(self: pydp.LaplaceDistribution, scale: float = 1.0) float

Samples the Laplacian distribution Laplace(u, scale*b).

Parameters:

scale – A factor to scale b.
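
To make the role of b and scale concrete, here is a minimal inverse-transform sampler for Laplace(u, scale*b). It is a sketch for intuition, not the library's sampler: `sample_laplace` and its `rng` parameter are hypothetical, and naive floating-point sampling of this kind is generally not considered safe for production differential privacy.

```python
import math
import random
import sys


def sample_laplace(b, scale=1.0, mu=0.0, rng=random):
    """Draw one sample from Laplace(mu, scale * b) by inverse-transform sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Clamp away from zero so that log() is defined even when u == -0.5.
    tail = max(1.0 - 2.0 * abs(u), sys.float_info.min)
    # Inverse CDF: x = mu - B * sign(u) * ln(1 - 2|u|), with B = scale * b.
    return mu - scale * b * math.copysign(1.0, u) * math.log(tail)


rng = random.Random(0)  # seeded only to make the illustration repeatable
samples = [sample_laplace(b=2.0, rng=rng) for _ in range(20000)]
mean_abs = sum(abs(s) for s in samples) / len(samples)  # should be close to b = 2.0
```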

Util

pydp.util.Geometric() int

pydp.util.UniformDouble() float
pydp.util.correlation(arg0: List[float], arg1: List[float]) float

Returns linear correlation coefficient.
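
The linear (Pearson) correlation coefficient can be written out in a few lines of plain Python. `pearson_correlation` below is a hypothetical stand-alone helper meant to show the quantity being computed. It is not the PyDP implementation.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length and at least two elements")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either input is constant (zero variance).
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```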

pydp.util.get_next_power_of_two(arg0: float) float

Returns the power of two that is greater than and closest to the given numerical input.
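
A minimal sketch of the idea, assuming positive input and reading "greater than" as "greater than or equal to", so that an exact power of two maps to itself. `next_power_of_two` is a hypothetical stand-alone helper, and the library's behavior on exact powers of two may differ from this assumption.

```python
import math


def next_power_of_two(n: float) -> float:
    """Smallest power of two that is >= n (assumption: exact powers map to themselves)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 2.0 ** math.ceil(math.log2(n))


above_five = next_power_of_two(5.0)
below_one = next_power_of_two(0.3)  # negative exponents work for inputs below 1
```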

pydp.util.mean(*args, **kwargs)

Overloaded function.

  1. mean(arg0: List[float]) -> float

Calculates the mean of a given set of numbers for a double (floating-point) data type.

  2. mean(arg0: List[int]) -> float

Calculates the mean of a given set of numbers for an int data type.

pydp.util.order_statistics(arg0: float, arg1: List[float]) float

Sample values placed in ascending order.

pydp.util.qnorm(arg0: float, arg1: float, arg2: float) pydp._pydp.StatusOrD

Quantile function of normal distribution, inverse of the cumulative distribution function.
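
The same quantity is available in the Python standard library, which is convenient for cross-checking. The sketch below assumes the three arguments correspond to a probability, a mean and a standard deviation, in that order. That mapping is an assumption, since the signature above only names them arg0, arg1 and arg2. `normal_quantile` is a hypothetical wrapper.

```python
from statistics import NormalDist


def normal_quantile(p, mu=0.0, sigma=1.0):
    """Inverse CDF of Normal(mu, sigma) evaluated at probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return NormalDist(mu, sigma).inv_cdf(p)


z = normal_quantile(0.975)                        # upper bound of a 95% two-sided interval
median = normal_quantile(0.5, mu=3.0, sigma=2.0)  # the median equals the mean
```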

pydp.util.standard_deviation(arg0: List[float]) float

Standard Deviation, the square root of variance.

pydp.util.variance(arg0: List[float]) float

Calculate variance for a set of values.
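
The relationship between mean, variance and standard deviation can be sketched in plain Python. The helpers below are hypothetical and assume the population convention (dividing by n rather than n - 1). Which convention the library uses is not stated here.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = (mean(data), variance(data), standard_deviation(data))
```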

pydp.util.vector_filter(arg0: List[float], arg1: List[bool]) List[float]

Filters a vector using a logical mask, keeping only the values whose corresponding position in the mask is true.
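
In plain Python, this kind of mask-based filtering looks like the sketch below. `filter_by_mask` is a hypothetical stand-alone helper, and the length check is an assumption about how mismatched inputs should be handled.

```python
def filter_by_mask(values, selection):
    """Keep the values whose matching entry in `selection` is True."""
    if len(values) != len(selection):
        raise ValueError("values and selection must have the same length")
    return [v for v, keep in zip(values, selection) if keep]


kept = filter_by_mask([1.0, 2.0, 3.0], [True, False, True])
```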

pydp.util.vector_to_string(arg0: List[float]) str

Conversion of a vector to a string data type.

ML

Partition Selection

class pydp.algorithms.partition_selection.PartitionSelectionStrategy

Base class for all (Ɛ, 𝛿)-differentially private partition selection strategies.

should_keep(num_users: int) bool

Decides whether or not to keep a partition with num_users based on differential privacy parameters and strategy.
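
One common way to make such a decision is noisy thresholding: add noise to the user count and keep the partition only if the result reaches a threshold. The sketch below illustrates that general idea with Laplace noise. It is not PyDP's implementation: `should_keep_sketch`, `threshold` and `noise_scale` are hypothetical, whereas a real strategy would derive its parameters from the differential privacy parameters it was created with.

```python
import random


def should_keep_sketch(num_users, threshold, noise_scale, rng=random):
    """Keep the partition if the noisy user count reaches the threshold."""
    # The difference of two independent Exp(1) draws is a standard Laplace sample.
    noise = noise_scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return num_users + noise >= threshold


rng = random.Random(1)  # seeded only to make the illustration repeatable
keep_large = should_keep_sketch(num_users=1000, threshold=10.0, noise_scale=1.0, rng=rng)
keep_empty = should_keep_sketch(num_users=0, threshold=1000.0, noise_scale=1.0, rng=rng)
```

Partitions with user counts near the threshold are kept or dropped at random, which is what hides the presence of any single user.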

pydp.algorithms.partition_selection.create_partition_strategy(strategy: str, epsilon: float, delta: float, max_partitions_contributed: int) PartitionSelectionStrategy

Creates a PartitionSelectionStrategy instance.

Parameters:
  • strategy

    One of:
    • ’truncated_geometric’: creates a Truncated Geometric Partition Strategy.

    • ’laplace’: creates a private partition strategy with Laplace mechanism.

    • ’gaussian’: creates a private partition strategy with Gaussian mechanism.

  • epsilon – The \(\varepsilon\) of the partition mechanism

  • delta – The \(\delta\) of the partition mechanism

  • max_partitions_contributed – The maximum number of partitions to which a single user may contribute.