PyDP

Algorithms

class pydp.algorithms.laplacian.BoundedMean(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedMean computes the average of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private average. All input values are normalized to their difference from the midpoint of the input range. This lets the sum of all input values be computed with half the sensitivity it would otherwise need, which gives better accuracy than dividing a noisy sum by a noisy count. The algorithm is taken from section 2.5.5 (algorithm 2.4) of the following book: https://books.google.com/books?id=WFttDQAAQBAJ&pg=PA24#v=onepage&q&f=false
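The midpoint normalization can be sketched in plain Python. This is a minimal illustration, not PyDP's implementation: the names dp_bounded_mean and laplace_noise, the even split of epsilon between the sum and the count, and the floor on the noisy count are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_bounded_mean(values, lower, upper, epsilon, rng=None):
    """Noisy mean of `values` after clamping them into [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    rng = rng or random.Random()
    midpoint = lower + (upper - lower) / 2.0

    # Clamp, then shift by the midpoint so every contribution lies in
    # [-(upper - lower) / 2, (upper - lower) / 2].
    normalized = [min(max(v, lower), upper) - midpoint for v in values]

    # Assumption: the budget is split evenly between the sum and the count.
    eps_sum = eps_count = epsilon / 2.0
    sum_sensitivity = (upper - lower) / 2.0

    noisy_sum = sum(normalized) + laplace_noise(sum_sensitivity / eps_sum, rng)
    noisy_count = len(values) + laplace_noise(1.0 / eps_count, rng)
    noisy_count = max(noisy_count, 1.0)  # guard against a tiny or negative count

    # Undo the shift and keep the answer inside the declared bounds.
    mean = noisy_sum / noisy_count + midpoint
    return min(max(mean, lower), upper)
```

Because the shifted values are centred on zero, a single record can move the sum by at most half the width of the range, which is where the halved sensitivity comes from.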

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.
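For intuition, when the noise is Laplace with scale b, a symmetric interval follows directly from the distribution, since P(|X| <= t) = 1 - exp(-t / b). The sketch below assumes Laplace noise and a caller-supplied scale; laplace_noise_interval is a hypothetical helper, not part of PyDP.

```python
import math


def laplace_noise_interval(confidence_level, scale):
    """Symmetric interval that contains Laplace(0, scale) noise with the given probability."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    # Solve 1 - exp(-t / scale) = confidence_level for t.
    half_width = -scale * math.log(1.0 - confidence_level)
    return (-half_width, half_width)


# Hypothetical numbers: sensitivity 1 and epsilon 0.5 give scale 1 / 0.5 = 2.
low, high = laplace_noise_interval(0.95, scale=2.0)
# high is about 5.99, so 95% of the noise draws fall within roughly +/- 5.99.
```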

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.
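The budget accounting described for result() and privacy_budget_left() can be pictured with a small tracker. This is only a sketch: BudgetTracker is a hypothetical class, and treating the spent fraction as a multiplier on epsilon is an assumption of the example.

```python
class BudgetTracker:
    """Hypothetical helper that tracks the fraction of privacy budget still available."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.remaining = 1.0  # fraction of the total budget, in [0, 1]

    def consume(self, fraction=None):
        """Spend `fraction` of the budget (everything left if None); return the epsilon it buys."""
        if fraction is None:
            fraction = self.remaining
        if not 0.0 < fraction <= self.remaining:
            raise ValueError("fraction must be positive and no larger than the remaining budget")
        self.remaining -= fraction
        return self.epsilon * fraction


tracker = BudgetTracker(epsilon=1.0)
first = tracker.consume(0.25)  # partial result: spends a quarter of the budget
second = tracker.consume()     # default call: spends the remaining three quarters
```

After the second call nothing is left, so a further consume() raises ValueError.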

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.
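The serialize and merge pair supports aggregating shards separately and combining them later. A rough sketch of that pattern follows, using a hypothetical SumSummary dataclass in place of the Summary proto, whose fields are not described here.

```python
from dataclasses import dataclass


@dataclass
class SumSummary:
    """Hypothetical stand-in for a serialized summary; not PyDP's proto format."""

    lower: float
    upper: float
    count: int = 0
    total: float = 0.0

    def add_entries(self, values):
        for v in values:
            self.count += 1
            self.total += min(max(v, self.lower), self.upper)  # clamp into the bounds

    def merge(self, other):
        # Mirror the documented rule: parameters must be identical before merging.
        if (self.lower, self.upper) != (other.lower, other.upper):
            raise ValueError("summaries were built with different parameters")
        self.count += other.count
        self.total += other.total


shard_a = SumSummary(lower=0, upper=10)
shard_a.add_entries([1, 2, 3])
shard_b = SumSummary(lower=0, upper=10)
shard_b.add_entries([4, 50])
shard_a.merge(shard_b)  # shard_a now holds count 5 and total 1 + 2 + 3 + 4 + 10
```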

class pydp.algorithms.laplacian.BoundedSum(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedSum computes the sum of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private sum. Each input is clamped between the lower and upper bounds, which can be set manually or inferred privately.
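A minimal sketch of a clamped noisy sum with manually set bounds is shown here. It is an illustration under stated assumptions rather than PyDP's code: dp_bounded_sum is a hypothetical name, the noise is drawn with Python's random module, and private bound inference is not covered.

```python
import random


def dp_bounded_sum(values, lower, upper, epsilon, rng=None):
    """Noisy sum of `values` after clamping each one into [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    rng = rng or random.Random()
    clamped = [min(max(v, lower), upper) for v in values]

    # Adding or removing one value changes the sum by at most max(|lower|, |upper|).
    sensitivity = max(abs(lower), abs(upper))
    if sensitivity == 0:
        return 0.0  # every clamped value is zero, so there is nothing to hide

    scale = sensitivity / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clamped) + noise
```

Tighter bounds reduce the noise but clamp more of the data, so the bounds trade bias against variance.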

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.BoundedStandardDeviation(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedStandardDeviation computes the standard deviation of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private standard deviation for values in the range [lower..upper]. Values outside of this range will be clamped so they lie in the range. The output will also be clamped between 0 and (upper - lower).

The implementation simply computes the bounded variance and takes the square root, which is differentially private by the post-processing theorem. It relies on the fact that the bounded variance algorithm guarantees that the output is non-negative.
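The post-processing step is small enough to sketch directly. The helper name dp_std_from_variance is hypothetical, and the noisy variance is assumed to come from some differentially private variance routine.

```python
import math


def dp_std_from_variance(noisy_variance, lower, upper):
    """Turn an already-private variance into a standard deviation."""
    # Clamp into [0, (upper - lower)^2] so the square root is always defined.
    variance = min(max(noisy_variance, 0.0), (upper - lower) ** 2)
    # The root then lies in [0, upper - lower], matching the documented output range.
    return math.sqrt(variance)
```

No extra privacy budget is spent here, because the function only transforms a value that was already released privately.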

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.BoundedVariance(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

BoundedVariance computes the variance of values in a dataset, in a differentially private manner.

Incrementally provides a differentially private variance for values in the range [lower..upper]. Values outside of this range will be clamped so they lie in the range. The output will also be clamped between 0 and (upper - lower)^2. Since the result is guaranteed to be non-negative, this algorithm can be used to compute a differentially private standard deviation.

The algorithm uses O(1) memory and runs in O(n) time, where n is the size of the dataset, making it fast and efficient. The amount of noise added grows quadratically in (upper - lower) and decreases linearly in n, so it might not produce good results unless n >> (upper - lower)^2.

The algorithm is a variation of the algorithm for differentially private mean from “Differential Privacy: From Theory to Practice”, section 2.5.5: https://books.google.com/books?id=WFttDQAAQBAJ&pg=PA24#v=onepage&q&f=false
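The O(1) memory claim follows from keeping only three running totals. The sketch below illustrates that idea with a hypothetical RunningVariance class; the three-way split of epsilon and the sensitivities used are simplifying assumptions, not a description of PyDP's internals.

```python
import random


class RunningVariance:
    """Hypothetical constant-memory accumulator for a noisy bounded variance."""

    def __init__(self, lower, upper, epsilon, rng=None):
        if not lower < upper:
            raise ValueError("lower must be strictly less than upper")
        self.lower, self.upper, self.epsilon = lower, upper, epsilon
        self.midpoint = lower + (upper - lower) / 2.0
        self.rng = rng or random.Random()
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add_entry(self, value):
        # Clamp, then shift by the midpoint; variance is unchanged by the shift.
        x = min(max(value, self.lower), self.upper) - self.midpoint
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def _laplace(self, scale):
        return self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)

    def result(self):
        half_range = (self.upper - self.lower) / 2.0
        eps = self.epsilon / 3.0  # assumed even split: count, sum, sum of squares
        n = max(self.count + self._laplace(1.0 / eps), 1.0)
        mean = (self.total + self._laplace(half_range / eps)) / n
        mean_sq = (self.total_sq + self._laplace(half_range ** 2 / eps)) / n
        variance = mean_sq - mean * mean
        # Clamp to the documented output range [0, (upper - lower)^2].
        return min(max(variance, 0.0), (self.upper - self.lower) ** 2)
```

The noise on the sum of squares scales with the square of the range while the division by n shrinks it, which matches the stated requirement that n be much larger than (upper - lower)^2.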

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Max(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Max computes the maximum value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Min(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Min computes the minimum value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Median(epsilon: float = 1.0, delta: float = 0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Median computes the median value in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Count(epsilon: float = 1.0, l0_sensitivity: int = 1, linf_sensitivity: int = 1, dtype: str = 'int')

Count computes the number of items in the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns empty summary for algorithms for which serialize is unimplemented.

class pydp.algorithms.laplacian.Percentile(epsilon: float = 1.0, percentile: float = 0.0, lower_bound: Optional[Union[int, float]] = None, upper_bound: Optional[Union[int, float]] = None, dtype: str = 'int')

Percentile finds the value at the given percentile of the dataset, in a differentially private manner.

add_entries(data: List[Union[int, float]]) → None

Adds multiple inputs to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current list passed is not added.

add_entry(value: Union[int, float]) → None

Adds one input to the algorithm.

Note: If the data exceeds the overflow limit of storage, the current data passed is not added.

property delta: float: Returns the epsilon set at initialization.

property epsilon: float: Returns the epsilon set at initialization.

property l0_sensitivity: float: Returns the l0_sensitivity set at initialization.

property linf_sensitivity: float: Returns the linf_sensitivity set at initialization.

memory_used() → float: Returns the memory currently used by the algorithm in bytes.

merge(summary): Merges serialized summary data into this algorithm. The summary proto must represent data from the same algorithm type with identical parameters. The data field must contain the algorithm summary type of the corresponding algorithm used. The summary proto cannot be empty.

noise_confidence_interval(confidence_level: float, privacy_budget: float) → float

Returns the confidence_level confidence interval of noise added within the algorithm with specified privacy budget, using epsilon and other relevant, algorithm-specific parameters (e.g. bounds) provided by the constructor.

This metric may be used to gauge the error rate introduced by the noise.

If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the domain [x,y].

By default, NoiseConfidenceInterval() returns an error. Algorithms for which a confidence interval can feasibly be calculated override this and output the relevant value.

Conservatively, we do not release the error rate for algorithms whose confidence intervals rely on input size.

property percentile: float: percentile Gets the value that was set in the constructor.

privacy_budget_left() → float: Returns the remaining privacy budget.

quick_result(data: List[Union[int, float]]) → Union[int, float]

Runs the algorithm on the input using the epsilon parameter provided in the constructor and returns output.

Consumes 100% of the privacy budget.

Note: It resets the privacy budget first.

reset() → None: Resets the algorithm to a state in which it has received no input. After Reset is called, the algorithm should only consider input added after the last Reset call when providing output.

result(privacy_budget: Optional[float] = None, noise_interval_level: Optional[float] = None) → Union[int, float]

Gets the algorithm result.

The default call consumes the remaining privacy budget.

When privacy_budget (defined on [0,1]) is set, it consumes only the privacy_budget amount of budget.

noise_interval_level provides the confidence level of the noise confidence interval, which may be included in the algorithm output.

serialize()

Serializes summary data of current entries into Summary proto. This allows results from distributed aggregation to be recorded and later merged.

Returns an empty summary for algorithms for which serialize is unimplemented.
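
Distributed aggregation can be illustrated without the proto machinery. The sketch below is a stand-in only: PartialMean, its serialize and merge methods, and the dictionary summary format are all hypothetical and merely mirror the idea of recording partial state and merging it later. No noise is added here.

```python
from dataclasses import dataclass

@dataclass
class PartialMean:
    """Toy aggregator whose state can be exported and merged."""
    total: float = 0.0
    count: int = 0

    def add_entries(self, data):
        self.total += sum(data)
        self.count += len(data)

    def serialize(self):
        # Stand-in for a Summary proto: just the raw partial state.
        return {"total": self.total, "count": self.count}

    def merge(self, summary):
        if not summary:
            raise ValueError("summary cannot be empty")
        self.total += summary["total"]
        self.count += summary["count"]

worker_a, worker_b = PartialMean(), PartialMean()
worker_a.add_entries([1, 2, 3])
worker_b.add_entries([4, 5])

combined = PartialMean()
for worker in (worker_a, worker_b):
    combined.merge(worker.serialize())
mean = combined.total / combined.count
```

Merging the partial sums and counts gives the same mean as processing all five values in one place.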

Numerical Mechanisms

class pydp.algorithms.numerical_mechanisms.NumericalMechanism

Base class for all (Ɛ, 𝛿)-differentially private additive noise numerical mechanisms.

add_noise(*args, **kwargs)

Overloaded function.

add_noise(self: pydp.NumericalMechanism, result: int, privacy_budget: float) -> int
add_noise(self: pydp.NumericalMechanism, result: float, privacy_budget: float) -> float
add_noise(self: pydp.NumericalMechanism, result: int) -> int
add_noise(self: pydp.NumericalMechanism, result: float) -> float

property epsilon: The Ɛ of the numerical mechanism

memory_used(self: pydp.NumericalMechanism) → int

noise_confidence_interval(self: pydp.NumericalMechanism, confidence_level: float, privacy_budget: float, noised_result: float) → pydp.ConfidenceInterval: Returns the confidence interval, at the specified confidence level, of the noise that add_noise() would add with the specified privacy budget. If the returned value is <x,y>, then the noise added has a confidence_level chance of being in the interval [x,y].

noised_value_above_threshold(self: pydp.NumericalMechanism, arg0: float, arg1: float) → bool: Quickly determines whether the result with added noise is above a certain threshold.

class pydp.algorithms.numerical_mechanisms.LaplaceMechanism

Bases: NumericalMechanism

property diversity: The diversity of the Laplace mechanism.

get_uniform_double(self: pydp.LaplaceMechanism) → float

property sensitivity: The L1 sensitivity of the query.

class pydp.algorithms.numerical_mechanisms.GaussianMechanism

Bases: NumericalMechanism

property delta: The 𝛿 of the Gaussian mechanism.

property l2_sensitivity: The L2 sensitivity of the query.

property std: The standard deviation parameter of the Gaussian mechanism underlying distribution.

Distributions

class pydp.distributions.GaussianDistribution

sample(self: pydp.GaussianDistribution, scale: float = 1.0) → float

Samples from the Gaussian distribution Gauss(scale*stddev).

scale: A factor to scale stddev.

property stddev: Returns stddev

class pydp.distributions.LaplaceDistribution

Draws samples from the Laplacian distribution.

get_diversity(self: pydp.LaplaceDistribution) → float: Returns the parameter defining this distribution, often labeled b.

get_uniform_double(self: pydp.LaplaceDistribution) → float: Returns a uniform random integer in the range [0, 2^53).

sample(self: pydp.LaplaceDistribution, scale: float = 1.0) → float

Samples the Laplacian distribution Laplace(u, scale*b).

Parameters: scale – A factor to scale b.

Util

pydp.util.Geometric() → int

pydp.util.UniformDouble() → float

pydp.util.correlation(arg0: List[float], arg1: List[float]) → float: Returns linear correlation coefficient.

pydp.util.get_next_power_of_two(arg0: float) → float: Returns the power of two that is greater than and closest to the given numerical input.

pydp.util.mean(*args, **kwargs)

Overloaded function.

mean(arg0: List[float]) -> float

Calculates the mean of a given set of numbers for a double (float) data type.

mean(arg0: List[int]) -> float

Calculates the mean of a given set of numbers for an int data type.

pydp.util.order_statistics(arg0: float, arg1: List[float]) → float: Sample values placed in ascending order.

pydp.util.qnorm(arg0: float, arg1: float, arg2: float) → pydp._pydp.StatusOrD: Quantile function of normal distribution, inverse of the cumulative distribution function.
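
The normal quantile function is also available in Python's standard library, which is handy for sanity-checking values. The sketch assumes the three arguments of qnorm are the probability, the mean, and the standard deviation, in that order. The signature above does not name them, so treat that ordering as an assumption. The helper name normal_quantile is hypothetical.

```python
from statistics import NormalDist

def normal_quantile(p, mu=0.0, sigma=1.0):
    """Inverse CDF of Normal(mu, sigma) evaluated at probability p."""
    return NormalDist(mu=mu, sigma=sigma).inv_cdf(p)

z = normal_quantile(0.975)               # about 1.96 for the standard normal
median = normal_quantile(0.5, 3.0, 2.0)  # the median of a normal equals its mean
```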

pydp.util.standard_deviation(arg0: List[float]) → float: Standard Deviation, the square root of variance.

pydp.util.variance(arg0: List[float]) → float: Calculate variance for a set of values.

pydp.util.vector_filter(arg0: List[float], arg1: List[bool]) → List[float]: Filters a vector using a logical mask, keeping only the values whose corresponding position in the mask is true.
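
The filtering behaviour amounts to applying a boolean mask. A plain-Python equivalent, written as a sketch under the assumption that both lists have the same length (the helper name filter_by_mask is hypothetical):

```python
def filter_by_mask(values, mask):
    """Keep values[i] wherever mask[i] is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [v for v, keep in zip(values, mask) if keep]

kept = filter_by_mask([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
```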

pydp.util.vector_to_string(arg0: List[float]) → str: Conversion of a vector to a string data type.

ML

Partition Selection

class pydp.algorithms.partition_selection.PartitionSelectionStrategy

Base class for all (Ɛ, 𝛿)-differentially private partition selection strategies.

should_keep(num_users: int) → bool: Decides whether or not to keep a partition with num_users based on differential privacy parameters and strategy.

pydp.algorithms.partition_selection.create_partition_strategy(strategy: str, epsilon: float, delta: float, max_partitions_contributed: int) → PartitionSelectionStrategy

Creates a PartitionSelectionStrategy instance.

Parameters:

strategy –
One of:
- 'truncated_geometric': creates a Truncated Geometric Partition Strategy.
- 'laplace': creates a private partition strategy with Laplace mechanism.
- 'gaussian': creates a private partition strategy with Gaussian mechanism.
epsilon – The Ɛ of the partition mechanism.
delta – The 𝛿 of the partition mechanism.
max_partitions_contributed – The maximum number of partitions to which a single user can contribute.
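
To make the idea of a partition selection strategy concrete, here is a simplified Laplace thresholding sketch in plain Python. It is not PyDP's implementation: the class name, the even split of Ɛ and 𝛿 across max_partitions_contributed, and the threshold formula are textbook assumptions, and the random module used here is not cryptographically secure.

```python
import math
import random

class LaplaceThresholdSketch:
    """Keeps a partition when its noisy user count clears a threshold."""

    def __init__(self, epsilon, delta, max_partitions_contributed=1, rng=None):
        # Split the budget evenly across the partitions one user can touch.
        eps = epsilon / max_partitions_contributed
        dlt = delta / max_partitions_contributed
        self.scale = 1.0 / eps  # each user changes a count by at most 1
        # Chosen so a partition with a single user is kept with probability dlt.
        self.threshold = 1.0 + self.scale * math.log(1.0 / (2.0 * dlt))
        self.rng = rng or random.Random()

    def should_keep(self, num_users):
        rate = 1.0 / self.scale
        # Difference of two exponentials gives zero-mean Laplace noise.
        noise = self.rng.expovariate(rate) - self.rng.expovariate(rate)
        return num_users + noise > self.threshold

strategy = LaplaceThresholdSketch(epsilon=1.0, delta=1e-5, rng=random.Random(7))
keeps_large = strategy.should_keep(1_000_000)
kept_singletons = sum(strategy.should_keep(1) for _ in range(1000))
```

Partitions backed by many users are kept almost surely, while partitions with a single user are released only with probability about delta, which is what protects rare partitions from revealing an individual's presence.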