Functions of synthetic data generators.
Created on Fri Apr 10 17:17:41 2020
This script generates synthetic data. It accepts multi-class data and can be used to produce a balanced dataset.
Abdullah BAS abdullah.bas@boun.edu.tr BME Bogazici University Istanbul / Uskudar @author: abas
-
class
trklearn.
ADASYN
Bases:
object
-
ADASYN
(target, verbose=True, B=1, K=15, threshold=0.7) This class is an implementation of ADASYN.
- Parameters
Xnp (np.array) – Input array; must be a NumPy array.
target (np.array) – Response array.
verbose (bool, optional) – If False, no verbose output is printed. Defaults to True.
B (int, optional) – Balance ratio that you want to reach. Defaults to 1.
K (int, optional) – Number of neighbours. Defaults to 15.
threshold (float, optional) – Activation threshold. The function does nothing when the balance ratio is above this value. Defaults to 0.7.
- Returns
[description]
- Return type
[type]
-
fit_resample
(target, B=1, K=15, threshold=0.7) fit_resample returns the generated data combined with the input data.
- Parameters
Xnp (float) – Input data
target (float,int) – Corresponding response.
B (int, optional) – Balance ratio. It is the threshold for the generated data. Defaults to 1.
K (int, optional) – Number of neighbours. Defaults to 15.
threshold (float, optional) – Threshold for the imbalance ratio. The function runs only below this ratio. Defaults to 0.7.
- Returns
Output Xnp [float], output target
- Return type
[float]
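The threshold and B parameters above match the usual first step of ADASYN: measure how imbalanced the labels are and, only if the ratio falls below the threshold, decide how many synthetic samples to create. The following is a minimal sketch of that step, not the trklearn implementation. It assumes the ratio is the smallest class count divided by the largest, and the helper name `adasyn_sample_budget` is hypothetical.

```python
import numpy as np

def adasyn_sample_budget(target, B=1, threshold=0.7):
    """Number of synthetic samples to request (hypothetical helper, not trklearn API)."""
    # Count how many samples each class has.
    _, counts = np.unique(np.asarray(target), return_counts=True)
    n_min, n_maj = counts.min(), counts.max()

    # Assumed definition: balance ratio = smallest class / largest class.
    ratio = n_min / n_maj
    if ratio >= threshold:
        # Balanced enough: nothing is generated above the threshold.
        return 0

    # B=1 asks for enough samples to close the whole gap.
    return int(round((n_maj - n_min) * B))

# 90 majority vs. 10 minority samples, default B and threshold.
budget = adasyn_sample_budget([0] * 90 + [1] * 10)
```

With the default B=1 the budget equals the gap between the two class sizes; a smaller B requests only part of that gap.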
-
-
class
trklearn.
ASUWO
Bases:
object
-
ASUWO
(target, n, k, irt, de=-1, normalization=1) ASUWO supports multi-class synthetic data generation.
- Parameters
Xnp (all types) – Input array; must be a NumPy array.
target (all types) – Corresponding response to Xnp.
n ([type]) –
k (int) – Number of neighbours
irt (float) – Minimum imbalance ratio targeted
knn ([type]) – [description]
de (int, optional) – Defaults to -1.
normalization (bool, optional) – Switch for normalization. Defaults to 1.
-
-
trklearn.
EuclidianDistance
(data1, data2) Euclidean distance implementation
- Parameters
data1 (float) – data point 1
data2 (float) – data point 2
- Returns
distance between two data points
- Return type
[float]
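For reference, the Euclidean distance between two points is the square root of the summed squared coordinate differences. A minimal NumPy sketch follows; the function name `euclidean_distance` is illustrative and is not the trklearn function documented above.

```python
import numpy as np

def euclidean_distance(data1, data2):
    """Euclidean distance between two points of equal length (illustrative sketch)."""
    a = np.asarray(data1, dtype=float)
    b = np.asarray(data2, dtype=float)
    # Square the coordinate differences, sum them, take the square root.
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Distance between the origin and the point (3, 4).
d = euclidean_distance([0, 0], [3, 4])
```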
-
class
trklearn.
MinMaxNormalization
(data, axis=0) Bases:
object
- Min-max normalization. It was used in conjunction with ADASYN to test results.
data: Data to be normalized
axis: 0 normalizes by columns, 1 normalizes by rows
returns: Normalized data
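Min-max normalization rescales values to the range [0, 1] using (x - min) / (max - min). The sketch below shows how the axis argument could behave under the description above; it is an assumption about the behaviour, not the trklearn code, and the name `min_max_normalize` is hypothetical.

```python
import numpy as np

def min_max_normalize(data, axis=0):
    """Rescale data to [0, 1] along the given axis (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # axis=0 takes the min/max of each column, axis=1 of each row.
    lo = data.min(axis=axis, keepdims=True)
    hi = data.max(axis=axis, keepdims=True)
    # Constant columns/rows would divide by zero; use a span of 1 so they map to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
by_columns = min_max_normalize(X, axis=0)  # each column scaled independently
by_rows = min_max_normalize(X, axis=1)     # each row scaled independently
```

Normalizing before oversampling keeps features with large numeric ranges from dominating the neighbour distances.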
-
class
trklearn.
SMOTE
Bases:
object
-
SMOTE
(target, N=-1, threshold=0.7, verbose=True) Implementation of the SMOTE algorithm.
- Parameters
Xnp (float) – Input data
target (float) – Target/Response array
N (int, optional) – Class with the most elements. If it is not known, leave it at the default. Defaults to -1.
threshold (float, optional) – [description]. Defaults to 0.7.
verbose (bool, optional) – [description]. Defaults to True.
- Returns
xiP2: only the generated synthetic data
- Return type
[float]
-
fit_resample
(target, N=-1, threshold=0.7, verbose=True) fit_resample returns the input data and the generated data combined in one array.
- Parameters
Xnp (float) – Input data
target (float) – Target/Response array
N (int, optional) – Class with the most elements. Defaults to -1.
threshold (float, optional) – Activation threshold for the imbalance ratio. The function runs only below this threshold. Defaults to 0.7.
verbose (bool, optional) – Defaults to True.
- Returns
Xnp output [float], target output
- Return type
[float]
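The core idea of SMOTE is to create each synthetic point on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation step only. It is not the trklearn implementation: the function name `smote_sketch`, its parameters `n_new`, `k`, and `seed`, and the brute-force neighbour search are all assumptions, and it expects at least two minority samples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples X_min (illustrative)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Brute-force pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest neighbours of every sample.
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between the base and the neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=10, k=2)
combined = np.vstack([X_min, synthetic])  # input and generated data in one array
```

The last line mirrors what fit_resample is described as returning: the original samples stacked with the generated ones.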
-