Functions of synthetic data generators.

Created on Fri Apr 10 17:17:41 2020

This script generates synthetic data. It accepts multi-class data and can be used to produce a balanced dataset.

Abdullah BAS, abdullah.bas@boun.edu.tr, BME, Bogazici University, Istanbul / Uskudar (@author: abas)

class trklearn.ADASYN

Bases: object

ADASYN(target, verbose=True, B=1, K=15, threshold=0.7)

This class is an implementation of ADASYN (Adaptive Synthetic Sampling).

Parameters
  • Xnp (np.array) – Input data. Must be a NumPy array.

  • target (np.array) – Response array corresponding to Xnp.

  • verbose (bool, optional) – If False (or 0), no progress messages are printed. Defaults to True.

  • B (int, optional) – Balance ratio that you want to reach. Defaults to 1.

  • K (int, optional) – Number of nearest neighbours. Defaults to 15.

  • threshold (float, optional) – Activation threshold. If the balance ratio of the input is above this value, no data is generated. Defaults to 0.7.

Returns

[description]

Return type

[type]
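The interplay of threshold and B described above can be sketched as follows. This is a minimal illustration based on the published ADASYN formulation, not the trklearn source: the function name adasyn_budget is hypothetical, the balance ratio is assumed to be the smallest class count divided by the largest, and plain Python is used instead of NumPy to keep the sketch dependency-free.

```python
from collections import Counter


def adasyn_budget(target, B=1.0, threshold=0.7):
    """Return how many synthetic minority samples to generate (hypothetical helper)."""
    counts = Counter(target)
    minority = min(counts.values())
    majority = max(counts.values())
    # Balance ratio: smallest class size over largest class size, in (0, 1].
    ratio = minority / majority
    if ratio >= threshold:
        # The data is already balanced enough, so nothing is generated.
        return 0
    # Budget G = (majority - minority) * B; B = 1 asks for full balance.
    return int(round((majority - minority) * B))


labels = [0] * 90 + [1] * 10
print(adasyn_budget(labels))         # 80
print(adasyn_budget(labels, B=0.5))  # 40
```

With B=1 the minority class would be brought up to the size of the majority class; smaller values of B request only part of that gap.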

fit_resample(target, B=1, K=15, threshold=0.7)

fit_resample returns the generated data combined with the input data.

Parameters
  • Xnp (float) – Input data

  • target (float,int) – Corresponding response.

  • B (int, optional) – Balance ratio. It sets how much data is generated. Defaults to 1.

  • K (int, optional) – Number of nearest neighbours. Defaults to 15.

  • threshold (float, optional) – Threshold for the imbalance ratio. The function runs only below this ratio. Defaults to 0.7.

Returns

Output Xnp [float], output target

Return type

[float]
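The role of K can be illustrated with the allocation step of the published ADASYN algorithm: each minority sample receives a share of the generation budget in proportion to how many of its K nearest neighbours belong to another class. The sketch below is an assumption about how this step might look, not the trklearn code; adasyn_allocation and its arguments (including the budget G) are hypothetical names, and the shares are left unrounded.

```python
import math


def adasyn_allocation(X, y, minority_label, K, G):
    """Split a budget of G synthetic samples across the minority points."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    hardness = []
    for i in minority_idx:
        # Indices of the K nearest neighbours of X[i], excluding X[i] itself.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:K]
        # Fraction of those neighbours that belong to another class.
        hardness.append(sum(y[j] != minority_label for j in neighbours) / K)
    total = sum(hardness)
    if total == 0:
        # No minority point borders another class: spread the budget evenly.
        return {i: G / len(minority_idx) for i in minority_idx}
    # Each point gets a share proportional to its normalised hardness.
    return {i: G * r / total for i, r in zip(minority_idx, hardness)}


X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
y = [0, 0, 1, 0, 1]
print(adasyn_allocation(X, y, minority_label=1, K=2, G=6))  # {2: 4.0, 4: 2.0}
```

The point at index 2 is surrounded only by the other class, so it receives twice the share of the point at index 4, which has one same-class neighbour.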

class trklearn.ASUWO

Bases: object

ASUWO(target, n, k, irt, de=-1, normalization=1)

ASUWO supports multi-class synthetic data generation.

Parameters
  • Xnp (type: all) – Input array. Must be a NumPy array.

  • target (type: all) – Corresponding response to Xnp

  • n ([type]) –

  • k (int) – Number of neighbours

  • irt (float) – Minimum imbalance ratio targeted

  • knn ([type]) – [description]

  • de (int, optional) – . Defaults to -1.

  • normalization (bool, optional) – Switch for normalization. Defaults to 1.
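To make the irt parameter concrete for the multi-class case, the sketch below computes, for every class whose ratio to the largest class falls below a target, how many extra samples would be needed to reach it. This only illustrates the idea of a minimum targeted imbalance ratio; it is an assumption, not the ASUWO procedure itself (which also involves clustering), and classes_below_ratio is a hypothetical name.

```python
import math
from collections import Counter


def classes_below_ratio(target, irt):
    """Map each under-represented class to the sample count needed to reach irt."""
    counts = Counter(target)
    majority = max(counts.values())
    needed = {}
    for label, count in counts.items():
        if count / majority < irt:
            # Smallest class size whose ratio to the majority is at least irt.
            goal = math.ceil(irt * majority)
            needed[label] = goal - count
    return needed


labels = ["a"] * 100 + ["b"] * 20 + ["c"] * 60
print(classes_below_ratio(labels, irt=0.5))  # {'b': 30}
```

Class "c" already has a ratio of 0.6 to the largest class, so only class "b" is listed.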

trklearn.EuclidianDistance(data1, data2)

Euclidean distance implementation.

Parameters
  • data1 (float) – data point 1

  • data2 (float) – data point 2

Returns

distance between two data points

Return type

[float]
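For reference, the Euclidean distance between two points is the square root of the sum of squared coordinate differences. A minimal sketch in plain Python follows; euclidean_distance is a hypothetical stand-in written for illustration and is not the trklearn function.

```python
import math


def euclidean_distance(data1, data2):
    """Return the Euclidean distance between two equally long sequences."""
    if len(data1) != len(data2):
        raise ValueError("both points must have the same number of features")
    # Square each coordinate difference, sum them, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(data1, data2)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```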

class trklearn.MinMaxNormalization(data, axis=0)

Bases: object

Min-max normalization. It was used in conjunction with ADASYN to test results.

Parameters
  • data – Data to be normalized

  • axis – 0 normalizes by columns, 1 normalizes by rows

Returns

Normalized data
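Min-max normalization rescales each value to (value - min) / (max - min), so every column (axis 0) or row (axis 1) ends up in the range 0 to 1. The sketch below shows this on a list of rows in plain Python. min_max_normalize is a hypothetical function, and mapping a constant column or row to 0.0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(data, axis=0):
    """Scale a list of rows to [0, 1]; axis=0 per column, axis=1 per row."""

    def scale(values):
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant series has no range; map it to 0.0 instead of dividing by zero.
        return [0.0 if span == 0 else (v - lo) / span for v in values]

    if axis == 1:
        return [scale(row) for row in data]
    # axis=0: scale each column, then transpose back into rows.
    columns = [scale(col) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]


data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
print(min_max_normalize(data))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```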

class trklearn.SMOTE

Bases: object

SMOTE(target, N=-1, threshold=0.7, verbose=True)

Implementation of SMOTE algorithm

Parameters
  • Xnp (float) – Input data

  • target (float) – Target/Response array

  • N (int, optional) – Class with the most elements. If it is not known, leave it at the default. Defaults to -1.

  • threshold (float, optional) – [description]. Defaults to 0.7.

  • verbose (bool, optional) – [description]. Defaults to True.

Returns

xiP2: only the generated synthetic data

Return type

[float]
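The core step of the published SMOTE algorithm is an interpolation: a synthetic point is placed at a random position on the line segment between a minority sample and one of its minority-class neighbours. The sketch below shows only that step and assumes the neighbour has already been chosen; smote_point is a hypothetical name, not part of trklearn.

```python
import random


def smote_point(sample, neighbour, rng=random):
    """Return one synthetic point between a sample and its neighbour."""
    # gap is drawn uniformly from [0, 1): 0 returns the sample itself,
    # values near 1 land close to the neighbour.
    gap = rng.random()
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(smote_point([0.0, 0.0], [2.0, 4.0], rng))
```

Because the same gap is applied to every feature, the new point always lies on the segment joining the two inputs.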

fit_resample(target, N=-1, threshold=0.7, verbose=True)

fit_resample is the function that outputs the input data and generated data combined in one array.

Parameters
  • Xnp (float) – Input data

  • target (float) – Target/Response array

  • N (int, optional) – Class that has the most elements. Defaults to -1.

  • threshold (float, optional) – Activation threshold for the imbalance ratio. The function runs only below this threshold. Defaults to 0.7.

  • verbose (bool, optional) – Defaults to True.

Returns

Output Xnp [float], output target

Return type

[float]
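The overall shape of such a resampling call (check the imbalance ratio against the threshold, generate data for the smallest class, then return input and generated data together) can be sketched as below. This is a simplified assumption, not the trklearn implementation: resample_sketch is a hypothetical name, and for brevity it interpolates between random pairs of minority samples instead of performing a nearest-neighbour search, so it is not a full SMOTE.

```python
import random
from collections import Counter


def resample_sketch(X, y, threshold=0.7, seed=0):
    """Return (X, y) extended with synthetic rows for the smallest class."""
    counts = Counter(y)
    minor = min(counts, key=counts.get)
    n_major = max(counts.values())
    if counts[minor] / n_major >= threshold:
        # Ratio is not below the threshold: return the input unchanged.
        return list(X), list(y)
    rng = random.Random(seed)
    pool = [row for row, label in zip(X, y) if label == minor]
    synthetic = []
    for _ in range(n_major - counts[minor]):
        # Simplification: pick a random minority pair instead of a k-NN neighbour.
        a, b = rng.choice(pool), rng.choice(pool)
        gap = rng.random()
        synthetic.append([p + gap * (q - p) for p, q in zip(a, b)])
    # Original rows come first, generated rows are appended after them.
    return list(X) + synthetic, list(y) + [minor] * len(synthetic)


X = [[0.0], [1.0], [2.0], [3.0], [10.0], [12.0]]
y = [0, 0, 0, 0, 1, 1]
X_out, y_out = resample_sketch(X, y)
print(len(X_out), Counter(y_out))
```

Keeping the original rows at the front of the output makes it easy to separate generated samples from input samples afterwards.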