Statistical Resampling in Python I
In machine learning, resampling is a common technique (cross-validation is a well-known example), but I'm getting more and more into resampling for all kinds of statistical calculations. Here's a simple Python script:
from sklearn.utils import resample
data = [0.1, 0.3, 0.5, 0.9, 1.5, 1.6, 1.7, 2.1]
# Bootstrap sample: draw 3 values with replacement
boot = resample(data, replace=True, n_samples=3, random_state=12)
print('Bootstrap Sample: %s' % boot)
# Out-of-bag observations: values in data that were not drawn into boot
oob = [x for x in data if x not in boot]
print('OOB Sample: %s' % oob)
# Output: OOB Sample: [0.1, 0.3, 0.5, 1.5, 1.6, 2.1]
What are we trying to achieve? The bootstrap method is a resampling technique for estimating statistics of a population: we repeatedly sample the dataset with replacement and compute the statistic of interest on each sample.
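To make that concrete, here is a minimal sketch that repeats the resampling step many times and looks at how the sample mean varies. It uses the standard library's random.choices (which draws with replacement) in place of sklearn's resample, so it runs without extra packages. The helper name bootstrap_means, the 1000 iterations, and the seed are my own choices for illustration.

```python
import random
import statistics

data = [0.1, 0.3, 0.5, 0.9, 1.5, 1.6, 1.7, 2.1]

def bootstrap_means(values, n_iterations=1000, seed=12):
    """Hypothetical helper: mean of each of n_iterations bootstrap samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        # Draw len(values) observations with replacement
        sample = rng.choices(values, k=len(values))
        means.append(statistics.mean(sample))
    return means

means = bootstrap_means(data)
print('Sample mean: %.3f' % statistics.mean(data))
print('Mean of bootstrap means: %.3f' % statistics.mean(means))
# The spread of the bootstrap means estimates the standard error of the mean
print('Bootstrap standard error: %.3f' % statistics.stdev(means))
```

The spread of the collected means is the interesting part: it approximates how much the sample mean would vary if we could draw fresh datasets from the population.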
This technique can be incredibly useful. For example, what's the minimum number of samples needed before we can apply Benford's law? Ideally, we state the answer with a confidence interval at a chosen confidence level, such as 90%, 95%, or 99%. I will cover this use case in a future blog post.
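As a rough illustration of what a bootstrap confidence interval looks like at those three levels, here is a sketch of the percentile method applied to the mean of the small dataset from the script. This is not the Benford's law use case, only the general mechanism. The helper name percentile_interval, the 5000 iterations, and the seed are assumptions made for the example, and other interval methods exist.

```python
import random
import statistics

data = [0.1, 0.3, 0.5, 0.9, 1.5, 1.6, 1.7, 2.1]

def percentile_interval(values, level, n_iterations=5000, seed=12):
    """Hypothetical helper: percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    # Mean of each bootstrap sample, sorted so quantiles can be read off by index
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_iterations)
    )
    alpha = 1.0 - level
    # Cut off alpha/2 of the bootstrap means in each tail
    lower = means[int((alpha / 2) * n_iterations)]
    upper = means[int((1 - alpha / 2) * n_iterations) - 1]
    return lower, upper

for level in (0.90, 0.95, 0.99):
    low, high = percentile_interval(data, level)
    print('%d%% interval for the mean: [%.2f, %.2f]' % (round(level * 100), low, high))
```

A higher confidence level gives a wider interval. With only eight observations the intervals are wide, which is the kind of trade-off a minimum-sample-size question has to deal with.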