The Protocols and Structures for Inference (PSI) project has developed an architecture for presenting machine learning algorithms, their inputs (data), and their outputs (predictors) as resource-oriented RESTful web services, with the aim of making machine learning technology accessible to people beyond machine learning researchers.
Data Science Studio (DSS) from Dataiku is a complete data science software tool for developers and analysts that significantly shortens the time-consuming load-clean-train-test-deploy cycle of building predictive applications. A community edition and a free trial are available.
Resampling is a method researchers use to determine whether their model is accurate enough and to diagnose problems with it. A common practice in machine learning is to set aside part of the data as a validation set, an approach known as cross-validation resampling.
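As a minimal sketch of the practice above, the following holds out part of a toy data set as a validation set and scores the model only on the held-out part. The data and the trivial through-the-origin model are invented for illustration:

```python
import random

# Hypothetical toy data: 20 (x, y) pairs with y roughly 2*x plus noise.
random.seed(0)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(20)]

# Hold out 25% of the observations as a validation set.
random.shuffle(data)
split = int(len(data) * 0.75)
train, validation = data[:split], data[split:]

# Fit a trivial model on the training part only: slope through the origin.
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Measure error on the held-out validation set, not on the training data.
mse = sum((y - slope * x) ** 2 for x, y in validation) / len(validation)
print(len(train), len(validation), round(slope, 2))
```

The key point is that `mse` is computed on observations the model never saw during fitting.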
1-Randomization exact test: The randomization exact test is a test procedure in which data are randomly re-assigned so that an exact p-value can be calculated from the permuted data.
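For two small groups, the re-assignment can be exhausted rather than sampled, which is what makes the p-value exact. A minimal sketch with invented scores:

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores for two small groups; the labels are what we permute.
group_a = [12.1, 11.4, 13.0, 12.8]
group_b = [10.2, 10.9, 11.1, 10.5]

pooled = group_a + group_b
observed = mean(group_a) - mean(group_b)

# Enumerate every way of re-assigning 4 of the 8 scores to "group A";
# this exhausts all label re-assignments, giving an exact p-value.
n_a = len(group_a)
count = total = 0
for idx in combinations(range(len(pooled)), n_a):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    if abs(mean(a) - mean(b)) >= abs(observed):  # two-sided test
        count += 1
    total += 1

p_value = count / total
print(total, round(p_value, 4))
```

With 8 scores split 4/4 there are C(8, 4) = 70 re-assignments, so the smallest attainable two-sided p-value here is 2/70.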
2-Cross-validation: Simple cross-validation. Take regression as an example. In simple cross-validation, the first sub-sample is used for deriving the regression equation while another sub-sample is used for generating predicted scores from the first regression equation. Next, the cross-validity coefficient is computed by correlating the predicted scores and the observed scores on the outcome variable.
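The procedure can be sketched as follows; the toy data and the helper names `fit_line` and `pearson` are invented for the example:

```python
import random

def fit_line(pairs):
    # Ordinary least squares for y = a + b*x on one sub-sample.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    return my - b * mx, b  # intercept, slope

def pearson(u, v):
    # Pearson correlation between two score lists.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

random.seed(1)
data = [(x, 1.5 * x + 4 + random.gauss(0, 2)) for x in range(30)]
random.shuffle(data)
half_a, half_b = data[:15], data[15:]

# Derive the regression equation on the first sub-sample only.
a, b = fit_line(half_a)

# Generate predicted scores for the second sub-sample from that equation,
# then correlate them with the observed scores: the cross-validity coefficient.
predicted = [a + b * x for x, _ in half_b]
observed = [y for _, y in half_b]
cv_coef = pearson(predicted, observed)
print(round(cv_coef, 3))
```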
Double cross-validation. Double cross-validation goes a step further than its simple counterpart. Take regression as an example again. In double cross-validation, regression equations are generated in both sub-samples, and then both equations are used to generate predicted scores and cross-validity coefficients.
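A minimal sketch of the double form, with invented toy data; each half's equation predicts the other half:

```python
import random

def ols(pairs):
    # Least squares y = a + b*x on one sub-sample.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    return my - b * mx, b

def r(u, v):
    # Pearson correlation between two score lists.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

random.seed(2)
data = [(x, 0.8 * x + 2 + random.gauss(0, 1.5)) for x in range(40)]
random.shuffle(data)
s1, s2 = data[:20], data[20:]

# Fit a regression equation in EACH sub-sample...
a1, b1 = ols(s1)
a2, b2 = ols(s2)

# ...then cross the equations: predict each half with the other half's fit,
# yielding two cross-validity coefficients.
r12 = r([a1 + b1 * x for x, _ in s2], [y for _, y in s2])
r21 = r([a2 + b2 * x for x, _ in s1], [y for _, y in s1])
print(round(r12, 3), round(r21, 3))
```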
Multicross-validation. Multicross-validation is an extension of double cross-validation. In this form of cross-validation, double cross-validation procedures are repeated many times by randomly selecting sub-samples from the data set. In the context of regression analysis, beta weights computed in each sub-sample are used to predict the outcome variable in the corresponding sub-sample. Next, the observed and predicted scores of the outcome variable in each sub-sample are used to compute the cross-validated coefficient.
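One plausible reading of the repetition above is a loop over random splits, collecting the double cross-validation coefficients each time; the toy data are invented, and the helper names are illustrative:

```python
import random

def ols(pairs):
    # Least squares y = a + b*x on one sub-sample.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    return my - b * mx, b

def r(u, v):
    # Pearson correlation between two score lists.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

random.seed(3)
data = [(x, 1.2 * x + 3 + random.gauss(0, 2)) for x in range(40)]

# Repeat the double cross-validation procedure on many random splits.
coefficients = []
for _ in range(100):
    random.shuffle(data)
    s1, s2 = data[:20], data[20:]
    a1, b1 = ols(s1)
    a2, b2 = ols(s2)
    # Each sub-sample's weights predict the OTHER sub-sample's outcome.
    coefficients.append(r([a1 + b1 * x for x, _ in s2], [y for _, y in s2]))
    coefficients.append(r([a2 + b2 * x for x, _ in s1], [y for _, y in s1]))

avg_coef = sum(coefficients) / len(coefficients)
print(round(avg_coef, 3))
```

Averaging (or inspecting the distribution of) the collected coefficients gives a more stable picture than any single split.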
3-Jackknife: The jackknife is a step beyond cross-validation. In the jackknife, the same test is repeated by leaving one subject out each time; the technique is therefore also called leave-one-out. This procedure is especially useful when the dispersion of the distribution is wide or extreme scores are present in the data set. In these cases the jackknife can be expected to return a bias-reduced estimation.
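A minimal leave-one-out sketch for the mean of a small invented sample that contains one extreme score:

```python
from statistics import mean

# Hypothetical sample with one extreme score.
sample = [4.1, 3.8, 4.5, 4.0, 3.9, 9.7]
n = len(sample)

# Leave one subject out each time and recompute the statistic.
loo_means = [mean(sample[:i] + sample[i + 1:]) for i in range(n)]

# Jackknife estimate and its standard error.
jack_mean = mean(loo_means)
se = (((n - 1) / n) * sum((m - jack_mean) ** 2 for m in loo_means)) ** 0.5
print(round(jack_mean, 3), round(se, 3))
```

For the mean, the jackknife estimate coincides with the sample mean; the interesting output is the standard error, which quantifies how much the single extreme score sways the statistic.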
4-Bootstrap: In the bootstrap, the original sample can be duplicated as many times as computing resources allow, and this expanded sample is treated as a virtual population. Samples are then drawn from this population to verify the estimators. The "source" for resampling in the bootstrap can therefore be much larger than in the other two. In addition, unlike cross-validation and the jackknife, the bootstrap employs sampling with replacement.
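The with-replacement resampling can be sketched as follows; the sample values and the choice of 2000 resamples are invented for the example:

```python
import random
from statistics import mean, stdev

random.seed(4)
sample = [5.2, 4.8, 6.1, 5.5, 4.9, 5.7, 6.3, 5.0]

# Draw many resamples WITH replacement, each the same size as the original.
boot_means = []
for _ in range(2000):
    resample = [random.choice(sample) for _ in sample]
    boot_means.append(mean(resample))

# The spread of the bootstrap means estimates the standard error of the mean.
boot_se = stdev(boot_means)

# A simple 95% percentile confidence interval for the mean.
boot_means.sort()
lo, hi = boot_means[int(0.025 * 2000)], boot_means[int(0.975 * 2000)]
print(round(boot_se, 3), round(lo, 2), round(hi, 2))
```

Because `random.choice` samples with replacement, a single resample can repeat some scores and omit others, which is exactly what distinguishes the bootstrap from cross-validation and the jackknife.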