Business Problem: The client required a model to predict the biological activity of different molecules in order to assess their on-target and off-target effectiveness.
Solution: After several statistical models were evaluated, an ensemble of Lasso Regression and SVM models was built in R.
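The ensemble can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the original R implementation (which used glmnet and e1071). Here `LassoCV` and `SVR` stand in for the R models, and `VotingRegressor` averages their predictions with equal weights. Equal weighting is an assumption, because the source does not say how the models were combined. The data is synthetic (`make_regression`), and all hyperparameters are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: synthetic stand-in for molecular descriptors and activity values.
X, y = make_regression(n_samples=300, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Equal-weight average of a Lasso model and an RBF-kernel SVM regressor
# (the weighting and hyperparameters are illustrative assumptions).
ensemble = VotingRegressor([
    ("lasso", make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))),
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))),
])
ensemble.fit(X_train, y_train)

# score() returns the coefficient of determination (R^2) on held-out data.
r2 = ensemble.score(X_test, y_test)
print(f"Ensemble R^2 on held-out data: {r2:.3f}")
```

Averaging tends to smooth out the errors of the individual models. A weighted or stacked combination (for example, `StackingRegressor`) is a possible alternative if one model is clearly stronger.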
The key challenge was handling the very large datasets. Dimension-reduction techniques were applied to keep the training and test data distributions consistent. The statistical models evaluated included Lasso Regression (glmnet package in R), SVM (e1071 package in R), and Random Forest (randomForest package in R), as well as the support vector regressor (SVR), RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor, LassoCV, and Bayesian Ridge models from the scikit-learn package in Python.
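The Python side of the model comparison can be sketched as follows. This is a hedged illustration built on several assumptions:

- PCA is used as the dimension-reduction step, although the source does not name the technique.
- Mean 5-fold cross-validated R^2 is the comparison metric.
- The data is synthetic, with more features than informative signals to mimic a wide descriptor matrix.

Placing `PCA` inside each pipeline means it is fitted only on the training folds. This keeps the reduced feature space consistent between training and validation data without leaking validation information.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import BayesianRidge, LassoCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: synthetic wide dataset (many descriptors, few informative ones).
X, y = make_regression(n_samples=300, n_features=200, n_informative=20,
                       noise=5.0, random_state=0)

# The scikit-learn models named in the text; hyperparameters are illustrative.
candidates = {
    "SVR": SVR(C=100.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "ExtraTreesRegressor": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    "LassoCV": LassoCV(cv=5, random_state=0),
    "BayesianRidge": BayesianRidge(),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in candidates.items():
    # Scaling and PCA (assumed: 30 components) are refit within each training fold.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=30, random_state=0), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    results[name] = scores.mean()

# Rank candidates by mean cross-validated R^2, best first.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} mean R^2 = {score:.3f}")
```

The ranking printed here reflects only the synthetic data. It says nothing about which model suited the client's molecules.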
Technology: R 2.15.1; scikit-learn package in Python