How can I explain this drop in performance on test data?
I am asking the question here, although I hesitated to post it on CrossValidated (or DataScience) StackExchange. I have a dataset of 60 labeled objects (to be used for training) and 150 unlabeled objects (to be used for testing). The goal of the problem is to predict the labels of the 150 objects (this was originally given as a homework problem). For each object, I computed 258 features. Considering each object as a sample, I have X_train : (60,258), y_train : (60,) (labels of the objects used for training) and X_test : (150,258). Because the solution of the homework problem was given, I also have the true labels of the 150 objects, in y_test : (150,).

In order to predict the labels of the 150 objects, I chose to use a LogisticRegression (the Scikit-learn implementation). The classifier is trained on (X_train, y_train), after the data has been normalized, and used to make predictions for the 150 objects. Those predictions are compared to y_test to assess the performance of the model. For reproducibility, I copy the code I have used.
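The poster's own code is not shown in this thread, so the following is only a minimal sketch of the workflow described above, not the original script. It assumes that "normalized" means standardization with `StandardScaler`, that the problem is binary, and it uses randomly generated stand-in arrays with the shapes given in the post, since the real features and labels are not available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): random values with the shapes from the post.
# Replace these with the real X_train, y_train, X_test, y_test.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 258))
y_train = rng.integers(0, 2, size=60)    # binary labels are an assumption
X_test = rng.normal(size=(150, 258))
y_test = rng.integers(0, 2, size=150)

# The pipeline fits the scaler on the training data only, then reuses the
# same scaling for the test data, so no test statistics leak into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predict the 150 test labels and compare training and test accuracy.
y_pred = model.predict(X_test)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

Comparing `train_acc` with `test_acc` is the quickest way to see the size of the drop. With 258 features and only 60 training samples, a linear model can often fit the training labels almost perfectly, so a large gap between the two numbers would not be surprising in this setting.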


About Becoming A Data Scientist is a blog created by Renee Teate to track her path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist". She created this club so participants can work together and help one another learn data science. See her other site DataSciGuide for more learning resources.
