Winning model documentation
Name: Owen Zhang
Location: NJ, USA
Email: zhonghua.zhang2006@gmail.com
Competition: Click-through rate prediction
1. Summary
The final solution is a manually tuned blend (based on PB feedback) of 4 different models
(Random Forest/sklearn, GBDT/xgboost, Online SGD/Vowpal Wabbit, Factorization machine/3
idiots). This solution is largely based on 3 idiots' winning solution to the Criteo competition
(https://github.com/guestwalk/kaggle2014criteo) with a moderate amount of feature
engineering and manual tuning.
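As a rough illustration of what a manually tuned blend can look like, the sketch below takes a weighted average of per-model click probabilities and scores it on held-out labels. The weights, the toy predictions, the choice to average in probability space, and the use of logarithmic loss as the score are all assumptions made for the example; the write-up does not state them.

```python
import math

def blend(predictions, weights):
    """Weighted average of per-model click probabilities, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean logarithmic loss (assumed here to be the score being tuned)."""
    losses = []
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Toy validation labels and per-model predictions (made-up numbers).
y_val = [1, 0, 0, 1]
model_preds = [
    [0.60, 0.20, 0.10, 0.70],  # random forest
    [0.70, 0.15, 0.12, 0.65],  # GBDT
    [0.55, 0.25, 0.20, 0.60],  # online SGD
    [0.75, 0.10, 0.08, 0.80],  # factorization machine
]
weights = [1, 2, 1, 4]  # hypothetical weights, adjusted by hand

blended = blend(model_preds, weights)
score = log_loss(y_val, blended)
```

In practice the weights would be changed by hand and the score compared after each change.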
2. Feature Selection/Extraction
A few different approaches were utilized for feature engineering:
2a. Combining site/app based features. These features are complementary (in the
sense that if one is missing the other one is not), so combining them will at least save
space.
2b. Prior day mean(y) encoding for categorical features. These are done in both
univariate and multivariate approaches.
2c. Counts and sequence of device_ip. device_ip seems to be a reasonable proxy of
user identity.
2d. Factorization Machine based predictions using raw features and
counts/sequences.
2e. GBDT predicted leaf node using raw features, counts/sequences, and prior day
mean(y) encoded categorical features.
2f. Some manual interactions, especially app_site_id*C1421.
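The prior day mean(y) encoding in 2b can be sketched as follows. This is a minimal illustration, not the author's code: it assumes "prior day" means only the immediately preceding day, that the multivariate variant encodes the joint combination of several fields, and that values unseen on the prior day get a caller-supplied fallback. The row layout and the names `prior_day_mean_encode`, `keys`, and `fallback` are hypothetical.

```python
from collections import defaultdict

def prior_day_mean_encode(rows, keys, fallback=None):
    """Encode each row by the mean of y on the previous day for the same
    category value (one field in `keys`) or combination (several fields).

    rows: list of dicts with an integer 'day', a 0/1 'y', and the fields in keys.
    Returns a list of encoded values aligned with rows.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: click sums and row counts per (day, category value(s)).
    for r in rows:
        k = (r["day"],) + tuple(r[f] for f in keys)
        sums[k] += r["y"]
        counts[k] += 1
    # Second pass: look up the statistics of the previous day only, so a
    # row never sees its own label or any label from its own day.
    encoded = []
    for r in rows:
        prev = (r["day"] - 1,) + tuple(r[f] for f in keys)
        n = counts.get(prev, 0)
        encoded.append(sums[prev] / n if n else fallback)
    return encoded

# Toy data with made-up values.
rows = [
    {"day": 21, "site_id": "a", "device_type": "1", "y": 1},
    {"day": 21, "site_id": "a", "device_type": "1", "y": 0},
    {"day": 21, "site_id": "b", "device_type": "0", "y": 0},
    {"day": 22, "site_id": "a", "device_type": "1", "y": 0},
    {"day": 22, "site_id": "c", "device_type": "1", "y": 1},
]
univariate = prior_day_mean_encode(rows, ("site_id",))
multivariate = prior_day_mean_encode(rows, ("site_id", "device_type"))
# univariate -> [None, None, None, 0.5, None]
```

Using only earlier days keeps the target out of its own encoding. A real pipeline would probably also smooth the means for rare values, which this sketch omits.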
3. Modeling techniques and Training
a. I used day 30 as validation and had very stable (and comparable in score
movement) results throughout the competition.
b. 3 idiots' factorization machine turns out to be extremely effective in this problem.
FM based models are the best individual models in this solution. It is worth
noting that they outperform VW with manually built 2-way interactions.
i. Best FM with GBDT features gets ~.3830 on public LB
c. I spent a fair amount of time tuning VW (Vowpal Wabbit) models, especially around
interactions. I defined namespaces by feature type (C1421, device, app/site,
device id/ip, GBDT prediction) and tested two-way interactions through an ad hoc
(almost stepwise) process.
i. I tried VW's built-in FTRL optimization but could not get it to perform better
than the default adaptive procedure.
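To make the namespace setup concrete, here is a small sketch of formatting one example as a Vowpal Wabbit input line with features grouped into namespaces. The namespace letters, field names, and values are hypothetical and chosen only for illustration; the actual grouping used in the solution is the one listed above.

```python
def to_vw_line(label, namespaces):
    """Format one example in Vowpal Wabbit text format.

    label: 1 for a click, 0 otherwise (mapped to +1/-1, the labels VW
           expects for logistic loss).
    namespaces: dict of namespace name -> dict of {field: value}.
    """
    vw_label = 1 if label == 1 else -1
    parts = [str(vw_label)]
    for ns, fields in namespaces.items():
        # Encode each categorical value as a "field=value" token.
        tokens = " ".join(f"{name}={value}" for name, value in fields.items())
        parts.append(f"|{ns} {tokens}")
    return " ".join(parts)

# Hypothetical example: "d" holds device features, "a" holds app/site features.
example = {
    "d": {"device_model": "8a4875bd", "device_type": "1"},
    "a": {"app_id": "ecad2386", "site_id": "1fbe01fe"},
}
line = to_vw_line(1, example)
# line -> "1 |d device_model=8a4875bd device_type=1 |a app_id=ecad2386 site_id=1fbe01fe"

# A two-way interaction between these two namespaces could then be requested
# on the command line with VW's -q option, which pairs namespaces by their
# first letter, e.g.:
#   vw --loss_function logistic -q da train.vw
```

Grouping features this way lets each candidate interaction be switched on or off with a single `-q` pair, which suits testing interactions one at a time.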