hintly's hintly_ensemble at master - GitHub

hintly / hintly_ensemble

Public Clone URL:	git://github.com/hintly/hintly_ensemble.git
Your Clone URL:	git@github.com:hintly/hintly_ensemble.git

correct typo; spell out SA

Daniel Haran (author)

Thu Sep 03 17:38:07 -0700 2009

commit 660c3a752bcc307eb89391b414556e6fc6f75345
tree 0383d773420a965ac1dc168099a5535336b98e90
parent e7a0b3c8b3bc95c23ca9c5d14ac745c5210a1b90


name	last modified	last commit message [author]
README.rdoc	Thu Sep 03 17:38:07 -0700 2009	correct typo; spell out SA [Daniel Haran]
blending/	Wed Sep 02 18:04:04 -0700 2009	blending algo [Xiang Liang]
download/	Thu Sep 03 04:14:30 -0700 2009	data set [Xiang Liang]
include/	Sun Aug 30 19:23:50 -0700 2009	source [Xiang Liang]
knni-all/	Mon Aug 31 03:05:01 -0700 2009	item based knn with language and repos name inf... [Xiang Liang]
knni/	Sun Aug 30 19:23:50 -0700 2009	source [Xiang Liang]
knnu-all/	Sun Aug 30 21:01:25 -0700 2009	knnu all [Xiang Liang]
knnu/	Sun Aug 30 19:29:16 -0700 2009	user based knn [Xiang Liang]
knnui/	Sun Aug 30 19:50:31 -0700 2009	knnui [Xiang Liang]
popular/	Mon Aug 31 03:38:40 -0700 2009	recommend popular repos [Xiang Liang]
repo-all/	Thu Sep 03 04:11:08 -0700 2009	repo all [Xiang Liang]
repos/	Mon Aug 31 05:33:56 -0700 2009	repos and collaborators [Xiang Liang]
results.txt	Sun Aug 30 11:52:39 -0700 2009	top entry so far based on xlvector's entry [Daniel Haran]

README.rdoc

About this entry

First, thanks to Github for putting on a great contest.

This entry was supposed to be a blend of xlvector’s and jeremybarnes’ solutions, but due to time constraints it ended up being just xlvector’s.

Liang Xiang (xlvector) will explain his model shortly; Jeremy has already explained a great deal in his README (read it, it’s great!).

The secret sauce?

More data equals better results.

We also crammed in more heuristics.

Next steps

Anyone wanting to build on this should probably pay close attention to blending. The algorithms themselves aren’t incredibly new, but there’s probably low-hanging fruit in using neural networks, decision trees, or combinations of the two to blend ordered and weighted lists.

Algorithms

Item Based KNN

This algo is in the knni directory. I use cosine similarity with inverse user frequency to measure item-item similarity.

We also use repo name and language information to measure item-item similarity; this algo is implemented in knni-all.
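A minimal C++ sketch of item-item cosine similarity with inverse user frequency follows. It is not the code in knni: the data layout (`Watches`) and the IUF weight 1/log(1 + |N(u)|), which makes users who watch many repos count less, are assumptions, since the README does not give the exact formula.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Hypothetical layout: watches[u] is the set of repo ids that user u watches.
using Watches = std::vector<std::set<int>>;

// Item-item cosine similarity with inverse user frequency (IUF).
// Assumed IUF weight: a user who watches |N(u)| repos contributes 1 / log(1 + |N(u)|),
// so very active users count less than users who watch only a few repos.
double itemSimilarity(const Watches& watches, int i, int j) {
    assert(i != j);
    double co = 0.0;     // IUF-weighted count of users watching both i and j
    int ni = 0, nj = 0;  // |N(i)| and |N(j)|: number of watchers of each repo
    for (const auto& items : watches) {
        bool hasI = items.count(i) > 0;
        bool hasJ = items.count(j) > 0;
        ni += hasI;
        nj += hasJ;
        if (hasI && hasJ)  // here items.size() >= 2, so the log is positive
            co += 1.0 / std::log(1.0 + static_cast<double>(items.size()));
    }
    if (ni == 0 || nj == 0) return 0.0;
    return co / std::sqrt(static_cast<double>(ni) * nj);
}
```

For example, with `Watches w = {{1, 2}, {1, 2, 3}, {3}};`, repos 1 and 2 (watched together by two users) come out more similar than repos 1 and 3 (watched together by one user).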

User Based KNN

This algo is in the knnu directory. I use cosine similarity with inverse item frequency to measure user-user similarity.

We also use repo name and language information to measure user-user similarity; this algo is implemented in knnu-all.

Take language information as an example. Given two users a and b, let La[i] be the fraction of the repos user a watches that are written in language i. The language similarity of users a and b can then be calculated as cosine(La, Lb).
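The language similarity can be sketched in C++ as below. The input format (the main language of each watched repo, one string per repo) is an assumption; `languageProfile` builds La and `cosine` computes cosine(La, Lb), treating a language absent from one profile as 0.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

using Profile = std::map<std::string, double>;

// La[lang] = fraction of the repos user a watches that are written in `lang`.
// Assumed input: the main language of each watched repo, one entry per repo.
Profile languageProfile(const std::vector<std::string>& repoLangs) {
    assert(!repoLangs.empty());
    Profile profile;
    for (const auto& lang : repoLangs) profile[lang] += 1.0;
    for (auto& entry : profile) entry.second /= static_cast<double>(repoLangs.size());
    return profile;
}

// cosine(La, Lb); a language missing from one profile counts as 0 there.
double cosine(const Profile& a, const Profile& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& [lang, x] : a) {
        na += x * x;
        auto it = b.find(lang);
        if (it != b.end()) dot += x * it->second;
    }
    for (const auto& entry : b) nb += entry.second * entry.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Because the profile holds fractions rather than counts, two users who split their watches evenly between C and Ruby get a similarity of 1 regardless of how many repos each one watches.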

Hybrid User/Item KNN

This algo is in the knnui directory.

Removing unlikely items

If we want to find the repos a user will watch, we can approach the problem by removing the repos he/she will not watch. To do this, I use extreme values from the user's history to remove unlikely items.

For example, given a user u, let N be the popularity of the most popular repo u watches. Then, in the recommendation list, we should lower the scores of repos whose popularity is larger than N. The function userMostPopular in blending/main.cpp does this.
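A sketch of the popularity cut-off follows. It is not the original userMostPopular: `demoteTooPopular` and its `penalty` multiplier are hypothetical, since the README only says that the scores of such repos should be lowered.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>

// Sketch of the idea behind userMostPopular, not the original code.
// popularity[r]: number of watchers of repo r; scores: candidate repo -> score for user u.
// `penalty` is a hypothetical multiplier; the README only says such scores should be lowered.
void demoteTooPopular(const std::set<int>& watched,
                      const std::map<int, int>& popularity,
                      std::map<int, double>& scores,
                      double penalty = 0.5) {
    assert(penalty >= 0.0 && penalty <= 1.0);
    int maxPop = 0;  // N: popularity of the most popular repo user u watches
    for (int r : watched) {
        auto it = popularity.find(r);
        if (it != popularity.end()) maxPop = std::max(maxPop, it->second);
    }
    // Lower the score of every candidate more popular than anything u watches.
    for (auto& [repo, score] : scores) {
        auto it = popularity.find(repo);
        if (it != popularity.end() && it->second > maxPop) score *= penalty;
    }
}
```

Scaling the score instead of deleting the candidate keeps the repo available if the list would otherwise run short; that choice is ours, not something the README specifies.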

Furthermore, we know the creation date of each repo. Let T1 be the earliest creation date of the repos user u watches, and T2 the latest. Then we should increase the weight of repos created between T1 and T2. The function postProcessByDate in blending/main.cpp does this.
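The date window can be sketched the same way. `boostByDateRange`, the integer date encoding, and the `boost` factor are assumptions for illustration; the README does not say how much extra weight postProcessByDate adds.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>

// Sketch of the idea behind postProcessByDate, not the original code.
// created[r]: creation date of repo r as an integer (e.g. days since an epoch; an assumption).
// `boost` is a hypothetical multiplier; the README does not say how much weight is added.
void boostByDateRange(const std::set<int>& watched,
                      const std::map<int, long>& created,
                      std::map<int, double>& scores,
                      double boost = 1.5) {
    assert(boost >= 1.0);
    bool found = false;
    long t1 = 0, t2 = 0;  // T1: earliest, T2: latest creation date among watched repos
    for (int r : watched) {
        auto it = created.find(r);
        if (it == created.end()) continue;
        t1 = found ? std::min(t1, it->second) : it->second;
        t2 = found ? std::max(t2, it->second) : it->second;
        found = true;
    }
    if (!found) return;  // no dated history: leave the scores unchanged
    for (auto& [repo, score] : scores) {
        auto it = created.find(repo);
        if (it != created.end() && it->second >= t1 && it->second <= t2) score *= boost;
    }
}
```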

Result Diversity

In the contest, we found that users who watch many repos are hard to predict. Because they have diverse interests, it is hard to cover all of them in only 10 recommendations. The recommendation list should therefore include each user's different fields of interest; in other words, we have to improve the diversity of the recommendations.

This is implemented in the function diversity() in blending/main.cpp.
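The README does not describe how diversity() works. One simple way to get the effect described, sketched here as an assumption rather than the original method, is a greedy re-ranking that discounts candidates from interest fields already present in the list. The `Candidate::field` label (for example a repo's language) and the `decay` factor are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

struct Candidate {
    int repo;
    double score;       // blended score (assumed non-negative)
    std::string field;  // hypothetical interest field, e.g. the repo's main language
};

// Greedy diversity re-ranking (an assumed technique, not necessarily what diversity() does).
// Each time a field is picked, the remaining candidates from that field are scaled by `decay`,
// so a user with several interests gets picks from more than one of them.
std::vector<int> diversify(std::vector<Candidate> cands, std::size_t k, double decay = 0.5) {
    assert(decay > 0.0 && decay <= 1.0);
    std::vector<int> picked;
    std::map<std::string, int> fieldCount;  // picks made so far in each field
    while (picked.size() < k && !cands.empty()) {
        std::size_t best = 0;
        double bestScore = -std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < cands.size(); ++i) {
            double s = cands[i].score * std::pow(decay, fieldCount[cands[i].field]);
            if (s > bestScore) { bestScore = s; best = i; }
        }
        picked.push_back(cands[best].repo);
        ++fieldCount[cands[best].field];
        cands.erase(cands.begin() + static_cast<std::ptrdiff_t>(best));
    }
    return picked;
}
```

With the default decay of 0.5, candidates {repo 1: 1.0, C}, {repo 2: 0.9, C}, {repo 3: 0.6, Ruby} and k = 2 yield repos 1 and 3 rather than 1 and 2, because repo 2's score drops to 0.45 once a C repo has been picked.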

Extra data

repo_watch number. I downloaded repo information through the Github API. A repo has a lot of information, but its watch count is the most useful. Let N1(i) be the number of watchers of repo i in data.txt and N2(i) be the real number of watchers of repo i. If N1(i) == N2(i), every watcher of i already appears in data.txt, so no hidden test watch can point to i and i should never be in the recommendation list.
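The filter can be sketched as follows. `dropFullyObserved` is a hypothetical helper; the two count maps stand for the data.txt counts (N1) and the counts fetched from the Github API (N2). Repos missing from either map are kept, since nothing can be concluded about them.

```cpp
#include <cassert>
#include <map>
#include <vector>

// n1[r]: watchers of repo r in data.txt; n2[r]: real watchers fetched from the Github API.
// If n1 == n2, no watch of r was held out, so r cannot be a correct answer for any test user.
// Hypothetical helper, sketched from the README's description.
std::vector<int> dropFullyObserved(const std::vector<int>& candidates,
                                   const std::map<int, int>& n1,
                                   const std::map<int, int>& n2) {
    std::vector<int> kept;
    for (int r : candidates) {
        auto a = n1.find(r);
        auto b = n2.find(r);
        bool fullyObserved = a != n1.end() && b != n2.end() && a->second == b->second;
        if (!fullyObserved) kept.push_back(r);
    }
    return kept;
}
```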

collaborators information. This information is also very important; it is used in the repos/ directory.

Without this extra data, using only the data Github provided, I could only get 2600. So extra data is important in the contest.

Blending

The most important lesson I learned from the Netflix Prize is that no single algo can solve everything, so we can get better results by blending the algos described above. I used a simple linear blending method: I divided users into 3 groups by the number of repos they watched, and in each group I used simulated annealing (SA) to learn the best linear blending weights for the different algos. Then I readjusted these weights based on submission results. The Github contest allowed competitors to submit many results every day, so the best blending weights could be learned from submissions; as a result, the best result overfits the test set.
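A C++ sketch of per-group linear blending with simulated annealing follows. The group cut-offs, the starting weights of 1.0, the Gaussian perturbation of one weight at a time, and the cooling schedule are all assumptions; the README only says that users were split into 3 groups by the number of repos watched and that SA learned the weights in each group. `objective` stands for whatever evaluation is used, for example the number of correct repos in each user's top 10 on held-out data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical grouping: the README splits users into 3 groups by how many repos they watch
// but does not give the cut-offs; `small` and `large` are placeholders.
int userGroup(std::size_t numWatched, std::size_t small = 10, std::size_t large = 100) {
    if (numWatched < small) return 0;
    if (numWatched < large) return 1;
    return 2;
}

// Learns linear blending weights w (blended score = sum_k w[k] * score_k) for one user group
// by simulated annealing. `objective` returns the quality of a weight vector on held-out data
// for that group; higher is better.
std::vector<double> annealWeights(const std::function<double(const std::vector<double>&)>& objective,
                                  std::size_t numAlgos, unsigned seed = 42,
                                  int iterations = 20000, double t0 = 1.0,
                                  double cooling = 0.9995, double step = 0.1) {
    assert(numAlgos > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pickAlgo(0, numAlgos - 1);
    std::normal_distribution<double> noise(0.0, step);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    std::vector<double> w(numAlgos, 1.0), best = w;
    double cur = objective(w), bestVal = cur, temp = t0;
    for (int it = 0; it < iterations; ++it, temp *= cooling) {
        std::vector<double> cand = w;
        cand[pickAlgo(rng)] += noise(rng);  // perturb one algo's weight
        double val = objective(cand);
        // Always accept improvements; accept worse moves with probability exp(delta / T).
        if (val >= cur || unit(rng) < std::exp((val - cur) / temp)) {
            w = cand;
            cur = val;
            if (cur > bestVal) { best = w; bestVal = cur; }
        }
    }
    return best;
}
```

In practice `annealWeights` would be called once per group, with an objective that scores only that group's users; the later re-tuning against submissions that the README mentions is not modelled here.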

However, using only the blending weights learned by SA, I could only get 253x on the test set. So I think jeremybarnes did a good job in the Github contest.
