X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=heikki%2FREADME-HEIKKI;h=0b547a1f93c0beaaaeafb416294f8c12a55bccb2;hb=refs%2Fheads%2Franking-h;hp=a6ca02d6e56851df64a805eccb53462ea5dd643f;hpb=b6b190610799a920163200fd5920406adcc3f6c0;p=pazpar2-moved-to-github.git

diff --git a/heikki/README-HEIKKI b/heikki/README-HEIKKI
index a6ca02d..0b547a1 100644
--- a/heikki/README-HEIKKI
+++ b/heikki/README-HEIKKI
@@ -38,3 +38,52 @@ Next: See if I can implement a round robin.
  - keep an array of structs with the pointer, and locate the client number that way
  - robin-score = pos * n_clients + client_num
 
+relevance_new_rec is called every time a new record pops up. One or more to count_word,
+exactly one to done_rec. That's where I can compare to the ranking of the previous
+record. struct_relevance is one structure I have for myself, global (for the user
+session), so I can keep my stuff in there, possibly an array of things for each target.
+
+I should also add stuff directly to the client, and to the record, as I need.
+
+Next: Plot the tf/idf scores against round-robin sorted order. Will be messy,
+but later when we get a target that returns sorted records, it will make sense.
+
+
+Wed 27-Nov
+Setting up multiple SOLR targets in the same pazpar2
+ - Add #999 to the z-urls, so pazpar2 won't merge them. Different number for each
+
+This URL shows the databases, with their numbers
+http://lui.indexdata.com/solr/select?q=database:*&facet=true&facet.method=fc&facet.field=author_exact&facet.field=subject_exact&facet.field=date&facet.field=medium_exact&facet.field=database&rows=0&facet.mincount=1
+
+Add this to the target defs
+
+
+After this, it should be possible to get records from different databases, some
+with many records, some with a few. This is a good testing ground for merging
+rankings! Test first with a round-robin, and plot the scores.
+
+Thu 28-Nov
+Ok, I can now merge a number of SOLR databases (harvest jobs), and plot their rankings
+as solr gives them, in the order of different merge strategies
+Next: Add the normalizing merge strategy. Then plot different strategies against different queries
+Write a conclusion, and consider this plotting job done
+
+
+Fri 13-Dec-2013
+Adam is adding a float type to pazpar2. I have made a proof of concept of the normalizing
+by curve fitting. I think it is time to close this branch, and start (re)implementing
+things in the main branch. Keep the old branch around for reference!
+
+Need new config options:
+ - sort: native, native + position
+ - or per target: native score / fake score from position / use tf/idf
+ - per target: weight for combining rankings (cluster merge), so we can trust one
+   target more than others
+ - per target: boost rankings
+
+Start coding:
+ - in relevance-prepare-read, go through records, collect scores in arrays (per target),
+ - fit the curve, normalize the scores.
+ - cluster scoring
+
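
The first two "Start coding" steps (collect scores per target, then fit and normalize) could be prototyped outside pazpar2 before touching the C code. The notes do not say which curve is fitted, so the Python sketch below assumes a least-squares straight line of score against position within each target, and normalizes by dividing each score by the fitted value at position 0. All function names and the (target, score) input shape are hypothetical and are not pazpar2 APIs.

```python
from collections import defaultdict


def collect_scores(records):
    """Group native scores by target, in the order each target returned them.

    `records` is an iterable of (target_name, native_score) pairs
    (a hypothetical input shape, chosen only for this sketch).
    """
    per_target = defaultdict(list)
    for target, score in records:
        per_target[target].append(float(score))
    return dict(per_target)


def fit_line(scores):
    """Least-squares fit of score = a + b * pos; returns (a, b).

    ASSUMPTION: a straight line stands in for "the curve", which the
    notes leave unspecified.
    """
    n = len(scores)
    if n == 0:
        return 0.0, 0.0
    if n == 1:
        return scores[0], 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def normalize(scores):
    """Scale scores so the fitted value at position 0 becomes 1.0."""
    a, _ = fit_line(scores)
    if a <= 0:
        # No sensible scale to divide by; leave the scores untouched.
        return list(scores)
    return [s / a for s in scores]


def merge_by_normalized(records):
    """Return (normalized_score, target, pos) tuples, best score first."""
    merged = []
    for target, scores in collect_scores(records).items():
        for pos, norm in enumerate(normalize(scores)):
            merged.append((norm, target, pos))
    # Highest normalized score first; ties broken by position, then target.
    merged.sort(key=lambda item: (-item[0], item[2], item[1]))
    return merged
```

With this, a target whose native scores start around 10 and one whose scores start around 0.5 both begin near 1.0 after normalization, so their records can be interleaved by normalized score instead of one target dominating the merged list. Per-target weights and cluster scoring are left out of the sketch.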