Heikki's experiments with ranking

Personal notes, likely to be out of date.

Things to experiment with, and find out, and mess about

Goals:
 - Understand the ranking
 - make a better ranking merging algorithm


Tue 19-Nov-2013 Started this branch


Wed 20-Nov-2013 Make a script that tests ranking against yaz-zserver
(as that is the default config). Mostly to have a script to build on later.

Thu 21-Nov-2013. Start my own complete config

Fri 22-Nov-2013. Adam defined a new sort type, relevance_h, and put it in place
in the code. Now I have a place to implement my stuff. Relevant places:
 pazpar2_config.c:1020  - minor
 session.c:1318 - call relevance_prepare_read also for my type
 reclists.c:104 - parse params
 reclists.c:166 - compare function (for quicksort)
 relevance.c:417 - calculate score
         (same function as for relevance, but with extra arg for type)

When sorting by Metadata_sortkey_position, the compare function compares
positions. For each cluster it loops through the records, finds the smallest
rec->pos, and then compares those minimums.
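
A minimal sketch of that comparison, to make the idea concrete. The struct
layouts and the names cluster_min_pos and cmp_by_position are made up for
illustration (only rec->pos comes from the notes above); the real code in
reclists.c has more fields and more sort types.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for the real record and cluster structs. */
struct record {
    int pos;                /* position in the target's own result list */
    struct record *next;    /* next record in the same cluster */
};

struct record_cluster {
    struct record *records; /* linked list of the records merged into this cluster */
};

/* Smallest position of any record in the cluster; INT_MAX if the cluster is empty. */
static int cluster_min_pos(const struct record_cluster *c)
{
    int best = INT_MAX;
    const struct record *r;

    assert(c != NULL);
    for (r = c->records; r; r = r->next)
        if (r->pos < best)
            best = r->pos;
    return best;
}

/* qsort comparator over an array of cluster pointers, ascending by smallest position. */
static int cmp_by_position(const void *p1, const void *p2)
{
    const struct record_cluster *c1 = *(const struct record_cluster *const *)p1;
    const struct record_cluster *c2 = *(const struct record_cluster *const *)p2;
    int a = cluster_min_pos(c1);
    int b = cluster_min_pos(c2);

    return (a > b) - (a < b);   /* avoids overflow from a - b */
}
```

Used as qsort(clusters, n, sizeof clusters[0], cmp_by_position) on an array of
struct record_cluster pointers.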

Next: See if I can implement a round robin.
 - clients.h declares int clients_count(void)
 - rec->client is a pointer to the client, but we don't have an ordinal from that
   - keep an array of structs with the pointer, and locate the client number that way
 - robin-score = pos * n_clients + client_num
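
A rough sketch of the round-robin idea above, assuming a fixed-size table of
client pointers. The names client_table, client_ordinal, robin_score and
MAX_CLIENTS are hypothetical, and struct client here is only a stand-in for
the real one; the only things taken from the notes are the pointer lookup and
the formula.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real struct client; only its address is used here. */
struct client { const char *name; };

#define MAX_CLIENTS 16      /* arbitrary limit for this sketch */

/* Array of structs holding the client pointer; the index is the client number. */
struct client_slot { const struct client *client; };

struct client_table {
    struct client_slot slots[MAX_CLIENTS];
    int n_clients;
};

/* Return the ordinal of a client, registering it on first sight; -1 if the table is full. */
static int client_ordinal(struct client_table *t, const struct client *c)
{
    int i;

    for (i = 0; i < t->n_clients; i++)
        if (t->slots[i].client == c)
            return i;
    if (t->n_clients == MAX_CLIENTS)
        return -1;
    t->slots[t->n_clients].client = c;
    return t->n_clients++;
}

/* robin-score = pos * n_clients + client_num; lower scores sort first. */
static int robin_score(int pos, int n_clients, int client_num)
{
    assert(n_clients > 0);
    assert(client_num >= 0 && client_num < n_clients);
    return pos * n_clients + client_num;
}
```

Note that n_clients has to be the same for every record (for example the value
from clients_count()), not the number of clients seen so far, or the scores
from early and late records will not interleave properly.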

relevance_new_rec is called every time a new record pops up. It is followed by
one or more calls to count_word, and exactly one to done_rec. That's where I can
compare to the ranking of the previous record. struct_relevance is one structure
I have for myself, global (for the user session), so I can keep my stuff in
there, possibly an array of things for each target.

I should also add stuff directly to the client, and to the record, as I need.

Next: Plot the tf/idf scores against round-robin sorted order. Will be messy,
but later when we get a target that returns sorted records, it will make sense.


Wed 27-Nov
Setting up multiple SOLR targets in the same pazpar2
 - Add #999 to the z-urls, so pazpar2 won't merge them. Use a different number
   for each.

This URL shows the databases, with their numbers
http://lui.indexdata.com/solr/select?q=database:*&facet=true&facet.method=fc&facet.field=author_exact&facet.field=subject_exact&facet.field=date&facet.field=medium_exact&facet.field=database&rows=0&facet.mincount=1

Add this to the target defs
<set name="pz:extra_args" value="fq=database:4902"/>

After this, it should be possible to get records from different databases, some
with many records, some with a few. This is a good testing ground for merging
rankings! Test first with a round-robin, and plot the scores.

Thu 28-Nov
Ok, I can now merge a number of SOLR databases (harvest jobs), and plot their
rankings as SOLR gives them, in the order produced by different merge strategies.
Next: Add the normalizing merge strategy. Then plot different strategies against
different queries, write a conclusion, and consider this plotting job done.
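
One possible reading of "normalizing", sketched here as plain min-max scaling
of each target's scores to [0, 1] before merging. This is an assumption; these
notes do not define the normalization, and normalize_scores is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rescale one target's scores in place to the range [0, 1].
 * If all scores are equal there is nothing to spread out, so each becomes 1.0.
 */
static void normalize_scores(double *scores, size_t n)
{
    double lo, hi;
    size_t i;

    assert(scores != NULL || n == 0);
    if (n == 0)
        return;

    /* Find the smallest and largest score from this target. */
    lo = hi = scores[0];
    for (i = 1; i < n; i++) {
        if (scores[i] < lo)
            lo = scores[i];
        if (scores[i] > hi)
            hi = scores[i];
    }

    for (i = 0; i < n; i++)
        scores[i] = (hi > lo) ? (scores[i] - lo) / (hi - lo) : 1.0;
}
```

After this, scores from different targets are at least on the same scale, so
they can be sorted together. Whether that gives a sensible merged ranking is
exactly what the plots are meant to show.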