Revision 226:7bbf27312bd7, 4.6 kB (checked in by Lafaye Philippe (RAGE2000) <lafaye@…>, 10 months ago)

=======
indexer
=======

The indexer provides:

- client-side modules: an API that lets clients request indexing and query
  the Xapian database. Each indexing request is stored in an SQL
  database;

- server-side application: a standalone thread that reads the SQL database
  and indexes what has been requested.

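The split between a client that only records requests and a worker that
processes them can be pictured with a small self-contained sketch. It is not
the project's implementation: the `pending` table, the helper names
`make_queue`, `request_indexing` and `drain`, the in-memory SQLite database
and the plain dict standing in for Xapian are all assumptions made for
illustration (Python 3).

```python
import sqlite3
import threading


def make_queue():
    # In-memory SQLite DB standing in for the SQL database (assumption).
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE pending (uid TEXT PRIMARY KEY, text TEXT)")
    return db


def request_indexing(db, lock, uid, text):
    # Client side: only record the request; no indexing happens here.
    with lock:
        db.execute("INSERT OR REPLACE INTO pending VALUES (?, ?)", (uid, text))
        db.commit()


def drain(db, lock, index):
    # Server side: apply pending requests one by one to the index
    # (a plain dict standing in for the Xapian database).
    while True:
        with lock:
            row = db.execute("SELECT uid, text FROM pending LIMIT 1").fetchone()
            if row is None:
                return
            uid, text = row
            index[uid] = sorted(set(text.split()))
            db.execute("DELETE FROM pending WHERE uid = ?", (uid,))
            db.commit()


db, lock, index = make_queue(), threading.Lock(), {}
request_indexing(db, lock, "1", "my taylor is not rich anymore")
request_indexing(db, lock, "2", "pluto is a dog")

worker = threading.Thread(target=drain, args=(db, lock, index))
worker.start()
worker.join()
```

Once the worker thread has finished, `index` holds both documents and the
`pending` table is empty, which mirrors the "no work left" condition that
`is_working` checks for in the walkthrough.
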

Let's import the modules used on the client side::

    >>> import indexer
    >>> import searcher

Let's reset the SQL DB first::

    >>> indexer.reset()

Let's also reset the Xapian DB::

    >>> from xapindexer import force_reset
    >>> force_reset()

The Xapian DB should be empty now::

    >>> searcher.corpus_size()
    0

Indexing
========

Each indexable content item has a unique id and a text to index::

    >>> uid = '1'
    >>> text = 'my taylor is not rich anymore'

Let's index it::

    >>> indexer.index_document(uid, text)

Another one::

    >>> indexer.index_document('2', 'pluto is a dog')

Let's start the worker that is in charge of asynchronous indexing::

    >>> from xapindexer import start_server
    >>> start_server()

Let's wait a bit so the worker has time to read the SQL database
and do the work::

    >>> import time
    >>> while indexer.is_working():
    ...     time.sleep(0.2)

`is_working` checks the SQL DB for work that is still pending.

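The polling loop above never gives up if the worker is not running. A hedged
variant with a timeout is sketched here; `wait_until_idle` is a hypothetical
helper, not part of the library, and it only assumes that it receives a
callable such as `indexer.is_working` (Python 3, for `time.monotonic`).

```python
import time


def wait_until_idle(is_working, timeout=5.0, interval=0.2):
    """Poll is_working() until it returns False or the timeout expires.

    Returns True if the work finished, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while is_working():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

In this walkthrough it would be called as
`wait_until_idle(indexer.is_working)`.
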
The Xapian DB has two documents now::

    >>> searcher.corpus_size()
    2

The library can also preprocess the text before indexing it and apply a
stemming algorithm. Just pass the ISO language code of the text as the
`language` argument::

    >>> uid = '3'
    >>> text = "Stemming is the process for reducing inflected (or sometimes"\
    ... " derived) words to their stem, base or root form."
    >>> indexer.index_document(uid, text, language='en')

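To illustrate what stemming does to the indexed terms, here is a deliberately
naive suffix stripper. It is not the algorithm used by the library (Xapian
itself ships Snowball-based stemmers with far more careful rules); `toy_stem`
and its suffix list are made up for illustration.

```python
# Suffixes are tried in this order; the first match is stripped.
SUFFIXES = ("ing", "er", "ed", "es", "s", "e")


def toy_stem(word):
    # Strip one suffix, but always keep at least three characters.
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Index time: store stems instead of the raw words.
indexed = {toy_stem(w) for w in "the process for reducing inflected words".split()}

# Query time: stem the query word the same way before looking it up.
found = toy_stem("reducer") in indexed
```

Because `reducing` and `reducer` map to the same key, a query for one can
match a text that only contains the other.
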
We can also try a French sentence, with some accents::

    >>> uid = '4'
    >>> text = "La lexémisation d'un mot est la fonction qui associe un"\
    ... " lexème à celui-ci."
    >>> indexer.index_document(uid, text, language='fr')

Let's wait a bit so the worker has time to read the SQL database
and do the work::

    >>> import time
    >>> while indexer.is_working():
    ...     time.sleep(0.2)

Searching
=========

Let's search now with `searcher`. The default operator is AND::

    >>> res = searcher.search('rich')
    >>> list(res)
    ['1']
    >>> res = searcher.search('pluto')
    >>> list(res)
    ['2']
    >>> res = searcher.search('dog')
    >>> list(res)
    ['2']
    >>> res = searcher.search('rich dog')
    >>> list(res)
    []

With the OR operator::

    >>> res = searcher.search('rich dog', or_=True)
    >>> res = list(res)
    >>> res.sort()
    >>> res
    ['1', '2']

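The difference between the two operators can be pictured with a toy inverted
index: AND intersects the sets of documents matching each term, OR takes
their union. This is a sketch of the semantics only, not of how `searcher` or
Xapian evaluate queries; `build_index` and `search` are hypothetical helpers.

```python
def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = {}
    for uid, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(uid)
    return index


def search(index, query, or_=False):
    # One posting set per query term; unknown terms match nothing.
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    combine = set.union if or_ else set.intersection
    return sorted(combine(*postings))


index = build_index({
    "1": "my taylor is not rich anymore",
    "2": "pluto is a dog",
})
```

With this index, `search(index, 'rich dog')` is empty because no document
contains both terms, while `search(index, 'rich dog', or_=True)` returns both
ids.
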
As with indexing, stemming can be applied when searching. For example, a
search for the word `reducer` is reduced to the stem `reduc`, which is also
the stem of the word `reducing` in document 3::

    >>> res = searcher.search('reducer', language='en')
    >>> list(res)
    ['3']

In French::

    >>> res = searcher.search('lexemiser', language='fr')
    >>> list(res)
    ['4']

There is an API to check whether a document is present::

    >>> searcher.document_exists('2')
    True
    >>> searcher.document_exists('ttt')
    False

And another one to retrieve the indexed terms of a document::

    >>> list(searcher.document_terms('2'))
    ['dog', 'is', 'pluto']

Reindexing
==========

A document can also be reindexed::

    >>> indexer.index_document('2', 'pluto is a cat')
    >>> indexer.work_in_process()
    ([u'2'], [])

Let's wait a bit::

    >>> while indexer.is_working():
    ...     time.sleep(0.2)

Let's make sure the document has been reindexed::

    >>> list(searcher.document_terms('2'))
    ['cat', 'is', 'pluto']

Then check that the search results have changed::

    >>> res = searcher.search('rich dog', or_=True)
    >>> list(res)
    ['1']

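The behaviour observed above, where reindexing `'2'` makes `dog` disappear
and `cat` appear, is that of a store keyed by uid in which a new text
replaces the old one. A minimal sketch under that assumption follows;
`ToyIndex` is hypothetical and only borrows the method names used in this
walkthrough.

```python
class ToyIndex:
    """uid -> terms store where indexing an existing uid replaces it."""

    def __init__(self):
        self.docs = {}

    def index_document(self, uid, text):
        # Re-indexing overwrites the previous term set for this uid.
        self.docs[uid] = set(text.lower().split())

    def delete_document(self, uid):
        # Deleting an unknown uid is a no-op in this sketch.
        self.docs.pop(uid, None)

    def search(self, term):
        return sorted(uid for uid, terms in self.docs.items() if term in terms)


toy = ToyIndex()
toy.index_document("2", "pluto is a dog")
toy.index_document("2", "pluto is a cat")  # replaces, does not append
```

After the second call, `toy.search('dog')` no longer returns `'2'`, and
`toy.delete_document('2')` removes the document from every later search.
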
A document can also be deleted::

    >>> res = searcher.search('pluto')
    >>> list(res)
    ['2']
    >>> indexer.delete_document('2')
    >>> while indexer.is_working():
    ...     time.sleep(0.2)
    >>> res = searcher.search('pluto')
    >>> list(res)
    []

Statistics
==========

We can also gather some statistics::

    >>> #import stats
    >>> #stats.query_count('pluto') > 1
    True

We can also provide search suggestions. Let's run a few searches first::

    >>> searcher.search('platon')
    <generator object at 0x...>
    >>> searcher.search('plisser')
    <generator object at 0x...>

    >>> #list(stats.query_suggestions('pl'))
    [(u'pluto', ...), (u'platon', ...), (u'plisser', ...)]

This is useful, for example, to provide an Ajax search box where
suggestions are displayed as the user types.
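
Prefix suggestions of this kind can be built from a log of past queries. The
sketch below is an assumption about how such a feature could work, not the
`stats` module itself: `QueryLog`, `record` and `suggestions` are
hypothetical names, and the second element of each pair is assumed to be a
hit count (the doctest above elides it).

```python
from collections import Counter


class QueryLog:
    """Counts past queries and suggests those matching a prefix."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        # Normalise so that 'Pluto ' and 'pluto' count as the same query.
        self.counts[query.strip().lower()] += 1

    def suggestions(self, prefix, limit=5):
        # Most frequent first; ties are broken alphabetically.
        prefix = prefix.lower()
        matches = [(q, n) for q, n in self.counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        return matches[:limit]


log = QueryLog()
for query in ["pluto", "pluto", "platon", "plisser", "rich"]:
    log.record(query)
```

Calling `log.suggestions('pl')` then lists `pluto` first, since it was
searched twice, followed by `platon` and `plisser`.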