=======
neartag
=======

This module implements the k-nearest neighbours algorithm (k-NN), used here
to compute the distance between elements described by a set of values. Each
value is one dimension, and the whole set gives the coordinates of the
element in a multidimensional space.
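
As a rough illustration of the paragraph above, the sketch below turns tag
lists into 0/1 coordinates and measures a plain Euclidean distance between
them. The helper names `tag_vector` and `euclidean` are hypothetical, and the
metric is an assumption: neartag's own distance and normalization may differ,
so these numbers are not expected to match the module's output.

```python
import math


def tag_vector(all_tags, user_tags):
    """Return one 0/1 coordinate per known tag (hypothetical helper)."""
    owned = set(user_tags)
    return [1 if tag in owned else 0 for tag in all_tags]


def euclidean(a, b):
    """Plain Euclidean distance; an assumed metric, not neartag's own."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


all_tags = ["django", "python", "zen", "fun", "scary"]
v1 = tag_vector(all_tags, ["django", "python"])
v2 = tag_vector(all_tags, ["zen", "fun", "scary"])
v3 = tag_vector(all_tags, ["django"])

print(v1)                 # [1, 1, 0, 0, 0]
print(euclidean(v1, v3))  # 1.0 (the vectors differ in one coordinate)
print(euclidean(v1, v2) > euclidean(v1, v3))  # True (no tag in common)
```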

For tags, the idea is to find the neighbours of a given user, based on the
tags she uses. The `NearestByTag` class is instantiated with the list of
known tags::

    >>> from neartag import NearestByTag
    >>> tags = ["django", "python", "zen", "fun", "scary"]
    >>> solver = NearestByTag(tags)

Then each user is added with her name and the tags she uses (each known tag
is a boolean value: used or not)::

    >>> user_1 = 'user 1', ["django", "python"]
    >>> user_2 = 'user 2', ["zen", "fun", "scary"]
    >>> user_3 = 'user 3', ["django"]
    >>> user_4 = 'user 4', ["django", "python"]
    >>> for user, tags in (user_1, user_2, user_3, user_4):
    ...     solver.add_user(user, tags)

The class then returns a sorted list of the neighbours of a given user::

    >>> solver.neighbours('user 1')
    [(0.16..., 'user 4'), (0.3..., 'user 3'), (1.0, 'user 2')]
    >>> solver.neighbours('user 2')
    [(0.83..., 'user 3'), (1.0, 'user 1'), (1.0, 'user 4')]
    >>> solver.neighbours('user 3')
    [(0.33..., 'user 1'), (0.33..., 'user 4'), (0.83..., 'user 2')]
    >>> solver.neighbours('user 4')
    [(0.16..., 'user 1'), (0.33..., 'user 3'), (1.0, 'user 2')]

The smaller the returned value, the closer the user.

`neighbours` returns at most 10 neighbours by default, but this limit can be
changed with a second argument::

    >>> solver.neighbours('user 1', 1)
    [(0.16..., 'user 4')]
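
One way to obtain a sorted, size-limited result like the ones above is
`heapq.nsmallest` from the standard library. This is only a sketch: the
`nearest` helper and the distances below are made up for illustration and say
nothing about how neartag selects neighbours internally.

```python
import heapq


def nearest(distances, size=10):
    """Return the `size` closest (distance, name) pairs, closest first.

    `distances` maps a user name to an already computed distance
    (hypothetical input format).
    """
    pairs = ((distance, name) for name, distance in distances.items())
    return heapq.nsmallest(size, pairs)


# Made-up distances from one user to three others.
distances = {"user 2": 1.0, "user 3": 0.5, "user 4": 0.25}

print(nearest(distances))     # [(0.25, 'user 4'), (0.5, 'user 3'), (1.0, 'user 2')]
print(nearest(distances, 1))  # [(0.25, 'user 4')]
```

Because the pairs are compared as tuples, users at the same distance come out
ordered by name.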

This class works in memory, since the loaded values are small enough to fit.

How to use it with an application
=================================

Tags change all the time in an application. The best approach is to
instantiate the class over data retrieved from a database, compute the
distances, and save them in a dedicated table. Since the computation can
take time, a worker thread can update those distances periodically in the
background.
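
The workflow above could be sketched as follows, assuming an SQLite database
and the standard `threading` module. Everything here is hypothetical except
the `neighbours(user, size)` call shown earlier: the `tag_distance` table, the
`save_distances` and `start_worker` helpers, and the `build_solver` callback
are names invented for this example.

```python
import sqlite3
import threading


def save_distances(conn, solver, users, limit=10):
    """Recompute every user's neighbours and store them in one table."""
    rows = [
        (user, neighbour, distance)
        for user in users
        for distance, neighbour in solver.neighbours(user, limit)
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tag_distance "
        "(name TEXT, neighbour TEXT, distance REAL)"
    )
    with conn:  # delete and insert are committed together
        conn.execute("DELETE FROM tag_distance")
        conn.executemany("INSERT INTO tag_distance VALUES (?, ?, ?)", rows)
    return len(rows)


def start_worker(db_path, build_solver, interval, stop_event):
    """Refresh the table every `interval` seconds until `stop_event` is set.

    `build_solver` is a caller-supplied function returning a loaded solver
    and the list of user names.
    """
    def run():
        conn = sqlite3.connect(db_path)  # opened inside the worker thread
        try:
            while True:
                solver, users = build_solver()
                save_distances(conn, solver, users)
                if stop_event.wait(interval):  # True once a stop is requested
                    break
        finally:
            conn.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

With neartag, `build_solver` would load tags and users from the application's
database, fill a `NearestByTag` instance with `add_user`, and return it
together with the user names. The application then reads neighbours from the
table instead of recomputing them on every request.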