Mon, 19 Nov 2012 10:37:47 +0100 [normalize] In simplify() replace punctuation by a space
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 10:37:47 +0100] rev 144
[normalize] In simplify() replace punctuation by a space Punctuation used to be removed; it is now replaced by a space. This lets minhashing split the tokens and send "ivry seine" and "ivry sur seine" to the same bucket, whereas "ivrysurseine" and "ivryseine" were not.
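The effect of this change can be sketched as follows. This is only an illustration: `replace_punctuation()` is a hypothetical helper, not the project's simplify(), and it only handles ASCII punctuation.

```python
import string


def replace_punctuation(text, substitute=" "):
    """Replace each ASCII punctuation character by `substitute`.

    Hypothetical helper for illustration; whitespace runs are collapsed.
    """
    table = str.maketrans({char: substitute for char in string.punctuation})
    return " ".join(text.translate(table).split())


# New behaviour: punctuation becomes a space, so the tokens stay separate.
spaced = replace_punctuation("ivry-sur-seine")
# Old behaviour: punctuation is dropped and the tokens are glued together.
glued = replace_punctuation("ivry-sur-seine", substitute="")
print(spaced.split(), glued.split())
```

With separate tokens, a token-based method such as minhashing can see that the two spellings share "ivry" and "seine".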
Thu, 15 Nov 2012 16:48:36 +0100 Add some XXX on Adrien's comments
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 16:48:36 +0100] rev 143
Add some XXX on Adrien's comments
Thu, 15 Nov 2012 14:38:15 +0100 [minhashing] Add a verbose mode
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:38:15 +0100] rev 142
[minhashing] Add a verbose mode
Thu, 15 Nov 2012 14:17:58 +0100 [aligner] Remove a useless import
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:17:58 +0100] rev 141
[aligner] Remove a useless import
Thu, 15 Nov 2012 14:17:39 +0100 Remove a useless file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:17:39 +0100] rev 140
Remove a useless file
Thu, 15 Nov 2012 14:42:05 +0100 Respect (or try to respect) pep8
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:42:05 +0100] rev 139
Respect (or try to respect) pep8
Thu, 15 Nov 2012 09:38:52 +0100 [minhashing] consume less memory
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 09:38:52 +0100] rev 138
[minhashing] consume less memory Storing the whole document matrix was useless because it is a sparse boolean matrix. The only stored elements are equal to one, so there is no need to store the data list: only the positions matter. During the signature step, lines that have already been read are no longer used, so they are deleted to save memory.
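A minimal sketch of the idea, assuming the document matrix is given as rows of 0/1 values (the function name `rows_to_positions()` is hypothetical and the project's actual storage may differ):

```python
def rows_to_positions(rows):
    """For each row of a 0/1 matrix, keep only the column indexes holding a 1."""
    return [[col for col, value in enumerate(row) if value] for row in rows]


dense = [
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
# Every stored value would be 1, so the positions alone describe the matrix.
positions = rows_to_positions(dense)
print(positions)
```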
Wed, 14 Nov 2012 17:45:49 +0100 [test] Remove a useless line
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 17:45:49 +0100] rev 137
[test] Remove a useless line
Wed, 14 Nov 2012 12:17:02 +0100 [minhashing] Compute the union step by step
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 12:17:02 +0100] rev 136
[minhashing] Compute the union step by step
Wed, 14 Nov 2012 11:23:10 +0100 [minhashing] Buckets are row-dependent
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 11:23:10 +0100] rev 135
[minhashing] Buckets are row-dependent “We can use the same hash function for all the bands, but we use a separate bucket array for each band, so columns with the same vector in different bands will not hash to the same bucket.”
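The quoted rule can be illustrated with a small sketch. The names `band_buckets()` and `candidate_pairs()` are hypothetical, and signatures are assumed to be plain lists of equal length, one per document:

```python
from collections import defaultdict


def band_buckets(signatures, bandsize):
    """Return one bucket dict per band: band vector -> set of document indexes."""
    nrows = len(signatures[0])
    buckets = []
    for start in range(0, nrows, bandsize):
        band = defaultdict(set)  # a separate bucket array for each band
        for doc, signature in enumerate(signatures):
            band[tuple(signature[start:start + bandsize])].add(doc)
        buckets.append(band)
    return buckets


def candidate_pairs(buckets):
    """Pairs of documents that share a bucket in at least one band."""
    pairs = set()
    for band in buckets:
        for docs in band.values():
            ordered = sorted(docs)
            pairs.update((a, b) for i, a in enumerate(ordered)
                         for b in ordered[i + 1:])
    return pairs


signatures = [
    [1, 2, 7, 8],  # document 0
    [7, 8, 1, 2],  # document 1: same vectors as document 0, in other bands
    [1, 2, 3, 4],  # document 2: shares its first band with document 0
]
pairs = candidate_pairs(band_buckets(signatures, bandsize=2))
print(pairs)
```

Documents 0 and 1 contain the same vectors but in different bands, so they never meet in a bucket; only documents 0 and 2 become candidates.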
Wed, 14 Nov 2012 11:23:49 +0100 [dataio] Add a forgotten encoding attribute
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 11:23:49 +0100] rev 134
[dataio] Add a forgotten encoding attribute
Wed, 14 Nov 2012 10:50:10 +0100 [test] write the tests for the `alignall()` function
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:50:10 +0100] rev 133
[test] write the tests for the `alignall()` function
Wed, 14 Nov 2012 10:30:09 +0100 [aligner] Wrap the code at 80 characters
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:30:09 +0100] rev 132
[aligner] Wrap the code at 80 characters
Wed, 14 Nov 2012 10:50:31 +0100 [aligner] Make the alignall() function
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:50:31 +0100] rev 131
[aligner] Make the alignall() function
Wed, 14 Nov 2012 10:27:57 +0100 [aligner] Let the user decide whether to return the global alignment matrix
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:27:57 +0100] rev 130
[aligner] Let the user decide whether to return the global alignment matrix
Wed, 14 Nov 2012 09:40:30 +0100 [minhashing] Export the demo of minhashing to the new api
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 09:40:30 +0100] rev 129
[minhashing] Export the demo of minhashing to the new api
Tue, 13 Nov 2012 16:46:16 +0100 [aligner] Remove useless arguments from findneighbours_clustering()
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:46:16 +0100] rev 128
[aligner] Remove useless arguments from findneighbours_clustering()
Mon, 26 Nov 2012 10:13:34 +0100 [aligner] clustering: Don't crash if sets are small.
Simon Chabot <simon.chabot@logilab.fr> [Mon, 26 Nov 2012 10:13:34 +0100] rev 127
[aligner] clustering: Don't crash if sets are small.
Tue, 13 Nov 2012 16:42:57 +0100 [aligner,dataio] Export the results writing to an independent function
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:42:57 +0100] rev 126
[aligner,dataio] Export the results writing to an independent function
Tue, 13 Nov 2012 16:41:31 +0100 [test] Write the test for test_findneighbours_clustering
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:41:31 +0100] rev 125
[test] Write the test for test_findneighbours_clustering
Tue, 13 Nov 2012 16:04:17 +0100 [dataio] Export some function to the dataio.py file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:04:17 +0100] rev 124
[dataio] Export some function to the dataio.py file
Tue, 13 Nov 2012 15:39:13 +0100 [test] add more tests
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:39:13 +0100] rev 123
[test] add more tests
Tue, 13 Nov 2012 16:04:38 +0100 [align] Update API
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:04:38 +0100] rev 122
[align] Update API *** [demo] update demo to the new API
Tue, 13 Nov 2012 15:38:52 +0100 [matrix] Make API closer to scipy.spatial and add metrics handling
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:38:52 +0100] rev 121
[matrix] Make API closer to scipy.spatial and add metrics handling
Tue, 13 Nov 2012 15:38:19 +0100 Cosmit
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:38:19 +0100] rev 120
Cosmit
Tue, 13 Nov 2012 15:37:59 +0100 [minhashing] Modify API + change threshold handling
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:37:59 +0100] rev 119
[minhashing] Modify API + change threshold handling
Tue, 13 Nov 2012 15:37:25 +0100 [normalize] Better tokenizer for unicode + stopwords
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:37:25 +0100] rev 118
[normalize] Better tokenizer for unicode + stopwords
Tue, 13 Nov 2012 15:36:30 +0100 Few corrections and add tests
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:36:30 +0100] rev 117
Few corrections and add tests
Tue, 13 Nov 2012 10:46:31 +0100 [minhashing] Compute complexity on a huge file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 10:46:31 +0100] rev 116
[minhashing] Compute complexity on a huge file
Mon, 12 Nov 2012 19:14:52 +0100 [minhashing] plot complexity
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 19:14:52 +0100] rev 115
[minhashing] plot complexity
Mon, 12 Nov 2012 18:30:00 +0100 [matrix] Make the distance matrix API look like scipy's
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 18:30:00 +0100] rev 114
[matrix] Make the distance matrix API look like scipy's
Mon, 12 Nov 2012 16:45:44 +0100 [minhashing] Don't needlessly copy the signature matrix
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 16:45:44 +0100] rev 113
[minhashing] Don't needlessly copy the signature matrix
Mon, 12 Nov 2012 17:22:18 +0100 [minhashing] Faster signaturing using numpy
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 17:22:18 +0100] rev 112
[minhashing] Faster signaturing using numpy
Mon, 12 Nov 2012 16:46:46 +0100 [minhashing] rewrite main for testing purposes
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 16:46:46 +0100] rev 111
[minhashing] rewrite main for testing purposes
Mon, 12 Nov 2012 11:10:24 +0100 [minhashing] Typo
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 11:10:24 +0100] rev 110
[minhashing] Typo
Fri, 09 Nov 2012 14:27:22 +0100 Try to optimize buckets computation in min hashing
Vincent Michel <vincent.michel@logilab.fr> [Fri, 09 Nov 2012 14:27:22 +0100] rev 109
Try to optimize buckets computation in min hashing
Fri, 09 Nov 2012 13:26:12 +0100 [refactoring] First round of refactoring/review
Vincent Michel <vincent.michel@logilab.fr> [Fri, 09 Nov 2012 13:26:12 +0100] rev 108
[refactoring] First round of refactoring/review
Mon, 12 Nov 2012 09:34:36 +0100 Add french_lemmas file
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 09:34:36 +0100] rev 107
Add french_lemmas file
Fri, 09 Nov 2012 11:15:41 +0100 [aligner] add findneighbours docstring
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 11:15:41 +0100] rev 106
[aligner] add findneighbours docstring
Fri, 09 Nov 2012 11:08:24 +0100 Add and delete some XXX
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 11:08:24 +0100] rev 105
Add and delete some XXX
Fri, 09 Nov 2012 10:03:02 +0100 [aligner] Don't append and pop. Check before.
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 10:03:02 +0100] rev 104
[aligner] Don't append and pop. Check before.
Fri, 09 Nov 2012 10:02:25 +0100 [aligner] Enable the user to give the signature matrix for minhashing
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 10:02:25 +0100] rev 103
[aligner] Enable the user to give the signature matrix for minhashing
Thu, 08 Nov 2012 16:36:50 +0100 [demo] minibatch is faster
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:36:50 +0100] rev 102
[demo] minibatch is faster
Thu, 08 Nov 2012 16:36:04 +0100 [aligner] Change the default value of k.
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:36:04 +0100] rev 101
[aligner] Change the default value of k. k=1 is a more commonly used value
Thu, 08 Nov 2012 16:30:16 +0100 [Todo] Add a TODO file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:30:16 +0100] rev 100
[Todo] Add a TODO file
Thu, 08 Nov 2012 15:53:27 +0100 [aligner] Handle any dimension for clustering and kdtree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 15:53:27 +0100] rev 99
[aligner] Handle any dimension for clustering and kdtree
Thu, 08 Nov 2012 15:38:05 +0100 [aligner] Continue to use 1xM matrices for KDTree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 15:38:05 +0100] rev 98
[aligner] Continue to use 1xM matrices for KDTree
Thu, 08 Nov 2012 14:44:08 +0100 [demo] Display how much time the run took
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 14:44:08 +0100] rev 97
[demo] Display how much time the run took
Thu, 08 Nov 2012 14:43:32 +0100 [aligner] Instead of returning 1xN matrices, return MxN ones
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 14:43:32 +0100] rev 96
[aligner] Instead of returning 1xN matrices, return MxN ones
Thu, 08 Nov 2012 13:25:20 +0100 [demo] Use kmeans instead of kdtree (for testing)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:25:20 +0100] rev 95
[demo] Use kmeans instead of kdtree (for testing)
Thu, 08 Nov 2012 13:24:41 +0100 [aligner] Use lazy import for minhashing and kdtree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:24:41 +0100] rev 94
[aligner] Use lazy import for minhashing and kdtree
Thu, 08 Nov 2012 13:24:13 +0100 [aligner] Add kmeans to the available searchers list
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:24:13 +0100] rev 93
[aligner] Add kmeans to the available searchers list
Thu, 08 Nov 2012 11:41:01 +0100 [demo] For demo2, don't read the whole file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 11:41:01 +0100] rev 92
[demo] For demo2, don't read the whole file
Mon, 12 Nov 2012 18:31:44 +0100 [aligner] Define autocasted as a global function
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 18:31:44 +0100] rev 91
[aligner] Define autocasted as a global function
Thu, 08 Nov 2012 10:07:56 +0100 [demo] For testing purpose, run only the given demo
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:07:56 +0100] rev 90
[demo] For testing purpose, run only the given demo
Thu, 08 Nov 2012 10:07:21 +0100 [demo] Use the sparql queries handling instead of csvvfile
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:07:21 +0100] rev 89
[demo] Use the sparql queries handling instead of csvvfile
Thu, 08 Nov 2012 10:06:22 +0100 [aligner] Handle sparql queries
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:06:22 +0100] rev 88
[aligner] Handle sparql queries
Thu, 08 Nov 2012 10:08:57 +0100 [aligner] Make parsefile work with unicode
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:08:57 +0100] rev 87
[aligner] Make parsefile work with unicode
Wed, 07 Nov 2012 17:46:11 +0100 Let's make the tests and demo paths independent
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:46:11 +0100] rev 86
Let's make the tests and demo paths independent
Wed, 07 Nov 2012 17:24:42 +0100 [demo] Use the new implementation of findneighbours()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:24:42 +0100] rev 85
[demo] Use the new implementation of findneighbours()
Wed, 07 Nov 2012 17:09:20 +0100 [aligner] higher level implementation of KDTree and Minhashing
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:09:20 +0100] rev 84
[aligner] higher level implementation of KDTree and Minhashing
Wed, 07 Nov 2012 15:58:46 +0100 [demo] Add a new demo (with a kdtree)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 15:58:46 +0100] rev 83
[demo] Add a new demo (with a kdtree)
Wed, 07 Nov 2012 12:18:00 +0100 [demo] Add an example of custom normalization function usage
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 12:18:00 +0100] rev 82
[demo] Add an example of custom normalization function usage
Wed, 07 Nov 2012 11:43:26 +0100 [demo] Add a new demo of alignment usage
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 11:43:26 +0100] rev 81
[demo] Add a new demo of alignment usage
Wed, 07 Nov 2012 10:47:41 +0100 [normalize] Make nltk optional
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:47:41 +0100] rev 80
[normalize] Make nltk optional
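One common way to make such a dependency optional is a guarded import with a fallback. The sketch below is an assumption about the general pattern, not the project's code: the `tokenize()` wrapper and the regex fallback are hypothetical, while `wordpunct_tokenize` is a real function of `nltk.tokenize`.

```python
import re

try:
    # Optional dependency: used only when it is installed.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    wordpunct_tokenize = None


def tokenize(text):
    """Split text into word and punctuation tokens, with or without nltk."""
    if wordpunct_tokenize is not None:
        return wordpunct_tokenize(text)
    # Fallback: runs of word characters, or runs of punctuation.
    return re.findall(r"\w+|[^\w\s]+", text)


tokens = tokenize("Victor Hugo, 1802")
print(tokens)
```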
Wed, 07 Nov 2012 10:39:13 +0100 [minhashing] Spelling mistake
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:39:13 +0100] rev 79
[minhashing] Spelling mistake
Wed, 07 Nov 2012 10:29:40 +0100 [matrix] spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:29:40 +0100] rev 78
[matrix] spelling mistakes
Wed, 07 Nov 2012 10:21:46 +0100 [distances] spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:21:46 +0100] rev 77
[distances] spelling mistakes
Wed, 07 Nov 2012 10:08:44 +0100 [aligner] Spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:08:44 +0100] rev 76
[aligner] Spelling mistakes
Wed, 07 Nov 2012 10:06:17 +0100 [aligner] Remove useless dependencies
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:06:17 +0100] rev 75
[aligner] Remove useless dependencies
Tue, 06 Nov 2012 18:03:14 +0100 [demo] Add comments
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 18:03:14 +0100] rev 74
[demo] Add comments
Tue, 06 Nov 2012 16:27:12 +0100 [aligner] Set the parsefile function into aligner module
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 16:27:12 +0100] rev 73
[aligner] Set the parsefile function into aligner module
Tue, 06 Nov 2012 14:05:31 +0100 Add the demo file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 14:05:31 +0100] rev 72
Add the demo file
Tue, 06 Nov 2012 17:37:36 +0100 Extract the alignment process from the cube to be independent
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 17:37:36 +0100] rev 71
Extract the alignment process from the cube to be independent *** amends 7d398efa1ab38937d1a6aae63ec13fcbfcad1d3f
Tue, 06 Nov 2012 10:51:43 +0100 [matrix] Cancel the 188 changeset. Multiplying matrices was, in fact, a bad idea
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:51:43 +0100] rev 70
[matrix] Cancel the 188 changeset. Multiplying matrices was, in fact, a bad idea It was a bad idea because if two values were identical, their distance was 0 and so was the whole product, even though all the other values could be different
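The problem with the product can be seen on a single pair of records. The per-attribute distances below are made-up illustrative numbers, not project data:

```python
import math

# Hypothetical distances between two records, one value per attribute.
# The first attribute is identical (distance 0); the two others differ a lot.
per_attribute = [0.0, 5.0, 7.0]

product = math.prod(per_attribute)  # a single zero cancels everything
total = sum(per_attribute)          # the differences are still visible

print(product, total)
```

With the product, the pair looks like a perfect match although two attributes out of three disagree; with the sum it does not.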
Tue, 06 Nov 2012 10:49:10 +0100 Correct some spelling
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:49:10 +0100] rev 69
Correct some spelling
Tue, 06 Nov 2012 10:46:30 +0100 [minlsh] Use an iterator to compute the result set
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:46:30 +0100] rev 68
[minlsh] Use an iterator to compute the result set Don't load all the results at once…
Tue, 06 Nov 2012 10:45:10 +0100 [minlsh] Remove the useless rows in the signature matrix while searching
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:45:10 +0100] rev 67
[minlsh] Remove the useless rows in the signature matrix while searching It's done to save memory…
Tue, 06 Nov 2012 10:48:06 +0100 [distances] For the jaccard distance, consider the set of tokens
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:48:06 +0100] rev 66
[distances] For the jaccard distance, consider the set of tokens Considering the set of tokens instead of letters is much more accurate. Eg: before this changeset, the jaccard distance between “silence” and “license” was zero. *** [Test] Jaccard implementation changed, so the tests are changed too.
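A minimal sketch of the difference, assuming whitespace tokenization (the `tokenize` parameter is a hypothetical addition used here only to reproduce the old letter-based behaviour):

```python
def jaccard(a, b, tokenize=str.split):
    """Jaccard distance between the token sets of two strings."""
    seta, setb = set(tokenize(a)), set(tokenize(b))
    union = seta | setb
    if not union:
        return 0.0  # two empty strings are considered identical
    return 1.0 - len(seta & setb) / len(union)


# Token sets: the two words are different tokens, so the distance is maximal.
by_tokens = jaccard("silence", "license")
# Letter sets (old behaviour): both words use the same letters.
by_letters = jaccard("silence", "license", tokenize=list)
print(by_tokens, by_letters)
```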
Tue, 30 Oct 2012 16:41:53 +0100 [Matrix] Remove the unused defaultvalue
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:41:53 +0100] rev 65
[Matrix] Remove the unused defaultvalue
Tue, 30 Oct 2012 16:28:05 +0100 [Align] don't use queries by lists, directly
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:28:05 +0100] rev 64
[Align] don't use queries by lists, directly
Tue, 30 Oct 2012 16:25:01 +0100 [Matrix] Multiplying instead of adding
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:25:01 +0100] rev 63
[Matrix] Multiplying instead of adding The global matrix used to be computed by adding the submatrices. Unknown values were maximized to avoid false positives, but this was too penalizing. Now unknown values are set to 1 and all the matrices are multiplied. Thus an unknown value is not penalizing, whereas a known value still contributes to the result.
Tue, 30 Oct 2012 16:16:40 +0100 [Distance] The output unit of geographical distance can be specified
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:16:40 +0100] rev 62
[Distance] The output unit of geographical distance can be specified
Mon, 29 Oct 2012 15:19:09 +0100 [API] Write results according to csv format
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 15:19:09 +0100] rev 61
[API] Write results according to csv format
Mon, 29 Oct 2012 15:07:22 +0100 [API] Start the implementation of the alignment API
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 15:07:22 +0100] rev 60
[API] Start the implementation of the alignment API
Mon, 29 Oct 2012 11:44:41 +0100 [Minhashing] Give a threshold instead of an abstract "bandsize"
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 11:44:41 +0100] rev 59
[Minhashing] Give a threshold instead of an abstract "bandsize" The threshold and the bandsize are related, and one can be computed knowing the other. Let the user give the more explicit one.
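The changeset does not state the relation it uses. A rough sketch, assuming the usual banding approximation for locality-sensitive hashing, t ≈ (1/b)^(1/r) with b bands of r rows and b × r signature rows in total (the function name `bandsize_for_threshold()` is hypothetical):

```python
def bandsize_for_threshold(threshold, nrows):
    """Rows per band whose approximate threshold is closest to `threshold`.

    Assumes t ~ (1 / b) ** (1 / r) with b = nrows / r, i.e.
    t ~ (r / nrows) ** (1 / r). This is an approximation, not an exact rule.
    """
    return min(
        range(1, nrows + 1),
        key=lambda r: abs((r / nrows) ** (1.0 / r) - threshold),
    )


# A higher similarity threshold calls for larger bands.
low = bandsize_for_threshold(0.5, 100)
high = bandsize_for_threshold(0.8, 100)
print(low, high)
```

The approximate threshold grows with the band size, so a simple scan over the possible sizes is enough to invert it.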
Mon, 29 Oct 2012 10:21:14 +0100 Order imports
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 10:21:14 +0100] rev 58
Order imports
Mon, 29 Oct 2012 10:17:24 +0100 [Minhashing] Trained data can be saved and loaded
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 10:17:24 +0100] rev 57
[Minhashing] Trained data can be saved and loaded
Mon, 29 Oct 2012 09:52:16 +0100 [distances] The planet radius for the geographical distance can be given
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:52:16 +0100] rev 56
[distances] The planet radius for the geographical distance can be given
Mon, 29 Oct 2012 09:46:26 +0100 [distances] Don't concat two strings to know if there is a space in one of them. Make two tests instead
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:46:26 +0100] rev 55
[distances] Don't concat two strings to know if there is a space in one of them. Make two tests instead
Fri, 26 Oct 2012 11:44:50 +0200 [typo] alignement --> alignment
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:44:50 +0200] rev 54
[typo] alignement --> alignment
Fri, 26 Oct 2012 11:43:57 +0200 [Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:43:57 +0200] rev 53
[Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Fri, 26 Oct 2012 11:41:27 +0200 [Minhash] Tests written
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:41:27 +0200] rev 52
[Minhash] Tests written
Fri, 26 Oct 2012 09:49:07 +0200 [Matrix] Improvement of the matched() method
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:49:07 +0200] rev 51
[Matrix] Improvement of the matched() method Don't run over all indices but ask directly to scipy for the wanted ones.
Fri, 26 Oct 2012 09:54:16 +0200 [Matrix] Change lil_matrix for dense matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:54:16 +0200] rev 50
[Matrix] Change lil_matrix for dense matrix Some experiments showed the matrix wasn't sparse at all, so it's better to use the appropriate data structure. *** [Matrix] Why use the todense() method if the matrix is… already dense
Fri, 26 Oct 2012 13:43:08 +0200 [Distance] Add geographical distance
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 13:43:08 +0200] rev 49
[Distance] Add geographical distance (Equirectangular projection)
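For reference, the equirectangular approximation can be sketched as below. The function name, the (latitude, longitude) argument order in degrees, and the default radius of 6371 km are assumptions made for this illustration:

```python
import math


def equirectangular_distance(pointa, pointb, radius=6371.0):
    """Approximate distance between two (latitude, longitude) points in degrees.

    The result is in the unit of `radius` (kilometres with the default value).
    """
    lat1, lon1 = map(math.radians, pointa)
    lat2, lon2 = map(math.radians, pointb)
    # Project onto a plane, scaling longitude by the cosine of the mean latitude.
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return radius * math.hypot(x, y)


one_degree = equirectangular_distance((48.0, 2.0), (49.0, 2.0))
print(round(one_degree, 2))
```

The approximation is cheap and works well for short distances; it degrades for points far apart or close to the poles.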
Thu, 25 Oct 2012 16:43:50 +0200 [testing] make the alignment testing a little cleaner
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:43:50 +0200] rev 48
[testing] make the alignment testing a little cleaner Now use rdflib instead of (bad) homemade code
Thu, 25 Oct 2012 16:41:53 +0200 [Matrix] Adapt the globalalignmentmatrix computation to the previous changeset
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:41:53 +0200] rev 47
[Matrix] Adapt the globalalignmentmatrix computation to the previous changeset (4c2f7553490b)
Thu, 25 Oct 2012 16:40:42 +0200 [Matrix] Give weighting and normalization at the construction of the matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:40:42 +0200] rev 46
[Matrix] Give weighting and normalization at the construction of the matrix It makes the computation faster
Wed, 24 Oct 2012 19:10:54 +0200 wip
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 19:10:54 +0200] rev 45
wip
Wed, 24 Oct 2012 15:39:38 +0200 [Minhashing] : Really, really, really faster training
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 15:39:38 +0200] rev 44
[Minhashing] : Really, really, really faster training
Wed, 24 Oct 2012 14:51:09 +0200 [Matrix] Compute the global alignment matrix
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:51:09 +0200] rev 43
[Matrix] Compute the global alignment matrix
Wed, 24 Oct 2012 14:50:23 +0200 [Matrix] Can pass extra arguments to distance functions
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:50:23 +0200] rev 42
[Matrix] Can pass extra arguments to distance functions
Wed, 24 Oct 2012 12:37:56 +0200 [Matrix] Computation handles unknown values
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 12:37:56 +0200] rev 41
[Matrix] Computation handles unknown values
Wed, 24 Oct 2012 11:51:57 +0200 [Test] Updated for the change to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:51:57 +0200] rev 40
[Test] Updated for the change to simplify()
Wed, 24 Oct 2012 11:43:11 +0200 [Normalize] Add docstring to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:43:11 +0200] rev 39
[Normalize] Add docstring to simplify()
Wed, 24 Oct 2012 11:52:22 +0200 [normalize] Add a stopword removing option to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:52:22 +0200] rev 38
[normalize] Add a stopword removing option to simplify()
Wed, 24 Oct 2012 11:29:08 +0200 [Minhashing] Functional API
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:29:08 +0200] rev 37
[Minhashing] Functional API
Mon, 22 Oct 2012 18:17:17 +0200 [minhashing] First try of minhashing (related to #129000)
Simon Chabot <simon.chabot@logilab.fr> [Mon, 22 Oct 2012 18:17:17 +0200] rev 36
[minhashing] First try of minhashing (related to #129000)
Fri, 19 Oct 2012 18:22:14 +0200 [Matrix] Add basic operations such as add, mul, sub, etc.
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 18:22:14 +0200] rev 35
[Matrix] Add basic operations such as add, mul, sub, etc.
Fri, 19 Oct 2012 17:27:40 +0200 [Matrix] Add __repr__ method()
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 17:27:40 +0200] rev 34
[Matrix] Add __repr__ method()
Fri, 19 Oct 2012 16:57:40 +0200 [Matrix] Don't store inputs inside DistanceMatrix object
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:57:40 +0200] rev 33
[Matrix] Don't store inputs inside DistanceMatrix object Storing them made no sense, was useless, and was annoying for future summation of matrices.
Fri, 19 Oct 2012 14:44:26 +0200 [Distance] Spaces are correctly supported by distances functions
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 14:44:26 +0200] rev 32
[Distance] Spaces are correctly supported by distances functions
The problem was that 'Victor Hugo' and 'Hugo Victor' had a large distance, whereas in this case we would like a small one (even zero!). So the approach followed is to construct a distance matrix:

         | Victor | Hugo
  Victor |   0    |  5
  Hugo   |   5    |  0

and return the minimum of the minimum of each row. In fact, we return the maximum of the row minima of this matrix and of its transpose, to handle the following case:

         | Victor | Hugo | Jean |
  Victor |   0    |  5   |  6   |
  Hugo   |   5    |  0   |  4   |--> largest row minimum: 0

         | Victor | Hugo |
  Victor |   0    |  5   |
  Hugo   |   5    |  0   |
  Jean   |   6    |  4   |--> largest row minimum: 4

Return the max, i.e. 4.
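The approach described in this changeset can be sketched as follows. Both functions are illustrative assumptions written for this log (a textbook Levenshtein distance and a hypothetical `token_distance()` wrapper), not the project's implementation:

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, chara in enumerate(a, 1):
        current = [i]
        for j, charb in enumerate(b, 1):
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + (chara != charb)))
        previous = current
    return previous[-1]


def token_distance(a, b, distance=levenshtein):
    """Largest of the row minima of the token matrix and of its transpose."""
    tokensa, tokensb = a.split(), b.split()
    if not tokensa or not tokensb:
        return distance(a, b)
    row_minima = [min(distance(x, y) for y in tokensb) for x in tokensa]
    column_minima = [min(distance(x, y) for x in tokensa) for y in tokensb]
    return max(max(row_minima), max(column_minima))


swapped = token_distance("Victor Hugo", "Hugo Victor")
extra = token_distance("Victor Hugo", "Victor Hugo Jean")
print(swapped, extra)
```

Taking the transpose into account is what makes the extra token "Jean" count: without it, every token of the shorter string finds an exact match and the distance would be zero.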
Fri, 19 Oct 2012 11:38:54 +0200 [Matrix] Cannot use zip if it is not a square matrix!
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:38:54 +0200] rev 31
[Matrix] Cannot use zip if it is not a square matrix!
Fri, 19 Oct 2012 11:21:44 +0200 [Matrix] Matched() returns index and value, as tuples
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:21:44 +0200] rev 30
[Matrix] Matched() returns index and value, as tuples
Fri, 19 Oct 2012 10:46:14 +0200 [Matrix] Matched() has a lower complexity
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:46:14 +0200] rev 29
[Matrix] Matched() has a lower complexity
Instead of reading the whole matrix and getting the indexes where the value is under the cutoff (O(N²)), the idea is:
- Get all indexes where the value is not null (i.e. not exactly matched): O(N)
- Append all indexes that are not in the previous list (exactly matched): O(N)
- If cutoff > 0, for all indexes where the value is not null, test whether value < cutoff, and add it or not: O(N)
Fri, 19 Oct 2012 16:58:27 +0200 [Matrix] Matrices are not symmetric. Correct it
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:58:27 +0200] rev 28
[Matrix] Matrices are not symmetric. Correct it
Fri, 19 Oct 2012 10:06:50 +0200 [Test] Ensure distances are symmetric
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:06:50 +0200] rev 27
[Test] Ensure distances are symmetric
Thu, 18 Oct 2012 19:01:27 +0200 [Test] Add tests for matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 19:01:27 +0200] rev 26
[Test] Add tests for matrix
Thu, 18 Oct 2012 18:18:40 +0200 [Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:18:40 +0200] rev 25
[Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Thu, 18 Oct 2012 18:15:25 +0200 [Normalizer] Lemmatizer returns a string (better for comparison)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:15:25 +0200] rev 24
[Normalizer] Lemmatizer returns a string (better for comparison)
Thu, 18 Oct 2012 17:58:53 +0200 [Matrix] Enables normalization
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:53 +0200] rev 23
[Matrix] Enables normalization
Thu, 18 Oct 2012 17:58:28 +0200 [Test] Distance, test euclidean distance on strings too
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:28 +0200] rev 22
[Test] Distance, test euclidean distance on strings too
Thu, 18 Oct 2012 17:16:44 +0200 [normalize] Using a class was a bad idea, I removed it
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:16:44 +0200] rev 21
[normalize] Using a class was a bad idea, I removed it
Thu, 18 Oct 2012 16:35:18 +0200 [Matrix] Compute a distance matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 16:35:18 +0200] rev 20
[Matrix] Compute a distance matrix
Thu, 18 Oct 2012 15:01:47 +0200 [Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 15:01:47 +0200] rev 19
[Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
Thu, 18 Oct 2012 13:55:25 +0200 [Distance] Temporal distance supports ambiguity and fuzziness
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 13:55:25 +0200] rev 18
[Distance] Temporal distance supports ambiguity and fuzziness
- You can specify whether the day or the year is given first (day/month/year, year/month/day or month/day/year format). By default, it assumes the format is the one commonly used in France, i.e. day/month/year.
- You can give fuzzy sentences and compare dates: temporal('Jean est né le 1er octobre 1958', 'Le 01-10-1958, Jean est né') yields 0!
Thu, 18 Oct 2012 12:22:30 +0200 [Normalizer] Add the format method (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 12:22:30 +0200] rev 17
[Normalizer] Add the format method (related to #128998)
Thu, 18 Oct 2012 11:41:59 +0200 Add LGPL to distance.py and normalize.py
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 11:41:59 +0200] rev 16
Add LGPL to distance.py and normalize.py
Thu, 18 Oct 2012 10:19:15 +0200 [Normalizer] Add a normaliser (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 10:19:15 +0200] rev 15
[Normalizer] Add a normaliser (related to #128998)
- unormalize
- tokenize
- lemmatize
- round
Wed, 17 Oct 2012 16:47:07 +0200 [distances] Add a euclidean distance function between two numbers (closes #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:47:07 +0200] rev 14
[distances] Add a euclidean distance function between two numbers (closes #128982)
Wed, 17 Oct 2012 15:31:11 +0200 [distance] Add a new distance between dates (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 15:31:11 +0200] rev 13
[distance] Add a new distance between dates (related #128982)
Wed, 17 Oct 2012 12:34:28 +0200 [distances] Add jaccard distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:34:28 +0200] rev 12
[distances] Add jaccard distance (related #128982)
Wed, 17 Oct 2012 12:05:02 +0200 [distances] Add the soundex distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:05:02 +0200] rev 11
[distances] Add the soundex distance (related #128982)
Wed, 17 Oct 2012 12:04:41 +0200 [distance] move soundex to soundexcode (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:04:41 +0200] rev 10
[distance] move soundex to soundexcode (related #128982) In fact, soundexcode is the function returning the soundex code of a word, and soundex will be the 1/0 distance between two words. (1 meaning both have the same code, 0 otherwise)
Wed, 17 Oct 2012 11:56:22 +0200 [distances] Remove some trailing spaces (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:22 +0200] rev 9
[distances] Remove some trailing spaces (related #128982)
Wed, 17 Oct 2012 11:56:04 +0200 [distance] Start the iteration at 1, not 0 because we don't care about the first
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:04 +0200] rev 8
[distance] Start the iteration at 1, not 0 because we don't care about the first letter (related #128982)
Wed, 17 Oct 2012 11:55:23 +0200 [distances] Correct an IndexError in the soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:55:23 +0200] rev 7
[distances] Correct an IndexError in the soundex code (related #128982) Since we don't know whether word[i + 2] is a consonant, we use get, because it can be a vowel (and crash…)
Wed, 17 Oct 2012 11:53:17 +0200 [distance] Soundex : if there is a vowel between two identical numbered
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:53:17 +0200] rev 6
[distance] Soundex : if there is a vowel between two identical numbered consonants, count those consonants twice. (related #128982)
Wed, 17 Oct 2012 11:51:42 +0200 [test] Add some other tests to soundex, and some explanations too
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:51:42 +0200] rev 5
[test] Add some other tests to soundex, and some explanations too (related #128982)
Wed, 17 Oct 2012 11:50:55 +0200 [test] The test for soundex was wrong, I corrected it (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:50:55 +0200] rev 4
[test] The test for soundex was wrong, I corrected it (related #128982)
Wed, 17 Oct 2012 16:43:42 +0200 [tests] Add tests for soundex and levenshtein (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:43:42 +0200] rev 3
[tests] Add tests for soundex and levenshtein (related #128982)
Wed, 17 Oct 2012 16:42:48 +0200 [distances] Add soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:42:48 +0200] rev 2
[distances] Add soundex code (related #128982)
Wed, 17 Oct 2012 16:40:51 +0200 [distance] add Levenshtein distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:40:51 +0200] rev 1
[distance] add Levenshtein distance (related #128982)
Fri, 12 Oct 2012 10:23:58 +0200 Initial commit
Simon Chabot <simon.chabot@logilab.fr> [Fri, 12 Oct 2012 10:23:58 +0200] rev 0
Initial commit