[doc] Update doc
author Vincent Michel <vincent.michel@logilab.fr>
Tue, 15 Oct 2013 12:15:19 +0000
changeset 319 40955d08e971
parent 318 23b903af099d
child 320 c00139801d7d
[doc] Update doc
TODO
doc.rst
examples/__init__.py
examples/goncourt.csv
--- a/doc.rst	Tue Oct 15 11:30:53 2013 +0000
+++ b/doc.rst	Tue Oct 15 12:15:19 2013 +0000
@@ -1,45 +1,65 @@
-==================
-Alignment project
-==================
+
+===========================================================
+ NAZCA - Python library for practical semantics datamining
+===========================================================
+
+Nazca is a Python library to help you enhance and mine your data,
+with a strong focus on semantic information.
+
+In particular, it helps you:
+
+ * interact with SPARQL endpoints and reference databases.
 
-What is it for ?
-================
+ * align your data (`record linkage`), i.e. link data from
+   your database to data in other databases.
+
 
-This python library aims to help you to *align data*. For instance, you have a
-list of cities, described by their name and their country and you would like to
+Record linkage
+==============
+
+Record linkage (or alignment) is the task that consists in linking together
+data from two different sets, based on distances between their attributes.
+
+For instance, you have a list of cities, described by their name
+and their country, and you would like to
 find their URI on dbpedia to get more information about them, such as their
 longitude and latitude. If you have two or three cities, this can be done by
 hand, but it is not feasible for hundreds or thousands of cities.
 This library provides all the tools needed to do it.
 
 
+
 Introduction
-============
+~~~~~~~~~~~~
 
-The alignment process is divided into three main steps:
+The record linkage process is divided into three main steps:
 
 1. Gather and format the data we want to align.
-   In this step, we define two sets called the `alignset` and the
-   `targetset`. The `alignset` contains our data, and the
+   In this step, we define two sets called the `referenceset` and the
+   `targetset`. The `referenceset` contains our data, and the
    `targetset` contains the data on which we would like to make the links.
+
 2. Compute the similarity between the items gathered.  We compute a distance
    matrix between the two sets according to a given distance.
+
 3. Find the items having a high similarity thanks to the distance matrix.
 
+
+
 Simple case
------------
+~~~~~~~~~~~
 
-1. Let's define `alignset` and `targetset` as simple python lists.
+1. Let's define `referenceset` and `targetset` as simple Python lists.
 
 .. sourcecode:: python
 
-    alignset = ['Victor Hugo', 'Albert Camus']
+    referenceset = ['Victor Hugo', 'Albert Camus']
     targetset = ['Albert Camus', 'Guillaume Apollinaire', 'Victor Hugo']
 
 2. Now, we have to compute the similarity between each pair of items. For that purpose, the
    `Levenshtein distance <http://en.wikipedia.org/wiki/Levenshtein_distance>`_
    [#]_, which is well suited to comparing short strings, is used.
-   Such a function is provided in the `nazca.distance` module.
+   Such a function is provided in the `nazca.distances` module.
 
    The next step is to compute the distance matrix according to the Levenshtein
    distance. The result is given in the following tables.
@@ -61,6 +81,7 @@
 3. The alignment process ends by reading the matrix and considering items
    whose distance is below a given threshold as identical.
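These three steps can be sketched end-to-end in plain Python. The `levenshtein` below is a naive textbook implementation added for illustration only, not Nazca's version (which may normalize or tokenize its input differently):

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

referenceset = ['Victor Hugo', 'Albert Camus']
targetset = ['Albert Camus', 'Guillaume Apollinaire', 'Victor Hugo']

# step 2: distance matrix; step 3: keep pairs under a threshold
matrix = [[levenshtein(r, t) for t in targetset] for r in referenceset]
links = [(i, j) for i, row in enumerate(matrix)
         for j, d in enumerate(row) if d <= 1]
print(links)  # [(0, 2), (1, 0)]
```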
 
+
 A more complex one
 ------------------
 
@@ -72,13 +93,13 @@
 
 .. sourcecode:: python
 
-    alignset = [['Paul Dupont', '14-08-1991', 'Paris'],
-                ['Jacques Dupuis', '06-01-1999', 'Bressuire'],
-                ['Michel Edouard', '18-04-1881', 'Nantes']]
-    targetset = [['Dupond Paul', '14/08/1991', 'Paris'],
-                 ['Edouard Michel', '18/04/1881', 'Nantes'],
-                 ['Dupuis Jacques ', '06/01/1999', 'Bressuire'],
-                 ['Dupont Paul', '01-12-2012', 'Paris']]
+    >>> referenceset = [['Paul Dupont', '14-08-1991', 'Paris'],
+    ...                 ['Jacques Dupuis', '06-01-1999', 'Bressuire'],
+    ...                 ['Michel Edouard', '18-04-1881', 'Nantes']]
+    >>> targetset = [['Dupond Paul', '14/08/1991', 'Paris'],
+    ...              ['Edouard Michel', '18/04/1881', 'Nantes'],
+    ...              ['Dupuis Jacques ', '06/01/1999', 'Bressuire'],
+    ...              ['Dupont Paul', '01-12-2012', 'Paris']]
 
 
 In such a case, two distance functions are used, the Levenshtein one for the
@@ -92,8 +113,9 @@
 
 .. sourcecode:: python
 
-    >>> nazca.matrix.cdist([a[0] for a in alignset], [t[0] for t in targetset],
-    >>>                    'levenshtein', matrix_normalized=False)
+    >>> from nazca.distances import levenshtein, cdist
+    >>> cdist(levenshtein, [a[0] for a in referenceset],
+    ...       [t[0] for t in targetset], matrix_normalized=False)
     array([[ 1.,  6.,  5.,  0.],
            [ 5.,  6.,  0.,  5.],
            [ 6.,  0.,  6.,  6.]], dtype=float32)
@@ -110,8 +132,9 @@
 
 .. sourcecode:: python
 
-    >>> nazca.matrix.cdist([a[1] for a in alignset], [t[1] for t in targetset],
-    >>>                    'temporal', matrix_normalized=False)
+    >>> from nazca.distances import temporal
+    >>> cdist(temporal, [a[1] for a in referenceset], [t[1] for t in targetset],
+    ...       matrix_normalized=False)
     array([[     0.,  40294.,   2702.,   7780.],
            [  2702.,  42996.,      0.,   5078.],
            [ 40294.,      0.,  42996.,  48074.]], dtype=float32)
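The `temporal` values above are simply day counts between the parsed dates; they can be checked with the standard library (a verification sketch, not Nazca's implementation):

```python
from datetime import date

# '14-08-1991' vs '18/04/1881': the 40294 in the matrix above
print(abs((date(1991, 8, 14) - date(1881, 4, 18)).days))  # 40294
# '14-08-1991' vs '06/01/1999': the 2702 in the matrix above
print(abs((date(1991, 8, 14) - date(1999, 1, 6)).days))   # 2702
```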
@@ -128,8 +151,8 @@
 
 .. sourcecode:: python
 
-    >>> nazca.matrix.cdist([a[2] for a in alignset], [t[2] for t in targetset],
-    >>>                    'levenshtein', matrix_normalized=False)
+    >>> cdist(levenshtein, [a[2] for a in referenceset], [t[2] for t in targetset],
+    ...       matrix_normalized=False)
     array([[ 0.,  4.,  8.,  0.],
            [ 8.,  9.,  0.,  8.],
            [ 4.,  0.,  9.,  4.]], dtype=float32)
@@ -160,72 +183,70 @@
 
 Allowing some misspelling mistakes (for example *Dupont* and *Dupond* are very
 close), the matching threshold can be set to 1 or 2. Thus we can see that the
-item 0 in our `alignset` is the same that the item 0 in the `targetset`, the
-1 in the `alignset` and the 2 of the `targetset` too : the links can be
+item 0 in our `referenceset` is the same as item 0 in the `targetset`, and that
+item 1 in the `referenceset` matches item 2 of the `targetset`: the links can be
 made!
 
-It's important to notice that even if the item 0 of the `alignset` and the 3
+It's important to notice that even if item 0 of the `referenceset` and item 3
 of the `targetset` have the same name and the same birthplace, they are
 unlikely to be identical because of their very different birth dates.
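Combining the attribute matrices can be sketched with numpy, using the three matrices printed above (summing with equal weights is an assumption for illustration; Nazca may weight attributes differently):

```python
import numpy as np

# the three per-attribute distance matrices shown above
names = np.array([[1, 6, 5, 0], [5, 6, 0, 5], [6, 0, 6, 6]], dtype=np.float32)
dates = np.array([[0, 40294, 2702, 7780], [2702, 42996, 0, 5078],
                  [40294, 0, 42996, 48074]], dtype=np.float32)
cities = np.array([[0, 4, 8, 0], [8, 9, 0, 8], [4, 0, 9, 4]], dtype=np.float32)

total = names + dates + cities
# keep pairs whose summed distance is below a (hand-picked) threshold
links = np.argwhere(total <= 2)
print(links.tolist())  # [[0, 0], [1, 2], [2, 1]]
```

The huge date distances dominate the sum, which is exactly why item 0 and item 3 are not linked despite identical names and birthplaces.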
 
 
 You may have noticed that working with matrices as in the example above is a
-little bit boring. The good news is that this project makes all this job for you. You
+little tedious. The good news is that this module does all this work for you. You
 just have to give the sets and the distance functions, and that's all. Another good
-news is the project comes with the needed functions to build the sets !
+news is that this module comes with the functions needed to build the sets!
 
 
 Real applications
-=================
-
-Just before we start, we will assume the following imports have been done:
+~~~~~~~~~~~~~~~~~
 
-.. sourcecode:: python
-
-    from nazca import dataio as aldio #Functions for input and output data
-    from nazca import distances as ald #Functions to compute the distances
-    from nazca import normalize as aln#Functions to normalize data
-    from nazca import aligner as ala  #Functions to align data
 
 The Goncourt prize
 ------------------
 
 On wikipedia, we can find the `Goncourt prize winners
-<https://fr.wikipedia.org/wiki/Prix_Goncourt#Liste_des_laur.C3.A9ats>`_, and we
+<http://fr.wikipedia.org/wiki/Prix_Goncourt#Liste_des_laur.C3.A9ats>`_, and we
 would like to establish a link between the winners and their URI on dbpedia
 [#]_.
 
 .. [#] Let's imagine the *Goncourt prize winners* category does not exist in
        dbpedia
 
-We simply copy/paste the winners list of wikipedia into a file and replace all
-the separators (`-` and `,`) by `#`. So, the beginning of our file is :
+We simply copy/paste the winners list from wikipedia into a file and clean it
+up a bit. So, the beginning of our file is:
 
 ..
 
-    | 1903#John-Antoine Nau#Force ennemie (Plume)
-    | 1904#Léon Frapié#La Maternelle (Albin Michel)
-    | 1905#Claude Farrère#Les Civilisés (Paul Ollendorff)
-    | 1906#Jérôme et Jean Tharaud#Dingley, l'illustre écrivain (Cahiers de la Quinzaine)
+    | 1903	John-Antoine Nau
+    | 1904	Léon Frapié
+    | 1905	Claude Farrère
+
 
 When using the high-level functions of this library, each item must have at
 least two elements: an *identifier* (the name, or the URI) and the *attribute* to
-compare. With the previous file, we will use the name (so the column number 1)
+compare. With the previous file, we will use the name (column number 1) both
+as the *identifier* (we don't have a *URI* here) and as the *attribute* to align.
 This is expressed in Python with the following code:
 
 .. sourcecode:: python
 
-    alignset = aldio.parsefile('prixgoncourt', indexes=[1, 1], delimiter='#')
+   >>> import os.path as osp
+   >>> from nazca import examples
+   >>> filename = osp.join(osp.split(examples.__file__)[0], 'goncourt.csv')
+   >>> from nazca.dataio import parsefile
+   >>> referenceset = parsefile(filename, delimiter='\t')
 
-So, the beginning of our `alignset` is:
+So, the beginning of our `referenceset` is:
 
 .. sourcecode:: python
 
-    >>> alignset[:3]
-    [[u'John-Antoine Nau', u'John-Antoine Nau'],
-     [u'Léon Frapié', u'Léon, Frapié'],
-     [u'Claude Farrère', u'Claude Farrère']]
+    >>> referenceset[:3]
+    [[1903, u'John-Antoine Nau'],
+     [1904, u'L\xe9on Frapi\xe9'],
+     [1905, u'Claude Farr\xe8re']]
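For reference, parsing such a tab-separated file can also be sketched with the stdlib `csv` module (a simplified stand-in for the `parsefile` call above, with the year cast to an integer; the two-line `sample` is hypothetical):

```python
import csv
import io

# a hypothetical two-line sample of the tab-separated file
sample = u"1903\tJohn-Antoine Nau\n1904\tL\xe9on Frapi\xe9\n"

referenceset = []
for year, name in csv.reader(io.StringIO(sample), delimiter='\t'):
    referenceset.append([int(year), name])
print(referenceset)
```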
 
 
 Now, let's build the `targetset` thanks to a *sparql query* and the dbpedia
@@ -233,70 +254,70 @@
 
 .. sourcecode:: python
 
-   query = """
-        SELECT ?writer, ?name WHERE {
-          ?writer  <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:French_novelists>.
-          ?writer rdfs:label ?name.
-          FILTER(lang(?name) = 'fr')
-       }
-    """
-    targetset = aldio.sparqlquery('http://dbpedia.org/sparql', query)
+   >>> from nazca.dataio import sparqlquery
+   >>> query = """SELECT ?writer, ?name WHERE {
+   ...   ?writer <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:French_novelists>.
+   ...   ?writer rdfs:label ?name.
+   ...   FILTER(lang(?name) = 'fr')
+   ... }"""
+   >>> targetset = sparqlquery('http://dbpedia.org/sparql', query, autocaste_data=False)
 
-Both functions return nested lists as presented before. Now, we have to define
-the distance function to be used for the alignment. This is done thanks to a
-python dictionary where the keys are the columns to work on, and the values are
-the treatments to apply.
+Both functions return nested lists as presented before.
+
+
+Now, we have to define the distance function to be used for the alignment.
+This is done with the `BaseProcessing` class:
 
 .. sourcecode:: python
 
-    treatments = {1: {'metric': ald.levenshtein}} #Use a levenshtein on the name
+   >>> from nazca.distances import BaseProcessing, levenshtein
+   >>> processing = BaseProcessing(ref_attr_index=1, target_attr_index=1, distance_callback=levenshtein)
+
+or, equivalently:
 
-Finally, the last thing we have to do, is to call the :func:`alignall` function:
+.. sourcecode:: python
+
+   >>> from nazca.distances import LevenshteinProcessing
+   >>> processing = LevenshteinProcessing(ref_attr_index=1, target_attr_index=1)
+
+Now, we create an aligner (using the `BaseAligner` class):
 
 .. sourcecode:: python
 
-    alignments = ala.alignall(alignset, targetset,
-                           0.4, #This is the matching threshold
-                           treatments,
-                           mode=None,#We'll discuss about that later
-                           uniq=True #Get the best results only
-                          )
+   >>> from nazca.aligner import BaseAligner
+   >>> aligner = BaseAligner(threshold=0, processings=(processing,))
 
-This function returns an iterator over the (different) carried out alignments.
+
+To limit the number of comparisons, we may add a blocking technique:
 
 .. sourcecode:: python
 
-    for a, t in alignments:
-        print '%s has been aligned onto %s' % (a, t)
+   >>> from nazca.blocking import SortedNeighborhoodBlocking
+   >>> aligner.register_blocking(SortedNeighborhoodBlocking(1, 1, window_width=4))
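The idea behind sorted-neighborhood blocking can be illustrated outside of Nazca (a minimal sketch with hypothetical helper names, not the library's implementation): both sets are merged, sorted on the blocking key, and only records falling within a sliding window of each other are compared.

```python
def sorted_neighborhood_pairs(refset, targetset, key=lambda r: r, window=4):
    """Yield candidate (ref_index, target_index) pairs whose records fall
    within `window` positions of each other once sorted by `key`."""
    tagged = ([('r', i, key(r)) for i, r in enumerate(refset)] +
              [('t', j, key(t)) for j, t in enumerate(targetset)])
    tagged.sort(key=lambda x: x[2])
    for pos, (side, idx, _) in enumerate(tagged):
        for oside, oidx, _ in tagged[pos + 1:pos + window]:
            if side == 'r' and oside == 't':
                yield (idx, oidx)
            elif side == 't' and oside == 'r':
                yield (oidx, idx)

refset = ['Albert Camus', 'Victor Hugo']
targetset = ['Victor Hugo', 'Guillaume Apollinaire', 'Albert Camus']
print(sorted(set(sorted_neighborhood_pairs(refset, targetset, window=2))))
```

The sketch's `window` plays the role of `window_width` in the call above; blocking only proposes candidate pairs, so a few false candidates are expected and later filtered out by the distance threshold.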
+
+
+
+We get the aligned pairs using the `get_aligned_pairs` method of the `BaseAligner`:
 
-It may be important to apply some pre-treatment on the data to align. For
+.. sourcecode:: python
+
+   >>> for (r, ri), (t, ti), d in aligner.get_aligned_pairs(referenceset, targetset):
+   ...     print 'Alignment of %s to %s (distance %s)' % (referenceset[ri], targetset[ti], d)
+
+
+
+It may be important to apply some pre-processing on the data to align. For
 instance, names can be written with lower or upper case characters, with extra
 characters such as punctuation, or with unwanted information in parentheses, and so on. That
 is why we provide some functions to ``normalize`` your data. The most useful may
-be the :func:`simplify` function (see the docstring for more information). So the
-treatments list can be given as follow:
-
+be the :func:`simplify` function (see the docstring for more information).
 
 .. sourcecode:: python
 
-    def remove_after(string, sub):
-        """ Remove the text after `sub` in `string`
-            >>> remove_after('I like cats and dogs', 'and')
-            'I like cats'
-            >>> remove_after('I like cats and dogs', '(')
-            'I like cats and dogs'
-        """
-        try:
-            return string[:string.lower().index(sub)].strip()
-        except ValueError:
-            return string
 
+   >>> from nazca.normalize import SimplifyNormalizer
+   >>> aligner.register_ref_normalizer(SimplifyNormalizer(attr_index=1))
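What such a normalization typically does can be sketched as follows (a hypothetical stand-in for illustration, not Nazca's :func:`simplify`): drop parenthesised parts, remove punctuation, lowercase, and squash whitespace.

```python
import re

def simplify_sketch(text):
    """Lowercase, strip punctuation and parenthesised parts, squash spaces."""
    text = re.sub(r'\([^)]*\)', ' ', text)   # drop parenthesised information
    text = re.sub(r'[^\w\s]', ' ', text)     # drop punctuation
    return ' '.join(text.lower().split())

print(simplify_sketch(u'DUPONT, Paul (writer)'))  # dupont paul
```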
 
-    treatments = {1: {'normalization': [lambda x:remove_after(x, '('),
-                                        aln.simply],
-                      'metric': ald.levenshtein
-                     }
-                 }
 
 
 Cities alignment
@@ -317,36 +338,38 @@
 
 .. sourcecode:: python
 
-    targetset = aldio.rqlquery('http://demo.cubicweb.org/geonames',
-                               'Any U, N, LONG, LAT WHERE X is Location, X name'
-                               ' N, X country C, C name "France", X longitude'
-                               ' LONG, X latitude LAT, X population > 1000, X'
-                               ' feature_class "P", X cwuri U',
-                               indexes=[0, 1, (2, 3)])
-    alignset = aldio.sparqlquery('http://dbpedia.inria.fr/sparql',
-                                 'prefix db-owl: <http://dbpedia.org/ontology/>'
-                                 'prefix db-prop: <http://fr.dbpedia.org/property/>'
-                                 'select ?ville, ?name, ?long, ?lat where {'
-                                 ' ?ville db-owl:country <http://fr.dbpedia.org/resource/France> .'
-                                 ' ?ville rdf:type db-owl:PopulatedPlace .'
-                                 ' ?ville db-owl:populationTotal ?population .'
-                                 ' ?ville foaf:name ?name .'
-                                 ' ?ville db-prop:longitude ?long .'
-                                 ' ?ville db-prop:latitude ?lat .'
-                                 ' FILTER (?population > 1000)'
-                                 '}',
-                                 indexes=[0, 1, (2, 3)])
+   >>> from nazca.dataio import sparqlquery, rqlquery
+   >>> referenceset = sparqlquery('http://dbpedia.inria.fr/sparql',
+   ...     'prefix db-owl: <http://dbpedia.org/ontology/>'
+   ...     'prefix db-prop: <http://fr.dbpedia.org/property/>'
+   ...     'select ?ville, ?name, ?long, ?lat where {'
+   ...     ' ?ville db-owl:country <http://fr.dbpedia.org/resource/France> .'
+   ...     ' ?ville rdf:type db-owl:PopulatedPlace .'
+   ...     ' ?ville db-owl:populationTotal ?population .'
+   ...     ' ?ville foaf:name ?name .'
+   ...     ' ?ville db-prop:longitude ?long .'
+   ...     ' ?ville db-prop:latitude ?lat .'
+   ...     ' FILTER (?population > 1000)'
+   ...     '}',
+   ...     indexes=[0, 1, (2, 3)])
+   >>> targetset = rqlquery('http://demo.cubicweb.org/geonames',
+   ...     'Any U, N, LONG, LAT WHERE X is Location, X name'
+   ...     ' N, X country C, C name "France", X longitude'
+   ...     ' LONG, X latitude LAT, X population > 1000, X'
+   ...     ' feature_class "P", X cwuri U',
+   ...     indexes=[0, 1, (2, 3)])
 
+   >>> from nazca.distances import BaseProcessing, levenshtein
+   >>> processing = BaseProcessing(ref_attr_index=1, target_attr_index=1, distance_callback=levenshtein)
 
-    treatments = {1: {'normalization': [aln.simply],
-                      'metric': ald.levenshtein,
-                      'matrix_normalized': False
-                     }
-                 }
-    results = ala.alignall(alignset, targetset, 3, treatments=treatments, #As before
-                           indexes=(2, 2), #On which data build the kdtree
-                           mode='kdtree',  #The mode to use
-                           uniq=True) #Return only the best results
+   >>> from nazca.aligner import BaseAligner
+   >>> aligner = BaseAligner(threshold=0, processings=(processing,))
+
+   >>> from nazca.blocking import KdTreeBlocking
+   >>> aligner.register_blocking(KdTreeBlocking(2, 2))
+
+   >>> results = list(aligner.get_aligned_pairs(referenceset, targetset, unique=True))
+
 
 
 Let's explain the code. We have two data sets, containing a list of cities we want
@@ -359,108 +382,29 @@
 used to reduce the potential candidates without losing any more refined
 possible matches.
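The pruning effect of a spatial blocking such as `KdTreeBlocking` can be illustrated with a simple grid sketch (hypothetical coordinates, and not Nazca's kd-tree implementation): points are bucketed by rounded coordinates, and only records sharing the same or an adjacent cell become candidates.

```python
from collections import defaultdict

def grid_candidates(refpoints, targetpoints, cell=0.5):
    """Yield (ref_index, target_index) pairs whose (long, lat) points fall
    in the same or an adjacent grid cell of side `cell` degrees."""
    buckets = defaultdict(list)
    for j, (x, y) in enumerate(targetpoints):
        buckets[(int(x // cell), int(y // cell))].append(j)
    for i, (x, y) in enumerate(refpoints):
        cx, cy = int(x // cell), int(y // cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in buckets.get((cx + dx, cy + dy), ()):
                    yield (i, j)

# hypothetical (longitude, latitude) pairs
refpoints = [(2.35, 48.85), (5.37, 43.30)]                    # Paris, Marseille
targetpoints = [(2.35, 48.86), (5.38, 43.29), (7.26, 43.70)]  # Paris, Marseille, Nice
print(sorted(grid_candidates(refpoints, targetpoints)))  # [(0, 0), (1, 1)]
```

A kd-tree achieves the same kind of neighbourhood pruning, but adapts to the density of the points instead of using a fixed cell size.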
 
-So, in the next step, we define the treatments to apply.
-It is the same as before, but we ask for a non-normalized matrix
-(i.e.: the real output of the levenshtein distance).
-Thus, we call the :func:`alignall` function. `indexes` is a tuple saying the
-position of the point on which the kdtree_ must be built, `mode` is the mode
-used to find neighbours [#]_.
+So, in the next step, we define the processings to apply, and we add a specific
+kdtree_ blocking.
 
-Finally, `uniq` ask to the function to return the best
+Finally, `unique` asks the function to return only the best
 candidate (i.e. the one having the shortest distance below the given threshold).
 
-.. [#] The available modes are `kdtree`, `kmeans` and `minibatch` for
-       numerical data and `minhashing` for text one.
-
 The function outputs a generator yielding tuples where the first element is the
-identifier of the `alignset` item and the second is the `targetset` one (It
+identifier of the `referenceset` item and the second is the `targetset` one (It
 may take some time before yielding the first tuples, because all the computation
 must be done…)
 
 .. _kdtree: http://en.wikipedia.org/wiki/K-d_tree
 
-The :func:`alignall_iterative` and `cache` usage
-==========================================
-
-Even when using methods such as `kdtree` or `minhashing` or `clustering`,
-the alignment process might be long. That’s why we provide you a function,
-called :func:`alignall_iterative` which works directly with your files. The idea
-behind this function is simple, it splits your files (the `alignfile` and the
-`targetfile`) into smallers ones and tries to align each item of each subsets.
-When processing, if an alignment is estimated almost perfect
-then the item aligned is _removed_ from the `alignset` to faster the process − so
-Nazca doesn’t retry to align it.
-
-Moreover, this function uses a cache system. When a alignment is done, it is
-stored into the cache and if in the future a *better* alignment is found, the
-cached is updated. At the end, you get only the better alignment found.
-
-.. sourcecode:: python
-
-    from difflib import SequenceMatcher
-
-    from nazca import normalize as aln#Functions to normalize data
-    from nazca import aligner as ala  #Functions to align data
-
-    def approxMatch(x, y):
-        return 1.0 - SequenceMatcher(None, x, y).ratio()
-
-    alignformat = {'indexes': [0, 3, 2],
-                   'formatopt': {0: lambda x:x.decode('utf-8'),
-                                 1: lambda x:x.decode('utf-8'),
-                                 2: lambda x:x.decode('utf-8'),
-                                },
-                  }
-
-    targetformat = {'indexes': [0, 1, 3],
-                   'formatopt': {0: lambda x:x.decode('utf-8'),
-                                 1: lambda x:x.decode('utf-8'),
-                                 3: lambda x:x.decode('utf-8'),
-                                },
-                  }
-
-    tr_name = {'normalization': [aln.simplify],
-               'metric': approxMatch,
-               'matrix_normalized': False,
-              }
-    tr_info = {'normalization': [aln.simplify],
-               'metric': approxMatch,
-               'matrix_normalized': False,
-               'weighting': 0.3,
-              }
-
-    alignments = ala.alignall_iterative('align_csvfile', 'target_csvfile',
-                                        alignformat, targetformat, 0.20,
-                                        treatments={1:tr_name,
-                                                    2:tr_info,
-                                                   },
-                                        equality_threshold=0.05,
-                                        size=25000,
-                                        mode='minhashing',
-                                        indexes=(1,1),
-                                        neighbours_threshold=0.2,
-                                       )
-
-    with open('results.csv', 'w') as fobj:
-        for aligned, (targeted, distance) in alignments.iteritems():
-            fobj.write('%s\t%s\t%s\n' % (aligned, targeted, distance))
-
-Roughly, this function expects the same arguments than the previously shown
-:func:`alignall` function, excepting the `equality_threshold` and the `size`.
-
- - `size` is the number items to have in each subsets
- - `equality_threshold` is the threshold above which two items are said as
-   equal.
 
 `Try <http://demo.cubicweb.org/nazca/view?vid=nazca>`_ it online!
-==================================================================
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
 We have also made a little application of Nazca, using `CubicWeb
 <http://www.cubicweb.org/>`_. This application provides a user interface for
 Nazca, helping you to choose what you want to align. You can use sparql or rql
 queries, as in the previous example, or import your own csv file [#]_. Once you
 have chosen what you want to align, you can click the *Next step* button to
-customize the treatments you want to apply, just as you did before in python !
+customize the processings you want to apply, just as you did before in python !
 Once done, by clicking the *Next step*, you start the alignment process. Wait a
 little bit, and you can either download the results in a *csv* or *rdf* file, or
 directly see the results online choosing the *html* output.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/examples/goncourt.csv	Tue Oct 15 12:15:19 2013 +0000
@@ -0,0 +1,110 @@
+1903	John-Antoine Nau
+1904	Léon Frapié
+1905	Claude Farrère
+1906	Jérôme et Jean Tharaud
+1907	Émile Moselly
+1908	Francis de Miomandre
+1909	Marius-Ary Leblond
+1910	Louis Pergaud
+1911	Alphonse de Châteaubriant
+1912	André Savignon
+1913	Marc Elder
+1914	Adrien Bertrand
+1915	René Benjamin
+1916	Henri Barbusse
+1917	Henry Malherbe
+1918	Georges Duhamel
+1919	Marcel Proust
+1920	Ernest Pérochon
+1921	René Maran
+1922	Henri Béraud
+1923	Lucien Fabre
+1924	Thierry Sandre
+1925	Maurice Genevoix
+1926	Henri Deberly
+1927	Maurice Bedel
+1928	Maurice Constantin-Weyer
+1929	Marcel Arland
+1930	Henri Fauconnier
+1931	Jean Fayard
+1932	Guy Mazeline
+1933	André Malraux
+1934	Roger Vercel
+1935	Joseph Peyré
+1936	Maxence Van der Meersch
+1937	Charles Plisnier
+1938	Henri Troyat
+1939	Philippe Hériat
+1940	Francis Ambrière
+1941	Henri Pourrat
+1942	Marc Bernard
+1943	Marius Grout
+1944	Elsa Triolet
+1945	Jean-Louis Bory
+1946	Jean-Jacques Gautier
+1947	Jean-Louis Curtis
+1948	Maurice Druon
+1949	Robert Merle
+1950	Paul Colin
+1951	Julien Gracq
+1952	Béatrix Beck
+1953	Pierre Gascar
+1954	Simone de Beauvoir
+1955	Roger Ikor
+1956	Romain Gary
+1957	Roger Vailland
+1958	Francis Walder
+1959	André Schwarz-Bart
+1960	Vintila Horia
+1961	Jean Cau
+1962	Anna Langfus
+1963	Armand Lanoux
+1964	Georges Conchon
+1965	Jacques Borel
+1966	Edmonde Charles-Roux
+1967	André Pieyre de Mandiargues
+1968	Bernard Clavel
+1969	Félicien Marceau
+1970	Michel Tournier
+1971	Jacques Laurent
+1972	Jean Carrière
+1973	Jacques Chessex
+1974	Pascal Lainé
+1975	Émile Ajar (Romain Gary)
+1976	Patrick Grainville
+1977	Didier Decoin
+1978	Patrick Modiano
+1979	Antonine Maillet
+1980	Yves Navarre
+1981	Lucien Bodard
+1982	Dominique Fernandez
+1983	Frédérick Tristan
+1984	Marguerite Duras
+1985	Yann Queffélec
+1986	Michel Host
+1987	Tahar Ben Jelloun
+1988	Erik Orsenna
+1989	Jean Vautrin
+1990	Jean Rouaud
+1991	Pierre Combescot
+1992	Patrick Chamoiseau
+1993	Amin Maalouf
+1994	Didier van Cauwelaert
+1995	Andreï Makine
+1996	Pascale Roze
+1997	Patrick Rambaud
+1998	Paule Constant
+1999	Jean Echenoz
+2000	Jean-Jacques Schuhl
+2001	Jean-Christophe Rufin
+2002	Pascal Quignard
+2003	Jacques-Pierre Amette
+2004	Laurent Gaudé
+2005	François Weyergans
+2006	Jonathan Littell
+2007	Gilles Leroy
+2008	Atiq Rahimi
+2009	Marie NDiaye
+2010	Michel Houellebecq
+2011	Alexis Jenni
+2012	Jérôme Ferrari