diff --git a/hal-classifier/examples.http b/hal-classifier/examples.http
index 25910e5..fd5e60b 100644
--- a/hal-classifier/examples.http
+++ b/hal-classifier/examples.http
@@ -1,7 +1,12 @@
 # These examples can be used directly in VSCode, using REST Client extension (humao.rest-client)
-# Classification dans les domaines de niveau de la base HAL
-POST https://hal-classifier.services.inist.fr/v1/en/classhalen?indent=true HTTP/1.1
+# @baseUrl=http://localhost:31976
+@baseUrl=https://hal-classifier.services.istex.fr
+
+###
+# @name v1EnClasshalen
+# @description Classification dans les domaines de niveau de la base HAL
+POST {{baseUrl}}/v1/en/classhalen?indent=true HTTP/1.1
 Content-Type: application/json
 [
diff --git a/hal-classifier/tests.hurl b/hal-classifier/tests.hurl
new file mode 100644
index 0000000..bc5fd46
--- /dev/null
+++ b/hal-classifier/tests.hurl
@@ -0,0 +1,30 @@
+POST https://hal-classifier.services.istex.fr/v1/en/classhalen?indent=true
+content-type: application/json
+[
+{
+"id":1,
+"value":"In the southern French Massif Central, the Montagne Noire axial zone is a NE-SW elongated granite-migmatite dome emplaced within Visean south-verging recumbent folds and intruded by syn- to late-migmatization granitoids. The tectonic setting of this dome is still disputed, thus several models have been proposed. In order to better understand the emplacement mechanism of this dome, petrofabric and Anisotropy of Magnetic Susceptibility (AMS) studies have been carried out. In the granites and migmatites that form the dome core, magmatic texture and to a lesser extent weak solid-state texture are dominant. As a paramagnetic mineral, biotite is the main carrier of the magnetic susceptibility. On the basis of 135 AMS sites, the magnetic fabrics appear as independent of the lithology but related to the dome architecture. Coupling our results with previous structural and geochronological studies, allows us to propose a new emplacement model. Between 340-325 Ma, the Palaeozoic series underwent a compressional deformation represented by nappes and recumbent folds involving the thermal event leading to partial melting. Until ~325-310 Ma, the dome emplacement was assisted by diapiric processes. An extensional event took place at 300 Ma, after the emplacement of the late to post-migmatitic granitic plutons. In the northeast side of the dome, a brittle normal-dextral faulting controlled the opening of the Graissessac coal-basin."
+},
+{"id":2,
+"value":"The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, China. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 20 January 2020, and later a pandemic on 11 March 2020. As of 2 April 2021, more than 129 million cases have been confirmed, with more than 2.82 million deaths attributed to COVID-19, making it one of the deadliest pandemics in history."
+}
+]
+
+
+HTTP 200
+[{
+  "id": 1,
+  "value": {
+    "code": "sdu",
+    "labelFr": "Planète et Univers [physics]",
+    "labelEn": "Sciences of the Universe [physics]"
+  }
+},
+{
+  "id": 2,
+  "value": {
+    "code": "sdv",
+    "labelFr": "Sciences du Vivant [q-bio]",
+    "labelEn": "Life Sciences [q-bio]"
+  }
+}]
diff --git a/hal-classifier/v1/en/classhalen.ini b/hal-classifier/v1/en/classhalen.ini
index ef15201..9f111b7 100644
--- a/hal-classifier/v1/en/classhalen.ini
+++ b/hal-classifier/v1/en/classhalen.ini
@@ -17,6 +17,20 @@ post.parameters.1.schema.type = boolean
 post.parameters.1.description = Indent or not the JSON Result
+# Examples
+post.requestBody.content.application/json.example.0.id = 1
+post.requestBody.content.application/json.example.0.value = In the southern French Massif Central, the Montagne Noire axial zone is a NE-SW elongated granite-migmatite dome emplaced within Visean south-verging recumbent folds and intruded by syn- to late-migmatization granitoids. The tectonic setting of this dome is still disputed, thus several models have been proposed. In order to better understand the emplacement mechanism of this dome, petrofabric and Anisotropy of Magnetic Susceptibility (AMS) studies have been carried out. In the granites and migmatites that form the dome core, magmatic texture and to a lesser extent weak solid-state texture are dominant. As a paramagnetic mineral, biotite is the main carrier of the magnetic susceptibility. On the basis of 135 AMS sites, the magnetic fabrics appear as independent of the lithology but related to the dome architecture. Coupling our results with previous structural and geochronological studies, allows us to propose a new emplacement model. Between 340-325 Ma, the Palaeozoic series underwent a compressional deformation represented by nappes and recumbent folds involving the thermal event leading to partial melting. Until ~325-310 Ma, the dome emplacement was assisted by diapiric processes. An extensional event took place at 300 Ma, after the emplacement of the late to post-migmatitic granitic plutons. In the northeast side of the dome, a brittle normal-dextral faulting controlled the opening of the Graissessac coal-basin.
+post.requestBody.content.application/json.example.1.id = 2
+post.requestBody.content.application/json.example.1.value = The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, China. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 20 January 2020, and later a pandemic on 11 March 2020. As of 2 April 2021, more than 129 million cases have been confirmed, with more than 2.82 million deaths attributed to COVID-19, making it one of the deadliest pandemics in history.
+post.responses.default.content.application/json.example.0.id = 1
+post.responses.default.content.application/json.example.0.value.code = sdu
+post.responses.default.content.application/json.example.0.value.labelFr = Planète et Univers [physics]
+post.responses.default.content.application/json.example.0.value.labelEn = Sciences of the Universe [physics]
+post.responses.default.content.application/json.example.1.id = 2
+post.responses.default.content.application/json.example.1.value.code = sdv
+post.responses.default.content.application/json.example.1.value.labelFr = Sciences du Vivant [q-bio]
+post.responses.default.content.application/json.example.1.value.labelEn = Life Sciences [q-bio]
+
 [use]
 plugin = @ezs/spawn
 plugin = @ezs/basics
diff --git a/irc3-species/examples.http b/irc3-species/examples.http
new file mode 100644
index 0000000..f2b18e6
--- /dev/null
+++ b/irc3-species/examples.http
@@ -0,0 +1,29 @@
+# These examples can be used directly in VSCode, using REST Client extension (humao.rest-client)
+
+# @baseUrl=http://localhost:31976
+@baseUrl=https://irc3-species.services.istex.fr
+
+###
+# @name v1Irc3sp
+# @description Recherche de noms d'espèces
+POST {{baseUrl}}/v1/irc3sp?indent=true HTTP/1.1
+Content-Type: application/json
+
+[{
+  "id": 1,
+  "value": "Trophic diversity accumulation curves of (a) Pseudopercis semifasciata, (b) Acanthistius patachonicus and (c) Pinguipes brasilianus. Horizontal lines show Brillouin diversity index (Hz) values (Hz± 0·05 Hz) and the vertical line shows a value n- 2 (n = number of stomachs)."
+},{
+  "id": 2,
+  "value": "Phasianus colchicus/versicolor: in our study, the best match."
+},{
+  "id": 3,
+  "value": "short lower jaw in Etheostoma bellator Suttkus"
+}, {
+  "id": 4,
+  "value": [
+    "Carnivore diet analysis based on next‐generation sequencing: application to the leopard cat (Prionailurus bengalensis) in Pakistan ",
+    "The leopard cat (Prionailurus bengalensis) is a small felid (weight 1.7–7.1 kg; Sunquist & Sunquist 2009), with a wide range in Asia (8.66 × 106 km2; Nowell & Jackson 1996). ",
+    "Muridae (mainly Rattus spp. and Mus spp.) seem to represent the main prey items throughout the leopard cat distribution range, supplemented by a wide variety of other prey including small mammals such as shrews and ground squirrels, birds, reptiles, frogs and fish (Tatara & Doi 1994; Grassman et al. 2005; Austin et al. 2007; Rajaratnam et al. 2007; Watanabe 2009; Fernandez & de Guia 2011). ",
+    "More recently, Deagle et al. (2009, 2010) investigated the diet of Australian fur seals (Arctocephalus pusillus) and penguins (Eudyptula minor) by combining a blocking oligonucleotide approach with 454 GS‐FLX pyrosequencing technologies. "
+  ]
+}]
diff --git a/irc3-species/tests.hurl b/irc3-species/tests.hurl
new file mode 100644
index 0000000..bca0ff3
--- /dev/null
+++ b/irc3-species/tests.hurl
@@ -0,0 +1,36 @@
+POST https://irc3-species.services.istex.fr/v1/irc3sp?indent=true
+content-type: application/json
+[{
+  "id": 1,
+  "value": "Trophic diversity accumulation curves of (a) Pseudopercis semifasciata, (b) Acanthistius patachonicus and (c) Pinguipes brasilianus. Horizontal lines show Brillouin diversity index (Hz) values (Hz± 0·05 Hz) and the vertical line shows a value n- 2 (n = number of stomachs)."
+},{
+  "id": 2,
+  "value": "Phasianus colchicus/versicolor: in our study, the best match."
+},{
+  "id": 3,
+  "value": "short lower jaw in Etheostoma bellator Suttkus"
+}, {
+  "id": 4,
+  "value": [
+    "Carnivore diet analysis based on next‐generation sequencing: application to the leopard cat (Prionailurus bengalensis) in Pakistan ",
+    "The leopard cat (Prionailurus bengalensis) is a small felid (weight 1.7–7.1 kg; Sunquist & Sunquist 2009), with a wide range in Asia (8.66 × 106 km2; Nowell & Jackson 1996). ",
+    "Muridae (mainly Rattus spp. and Mus spp.) seem to represent the main prey items throughout the leopard cat distribution range, supplemented by a wide variety of other prey including small mammals such as shrews and ground squirrels, birds, reptiles, frogs and fish (Tatara & Doi 1994; Grassman et al. 2005; Austin et al. 2007; Rajaratnam et al. 2007; Watanabe 2009; Fernandez & de Guia 2011). ",
+    "More recently, Deagle et al. (2009, 2010) investigated the diet of Australian fur seals (Arctocephalus pusillus) and penguins (Eudyptula minor) by combining a blocking oligonucleotide approach with 454 GS‐FLX pyrosequencing technologies. "
+  ]
+}]
+
+
+HTTP 200
+[ {
+  "id": 1,
+  "value": [ "Acanthistius patachonicus", "Pinguipes brasilianus", "Pseudopercis semifasciata" ]
+}, {
+  "id": 2,
+  "value": [ "Phasianus colchicus" ]
+}, {
+  "id": 3,
+  "value": [ "Etheostoma bellator" ]
+}, {
+  "id": 4,
+  "value": [ "Arctocephalus pusillus", "Eudyptula minor", "Prionailurus bengalensis" ]
+}]
diff --git a/kos2vec/README.md b/kos2vec/README.md
index d52eb11..020f620 100755
--- a/kos2vec/README.md
+++ b/kos2vec/README.md
@@ -1,74 +1,74 @@
-
-
 # kos2vec
 ## Application d’indexation sémantique sur une ressource termino-ontologique (RTO)
 -------------
-**Identification de concepts sur la mémoire basé sur une ontology et utilisant un modèle de langue**
+
+Identification de concepts sur la mémoire basé sur une ontology et utilisant un modèle de langue.
 ## Principe de fonctionnement
-
-![text](image.jpg)
-
-
-Le système prend en entrée les métadonnées associées à un article (titre, résumé) et renvoie une sélection de concepts tirés du thesaurus mémoire.
-Il se compose de 3 modules principaux :
-
-- le module syntaxique analyse les documents d'entrée et identifie les concepts qui sont explicitement mentionnés dans le document.
-
-- le Module semantique extrait des candidats termes (cunking) et calcule la similarité de ceux-ci avec les nœuds de l'ontologie en tirant parti de l'intégration des mots dans un modèle Embedding. Il sélection des termes RTO directement présent ou proche voisin dans le modèle.
+![schéma de principe](image.jpg)
-![text](image2.jpg)
-
+Le système prend en entrée les métadonnées associées à un article (titre,
+résumé) et renvoie une sélection de concepts tirés du thesaurus mémoire.
+Il se compose de 3 modules principaux :
-
-- le module de post-traitement combine les résultats de ces deux modules, élimine les valeurs aberrantes et les améliore en incluant les "super-concepts pertinents" (broader).
+1. le module syntaxique analyse les documents d'entrée et identifie les concepts
+   qui sont explicitement mentionnés dans le document.
+2. le Module sémantique extrait des candidats termes (*cunking*) et calcule la
+   similarité de ceux-ci avec les nœuds de l'ontologie en tirant parti de
+   l'intégration des mots dans un modèle *Embedding*. Il sélectionne des termes
+   RTO directement présents ou proches voisins dans le modèle.
+   ![text](image2.jpg)
+3. le module de post-traitement combine les résultats de ces deux modules,
+   élimine les valeurs aberrantes et les améliore en incluant les
+   "super-concepts pertinents" (broader).
+L'approche exploite une RTO de loterre et des plongements lexicaux calculés sur
+un corpus du domaine.
-L'approche exploite une RTO de loterre et des plongements lexicaux calculés sur un corpus du domaine.
-
- * Le modele de langue est de type **Word2Vec** et il construit sur un corpus Istex de 587.721 résumés **annotés par les termes de la RTO et les ngrams les plus fréquents** (collocation lexicale).
- * L'Ontology mémoire provient du site Inist **Loterre** : https://skosmos.loterre.fr/P66/fr/
-
+- Le modèle de langue est de type **Word2Vec** et il est construit sur un corpus
+  Istex de 587.721 résumés **annotés par les termes de la RTO et les ngrams les
+  plus fréquents** (collocation lexicale).
+- L'ontologie mémoire provient du site Inist **Loterre** : <https://skosmos.loterre.fr/P66/fr/>
 ## Utilisation
 ### Sollicitation du WebService
 - [/v1/{code_vocab}/index?indent=True](/v1/en/index?indent=True)
-| nom de la ressource|Code_vocab|Sur loterre|
-|--- |:-: |:-: |
-| memoire Psychologie | P66 | https://skosmos.loterre.fr/P66/en/ |
-| MeSH |JVR|https://skosmos.loterre.fr/JVR/en/|
-| education | 216 |https://skosmos.loterre.fr/216/en/|
-| sociologie | 3JP |https://skosmos.loterre.fr/3JP/en/|
-| philosophie | 73G |https://skosmos.loterre.fr/73G/en/|
-| litterature | P21 |https://skosmos.loterre.fr/P21/en/|
-| SAGEThesaurus | SAG ||
-
-
-* Prend en entrée un flux **json** au format **id/value** :
-```
+| nom de la ressource | Code_vocab |             Sur loterre              |
+| ------------------- | :--------: | :----------------------------------: |
+| memoire Psychologie |    P66     | <https://skosmos.loterre.fr/P66/en/> |
+| MeSH                |    JVR     | <https://skosmos.loterre.fr/JVR/en/> |
+| education           |    216     | <https://skosmos.loterre.fr/216/en/> |
+| sociologie          |    3JP     | <https://skosmos.loterre.fr/3JP/en/> |
+| philosophie         |    73G     | <https://skosmos.loterre.fr/73G/en/> |
+| litterature         |    P21     | <https://skosmos.loterre.fr/P21/en/> |
+| SAGEThesaurus       |    SAG     |                                      |
+
+Prend en entrée un flux **json** au format **id/value** :
+
+```json
 [
 {"idt":"11-0278198","value":"reduction fear child comparison positive information imagery control condition study... effect ... "},
 {"idt":"07-0413881","value":"avoidance hemodilution selective cerebral perfusion neurobehavioral outcome ... "}
 ]
 ```
-* Produit en sortie un **flux json** contenant les résultats d'une indexation sur le thesaurus mémoire :
-
- * **"idt"** : identifiant fourni en entrée
- * **"syntactic"** : résultat de l'indexation syntaxique
- * **"semantic"** : résultat de l'indexation semantique
- * **"union"** : union des deux indexations
- * **"enhancement"** : trace textuelle des indexations
- * **"explanation"** : les concepts broader de tous les concepts trouvés
+Produit en sortie un **flux json** contenant les résultats d'une indexation sur le thesaurus mémoire :
+
+- **"idt"** : identifiant fourni en entrée
+- **"syntactic"** : résultat de l'indexation syntaxique
+- **"semantic"** : résultat de l'indexation semantique
+- **"union"** : union des deux indexations
+- **"enhancement"** : trace textuelle des indexations
+- **"explanation"** : les concepts broader de tous les concepts trouvés
 #### Exemple
-```
+
+```bash
 cat <