Study for INSU, based on ~14,000 DOIs present in the WoS. Due mid-October 2022.
See https://wos-dumps.conditor.inist.fr/ and the .ini files hosted there.
See https://gitbucket.inist.fr/tdm/web-services/blob/master/biblio-tools/v1/wos/works/expand.ini.
See https://gitbucket.inist.fr/tdm/web-dumps.
The DOIs are in a .bib (BibTeX) file.
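A minimal sketch of how the DOIs could be pulled out of the .bib file, assuming each entry carries a `doi = {...}` or `doi = "..."` field. The function name `extractDois` and the sample entries are hypothetical; a real BibTeX parser would be more robust than this regular expression.

```javascript
// Hypothetical helper: collect the distinct DOIs found in a BibTeX string.
// Assumption: DOIs appear as `doi = {...}` or `doi = "..."` fields.
function extractDois(bibtex) {
  const dois = new Set();
  const pattern = /\bdoi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]/gi;
  for (const match of bibtex.matchAll(pattern)) {
    // DOIs are case-insensitive, so normalise before deduplicating.
    dois.add(match[1].toLowerCase());
  }
  return [...dois];
}

// Made-up entries, for illustration only.
const sampleBib = `
@article{first, title = {Example}, doi = {10.1000/EXAMPLE.1}}
@article{second, DOI = "10.1000/example.1"}
@article{third, doi = {10.1000/example.2}}
`;
console.log(extractDois(sampleBib).join("\n"));
```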
Decide which machine should host this dump.
WOS_API_KEY is in my CNRS mail.
graph TD
    A[(corpus_WoS_vol1.json)] --> B[[extract-fields.ini]]
    B --> C[(corpus-simple.json)]
    C --> D[[enrich-rnsr.ini]]
    D --> E[(corpus-simple-rnsr.json)]
    E --> F[[enrich-etab.ini]]
    F --> G[(corpus-simple-etab.json)]
    G --> H[[enrich-institutes.ini]]
    H --> I[(corpus-simple-instituts.json)]
    I --> J[[enrich-teeft.ini]]
    J --> K[(corpus-simple-teeft-en.json)]
    K --> L[[enrich-pascal.ini]]
    L --> M[(corpus-simple-pascal.json)]
Use lodex-crontab@1.4+, with this configuration:
{
    "environnement": {
        "CRON_VERBOSE": true,
        "EZS_VERBOSE": false,
        "DEBUG": "ezs"
    },
    "packages": [
        "@ezs/core@2.1.0",
        "@ezs/basics@1.22.3",
        "@ezs/analytics@2.0.2"
    ],
    "files": {
        "zip": "https://gitbucket.inist.fr/parmentf/giec-wos/archive/master.zip"
    },
    "tasks": [
        {
            "CronRule": "0 1 * * *",
            "Target": "data/corpus-simple-cnrs.json",
            "RunOnStartup": true
        }
    ]
}
(WoS_PaysENG_Correction_Correspondance) for the map view in LODEX.
Items 25, 26, 27, 29, 30, 31, 32, 34, 35, 39 and 41 have an array in the abstract field, which the web service cannot handle. We must therefore make sure that only a single string is sent.
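One possible way to send only a string, sketched in JavaScript. The helper name `abstractToString` is hypothetical, and joining the parts with a single space is an assumption: the notes do not say how the array should be flattened.

```javascript
// Hypothetical helper: turn an abstract that may be a string, an array of
// strings, or missing into one plain string.
// Assumption: array parts are joined with a single space.
function abstractToString(abstract) {
  if (Array.isArray(abstract)) {
    return abstract
      .filter((part) => typeof part === "string")
      .map((part) => part.trim())
      .filter((part) => part.length > 0)
      .join(" ");
  }
  return typeof abstract === "string" ? abstract : "";
}

// Made-up record, for illustration only.
const item = { uri: "example", abstract: ["First paragraph.", "Second paragraph."] };
console.log({ ...item, abstract: abstractToString(item.abstract) });
```

Applied before the call to the web service, this leaves string abstracts untouched and only changes the items whose abstract is an array.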
The RNSR enrichment returned nothing for these 61 UTs:
$ fx < data/corpus-simple-rnsr.json \
    '.map(n => ({uri: n.uri, rnsr: n.ws.rnsr?.[0]}))' \
    '.map(n => ({ ok: Array.isArray(n.rnsr), ...n}))' \
    '.filter(n => !n.ok)' \
    '.map(n => n.uri).join("\n")' \
    | sort -u
WOS:000087687000034
WOS:000201991500005
WOS:000202541600015
WOS:000259592400005
WOS:000278783300013
WOS:000297080200009
WOS:000303246100021
WOS:000307031000011
WOS:000326102600010
WOS:000328962700024
WOS:000330731300004
WOS:000336983900012
WOS:000351469300002
WOS:000358138100001
WOS:000368910100002
WOS:000369014100030
WOS:000371481700005
WOS:000375767000046
WOS:000376443100012
WOS:000382134800021
WOS:000403527800002
WOS:000415909900003
WOS:000416761600042
WOS:000417409700018
WOS:000419033400048
WOS:000425514300054
WOS:000432597400029
WOS:000433901600021
WOS:000437255400003
WOS:000437362800003
WOS:000440301400006
WOS:000446187900022
WOS:000449897200002
WOS:000451164300002
WOS:000451785200002
WOS:000462592200001
WOS:000493421500001
WOS:000515035400001
WOS:000578319700001
WOS:000598066100006
WOS:000616977100020
WOS:000620058900026
WOS:A1945XY75400007
WOS:A1949UA83200002
WOS:A1954YH20800003
WOS:A1955WU68300004
WOS:A1955ZQ16500003
WOS:A1957XF83300002
WOS:A1960XF81800012
WOS:A19614744A00031
WOS:A1961WY83500001
WOS:A1961XG98500003
WOS:A19656660700020
WOS:A19656993100009
WOS:A19679381400001
WOS:A1969E372700020
WOS:A1969E761200001
WOS:A1969F336900001
WOS:A1970H335700006
WOS:A1971I822800004
WOS:A1971J729400017
First case: no affiliation address (35 occurrences)
{
    "uri": "WOS:000087687000034",
    "title": "A North Atlantic climate pacemaker for the centuries",
    "abstract": "",
    "publication_year": 2000,
    "source": "SCIENCE",
    "affiliations": [],
    "countries": [],
    "subjects": [
        "Multidisciplinary Sciences",
        "Science & Technology - Other Topics"
    ],
    "subheadings": [ null ],
    "headings": [ "Science & Technology" ]
}
Second case: everything looks normal
{
    "uri": "WOS:000259592400005",
    "title": "On avoiding dangerous anthropogenic interference with the climate system: Formidable challenges ahead",
    "abstract": "The ... degrees C.",
    "publication_year": 2008,
    "source": "PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA",
    "affiliations": [
        "Univ Calif San Diego, Scripps Inst Oceanog, La Jolla, CA 92093 USA",
        "Univ Calif San Diego, Scripps Inst Oceanog, 9500 Gilman Dr, La Jolla, CA 92093 USA"
    ],
    "countries": [ "USA" ],
    "keywords": [ "EMISSIONS", "TRENDS", "CHINA" ],
    "subjects": [
        "Multidisciplinary Sciences",
        "Science & Technology - Other Topics"
    ],
    "subheadings": [ null ],
    "headings": [ "Science & Technology" ]
},
{
    "uri": "WOS:000278783300013",
    "title": "Compensation between Model Feedbacks and Curtailment of Climate Sensitivity",
    "abstract": "The ... of the intermodel spread.",
    "publication_year": 2010,
    "source": "JOURNAL OF CLIMATE",
    "affiliations": [
        "Harvard Univ, Cambridge, MA 02138 USA",
        "Harvard Univ, 20 Oxford St, Cambridge, MA 02138 USA"
    ],
    "countries": [ "USA" ],
    "keywords": [
        "GENERAL-CIRCULATION MODELS",
        "CLOUD FEEDBACK",
        "SURFACE-TEMPERATURE",
        "SIMULATIONS",
        "PROJECTIONS",
        "MECHANISMS",
        "CYCLE"
    ],
    "subjects": [ "Meteorology & Atmospheric Sciences" ],
    "subheadings": [ "Physical Sciences" ],
    "headings": [ "Science & Technology" ]
}
When these two examples are looked up by hand, the service finds no structure (which makes sense, since they are abroad).
$ curl -X 'POST' \
    'https://affiliations-tools.services.inist.fr/v1/rnsr/info' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '[
        { "id": 1, "value": { "year": 2010, "address": "Harvard Univ, Cambridge, MA 02138 USA" } },
        { "id": 3, "value": { "year": 2010, "address": "Harvard Univ, 20 Oxford St, Cambridge, MA 02138 USA" } },
        { "id": 4, "value": { "year": 2008, "address": "Univ Calif San Diego, Scripps Inst Oceanog, La Jolla, CA 92093 USA" } },
        { "id": 5, "value": { "year": 2008, "address": "Univ Calif San Diego, Scripps Inst Oceanog, 9500 Gilman Dr, La Jolla, CA 92093 USA" } }
    ]'
[
    { "id": 1, "value": [] },
    { "id": 3, "value": [] },
    { "id": 4, "value": [] },
    { "id": 5, "value": [] }
]
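The request body used in that manual check can be derived from a record. The sketch below assumes one query per affiliation, with the record's `publication_year` as `year`. The function name `buildRnsrQueries` and the sequential `id` numbering are hypothetical choices, not necessarily what enrich-rnsr.ini does.

```javascript
// Hypothetical helper: build the JSON array sent to the rnsr/info route,
// one { id, value: { year, address } } object per affiliation.
// Assumption: ids are simply 1, 2, 3... in affiliation order.
function buildRnsrQueries(record) {
  const affiliations = Array.isArray(record.affiliations) ? record.affiliations : [];
  return affiliations.map((address, index) => ({
    id: index + 1,
    value: { year: record.publication_year, address },
  }));
}

// Fields taken from the second example record above.
const harvardRecord = {
  uri: "WOS:000278783300013",
  publication_year: 2010,
  affiliations: [
    "Harvard Univ, Cambridge, MA 02138 USA",
    "Harvard Univ, 20 Oxford St, Cambridge, MA 02138 USA",
  ],
};
console.log(JSON.stringify(buildRnsrQueries(harvardRecord), null, 2));
```

A record with no affiliation yields an empty array, so there is nothing to send for the first case.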
So the service can handle these addresses, and returns an empty array for each of them. That does not explain the behaviour on the remaining 61 - 35 = 26 records.
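The split between the two cases can be recomputed rather than counted by hand. This is a rough sketch, assuming the records without an RNSR result are available as an array of objects with `uri` and `affiliations` fields. The function name `splitByAffiliation` is hypothetical.

```javascript
// Hypothetical helper: separate records with no affiliation address
// (first case) from records that do have addresses (second case).
function splitByAffiliation(records) {
  const withoutAddress = [];
  const withAddress = [];
  for (const record of records) {
    const hasAddress =
      Array.isArray(record.affiliations) && record.affiliations.length > 0;
    (hasAddress ? withAddress : withoutAddress).push(record.uri);
  }
  return { withoutAddress, withAddress };
}

// Two of the records quoted above, reduced to the fields that matter here.
const failed = [
  { uri: "WOS:000087687000034", affiliations: [] },
  { uri: "WOS:000259592400005", affiliations: ["Univ Calif San Diego, Scripps Inst Oceanog, La Jolla, CA 92093 USA"] },
];
const { withoutAddress, withAddress } = splitByAffiliation(failed);
console.log(withoutAddress.length, withAddress.length);
```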
But since this only happens for 26 records out of more than 12,000, we will simply make sure it does not crash the next script (enrich-etab.ini).
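Such a guard could look like the sketch below, written in plain JavaScript rather than as an ezs statement. `hasRnsrResult` mirrors the test used in the fx command above (the first `ws.rnsr` entry must be an array). `withSafeRnsr` is a hypothetical name, and defaulting to an empty array is an assumption about what enrich-etab.ini can tolerate.

```javascript
// Same check as in the fx command: the first ws.rnsr entry must be an array.
function hasRnsrResult(record) {
  return Array.isArray(record.ws?.rnsr?.[0]);
}

// Hypothetical guard: guarantee that ws.rnsr is always an array.
// Assumption: an empty array is an acceptable "nothing found" value downstream.
function withSafeRnsr(record) {
  const ws = record.ws ?? {};
  const rnsr = Array.isArray(ws.rnsr) ? ws.rnsr : [];
  return { ...record, ws: { ...ws, rnsr } };
}

// Made-up records, for illustration only.
const enriched = { uri: "example-1", ws: { rnsr: [["made-up-structure-id"]] } };
const missing = { uri: "example-2" };
console.log(hasRnsrResult(enriched), hasRnsrResult(missing));
console.log(JSON.stringify(withSafeRnsr(missing)));
```

Records that already carry a result pass through with the same content, so the guard only affects the problematic records.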