Sisyphe-go is a Go command-line application for the recursive analysis of directories and files in scientific publishing corpora

Sisyphe-GO

Sisyphe-go is a generic terminal application, written in Go, that recursively analyses the contents of a folder.

Requirements

Tested with Golang 1.18

Works on Linux/OSX/Windows

Define the following environment variables on the host machine and give each one a value:

  • WORK
  • CORPUS_RESOURCES
  • SISYPHE_OUT
  • ELASTIC_URL
  • ELASTIC_PORT
  • KIBANA_PORT
  • UID=$(id -u)
  • GID=$(id -g)
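
Before starting the containers, it can help to confirm that all of these variables are visible to child processes (they must be exported, not just set in the shell). The following is a minimal sketch of such a check, not part of Sisyphe-go itself; the helper name missingVars is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars lists the variables named in the Requirements section.
var requiredVars = []string{
	"WORK", "CORPUS_RESOURCES", "SISYPHE_OUT",
	"ELASTIC_URL", "ELASTIC_PORT", "KIBANA_PORT", "UID", "GID",
}

// missingVars returns the names that are unset or empty according to lookup.
// lookup has the same signature as os.LookupEnv so it can be replaced in tests.
// (Hypothetical helper, not taken from the Sisyphe-go sources.)
func missingVars(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(requiredVars, os.LookupEnv); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "missing environment variables:", missing)
		os.Exit(1)
	}
	fmt.Println("all required environment variables are set")
}
```

Run it with go run in an empty directory; it exits with status 1 and lists the missing names if any variable is unset or empty.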

Execution

Generic analysis

docker-compose up -d
docker exec -t sisyphe-go_go_1 go run . -n corpusName -p corpusPath -o outputPath

Detailed analysis

docker exec -t sisyphe-go_go_1 go run . -n corpusName -c corpusResourcesPath -p corpusPath -o outputPath

Example:

docker exec -t sisyphe-go_go_1 go run . -n karger-ebooks-2022-08-08-detaillee -c /corpus-resources -p /work/sample/karger_2020_11_06

By default, the program writes its results to SISYPHE_OUT.

Local installation

  1. Download the latest Sisyphe-go version
  2. Run: go build .
  3. ... that's it.

Help

go run . --help prints the usage information

Options

--help      Output usage
-c          Configuration folder path
-n          Corpus name (default "test")
-o          Output directory where results are written
-p          Corpus path
-w          Count words in PDF files
-noanalyze  Disable analysis
-noindex    Disable indexation
-noxpath    Disable xpaths.csv file generation
-noattval   Generate xpaths.csv without attribute values
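
Single-dash long names such as -noanalyze, alongside --help, match the conventions of Go's standard flag package, which treats one and two leading dashes the same way. The sketch below shows how this option set could be declared with that package. It is an illustration under assumptions, not the contents of main.go: the options struct, its field names, and the choice of a boolean for -w are all hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the flags listed above; the field names are illustrative.
type options struct {
	config, name, output, path string
	countWords                 bool // assumed to be a boolean switch
	noAnalyze, noIndex         bool
	noXpath, noAttVal          bool
}

// parseOptions parses args (without the program name) into an options value.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sisyphe-go", flag.ContinueOnError)
	fs.StringVar(&o.config, "c", "", "Configuration folder path")
	fs.StringVar(&o.name, "n", "test", "Corpus name")
	fs.StringVar(&o.output, "o", "", "Output directory where results are written")
	fs.StringVar(&o.path, "p", "", "Corpus path")
	fs.BoolVar(&o.countWords, "w", false, "Count words in PDF files")
	fs.BoolVar(&o.noAnalyze, "noanalyze", false, "Disable analysis")
	fs.BoolVar(&o.noIndex, "noindex", false, "Disable indexation")
	fs.BoolVar(&o.noXpath, "noxpath", false, "Disable xpaths.csv file generation")
	fs.BoolVar(&o.noAttVal, "noattval", false, "xpaths.csv without attribute values")
	err := fs.Parse(args) // returns flag.ErrHelp for -h, -help or --help
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

With this sketch, go run . -p /corpus -noindex leaves the corpus name at its default "test" and sets only the path and the noIndex switch.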

How it works

Run Sisyphe-go on any folder that contains files:

go run . -p ~/Documents/customfolder/corpus -n corpusname -o outputpath

go run . -p ~/Documents/customfolder/corpus -n corpusname -c ~/Documents/customfolder/corpusResources -o outputpath

Sisyphe-go now runs in the background, using all of your computer's threads. Grab a coffee and wait; it will notify you when it's done :)
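
One common way to combine a recursive walk with all available threads in Go is a worker pool fed by filepath.WalkDir. The sketch below illustrates that pattern only; it is not Sisyphe-go's actual implementation, and the analyzeTree function and its callback are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// analyzeTree walks root recursively and calls analyze on every regular file,
// using one worker goroutine per available CPU. analyze must be safe to call
// from several goroutines at once.
func analyzeTree(root string, analyze func(path string)) error {
	paths := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				analyze(p)
			}
		}()
	}
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr // stop on the first unreadable entry
		}
		if d.Type().IsRegular() {
			paths <- p
		}
		return nil
	})
	close(paths) // lets the workers finish once the queue is drained
	wg.Wait()
	return err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	var mu sync.Mutex
	count := 0
	err := analyzeTree(root, func(string) {
		mu.Lock()
		count++
		mu.Unlock()
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	fmt.Println("regular files visited:", count)
}
```

The unbuffered channel keeps memory use flat on large corpora: the walker only advances as fast as the workers consume paths.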

The results of Sisyphe-go are written to outputpath/{timestamp}-corpusName/ (errors, info, duration, ...)

Test

Run go test

For coverage, run go test -cover

Modules

  • PDF: relies on the PDF tools pdftotext and pdfinfo
  • XML: relies on the XML tools xmlstarlet and xmllint
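
As an illustration of how the PDF tools can be driven from Go, the sketch below calls pdfinfo and pdftotext through os/exec, parses the "Key: value" lines that pdfinfo prints, and counts words in the extracted text. It assumes both binaries are on the PATH; the helpers parsePdfinfo and countWords are hypothetical and not taken from pdf.go.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parsePdfinfo turns the "Key: value" lines printed by pdfinfo into a map.
// Only the first colon splits a line, so values such as dates stay intact.
func parsePdfinfo(out string) map[string]string {
	meta := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, found := strings.Cut(sc.Text(), ":")
		if found {
			meta[strings.TrimSpace(key)] = strings.TrimSpace(value)
		}
	}
	return meta
}

// countWords counts whitespace-separated tokens in text.
func countWords(text string) int {
	return len(strings.Fields(text))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . file.pdf")
		os.Exit(2)
	}
	pdf := os.Args[1]

	info, err := exec.Command("pdfinfo", pdf).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "pdfinfo failed:", err)
		os.Exit(1)
	}
	fmt.Println("pages:", parsePdfinfo(string(info))["Pages"])

	// "-" asks pdftotext to write the extracted text to standard output.
	text, err := exec.Command("pdftotext", pdf, "-").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "pdftotext failed:", err)
		os.Exit(1)
	}
	fmt.Println("words:", countWords(string(text)))
}
```

Keeping the parsing in pure functions means it can be unit-tested with go test without the external tools installed.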