Sisyphe-go is a Go command-line application for recursively analysing the directories and files of a scientific publishing corpus

@Nacim Nacim authored on 16 Mar 2022

Sisyphe-GO

Sisyphe is a generic terminal application, written in Go, that recursively analyses folders.

Requirements

Tested with Golang 1.17

Works on Linux/OSX/Windows

Mount a corpus folder, then run:

docker-compose up -d
docker exec -it sisyphe_go_go_1 go run . -n corpusName -p corpuspath -o outputpath

Install it locally

  1. Download the latest Sisyphe-go version
  2. Run: go build .
  3. ... that's it.

Help

go run . --help prints the usage message.

Options

--help      Output usage
-c          Configuration folder path
-n          Corpus name (default "test")
-o          Output directory where results are written
-p          Corpus path
-w          Count words in PDF files
-noindex    Disable indexing after processing
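
As an illustration of how the options above could be declared, here is a minimal sketch using Go's standard flag package. It is not taken from main.go: the Options struct, its field names, and the parseOptions function are hypothetical, and treating -w and -noindex as boolean switches is an assumption. Only the flag names and the "test" default come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options mirrors the documented flags. The struct itself is hypothetical.
type Options struct {
	ConfigDir  string // -c
	CorpusName string // -n
	OutputDir  string // -o
	CorpusPath string // -p
	CountWords bool   // -w (assumed to be a boolean switch)
	NoIndex    bool   // -noindex (assumed to be a boolean switch)
}

// parseOptions parses args (without the program name) into Options.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("sisyphe", flag.ContinueOnError)
	fs.StringVar(&o.ConfigDir, "c", "", "Configuration folder path")
	fs.StringVar(&o.CorpusName, "n", "test", "Corpus name")
	fs.StringVar(&o.OutputDir, "o", "", "Output directory where results are written")
	fs.StringVar(&o.CorpusPath, "p", "", "Corpus path")
	fs.BoolVar(&o.CountWords, "w", false, "Count words in PDF files")
	fs.BoolVar(&o.NoIndex, "noindex", false, "Disable indexing after processing")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	return o, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Note that the standard flag package stops parsing at the first non-flag argument, so with a parser like this one every option has to come before any positional argument.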

How does it work?

Start Sisyphe on a folder containing any kind of files.

go run . -p ~/Documents/customfolder/corpus -n corpusname -o outputpath

go run . -p ~/Documents/customfolder/corpus -n corpusname -c ~/Documents/customfolder/corpusResources -o outputpath

Sisyphe now runs in the background, using all of your computer's threads. Grab a coffee and wait: it will notify you when it's done :)
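
To make "recursive analysis using all threads" concrete, here is a minimal sketch of one common way to do it in Go: a directory walk that feeds file paths to a pool of worker goroutines sized to the number of CPUs. This is not Sisyphe's actual code. The analyseTree function and its analyse callback are hypothetical names, and the real per-file work (XML and PDF checks) is replaced by a no-op.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"runtime"
	"sync"
	"sync/atomic"
)

// analyseTree walks root recursively and hands every regular file to one of
// `workers` goroutines. It returns the number of files processed.
func analyseTree(root string, workers int, analyse func(path string)) (int64, error) {
	if workers < 1 {
		workers = 1
	}
	paths := make(chan string)
	var processed int64
	var wg sync.WaitGroup

	// Start the worker pool: each worker drains the channel until it is closed.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				analyse(p)
				atomic.AddInt64(&processed, 1)
			}
		}()
	}

	// Walk the tree in the calling goroutine and queue regular files.
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			paths <- path
		}
		return nil
	})

	close(paths)
	wg.Wait()
	return processed, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	// No-op analysis: a real tool would inspect each file here.
	n, err := analyseTree(root, runtime.NumCPU(), func(string) {})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("analysed %d files\n", n)
}
```

The unbuffered channel keeps memory use flat on large corpora, because the walk only advances as fast as the workers consume paths.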

Sisyphe writes its results to sisyphe/out/{timestamp}-corpusName/ (errors, info, duration..)

Test

Run go test

Modules

  • XML Usage of poppler function (pdftotext and pdfinfo)
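
The Poppler tools mentioned above, pdftotext and pdfinfo, are command-line utilities that operate on PDF files. The sketch below shows one way a Go program might call pdfinfo and read its "Key: value" output. It is an illustration only, not the code in pdf.go: the pdfInfo and parsePdfInfo functions are hypothetical, and it assumes pdfinfo is installed and on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parsePdfInfo turns the "Key:   value" lines printed by pdfinfo into a map.
// Only the first colon splits a line, so values such as dates keep their colons.
func parsePdfInfo(out string) map[string]string {
	info := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		i := strings.Index(line, ":")
		if i < 0 {
			continue // skip blank or malformed lines
		}
		info[strings.TrimSpace(line[:i])] = strings.TrimSpace(line[i+1:])
	}
	return info
}

// pdfInfo runs `pdfinfo <path>` and parses its standard output.
func pdfInfo(path string) (map[string]string, error) {
	out, err := exec.Command("pdfinfo", path).Output()
	if err != nil {
		return nil, err
	}
	return parsePdfInfo(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . file.pdf")
		os.Exit(2)
	}
	info, err := pdfInfo(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("pages:", info["Pages"])
}
```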