![sisyphe](./docs/logo-sisyphe.jpg)

## Sisyphe-GO

Sisyphe is a generic recursive folder analyser for the terminal, written in Go.

![Sisyphe-pic](./docs/sisyphe.gif)

### Requirements

Tested with Golang 1.17.

Works on Linux/OSX/Windows.

Mount a corpus folder, then run:

```bash
docker-compose up -d
docker exec -it sisyphe_go_go_1 go run . -n corpusname -c corpuspath -o outputpath
```

### Install locally

1. Download the latest Sisyphe-go version
2. Run `go build .`
3. ... that's it.

### Help

`go run . --help`

Prints the usage help.

### Options

- `--help`: Output usage
- `-c`: Configuration folder path
- `-n`: Corpus name (default "test")
- `-o`: Output directory where results are written
- `-p`: Corpus path
- `-w`: Count words in PDF files

### How it works

Just start Sisyphe on a folder with any files in it.

`go run . ~/Documents/customfolder/corpus -n corpusname -o outputpath`

`go run . ~/Documents/customfolder/corpus -n corpusname -c ~/Documents/customfolder/corpusResources -o outputpath`

Sisyphe now runs in the background using all of your computer's threads.

Grab a coffee and wait; it will notify you when it's done :)

The results are written to `sisyphe/out/{timestamp}-corpusname/` (errors, info, duration...)

![Sisyphe-dashboard](./docs/sisyphe-monitor.gif)

### Modules

Sisyphe ships with a list of default modules (focused on XML and PDF). These URLs need to be updated once the branch has been merged.

- [FILETYPE](https://github.com/istex/sisyphe/tree/master/src/worker/filetype) Detects MIME type, extension, corrupted files...
- [PDF](https://github.com/istex/sisyphe/tree/master/src/worker/pdf) Gets info from PDF files (version, author, meta...)
- [XML](https://github.com/istex/sisyphe/tree/master/src/worker/xml) Checks that files are well-formed, validates them against DTDs, gets elements from tags...
- [XPATH](https://github.com/istex/sisyphe/tree/master/src/worker/xpath) Will generate a complete list of xpaths from submitted folder
- [OUT](https://github.com/istex/sisyphe/tree/master/src/worker/out) Will export data to json file & ElasticSearch database