Building a Personal PDF Toolbox: stirling-pdf

Published 1 year ago
# Docker installation

```
sudo docker pull frooodle/s-pdf
sudo docker run -d --name pdfTool -p 8080:8080 frooodle/s-pdf
```

# Manual installation

1. Download the jar from the [official site](https://www.stirlingpdf.com/) or [GitHub](https://github.com/Stirling-Tools/Stirling-PDF). Downloads from both locations are fairly slow.
2. Install Java
   - Via the package manager:
     ```
     sudo apt install openjdk-21-jdk
     ```
   - If no package is available, install it manually. [OpenJDK 21](https://jdk.java.net/archive/) is recommended. After downloading, extract it to `/usr/local/` and add the following to `~/.bashrc`:
     ```
     export JAVA_HOME=/usr/local/jdk-21
     export PATH=$JAVA_HOME/bin:$PATH
     ```
3. Install the dependencies
   ```
   sudo apt install -y libreoffice-writer libreoffice-calc libreoffice-impress tesseract-ocr tesseract-ocr-chi-sim python3 python3-pip
   # ocrmypdf may also be needed here; I will try it when I have time

   # Point pip3 at a mirror inside China. Alternatives:
   # https://mirrors.aliyun.com/pypi/simple/
   # https://mirrors.cloud.tencent.com/pypi/simple/
   # Set the global index URL (Tsinghua University mirror as an example)
   pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
   # Set the trusted host (avoids SSL problems)
   pip3 config set install.trusted-host pypi.tuna.tsinghua.edu.cn

   pip3 install uno opencv-python-headless unoserver pngquant WeasyPrint
   ```
4. Install and run Stirling-PDF
   ```
   sudo mkdir /opt/Stirling-PDF
   sudo mv /path/to/Stirling-PDF-*.jar /opt/Stirling-PDF/
   sudo mv /path/to/source/scripts/ /opt/Stirling-PDF/
   # scripts contains init.sh, init-without-ocr.sh and other scripts. The line
   #   cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata
   # in them is the key to getting OCR working.
   export SYSTEM_DEFAULTLOCALE="zh-CN"
   # export SERVER_HOST="0.0.0.0"   # optional
   export SERVER_PORT="8080"
   export SECURITY_ENABLELOGIN=true
   java -jar /opt/Stirling-PDF/Stirling-PDF-with-login.jar
   ```
5. The default username is `admin` and the default password is `stirling`.

# The official build-and-install guide

To run the application without Docker/Podman, you will need to manually install all dependencies and build the necessary components.

Note that some dependencies might not be available in the standard repositories of all Linux distributions and may require additional steps to install.

The following guide assumes you have a basic understanding of using a command line interface in your operating system. It should work on most Linux distributions and macOS.
For Windows, you might need to use Windows Subsystem for Linux (WSL) for certain steps. The long list of dependencies is there to reduce the overall install size: for example, it installs individual LibreOffice sub-components rather than the full LibreOffice package.

If your distribution ships outdated packages or lacks some of them, you could in theory use Distrobox/Toolbox. At that point, though, you might just as well use the Docker container.

### Step 1: Prerequisites

Install the following software, if not already installed:

* Java 17 or later (21 recommended)
* Gradle 7.0 or later (included within the repo, so not needed on the server)
* Git
* Python 3.8 (with pip)
* Make
* GCC/G++
* Automake
* Autoconf
* libtool
* pkg-config
* zlib1g-dev
* libleptonica-dev

The commands below are for Debian-based systems (the official guide also covers Fedora-based systems and the Nix package manager):

```bash
sudo apt-get update
sudo apt-get install -y git automake autoconf libtool \
    libleptonica-dev pkg-config zlib1g-dev make g++ \
    openjdk-21-jdk python3 python3-pip
```

### Step 2: Clone and Build jbig2enc (Only required for certain OCR functionality)

On Debian (the official guide also covers Fedora and the Nix package manager):

```bash
mkdir -p ~/.git
cd ~/.git &&\
git clone https://github.com/agl/jbig2enc.git &&\
cd jbig2enc &&\
./autogen.sh &&\
./configure &&\
make &&\
sudo make install
```

### Step 3: Install Additional Software

Next we need to install LibreOffice for conversions, tesseract for OCR, and opencv for pattern recognition functionality.
Install the following software:

* libreoffice (libreoffice-core libreoffice-common libreoffice-writer libreoffice-calc libreoffice-impress)
* python3-uno
* unoserver
* pngquant
* tesseract
* opencv-python-headless

On Debian-based systems (the official guide also covers Fedora-based systems and the Nix package manager):

```bash
# On Debian the tesseract package is named tesseract-ocr
sudo apt-get install -y libreoffice-writer libreoffice-calc libreoffice-impress tesseract-ocr
pip3 install uno opencv-python-headless unoserver pngquant WeasyPrint --break-system-packages
```

### Step 4: Grab latest Stirling-PDF Jar

The JAR can be downloaded in two versions, [normal](https://files.stirlingpdf.com/Stirling-PDF.jar) and [security](https://files.stirlingpdf.com/Stirling-PDF-with-login.jar).

### Step 5: Move jar to desired location

You can move this file to a desired location, for example `/opt/Stirling-PDF/`. You must also move the `scripts` folder from the Stirling-PDF repo that you downloaded into the same directory. This folder is required for the Python scripts that use OpenCV.

On Debian, as root (the official guide also covers Fedora as root and Nix as non-root):

```bash
sudo mkdir /opt/Stirling-PDF &&\
sudo mv ./build/libs/Stirling-PDF-*.jar /opt/Stirling-PDF/ &&\
sudo mv scripts /opt/Stirling-PDF/ &&\
echo "Scripts installed."
```

### Step 6: OCR Language Support

If you plan to use the OCR (Optical Character Recognition) functionality on non-English documents, you might need to install language packs for Tesseract.

On Debian-based systems (the official guide also covers Fedora-based systems, the Nix package manager, and manual installation):

```bash
sudo apt update
# All languages:
# sudo apt install -y 'tesseract-ocr-*'
# Find languages:
apt search tesseract-ocr-
# View installed languages:
dpkg-query -W 'tesseract-ocr-*' | sed 's/tesseract-ocr-//g'
```

### Step 7: Run Stirling-PDF

On Debian-based systems (the official guide also covers Fedora-based systems and the Nix package manager):

```bash
java -jar /opt/Stirling-PDF/Stirling-PDF-*.jar
```

### Step 8: Adding a Desktop Icon

This will add a modified app launcher to your application menu.
```bash
# Run from the root of the Stirling-PDF repo so the icon path resolves
image=$(pwd)/docs/stirling-transparent.svg

cat > ~/.local/share/applications/Stirling-PDF.desktop <<EOF
[Desktop Entry]
Name=Stirling PDF
GenericName=Launch StirlingPDF and open its WebGUI
Categories=Office;
Exec=sh -c "xdg-open http://localhost:8080; nohup java -jar /opt/Stirling-PDF/Stirling-PDF-*.jar >/dev/null 2>&1 &"
Icon=$image
Keywords=pdf;
Type=Application
NoDisplay=false
Terminal=true
EOF
```

Note: Currently the app will run in the background until manually closed.

### Optional: Changing the Host and Port

To override the default configuration, you can add the following to the `/.git/Stirling-PDF/configs/custom_settings.yml` file:

```yaml
server:
  host: 0.0.0.0
  port: 3000
```

For systemd, add the following to the .env file instead (see the section on running as a service for how environment variables are set):

```bash
SERVER_HOST="0.0.0.0"
SERVER_PORT="3000"
```

**Note:** The file `custom_settings.yml` is created after the first application launch. To have it before that, you can create the directory and add the file yourself.

### Optional: Run Stirling-PDF as a service (requires root).

First create a .env file, where you can store environment variables:

```text
touch /opt/Stirling-PDF/.env
```

In this file you can add all variables, one variable per line, as stated in the main readme (for example `SYSTEM_DEFAULTLOCALE="de-DE"`).

Create a new file to hold the service settings and open it with the nano editor:

```text
nano /etc/systemd/system/stirlingpdf.service
```

Paste the following content, making sure to update the filename of the jar file.
Press Ctrl+S and Ctrl+X to save and exit the nano editor:

```text
[Unit]
Description=Stirling-PDF service
After=syslog.target network.target

[Service]
SuccessExitStatus=143

User=root
Group=root

Type=simple

EnvironmentFile=/opt/Stirling-PDF/.env
WorkingDirectory=/opt/Stirling-PDF
ExecStart=/usr/bin/java -jar Stirling-PDF-0.17.2.jar
ExecStop=/bin/kill -15 $MAINPID

[Install]
WantedBy=multi-user.target
```

Notify systemd that it has to rebuild its internal service database (you have to run this command every time you change the service file):

```text
sudo systemctl daemon-reload
```

Enable the service so that it starts automatically at boot:

```text
sudo systemctl enable stirlingpdf.service
```

See the status of the service:

```text
sudo systemctl status stirlingpdf.service
```

Manually start/stop/restart the service:

```text
sudo systemctl start stirlingpdf.service
sudo systemctl stop stirlingpdf.service
sudo systemctl restart stirlingpdf.service
```

### Starting unoserver alongside Stirling PDF

To ensure that unoserver is running alongside Stirling PDF, start it with the following command:

```bash
unoserver --port 2003 --interface 0.0.0.0
```

You can add this command to your startup script or systemd service file to ensure it starts automatically with Stirling PDF.

### Customizing Paths in settings.yml

If the install path is different, it can be customized in `settings.yml`:

```yaml
system:
  customPaths:
    pipeline:
      watchedFoldersDir: "" #Defaults to /pipeline/watchedFolders
      finishedFoldersDir: "" #Defaults to /pipeline/finishedFolders
    operations:
      weasyprint: "" #Defaults to /opt/venv/bin/weasyprint
      unoconvert: "" #Defaults to /opt/venv/bin/unoconvert
```
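
### Example: a combined startup script (sketch)

The guide says the unoserver command can go into a startup script but does not show one. The following is a minimal sketch of what such a script might look like; it is not part of Stirling-PDF. The names `APP_DIR`, `UNO_PORT`, `find_jar` and `start_all` are made up for illustration, and the script assumes the jar lives in `/opt/Stirling-PDF` as in the steps above. It picks whichever `Stirling-PDF*.jar` it finds first, so the filename does not have to be hard-coded, and it does not wait for unoserver to finish starting before launching the jar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start unoserver, then Stirling-PDF.
# APP_DIR and UNO_PORT are illustrative names; override them via the environment.
APP_DIR="${APP_DIR:-/opt/Stirling-PDF}"
UNO_PORT="${UNO_PORT:-2003}"

# Print the first Stirling-PDF jar found in the given directory.
# Returns non-zero if the directory contains no matching jar.
find_jar() {
    local dir="$1" jar
    for jar in "$dir"/Stirling-PDF*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

start_all() {
    local jar uno_pid
    jar="$(find_jar "$APP_DIR")" || {
        echo "no Stirling-PDF jar found in $APP_DIR" >&2
        return 1
    }
    # Same unoserver command as in the guide, run in the background
    unoserver --port "$UNO_PORT" --interface 0.0.0.0 &
    uno_pid=$!
    # Double quotes on purpose: the PID is expanded now, while it is still in scope
    trap "kill $uno_pid 2>/dev/null" EXIT
    # Run Stirling-PDF in the foreground; unoserver is stopped when the script exits
    java -jar "$jar"
}

# Only launch when called as: ./start-stirling.sh start
if [ "${1:-}" = "start" ]; then
    start_all
fi
```

If you save this as, say, `/opt/Stirling-PDF/start-stirling.sh` (a made-up name), you could run it with `./start-stirling.sh start`, or point the `ExecStart` line of the service file at it instead of calling `java` directly.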