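PDFBox's built-in HTML conversion does not produce very good results. pdfdom (the pdf2dom artifact) is built on top of PDFBox and improves its HTML conversion.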
Maven
    <dependency>
        <groupId>net.sf.cssbox</groupId>
        <artifactId>pdf2dom</artifactId>
        <version>1.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox-tools</artifactId>
        <version>2.0.4</version>
    </dependency>
Usage
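    import java.io.*;
    import javax.xml.parsers.ParserConfigurationException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.fit.pdfdom.PDFDomTree;

    public void generateHTMLFromPDF(String filename) throws IOException, ParserConfigurationException {
        // try-with-resources closes the document and the writer even if conversion fails
        try (PDDocument pdf = PDDocument.load(new File(filename));
             Writer output = new PrintWriter("pdf.html", "utf-8")) {
            new PDFDomTree().writeText(pdf, output);
        }
    }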
Or
    public void convertPdf2Html(File input, Writer out) throws IOException, ParserConfigurationException {
        // Close the document when done; the caller remains responsible for closing out
        try (PDDocument pdf = PDDocument.load(input)) {
            PDFDomTree tree = new PDFDomTree();
            tree.writeText(pdf, out);
        }
    }
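The first method always writes to a fixed file named pdf.html, while the second leaves the choice of Writer to the caller. As a minimal sketch of how a caller might derive the output file from the input file name and open a UTF-8 Writer for it, using only the JDK: the class HtmlOutputPaths and both of its methods are hypothetical names introduced here for illustration and are not part of pdfdom or PDFBox.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of pdfdom or PDFBox.
class HtmlOutputPaths {

    // Maps e.g. "docs/report.pdf" to "docs/report.html" by replacing the
    // last extension of the file name; names without an extension simply
    // get ".html" appended. Assumes the path has a file name component.
    static Path htmlPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".html");
    }

    // Opens a UTF-8 writer next to the input file; the caller closes it.
    static Writer openHtmlWriter(Path pdf) throws IOException {
        return Files.newBufferedWriter(htmlPathFor(pdf), StandardCharsets.UTF_8);
    }
}
```

The Writer returned by openHtmlWriter could then be opened in a try-with-resources block and passed as the out argument of convertPdf2Html, so that each input PDF gets its own HTML file instead of overwriting pdf.html.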
Original article: "Converting PDF to HTML with pdfdom", https://www.cnblogs.com/x54256/p/8820471.html