Converting Word to PDF and HTML to PDF (Exploration)

Published: 2023-09-06 02:14 | Editor: 董明明 | Keywords: HTML, pdf, word
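Hunting down the dependency jars was hard work.

Which jar provides the ITextRenderer and ITextFontResolver classes, and where can it be downloaded? I agonized over this for three hours.

Found it at last!
For the record, here is the URL:
http://www.java2s.com/Code/Jar/c/Downloadcorerendererr8pre2jar.htm
Here is the test code: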

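    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.xhtmlrenderer.pdf.ITextFontResolver;
    import org.xhtmlrenderer.pdf.ITextRenderer;
    import com.lowagie.text.pdf.BaseFont; // iText 2.x package, assuming the jar that core-renderer R8 builds against

    /* Convert HTML to PDF */
    public static boolean convertHtmlToPdf(String inputFile, String outputFile, String imagePath)
            throws Exception {
        // try-with-resources closes the stream even if rendering fails
        try (OutputStream os = new FileOutputStream(outputFile)) {
            ITextRenderer renderer = new ITextRenderer();
            String url = new File(inputFile).toURI().toURL().toString();
            renderer.setDocument(url);
            // Chinese text support: register a font that contains the glyphs
            ITextFontResolver fontResolver = renderer.getFontResolver();
            fontResolver.addFont("C:/Windows/Fonts/simsunb.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
            // Resolve relative image paths against this base URL
            renderer.getSharedContext().setBaseURL("file:/" + imagePath); // e.g. D:/test
            renderer.layout();
            renderer.createPDF(os);
            os.flush();
        }
        return true;
    }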
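The base URL above is built by concatenating "file:/" with a path, which leaves spaces and non-ASCII characters unescaped. A minimal sketch of an alternative, using only the JDK, derives the URL the same way the test code already does for the input document. The class and method names (BaseUrlSketch, toBaseUrl) are hypothetical and not part of the original code.

```java
import java.io.File;
import java.net.MalformedURLException;

final class BaseUrlSketch {

    /**
     * Turns a local directory path into a file: URL string.
     * File.toURI() escapes characters such as spaces, and appends a
     * trailing slash only when the path is an existing directory.
     */
    static String toBaseUrl(String directory) {
        try {
            return new File(directory).toURI().toURL().toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable path: " + directory, e);
        }
    }

    public static void main(String[] args) {
        // The temp directory stands in for the image folder (D:/test in the article).
        System.out.println(toBaseUrl(System.getProperty("java.io.tmpdir")));
    }
}
```

The resulting string could then be passed to setBaseURL in place of the concatenated one; whether that changes how images resolve in a given setup is something to verify locally.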

Call it and off we go!

Here I combine it with the previous post, which used POI to convert Word to HTML.
    /* doc to HTML */
    String tagPath = "D:\\red_ant_file\\20180915\\image\\";
    String sourcePath = "D:\\red_ant_file\\20180915\\RedAnt的实验作业.doc";
    String outPath = "D:\\red_ant_file\\20180915\\123.html";
    try {
        AllServiceIsHere.docToHtml(tagPath, sourcePath, outPath);
    } catch (Exception e) {
        e.printStackTrace();
    }

    /* HTML to PDF */
    String pdfPath = "D:\\red_ant_file\\20180915\\456.pdf";
    try {
        AllServiceIsHere.convertHtmlToPdf(outPath, pdfPath, tagPath);
    } catch (Exception e) {
        e.printStackTrace();
    }

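[Note]
One thing to watch out for: when iText generates a PDF from HTML, it checks that the HTML is well-formed. HTML produced by the POI conversion, for example, leaves some tags without the closing " / ". If they stay unclosed, you will run into:
Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 23; columnNumber: 3; The element type "meta" must be terminated by the matching end-tag "</meta>".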
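To see this kind of failure without running the whole PDF pipeline, the markup can be fed to the JDK's own XML parser first. The following is a minimal sketch under that assumption; WellFormedCheck and firstError are hypothetical names, and the test strings carry no DOCTYPE, so nothing is fetched over the network.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    /** Returns null when the markup parses as XML, otherwise the parser's message. */
    static String firstError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXException e) {
            // The message text depends on the JVM locale.
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unclosed = "<html><head><meta charset=\"UTF-8\"></head><body></body></html>";
        String closed = "<html><head><meta charset=\"UTF-8\"/></head><body></body></html>";
        System.out.println("unclosed meta: " + firstError(unclosed));
        System.out.println("closed meta:   " + firstError(closed));
    }
}
```

A check like this only tests XML well-formedness; it says nothing about whether the renderer will lay the page out as expected.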

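I tried the third-party Jsoup jar: call its parse method directly and, as far as I can tell, the HTML comes out well-formed!
This pitfall cost me an hour of frustration.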

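Because of this, I had to rewrite the Word-to-HTML code.
One more URL for the record, for downloading the third-party Jsoup jar:
https://jsoup.org/download
Here is the rewritten Word-to-HTML code: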

    import java.io.*;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.PicturesManager;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.usermodel.Picture;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.jsoup.Jsoup;
    import org.w3c.dom.Document;

    // Word to HTML
    public static void convert2Html(String fileName, String outPutFile) throws Exception {
        // HWPFDocument reads the binary .doc format; .docx files need XWPF instead
        HWPFDocument wordDocument;
        try (InputStream in = new FileInputStream(fileName)) {
            wordDocument = new HWPFDocument(in);
        }
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            @Override
            public String savePicture(byte[] content, PictureType pictureType, String suggestedName,
                    float widthInches, float heightInches) {
                // Path written into each img src, relative to the HTML file
                return "test/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);

        // Save the embedded pictures
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                try (OutputStream picOut = new FileOutputStream("D:/test/" + pic.suggestFullFileName())) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        // Decode as UTF-8 to match the serializer's encoding, not the platform default
        writeFile(out.toString("UTF-8"), outPutFile);
    }

    // Write the HTML file
    public static void writeFile(String content, String path) {
        // Re-parse with Jsoup so that unclosed tags are normalized
        org.jsoup.nodes.Document doc = Jsoup.parse(content);
        // Assumption: newer Jsoup releases print <meta> without a closing slash by default,
        // so XML output syntax is requested explicitly
        doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
        content = doc.html();
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), "UTF-8"))) {
            bw.write(content);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
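The serializer above is set to UTF-8, so its bytes should be decoded and written back as UTF-8 too; relying on the platform default charset can garble Chinese text on systems whose default differs. A minimal JDK-only sketch of that round trip follows. Utf8RoundTrip and its methods are hypothetical names, and a temporary file stands in for the real output path.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8RoundTrip {

    /** Decodes buffered serializer output as UTF-8 rather than the platform default. */
    static String decode(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Writes the text to a temp file as UTF-8, reads it back, and returns what was read. */
    static String roundTripThroughTempFile(String content) {
        try {
            Path tmp = Files.createTempFile("roundtrip", ".html");
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                writer.write(content); // the writer is closed even if write() throws
            }
            String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            Files.delete(tmp);
            return readBack;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes spell the Chinese words for "lab assignment".
        String html = "<p>\u5b9e\u9a8c\u4f5c\u4e1a</p>";
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        System.out.println(roundTripThroughTempFile(decode(out)).equals(html));
    }
}
```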

Prepare a file and test it.

    String source = "D:\\red_ant_file\\20180915\\1303\\RedAnt的实验作业.doc";
    String out = "D:\\red_ant_file\\20180915\\1303\\789.html";
    try {
        AllServiceIsHere.convert2Html(source, out);
    } catch (Exception e) {
        e.printStackTrace();
    }

The result of the Word-to-HTML conversion after the markup was normalized.

Next, HTML to PDF.

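[Afterword]

I did get this working in the end and produced a PDF with this approach.
In practice, though, you run into all sorts of bizarre pitfalls, so I do not recommend it.
The reason is that HTML's rules and writing styles keep changing, and the HTML-to-PDF step will keep reporting tag errors of one kind or another.
I am posting this code purely to have a clear record of what my attempt produced, or perhaps as a starting point for optimizing it and finding a proper solution.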

Original article: http://blog.51cto.com/13479739/2175543
