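Extracting the main text from HTML in C++: how do I get the content of an html or shtml file with C/C++?
There should be open-source HTML parsers you can borrow from. I did some searching for you and found the following: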
Steev's HTML Parser
Steev's HTML Parser is an HTML parsing library that builds a complete hierarchy for each element and attribute in the supplied HTML file. Each element is its own C++ class, replete with child nodes, allowing for full control and processing. An 'HTML beautifier' example is included.
URL: http://freshmeat.net/projects/steevshtmlparser/
htmlcxx
htmlcxx is a simple non-validating CSS1 and HTML parser for C++. The parsing policy attempts to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those created by Firefox. However, it does not insert nonexistent stuff in your HTML. Therefore, serializing the DOM tree gives exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API provided by the tree.hh template library.
URL: http://freshmeat.net/projects/htmlcxx/
Xport toolkit
Xport is a C++ template class library that can be included in any C++ project to enable the creation and generation of XHTML documents. Although it was developed with the idea of creating XHTML documents for reporting purposes, Xport can be used to create XHTML documents for many other uses as well. It can easily generate and parse (X)HTML documents and stylesheets. It is intuitive to use, and allows many options for parsing and generating documents.
URL: http://freshmeat.net/projects/xporttoolkit/
I found these by searching the freshmeat site for the keywords "html parse". All three are open source, and the last one looks quite capable. Hope this helps.
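If pulling in one of these libraries is more than you need, here is a rough sketch that uses only the C++ standard library: read the file into a string, then drop everything between '<' and '>'. The helper names read_file and strip_tags are made up for this illustration and do not come from any of the libraries above. This is not a real HTML parser: it does not decode entities such as &amp;amp;, it keeps the text inside script and style elements, and it treats every tag as a word break. For anything beyond simple pages, a real parser is the safer choice.

```cpp
#include <cctype>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Hypothetical helper: remove everything between '<' and '>' and
// collapse runs of whitespace (and tags) into a single space.
// Assumption: no '<' or '>' appears inside attribute values or text.
std::string strip_tags(const std::string& html) {
    std::string out;
    bool in_tag = false;         // currently between '<' and '>'
    bool pending_space = false;  // a separator is owed before the next character
    for (unsigned char c : html) {
        if (in_tag) {
            if (c == '>') {
                in_tag = false;
                pending_space = true;  // treat the tag as a word break
            }
            continue;
        }
        if (c == '<') {
            in_tag = true;
            continue;
        }
        if (std::isspace(c)) {
            pending_space = true;
            continue;
        }
        if (pending_space && !out.empty()) {
            out += ' ';
        }
        pending_space = false;
        out += static_cast<char>(c);
    }
    return out;
}
```

A call such as strip_tags(read_file("page.shtml")) would then return the visible text of the page as one line, where page.shtml is only a placeholder file name.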