{"id":24960,"date":"2022-07-30T15:22:50","date_gmt":"2022-07-30T14:22:50","guid":{"rendered":"https:\/\/ekiwi-blog.de\/?p=24960"},"modified":"2022-12-26T11:13:26","modified_gmt":"2022-12-26T10:13:26","slug":"vba-read-pdf","status":"publish","type":"post","link":"https:\/\/ekiwi-blog.de\/en\/24960\/vba-read-pdf\/","title":{"rendered":"VBA read PDF"},"content":{"rendered":"<p>With <abbr title=\"Visual Basic for Application\">VBA<\/abbr> you can write many helpful macros to automate work processes within Microsoft Office. A very useful function would be to read values from a PDF file and transfer them to Excel or Word using VBA. This would avoid tedious typing or manual copying from the PDF into Excel or Word and save a lot of time.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of content<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/ekiwi-blog.de\/en\/24960\/vba-read-pdf\/#Preparation_Convert_PDF_to_text_file\" >Preparation: Convert PDF to text file<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/ekiwi-blog.de\/en\/24960\/vba-read-pdf\/#VBA-Code\" >VBA-Code<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Preparation_Convert_PDF_to_text_file\"><\/span>Preparation: Convert PDF to text file<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To accomplish this requires a little preparation. Those who have the paid version of Adobe Acrobat can try it directly through the library. But since most will not have the paid version, we use a command line tool to convert the PDF file to a text file, and then read the text file. This work is done by the <a href=\"https:\/\/www.xpdfreader.com\/index.html\" target=\"_blank\" rel=\"noopener\">pdftotext.exe of XpdfReader<\/a>. Simply download the tools and place the exe (<em>pdftotext.exe<\/em>) in any folder. The exe-file can also be located in the <a href=\"https:\/\/ekiwi-blog.de\/en\/19023\/what-is-macosx-folder\/\">same folder<\/a> path as the PDF files to be read.<\/p>\n<p>The readout of the PDF is to be done using the example of an invoice. Let&#8217;s assume we have the following invoice and want to read the total amount from it and write it to an Excel file.<\/p>\n<figure id=\"attachment_24957\" aria-describedby=\"caption-attachment-24957\" style=\"width: 499px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-24957 size-full\" title=\"Convert PDF to Excel, Word with VBA\" src=\"http:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf.jpg\" alt=\"Screenshot of PDF-Invoice as an expample\" width=\"499\" height=\"700\" srcset=\"https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf.jpg 499w, https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-214x300.jpg 214w\" sizes=\"auto, (max-width: 499px) 100vw, 499px\" \/><figcaption id=\"caption-attachment-24957\" class=\"wp-caption-text\">pdf invoice example<\/figcaption><\/figure>\n<p>In this example we have placed the PDF and the <em>pdftotext.exe<\/em> in the same directory.<\/p>\n<figure id=\"attachment_24955\" aria-describedby=\"caption-attachment-24955\" style=\"width: 698px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-24955 size-full\" title=\"readout pdf using VBA\" src=\"http:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-file.jpg\" alt=\"Screenshot File Explorer with pdf file that shall be read out and pdftotext.exe\" width=\"698\" height=\"161\" srcset=\"https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-file.jpg 698w, https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-file-300x69.jpg 300w\" sizes=\"auto, (max-width: 698px) 100vw, 698px\" \/><figcaption id=\"caption-attachment-24955\" class=\"wp-caption-text\">Example PDF for export text values to microsoft office, e.g. Excel, Word, Access<\/figcaption><\/figure>\n<p>In order to be able to read the values of the PDF file correctly later with a VBA macro, we must first understand what the text file looks like that the command line tool creates. This is the only way to parse the text file correctly later. The best way to do this is to first create the text file manually using the command prompt (cmd). There you can also learn the syntax, which we will need later in the VBA code. The text file is automatically saved in the same directory and under the same name as the PDF file. So if the PDF file is called <em>Invoice-2022-5340.pdf<\/em>, then the text file is called <em>Invoice-2022-5340.txt<\/em>.<\/p>\n<p>For our example, the generated text file looks like this:<\/p>\n<figure id=\"attachment_24959\" aria-describedby=\"caption-attachment-24959\" style=\"width: 564px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-24959 size-full\" title=\"extract text from pdf with vba without using adobe acrobat\" src=\"http:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-text-file.jpg\" alt=\"Screenshot converted text from PDF for further use in Microsoft Office (Excel, Word, Access)\" width=\"564\" height=\"679\" srcset=\"https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-text-file.jpg 564w, https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-text-file-249x300.jpg 249w\" sizes=\"auto, (max-width: 564px) 100vw, 564px\" \/><figcaption id=\"caption-attachment-24959\" class=\"wp-caption-text\">Readed text from PDF within txt-file<\/figcaption><\/figure>\n<p>If you know the structure of the text file, you can think about the best way to process\/parse the text and how to get the desired information.<\/p>\n<p><strong><u>Note:<\/u><\/strong> The <em>pdftotext.exe<\/em> tool is <a href=\"https:\/\/ekiwi-blog.de\/en\/17865\/windows-11-drag-and-drop-on-taskbar-does-not-work\/\" title=\"Windows 11 Drag &#038; Drop on Taskbar does not work\">not a guarantee that it will work<\/a> or that everything will be read correctly. There may be PDF files that cannot be read because perhaps the font is not known, the PDF is encrypted, copy-protected or similar. Also in this example it is noticeable that for example the Euro (\u20ac) sign has not been exported.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"VBA-Code\"><\/span>VBA-Code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now what does the VBA macro look like, for example, to read in the total price from the invoice?<br \/>\nThe following procedure:<\/p>\n<ol>\n<li>Read the text from the PDF and save it to a text file<\/li>\n<li>Open the text file and assign the text to a variable<\/li>\n<li>parse the text, preferably with regex, to get the desired values<\/li>\n<li>write the desired value into an <a href=\"https:\/\/ekiwi-blog.de\/en\/18466\/vba-excel-als-datum-formatieren\/\" title=\"VBA Excel format cell as date\">Excel cell<\/a> or into Word<\/li>\n<li>Delete the text file again<\/li>\n<\/ol>\n<pre><code><span style=\"color: #0000ff;\">Sub<\/span> ReadPDFFile()\r\n    <span style=\"color: #0000ff;\">Dim<\/span> WSHShell <span style=\"color: #0000ff;\">As Object<\/span>\r\n    <span style=\"color: #0000ff;\">Dim<\/span> FSO <span style=\"color: #0000ff;\">As Object<\/span>\r\n    <span style=\"color: #0000ff;\">Dim<\/span> regex <span style=\"color: #0000ff;\">As Object<\/span>\r\n    <span style=\"color: #0000ff;\">Dim<\/span> strCommand, pdfFilePath, txtFilePath, strText, txt <span style=\"color: #0000ff;\">As String<\/span>\r\n    <span style=\"color: #0000ff;\">On Error GoTo<\/span> ErrorHandling\r\n\r\n    <span style=\"color: #0000ff;\">Set<\/span> WSHShell = CreateObject(\"WScript.Shell\")\r\n    <span style=\"color: #0000ff;\">Set<\/span> FSO = CreateObject(\"Scripting.FileSystemObject\")\r\n\r\n    <span style=\"color: #339966;\">'<a title=\"iTunes.exe where is it found?\" href=\"https:\/\/ekiwi-blog.de\/en\/26678\/where-is-itunes-exe-found-location-directory-path\/\">'Directory path<\/a> to PDF file<\/span>\r\n    pdfFilePath = \"c:\\temp\\Invoice-2022-5340.pdf\"\r\n\r\n    <span style=\"color: #339966;\">'convert to text file with command line tool pdftotext.exe<\/span>\r\n    strCommand = \"\"\"c:\\temp\\pdftotext.exe\"\" -raw \" &amp; \"\"\"\" &amp; pdfFilePath &amp; \"\"\"\"\r\n    WSHShell.Run strCommand, 0, True\r\n    \r\n    <span style=\"color: #339966;\">'Determine path of text file and read text file<\/span>\r\n    txtFilePath = Replace(pdfFilePath, \".pdf\", \".txt\")\r\n    strText = FSO.OpenTextFile(txtFilePath).ReadAll\r\n    \r\n    <span style=\"color: #339966;\">'Parse text with regex<\/span>\r\n    <span style=\"color: #0000ff;\">Set<\/span> regex = CreateObject(\"vbscript.regexp\")\r\n    regex.Pattern = \"Total:\\s*\\d*,\\d{2}\"\r\n    <span style=\"color: #0000ff;\">Set<\/span> match1 = regex.Execute(strText)\r\n    <span style=\"color: #0000ff;\">If<\/span> match1.Count &gt; 0 <span style=\"color: #0000ff;\">Then<\/span>\r\n        txt = match1(0)\r\n        txt = Replace(txt, \"Total: \", \"\")\r\n        'Insert into Excel cell\r\n        Cells(3, 3).Value = txt\r\n    <span style=\"color: #0000ff;\">End If<\/span>\r\n    \r\n    <span style=\"color: #339966;\">'Delete text file again after work is done<\/span>\r\n    Kill (txtFilePath)\r\n\r\n    <span style=\"color: #0000ff;\">Set<\/span> regex = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n    <span style=\"color: #0000ff;\">Set<\/span> WSHShell = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n    <span style=\"color: #0000ff;\">Set<\/span> FSO = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n    \r\n    <span style=\"color: #0000ff;\">Exit Sub<\/span>\r\nErrorHandling:\r\n        <span style=\"color: #339966;\">'Perform a few cleanups in case of an error.<\/span>\r\n        Kill (txtFilePath)\r\n        <span style=\"color: #0000ff;\">Set<\/span> regex = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n        <span style=\"color: #0000ff;\">Set<\/span> WSHShell = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n        <span style=\"color: #0000ff;\">Set<\/span> FSO = <span style=\"color: #0000ff;\">Nothing<\/span>\r\n<span style=\"color: #0000ff;\">End Sub<\/span><\/code><\/pre>\n<p>This code reads the PDF file and writes the total amount of the invoice to the Excel cell <em>C3<\/em>.<\/p>\n<figure id=\"attachment_24953\" aria-describedby=\"caption-attachment-24953\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-24953 size-full\" title=\"VBA: read a specific string from pdf file\" src=\"http:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-excel.jpg\" alt=\"Screenshot read out PDF file via VBA\" width=\"650\" height=\"513\" srcset=\"https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-excel.jpg 650w, https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-excel-300x237.jpg 300w, https:\/\/ekiwi-blog.de\/wp-content\/uploads\/2022\/07\/vba-read-pdf-to-excel-100x80.jpg 100w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><figcaption id=\"caption-attachment-24953\" class=\"wp-caption-text\">Result: Values of PDF <a href=\"https:\/\/ekiwi-blog.de\/en\/49271\/vba-read-out-folder-and-directory-names-and-list-in-excel\/\">read out and inserted to excel<\/a> file<\/figcaption><\/figure>\n\n<div class=\"aawp\">\n\n            \n            \n<div class=\"aawp-product aawp-product--horizontal aawp-product--bestseller aawp-product--ribbon\"  data-aawp-product-asin=\"3367105732\" data-aawp-product-id=\"112260\" data-aawp-tracking-id=\"ekiwide0b-21\" data-aawp-product-title=\"VBA mit Excel  Das umfassende Handbuch f\u00fcr Einsteiger und fortgeschrittene Anwender Aktuell zu Excel 2024 und Microsoft 365 \u2013 Ausgabe 2025\">\n\n    <span class=\"aawp-product__ribbon aawp-product__ribbon--bestseller\">Bestseller No. 1<\/span>\n    <div class=\"aawp-product__thumb\">\n        <a class=\"aawp-product__image-link\"\n           href=\"https:\/\/www.amazon.de\/dp\/3367105732?tag=ekiwide0b-21&linkCode=ogi&th=1&psc=1&keywords=books%20vba\" title=\"VBA mit Excel: Das umfassende Handbuch f\u00fcr...\" rel=\"nofollow noopener sponsored\" target=\"_blank\">\n            <img decoding=\"async\" class=\"aawp-product__image\" src=\"https:\/\/m.media-amazon.com\/images\/I\/51EVPKCMSSL._SL160_.jpg\" alt=\"VBA mit Excel: Das umfassende Handbuch f\u00fcr...\"  \/>\n        <\/a>\n\n            <\/div>\n\n    <div class=\"aawp-product__content\">\n        <a class=\"aawp-product__title\" href=\"https:\/\/www.amazon.de\/dp\/3367105732?tag=ekiwide0b-21&linkCode=ogi&th=1&psc=1&keywords=books%20vba\" title=\"VBA mit Excel: Das umfassende Handbuch f\u00fcr...\" rel=\"nofollow noopener sponsored\" target=\"_blank\">\n            VBA mit Excel: Das umfassende Handbuch f\u00fcr...        <\/a>\n        <div class=\"aawp-product__description\">\n                    <\/div>\n    <\/div>\n\n    <div class=\"aawp-product__footer\">\n\n        <div class=\"aawp-product__pricing\">\n            \n                            <span class=\"aawp-product__price aawp-product__price--current\">49,90 EUR<\/span>\n            \n            <a href=\"https:\/\/www.amazon.de\/gp\/prime\/?tag=ekiwide0b-21\" title=\"Amazon Prime\" rel=\"nofollow noopener sponsored\" target=\"_blank\" class=\"aawp-check-prime\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ekiwi-blog.de\/wp-content\/plugins\/aawp\/assets\/img\/icon-check-prime.svg\" height=\"16\" width=\"55\" alt=\"Amazon Prime\" \/><\/a>        <\/div>\n\n                <a class=\"aawp-button aawp-button--buy aawp-button--icon aawp-button--icon-black\" href=\"https:\/\/www.amazon.de\/dp\/3367105732?tag=ekiwide0b-21&#038;linkCode=ogi&#038;th=1&#038;psc=1&#038;keywords=books%20vba\" title=\"Buy on Amazon\" target=\"_blank\" rel=\"nofollow noopener sponsored\">Buy on Amazon<\/a>\n            <\/div>\n\n<\/div>\n\n            \n            \n<div class=\"aawp-product aawp-product--horizontal aawp-product--bestseller aawp-product--ribbon\"  data-aawp-product-asin=\"3832803033\" data-aawp-product-id=\"111673\" data-aawp-tracking-id=\"ekiwide0b-21\" data-aawp-product-title=\"VBA mit Excel - Der leichte Einstieg  Vom ersten Makro zur eigenen Eingabemaske - F\u00fcr Excel 2010 bis 2021 365  Die anschauliche Anleitung f\u00fcr .. Eingabemaske - F\u00fcr Excel 2010 bis 2019\">\n\n    <span class=\"aawp-product__ribbon aawp-product__ribbon--bestseller\">Bestseller No. 2<\/span>\n    <div class=\"aawp-product__thumb\">\n        <a class=\"aawp-product__image-link\"\n           href=\"https:\/\/www.amazon.de\/dp\/3832803033?tag=ekiwide0b-21&linkCode=ogi&th=1&psc=1&keywords=books%20vba\" title=\"VBA mit Excel - Der leichte Einstieg: Vom ersten...\" rel=\"nofollow noopener sponsored\" target=\"_blank\">\n            <img decoding=\"async\" class=\"aawp-product__image\" src=\"https:\/\/m.media-amazon.com\/images\/I\/51Ssb4JGplL._SL160_.jpg\" alt=\"VBA mit Excel - Der leichte Einstieg: Vom ersten...\"  \/>\n        <\/a>\n\n            <\/div>\n\n    <div class=\"aawp-product__content\">\n        <a class=\"aawp-product__title\" href=\"https:\/\/www.amazon.de\/dp\/3832803033?tag=ekiwide0b-21&linkCode=ogi&th=1&psc=1&keywords=books%20vba\" title=\"VBA mit Excel - Der leichte Einstieg: Vom ersten...\" rel=\"nofollow noopener sponsored\" target=\"_blank\">\n            VBA mit Excel - Der leichte Einstieg: Vom ersten...        <\/a>\n        <div class=\"aawp-product__description\">\n                    <\/div>\n    <\/div>\n\n    <div class=\"aawp-product__footer\">\n\n        <div class=\"aawp-product__pricing\">\n            \n                            <span class=\"aawp-product__price aawp-product__price--current\">19,90 EUR<\/span>\n            \n            <a href=\"https:\/\/www.amazon.de\/gp\/prime\/?tag=ekiwide0b-21\" title=\"Amazon Prime\" rel=\"nofollow noopener sponsored\" target=\"_blank\" class=\"aawp-check-prime\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ekiwi-blog.de\/wp-content\/plugins\/aawp\/assets\/img\/icon-check-prime.svg\" height=\"16\" width=\"55\" alt=\"Amazon Prime\" \/><\/a>        <\/div>\n\n                <a class=\"aawp-button aawp-button--buy aawp-button--icon aawp-button--icon-black\" href=\"https:\/\/www.amazon.de\/dp\/3832803033?tag=ekiwide0b-21&#038;linkCode=ogi&#038;th=1&#038;psc=1&#038;keywords=books%20vba\" title=\"Buy on Amazon\" target=\"_blank\" rel=\"nofollow noopener sponsored\">Buy on Amazon<\/a>\n            <\/div>\n\n<\/div>\n\n    \n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>With VBA you can write many helpful macros to automate work processes within Microsoft Office. A very useful function would<\/p>\n","protected":false},"author":2,"featured_media":13630,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1555],"tags":[2252,2251,1702,1703,1704,2168,1700,1705,1706,1558],"class_list":["post-24960","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software-en","tag-code-en","tag-coding-en","tag-excel-en","tag-microsoft-en","tag-microsoft-office-en","tag-office-en","tag-programming","tag-vba-en","tag-visual-basic-for-application-en","tag-windows-en"],"_links":{"self":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts\/24960","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/comments?post=24960"}],"version-history":[{"count":0,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts\/24960\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/media\/13630"}],"wp:attachment":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/media?parent=24960"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/categories?post=24960"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/tags?post=24960"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}