Convert PDF to XML in memory using C# and VB .Net

Convert PDF to XML in memory using C# and VB .Net.


The application shows how to convert all tabular and even textual data from PDF to XML in memory. The output XML will be represented as System.String.

Complete code.

using System;
using System.IO;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            string pathToPdf = @"..\..\Table.pdf";
            string pathToXml = Path.ChangeExtension(pathToPdf, ".xml");

            byte[] pdf = File.ReadAllBytes(pathToPdf);
            string xml = null;

            // Convert PDF file to XML file.
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // This property is necessary only for registered version.
            //f.Serial = "XXXXXXXXXXX";

            // Let's convert all data (textual and tabular) to XML.
            f.XmlOptions.ConvertNonTabularDataToSpreadsheet = true;

            f.OpenPdf(pdf);

            if (f.PageCount > 0)
            {
                xml = f.ToXml();
                
                //Show XML document in browser
                if (!String.IsNullOrEmpty(xml))
                {
                    File.WriteAllText(pathToXml,xml);
                    System.Diagnostics.Process.Start(pathToXml);
                }
            }
        }
    }
}

Download.

        
            Imports System.IO
Imports System.Drawing.Imaging
Imports System.Collections.Generic
Imports SautinSoft

Module Sample

    Sub Main()
        Dim pathToPdf As String = "..\Table.pdf"
        Dim pathToXml As String = Path.ChangeExtension(pathToPdf, ".xml")

        Dim pdf() As Byte = File.ReadAllBytes(pathToPdf)
        Dim xml As String = Nothing

        ' Convert PDF file to XML file.
        Dim f As New SautinSoft.PdfFocus()

        ' This property is necessary only for registered version.
        'f.Serial = "XXXXXXXXXXX"

        ' Let's convert all data (textual and tabular) to XML.
        f.XmlOptions.ConvertNonTabularDataToSpreadsheet = True

        f.OpenPdf(pdf)

        If f.PageCount > 0 Then
            xml = f.ToXml()

            'Show XML document in browser
            If Not String.IsNullOrEmpty(xml) Then
                File.WriteAllText(pathToXml, xml)
                System.Diagnostics.Process.Start(pathToXml)
            End If
        End If
    End Sub
End Module

Download.


If you need a new code example or have a question: email us at support@sautinsoft.com or ask at Online Chat (right-bottom corner of this page) or use the Form below:



Questions and suggestions from you are always welcome!

We are developing .Net components since 2002. We know PDF, DOCX, RTF, HTML, XLSX and Images formats. If you need any assistance with creating, modifying or converting documents in various formats, we can help you. We will write any code example for you absolutely free.