SautinSoft.Pdf can read PDF files from C# or VB.NET applications at high speed: it can read the text of a 1,000-page PDF file (almost 500,000 words) in just 3 seconds.
Text extraction is straightforward. With a simple API and just a few lines of code, you can extract the entire text content of a PDF file into a single string, ready for further processing.
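The paragraph above mentions getting the whole text of a PDF file as one string. Here is a minimal sketch of that. It assumes the same `PdfDocument.Load`, `document.Pages` and `page.Content.ToString()` calls used in the complete code below. `ExtractAllText` is a hypothetical helper name introduced for this illustration, and separating pages with a line break (`AppendLine`) is a choice of this sketch, not a library default.

```csharp
using System;
using System.IO;
using System.Text;
using SautinSoft.Pdf;

namespace Sample
{
    class Sample
    {
        // Hypothetical helper: collects the text of every page into one string.
        // Assumes page.Content.ToString() returns the page text, as in the complete code.
        internal static string ExtractAllText(string pdfFile)
        {
            var text = new StringBuilder();
            using (var document = PdfDocument.Load(pdfFile))
            {
                foreach (var page in document.Pages)
                {
                    // Append the page text, followed by a line break between pages.
                    text.AppendLine(page.Content.ToString());
                }
            }
            return text.ToString();
        }

        static void Main(string[] args)
        {
            // Apply your license key here:
            // PdfDocument.SetLicense("...");
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");

            string allText = ExtractAllText(pdfFile);
            Console.WriteLine($"Extracted {allText.Length} characters.");
            Console.WriteLine(allText);
        }
    }
}
```

Returning a string rather than printing each page keeps the document open only while its pages are read. The result can then be searched, split or saved after the `using` block has disposed of the document.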
The following example shows how to easily read the text content of each page of a PDF document.
Complete code
using System;
using System.IO;
using SautinSoft;
using SautinSoft.Pdf;
using SautinSoft.Pdf.Content;

namespace Sample
{
    class Sample
    {
        /// <summary>
        /// Read the text content of each page of a PDF document.
        /// </summary>
        /// <remarks>
        /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/read-text-from-pdf-files.php
        /// </remarks>
        static void Main(string[] args)
        {
            // Before starting this example, please get a free 30-day trial key:
            // https://sautinsoft.com/start-for-free/
            // Apply the key here:
            // PdfDocument.SetLicense("...");

            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");

            // Load the PDF document.
            using (var document = PdfDocument.Load(pdfFile))
            {
                foreach (var page in document.Pages)
                {
                    // Write the text of the current page to the console.
                    Console.WriteLine(page.Content.ToString());
                }
            }
        }
    }
}
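To check the speed figure from the introduction against your own documents, you can time the extraction. The sketch below is an illustration, not SautinSoft's benchmark method. `MeasureTextExtraction` is a hypothetical helper that wraps `System.Diagnostics.Stopwatch` around the same `PdfDocument.Load` and `page.Content.ToString()` calls as the complete code above.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using SautinSoft.Pdf;

namespace Sample
{
    class Sample
    {
        // Hypothetical helper: loads the PDF, reads the text of every page, and returns
        // the elapsed time together with the page count and number of extracted characters.
        internal static (TimeSpan Elapsed, int Pages, long Characters) MeasureTextExtraction(string pdfFile)
        {
            var stopwatch = Stopwatch.StartNew();
            int pages = 0;
            long characters = 0;
            using (var document = PdfDocument.Load(pdfFile))
            {
                foreach (var page in document.Pages)
                {
                    characters += page.Content.ToString().Length;
                    pages++;
                }
            }
            stopwatch.Stop();
            return (stopwatch.Elapsed, pages, characters);
        }

        static void Main(string[] args)
        {
            // Apply your license key here:
            // PdfDocument.SetLicense("...");
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");

            var (elapsed, pages, characters) = MeasureTextExtraction(pdfFile);
            Console.WriteLine($"Read {characters} characters from {pages} pages in {elapsed.TotalSeconds:F2} s.");
        }
    }
}
```

The measured time includes loading the file. The first call in a process also includes JIT compilation, so run the measurement more than once on the same file before comparing numbers. Results will vary with the hardware and the document.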
If you need a new code example or have a question, email us at support@sautinsoft.com.