Barcodes

webPDF supports the recognition and creation of barcodes ("Barcode" web service) with various common barcode formats.

 

The mode for the "Barcode" service is set with the "<add>" (generation) and "<detect>" (recognition) parameters. The remaining parameters configure how the generation or recognition operation works.

 

Recognition mode

 

In recognition mode, the web service will search for all selected barcode formats in the selected area in the PDF document. Depending on the selected option, the web service will structure the corresponding results in the form of a JSON or XML document.

 

tip

While doing so is not an absolute necessity for most barcode formats, limiting the number of pages scanned and the size of the area scanned on these pages can significantly reduce the analysis time required for the recognition operation.

 

hint

If there are multiple barcodes of the same format on the same page, it is extremely advisable to limit the size of the area being scanned, as the detection operation may fail otherwise. In this case, it is recommended to select a separate area for each barcode and recognize these barcodes separately.

 

hint

The list of results provided in recognition mode will never contain duplicates for a scanned page. If a barcode featuring the exact same content and format is found multiple times on a page, it will only be listed once.
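
The effect of this rule can be pictured with a short sketch. It assumes, hypothetically, that each result is available as a dictionary with "page", "type", and "plain" keys (names borrowed from the output formats described further down). The function name is illustrative and not part of webPDF.

```python
def drop_duplicates(results):
    """Keep only the first result for each (page, type, plain) combination."""
    seen = set()
    unique = []
    for result in results:
        key = (result["page"], result["type"], result["plain"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


results = [
    {"page": 1, "type": "qrcode", "plain": "webPDF"},
    {"page": 1, "type": "qrcode", "plain": "webPDF"},  # same page, format, content
    {"page": 2, "type": "qrcode", "plain": "webPDF"},  # different page: kept
]
unique = drop_duplicates(results)
```

A consequence is that the number of entries in the result list cannot be used to count how often an identical barcode appears on a page.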

 

Generation mode

 

In generation mode, the web service will generate a barcode in the selected format and place it on as many pages as you want in the passed PDF document. The output document will always be a PDF document.

 

Supported formats

 

The following barcode formats are supported for both generation and recognition.

 

One-dimensional (linear) barcodes

One-dimensional barcodes are linear barcodes that encode values as a sequence of bars of different thicknesses. Only this sequence is relevant; the height of the bars is not, which is why these barcodes are called one-dimensional. Accordingly, 1D barcodes normally involve few, if any, requirements concerning the barcode height. In contrast, their width is subject to strict rules, as the sequence of bars and spaces, and their width ratio in particular, must adhere exactly to the relevant specification.

 

Codabar

Character set:

A-D, 0-9,

6 special characters

Capacity:

16 symbols +4 optional characters as start and stop symbols

Encoded value: A1234567890A


 

The Codabar linear code was originally developed for the retail industry, but only plays a secondary role in it nowadays. Now, it is primarily used by libraries, photo labs, blood banks, and other specific businesses. Normally, the start and stop symbols provide information about the purpose of the encoded information. Thanks to its typically large spaces and bar thicknesses, Codabar remains easy to read at low resolutions, as well as in printouts with poor quality. However, the format has little information capacity and requires considerable space, rendering these advantages less useful than would be expected.
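
As a rough illustration of the character set and capacity listed above, the following sketch checks whether a value could be encoded as Codabar. It assumes that the six special characters are the usual Codabar set (- $ : / . +) and that a start and a stop symbol are always present. The function name is hypothetical and not a webPDF API.

```python
import re

# Start symbol A-D, 1 to 16 data symbols (digits or - $ : / . +), stop symbol A-D.
# The upper limit of 16 follows the capacity stated in this section.
CODABAR_PATTERN = re.compile(r"[A-D][0-9\-$:/.+]{1,16}[A-D]")


def is_valid_codabar(value: str) -> bool:
    """Return True if `value` fits the assumed Codabar character set and length."""
    return CODABAR_PATTERN.fullmatch(value) is not None


sample_ok = is_valid_codabar("A1234567890A")
```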

 

Code 39

Character set:

A-Z, 0-9,

7 special characters (- . $ / + % and space)

Capacity:

Variable

Encoded value: *WEBPDF$*


 

The Code 3 of 9 barcode owes its name to the fact that three of the nine elements (five bars and four spaces) used to encode a codeword are wider than the others. Code 39 can optionally be used with a check digit, but is generally considered self-checking due to its codeword structure. Thanks to its large character set, its variable length, and the ease with which it can be generated, this barcode is widely used in many industries, including electronics, chemicals, warehousing, and shipping. The asterisk is always used as both the start and the stop symbol. However, the start and stop symbols are often left out when entering or outputting a Code 39 value and are handled automatically in the background. The format has a very low information density per unit of space, and its character set is more limited than that of Code 128, for instance.
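
The optional check digit mentioned above is commonly computed modulo 43. The sketch below uses the standard Code 39 value table and applies it to the sample value without its asterisks. Whether and how webPDF applies this check character is not stated here, so treat the function as an illustration only.

```python
# The Code 39 data characters in their standard value order (values 0 to 42).
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_check_char(data: str) -> str:
    """Return the modulo-43 check character for `data`.

    `data` must not include the asterisk start/stop symbols.
    Raises ValueError if a character is outside the Code 39 set.
    """
    total = sum(CODE39_CHARS.index(ch) for ch in data)
    return CODE39_CHARS[total % 43]


check = code39_check_char("WEBPDF$")
```

The check character is appended to the data before the stop symbol.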

 

Code 128

Character set:

128A:

A-Z, 0-9,

ASCII special characters,

ASCII control characters,

FNC 1-4

 

128B:

A-Z, a-z, 0-9,

ASCII special characters,

FNC 1-4

 

128C:

00-99

 

Capacity:

Variable

Encoded value: webPDF


Code 128 owes its name to the fact that it supports all 128 ASCII characters. It has two special characteristics. First, it is possible to switch between 3 different character sets within the same barcode, which provides greater information density and expands the available range of characters even further. These three character sets are not independent formats: a Code 128 barcode will normally switch between them as necessary in order to encode contents as compactly as possible. Second, Code 128 offers 4 FNC codes. Of these, FNC4 extends the character set to all Latin-1 (ISO 8859-1) characters. This format is extremely common worldwide, particularly in the packaging and shipping industries. It features its own specialized start and stop symbols, as well as the option of generating a checksum.
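
The checksum can be illustrated with a minimal sketch. It assumes that the whole value is encoded in character set 128B without switching sets, which is a simplification: a real encoder may mix sets to save space. The function name is illustrative.

```python
START_B = 104  # symbol value of "Start Code B"


def code128b_checksum(text: str) -> int:
    """Modulo-103 checksum for text encoded entirely in character set 128B."""
    total = START_B
    for position, ch in enumerate(text, start=1):
        value = ord(ch) - 32  # set B maps ASCII 32..127 to symbol values 0..95
        if not 0 <= value <= 95:
            raise ValueError(f"{ch!r} is not part of character set 128B")
        total += position * value  # each symbol is weighted by its position
    return total % 103


checksum = code128b_checksum("webPDF")
```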

 

EAN-13

Character set:

0-9

Capacity:

13 digits

Encoded value: 5901234123457


 

The "European Article Number 13" format is a widely used barcode format used for product labelling in the retail industry – the number 13 refers to the barcode’s maximum capacity of 13 digits. The main reason why this barcode is useful is the fact that it has a set length and strictly standardized contents. More specifically, an EAN-13 barcode consists of a GS1 country code (GS1 Prefix), a company code, a product code, and a check digit. This means that the format is very easy and quick to read, as well as to enter manually. There is also added flexibility in the fact that the country code can be replaced, for example, with an internal code at supermarkets in order to specify the relevant product’s use.
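
The check digit can be verified with a few lines of code. The sketch below implements the standard EAN-13 calculation (weights 1 and 3 alternating from the left) and applies it to the sample value above. The function names are illustrative.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the check digit for the first 12 digits of an EAN-13 code."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting with the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if `code` has 13 digits and a matching check digit."""
    return (
        len(code) == 13
        and code.isascii()
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )


digit = ean13_check_digit("590123412345")
```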

 

EAN-8

Character set:

0-9

Capacity:

8 digits

Encoded value: 65833254


 

The "European Article Number 8" format is a shorter version of the EAN-13 barcode – the number 8 refers to the barcode’s maximum capacity of 8 digits. The main reason why this barcode is useful is the fact that it not only has a set length, but also one that is comparably very short. The EAN-8 barcode is primarily intended for labelling products for which an EAN-13 barcode would be too long. An EAN-8 barcode consists of a GS1 country code (GS1 Prefix), a product code, and a check digit. This means that the format is very easy and quick to read, as well as to enter manually.

 

UPC-A

Character set:

0-9

Capacity:

12 digits

Encoded value: 03600029145 (11 digits; the check digit, 2, forms the 12th digit)


 

Much like EAN barcodes, the Universal Product Code is a format for labelling products in the retail industry. UPC is compatible with EAN codes, and is the only format accepted for product labelling in the USA and Canada. Its main difference from EAN-13 is the numbering system digit that is found at its beginning:

0 - Normal UPC code

2 - Products sold by weight

3 - NDC (National Drug Code) and HRI (Health Related Items) codes, i.e., medical products

4 - Unrestricted UPC code

5 - Coupon

6 - Normal UPC code

7 - Normal UPC code

Digits 1, 8, and 9 are reserved for later assignment. The second through sixth digits are used to indicate the product’s manufacturer, with this information being followed by the product number and, finally, a check digit. In contrast to EAN barcodes, UPC codes are accepted worldwide and accordingly are preferred primarily by companies with international operations. The format is very easy and quick to read, as well as to enter manually.
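
The structure described above can be sketched as follows. The lookup table repeats the numbering system digits from the list, and the check digit uses the standard UPC-A calculation. The function names and the returned dictionary keys are illustrative choices, not part of webPDF.

```python
NUMBER_SYSTEMS = {
    "0": "Normal UPC code",
    "2": "Products sold by weight",
    "3": "NDC and HRI codes (medical products)",
    "4": "Unrestricted UPC code",
    "5": "Coupon",
    "6": "Normal UPC code",
    "7": "Normal UPC code",
}


def upca_check_digit(first11: str) -> int:
    """Compute the 12th digit for the first 11 digits of a UPC-A code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 digits")
    # Digits in odd positions (1st, 3rd, ...) are weighted 3, the others 1.
    total = sum(int(d) * (1 if i % 2 else 3) for i, d in enumerate(first11))
    return (10 - total % 10) % 10


def describe_upca(first11: str) -> dict:
    """Split an 11-digit UPC-A value into its parts and append the check digit."""
    check = upca_check_digit(first11)
    return {
        "number_system": NUMBER_SYSTEMS.get(first11[0], "Reserved"),
        "manufacturer": first11[1:6],   # second through sixth digit
        "product": first11[6:11],       # product number
        "full_code": first11 + str(check),
    }


info = describe_upca("03600029145")
```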

 

ITF

Character set:

0-9

Capacity:

14 digits

Encoded value: 98765432109213


 

Just like UPC and EAN barcodes, Interleaved 2 of 5 (or, more properly, ITF-14) is a barcode format used by the retail industry. However, it is used primarily for labelling shipping packages and pallets. The first digit specifies the type of packaging, the next 12 digits contain the product number, and the final digit is the check digit. The format’s name comes from the way in which information is stored in it – it encodes pairs of digits, with the first digit being encoded in five bars and the second digit being encoded in the five spaces that follow these bars. The advantage of this approach is that it allows for a relatively high information density.

The original ITF format did not limit the number of digits, but it has largely fallen into disuse.
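
The interleaving of digit pairs can be made concrete with a small sketch. It uses the standard 2 of 5 patterns, in which two of five elements are wide, and writes bars as uppercase and spaces as lowercase letters. Start and stop patterns are left out, and the function name is illustrative.

```python
# Narrow (N) / wide (W) patterns for the digits 0-9; exactly two elements are wide.
PATTERNS = {
    "0": "NNWWN", "1": "WNNNW", "2": "NWNNW", "3": "WWNNN", "4": "NNWNW",
    "5": "WNWNN", "6": "NWWNN", "7": "NNNWW", "8": "WNNWN", "9": "NWNWN",
}


def interleave(digits: str) -> str:
    """Encode digit pairs: first digit in the bars, second digit in the spaces."""
    if len(digits) % 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected an even, non-zero number of digits")
    out = []
    for i in range(0, len(digits), 2):
        bars = PATTERNS[digits[i]]                # uppercase letters: bars
        spaces = PATTERNS[digits[i + 1]].lower()  # lowercase letters: spaces
        out.append("".join(b + s for b, s in zip(bars, spaces)))
    return "".join(out)


elements = interleave("98765432109213")
```

Because every pair shares one run of ten elements, 14 digits need only 70 bars and spaces in total.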

 

Two-dimensional barcodes

In two-dimensional barcodes, a value is encoded in a two-dimensional plane with the use of black and white pixels. 2D barcodes usually have a significantly greater information capacity than linear barcodes, but their higher complexity also means that, in some cases, they are considerably more prone to image errors. This, in turn, means that they need an error correction method. Both the height and width of 2D barcodes are subject to strict rules, as every pixel on the code can potentially contain important information. Accordingly, these formats very frequently involve requirements concerning the available heights and widths, width-to-height ratios, and the geometric shape of the barcode in general.

 

Data Matrix

Character set:

ASCII (1-255)

Capacity:

Variable

Encoded value: webPDF


 

The contents in a Data Matrix code are encoded in the data region by using filled and empty cells. Depending on the selected type, these barcodes will have either a rectangular or square basic shape. Data Matrix barcodes feature a solid line at the left and bottom margins and segmented lines at the top and right margins – on one hand, this makes it possible to locate the barcode; on the other hand, it makes it possible to determine whether the barcode has been rotated. Data Matrix barcodes feature an integrated error correction mechanism based on the Reed-Solomon algorithm. This mechanism ensures that parts of the matrix can be recovered even if the code has been heavily damaged.

 

hint

When using the web service to recognize Data Matrix barcodes, it is absolutely necessary to make sure that the image area being scanned is limited to the Data Matrix barcode so that the barcode will be as centred within the area as possible. It is important to avoid sources of error such as text and other images as much as possible. In addition, Data Matrix barcodes require a "quiet zone" (a frame) around them without fail. The width of this quiet zone must be at least equal to the length of an encoding symbol’s side.

Example: If a Data Matrix cell is 2 pixels by 2 pixels, the quiet zone must have a width of at least 2 pixels.
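
The same arithmetic can be applied when choosing the area to scan. The sketch below grows a barcode rectangle by one cell on every side. It assumes that the position and cell size of the barcode are already known, and the function name is hypothetical.

```python
def scan_area_with_quiet_zone(x, y, width, height, cell_size):
    """Grow a barcode rectangle by the minimum quiet zone (one cell per side)."""
    return (
        x - cell_size,
        y - cell_size,
        width + 2 * cell_size,
        height + 2 * cell_size,
    )


# A 40 x 40 barcode with 2 x 2 cells needs at least a 44 x 44 scan area.
area = scan_area_with_quiet_zone(100, 100, 40, 40, cell_size=2)
```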

 

QR code

Character set:

ASCII (1-255)

Capacity:

Variable

Encoded value: webPDF


 

In Quick Response Codes, information is encoded in a manner similar to that used for Data Matrix codes, with filled and empty squares being used in a basic square shape. QR codes are optimized in such a way that they can be automatically recognized and read as quickly as possible. In fact, they are a very popular way to store information (such as a web address) in a format that can be easily recognized and read by cell phones (mobile tagging). Normally, QR codes feature three position markers that make it easier for scanners to recognize the barcode and its orientation. The maximum information content of a QR code is 2,956 bytes (23,648 bits), but the actual capacity also depends on the selected error correction level. This error correction level indicates the percentage of encoded data that it will be possible to restore with the Reed-Solomon algorithm (Low: 7%; Medium: 15%; Quartile: 25%; High: 30%). The higher the recovery percentage, the lower the remaining barcode capacity. However, higher levels also ensure that the barcode can sustain a greater amount of damage before becoming unreadable.

 

Aztec

Character set:

ASCII (0-127), extended ASCII

Capacity:

Variable

Encoded value: webPDF


 

In Aztec Codes, information is encoded with the use of empty and filled squares arranged concentrically around a square core. This core not only makes it possible to recognize the barcode, but also indicates its orientation. The resulting structure is reminiscent of stepped pyramids, which is where the format gets its name from. Each layer around the core is made up of two rings of encoding symbols, and the fact that each additional layer has longer sides means that it can represent more data. Layers are added outwards starting from the centre, meaning that the longer the encoded message, the more space an Aztec Code will need. Aztec barcodes feature an integrated error correction mechanism that is based on the Reed-Solomon algorithm and that can be configured to occupy any percentage of the barcode’s symbol capacity. This mechanism ensures that parts of the matrix can be recovered even if the code has been heavily damaged. To date, Aztec Codes have been used primarily to label pharmaceutical products, as well as for tickets for public transportation.

 

hint

When using the web service to recognize Aztec barcodes, it is absolutely necessary to make sure that the image area being scanned is limited to the Aztec barcode so that the barcode will be as centred within the area as possible.

 

PDF417

Character set:

ASCII

Capacity:

Variable

Encoded value: webPDF


 

The "Portable Data File 417" barcode format is used first and foremost to encode relatively large amounts of data. Each code pattern consists of 4 bars and 4 empty spaces and has a length of 17 encoding units, which is where the number 417 comes from. PDF417 barcodes can consist of 3 to 90 rows, with each individual row essentially representing a linear barcode that also contains information regarding its content, row number, etc. Because the individual rows are independent of each other, PDF417 barcodes can be read by most linear scanners as well. This sets this type of barcode apart from all other 2D barcodes, which require more complex image recognition. PDF417 barcodes feature integrated Reed-Solomon error correction with selectable levels from 0 to 8, and the higher the level, the more resistant a barcode will be to damage. In addition, PDF417 barcodes can encode content in different compaction modes. Depending on the mode selected, a different number of codewords is needed for the same content, which makes it possible to make these barcodes more compact:

 

Text - each codeword represents two letters.

Byte - each 5 codewords represent 6 bytes.

Numeric - up to 15 codewords represent numbers with a length of up to 44 digits.
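
The ratios above allow a rough estimate of how many data codewords a value needs in each mode. The sketch below is a simplification: it ignores mode-switching codewords and error correction, and its handling of leftovers (one codeword per remaining byte, and one codeword per three remaining digits plus one) follows the usual PDF417 rules rather than anything stated in this section.

```python
def codewords_needed(mode: str, length: int) -> int:
    """Estimate the data codewords for `length` characters, bytes, or digits."""
    if length < 0:
        raise ValueError("length must not be negative")
    if mode == "text":
        return (length + 1) // 2            # two characters per codeword
    if mode == "byte":
        full, rest = divmod(length, 6)      # 6 bytes fit into 5 codewords
        return full * 5 + rest              # leftover bytes: one codeword each
    if mode == "numeric":
        full, rest = divmod(length, 44)     # 44 digits fit into 15 codewords
        return full * 15 + (rest // 3 + 1 if rest else 0)
    raise ValueError(f"unknown mode: {mode!r}")


text_cw = codewords_needed("text", 6)        # "webPDF" as text
byte_cw = codewords_needed("byte", 6)        # "webPDF" as bytes
numeric_cw = codewords_needed("numeric", 44)
```

For the six-character sample value, text mode needs fewer codewords than byte mode, which is why an encoder prefers it whenever the content allows.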

 

Output formats

 

In recognition mode, the Barcode web service has two possible output formats for the document that will contain the barcodes found. This output format can be defined with the "outputFormat" option in the parameters.

 

tip

The document’s format is described by the http://schema.webpdf.de/1.0/extraction/barcode.xsd schema.

 

XML

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<barcodes xmlns="http://schema.webpdf.de/1.0/extraction/barcode">
  <barcode type="qrcode" page="61" errorCorrectionLevel="L">
      <rectangle coordinates="user" x="407.0" y="73.0" width="43.5" height="44.0"/>
      <rectangle coordinates="pdf" x="407.0" y="725.0" width="43.5" height="44.0"/>
      <plain>webPDF</plain>
  </barcode>
   ....
</barcodes>

 

Every "<barcode>" element found in "<barcodes>" represents one recognized barcode, with its "page" and "type" attributes providing the page and format for it. In addition, there may be additional metainformation after the attributes. This metainformation may contain further information regarding the barcode depending on the specific barcode format involved.

 

The "<plain>" element contains the barcode’s decoded value, while the "<rectangle>" elements contain the position of the barcode on the corresponding page.
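
A client could read this document along the following lines. The sketch uses Python's standard xml.etree.ElementTree module and the sample above without the elided entries. The function name and the shape of the returned dictionaries are illustrative choices, not part of webPDF.

```python
import xml.etree.ElementTree as ET

NS = {"bc": "http://schema.webpdf.de/1.0/extraction/barcode"}

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<barcodes xmlns="http://schema.webpdf.de/1.0/extraction/barcode">
  <barcode type="qrcode" page="61" errorCorrectionLevel="L">
      <rectangle coordinates="user" x="407.0" y="73.0" width="43.5" height="44.0"/>
      <rectangle coordinates="pdf" x="407.0" y="725.0" width="43.5" height="44.0"/>
      <plain>webPDF</plain>
  </barcode>
</barcodes>"""


def read_barcodes(xml_bytes: bytes) -> list:
    """Return one dictionary per <barcode> element."""
    root = ET.fromstring(xml_bytes)
    found = []
    for node in root.findall("bc:barcode", NS):
        # Map each coordinate system to an (x, y, width, height) tuple.
        rectangles = {
            rect.get("coordinates"): tuple(
                float(rect.get(name)) for name in ("x", "y", "width", "height")
            )
            for rect in node.findall("bc:rectangle", NS)
        }
        found.append({
            "type": node.get("type"),
            "page": int(node.get("page")),
            "plain": node.findtext("bc:plain", namespaces=NS),
            "rectangles": rectangles,
        })
    return found


barcodes = read_barcodes(SAMPLE)
```

Format-specific attributes such as "errorCorrectionLevel" are not present for every barcode type, so they are best read with node.get(), which returns None when an attribute is missing.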

 

JSON

 

The JSON structure corresponds to the contents of the XML structure.

 

{
"barcodes" : {
  "barcode" : [ {
    "rectangle" : [ {
      "x" : 407.0,
      "y" : 73.0,
      "width" : 43.5,
      "height" : 44.0,
      "coordinates" : "user"
     }, {
      "x" : 407.0,
      "y" : 725.0,
      "width" : 43.5,
      "height" : 44.0,
      "coordinates" : "pdf"
     } ],
    "plain" : "webPDF",
    "type" : "qrcode",
    "page" : 61,
    "errorCorrectionLevel" : "L"
   },.... ]
 }
}
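
The JSON variant can be read with the standard json module. The sketch below also shows how the two rectangles in the sample relate: 73.0 + 44.0 + 725.0 = 842.0, which is the height of an A4 page in points. This suggests that "user" coordinates are measured from the top edge and "pdf" coordinates from the bottom edge, but that reading is an assumption drawn from the sample values, and the helper name is hypothetical.

```python
import json

SAMPLE = """
{
  "barcodes": {
    "barcode": [ {
      "rectangle": [
        {"x": 407.0, "y": 73.0, "width": 43.5, "height": 44.0, "coordinates": "user"},
        {"x": 407.0, "y": 725.0, "width": 43.5, "height": 44.0, "coordinates": "pdf"}
      ],
      "plain": "webPDF",
      "type": "qrcode",
      "page": 61,
      "errorCorrectionLevel": "L"
    } ]
  }
}
"""


def pdf_y_from_user(user_rect: dict, page_height: float) -> float:
    """Convert a top-based y value to a bottom-based one (assumed relationship)."""
    return page_height - user_rect["y"] - user_rect["height"]


data = json.loads(SAMPLE)
first = data["barcodes"]["barcode"][0]
# Index the rectangles by their coordinate system.
rects = {rect["coordinates"]: rect for rect in first["rectangle"]}
converted = pdf_y_from_user(rects["user"], page_height=842.0)  # A4 height assumed
```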