+1 (669) 231-3838 or +1 (800) 930-5144

Structify Text

Structures semi-structured text, useful when parsing command line output from networking devices.

What is it

If you’re reading this you’ve probably been tasked with programmatically retrieving information from a CLI driven device and you’ve got to the point
where you have a nice string of text and say to yourself, “wow I wish it just returned something structured that I could deal with like JSON or some other key/value format”.

Well that’s where

structifytext

tries to help. It lets you define the payload you wish came back to you, and with a sprinkle of the right regular expressions it does!

Installation

With pip: 
pip install structifytext

From source

make install

Usage

Pass your text and a “structure” (python dictionary) to the  parser  modules parse method.

from structifytext import parser
output = """
  eth0      Link encap:Ethernet  HWaddr 00:11:22:3a:c4:ac
            inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:147142475 errors:0 dropped:293854 overruns:0 frame:0
            TX packets:136237118 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:17793317674 (17.7 GB)  TX bytes:46525697959 (46.5 GB)

  eth1      Link encap:Ethernet  HWaddr 00:11:33:4a:c8:ad
            inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: fe80::225:90ff:fe4a:c8ad/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:51085118 errors:0 dropped:251 overruns:0 frame:0
            TX packets:3447162 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:4999277179 (4.9 GB)  TX bytes:657283496 (657.2 MB)
  """

struct = {
        'interfaces': [{
            'id': '(eth\d{1,2})',
            'ipv4_address': 'inet addr:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',
            'mac_address': 'HWaddr\s((?:[a-fA-F0-9]{2}[:|\-]?){6})'
          }]
       }

parsed = parser.parse(output, struct)
print parsed
This will return the python dictionary
{
 'interfaces': [
      {
          'id': 'eth0',
          'ipv4_address': '192.168.1.2',
          'mac_address': '00:11:22:3a:c4:ac'
      },
      {
          'id': 'eth1',
          'ipv4_address': '192.168.1.3',
          'mac_address': '00:11:33:4a:c8:ad'
      }
  ]
}
Which you can then do with as you please, maybe return as JSON as part of a REST service…

The Struct

A struct or structure or payload or whatever have you, is just a dictionary that resembles what you wish to get back.

With the values either being a dictionary { }, a list [ ], or a regular expression string [a-z](\d) with one group (to populate the value).

The structure is recursively parsed, populating the dictionary/structure that was provided with values from the input text.

Quite often, similar sections of semi-structured text are repeated in the text you are trying to parse.

To parse these sections of text, we define a dictionary with key of either id  or block_start the difference being block_start key/value is dropped from the resulting output.

This id or block_start marks the beginning and end for each “chunk” that you’d like parsed.

You can forcefully mark the end of a “chunk” by specifying a block_end key and regex value.

An example is useful here.

E.g. The following structure.

{
        'tables': [
            {
                'id': '\[TABLE (\d{1,2})\]',
                'flows': [
                    {
                        'id': '\[FLOW_ID(\d+)\]',
                        'info': 'info\s+=\s+(.*)'
                    }
                ]
            }
        ]
    }
Will create a “chunk/block” from the following output
[TABLE 0] Total entries: 3
   [FLOW_ID1]
    info = related to table 0 flow 1
[TABLE 1] Total entries: 31
    [FLOW_ID1]
    info = related to table 1 flow 1
That will be parsed as:
{
        'tables': [{
        'id': '0',
        'flows': [{ 'id': '1', 'info': 'related to table 0 flow 1' }],
        }, {
        'id': '1',
        'flows': [{ 'id': '1', 'info': 'related to table 1 flow 1' }]
    }]
}

Pin It on Pinterest

Share This