With our deep involvement in OpenDaylight, we support the community in many ways: project leadership, upstreaming code, bug fixes, and general Q&A. It’s this last piece we’ve decided to make publicly available on our blog. We regularly field questions from the community asking for our expert advice, input, or guidance on specific topics. In support of the community innovation process, we want to make sure everyone has access to this information and to open a real dialog on these topics.

In this article, the question we’re answering is on the topic of tuning 3-node clusters. If you’re running an OpenDaylight controller at scale in a production environment, please review and comment, or contact us to discuss further.

NOTE: Use your preferred text editor and correct indentation as desired. The HOCON format used in most of these files can be auto-formatted like JSON.

  1. The module-shards.conf file is how you map named data shards to specific named cluster members. In this basic example, we want all shards to be replicated to all three cluster members. A verification sketch follows the example below.

ODL: vi /opt/lumina/lsc/controller/configuration/initial/module-shards.conf

LSC: vi /opt/lumina/configuration/customer/lsc/configuration/initial/module-shards.conf

module-shards = [
    {
        name = "default"
        shards = [
            {
                name = "default"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    },
    {
        name = "topology"
        shards = [
            {
                name = "topology"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    },
    {
        name = "inventory"
        shards = [
            {
                name = "inventory"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    }
]
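
Once the cluster is back up (step 4), one way to confirm a shard really has the replicas mapped above is to read its MBean over Jolokia. Treat this as a sketch only: it assumes the odl-jolokia feature is installed, the default admin/admin credentials, and member-1’s address from the examples below; exact MBean names can vary slightly by release.

# Read the default config shard on member-1; the response includes the shard's
# Raft state, current leader, and peer/replica information.
curl -s -u admin:admin \
  "http://192.168.1.171:8181/jolokia/read/org.opendaylight.controller:type=DistributedConfigDatastore,Category=Shards,name=member-1-shard-default-config"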

 

  2. The akka.conf file is the primary method for configuring and tuning cluster members. You must change the roles, seed-nodes, and hostname values on each node while all nodes are down. It’s far better to use FQDNs here than IP addresses, so the cluster can follow a virtual environment’s IPAM as addresses change. You can also use cloud-init-style bootstrap daemons to inject per-node values via variable substitution, particularly for the roles field; a templating sketch follows the configuration examples below. If you’re having difficulty with the cluster, change the loglevel from INFO to DEBUG.

ODL: vi /opt/lumina/lsc/controller/configuration/initial/akka.conf

LSC: vi /opt/lumina/configuration/customer/lsc/configuration/initial/akka.conf

 

odl-cluster-data {
  akka {
    loglevel = "INFO"
    remote {
      netty.tcp {
        hostname = "192.168.1.171"
        port = 2550
      }
      use-passive-connections = off
    }
    actor {
      debug {
        autoreceive = on
        lifecycle = on
        unhandled = on
        fsm = on
        event-stream = on
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@192.168.1.171:2550",
        "akka.tcp://opendaylight-cluster-data@192.168.1.172:2550",
        "akka.tcp://opendaylight-cluster-data@192.168.1.173:2550"
      ]
      seed-node-timeout = 15s
      roles = ["member-1"]
    }
    persistence {
      journal-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
      snapshot-store-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
    }
  }
}

 

odl-cluster-rpc {
  akka {
    loglevel = "INFO"
    remote {
      netty.tcp {
        hostname = "192.168.1.171"
        port = 2551
      }
      use-passive-connections = off
    }
    actor {
      debug {
        autoreceive = on
        lifecycle = on
        unhandled = on
        fsm = on
        event-stream = on
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://odl-cluster-rpc@192.168.1.171:2551",
        "akka.tcp://odl-cluster-rpc@192.168.1.172:2551",
        "akka.tcp://odl-cluster-rpc@192.168.1.173:2551"
      ]
      seed-node-timeout = 15s
    }
    persistence {
      journal-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
      snapshot-store-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
    }
  }
}
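
As a sketch of the cloud-init-style substitution mentioned in this step, the snippet below assumes a hypothetical akka.conf.template you maintain with @HOSTNAME@ and @ROLE@ placeholders, plus per-node environment variables; adjust the output path to the ODL or LSC location shown above.

# Hypothetical per-node values injected at bootstrap time.
export NODE_FQDN="odl-1.example.net"   # assumption: this node's FQDN
export MEMBER_ROLE="member-1"          # assumption: this node's cluster role

# Render the real akka.conf from the template.
sed -e "s/@HOSTNAME@/${NODE_FQDN}/g" \
    -e "s/@ROLE@/${MEMBER_ROLE}/g" \
    akka.conf.template > /opt/lumina/lsc/controller/configuration/initial/akka.conf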

 

  3. The org.opendaylight.controller.cluster.datastore.cfg file deals with various settings related to loading and working with the journals, shards, etc. A cross-node check follows the example below.

 

vi /opt/lumina/lsc/controller/etc/org.opendaylight.controller.cluster.datastore.cfg:

### Note: Some sites use a batch-size of 1; not reflecting that here ###
persistent-actor-restart-min-backoff-in-seconds=10
persistent-actor-restart-max-backoff-in-seconds=40
persistent-actor-restart-reset-backoff-in-seconds=20
shard-transaction-commit-timeout-in-seconds=120
shard-isolated-leader-check-interval-in-millis=30000
operation-timeout-in-seconds=120
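
Since this tuning is normally applied identically on all three members, a quick cross-node check is to diff the file against each peer. This is only a sketch; it assumes SSH access and hypothetical hostnames odl-2 and odl-3 for the other members.

CFG=/opt/lumina/lsc/controller/etc/org.opendaylight.controller.cluster.datastore.cfg
for node in odl-2 odl-3; do                     # assumption: hostnames of the other members
  echo "--- diff against ${node} ---"
  ssh "${node}" "cat ${CFG}" | diff "${CFG}" -  # no output means the files match
done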

 

  4. Afterward, clean up all data and restart the controller. When starting or stopping the entire cluster, do all nodes in very close succession to avoid race conditions. Once the cluster is back up, you can verify member sync with the sketch after the commands below.

ODL: rm -rf ./controller/cache/* ./controller/journal/* ./controller/snapshots/* ./controller/data/*   (then restart)

LSC: service lumina-lsc clean && service lumina-lsc configure && service lumina-lsc restart

tail -f /opt/lumina/lsc/log/karaf.out

…<CTRL-C> after initial startup…

…reminder to ignore all the “proxied object” exceptions; they’re INFO for a reason…

tail -f /opt/lumina/lsc/controller/data/log/karaf.log
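
If the odl-jolokia feature is installed, one way to confirm every member has finished syncing after the restart is to read the shard manager’s SyncStatus over Jolokia. Treat this as a sketch: it assumes the default admin/admin credentials and the addresses used in the examples above, and MBean names can vary slightly by release.

for node in 192.168.1.171 192.168.1.172 192.168.1.173; do
  echo "--- ${node} ---"
  curl -s -u admin:admin \
    "http://${node}:8181/jolokia/read/org.opendaylight.controller:type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config" \
    | grep -o '"SyncStatus":[a-z]*'   # expect "SyncStatus":true on every member
done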

See How Lumina Can Transform Your Network.