Commit 28e8c0f145f1f3e8434456b43d4a839a4b01b3f8

Authored by John Snowdon
1 parent 5183074e1f
Exists in master

added updates as per working script on the tower of power

... ... @@ -1,13 +0,0 @@
1   -A simple machine monitor and performance stats utility written in Python for Linux systems.
2   -
3   -Gathers system performance data and outputs to terminal (for debugging purposes) or to a USB/Serial LCD display such as the Adafruit USB/Serial shield and 19x2 LCD module.
4   -
5   -The use of this is slanted towards embedded Linux platforms that run headless, in the specific case the data acquisition class also gathers information related to the status of running services related to OpenStack for test micro VM systems.
6   -
7   -The Python classes require the following packages:
8   -* socket
9   -* urllib
10   -* psutil
11   -
12   -John Snowdon
13   -December 2015
README.md
... ... @@ -0,0 +1,65 @@
  1 +# Overview
  2 +
  3 +A simple machine monitor and performance stats utility written in Python for Linux systems.
  4 +
  5 +Gathers system performance data and outputs to terminal (for debugging purposes) or to a USB/Serial LCD display such as the Adafruit USB/Serial shield and 19x2 LCD module.
  6 +
  7 +This utility is aimed at embedded Linux platforms that run headless. In this specific case, the data acquisition class also gathers the status of running OpenStack services for test micro VM systems.
  8 +
  9 +The Python classes require the following packages:
  10 +* socket
  11 +* urllib
  12 +* psutil
  13 +
  14 +## Usage
  15 +
  16 +Edit the `settings.py` configuration file to your requirements. Use the `run.py` file to start a single instance of the LCD output display. The `run.sh` script wraps `run.py` in a simple shell `while` loop that restarts it, so the LCD module is reconnected if it is disconnected or not present at start up.
  17 +
  18 +To automatically run on boot, add the following line to `/etc/rc.local`
  19 +
  20 +```
  21 +bash run.sh 2>&1 | tee /var/log/lcd.log
  22 +```
  23 +
  24 +## Settings
  25 +
  26 +`INTERVAL` - How many loop counts between display refresh and data gathering. A value of 1 is the minimum and gathers new performance data on every display refresh; it also consumes the most CPU time. Takes: int
  27 +
  28 +`SERVICE_INTERVAL` - How many loop counts between refreshing system service health data. This data is mostly gathered through relatively crude mechanisms such as `os.system()`, and the checks themselves may take quite long (e.g. connecting to MySQL, or retrieving an HTML page from Apache), so don't set this value too low, or your display will constantly pause while refreshing this health information. Takes: int
  29 +
  30 +`ZAP_INTERVAL` - How often to reset the counters used for IO or system load calculations. Mainly used to prevent memory ballooning; shouldn't need to be altered. Takes: int
  31 +
  32 +`DETAIL_1_INTERVAL` through `DETAIL_4_INTERVAL` - Controls how many display loops until the display progresses to the next page. The cycle is: CPU and IP Address, CPU and System Load Average, CPU and Disk space and then finally CPU and RAM use. The display then loops around back to CPU and IP Address. Takes: int
  33 +
  34 +`DETAIL_DURATION` - How long (in refresh loops) each of the four detail pages is shown, before cycling back to the main (read: first) page. Takes: int
  35 +
  36 +`DISK_PATH` - A path that is used to check for space usage, as displayed in the `DETAIL_3_INTERVAL` page. Takes: string
  37 +
  38 +`NFS_PATH` - A path that we should check is mounted correctly over NFS. Takes: string
  39 +
  40 +`ETH_DEVICE` - The primary network interface of the machine, the device which we assume has the primary IP address used to display in the `DETAIL_1_INTERVAL` page. Takes: string
  41 +
  42 +`ETH_DEVICES` - A list of all network interfaces which should be up, configured and connected for the machine. If any of them is unconfigured, disconnected or otherwise unavailable, an error is raised. Takes: Python list of strings
  43 +
  44 +`DISPLAY_WIDTH` - The number of characters shown on an output device (terminal or LCD) before scrolling the display right to left. Takes: int
  45 +
  46 +`SCROLL_SPEED` - Time (in seconds) to pause between right to left scrolling refresh. Takes: int OR float
  47 +
  48 +`SERVICE_CHECK_TYPE` - This parameter affects whether service checks are carried out from a client or server perspective. For example, the checks for a Nova or Neutron client are fundamentally different than if the machine is hosting those services. Similarly, NFS client checks are different than when checking for the Kernel NFS server. See the section in `data.py` that lists the service checks for more information. Takes: "server" OR "client"
  49 +
  50 +`SERVICE_CHECKS` - A list of the service checks that should be run for this particular host. Takes: Python list of strings.
  51 +
  52 +`ERR_` Messages - Preset error text that is shown when a service or health check fails. Customise as required. Takes: string
  53 +
  54 +`CPU_HIGH` - Controls the colour of the LCD screen when CPU use rises above a set value classed as `high`. Takes: Python dictionary with the keys 'level' - CPU use level, above which activates this colour scheme, 'r' - Red LED intensity, 'g' - Green LED intensity, 'b' - Blue LED intensity.
  55 +
  56 +`CPU_MED` - Controls the colour of the LCD screen when CPU use rises above a set value classed as `medium`. Takes: Python dictionary with the keys 'level' - CPU use level, above which activates this colour scheme, 'r' - Red LED intensity, 'g' - Green LED intensity, 'b' - Blue LED intensity.
  57 +
  58 +`CPU_LOW` - Controls the colour of the LCD screen when CPU use rises above a set value classed as `low`. Takes: Python dictionary with the keys 'level' - CPU use level, above which activates this colour scheme, 'r' - Red LED intensity, 'g' - Green LED intensity, 'b' - Blue LED intensity.
  59 +
  60 +`CPU_IDLE` - Controls the colour of the LCD screen when CPU use falls below a set value classed as `idle`. Takes: Python dictionary with the keys 'level' - CPU use level, below which activates this colour scheme, 'r' - Red LED intensity, 'g' - Green LED intensity, 'b' - Blue LED intensity.
  61 +
  62 +# Author
  63 +
  64 +John Snowdon (john.snowdon@newcastle.ac.uk)
  65 +December 2015, updated October 2016.
... ...
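The `CPU_HIGH` to `CPU_IDLE` settings documented in the README can be read as a small lookup from CPU use to backlight colour. The following is a minimal sketch of that idea, not the project's actual code: `backlight_colour` is a hypothetical helper, the dictionaries copy the values from the `settings.py` change in this commit, and the sketch assumes that any reading not above the `CPU_LOW` level falls back to the idle colour.

```python
# Threshold dictionaries in the shape the README describes.
# Values are copied from the settings.py change in this commit.
CPU_HIGH = {'level': 65, 'r': 255, 'g': 0, 'b': 0}
CPU_MED = {'level': 40, 'r': 165, 'g': 255, 'b': 0}
CPU_LOW = {'level': 20, 'r': 165, 'g': 165, 'b': 0}
CPU_IDLE = {'level': 10, 'r': 0, 'g': 255, 'b': 0}


def backlight_colour(cpu_percent):
    """Return an (r, g, b) tuple for a CPU use percentage.

    Hypothetical helper: bands are tested from highest to lowest, and
    anything not above CPU_LOW is treated as idle (an assumption).
    """
    for band in (CPU_HIGH, CPU_MED, CPU_LOW):
        if cpu_percent > band['level']:
            return (band['r'], band['g'], band['b'])
    return (CPU_IDLE['r'], CPU_IDLE['g'], CPU_IDLE['b'])


if __name__ == "__main__":
    for reading in (5, 25, 50, 70):
        print("%3d%% -> %s" % (reading, backlight_colour(reading)))
```

Testing the bands from highest to lowest means each reading matches exactly one colour, whatever the gaps between the configured levels.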
data.py
... ... @@ -228,61 +228,132 @@ class data():
228 228 s = socket.create_connection((self.network_address, "22"))
229 229 s.close()
230 230 except Exception as e:
231   - if settings.ERR_SERVICE_SSH not in self.errors:
  231 + if settings.ERR_SERVICE_SSH in self.errors:
  232 + pass
  233 + else:
232 234 self.errors.append(settings.ERR_SERVICE_SSH)
233 235  
234 236 # Check mysql
235 237 if "mysql" in settings.SERVICE_CHECKS:
236   - try:
237   - s = socket.create_connection((self.network_address, "3306"))
238   - s.close()
239   - except Exception as e:
240   - if settings.ERR_SERVICE_MYSQL not in self.errors:
  238 + port_open = False
  239 + i = psutil.net_if_addrs()
  240 + # Check that MySQL is listening on at least one of the defined
  241 + # interfaces in our list
  242 + for dev in settings.ETH_DEVICES:
  243 + print(dev)
  244 + for iface in i[dev]:
  245 + if iface.family == 2:
  246 + #print("Checking MySQL on %s:3306" % iface.address)
  247 + try:
  248 + s = socket.create_connection((iface.address, "3306"))
  249 + s.close()
  250 + port_open = True
  251 + #print("Connected OK")
  252 + except Exception as e:
  253 + pass
  254 + if port_open:
  255 + pass
  256 + else:
  257 + if settings.ERR_SERVICE_MYSQL in self.errors:
  258 + pass
  259 + else:
241 260 self.errors.append(settings.ERR_SERVICE_MYSQL)
242 261  
243 262 # check apache
244 263 if "httpd" in settings.SERVICE_CHECKS:
245 264 try:
246   - u = urllib.urlopen(self.network_address)
  265 + u = urllib.urlopen("http://" + self.network_address)
247 266 if u.getcode() != 200:
248   - if settings.ERR_SERVICE_HTTPDa not in self.errors:
  267 + if settings.ERR_SERVICE_HTTPDa in self.errors:
  268 + pass
  269 + else:
249 270 self.errors.append(settings.ERR_SERVICE_HTTPDa)
250 271 except Exception as e:
251   - if settings.ERR_SERVICE_HTTPDb not in self.errors:
  272 + print("Exception while connecting to httpd service: %s" % e)
  273 + if settings.ERR_SERVICE_HTTPDb in self.errors:
  274 + pass
  275 + else:
252 276 self.errors.append(settings.ERR_SERVICE_HTTPDb)
253 277  
254 278 # check openstack keystone - identity
255 279 if "keystone" in settings.SERVICE_CHECKS:
256 280 if settings.SERVICE_CHECK_TYPE == "server":
257 281 # Check keystone server
258   - pass
  282 + t = os.system("keystone discover 2>/dev/null | grep 'Keystone found at'")
  283 + if t != 0:
  284 + if settings.ERR_SERVICE_KEYSTONEa in self.errors:
  285 + pass
  286 + else:
  287 + self.errors.append(settings.ERR_SERVICE_KEYSTONEa)
259 288  
260 289 # check openstack nova - compute
261 290 if "nova" in settings.SERVICE_CHECKS:
262 291 if settings.SERVICE_CHECK_TYPE == "server":
263 292 # Check nova server daemons
264   - pass
  293 + t = os.system("systemctl status openstack-nova-api")
  294 + if t != 0:
  295 + if settings.ERR_SERVICE_NOVAa in self.errors:
  296 + pass
  297 + else:
  298 + self.errors.append(settings.ERR_SERVICE_NOVAa)
265 299 else:
266 300 # Check nova client daemons
267   - pass
  301 + t = os.system("systemctl status openstack-nova-compute")
  302 + if t != 0:
  303 + if settings.ERR_SERVICE_NOVAb in self.errors:
  304 + pass
  305 + else:
  306 + self.errors.append(settings.ERR_SERVICE_NOVAb)
268 307  
269 308 # check openstack cinder - block
270 309 if "cinder" in settings.SERVICE_CHECKS:
271 310 if settings.SERVICE_CHECK_TYPE == "server":
272   - # Check cinder server
273   - pass
274   -
  311 + # Check cinder
  312 + t = os.system("systemctl status openstack-cinder-api")
  313 + if t != 0:
  314 + if settings.ERR_SERVICE_CINDERa in self.errors:
  315 + pass
  316 + else:
  317 + self.errors.append(settings.ERR_SERVICE_CINDERa)
  318 +
  319 + # Check cinder volume
  320 + if "cinder-volume" in settings.SERVICE_CHECKS:
  321 + if settings.SERVICE_CHECK_TYPE == "server":
  322 + # Check cinder volume
  323 + t = os.system("systemctl status openstack-cinder-volume")
  324 + if t != 0:
  325 + if settings.ERR_SERVICE_CINDERb in self.errors:
  326 + pass
  327 + else:
  328 + self.errors.append(settings.ERR_SERVICE_CINDERb)
  329 +
  330 +
275 331 # check openstack glance - image/iso
276 332 if "glance" in settings.SERVICE_CHECKS:
277 333 if settings.SERVICE_CHECK_TYPE == "server":
278 334 # Check glance server
279   - pass
  335 + t = os.system("systemctl status openstack-glance-api")
  336 + if t != 0:
  337 + if settings.ERR_SERVICE_GLANCE in self.errors:
  338 + pass
  339 + else:
  340 + self.errors.append(settings.ERR_SERVICE_GLANCE)
280 341  
281 342 # check openstack neutron - networking
282 343 if "neutron" in settings.SERVICE_CHECKS:
283 344 if settings.SERVICE_CHECK_TYPE == "server":
284 345 # Check neutron server related daemons
285   - pass
  346 + t = os.system("systemctl status neutron-server")
  347 + if t != 0:
  348 + if settings.ERR_SERVICE_NEUTRONa in self.errors:
  349 + pass
  350 + else:
  351 + self.errors.append(settings.ERR_SERVICE_NEUTRONa)
286 352 else:
287 353 # Check neutron client related daemons
288   - pass
  354 + t = os.system("systemctl status neutron-linuxbridge-agent")
  355 + if t != 0:
  356 + if settings.ERR_SERVICE_NEUTRONb in self.errors:
  357 + pass
  358 + else:
  359 + self.errors.append(settings.ERR_SERVICE_NEUTRONb)
... ...
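The service checks in `data.py` repeat two patterns: append an error message only if it is not already recorded, and probe a TCP port with `socket.create_connection`. A possible way to factor these out is sketched below. Both `add_error_once` and `port_is_open` are hypothetical helper names, not part of the commit, and the two second timeout is an assumed default.

```python
import socket


def add_error_once(errors, message):
    """Append message to the errors list unless it is already present."""
    if message not in errors:
        errors.append(message)


def port_is_open(address, port, timeout=2.0):
    """Return True if a TCP connection to (address, port) succeeds.

    The timeout is an assumed default; it keeps a dead host from
    stalling the display loop for long.
    """
    try:
        s = socket.create_connection((address, port), timeout=timeout)
    except socket.error:
        # Refused, unreachable, timed out or unresolvable.
        return False
    s.close()
    return True


if __name__ == "__main__":
    errors = []
    if not port_is_open("127.0.0.1", 3306):
        add_error_once(errors, "MySQL error, service not running.")
    print(errors)
```

With helpers like these, each check reduces to one condition and one call, which avoids the repeated `if ... pass ... else` blocks.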
output.py
... ... @@ -94,7 +94,6 @@ class Output():
94 94 # Show CPU
95 95 # Show IP address
96 96 if (interval >= settings.DETAIL_1_INTERVAL) and (interval <= (settings.DETAIL_1_INTERVAL + settings.DETAIL_DURATION)):
97   - #print "CPU %2d%% %2d%% %2d%% %2d%%" % (data.cpu_percent[0], data.cpu_percent[1], data.cpu_percent[2], data.cpu_percent[3])
98 97 self.matrix.matrixWrite("CPU %2d%% %sMHz" % (data.cpu_percent_avg, data.cpu_mhz))
99 98 self.matrix.matrixWrite("IP %s" % (data.network_address))
100 99 return
... ... @@ -103,7 +102,6 @@ class Output():
103 102 # Show CPU
104 103 # Show CPU load
105 104 if (interval >= settings.DETAIL_2_INTERVAL) and (interval <= (settings.DETAIL_2_INTERVAL + settings.DETAIL_DURATION)):
106   - #print "CPU %2d%% %2d%% %2d%% %2d%%" % (data.cpu_percent[0], data.cpu_percent[1], data.cpu_percent[2], data.cpu_percent[3])
107 105 self.matrix.matrixWrite("CPU %2d%% %sMHz" % (data.cpu_percent_avg, data.cpu_mhz))
108 106 self.matrix.matrixWrite("%s %s %s" % (data.load_average[0], data.load_average[1], data.load_average[2]))
109 107 return
... ... @@ -112,7 +110,6 @@ class Output():
112 110 # Show CPU
113 111 # Show Disk space
114 112 if (interval >= settings.DETAIL_3_INTERVAL) and (interval <= (settings.DETAIL_3_INTERVAL + settings.DETAIL_DURATION)):
115   - #print "CPU %2d%% %2d%% %2d%% %2d%%" % (data.cpu_percent[0], data.cpu_percent[1], data.cpu_percent[2], data.cpu_percent[3])
116 113 self.matrix.matrixWrite("CPU %2d%% %sMHz" % (data.cpu_percent_avg, data.cpu_mhz))
117 114 self.matrix.matrixWrite("DSK %s/%s" % (self.sizeof_fmt(num = data.disk_space), self.sizeof_fmt(num = data.disk_size)))
118 115 return
... ... @@ -121,7 +118,6 @@ class Output():
121 118 # Show CPU
122 119 # Show RAM use
123 120 if (interval == settings.DETAIL_4_INTERVAL):
124   - #print "CPU %2d%% %2d%% %2d%% %2d%%" % (data.cpu_percent[0], data.cpu_percent[1], data.cpu_percent[2], data.cpu_percent[3])
125 121 self.matrix.matrixWrite("CPU %2d%% %sMHz" % (data.cpu_percent_avg, data.cpu_mhz))
126 122 display_text = ""
127 123 t = "RAM %s of %s used, %s buffers." % (self.sizeof_fmt(num = data.ram_avail), self.sizeof_fmt(num = data.ram_total), (self.sizeof_fmt(num = data.ram_buffer)))
... ... @@ -143,7 +139,6 @@ class Output():
143 139 # Show the normal screen
144 140 # Show CPU
145 141 # Show total network and disk IO
146   - #print "CPU %2d%% %2d%% %2d%% %2d%%" % (data.cpu_percent[0], data.cpu_percent[1], data.cpu_percent[2], data.cpu_percent[3])
147 142 self.matrix.matrixWrite("CPU %2d%% %sMHz" % (data.cpu_percent_avg, data.cpu_mhz))
148 143 self.matrix.matrixWrite("IO %ss %ss" % (self.sizeof_fmt(num = data.stats_bandwidth["net_bytes_rw_bw"]), self.sizeof_fmt(num = data.stats_bandwidth["disk_bytes_rw_bw"])))
149 144 return
... ...
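The page selection in `output.py` compares the loop counter against each `DETAIL_n_INTERVAL` plus `DETAIL_DURATION`. As a rough illustration of that windowing, the sketch below maps a counter value to a page name. `page_for` and the page names are hypothetical, the interval values in the usage lines are made up, and the sketch treats all four detail pages as windows, whereas the code above tests page four with `==`.

```python
def page_for(interval, starts, duration):
    """Return the name of the page to show for a loop counter value.

    starts   -- four loop counts, standing in for DETAIL_1..4_INTERVAL
    duration -- stands in for DETAIL_DURATION
    Hypothetical helper; page names are illustrative only.
    """
    names = ["ip", "load", "disk", "ram"]
    for name, start in zip(names, starts):
        # Same shape as the checks in output.py: start <= interval <= start + duration
        if start <= interval <= start + duration:
            return name
    # Outside every detail window, show the normal CPU and IO screen.
    return "main"


if __name__ == "__main__":
    starts = (10, 20, 30, 40)  # made-up values for illustration
    for i in (5, 10, 13, 14, 31, 40):
        print(i, page_for(i, starts, duration=3))
```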
run.sh
... ... @@ -0,0 +1,13 @@
  1 +#!/bin/bash
  2 +
  3 +cd ~ncsteam/lcd
  4 +while true
  5 +do
  6 + echo "[`date`] - First attempt to kill any existing LCD console process"
  7 + su root -c "pkill run.py"
  8 + sleep 5
  9 + echo "[`date`] - Running new LCD console process"
  10 + su root -c "cd ~ncsteam/lcd ; python run.py"
  11 + echo "[`date`] - Terminated, clean up old LCD console process"
  12 + su root -c "pkill run.py"
  13 +done
... ...
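The restart loop in `run.sh` could also be expressed in Python if a shell wrapper is not wanted. This is only a sketch under stated assumptions: `supervise` is a hypothetical function, it does not attempt the `pkill` clean-up or the `su root` calls that `run.sh` performs, and `max_restarts` exists only so that the loop can be bounded for testing.

```python
import subprocess
import sys
import time


def supervise(command, pause=5.0, max_restarts=None):
    """Run command repeatedly, pausing between runs.

    command      -- argument list, e.g. [sys.executable, "run.py"]
    pause        -- seconds to wait before restarting (run.sh sleeps 5)
    max_restarts -- None loops forever; a number bounds the loop
    Returns the number of times the command was run.
    """
    runs = 0
    while max_restarts is None or runs < max_restarts:
        print("[%s] - Running new LCD console process" % time.ctime())
        # Blocks until the child exits, for example when the LCD is unplugged.
        subprocess.call(command)
        runs += 1
        print("[%s] - Terminated" % time.ctime())
        time.sleep(pause)
    return runs


if __name__ == "__main__":
    # Bounded demonstration with a child process that exits immediately.
    supervise([sys.executable, "-c", "pass"], pause=0, max_restarts=2)
```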
settings.py
... ... @@ -9,7 +9,7 @@
9 9 INTERVAL = 1
10 10  
11 11 # How long between service check intervals
12   -SERVICE_INTERVAL = 10
  12 +SERVICE_INTERVAL = 30
13 13  
14 14 # How many loops before counters are reset
15 15 ZAP_INTERVAL = 50
... ... @@ -30,9 +30,9 @@ DISK_PATH = "/var"
30 30 NFS_PATH = "/"
31 31  
32 32 # Name of the primary network adapter (to determine our IP address)
33   -ETH_DEVICE = "eth1"
  33 +ETH_DEVICE = "enp0s20u4c2"
34 34 # List all of the network devices we should check are up and connected
35   -ETH_DEVICES = ["eth1"]
  35 +ETH_DEVICES = ["enp0s20u4c2", "enp2s0", "enp0s20u3c2"]
36 36  
37 37 # Serial display width - how many characters we can display before scrolling right to left
38 38 DISPLAY_WIDTH = 16
... ... @@ -46,8 +46,8 @@ SERVICE_CHECK_TYPE = "server"
46 46 #SERVICE_CHECK_TYPE = "client"
47 47  
48 48 # List of enabled service checks
49   -SERVICE_CHECKS = ["ssh"]
50   -#SERVICE_CHECKS = ["ssh", "mysql", "httpd", "keystone", "nova", "cinder", "glance", "neutron"]
  49 +SERVICE_CHECKS = ["ssh", "mysql", "httpd", "keystone", "nova", "cinder", "glance", "neutron"]
  50 +#SERVICE_CHECKS = ["ssh", "mysql", "httpd", "keystone", "nova", "cinder", "cinder-volume", "glance", "neutron", "nfs"]
51 51  
52 52 # Error messages
53 53 ERR_NETWORK_STATUS = "Network error, ETH interface unconfigured."
... ... @@ -60,18 +60,20 @@ ERR_SERVICE_HTTPDa = "Apache httpd error, web page not available."
60 60 ERR_SERVICE_HTTPDb = "Apache httpd error, service not running."
61 61 ERR_SERVICE_KEYSTONEa = "Keystone error or service not running."
62 62 ERR_SERVICE_KEYSTONEb = "Keystone client error or unable to contact controller."
63   -ERR_SERVICE_NOVAa = "Nova error or service not running."
  63 +ERR_SERVICE_NOVAa = "Nova API error or service not running."
64 64 ERR_SERVICE_NOVAb = "Nova client error or unable to contact controller."
65   -ERR_SERVICE_CINDER = "Cinder error or service not running."
66   -ERR_SERVICE_GLANCE = "Glance error or service not running."
67   -ERR_SERVICE_NEUTRONa = "Neutron error or service not running."
  65 +ERR_SERVICE_CINDERa = "Cinder API error or service not running."
  66 +ERR_SERVICE_CINDERb = "Cinder volume error or service not running."
  67 +ERR_SERVICE_GLANCE = "Glance API error or service not running."
  68 +ERR_SERVICE_NEUTRONa = "Neutron API error or service not running."
68 69 ERR_SERVICE_NEUTRONb = "Neutron client error or unable to contact controller."
  70 +ERR_SERVICE_NFS = "NFS Kernel server not running."
69 71  
70 72 # Trigger levels for various metrics
71 73 # and the colour to change the backlight of the lcd module
72 74 # when that mode is active
73   -CPU_HIGH = { 'level' : 50, 'r': 255, 'g': 0, 'b': 0 }
74   -CPU_MED = { 'level' : 25, 'r': 255, 'g': 255, 'b': 0 }
75   -CPU_LOW = { 'level' : 10, 'r': 255, 'g': 165, 'b': 0 }
76   -CPU_IDLE = { 'level' : 0, 'r': 0, 'g': 255, 'b': 0 }
  75 +CPU_HIGH = { 'level' : 65, 'r': 255, 'g': 0, 'b': 0 }
  76 +CPU_MED = { 'level' : 40, 'r': 165, 'g': 255, 'b': 0 }
  77 +CPU_LOW = { 'level' : 20, 'r': 165, 'g': 165, 'b': 0 }
  78 +CPU_IDLE = { 'level' : 10, 'r': 0, 'g': 255, 'b': 0 }
77 79  
... ...
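Because `data.py` matches `SERVICE_CHECKS` entries by exact string, a misspelt name (for example `"http"` in place of `"httpd"`) silently disables that check. A small start-up validation along the following lines could catch this. `KNOWN_CHECKS` and `unknown_checks` are hypothetical names; the set is assembled from the check names visible in this commit, and whether `"nfs"` is handled in `data.py` is an assumption, since that part of the file is not shown.

```python
# Check names visible in this commit's data.py and settings.py.
# "nfs" is assumed to be handled elsewhere in data.py (not shown in the diff).
KNOWN_CHECKS = set([
    "ssh", "mysql", "httpd", "keystone", "nova",
    "cinder", "cinder-volume", "glance", "neutron", "nfs",
])


def unknown_checks(configured):
    """Return the configured check names that no check would match."""
    return [name for name in configured if name not in KNOWN_CHECKS]


if __name__ == "__main__":
    configured = ["ssh", "mysql", "http", "keystone"]
    bad = unknown_checks(configured)
    if bad:
        print("Unknown SERVICE_CHECKS entries: %s" % ", ".join(bad))
```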