Recently, I was asked about monitoring Microsoft DHCP IP Address Pools using Log Insight to alert when the pool was exhausted and DHCP requests were failing. There are a couple ways to do this, but I’d like to cover two as a demonstration of getting a bit fancy with your alert queries and it paying off big time!
First off, Microsoft DHCP Servers write their events to a log file – at the end of the day…. so we can parse that file for an Event ID of 14 to see when we ran out. This is easy to do as shown below using Event ID 11 (DHCP Renew) as an example. The regex is simple but unfortunately we get the information way too late!
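As a rough illustration of the log-file approach, here is a minimal Python sketch that filters audit log lines by Event ID. The sample lines, host names, and the "Scope Full" description are made up for the example; it only assumes the comma-separated layout of the DHCP audit log, where the Event ID is the first column.

```python
import csv
import io

# Hypothetical sample in the comma-separated layout of a DHCP audit log
# (ID,Date,Time,Description,IP Address,Host Name,MAC Address,...).
# The values below are invented for illustration only.
SAMPLE_LOG = """\
11,06/15/16,10:12:01,Renew,192.168.1.50,host1.example.com,001122334455
10,06/15/16,10:13:44,Assign,192.168.1.51,host2.example.com,66778899AABB
14,06/15/16,10:15:02,Scope Full,192.168.1.0,,
"""


def find_events(log_text, event_id):
    """Return the rows whose first column equals event_id."""
    rows = csv.reader(io.StringIO(log_text))
    return [row for row in rows if row and row[0].strip() == str(event_id)]


exhausted = find_events(SAMPLE_LOG, 14)  # pool exhausted
renewals = find_events(SAMPLE_LOG, 11)   # DHCP renew

for row in exhausted:
    print("Pool exhausted at", row[1], row[2])
```

The same filter works for any Event ID, which is why the renew example and the exhaustion check differ only in the number passed in.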
Enter the Log Insight Agent’s ability to read Windows Event Logs! As your DHCP Server starts running low on available addresses in a certain pool it starts to throw warnings in the System Event Log with an Event ID of 1376 that state what percent is currently used and how many addresses are still available.
It would be really cool if we could have Log Insight fire off an alert if these messages showed that we were above 90% used, right? But it’s text… how do we do math on text in log messages? The good news is that not only can you accomplish this; it’s easy to do!
First off, we need to create an Extracted Field that allows us to treat the value of percentage used as an integer. Simply highlight the number and select “Extract Field”.
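To make the idea of "doing math on text" concrete outside of Log Insight, here is a small Python sketch of the same logic: pull the percentage out of the message, convert it to an integer, and compare it to a threshold. The message wording and the function names are assumptions for illustration; check a real Event ID 1376 message before reusing the pattern.

```python
import re

# Assumed wording, modelled on the description above (percent used plus
# addresses remaining). A real Event ID 1376 message may be phrased differently.
PERCENT_FULL = re.compile(r"(\d+) percent full")


def percent_used(message):
    """Return the percentage in the message as an int, or None if absent."""
    match = PERCENT_FULL.search(message)
    return int(match.group(1)) if match else None


def should_alert(message, threshold=90):
    """True when the message reports usage strictly above the threshold."""
    used = percent_used(message)
    return used is not None and used > threshold


# Hypothetical sample messages.
samples = [
    "IP address range of scope 10.0.0.0 is 85 percent full with only 38 IP addresses available.",
    "IP address range of scope 10.0.1.0 is 94 percent full with only 15 IP addresses available.",
]

for message in samples:
    print(percent_used(message), should_alert(message))
```

The extracted field plays the role of `percent_used` here, and the alert query plays the role of `should_alert`.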
I’ve been trying to get the Zenoss SQL Transaction Zenpack working so that we can use Zenoss to run SQL queries for specific monitoring purposes and ran into a few things that might be worth sharing.
Using tsql for troubleshooting
Zenoss, among many other tools, uses pymssql to connect to your SQL Servers, and pymssql uses FreeTDS behind the scenes. If you can’t get pymssql to work, then you can go a layer deeper to see if you can find the issue. In my case I have the following configuration:
Fedora Server 23
First off, FreeTDS uses a config file at /etc/freetds.conf that has a [Global] section and examples for configuring individual server types. This is important because you need to use TDS version 7.0+ for Windows Authentication to work.
If we try to connect using the diagnostic tool tsql (not to be confused with the language T-SQL) without changing the default TDS version or adding a server record in the config file, our attempts will fail.
To fix this you can either:
Change the Global value for “tds version” to be 7+ (sounds like a good idea to me if you only have MSSQL):
or you can add a server record for each Microsoft SQL Server and leave the global version less than 7.
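To see which of the two options a given config file is actually using, a quick check like the following Python sketch may help. It is not part of FreeTDS; the parser is a simplified stand-in, and the server record name `mssql01` and the sample values are hypothetical. The sample shows the second option: a global version below 7 with a per-server override.

```python
# Hypothetical freetds.conf content: global default below 7, one server
# record (the name "mssql01" is made up) that overrides it.
SAMPLE_CONF = """\
[global]
    tds version = 4.2

[mssql01]
    host = server.domain.com
    port = 1433
    tds version = 7.0
"""


def tds_versions(conf_text):
    """Map each [section] to its 'tds version' value, if it sets one."""
    versions = {}
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blanks and comments
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        key, sep, value = line.partition("=")
        if sep and section is not None and key.strip().lower() == "tds version":
            versions[section] = value.strip()
    return versions


def meets_tds7(version):
    """Rule of thumb from above: TDS 7.0 or newer. Non-numeric values are not interpreted."""
    try:
        return float(version) >= 7.0
    except ValueError:
        return False


for name, version in tds_versions(SAMPLE_CONF).items():
    print(name, version, "ok" if meets_tds7(version) else "too old for Windows Authentication")
```

Pointing `tds_versions` at the text of your own /etc/freetds.conf shows at a glance whether the global section or a server record is the one that needs changing.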
The script below is basically a simplified version of the example on the pymssql web page, but it will prove whether pymssql and MSSQL Windows Authentication are working.
import pymssql

print('Connecting to SQL')
# Raw string so the backslash in DOMAIN\username is not read as an escape
conn = pymssql.connect(server='server.domain.com', user=r'DOMAIN\username', password='Super Secret P@ssW0rds', database='master')
cursor = conn.cursor()
cursor.execute("""
SELECT MAX(req.total_elapsed_time) AS [total_time_ms]
FROM sys.dm_exec_requests AS req
WHERE req.sql_handle IS NOT NULL
""")
row = cursor.fetchone()
while row:
    print(row[0])
    row = cursor.fetchone()
conn.close()
This post is a continuation of Part 1; I think I shall call it “Help, my ntbtlog.txt isn’t being written to disk and I’m flying blind”
Ok, now I need more data because I’m not getting anywhere. Fortunately, Windows still has the option to log kernel debugging over serial, a feature I wasn’t aware existed until today. That brings up the big question: how do I make that work on a VM and a physical device without a serial port?
First you need to enable virtual printers in VMware Workstation under Edit > Preferences. Without this enabled Workstation can’t attach to named pipes.
I have a Windows Server 2012 VM that will not boot past the Windows splash screen but throws a BSOD with the error “SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (NETIO.SYS)”. It’s been a long while since I last troubleshot Windows (I primarily use CentOS), but here’s what I’ve found. I don’t have the solution yet, but I’m recording some tidbits so I will have them later.
First a bit of preamble:
1. Advanced Boot Options – When you select “Enable Boot Logging” this is supposed to write a log file named ntbtlog.txt. However, in this particular case that never happens. This is presumably because the crash occurs before the driver needed to write log files is loaded. With 2012 this is conjecture, though, since the latest Microsoft documentation that I can find applies to Server 2000. Regardless of the reason, the log isn’t captured in this instance.
2. This VM was originally running on ESXi, but I have exported an OVF to my local VMware Workstation for troubleshooting.
3. In the below operations I will be referencing “d:” which is actually the c: of the server. It is referenced from the rescue command prompt as d: on my system.
Step 1: Boot to the command prompt from the troubleshooting menu in the Automatic Repair wizard
Step 2: Run a chkdsk to verify the filesystem is in working order. My scan came back with required repairs which it corrected. Subsequent scans come back clean.
Command: chkdsk d: /f