pyenv on macOS 10.15 Catalina

I have been running pyenv from Homebrew on my Mac so I could run Python 3 rather than 2.7. However, when I upgraded to Catalina, pyenv stopped working. I went through the GitHub page for pyenv, followed all of the steps, and got to this part:

  • Add pyenv init to your shell to enable shims and autocompletion. Please make sure eval "$(pyenv init -)" is placed toward the end of the shell configuration file since it manipulates PATH during the initialization.
    $ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bash_profile
    • Zsh note: Modify your ~/.zshenv file instead of ~/.bash_profile.
    • fish note: Use pyenv init - | source instead of eval (pyenv init -).
    • Ubuntu and Fedora note: Modify your ~/.bashrc file instead of ~/.bash_profile.
    General warning: There are some systems where the BASH_ENV variable is configured to point to .bashrc. On such systems you should almost certainly put the abovementioned line eval "$(pyenv init -)" into .bash_profile, and not into .bashrc. Otherwise you may observe strange behaviour, such as pyenv getting into an infinite loop. See #264 for details.

One correction to the Zsh note: the line should go in ~/.zshrc rather than ~/.zshenv.
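A quick way to confirm the change took effect is to check where the pyenv shims directory sits in PATH from a new shell. The snippet below is only a sketch: the function name is made up for illustration, and it assumes pyenv's default layout, with shims living in $PYENV_ROOT/shims (~/.pyenv/shims when PYENV_ROOT is not set).

```python
import os


def shims_position(path_value, pyenv_root):
    """Return the index of the pyenv shims directory in a PATH string, or -1."""
    shims = os.path.normpath(os.path.join(pyenv_root, "shims"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return entries.index(shims) if shims in entries else -1


if __name__ == "__main__":
    # Assumption: default pyenv root when PYENV_ROOT is not exported
    root = os.environ.get("PYENV_ROOT", os.path.expanduser("~/.pyenv"))
    position = shims_position(os.environ.get("PATH", ""), root)
    if position == -1:
        print("pyenv shims are not on PATH; pyenv init probably did not run")
    else:
        print("pyenv shims are PATH entry number", position)
```

If the shims directory is missing, or sits behind /usr/bin, the system Python will most likely still win over the pyenv one.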

First Half of Python for Network Engineers

It’s been non-stop training for 5 weeks, but we had this week off, so I thought I would post this.

I was able to get my work to fund the Python for Network Engineers course taught by Kirk Byers. 

https://pynet.twb-tech.com/class-pyauto.html

I had taken the free class a couple of times and learned quite a bit. I thought that taking the paid course would give me a better understanding of Python and of how to handle some of the more complex things that I want to do. I really want to take advantage of more automation in our environment and make things work better and easier with fewer chances for errors. I also want to empower my Helpdesk to do more things; we are a very small shop with a large footprint of stores and offices. We have deployed Meraki to almost all of the locations, so being able to take advantage of Python and REST APIs has been a great benefit so far. However, I feel there is more that I can do; I just need some more training. Also, the more I can hand off to my Helpdesk, the less they have to call me and the more sleep I can try to get (as though that would happen).

I have really enjoyed the first half of the class and have learned quite a bit so far just from using Netmiko, TextFSM, and Jinja2. The other nice part is the community of people that Kirk has put together so that we can all learn from each other and exchange ideas and questions. Between Slack and some group channels, a lot of good comments and questions have gone back and forth.

As for the class itself, Kirk’s videos have been informative and I have found a lot of useful information in them. His examples have been good and use real-life data from working with equipment, without diving into actual network engineering. I have also found the exercises he has assigned us to be challenging and quite good. I have picked up some good ideas from them, and they have pushed my learning and understanding of Python.

All in all I am really enjoying it, and I can’t wait for the second half and to see how my Python programming improves.

Meraki Script to pull LTE Card Signal

Script for pulling the make and signal strength of wireless cards

We are continually auditing the LTE cards in our Meraki routers, so we wanted to monitor the stores’ LTE connections, see the signal strength, and then determine which cards, if any, needed to be swapped out. However, that data is only stored at the device level, so you have to iterate through the whole organization, then through each network, and then through each device in the network. Meraki limits how many times you can poll the cloud per second, so I put a 1 second delay in there to keep the program from overwhelming everything and causing issues for itself or for our users monitoring on the website.

The script can be found here:

https://github.com/undrwatr/MERAKI_CARD_SIGNAL
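The organization, network, device walk described above can be sketched roughly as follows. This is not the code from the repository, just a minimal illustration: the three callables (list_networks, list_devices, get_uplink) are hypothetical stand-ins for whatever Dashboard API requests you make, and the "id", "name", and "serial" fields are assumed to be present in the data they return.

```python
import time


def collect_uplinks(list_networks, list_devices, get_uplink,
                    delay=1.0, sleep=time.sleep):
    """Walk networks, then devices, pausing after every device poll."""
    results = []
    for network in list_networks():
        for device in list_devices(network["id"]):
            results.append({
                "network": network["name"],
                "serial": device["serial"],
                # Hypothetical call returning the uplink/signal details
                "uplink": get_uplink(network["id"], device["serial"]),
            })
            # Pause so the polling stays under the API rate limit
            sleep(delay)
    return results
```

Passing the API calls in as functions keeps the loop and the delay easy to test without touching the cloud.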

How I handle credentials and shared variables in Python

How to handle common variables between programs

I have been writing a lot of Python programs lately for interacting with the Meraki platform. I was tired of copying and pasting my variables and credentials between programs, plus I wanted to be able to upload the programs to GitHub without having to worry about sanitizing them of my company’s or my personal data. I did some searching and didn’t find a lot, so I decided to put this information into a Python module that I could call from within my programs. That way the sensitive data never lives in the programs themselves. I called the module cred.py, so I can pull it into a program with just an “import cred”.

I used to copy this file into each of the directories where I was working on a program. Then I ran into a problem when I had to change an API key: I had to go through and find all of the cred.py files I had created and update the data in each of them. That proved to be more of a pain than I wanted to deal with, so I decided to place the file in one central directory for all of my programs. This proved much easier, but then I had to figure out how to call it from within Python without making it a module in the install path.

That is where I came up with this:

import sys

# Import the CRED module from a separate directory
sys.path.insert(0, '../CRED')
import cred

This allows me to keep one central directory that stores all of my credentials, along with other commonly needed variables. I import it from within the program and can then run my programs easily. I’d love to hear how others are handling this, or whether there is a better way for me to do it.
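One thing to watch with a relative path like '../CRED' is that Python resolves it against the directory you run the program from, not the directory the program lives in. A possible variation, sketched below, builds the path from the script’s own location instead. The helper name add_shared_dir is made up for illustration, and it assumes the CRED directory sits next to the program’s directory.

```python
import sys
from pathlib import Path


def add_shared_dir(script_file, dirname="CRED"):
    """Put <directory above the script's directory>/<dirname> first on sys.path."""
    shared = Path(script_file).resolve().parent.parent / dirname
    if str(shared) not in sys.path:
        sys.path.insert(0, str(shared))
    return shared


# Typical use at the top of a program:
#   add_shared_dir(__file__)
#   import cred
```

With this the import works no matter which directory the program is started from.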

Moving Cisco UCS to 10G Interfaces

We initially implemented our Cisco UCS chassis and FIs with 4 port channels, each with 2x1Gb interfaces connected to our Cisco core switches. Now we are in the process of moving from a dual Cisco core to a Juniper Virtual Chassis core (more on that in another post). Part of getting the new core was finally getting 10G for our network. We had been surviving just fine with our current network connectivity, but figured it wouldn’t hurt to get 10G and connect whatever we could to it.

What I could not find searching around the internet was how the UCS FIs were going to handle the additional links and how the traffic would move over. I was afraid they would do some sort of Spanning Tree blocking and not allow the new links to pass traffic. However, after checking the existing links I realized that I already had two from each FI, and both of them were actively passing traffic with neither in a blocking state.

I then went ahead and started to plan the turn-up of the links. The majority of our servers sit in the UCS environment, from bare-metal Linux and Windows machines to our 500-guest VM farm. With so much crucial infrastructure, we wanted to make sure we didn’t have any downtime or lose any traffic during the transition. So as part of the planning I built a Python script that pings a list of known addresses to ensure they are all up and on the network as each part of the plan is completed. I wanted the code to be reusable across platforms, so I made some allowances for the unique requirements of each OS. The only requirement is a file called hosts.txt containing the IP addresses you want to ping. It’s multi-threaded, so it runs a lot of the pings at the same time and completes as soon as possible. Then you just need to go through the output and look for anything that is failing.

[codesyntax lang=”python”]

#!/usr/bin/env python3

import os
import platform
import subprocess
import sys
import threading

plat = platform.system()
script_dir = sys.path[0]
hosts = os.path.join(script_dir, "hosts.txt")

# Read one address per line, dropping blank lines and trailing newlines
with open(hosts, "r") as hosts_file:
    lines = [line.strip() for line in hosts_file if line.strip()]


def ping(ip):
    """Send a single small ping to ip and print what the ping command returns."""
    if plat == "Windows":
        cmd = ["ping", "-n", "1", "-l", "1", "-w", "100", ip]
    elif plat == "Linux":
        cmd = ["ping", "-c", "1", "-l", "1", "-s", "1", "-W", "1", ip]
    elif plat == "Darwin":
        # Same as Linux without -l; macOS takes the -W wait time in milliseconds
        cmd = ["ping", "-c", "1", "-s", "1", "-W", "1000", ip]
    else:
        print("Unsupported platform: " + plat)
        return

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, error = proc.communicate()
    print(out.decode(errors="replace"))
    if error:
        print(error.decode(errors="replace"))


# start() runs each ping in its own thread; run() would execute them one by one
for ip in lines:
    threading.Thread(target=ping, args=(ip,)).start()
[/codesyntax]
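Reading through the raw ping output gets tedious with a long host list. A possible variation, sketched below, collects only the hosts that fail. It is not part of the script above: the names ping_once and failed_hosts are made up for illustration, and it assumes the ping exit code is a good enough signal, which is not always true (Windows in particular can return 0 for some unreachable replies).

```python
import platform
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_once(host):
    """Return True if a single ping to host exits with code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failed_hosts(hosts, is_up=ping_once, workers=32):
    """Check hosts in parallel and return the ones that are down, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(is_up, hosts))
    return [host for host, up in zip(hosts, results) if not up]
```

Calling failed_hosts with the addresses read from hosts.txt after each step of a change gives a short list to investigate instead of pages of output.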

For the migration we took the subordinate FI and brought up the 10G interface. As we watched traffic flow over it, we ran the ping script a couple of times. We then started to shut down the trunk interfaces, running the ping script each time and looking for issues. After we had the subordinate FI moved over to the 10G links and the new core, we did the same with the primary FI. I was happy to find that at no time did we lose any connectivity to our hosts or guests, and everything went smoothly.

3750 Unicast Flooding Issue

Since I ran into this issue and wasn’t really able to find anyone posting about it, I thought I should put something together for anyone else who runs into it. I had an issue with a stack of 3750X switches where there was unicast flooding to all of the ports in the same VLAN. While doing research I came across suggestions of asymmetric L2 routes, timeout values for the ARP tables, and TCAM table overruns. My issue turned out to be none of these. The ARP timeout values were all increased and that didn’t solve the problem. My network is fairly simple with a collapsed core, so L2 asymmetric routing wasn’t the issue. The TCAM tables were definitely not being overrun on these switches, as they can handle 8K ARP entries and I am nowhere near that.

So what did that leave me with? An issue where the MAC address tables of the stack members were not getting updated in a timely manner, as seen in the output of the following command:

remote command all sh mac add count | i Total

Switch : 3 : (Master)
---------------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 22
Total Mac Addresses    : 28
Total Mac Addresses    : 178
Total Mac Addresses    : 0
Total Mac Address Space Available: 6402

Switch : 1 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 162
Total Mac Addresses    : 22
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 28
Total Mac Addresses    : 0
Total Mac Address Space Available: 6418

Switch : 2 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 165
Total Mac Addresses    : 22
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 28
Total Mac Addresses    : 0
Total Mac Address Space Available: 6415

Switch : 4 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 22
Total Mac Addresses    : 28
Total Mac Addresses    : 140
Total Mac Addresses    : 0
Total Mac Address Space Available: 6440

After many hours of troubleshooting with TAC, they finally came to the conclusion that we were hitting bug:

CSCut64281    Ports on Member of the stack takes long time to learn/age MAC addr

This was only evident in the 15.1x code train. The issue didn’t exist in 15.0, which is why my older switches weren’t seeing it, only the brand new shiny ones I had installed last year. The fix finally became available in the last couple of months in 15.2(2)E3. I finished testing the release on some slightly non-prod switches and then rolled it out to my campus. Now I am seeing the following on the upgraded switches:

Switch : 3 : (Master)
---------------------
Total Mac Addresses    : 142
Total Mac Addresses    : 536
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 150
Total Mac Addresses    : 0
Total Mac Address Space Available: 7151

Switch : 1 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 535
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 151
Total Mac Addresses    : 0
Total Mac Address Space Available: 7151

Switch : 2 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 534
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 20
Total Mac Addresses    : 27
Total Mac Addresses    : 150
Total Mac Addresses    : 0
Total Mac Address Space Available: 7154

Switch : 4 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 535
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 146
Total Mac Addresses    : 0
Total Mac Address Space Available: 7156

While not perfect it definitely seems to be a lot better than the previous reports. I keep looking for the bug to be posted on Cisco’s site, but it is still private at this point.
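Comparing these counts by eye across stack members gets old quickly. A rough sketch of one way to do it with Python is below: it totals the "Total Mac Addresses" lines under each "Switch : N" header so the members can be compared at a glance. The function names are made up for illustration, and the parsing assumes the output looks like the captures in this post.

```python
import re


def mac_totals_by_switch(output):
    """Sum the 'Total Mac Addresses' counts listed under each switch header."""
    totals = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"\s*Switch\s*:\s*(\d+)", line)
        if header:
            current = int(header.group(1))
            totals[current] = 0
            continue
        # 'Total Mac Address Space Available' does not match this pattern
        count = re.match(r"\s*Total Mac Addresses\s*:\s*(\d+)", line)
        if count and current is not None:
            totals[current] += int(count.group(1))
    return totals


def spread(totals):
    """Difference between the largest and smallest per-switch total."""
    return max(totals.values()) - min(totals.values())
```

Feeding it the saved output of the remote command and watching the spread over time shows whether the members are drifting apart again.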

Troubleshooting Websense as Proxy for site access

I recently had to troubleshoot a problem with a client going through Websense as a proxy and trying to reach a site. The site was at https://somesite.com:11001. Every time I went to the site I would just get a “Page could not be displayed” error. I started troubleshooting from the Websense side and couldn’t see anything in the interface itself, so I went to the log server, stopped the logging service, and ran it from the command line with just the client I was testing with. However, this didn’t even show a hit from the client. I then had to go to the next level and troubleshoot with a packet capture and Wireshark. Once I was able to capture the traffic, I could see that Websense was returning an error that the browser wouldn’t display. The issue came down to using HTTPS on port 11001, which wasn’t allowed in the Content Gateway on the Websense appliance. Once I added that port, I was able to browse to the site successfully and have it show up in the log server.

So below I have summarized the steps for someone else needing to do this type of troubleshooting.

How to use the Websense testlogserver to troubleshoot problems and limit the information that is seen:

  1. Log into the logging server
  2. Stop the “Websense Log Server” service
  3. Go into the C:\Program Files (x86)\Websense\Web Security\bin folder and run testlogserver.exe -onlyip (ip address you want to see)
  4. You can now surf the site from that machine and see what errors are showing up in the log server to help determine the problem.
  5. If you need to go to another level then run a packet capture from the machine using Websense as an explicit proxy in your browser. You can then limit the capture to just the Websense IP.
  6. Once you have gone to the site you can then look at the packet capture and search for “http contains (site you are going to)”.
  7. You should be able to then decode the http stream and see all of the headers and information returned. This should help you in troubleshooting the issue.
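For an HTTPS site on an unusual port, another way to see what the proxy says without a full packet capture is to send the CONNECT request yourself and read the first line of the reply. The sketch below does only that. The function name is made up for illustration, and the proxy address and port in the usage note are placeholders, not values from my environment.

```python
import socket


def probe_connect(proxy_host, proxy_port, target_host, target_port, timeout=5.0):
    """Ask an explicit proxy to tunnel to the target and return its status line."""
    request = (
        f"CONNECT {target_host}:{target_port} HTTP/1.1\r\n"
        f"Host: {target_host}:{target_port}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        reply = b""
        # Read until the end of the first line (the HTTP status line)
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Something like probe_connect("proxy.example.com", 8080, "somesite.com", 11001) would then return the status line the proxy sends back, which is the part the browser was hiding from me.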