Troubleshooting Windows Application Problems

December 16th, 2009

Frequently in my work as a Windows system administrator, I am asked to troubleshoot unusual application problems that our first and second-level support staff have been unable to fix. Although I troubleshoot these kinds of problems on a fairly regular basis, I find that I don’t always do so consistently. I might overlook something that I shouldn’t have, or I might forget something I’ve seen before that helped me solve a similar problem.

To help my co-workers and me jog our memories when an application problem resists the troubleshooting steps we’ve tried so far, I developed the following (lengthy) series of questions. Since this list might be of value to others who are trying to solve problems with Windows applications (or even Mac or Linux applications, though this guide is aimed specifically at Windows), I thought I would publish it here.

  1. Has the PC been rebooted to ensure the problem isn’t temporary? If rebooting isn’t practical, try having the user log off/on, as this will refresh the applications that load when the user logs on and terminate anything that might be hung.
  2. Have we checked to see if the manufacturer’s support site has seen this problem before?
  3. Have we done a Google search on any symptoms or error messages to see if others have seen and fixed this before?
  4. If this is a new application install, does the problem occur for a normal user but not for an administrator? If so, we probably need to adjust permissions for some of the files/folders in the application’s C:\Program Files directory. The Sysinternals Filemon tool can help you identify which files might be having trouble, and Regmon can do the same for registry entries (Process Monitor, their successor, covers both).
  5. Does the problem occur when other users log on to the same PC and use the same application? If not, we’re probably looking at a user profile issue. Try renaming the user’s profile and having them log in to create a new one, then see if the app works.
  6. Has the application in question been repaired using Add/Remove Programs, or removed and reinstalled? If the application interacts with other applications (e.g., Flash Player and Internet Explorer), have all the relevant applications been repaired and/or reinstalled?
  7. If the problem involves a browser add-on or extension, have we disabled all other browser extensions and add-ons to see if there is a conflict of some sort (for Internet Explorer, see Tools -> Manage Add-ons -> Enable or Disable Add-ons)? Has a recent Microsoft “kill bits” or ActiveX patch disabled it?
  8. Is the application in question a Java application, or does it make some use of Java? If so, check to make sure Java is working by entering “java -version” at a command prompt. If Java isn’t found, that could be the problem.
  9. If this is a problem with an application that creates and opens documents (like Excel), does the problem happen with all documents or just certain ones? If the document is copied to another machine with the same application does that machine exhibit the same problem? If so, it may just be a corrupted document.
  10. Does the application utilize any temp files or configuration files (e.g., INIs) that might be corrupted? If so, have we tried renaming those and letting the application make new copies? For Internet Explorer, this includes the Temporary Internet Files. For Office, it includes opa11.dat, excel11.xlb, excel11.pip, mso1033.acl, powerp11.pip, ppt11.pip, extend.dat, and normal.dot. (Note that an uninstall/reinstall doesn’t usually fix this.)
  11. Has CHKDSK been run to ensure there is no disk corruption? (Note: Multiple runs may be needed if corruption is extensive.) If there was corruption, repairing the application after fixing the corruption is a good idea. If there is still a problem, the OS itself might be corrupted and a full rebuild or reimage may be the best answer, especially if you can’t replicate the issue on another PC. If corruption doesn’t seem to get fixed after 3 CHKDSK runs, you’re probably looking at a bad hard disk or such severe corruption that rebuild is a better idea than repair.
  12. Have we checked the vendor’s web site to see if there are any updates, hotfixes, or patches available and applied them?
  13. If the application uses plug-ins, have we tried repairing and/or removing those plug-ins to see if the problem goes away?
  14. Are there multiple versions of the application installed (e.g., Office 2003 and Office XP)? Can the user live without one of them? Has the newer version been repaired before (and/or after) the older one?
  15. Is there anything in the Event Logs which might point to the cause of the problem? Does the application produce any logs of its own that we can look at?
  16. If this is a network-related application (like Outlook, Cygwin, etc.) have we confirmed that networking is working? Is the firewall causing a problem?
  17. If this is a database related application, is the database up? Is there an ODBC database provider configured in the control panel? Is any database middleware present (e.g., Oracle software) that needs to be?
  18. Was anything installed on the computer just prior to the onset of the problem?
  19. Were any patches applied recently that affect this particular application? Have you tried removing the most-recently installed patches to see if this helps (see Add/Remove Programs)?
  20. Have we tried renaming the branch of the registry related to the application and then repairing the application (e.g., HKLM\Software\Vendor to HKLM\Software\Vendor.old)?
  21. If this is an application which prints (like the Office apps), try changing the default printer and launching the application again. If the problem disappears, delete the original default printer, re-add it to get new drivers, and make it the default again. (Some apps grab printer information at startup and can crash if there is a driver issue.)
  22. Is there a chance that this application needs a firewall exception? Check its manual, vendor web site, etc., to verify this and if necessary add one. If it needs a firewall exception and this wasn’t automatically done at install, notify WDA.
  23. Does the machine have the latest BIOS?
  24. Some applications interface with, or hook into, hardware drivers. For example remote control software does this to simulate keyboard/mouse input and capture video changes. If there’s a chance this application does that, have we tried updating the drivers (e.g., video, network, key/mouse)? Note that you may need to repair the app after updating the drivers so the app can restore its “hook” into them.
  25. If this is an application that processes sound, like a sound recorder, are the Control Panel settings correct for that? For example, are the input and output devices set correctly? (You may want to experiment with various options in case the control panel thinks, for example, that the line-in jack is the microphone jack.)
  26. If this is a problem getting an application to launch, the likely culprits are disk corruption, corrupted temporary files, corrupted settings files, corrupted application files, or bad registry entries. CHKDSK can fix disk errors. Repairing the app should fix corrupted application files. Deleting temp and settings files should be tried. Renaming the Registry branch used by the app can help restore corrupted Registry entries.
  27. Does the application rely on any Windows Services in order to function? Are those services installed and started? Have you tried stopping and restarting them?
  28. Is there enough free space on the user’s hard disk (1-2GB)? The application may need to create temporary files, or the operating system may need page file room.
  29. Does this application interact with a CD-ROM or other peripheral? If so, is that device attached? Is it working? If it’s a disk drive, does it contain a disk? Is that disk corrupted or unreadable?
  30. Does the application generate any logs itself? (These may appear in the application’s own directory or in the user profile.) Any indication of a problem there? Does searching the error messages on the Internet help any?
  31. If the application interfaces with something on the network, like a web server or application server, can we determine if that server is online? Are other users with this same software able to get to that server? Is there anything wrong with the user’s account on that server?
  32. If this is an issue with a peripheral, like a mouse, have we tried using a generic Microsoft driver for the device (if there is one)? If we’re already using a generic driver for the device, have we tried a manufacturer-specific one?
  33. If the problem in question is display oriented, like a window not refreshing properly or graphics appearing corrupted, etc., have we tried updating the video drivers to the latest available from the card’s manufacturer?
  34. If this is a problem working with a media file, does the PC have the correct “codec” (compression/decompression) software installed? For example, AVI files may need codecs like DiVX, XVid, and so forth installed.
  35. If this is a web-browser-oriented application, does it work when an administrator is logged in and running the browser (be careful about this if you’re going to an untrusted site as you could introduce malware!)? If so, we’re probably missing a plug-in or permission that allows the user to run the app.
  36. Some applications embed an Internet Explorer control into them to read/view content from the Internet. Is that a possibility with this application? If so, have we tried repairing and troubleshooting IE?
  37. If this is an issue with Internet Explorer, have we tried using Tools -> Internet Options -> Advanced -> Reset… to restore the browser to its default configuration? Have we tried deleting temporary files?
  38. Have we considered possible hardware causes for this problem? For example, could a failing hard disk cause this? Could faulty RAM be making this machine unstable? Could a bad motherboard or video card do this? An easy way to test this would be to configure a similar machine with the same software and see if you get the same result.
  39. Have we tried calling, emailing, etc., the application manufacturer if possible?
  40. If you’ve already invested a lot of time and are no closer to fixing it, and you can’t replicate the issue for others with a similar hardware/software build, have you considered that a rebuild may be a better use of time? If this is a one-off issue that isn’t recurring for the user (or that you’re not seeing for lots of users), rebuilding the machine may be cheaper to the company than spending more hours fixing the issue.
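
Two of the checks above (item 8’s Java test and item 28’s free-space test) are easy to script. The following Python sketch is only an illustration: the function names and the 2 GB default threshold are my own choices, not part of any standard tool.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_enough_free_space(path, min_free_bytes=2 * GIB):
    """Return True if the volume holding `path` has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes


def find_java():
    """Return the full path of the `java` executable found on PATH, or None."""
    return shutil.which("java")
```

On a Windows PC you might call has_enough_free_space("C:\\") for item 28, and treat a result of None from find_java() as a hint that Java is missing or not on the PATH for item 8.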

admin Windows Administration, Windows Support

Remotely Monitoring, Controlling, and Setting Windows XP Screen Savers

December 12th, 2008

At the office, we’ve been investigating an issue where (for reasons as yet undiagnosed) a number of  Windows PCs that are configured via Active Directory Group Policy to automatically lock their screens after 5 minutes of inactivity aren’t doing that.  In the process, I’ve started writing some scripts to gather data during off-hours times (e.g., 2am when few people should be working) to see whose machines aren’t locked and capture information about system resources, running processes, and the like.  Not enough data is available yet to reach any conclusions, but I have run into a few interesting tidbits that might be of use to other Windows administrators and support personnel.  I’ve decided to compile those tidbits here so that you’ll be able to make use of them in your own environment if you so choose.

The Registry Keys Governing Screen Saver Activity (Windows XP)


The key that determines if a screen saver is password protected is:

HKCU\Control Panel\Desktop\ScreenSaverIsSecure

This key has a value of 0 if it’s not password protected, and 1 if it is.

The key that tells Windows if a screen saver has been selected for activation or not is:

HKCU\Control Panel\Desktop\ScreenSaveActive

This key has a value of 0 if no screen saver has been selected, and 1 if a screen saver has been selected.

To see how long the “idle time” has to be before the screen saver kicks in, check this key:

HKCU\Control Panel\Desktop\ScreenSaveTimeOut

This contains the “idle time” in seconds before the selected screen saver activates (for example, 300 for 5 minutes).

The key governing which screen saver is to be used is this one:

HKCU\Control Panel\Desktop\SCRNSAVE.EXE

The value of this key is the path to the selected screen saver, such as “C:\WINDOWS\System32\logon.scr”.
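
As a rough sketch of how these settings could be gathered by a script, the Python below reads the four values for the current user and turns them into something easier to report on. It assumes the values are stored as strings and that the timeout is a count of seconds. The function names are hypothetical, and the winreg module is only available on Windows, so it is imported inside the function that needs it.

```python
VALUE_NAMES = ("ScreenSaveActive", "ScreenSaverIsSecure",
               "ScreenSaveTimeOut", "SCRNSAVE.EXE")


def describe_screen_saver(values):
    r"""Summarize raw string values read from HKCU\Control Panel\Desktop."""
    timeout = values.get("ScreenSaveTimeOut")
    return {
        "active": values.get("ScreenSaveActive") == "1",
        "secure": values.get("ScreenSaverIsSecure") == "1",
        # Assumption: the timeout is a decimal string holding seconds.
        "timeout_seconds": int(timeout) if timeout and timeout.isdigit() else None,
        "module": values.get("SCRNSAVE.EXE"),
    }


def read_desktop_values(names=VALUE_NAMES):
    """Read the named values for the current user (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any platform
    result = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Control Panel\Desktop") as key:
        for name in names:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
                result[name] = value
            except FileNotFoundError:
                pass  # value not present, e.g. no screen saver selected
    return result
```

Calling describe_screen_saver(read_desktop_values()) on a Windows PC returns a small dictionary. A machine that should lock after 5 minutes would be expected to show active and secure as True and a timeout of 300.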


Where Screen Savers Are Stored
(Windows XP)

The “normal” or “default” screen savers that ship with Windows (along with most user-installed screen savers) can be found in:


C:\Windows\System32

Where “Windows” is the name of the directory into which Windows is installed on your PC (i.e., if you’ve changed that to a different directory, adjust the “C:\Windows” part accordingly).

There’s nothing that requires screen savers to be stored in this particular directory, however, so you could find screen savers in other directories on the PC.

If you want to look for the screen savers on your particular PC, do a Windows search for files whose names end in “.scr” as those are (more likely than not) screen saver modules.

Starting a Screen Saver from the Command Line (Windows XP)

To start a screen saver from the command line on the PC you’re using, bring up a command line and enter the command:

c:\windows\system32\logon.scr /s

Where “C:\Windows” is the directory where Windows is installed on your PC, and “logon.scr” is the name of the screen saver you want to start running.  The “/s” tells Windows to start the screen saver running.  Optionally, you could leave off the “/s” (or use “/c”) to see any options you can set for that screen saver (or get an error if there are none).  You can also use “/p <HWND>” to invoke the screen saver as a child of the window whose handle is <HWND> (I’ve not used that particular function so I can’t tell you much about it).

Note that even though your screen saver might be set to require a password when it comes back, my testing indicates that invoking the screen saver as above does not cause this to happen. You’re better off,  if you’re concerned about security, issuing a command to force the system locked.

Locking the Screen from the Command Line (Windows XP)

It’s possible to lock your system from the command line.  To do this, bring up a command line and enter the following command exactly as written:

rundll32.exe user32.dll, LockWorkStation

This will almost immediately lock the screen/system.
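
The rundll32 command above ends up calling the LockWorkStation function exported by user32.dll. If you are already scripting in Python, a minimal sketch of calling that function directly could look like this. The wrapper name lock_workstation is my own, and on anything other than Windows it simply reports failure.

```python
import sys


def lock_workstation():
    """Lock the current session. Return True on success, False otherwise."""
    if sys.platform != "win32":
        return False  # LockWorkStation only exists on Windows
    import ctypes
    # LockWorkStation takes no arguments and returns nonzero on success.
    return bool(ctypes.windll.user32.LockWorkStation())
```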

Locking the Screen or Starting the Screen Saver Remotely (Windows XP)

There may be times you want to lock a system that’s somewhere else on the network.  That can be done pretty easily by first downloading the “psexec” tool from SysInternals (now a part of Microsoft).  Using psexec, you could remotely lock the screen of a PC on your network named “PC123” by issuing the following command from the command line:

psexec \\pc123 rundll32.exe user32.dll, LockWorkStation

(The above command should all be on one line. It’s not two separate commands.)

You can also invoke a screen saver remotely (with the caveat that it doesn’t actually lock the system) by using psexec to issue the following command:

psexec -i \\pc123 cmd /c start c:\windows\system32\logon.scr /s

(Again, the above command should all be typed together on one line.)
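
If you need to lock several PCs, the psexec command above can be wrapped in a small script. This Python sketch assumes psexec is on the PATH and that you have administrator rights on each target. The function names are hypothetical, and the rundll32 argument is written without a space after the comma.

```python
import subprocess

# The same lock command used above, split into separate arguments.
LOCK_ARGS = ["rundll32.exe", "user32.dll,LockWorkStation"]


def build_remote_lock_command(computer):
    """Return the psexec argument list that locks `computer`."""
    return ["psexec", "\\\\" + computer] + LOCK_ARGS


def lock_all(computers, run=subprocess.call):
    """Run the lock command on each computer; map computer name to exit code."""
    return {name: run(build_remote_lock_command(name)) for name in computers}
```

Passing the arguments as a list rather than one long string means the whole command cannot accidentally be split across two lines. The run parameter exists only so the function can be tried out without touching real machines.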

Determining if a Screen Saver is Running on a Remote PC (WMI/VBScript)

Since “normal” screen saver modules are all executables with the extension “.scr”, you can tell whether a screen saver is running on a remote PC with a simple VBScript that connects to the Windows Management Instrumentation (WMI) service on that PC and searches the list of processes for one with “.scr” in its name.  If you find one, then more likely than not there’s a screen saver active on that machine.  The following VBScript code will tell you whether a screen saver is running on the computer named in “strComputer”.

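' Report whether a screen saver (.scr) process is running on a remote PC.
Dim strComputer, objWMIService, colItems, objItem, ssActive
strComputer = "pc123"

' Connect to the WMI service on the remote PC.
Set objWMIService = GetObject("winmgmts:\\" & _
    strComputer & "\root\CIMV2")

' 48 = return immediately + forward-only enumeration.
Set colItems = objWMIService.ExecQuery( _
    "SELECT * FROM Win32_Process", , 48)

ssActive = False
For Each objItem In colItems
    ' Compare case-insensitively so "LOGON.SCR" also matches.
    If InStr(1, objItem.Caption, ".scr", vbTextCompare) > 0 Then
        ssActive = True
    End If
Next

If ssActive Then
    WScript.Echo "Screen saver is active on " & strComputer
Else
    WScript.Echo "Screen saver not active on " & strComputer
End If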

The above script connects to the specified machine’s WMI provider, retrieves a collection object representing the processes running on the system, and scans through the collection looking for any with “.scr” in the name. If one is found, the variable “ssActive” is set to true.  It then checks the value of that variable to see if it found a screen saver running and reports that.  The above script assumes that the user running it has administrator permission on the remote machine.  If not, it will fail.

Note that I’ve intentionally left all error-checking out of the above script code to keep it short for publication. If you plan to use this in any kind of production mode you’ll want to build in checks to identify if the PC in question can be reached, if there is a problem retrieving the list of processes, etc.

If you don’t want to use VBScript but would still like to know if a screen saver is running on a remote system, and you have administrator permissions on that machine, the “pslist” utility from SysInternals (now Microsoft) can make that fairly easy.  Just download pslist from the Microsoft web site, bring up a command line, and enter a command like the following:

pslist \\pc123 logon.scr

You’ll get back a response like this if the specified screen saver (logon.scr) is running:

PsList 1.26 – Process Information Lister
Copyright (C) 1999-2004 Mark Russinovich
Sysinternals – www.sysinternals.com

Process information for pc123:

Name       Pid Pri Thd  Hnd   Priv    CPU Time    Elapsed Time
logon.scr 2324   8   1   17    408 0:00:00.078     0:00:09.915

This will tell you if the “logon.scr” process is running on that PC and how long it has been running.  If you’re not sure what screen saver the user might have active, just run pslist without specifying a process name.  You’ll get a much longer list, but anything in that list with “.scr” in the name indicates which screen saver module (if any) is running.
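
When you run pslist without a process name, picking the “.scr” entries out of the long list by eye gets tedious. The Python sketch below does that filtering on captured pslist output. It assumes the output looks like the sample above, with the process name as the first column, and the function name find_screen_savers is my own.

```python
def find_screen_savers(pslist_output):
    """Return the names of processes in pslist output that end in .scr."""
    names = []
    for line in pslist_output.splitlines():
        fields = line.split()
        # The process name is assumed to be the first whitespace-separated column.
        if fields and fields[0].lower().endswith(".scr"):
            names.append(fields[0])
    return names
```

The text to feed in could come from redirecting pslist to a file, or from capturing its output with Python’s subprocess module. An empty result means no screen saver module was found in the list.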

Note that the above information is based on Windows XP Pro and has been tested only with XP. In theory it should also work with Windows 2000 and possibly Windows Vista, but I have not tested it with those.



admin Windows Administration

System Administration Lessons Learned from Star Trek

November 25th, 2008

1. “You have to know how things work on a starship…” (Star Trek II)

Kirk’s old enemy, Khan, took command of the Reliant, a Federation starship.  When the Reliant approached the Enterprise, Kirk hesitated to raise his shields.  This gave Khan the opportunity to attack and severely damage the defenseless Enterprise.  Kirk retaliated by using his superior knowledge of Federation technology to remotely order the Reliant to drop its shields, allowing Kirk to launch an effective counterattack.  When asked how he knew the strategy would work, Kirk remarked that (if you’re the captain) you have to know how things work on a starship.

Similarly, if you’re a Windows System Administrator, you have to know how PC hardware works and how Windows itself works if you’re going to be very effective.  Since becoming a Windows administrator, I’ve had to dig deeply into the Registry, crash logs, technical references, and programming guides to solve some of the more challenging issues to come my way.  The more I know about how things work (or how they’re supposed to work), the more effective I tend to be.

2. “A no-win situation is the possibility every commander may face.” (Star Trek II)

In Starfleet Academy, the Federation tests potential officers by putting them in a simulated situation that they cannot win.  This is done to see how they react to the pressure and inevitable defeat.  Having just “failed” this unpassable test, a cadet asks Kirk why they are put through it.  He says that a no-win situation is a possibility every commander may face at some point (though Kirk himself cheated his way out of it and won the “no-win” scenario).

In system administration, there are problems that will come your way that you simply can’t fix.  Maybe it’s a system that’s been hit by too many viruses, a Registry that’s too corrupted to be sorted out, or hardware that just doesn’t work.  You can spend hours or days trying to fix a problem like this without ever really solving it.  You have to know when you’re facing a “no-win scenario” and cut your losses by walking away from the problem.  That might mean wiping the system and reinstalling everything instead of spending hours correcting a series of problems, tossing out a piece of hardware that “ought to work” but somehow doesn’t, or giving up on software that simply doesn’t do what it’s advertised to do.

3. “The needs of the many outweigh the needs of the few… or the one.” (Star Trek II)

Spock gave up his life at the end of Star Trek II to save the Enterprise and her crew.  When asked by Captain Kirk why he did it, Spock replied that the needs of the many outweighed the needs of the few, or the one.  In other words, Spock knew that by giving up his life he could save many others.

In system administration, you’re probably not going to be faced with a “life or death” choice like this, but almost daily you’re faced with situations where the needs of your end user community (“the many”) dictate actions you (“the one”) take.  For example, you may find yourself at the office after hours, sacrificing your personal time in order to complete a software upgrade, patch a server, or otherwise do something that would inconvenience users if you tried to do it during the work day.  Chances are, you’re also probably “on call” to help those same users if they have problems late at night or on the weekend, and you’re expected to help them.  The needs of the many, in this case, outweigh your own needs.

4. “Mr. Scott, have you always multiplied your repair estimates by a factor of four?  Certainly, sir. How else can I keep my reputation as a miracle worker?” (Star Trek III)

Mr. Scott admitted to Captain Kirk in Star Trek III that he had always multiplied his repair estimates by a factor of four.  This gave him the opportunity to take all the time he needed to solve a problem, while still completing the task more quickly than the captain had expected.  As a result, he was seen as a miracle worker by Captain Kirk.

In system administration, you’re often asked how long something is going to take.  While I don’t recommend multiplying your estimate by four, I do believe that you should always practice the principle of “underpromise and over-deliver” when dealing with others.  A task that looks like it should be a one-hour job can easily become a 2-3 hour job if things go wrong, the system begins responding too slowly, an emergency arises that you need to address first, etc.  If you tell someone something will take an hour and you aren’t done two hours later, they’re angry.  But if you tell them it will take two hours and you’re done in 90 minutes, you’re a miracle worker.  I’m not suggesting that you make a habit of lying, but rather that you give yourself a little breathing room to allow for things you might have forgotten, things that take longer than expected, or unexpected circumstances.

5. “The fancier the plumbing, the easier it is to stop up the drain.” (Star Trek III)

In Star Trek III, Captain Kirk and the crew of the Enterprise essentially “stole” the ship in order to save Spock and Doctor McCoy.  Mr. Scott expected the Federation’s newest, fastest, fanciest ship (the Excelsior) to be given the task of pursuing the older, slower Enterprise.  He removed a handful of critical computer chips from the Excelsior’s system while working on it, preventing the ship from being able to give chase.  When asked how he managed to sabotage the Excelsior in a way that they didn’t detect, he replied that the fancier the plumbing was, the easier it was to stop up the drain.  In other words, the systems on the Excelsior were so complicated that it was easy to screw them up.

System administrators often have several ways to deal with a situation.  Some ways are simpler than others.  You should always be wary of any solution that has too many potential “points of failure”.  While an elaborate Perl script might push out an urgent security patch to 10 systems simultaneously from the comfort of your desk chair, you could over-think the script and end up accidentally applying that patch to 100 systems you didn’t want to apply it to.  Sometimes it’s better to keep things simple, because it can reduce the chance of failure or allow you to respond more quickly.  Similarly, you can “over engineer” a solution to a problem and spend more time architecting a clever solution to something you could fix manually in a few minutes.

6. “Sometimes the needs of the one outweigh the needs of the many.” (Star Trek III)

When Spock asked why the crew of the Enterprise risked their lives and their careers to save him in Star Trek III, Captain Kirk told him that sometimes the needs of the one outweigh the needs of the many.  In other words, Spock was their friend and they were willing to risk themselves because he meant more to them than their lives or careers.

In systems administration tasks, sometimes you have to do things that make a lot of people very unhappy.  For example, when pushing out security patches it is often necessary to reboot someone’s PC to complete the installation.  Naturally, if that person has documents open in Microsoft Office when you reboot them, they’re not going to be happy about it.  Multiply that over a large organization, and that simple reboot action can upset a lot of people.  However, as a system administrator, you’re responsible for protecting your network from malware.  While “the many” users’ needs may dictate that their PCs not be rebooted, your responsibility as “the one” who protects the network must outweigh theirs.  This is not to say that you’re more important, or that you should be fine with mid-day reboots as a matter of practice, but rather that there will be times in the job where you’ve got to risk the wrath of the users for a greater good.

7. “Perhaps ‘because it is there’ is not sufficient reason for climbing a mountain.”  (Star Trek V)

In Star Trek V, Captain Kirk is attempting to climb a mountain when he slips and begins to fall off.  Spock saves him at the last second.  Later, Spock tells Kirk that perhaps “because it is there” isn’t a good enough reason to risk your life climbing a mountain.

There are times in system administration where there is something that you can technically do, but which isn’t a good idea when examined more closely.  Maybe you have a script that could update all the company’s computers with the latest Windows Service Pack overnight.  You might even be tempted to do it, since your management’s asking you about when you’re going to get the job done.  However, just because you can roll that Service Pack out in a heartbeat doesn’t mean that’s the right thing to do.  You could come in the next morning and find out that the Service Pack you pushed out last night broke the salespeople’s contact management software, the accountants’ general ledger program, and the CEO’s favorite screensaver.  Suddenly, instead of being the miracle worker you thought you were going to be, you’re on everyone’s hit list.  There are times in system administration when caution is needed, and experience will often help you know when climbing the proverbial mountain is a good idea and when it isn’t.

8.  “An ancestor of mine maintained that if you eliminate the impossible, whatever remains, however improbable, must be the truth.” (Star Trek VI)

In Star Trek VI, when attempting to figure out who assassinated the Klingon Chancellor, Spock began investigating his shipmates to identify the assassins.  When he came up with a seemingly incredible solution, he uttered the famous line above (which is paraphrased from Sherlock Holmes).

System administrators are often called upon to troubleshoot the strangest problems.  Sometimes the solution to those problems can be counterintuitive, and may even sound “impossible”.  Here’s a real-life example from my Windows 98 days.  The company had just implemented a new application in the Marketing and Finance areas.  For some reason, the laptop users in Marketing were getting a lot of “out of memory” errors when trying to use the application.  They requested more RAM.  We installed it.  The out of memory errors became even more frequent.  I started doing some research online and learned about a table kept by Windows 98 that was used to manage the available RAM.  My research indicated that the table had a fixed size and under certain conditions could “fill up” on the user.  One way you could free up space in this table was to remove some RAM.  I tried this on the Marketing laptops and, sure enough, the “out of memory” errors went away.  So, as impossible as it might seem, removing memory from the machines cleared up an “out of memory” error.

9. “People can be very frightened of change.” (Star Trek VI)

In Star Trek VI, the Klingons suffered an environmental disaster that threatened to destroy their civilization.  As a result, they sought peace with the Federation, a change from their long-standing policy of conflict and subjugation.  In both the Federation and the Klingon Empire, there were people who had hated their rivals so much, and for so long, that the prospect of peace between the two governments was something they couldn’t stomach.  It was said that such people were frightened of change (the coming peace).

This is very true in the Information Technology (IT) world.  When system administrators are about to make any kind of a significant change, they’re often required to document, justify, explain, and test the change well in advance of making it.  Inevitably, you will eventually change something that causes a problem.  Perhaps some Excel macros quit working after you upgrade Microsoft Office, or the new version of Internet Explorer doesn’t work with an application used in Human Resources.  Those unfortunate consequences tend to make organizations as a whole resistant to change, even fearful of it.  As a system administrator, one of your responsibilities is to introduce change in a manner that allows you to control the potential negative impacts.

When we planned to roll out Windows XP Service Pack 2 (a while ago), I helped test as many of the applications used around the company as possible.  I would try to identify if Firewall changes would be needed, if the application required one of the “compatibility mode” options, if it would need to be patched, etc.  The point of all the hours I put in doing those things was to minimize the disruptive effects of upgrading to Windows XP Service Pack 2.  By all accounts, our hard work paid off and there were few, if any, complaints once the software began rolling out across the organization.

10. “One of the advantages of being a captain, Doctor, is being able to ask for advice without necessarily having to take it.” (ST:TOS “Dagger of the Mind”)

In the original Star Trek series, Captain Kirk often sought the advice of his senior officers.  Even though he sought their advice on how to deal with a problem, he did not always heed it.

Systems administrators typically work in teams.  Members of teams typically have one or more areas of expertise, and other areas where their expertise may be less extensive.  As a member of the team, you should always be willing to seek the advice of your teammates when you’re about to do anything that might reflect negatively on the team if it goes wrong.  Just because you ask for a teammate’s advice, however, doesn’t mean you have to follow it.  Sometimes your own expertise or experience may “trump” the advice of a teammate, however well-intentioned and intelligent the advice might be.  The key lies in knowing when to take advice and when to ignore it, which is something you learn with time and experience.

11. “Power is danger.” (ST:TOS “Balance of Terror”)

A commonly uttered security mantra is that you should give users only the amount of administrative ability necessary for them to do their jobs, and no more.  If users don’t have a business need for administrator access to their systems, they shouldn’t have it.  In this way, if those same users introduce malware to your network via an infected floppy, CD, USB key, etc., that malware will have a hard time spreading.  Having no administrator access will also prevent them from installing unauthorized or pirated software, shutting off their computer’s firewall, or doing other things that could compromise the security and stability of your network.

Similarly, as a system administrator you should always be careful and deliberate with your actions when you’re using administrator permission on a machine.  Don’t do indiscriminate web browsing with the administrator account.  Don’t run untested scripts against lots of end user machines.  Don’t delete files you aren’t sure about.  In short, recognize that your “godlike” powers over the computer make you dangerous, and always use those powers sparingly and carefully.

12. “Leave bigotry in your quarters; there’s no room for it on the bridge.” (ST:TOS “Balance of Terror”)

System administrators tend to be the kind of people who like to tinker with things.  Even though we may be Mac administrators, we dabble in Windows or Linux.  If we’re Linux administrators, we can’t resist the urge to fiddle with a script on OS X or a batch file on Windows.  Because we have a lot of experience, we can sometimes become opinionated about technology, to the point of bigotry.  In a corporate setting, this kind of bigotry can be suicidal.  If your response to every Windows problem you’re asked to resolve is to launch into a tirade about how this wouldn’t be a problem on the Mac, you’re in the wrong job.  Unless they happen to ask for them, users don’t want your opinions about the technology they’re using.  Most of them couldn’t care less whether they’re using Windows, OS X, Linux, or something else.  They just want to do their jobs, and they need you to fix the problem that’s keeping them from working.  You may have a long list of reasons why the company should dump Windows and move to Linux or OS X. They might be very intelligent, objective, and thoughtful reasons.  But if you’re being paid to administer Windows, you should keep those opinions to yourself unless asked for them. Otherwise, you’ll just create unrest and friction with your co-workers, and that doesn’t help anyone.

13. “The more complex the mind, the greater the need for the simplicity of play.” (ST:TOS “Shore Leave”)

Most people adorn their offices with a few well-chosen artifacts.  Perhaps they’re pictures of loved ones, awards they’ve won, or souvenirs from their travels.  System administrators have those things too, but they also tend to like little toys.  For example, I’ve often got a netbook, an MP3 player, and some other gizmo keeping me company.  They might be expensive gadgets to other people, but they’re fun toys to me, and it helps me to reduce my stress to play around with them occasionally… such as on my lunch hour.  Systems administrators tend to be fun, playful, and funny people (once you get to know them).  The complex web of information we have to master and use on a daily basis tends to make us seek out “fun” when we’re not working or need a break.

14. “Insufficient facts always invite danger.” (ST:TOS “Space Seed”)

In the original Star Trek, Captain Kirk freed Khan Noonien Singh and his crew from an extended hibernation.  Khan and his crew were evasive about who they were and what they were doing on the ship they were rescued from.  Both Spock and Kirk did their best to extract information from them, but got very little.  Kirk noticed that Spock seemed uncomfortable with their new guests.  When asked why, Spock explained that the Enterprise crew knew little about the newcomers, and that this lack of knowledge could be dangerous.  Later, Khan and his crew attempted to take control of the Enterprise.  Spock was right not to trust them.

System administrators who are willing to jump in and start working with something they know little about often learn through (bad) experience to become more cautious.  In handling security patches, for instance, I’m very careful.  When a new patch comes in, I have no way of knowing if that patch will break a critical business system, prevent systems from booting up, or force a reboot in the middle of the CEO’s presentation to the executive board.  Before I release the patch to anyone else, I try it on my own system first to see how it behaves.  I then try it on the systems of my teammates and nearby co-workers.  If it doesn’t cause a problem for them, I begin slowly fanning it out to the rest of the company.  Once I learn that the patch seems harmless, I will then allow it to make its way onto large numbers of computers.  I make every effort to learn as much as I can about the patch before letting it “run loose” on the network.
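The fan-out described above can be sketched as a list of "waves".  This is only an illustration in Python, not the tooling I actually use: the function name plan_rollout, the PC names, and the wave sizes are all assumptions made up for the example.

```python
def plan_rollout(my_pc, team_pcs, all_pcs, first_wave=10, growth=2):
    """Return a list of waves; each wave is a list of PC names to patch.

    Wave 1 is the administrator's own PC, wave 2 is the team, and every
    remaining PC is placed in later waves whose size is multiplied by
    `growth` each time (hypothetical defaults: 10, 20, 40, ...).
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")

    waves = [[my_pc], list(team_pcs)]
    already_planned = {my_pc, *team_pcs}
    remaining = [pc for pc in all_pcs if pc not in already_planned]

    size = first_wave
    while remaining:
        waves.append(remaining[:size])
        remaining = remaining[size:]
        size *= growth

    # Drop empty waves (for example, when there are no teammates).
    return [wave for wave in waves if wave]


if __name__ == "__main__":
    # Made-up inventory: one admin PC, two teammates, 35 other PCs.
    everyone = ["ADMIN-PC", "TEAM-1", "TEAM-2"]
    everyone += [f"PC-{n:03d}" for n in range(1, 36)]
    plan = plan_rollout("ADMIN-PC", ["TEAM-1", "TEAM-2"], everyone)
    for number, wave in enumerate(plan, start=1):
        print(f"Wave {number}: {len(wave)} PC(s)")
```

The point of the small early waves is that a bad patch is caught while only a handful of machines are affected.  How long to wait between waves is a judgment call the sketch leaves out.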

15. “Either one of us, by himself, is expendable. Both of us are not.” (ST:TOS “The Devil in the Dark”)

It’s not uncommon in system administration for there to be one person who handles a specific task, with another person serving as backup to that person.  The logic is to ensure that if the primary person gets sick, goes on vacation, takes a job elsewhere, or is hit by a bus, the team can continue to do the things it is responsible for.  Having the primary and the backup out of the office at the same time is a bad idea, and it should be avoided if at all possible.  Inevitably, the day you’re both out of the office there will be a major crisis in your area of expertise, and no one there who can resolve the problem.

16. “If I can have honesty, it’s easier to overlook mistakes.” (ST:TOS “Space Seed”)

Sooner or later, you’re going to make a mistake.  Maybe you accidentally deleted some critical files from a server.  Maybe you meant to adjust the firewall settings and ended up turning it off.  It might be something relatively minor, or heart-stoppingly major.  Whatever mistake you make, be willing to own up to it.  There’s nothing to be gained by lying to your teammates or management to cover up a mistake.  If you own up to your mistakes, people will respect and trust you.  If you lie about them, they soon realize they can’t rely on you and begin to resent you for the time they spend uncovering the truth.  Demand honesty from your coworkers, but deliver it in return.

17. “No one can guarantee the actions of another.” (ST:TOS “Day of the Dove”)

As part of system administration, or indeed any job, it can be necessary to make assumptions about how people will react to something and predict how they’ll deal with it.  But just because a particular reaction seems logical, reasonable, and expected, don’t assume everyone will do it.  Always make allowances in your plans, your scripts, and your procedures for your end users to do the illogical, unexpected, and “wrong thing at the wrong time”.  Build in the safeguards you can to prevent as many problems as you reasonably can, but realize that no matter how hard you try, there’s likely to be someone who does something you didn’t plan for.

admin Windows Administration

VBScript to Determine a PC’s Need for a Reboot

September 26th, 2008

From time to time in Windows administration and patch management, it’s necessary to know whether a machine you’re about to do something to is waiting on a reboot. When an installer program needs to replace a file that’s in use, it can’t do that, so it places the file on the disk with a temporary name and places a value in the Windows Registry to indicate that the file needs to be renamed at the next reboot. Therefore, if you want to detect whether a given machine needs a reboot in order to complete the work of a previously-applied hotfix, patch, or software install, you can look at that value in the Registry to see if there’s any work to be done on the next reboot. If there is, the machine needs a reboot. If there’s nothing there, the machine doesn’t need a reboot.

The Registry entry you need to examine is a multi-string value called, aptly enough, “PendingFileRenameOperations”, located under the following Registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager

Below is a sample VBScript that tests the local machine or a remote machine to see if a reboot is needed, based on the PendingFileRenameOperations value. The script must be run with Administrator permission on the system to be checked. If run without Administrator permission, the script will be unable to connect with the remote machine and an error will be displayed.

When executed, the script prompts for the name of a PC on the network, which can be the PC you’re using at the time. If no PC name is entered, the script aborts. Otherwise, it makes a Windows Management Instrumentation (WMI) call to the Registry provider on the remote machine and requests the contents of the PendingFileRenameOperations value. If the value contains data, that PC requires a reboot. If the value is empty or isn’t there, the PC does not require a reboot. A message is displayed indicating whether the machine in question needs to be rebooted.

I hope you’ll find the script useful.

dim oReg

'
' Set a constant we'll use later
'
Const HKEY_LOCAL_MACHINE = &H80000002

'
' Ask the user for a PC name to check and abort if they
' don't give us one.
'
strComputer = Trim(InputBox("Which PC do you want to check?", _
                 "Reboot Need Checker"))

if strComputer = "" then
  wscript.quit
end if

'
' Use the Windows Management Instrumentation (WMI) capability
' to connect to the remote computer's Registry provider.
'
on error resume next
set oReg = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & _
           strComputer & "\root\default:StdRegProv")

If Err.Number <> 0 Then
   MsgBox "Could not connect with WMI to PC " & strComputer & _
          "'s Registry.", vbOKOnly, "ERROR!"
   wscript.quit
End If

'
' Use the WMI Registry Provider to look up the reboot status in
' the remote PC's Registry. Display an error if we can't do it.
'
strvalue = "NOTHING"
strKeyPath = "SYSTEM\CurrentControlSet\Control\Session Manager"
strValueName = "PendingFileRenameOperations"

oReg.GetMultiStringValue HKEY_LOCAL_MACHINE, _
                         strKeyPath, _
                         strValueName, _
                         arrValues

If Err.Number <> 0 Then
     MsgBox "Could not read reboot status for the PC " & _
            strComputer, vbOKOnly, "ERROR!"
     wscript.quit
End If

'
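' If arrValues comes back as an array, then there are filenames in
' the PendingFileRenameOperations value, and therefore a reboot
' is needed to complete those rename operations. If the value is
' missing, arrValues is not an array. (Comparing an array to 0, as in
' "arrValues > 0", raises a type mismatch, so we test with IsArray.)
'
if IsArray(arrValues) then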

   msgbox strComputer & " requires a reboot at this time. ", _
          vbokonly,"Reboot Needed"

else

   msgbox strComputer & " does not require a reboot. ", _
          vbokonly,"No Reboot Needed"
   wscript.quit

end if
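
For comparison, here is a rough Python sketch of the same check for the local machine only.  It is not a port of the script above: the function names has_pending_renames and read_pending_renames are made up for this example, and it assumes Python 3 with the standard winreg module (available on Windows only).  The decision is kept separate from the Registry read so it can be tried on any machine.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager"
VALUE_NAME = "PendingFileRenameOperations"


def has_pending_renames(values):
    """Return True when the multi-string data lists at least one file.

    `values` is the list of strings read from the Registry value, or
    None / an empty list when the value is missing or empty.
    """
    return any(entry.strip() for entry in (values or []))


def read_pending_renames():
    """Read the value from the local Registry; return [] if it is absent."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        try:
            values, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            # The value does not exist, so nothing is waiting on a reboot.
            return []
    return list(values or [])


if __name__ == "__main__" and sys.platform == "win32":
    if has_pending_renames(read_pending_renames()):
        print("This PC requires a reboot at this time.")
    else:
        print("This PC does not require a reboot.")
```

Like the VBScript, this looks only at PendingFileRenameOperations, and it has to be run with enough permission to read that part of the Registry.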


admin VB and VBScript, Windows Administration

The Chrome Browser – Google’s First “Evil”?

September 2nd, 2008


Google is famous for its motto, “Don’t be evil,” which is meant to sum up its attitude as a company. Earlier today, it released the “Chrome” web browser for Windows, which takes a new approach to how browsers should work and is designed from the ground up to handle web-based applications.


Having spent a few minutes with the browser, and keeping in mind it’s a beta, I’m reasonably impressed. It seemed to be quick, properly rendered the pages I pointed it to, and didn’t gobble up lots of system resources in the process. However, being a Windows administrator, I have a couple of problems with it.


Chrome doesn’t install in the typical “C:\Program Files” location where (by default) applications are supposed to be installed. Instead, Chrome installs in the “C:\Documents and Settings” directory for the person who runs the installer. That’s weird, and not something I’d expect from Google. Still, in and of itself, it’s not exactly “evil”.


The “evil” thing about Chrome is that it not only ignores the “C:\Program Files” default installation location (and doesn’t let the person installing it change that location), but also, by installing in the “C:\Documents and Settings” directory, bypasses the normal protections against unauthorized users installing software on a system. Normally, a user requires administrator permission to install a software package like Internet Explorer, Firefox, or OpenOffice.org. Corporations rely on this to ensure their systems contain only licensed, authorized software. They rely on it to prevent unauthorized and potentially dangerous software from making it onto their systems. Using “Documents and Settings” as a way to get around these protections is, in my view, pretty “evil” and certainly beneath Google.




admin Windows Administration

Open Source Windows System Management

August 26th, 2008


There are quite a few commercial systems management products out there for Windows. As with any product space, each has its strengths and weaknesses. Altiris, for example, offers incredible power. LanDesk may lack some of that power, but is far easier to use. As far as I know, there’s no comparable systems management suite consisting of primarily open source software. I’m considering changing that situation.


In the past couple of years, I’ve begun learning a lot of new things about scripting for systems administration, deploying patches, repackaging and deploying software, and generally maintaining the health of systems on a network. I’ve shared bits of that knowledge here, as I’ve had the time and desire to write them up. But I’ve never taken things to the “next level” and actually converted that knowledge into a usable tool set.


For example, I have a DOS batch script which will deploy a specific Microsoft patch to a specific computer from the command line. I have another script which can simultaneously execute a command on multiple systems. Another set of scripts will run a CHKDSK on a remote system, examine the output, determine if any “significant” errors exist, instruct the system to repair errors on the next reboot, and reboot the system. Other scripts can check for impending disk failure, low disk space conditions, etc. Taken as a whole, these scripts would be useful for a small shop (say, 1,000 PCs or fewer) to manage its systems. Extended a bit, they could probably handle a larger network of machines.


Because I’m starting to get the “itch” to create something, I’m toying with the idea of developing my own equivalent of an Altiris or LanDesk that’s built using free or freely-available software. That way, the small organization with 20-150 PCs can manage its systems like the bigger shops. And the bigger shops that may not have the money for one of the commercial products can still reap the benefits of automated systems management, without the expense.


This is still just the germ of an idea in my head. My existing scripts are too site-specific and undocumented to be widely used without a lot of tweaking. And heck, I may not even have the programming and scripting skill needed to pull off some of the things I would consider critical to such a tool. (For example, minimizing network bandwidth usage by transmitting a software package to one machine on a subnet, then transmitting the package from that machine to others on the same subnet, might be more than I know how to accomplish.)
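
The planning half of that subnet idea is the easy part, and can be sketched in a few lines of Python.  Everything here is an assumption for illustration: the function name plan_distribution, the /24 subnet size, and the rule of picking the alphabetically first PC as the relay.  The actual file transfer, which is the hard part, is not shown.

```python
import ipaddress
from collections import defaultdict


def plan_distribution(machines, prefix_length=24):
    """Group machines by subnet and pick one relay per subnet.

    `machines` maps a PC name to its IPv4 address (as a string).
    Returns a dict mapping each subnet to (relay_name, [other_names]):
    the package would be sent across the network to the relay once,
    then copied from the relay to the others on the same subnet.
    """
    by_subnet = defaultdict(list)
    for name, address in machines.items():
        # strict=False lets us pass a host address and get its network.
        network = ipaddress.ip_network(f"{address}/{prefix_length}", strict=False)
        by_subnet[str(network)].append(name)

    plan = {}
    for subnet, names in by_subnet.items():
        names.sort()  # arbitrary but repeatable choice of relay
        plan[subnet] = (names[0], names[1:])
    return plan


if __name__ == "__main__":
    # Made-up inventory for the example.
    inventory = {
        "PC-B": "10.1.1.20",
        "PC-A": "10.1.1.7",
        "PC-C": "10.1.2.5",
    }
    for subnet, (relay, others) in plan_distribution(inventory).items():
        print(f"{subnet}: send to {relay}, then relay to {others}")
```

A real tool would also need to choose a relay that is powered on and has disk space to spare, which this sketch ignores.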


Still, it’s fun to think about…



admin Windows Administration