Service Outage

Earlier today AU experienced an extended service outage affecting the forums, blog and wiki. I was not immediately reachable, so unfortunately I was not aware until sometime after 2pm. The web service was down and the database was not responding; a restart of those services restored the system to a live status.

Early this morning (0305h) a file integrity check ran according to its daily schedule. Due to an inconsistency on the /var volume, the integrity check ran into problems and consumed too much space on the volume. Other tasks scheduled in the same time frame began to fail, and email alerts were generated and sent to me, which I saw this morning. Upon checking the system I noticed the issues with /var and decided to unmount it, run a disk check and remount it. While successful, the process created over 10000 files in lost+found that I needed to review. In the midst of that tedious process, I got tasked with dressing a freshly bathed toddler and prepping him for a day out with his parents. I completely forgot to come back and finish what I had started. So the cause of this whole debacle was simply…

…human error. Mine.

Sorry folks, it’s all back up and running now… but I still have 9900 files to vet.
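For anyone facing a similar lost+found slog, this is roughly how I'm triaging the mess – a little helper of my own (the function name and approach are mine, not a stock tool), which groups the recovered files by type so the interesting ones (text configs and logs from /var) float to the top:

```shell
# Summarize recovered files by type so text files (likely configs/logs
# from /var) can be reviewed first. $1 is the lost+found directory.
triage_lostandfound() {
  find "$1" -type f -print0 2>/dev/null \
    | xargs -0 -r file -b \
    | sort | uniq -c | sort -rn
}

# Example: triage_lostandfound /var/lost+found
```

Anything reported as "data" is probably a fragment you can discard; the ASCII entries are the ones worth opening.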

A Lesson in Cloud

A recent Threatpost article, in which Greg Hoglund comes pretty close to ranting that Anonymous did not “hack” HBGary – they just weaseled their way into the HBGary Google account – illuminates some of the issues with using cloud services.

You see, Greg tried to do some damage control upon discovering some level of intrusion was underway, but had to go through a Google call center in India where he got no love.  In the article Hoglund has a few pointers of his own, but I would advise doing your own research and considering all the possibilities.

If this had been a physical server in an accessible location, a sysadmin could isolate the affected system, remove it from the network, image the drive for forensic purposes and start the incident response machine.

I remain ambivalent about the cloud – it’s not all lollipops and candy canes.

In Case of Fire…

Preparedness – probably not something often thought about by many, unless you are a boy scout.  Of course, when you start thinking about it, you likely have a smoke detector at home, maybe one at the office.  Maybe you also have a fire extinguisher hanging on the wall near the water cooler – has it been tested?  Does anyone know how to use it?  Does everyone?  Do you have a designated employee with a certificate in first aid?

The recent events in Japan – a 9.0 magnitude earthquake, subsequent tsunami and ensuing nuclear emergency – have brought to light the need for both personal preparedness and disaster planning for your business.  Planning out the steps to maintain business continuity when the unforeseen happens is not as simple as it may seem, but with a little research and some preparation ahead of time, having a plan may be worth its weight in gold should the need arise.

Not All Disasters Are Created Equal

What constitutes a disaster?  Well, in this context the term disaster is going to encompass any event that can interrupt regular business operation.  A fire, a broken water main, a viral outbreak – in fact, one of the first steps in your planning is to identify all the possible scenarios so you can develop a plan that covers most if not all contingencies.  Things to think about are local factors, such as geography (floods, fires, earthquakes, mudslides), local hazards (chemical plants, power stations, refineries) and weather (heat waves, ice storms, tornadoes).  You may also want to consider human activity: civil unrest, terrorism, and so on.  Those are the big events, things causing local or regional issues that can interrupt your business.  Now narrow things down to events within your business – a wayward car in your lobby, an electrical fire in the closet, bad shrimp at a company brunch.  Once you have a good idea of all the things that can throw a wrench in the works, you can start to develop a plan to counter all those possibilities.

Man Overboard

In much of the disaster planning documentation I have read, there is one thing I don’t hear enough about, and that’s the human cost.  Save the people first – people are more important than anything else, so put their safety first.  The first part of your plan should cover what the staff should do: egress plans should they need to exit the building, rally points where they can gather to make head counts easier, and manifests and seating maps to assist first responders in locating those who may be missing.  Delegate responsibilities: if the building has to be evacuated, who is to grab the critical papers from the safe?  Where is the latest backup tape?  Should any equipment be shut down gracefully?

Keeping the Lamps Lit

How are you going to continue to serve your customers, possibly in a reduced capacity, when you have limited resources and perhaps no brick-and-mortar location?  What is your tolerance for lost days – do you need to be up and running in 24 hours, 3 days, a week?  Do you have any spare equipment (servers etc.) set aside at another location that can be brought online to take over?  There are organizations like Sungard, which operate availability services and can provide work space for your staff as well as equipment.  Not everyone likes or can afford this option, though.  For most of us, the key components are a TAM file server, workstations and/or a terminal server, and working phone lines.  You may also need a domain controller, a printer, a SQL server, etc.  This is definitely an area where virtualization can really shine – a single VMware or Xen server can, in a pinch, run all your essential services.  The downside is having to budget for a server that sits around doing nothing and, with any luck, never will.  If you have multiple offices that are geographically dispersed, you can set up a DR data center at one of them and even put the spare servers to some use.  There is also the task of re-routing phone calls; if you cannot call-forward or otherwise route the calls to another location, your telco can likely help.

There is definitely much to think about, and getting the plan drawn up will take some time.  When it’s all said and done, you can pat yourself on the back and hope you never have to use it.  You should, however, pull it out once a year and review it to see if anything in your business has changed significantly enough to warrant a revision of the plan.  It would also be wise to ensure there are multiple copies in the hands of key individuals, with at least one stored off-site.

Where to get more info:

I follow @Get_Prepared on Twitter, the (Canadian) companion account

FEMA in the US offers DRP services

Also your local TV and radio news


Replicate Me!

One thing I deal with all too frequently these days is a rather unique and overly testy MSSQL db server.  I have the distinct displeasure of being thrown to the lions as the resident pseudo-DBA, since there is no real DBA.  Unfortunately, my years of MySQL experience didn’t prepare me for the quirks of MSSQL; it may actually be putting me at a disadvantage, because I go into situations expecting things to work, and more often than not they do not.  Take replication, for example: it takes me about 30 minutes, tops, to get MySQL replication going – yet I’ve worked on this particular MSSQL instance for almost a year and its replication is still not functioning as I would expect.

How Replication Should Work

Logic would dictate that once you are replicating a database, queries executed on the Master (MySQL terminology) get logged to the binary log, and those transactions are transferred to the Slave and executed there.  As such, all queries, including things like DROP TABLE and DROP DATABASE, take effect on the Slave shortly after being executed on the Master.  On the MySQL databases I admin, that is exactly what happens.  Not so on MSSQL – in fact, the replication is forever crapping out for one reason or another.  A recent attempt to update a particular application that included some SQL db restructuring failed with an oddball error that it could not execute a DROP TABLE because that table was being replicated.  Well, duh… then drop the table already and let’s move on.

I also find any minor glitch or hiccup puts MSSQL replication into a tizzy where, out of spite, it will choke and stop replicating.  Apparently replication can also go stale… like a bun left on the counter.  Where a MySQL replicated database can suffer through all sorts of interruptions between the Master and Slaves and just catch up once the connection is restored, on numerous occasions I have had to redo MSSQL replication from scratch because it broke.

What is Missing in MSSQL

Those who favour a GUI tend to poke fun at those who use the more archaic command line – but within seconds I can run SHOW MASTER STATUS; and SHOW SLAVE STATUS; and instantly tell you not only whether MySQL replication is working, but whether the DBs are in sync.  MSSQL and its Management Studio seem to have all sorts of bells and whistles, wizards and monitors – but it’s never very clear what state the replication is in.  In fact, I have had incidents where the monitors said replication was stopped when in fact it was still running.  I think a nice Redmondian touch would be a traffic light for replication status – red, yellow, green – so at a glance you can see if more attention is required.  If MSSQL has an equivalent to the MySQL “Log Position” records, that would also be handy to have.
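To show just how little it takes on the MySQL side, here is the sort of one-liner health check I lean on – a small shell helper of my own devising (not a stock tool) that boils the output of SHOW SLAVE STATUS\G down to a single verdict:

```shell
# Reduce `mysql -e 'SHOW SLAVE STATUS\G'` output to a one-line verdict.
# Usage: mysql -e 'SHOW SLAVE STATUS\G' | slave_health
slave_health() {
  awk -F': *' '
    $1 ~ /Slave_IO_Running$/      { io  = $2 }   # I/O thread pulling binlogs?
    $1 ~ /Slave_SQL_Running$/     { sql = $2 }   # SQL thread applying them?
    $1 ~ /Seconds_Behind_Master$/ { lag = $2 }   # replication delay
    END {
      if (io == "Yes" && sql == "Yes") printf "OK (lag %ss)\n", lag
      else printf "BROKEN (IO=%s SQL=%s)\n", io, sql
    }'
}
```

Both replication threads running plus a sane lag figure is the whole story; that’s the traffic light I wish Management Studio had.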

In short, I think Microsoft can do better.  I’ve read numerous blogs, articles and reviews telling me how wonderful MSSQL is and in how many ways it is superior to MySQL – and in some areas it may well be – but it also seems sorely lacking in some basics.

Oh, and if anyone is an MSSQL DBA and does work on the command line (isql?) – please give me a shout if you know of any useful commands for getting replication status!

Making the Web Work for You – Part 5

Getting Social

Now before you cast aside the idea of getting involved in any kind of social networking, take a minute to consider that what on one hand can be a time-waster and security nightmare you try so hard to keep your users away from can, on the other hand, be a powerful business tool when handled properly.  Consider this – Twitter has over 190 million users, Facebook over 400 million – can you really afford to ignore an audience of that size?

Facebook, Twitter and LinkedIn are powerful Internet tools that cost nothing to leverage and can put you in touch with a much larger client base than an ad in the local paper.  These technologies can also interact with each other, your blog and your website, and doing so really starts to bring things together.

The media industry really gets this and is very engaged in social networking, but you also see many other industries catching on.  The automobile manufacturers are making good use of it – textiles, fashion, fast food, you name it.  If you take some time to look, you will also notice a fairly strong presence from the insurance industry, and it’s growing.  The simple fact is, there is a growing population of customers and potential customers who look to connect through these channels.

While integrating these services with your website is a “nice-to-have” – it’s not critical, each can be leveraged in a standalone manner.  You can also cross-integrate things like Twitter and Facebook.

The only caveat I would bring your attention to, is the same one that goes for your website and blog – be mindful of the content you post.

*Sigh* What’s ‘Google’ for BS?

This stuff just won’t die – but what is even more annoying than the never-ending banality of this journalistic tripe is the mind-numbing notion that Google does anything inadvertently.  In the words of Johnny Rotten: “Bollocks!”  You don’t inadvertently war-drive custom network-sniffin’, gas-sippin’, picture-snappin’, Google-mappin’ vehicles through 100-some-odd countries by accident, folks.  How freakin’ dumb do you think we all are?  Sure, some of us Google how to make a bomb from household items, but not all of us are drooling idiots!

Fess up – for crying out loud, you’re a search engine company, you’re freakin’ nosy by design, you’re the old spinster that can’t help but watch the goings on of all the neighbours which you then regurgitate every chance you get.

I don’t know about the rest of the netizens, but I’m calling BS on this one.

Making the Web Work for You – Part 4

Sprechen Sie Blog?

Although your first reaction might be ‘Why would I want a blog on my website?’, a better question is likely ‘Why wouldn’t you?’.  You have yourself a well-designed and functioning site, you have done your due diligence with respect to Search Engine Optimization and you are keeping tabs on your site with some statistical analysis, these are all good things, but a blog can be a very nice addition.

From a business perspective, a blog is an easy way to reach out to your clients and visitors and provide up-to-date information on things that may affect them.  It provides your site with dynamic content that is useful and interesting, and is also a means to drive both new and repeat traffic to your door.  You do not have to be James Joyce to write a blog; entries can be short and to the point and should be written like a newspaper – in very plain English.  Content can highlight industry changes, your company’s activities in the community, or provide tips to homeowners, like how cleaning your gutters prevents ice dams in winter.

Adding a blog to your site does not need to be a technical nightmare – if you have an in-house web server and the expertise, then deploying WordPress is not much of a chore.  If that is too much of a technical undertaking, you can rely on a hosted solution – several are free, and a decent web designer can integrate them into your site quite easily.

I really cannot stress enough how big of a bang-for-the-buck a blog can be.  The only caveats are – be careful what you put in print, have your entries proofread before you publish to avoid any embarrassment or damage to reputation; and for those hosting their own, stay on top of the blogging application updates and patches.

Another aspect of blogging I wanted to touch on: even if you don’t want any kind of blog on your own site, you may be able to contribute to other blogs.  In return for your content, perhaps you could throw a link back to your own site in your posts.  Something like that could even be done here on AU or over at the ASCNet Community – the drawback to the latter being that it’s not accessible to the public, so the traffic bonus could be limited, but it all depends on your intended audience.

GMail Security Checklist

The folks at Google have been nice enough to create a checklist to help you secure your system – it’s mostly a collection of best practices known to those in information security, but maybe less obvious to the general user populace.  Either way, it would be beneficial for any GMail user to work through the checklist and tighten up their defenses.

The GMail Checklist

Why SORBS sucks.

I like RDNSBLs – they are extremely useful, and when used properly they can easily reduce your spam intake by 90% or more.  When they don’t work well, though, they kind of suck.  No, actually, they really suck.  One big problem with SORBS is its overly aggressive blacklisting of supposed dynamic IP addresses – many of which are not dynamic.  Add to that the mandatory registration process required to de-list an IP, the molasses-like slowness of their website and their moronic use of a self-signed SSL certificate.  I would consider it a joke if there was any humour to be found in the situation.
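For the curious, checking whether an IP is on one of these lists is simple: a DNSBL is queried by reversing the IP’s octets and appending the list’s zone.  A quick sketch of my own (the helper function is mine, not a standard utility):

```shell
# Build the DNS name used to query a DNSBL for an IPv4 address:
# octets reversed, then the list's zone appended.
dnsbl_name() {
  printf '%s\n' "$1" | awk -F. -v zone="$2" '{ print $4"."$3"."$2"."$1"."zone }'
}

# A listed IP resolves (typically to a 127.0.0.x code); a clean one
# returns NXDOMAIN.  For example:
#   host -t A "$(dnsbl_name 192.0.2.1 dnsbl.sorbs.net)"
```

It’s worth running your own mail server’s IP through the lists you use, before your customers find out the hard way that you’re on one.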

While the use of SORBS might offer some spam reduction, I do not think it is worth the additional hassle; there are plenty of other perfectly good blacklists out there to choose from – SpamCop, Spamhaus and UCEProtect being a few.

SORBS was also acquired in 2009 by GFI Software.