Archive for the 'Tech Talk' Category

PHP UK Conference 2010 thoughts

Tuesday, 2nd March, 2010

On Friday I went to the PHP UK 2010 conference in Islington, London with my colleague John Field. There were three tracks, so between us we were able to cover most talks of interest. The conference organisers have said they’ll post the slides of each talk, with sync’ed audio, online soon. I’ll link the slides for these talks as they become available.

From the schedule, the talks I attended were:

I forgot to mention it on my feedback form, but the WiFi was very good. Considering there was an army of nerds hammering on it, it was fast enough and I never lost my connection (disclaimer: I was largely using it from my iPhone).

For anyone thinking of going next year, be warned, everyone had an iPhone – I’m pretty sure they don’t let you into the conference if you don’t own one! ;)

The talks prior to lunch were, time-wise, somewhat sabotaged by the keynote speech overrunning.

Keynote – the lost art of simplicity

Talk rating: 8/10

Josh Holmes raised some good points in his keynote talk.

If you can’t clearly explain your solution, it’s probably not worth implementing

Josh said this was particularly important to ‘Enterprise’ business, where systems that go live can run for longer than your professional career. It’s a certainty that other developers will need to maintain or modify your code.

Gone are the days of a lone programmer writing enterprise software, free to obfuscate the code to keep his maintenance contract. If you’ve not distilled your solution down to what’s needed, and nothing more, you’re guaranteeing someone else extra mental baggage.

Developers have a tendency to over-engineer solutions

Sometimes we try to future proof our designs by adding features we think might be needed but in reality are sometimes never used. Ultimately this means we just wasted our time and added needless complexity.

Temptation to throw in the kitchen sink is stronger in the early years of your career as a developer, when you’re eager to use new technologies. Experience teaches you to concentrate on the problem and to solve it with the most suitable tool (at work anyway).

We strive for this at work; Paul (the CTO) has said on countless occasions ‘do the simplest thing that works’.

Usability testing

This topic highlighted to me that we (developers at ASP) never meet the users of our software. Probably not a surprise in web development. We interact heavily with the editors of the product but never the end users; they sit on the other side of customer support.

Although we’re planning to address usability issues by hiring a UX Manager, we do not perform usability testing: watching customers use our software, employing heat maps to see where the mouse pointer moves, or using event logging to see which features are popular and which are failing. We do listen to feedback from ALA conferences and similar.

RDBMS in the social networks age

Talk rating: 7/10

This talk contrasted heavily with the keynote, which talked about simplicity. I couldn’t help but feel a sense of irony in the first 20 minutes, as this talk involved some complex SQL used to efficiently traverse graphs and trees. It was fortunate I spent some time in November working with Dijkstra’s algorithm, which provided enough of a refresher to help comprehend the concepts in this talk.

Although the topic was interesting, having never seen Common Table Expressions (CTEs) or Window Functions before, the speed at which Lorenzo moved through the slides meant it was demanding trying to keep up. I doubt I was the only one but that didn’t stop me feeling like a developer sat in a room full of DBAs. I don’t think it was so much the fault of the speaker, as the morning talks didn’t get their allotted hour.

After the talk, I found that MySQL has no support for CTEs or Window Functions. Other major DB vendors such as Oracle (which also has its own ‘CONNECT BY’ syntax for hierarchical queries), MS SQL 2005 and Postgres (since version 8.4) do support them.

You might be thinking ‘so what? I’ve never needed them’. If you’re building any social features into your products, you’d be surprised how many common features, like your connection distance to someone on LinkedIn or friend suggestions on Facebook, could (or do) make use of CTEs.
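To make the connection distance idea concrete, here is a minimal sketch of a recursive CTE. It is not taken from Lorenzo’s slides: the friendship table, the names and the max_hops cap are all made up for illustration, and it runs against SQLite through Python’s standard sqlite3 module purely because that needs no server. The WITH RECURSIVE query is the part that matters.

```python
import sqlite3


def connection_distances(edges, start, max_hops):
    """Return {person: fewest hops from start}, looking at most max_hops away."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendship (a TEXT, b TEXT)")
    # Store every friendship in both directions so the graph is undirected.
    conn.executemany(
        "INSERT INTO friendship VALUES (?, ?)",
        [(a, b) for a, b in edges] + [(b, a) for a, b in edges],
    )
    rows = conn.execute(
        """
        WITH RECURSIVE reach(person, hops) AS (
            SELECT ?, 0                      -- anchor row: the starting person
            UNION
            SELECT f.b, r.hops + 1           -- recursive step: one friend further out
            FROM reach AS r
            JOIN friendship AS f ON f.a = r.person
            WHERE r.hops < ?                 -- the cap stops cycles running forever
        )
        SELECT person, MIN(hops) FROM reach GROUP BY person
        """,
        (start, max_hops),
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    friends = [("greg", "john"), ("john", "paul"), ("paul", "ian")]
    print(connection_distances(friends, "greg", 3))
```

The hop cap is a design choice of this sketch: friendships form cycles, so without some bound the recursive step would keep revisiting people at ever greater depths.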

Database Optimisation

Talk Rating: 5/10

This was one of the more disappointing talks I attended. Remo Gaigioni talked about the issues his company encountered when getting their search marketing intelligence tool to scale. The talk covered a lot of topics familiar (you hope) to most developers, using EXPLAIN to optimize queries, moving larger text fields out into separate tables, use of InnoDB to minimize locking, using replication and memcached.

There were features more specific to Remo’s application such as job queues that you might not use in your own app. Exact details were omitted but it sounded like his app was also write heavy and struggled with table locks.

The nuggets of information I got from this talk:

Replication is single threaded

Spec’ing a beast of a machine to run as a slave doesn’t make your binary log delay disappear. On a typical 8 core machine, replication is going to run off one of them. That’s not to say your slaves shouldn’t be more powerful than your master, because they should be: they deal with a share of the reads along with repeating all the writes from the master.

Use a work queue service for job queues (aka right tool for the job)

MySQL really isn’t the most efficient tool for job queues. Sure, you can share queue status information across a cluster of machines easily, but work queue services like beanstalkd do a better job. Any work you can take away from the database is going to help what remains in it.

Recommendations from the audience included using MySQL-MMM in place of HAProxy for load balancing cross-replicating masters, and switching out InnoDB for Percona’s XtraDB storage engine.

At this point, the day felt like a mental yo-yo coming down from Lorenzo’s talk to this comparatively high level discussion.

PHPillow & CouchDB

Talk Rating: 6/10

So I felt like I’d come full circle with this talk, from promoting CTEs in a RDBMS to this talk about a NoSQL DB. I was looking forward to this as I’ve been reading up on CouchDB recently. I’d read Kore Nordmann’s tutorial on PHPillow before attending this talk as well. I recommend the free book on the CouchDB site as an insight about how it differs from the traditional RDBMS model.

Unfortunately a large part of the talk was spent explaining CouchDB. As a consequence we had to pick one of three topics to discuss in more detail: PHPillow, views and MapReduce, or scaling CouchDB.

The majority vote went to scaling CouchDB. We still had to skip a lot of slides due to time constraints; my estimate is around 30 of the 70. I guess I would have felt like I got more out of this talk if I’d not read up on the subject beforehand.

Two points I noted from this talk:

  • CouchDB has no concept of schemas; consequently changing CouchDB views is less painful than ALTER statements in an RDBMS.
  • Debugging CouchDB’s JavaScript views is a bit problematic at present. Erlang, the functional language it’s written in, apparently throws ambiguous error messages – a trait of functional languages.

‘In search of …’ integrating site search systems

Talk Rating: 9/10

I thought this talk was the best one I attended, and I’ve a page of frantically scrawled notes as proof. Ian Barber did a great job of introducing the key concepts search engines employ, and how you can tailor your own site’s search to best Google’s results. He covered MySQL’s fulltext search, Sphinx, Swish-e, Lucene, Solr and Xapian (which we use at work). There are PECL extensions available for all of these.

I thought about using one of these search engines on this site and possibly combining it with a page ranking algorithm as detailed in the free second chapter of Manning’s Algorithms of the Intelligent Web.

Beating Google at its own game on your site is possible because you can employ ‘zone weighting’ (see slide 44). Google takes a guess using HTML tags, font sizes and other ‘one size fits all’ methods.
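As a rough sketch of the zone weighting idea (not taken from the slides): because you know which field of your own content is the title, which is the summary and which is the body, you can give a match in each zone its own weight. The zone names, weights and sample document below are invented for illustration, and a real engine such as Xapian or Solr would do this through its own ranking settings rather than a hand-rolled loop.

```python
def zone_score(query, document, weights):
    """Add up the weight of every zone that contains at least one query term."""
    terms = query.lower().split()
    score = 0.0
    for zone, weight in weights.items():
        # A zone missing from the document simply contributes nothing.
        words = document.get(zone, "").lower().split()
        if any(term in words for term in terms):
            score += weight
    return score


if __name__ == "__main__":
    # Hypothetical weights: a hit in the title counts for more than one in the body.
    weights = {"title": 0.6, "body": 0.4}
    page = {"title": "Xapian search tips", "body": "Sharding an index across machines"}
    print(zone_score("xapian", page, weights))
    print(zone_score("index", page, weights))
```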

Features I thought might benefit products at work: using the spelling correction feature in Xapian, and investigating sharding our Xapian databases to reduce the index size and so address memory consumption in our ever growing data sets.

PHP code audits

Talk Rating: 5/10

My code is far from perfect; I attended this talk after all. I couldn’t help but have a little déjà vu during this session. I was reminded of a security course I attended back in 2005, where the speaker used un-patched Windows 2000 Server machines to demonstrate exploits. This was good and all but we were 5 years on from its release, with 5 years of security patches under Win2k’s belt. When you tried out these exploits in the Real World™, none of them worked.

The same thing struck me about the code examples and warnings being presented to us here. Don’t use eval(), avoid using register_globals, be careful of include() injection via register_globals. Didn’t we have this talk back in 2005 as well?

With the introduction of frameworks in PHP land, a lot of code shifted from the procedural spaghetti code seen in these slides to more structured object orientated code. Most PHP developers know that object orientation promotes encapsulation and information hiding, so you have to go out of your way to write single scope scripts that might fall prey to these issues. And if (for some unholy reason) you’re still running with register_globals enabled, then you’re just asking for it.

Writing OO code also means I never have to worry about include() & require() statements thanks to class autoloading. I use a database abstraction layer and so don’t worry about SQL injection due to using bind parameters. I don’t remember the last time I had to use a global variable because I think about my interfaces. The same goes for eval(), do you remember the last time you used it? You might have had me on escaping output, up until the point we implemented a multi-lingual frontend.
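For anyone who hasn’t met bind parameters, a minimal sketch follows. It uses Python’s built-in sqlite3 module rather than a PHP database abstraction layer, only so that it runs without a server; the users table and the helper names are hypothetical. The point is that the value travels separately from the SQL text, so it can never be parsed as SQL.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("greg",), ("john",)])
    return conn


def find_user(conn, name):
    """Look a user up by name; the ? placeholder binds name as data, not SQL."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_user(conn, "greg"))
    # A classic injection string is treated as an odd-looking name that matches nobody.
    print(find_user(conn, "' OR '1'='1"))
```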

So unfortunately this talk was disappointing too, after coming out of Ian Barber’s on a high note.

Conclusion

Would I go to another PHP Conference? Maybe, it really depends on the subject of the talks. Having never been to a PHP community event, part of my attendance was due to curiosity.

On at least three occasions on Friday talks were cut short. Maybe in future, PHP conferences could have 90 minute slots, or talks that span two slots with an interval?

The venue was good, location was good, directions were good, the WiFi was good, the free beer was good (I had one before leaving, thanks Facebook!); there just needs to be a bit of fine tuning with talk timings.

Intel X25-M First Impressions

Friday, 19th February, 2010

Two weeks ago I swapped out my Seagate Momentus 7200.2 160GB HDD for an Intel X25-M 80GB. I didn’t want to post my impressions prematurely so held out a couple of weeks.

I took the opportunity to upgrade to Snow Leopard at the same time – as I was performing a clean install of OS X anyway. First thing I did was move my user directory on to my second hard drive (a Seagate 7200.4 320GB), to redirect writes away from my SSD.

Intel X25-M SSD

So what to make of Intel’s SSD? Well, boot times have come down to a tidy 20 seconds; my Seagate drive was hovering around the 90 second mark with files (images) on the desktop (Mac OS X creates thumbnail previews). Skype starts a whole lot faster, and simultaneous reads are noticeably quicker (think concurrent application launches – particularly common at boot time). Overall though, I don’t think I’m a particularly IO heavy user, so it’s a fairly subtle difference once booted.

I’ve 4GB of RAM in my MacBook Pro so I doubt I hit the swap file often. The biggest IO waits I experience are loading Eclipse, loading OpenOffice, loading Photoshop and performing multi-file searches in Eclipse (~45,000 file project). The latter runs off my magnetic drive, so this contributes to a dulled performance increase.

I got this drive as a birthday present, and although I’ve been drooling over SSDs since October, I’d have to say they’re still too expensive. If I had to buy this myself, I’d have held out for another price drop. Intel’s roadmap doesn’t show additions to their consumer SSD line until Q4 2010.

Ultimately, once manufacturers overcome the performance penalty of long term SSD write usage then I’d consider solid state disks all round, instead of a hybrid set up. For now, hold on to your wallet.

A real UK keyboard layout for Mac OS X 10.6 “Snow Leopard”

Wednesday, 17th February, 2010

Thanks to this blogger who made real UK keyboard layouts that work under Snow Leopard. I couldn’t get alternative community created UK layouts, that previously worked in Leopard, to work under Snow Leopard. They would never appear as a selectable layout in ‘Language & Text’ » ‘Input Sources’. Maybe because I tried to install them under /Library/Keyboard Layouts instead of my user specific directory (~/Library/Keyboard Layouts). The author of the post admits he encountered the same problem.

I’m just relieved to have # ~ ” \ back in their rightful places. As a developer ALT+3 for # is like pulling teeth (thank god I don’t do much Perl programming!).

What’s that? Switch back to Windows? Never! :)

Two ways around SSH’ing to different machines with the same IP

Sunday, 14th February, 2010

Like a lot of people, I ditched my desktop in 2007 and moved to a laptop as my main computer. I take my MacBook Pro to work every day and it’s also my personal computer at home.

To make life simpler I have an almost identical network set up at home as we do in the office, using the same DHCP range and gateway address (our dev server is also our gateway in the office). So the development server I SSH into at work has the same IP as my home Linux box. At work we have internal DNS set up; I’m a little more lazy at home and just refer to my Linux box by IP.

A problem arises doing this though: since I connected to the office development server with SSH first, it got first place in my known_hosts file. Consequently, when I connect to my home Linux server (with the same internal IP address) I’m presented with the warning ‘REMOTE HOST IDENTIFICATION HAS CHANGED!’ and told that I could be subject to a ‘man in the middle attack’.

SSH warning - remote host identification changed
Oh noes!

Of course this isn’t really the case; my Linux server just has a different RSA key fingerprint to the office dev machine. There are two ways we can suppress this…

First Method – disable strict key checking & tell ssh to shut up
When you SSH to a machine for the first time, you’re prompted to save the machine’s fingerprint to your known_hosts file. Your SSH client will save the hostname (if you used one) and the IP address it resolved to:

<hostname>,<ip> ssh-rsa <public key>
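The warning makes more sense once you picture the lookup. The sketch below is a deliberately simplified model in Python, not how OpenSSH is actually implemented: it only handles plain (unhashed) entries in the format above, and the function names and key strings are made up.

```python
def parse_known_hosts(lines):
    """Map (host name or IP, key type) to the stored key. Unhashed entries only."""
    known = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a "<hosts> <type> <key>" entry
        hosts, key_type, key = fields[:3]
        # One entry can cover several names, e.g. "devserver,192.168.0.1".
        for host in hosts.split(","):
            known[(host, key_type)] = key
    return known


def check_host(known, host, key_type, offered_key):
    """Classify the key a server offers against what was stored earlier."""
    stored = known.get((host, key_type))
    if stored is None:
        return "unknown"   # first contact: ssh would ask whether to continue
    if stored == offered_key:
        return "ok"
    return "changed"       # the case that triggers the big warning


if __name__ == "__main__":
    # Placeholder key strings, not real keys.
    office = parse_known_hosts(["devserver,192.168.0.1 ssh-rsa AAAAofficekey"])
    print(check_host(office, "192.168.0.1", "ssh-rsa", "AAAAhomekey"))
```

Two machines sharing 192.168.0.1 land on the same stored entry, so whichever one connected second looks like an impostor.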

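I only recommend disabling strict host key checking for machines on your LAN / inside your DMZ, and not across the public internet.

SSH always resolves hostnames to an IP and complains if the key a server presents doesn’t match the one stored for that IP address in your known_hosts. To suppress this you can use the following arguments (-q silences the warning itself; note that OpenSSH still refuses password logins while the stored key doesn’t match, so this works best with key-based authentication):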

$ ssh -o "StrictHostKeyChecking=no" -q hostname

That’s quite a handful to type, so you can alias this command in bash to save your fingers some work.

$ alias sshq="ssh -o 'StrictHostKeyChecking=no' -q"
$ sshq hostname

Method Two – temporarily ignore your known_hosts
Using the -o switch we can redefine the location of our known hosts file to /dev/null by overriding UserKnownHostsFile. Observe:

$ ssh -o "UserKnownHostsFile /dev/null" user@192.168.0.1
The authenticity of host '192.168.0.1' can't be established.
RSA key fingerprint is 35:f8:d4:46:0e:...:9f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.1' (RSA) to the list of known hosts.
Linux hostname 2.6.18-6-k7 #1 SMP Mon Oct 13 16:52:47 UTC 2008 i686

While SSH has said it’s saved your server IP to known hosts, it really hasn’t as it’s writing to /dev/null.

Hopefully these two methods prove more convenient than swapping known_hosts files.

Changing User home directory under Mac OS X Leopard and beyond

Sunday, 31st January, 2010

Yesterday I finally got around to removing my flakey SuperDrive from my MacBook Pro and replacing it with an optical bay hard drive caddy, or OBHC (Google NewModeUS). This meant I was able to add another 320GB drive where my SuperDrive once lived (much needed as I was at 92% disk usage on my system drive). I also have an Intel X25-M 80GB to install but I’ll save that for a weekend when I have more time.

In preparation for moving to SSD, I’m looking to divert as many writes as possible, to prolong the performance of the drive (no TRIM support in OS X at present – no, not in Snow Leopard either). One of the best ways to do this is to migrate my user directory to my second (magnetic) hard drive. You might argue I’m losing the performance I paid for getting an SSD but boot times and application launches should still benefit as the OS and apps reside on my X25-M.

<notice>Before proceeding with these instructions it’s vital you have a backup of your user directory and any other data on your Mac that you want to keep.</notice>

One of the top search results in Google for how to move / change your user directory in Mac OS X recommends using a symlink. There is however a cleaner method documented back in 2002 by Dan Frakes.

First, make sure that on the new volume, the “Ignore ownership on this volume” setting—in the volume’s Get Info window—is not checked

Open Terminal and enter:

$ sudo ditto -rsrc "/Users/greg" "/Volumes/newvolume/Users/greg"
$ sudo niutil -createprop / "/users/greg" home "/Volumes/newvolume/Users/greg"

The problem being that niutil no longer exists in OS X 10.5 (Leopard) and later.

$ niutil
-bash: niutil: command not found

In Leopard and later the NetInfo Manager has been replaced by the Directory Service command line utility (dscl). Since Mac OS X 10.4, ditto also preserves resource forks by default, so the -rsrc argument is no longer needed.

The commands you’re looking for under Leopard (and Snow Leopard) are (substitute your own username instead of ‘greg’, and here I’m moving my user directory to my second hdd/volume ‘Data’):

$ sudo ditto "/Users/greg" "/Volumes/Data/Users/greg"
Password: (enter password)
$ dscl
Entering interactive mode... (type "help" for commands)
 > cd /Local/Default/Users/
 > ls
 > change greg dsAttrTypeNative:home /Users/greg /Volumes/Data/Users/greg
 > exit

Log out and log in again, then verify your home directory has moved by opening Terminal and printing the working directory:

$ pwd
/Volumes/Data/Users/greg

Now rename your old user directory to be sure, before removing it (or leaving it if you can afford the space):

$ cd /Volumes/OSX/Users/
$ mv greg greg.old

I rebooted after this to check my machine started up without any issues, then removed my old user directory (sudo rm -rf greg.old).

Congratulations, you’ve moved your user directory.

Update: When starting Eclipse, it could not locate my workspace but I only had to browse to my new user directory (/Volumes/Data/Users/greg). VirtualBox automatically updated the path to my images created with their wizard (presumably because they use a system variable for the path). The VDIs I had manually selected required my assistance to update their location in the Virtual Media Manager.
