Recently the subject came up on the LRUG mailing list of the easiest way to download multiple images quickly. The conversation started with Typhoeus and ended with shelling out to the command line with wget and parallel… All good and solid solutions.
However, there’s one solution I’d like to go into that involves thread pools. If you’re lucky enough not to be constrained by dogma and are able to use JRuby, then you’ll have access to the lovely java.util.concurrent library.
Wait, did I just use the word Java? It’s OK, don’t panic. We’re not going to write any Java per se, but we are going to use a battle-hardened library that has such niceties as thread-safe data structures and high-level concurrency constructs such as thread pools.
Threaded code can be hard to write and debug, and the folks in JVM land realised this a very long time ago. Rather than just bitching and waiting for Scala to become the new standard, they developed j.u.c to ease the life of developers everywhere.
In order to demonstrate how easy it is to get some concurrency utilising thread pools, I wrote an extremely simple/naïve script for concurrent file downloading.
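The original script isn’t reproduced here, but the pattern looks roughly like this. This is a portable sketch using plain Ruby threads so it runs anywhere; on JRuby the whole pool below collapses to a single call to `java.util.concurrent.Executors.newFixedThreadPool`.

```ruby
# A rough sketch of a fixed-size thread pool. On JRuby you would instead
# use java.util.concurrent.Executors.newFixedThreadPool(size) and submit
# jobs to the returned executor; this shows the same idea in plain Ruby.
class FixedPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) # a nil job is the shutdown signal
          job.call
        end
      end
    end
  end

  def submit(&job)
    @jobs << job
  end

  def shutdown
    @workers.size.times { @jobs << nil } # one poison pill per worker
    @workers.each(&:join)
  end
end

# Hypothetical usage: download a list of images concurrently.
#   require 'open-uri'
#   pool = FixedPool.new(4)
#   urls.each do |url|
#     pool.submit { File.binwrite(File.basename(url), URI.open(url).read) }
#   end
#   pool.shutdown
```

The pool caps concurrency at a fixed number of workers, so a thousand URLs won’t spawn a thousand threads.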
This is literally just scratching the surface of j.u.c. More JRuby/concurrency related subjects to follow in the coming weeks.
So my wife told me I remind her of Archer. Eerrrm, pretty sure that’s not a good thing. Not a good thing at all…
Apart from taking pictures and programming, I’ve recently re-discovered my love of reading. I often get asked what I read, so I thought I’d share what’s currently on my reading list.
This weekend’s book is Mastermind: How to Think Like Sherlock Holmes which I picked up yesterday when it was released in the UK. It had rave reviews previously in the US, so I thought I’d give it a go.
So what’s the premise of the book? Well, it’s mostly in the title. It explains how, using a variety of techniques, a person can achieve Sherlock Holmes-like powers of observation and intuition.
I’m already halfway through the book and I love the writing style. Rather than being a purely factual book, the author has a lovely way of presenting facts intermingled with stories. Definitely a fantastic read so far.
It’s been a long time since I’ve written a tech-based article, and as part of my New Year’s resolution I’m going to rectify that now.
There’s been quite a lot of discussion about software architecture within the Rails community lately. I’ve got to be honest: I’m slightly confused by all the blame directed at Rails itself.
While some of it is justified I do think a lot of the problems boil down to people’s mentality about software architecture in general.
People are always coming up to me to talk about the scalability of their infrastructure: how they’re using the latest in NoSQL hotness, their new in-memory queue or their gridded web-caching layer. If I then ask them how easy it would be to swap out a technology or piece of functionality, their eyes glaze over and suddenly the conversation stops.
Scalability of software engineering is something that’s not always so well understood. People will tell you they don’t have problems because they’re “Agile”. Heck, if I ask any startup in London how they develop software, they’ll say they’re “Agile” and reel off buzzwords such as Scrum, Kanban, XP, TDD, Lean etc.
Ah, if only writing maintainable and scalable software were just about processes, life would be so much simpler. Before I go further, let me just say that when I use the word “scalable” in this particular context, I’m talking about being able to consistently and predictably maintain and rework the code, and add features if required, in a timely fashion.
Any engineer, whether working in the Rails ecosystem or elsewhere, has felt the sheer joy of working on a new greenfield project: bashing out code and shipping features to the client on a daily, sometimes hourly basis.
Assuming you understand the domain you’re trying to solve for, your workflow might go something like this:
- Write a test/spec
- Write just enough code to make the spec pass
- Refactor then onto the next
- Wait for CI to pass then deploy (You do practice continuous deployment right?)
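The first two steps of that loop might look something like this (class and behaviour are hypothetical; Minitest used for brevity):

```ruby
require 'minitest/autorun'

# 1. Write a failing spec for the behaviour you want.
class CartTest < Minitest::Test
  def test_total_sums_line_item_prices
    cart = Cart.new
    cart.add(price: 300)
    cart.add(price: 200)
    assert_equal 500, cart.total
  end
end

# 2. Write just enough code to make it pass.
class Cart
  def initialize
    @items = []
  end

  def add(item)
    @items << item
  end

  def total
    @items.sum { |item| item[:price] }
  end
end
```

Step 3 is where you’d tidy the implementation while the spec stays green, before moving on to the next behaviour.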
Sounds simple? Yes. For a while at least. Though slowly things start becoming, well… Slow.
You’re still following the above steps but it’s becoming harder to work with your code, even though you have tests in place to help with refactoring. The tests/specs run oh so slowly. It’s slow for engineers new to the project to figure out how everything’s wired up. When it comes to estimation, you start measuring things in weeks rather than days.
We’ve all been there.
So to start 2013 off with a bang here are my 8 things to be mindful of when writing your next web app.
1. Know when to use Rails over Sinatra
You’ve seen it time and time again. Someone has a falling out with Rails, be it philosophical or, funnily enough, because productivity on the last app they wrote didn’t scale very well. What do they do next? I’ll tell you what they do. They replicate Rails in Sinatra *face palm*.
Now, Sinatra is good for a lot of things: simple API endpoints, email-collection forms, encapsulated mini apps that can run standalone or as a Rack app inside your Rails application, prototyping simple features, etc.
You know Sinatra isn’t a good fit when you start replicating the Rails stack in your app. Rails does quite a lot for engineers that we take for granted.
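To be clear about how little a Sinatra-style mini app actually needs: a Rack app is just an object that responds to `call(env)` and returns a `[status, headers, body]` triple; no framework required. A sketch of a tiny email-collection endpoint (names and routes hypothetical):

```ruby
# A minimal Rack app: respond to #call(env), return [status, headers, body].
class EmailSignup
  def call(env)
    if env['PATH_INFO'] == '/signup'
      [200, { 'content-type' => 'text/plain' }, ['thanks for signing up!']]
    else
      [404, { 'content-type' => 'text/plain' }, ['not found']]
    end
  end
end

# Standalone it runs under any Rack server (`run EmailSignup.new` in a
# config.ru); inside Rails it can be mounted from routes.rb with something
# like `mount EmailSignup.new => "/signup"`.
```

When an endpoint needs no more than this, Sinatra is a fine fit; when you find yourself bolting on ORM wiring, asset handling and the rest, you’ve started rebuilding Rails.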
When I moved from the land of JEE (or J2EE as it was called back then <strokes neck beard>) in 2005, one of the things I loved about Rails was the fact that I could concentrate on solving business problems. This was in stark contrast to before, where I would spend a week just wiring up a base app for a Java project.
People seriously need to think first about what their app is trying to do and whether it would really be better solved with Sinatra.
What will you be spending most of your time solving and how will it evolve?
2. Is it really a prototype?
A while back I was working for an agency which apparently was big on Agile and Lean. Great, I thought: kindred spirits and all that.
Everything seemed fine until one day when, having been re-assigned onto a large, high-profile project, I brought up the subject of how the code would soon become unmaintainable because of how the Rails models were designed and shared between multiple apps.
The tech director looked at me and asked “How would you have done it?”. To which I replied “Have a true service-oriented architecture. Define contracts between each application with a versionable and formalised API. And, more importantly, separate your business logic from persistence, as that’s what you’re really trying to share”. He stayed silent for a minute before giving a disconcerting answer: “It’s fine, it’s a prototype. The client knows this and in a few months’ time, if it’s a success, we’ll get budget to do a re-write”. This was on a large multi-month project costing hundreds of thousands of pounds. I can tell you now the client did not consider this a prototype.
Believe it or not, this wasn’t the first time someone had said the above to me, nor will it be the last.
OK, so I love prototypes. There is absolutely nothing wrong with them. They’re a fantastic way of solidifying ideas and testing hypotheses.
I do, however, have a problem with people who don’t understand what a prototype is and how it may evolve in the future. Does the client really understand that if it’s a success you’ll need time and more budget to re-implement it? If not, have you put enough up-front thought into how you’ll evolve it beyond a prototype for the business?
In nine out of ten scenarios the answer is no. There are no tests, so you’re having a heck of a time trying to shoehorn a test suite onto it, and you’ve made some OK design choices that are fine for a prototype but have been made in such a coupled manner that it’s hard to swap implementations in the future.
Just to re-iterate, I don’t have a problem with prototypes. Most of the time that goes into writing code is actually spent understanding the problem domain you’re trying to solve, and prototypes and spikes help you do this. I just wish people understood that they should either throw said prototype away or put enough in place to allow them to evolve it in a painless way. And most of all, let the client/owner know what a prototype is: either you get it done very quickly, but if it’s a success it’ll be thrown away and started again; or you deliver it a little later, but in a state that can be built upon in the future.
3. Separate persistence from your business objects
This has been quite a popular subject in the Rails community of late. I won’t go too deeply into it, but the gist is this: ActiveRecord tightly couples your schema to your model, muddying the line between business logic and persistence. Add to that that everything seems to go into the model. Oh, and did I mention that models also double as DAOs for presenting information to the view? Before you know it you have yourself a case of a God object.
There are so many reasons why the above is bad: it breaks SRP, and it leads to fat models, which aren’t necessarily a bad thing in the beginning but can become unmaintainable later on as they take on too many responsibilities.
From a persistence angle, a small example of this problem arises when trying to speed up slow-running wired tests. You’ve decided to mock/stub out all the AR stuff to speed things up. Then, later, after making a fundamental change that “should” break everything, you’ve probably found that your tests still pass.
OK, so set aside the obvious possibility that, when modelling, you may not have replaced a mocked object with a concrete one once it was implemented. You’re still left with the problem that you’ve mocked out AR, and you’ve probably made quite a lot of use of the AR lifecycle callbacks, which are concerned not only with the object lifecycle but with persistence as well. And it’s the persistence part you’ve mocked, so you need to jump through hoops to have a sane test suite.
There are lots of ideas on how best to solve the coupling problem between objects and persistence. They all solidify around the idea of moving your behaviour into mixins that don’t rely upon persistence, so that a) the persistence part can easily be stubbed out if need be and b) the behaviour can be tested independently, outside of AR.
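A sketch of the mixin idea (names hypothetical): the pricing behaviour knows nothing about ActiveRecord, only that its host responds to a couple of reader methods.

```ruby
# Behaviour lives in a plain mixin; it only assumes the host responds to
# #price and #discount_rate, not that it is an ActiveRecord model.
module Discountable
  def discounted_price
    price - (price * discount_rate)
  end
end

# In the app this might be mixed into an AR model:
#   class Product < ActiveRecord::Base
#     include Discountable
#   end
#
# In tests, a throwaway struct stands in, so no database is needed:
FakeProduct = Struct.new(:price, :discount_rate) do
  include Discountable
end
```

Because the behaviour depends only on those readers, it can be exercised in milliseconds on a plain Struct, and mocking AR no longer risks silently mocking away the logic under test.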
Ideas such as DCI get around the callback problem by having another object, called the Context, which orchestrates the interactions. For example, if some analytics were to be triggered after a user signs up, traditionally you could put that in an observer or an after_create hook on an AR model. With DCI you’d put that in a Context object that handles creating the signed-up user and pushing a payload off to the analytics service.
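A minimal sketch of such a Context (names and collaborators hypothetical):

```ruby
# A Context object orchestrating sign-up: instead of hiding the analytics
# call in an after_create hook, the orchestration is explicit and the
# collaborators are injected, so both can be swapped or faked in tests.
class SignUpUser
  def initialize(user_repo:, analytics:)
    @user_repo = user_repo
    @analytics = analytics
  end

  def call(attributes)
    user = @user_repo.create(attributes)
    @analytics.track('user_signed_up', user_id: user.id)
    user
  end
end
```

Nothing here touches AR directly; the repository wraps persistence and the analytics client wraps the external service, so the “what happens on sign-up” logic is testable with plain doubles.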
A little off topic, but still on the subject of SRP: I’ve noticed some people take SRP a little too literally, as in contexts that do only one thing, as in literally ONE method. What they should be doing is grouping things that relate to a *single* responsibility.
4. Cron must die, aka reduce the moving parts
It’s 2013; is this really as good as it gets? Seriously? I can appreciate cron for its simplicity, but for non-trivial apps why am I still seeing it used?
In a clustered setup which server actually executes the tasks? If that server goes down, which server gets promoted to run those tasks? Do you have to wait for another instance to be spun up before those tasks are run?
PaaS companies like Heroku (who I luuurvvee) provide services that abstract these decisions away from you. But what if you’re not able to utilise a PaaS that does this? What are your choices?
If you’re lucky enough to use JRuby you can use Quartz on Tomcat/Jetty. In fact, I remember there being some wrappers for Trinidad that encapsulated Quartz for you and so simplified the setup. My personal favourite, though, is TorqueBox, as its scheduling solution is “cluster aware”. Yep, you’ve guessed it: if one of the nodes in my cluster goes down, the other nodes figure out between them which should be promoted to run the tasks. All this without having to write a single line of Chef to handle configuration of a new “task running” node.
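For illustration, a TorqueBox-style scheduled job might be configured something like this (a sketch from memory of the TorqueBox 2.x torquebox.yml format; treat the exact keys and names as assumptions):

```yaml
# torquebox.yml (sketch). CleanupJob would be a plain Ruby class
# implementing a #run method.
jobs:
  nightly_cleanup:
    job: CleanupJob        # the Ruby class to run
    cron: '0 0 2 * * ?'    # Quartz-style spec: 02:00 every day
    singleton: true        # only one node in the cluster runs it
```

With a singleton job, if the node currently running the task dies, another node in the cluster takes over, which is exactly the cluster-aware behaviour described above.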
While we’re on the subject of reducing moving parts and TorqueBox: do you use Memcached? A separate queue/messaging process? Are you having to monitor each process separately? What about deployment and cluster configuration? Does it support auto-discovery of services without a restart, or do you need to run a Chef recipe that updates configuration and then bounce the above services? A standard MRI, Memcached and queue solution in a clustered setup is starting to look a bit antiquated right now, isn’t it? Oh, and with JRuby you’ve got no GIL. Can you say better utilisation of server resources? Just saying. Can’t use JRuby for some unfortunate reason? Try Rubinius: since they replaced the GIL with finer-grained locks in the Hydra branch a long while ago, you get real threads and no GIL to get in your way.
5. Evolve SOA architectures
This is a short one. Every man/woman and their dog knows that breaking a large application down into smaller, encapsulated services helps maintainability. Of course, splitting your app into separate services from the start, without thought, can have the opposite effect.
How do you know which parts of the app really warrant splitting out into a separate service? If you get it wrong, you’ve now multiplied your problems by the number of services.
My advice is to start with one monolithic app first. See where your pain points are, then highlight good candidates for extraction. Finally, make sure your business logic is cleanly separated from persistence, i.e. when you extract a particular area of functionality out as a service you’re porting the business logic, not your database-coupled models.
6. Documentation isn’t your enemy
There’s an old meme that always makes my eyes roll when I hear it. It goes along the lines of: documentation always goes quickly out of date, and good code should be self-documenting; ergo documentation is useless.
Heck, it even says so in the Agile Manifesto. At least that’s what people who don’t know what they’re talking about try and jam down my throat during a debate.
Do you know what it really says? Well it definitely doesn’t say no documentation. In fact it says “Working software over comprehensive documentation”.
Yes, COMPREHENSIVE documentation. And when it says comprehensive documentation, we’re talking about the dull 300-page type that references other paragraphs in a circular manner so as to cause confusion and set up blame structures. After all, if it says so in the spec then it must be person X’s fault for not implementing it correctly, right? Right?
Think about the best Open Source projects you’ve ever worked on or used. Did it have documentation? A decent Readme? Tutorials? Maybe some RDoc/Tomdoc? Do you see where I’m going?
Now imagine being an engineer on your first day on the job. You’ve looked through the test suite, so you have a rough idea of how things work. Of course things are a little bit obfuscated; after all, there are 2+ years of tests there, and while the team’s been quite religious about deleting deprecated code they haven’t been as careful with their test suite. You know what would help you be really awesome and hit the ground running (apart from a good pair buddy)? Some documentation. At least enough to get you past the gnarly or less obvious parts of the app.
A few months back I was part of a good-sized team that hadn’t worked together before. We had six weeks to deliver a production piece of software that had to service high volumes of users. One of the things I had to do was expose a non-trivial REST API that could service not only a single-page application for web and mobile, but also a Flash and an iPhone client.
Instead of diving straight into code in a haymaker type of fashion, I sat down and wrote some docs describing what I’d like the API to look like and the client/server interaction, including example curl calls with example JSON. Having done this, I was not only able to give the docs to the other engineers for them to write stubbed tests (and ultimately wired integration tests) against, but I also used the example JSON payloads in my own tests, which sped up development no end.
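One cheap way to keep the docs and the tests in sync is to paste the documented example payload straight into a test fixture (payload and field names hypothetical):

```ruby
require 'json'

# The example payload, copied verbatim from the API doc, doubles as a
# test fixture, so the doc and the client tests can't silently diverge.
EXAMPLE_USER_JSON = <<~JSON
  { "id": 42, "name": "Jane", "email": "jane@example.com" }
JSON

user = JSON.parse(EXAMPLE_USER_JSON)
# Client code under test would consume `user` exactly as it would a
# real response body.
```

If the documented payload changes, the fixture changes with it, and the failing tests tell every client team what to update; no “didn’t get the memo” emails required.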
I could have written a very basic Sinatra app with stubbed-out endpoints that returned canned JSON responses. Yes, that would have given the other engineers something real to work against, but in reality it might have given them the false impression they were working against a real implementation, and may even have discouraged them from writing non-wired tests. Plus, every time I changed the structure of the JSON payloads I’d need to send out an email letting them know, and there’s bound to be one person who doesn’t “get the memo”.
Wait, I know what you’re going to say: I should have given them access to the tests/specs. Now, that could have worked, but then not only would they have to pay attention to their own team’s repo commits, but to mine as well, to figure out from the commit messages and diffs what changes I’d made to the specs. Then there’s the other problem of what happens if the iOS client needs tweaking a few months down the line. I may not be around, nor anyone who understands Ruby. Sure, if they’re smart they could go through and grok it, but it’s all billable time that’s wasted.
I guess the biggest resistance to documentation I’ve found so far has been from engineers who’ve either been lucky enough to work only on new projects and then leave when things get tricky, or who are lazy (nothing wrong with being lazy; it’s why we automate everything). Remember TDD? It seems so natural and normal now, and yet I remember when XP was still new the same kinds of arguments were used by people as to why it was a waste of time.
Just remember, every time you utter the stupid words “documentation is pointless” you’re just repeating a meme out of context. Even worse a small furry animal dies and we’re not talking the cute magical kind either.
7. Don’t argue against logic with verbatim internet memes
See point 6. End of.
8. REST is cool, just not all the time
I get quite sad every time I see another internal, private RESTful API with message-delivery semantics implemented on top of it, instead of using messaging middleware that already has built-in support for this.
HTTP is great for some things. Not this. In the time you spend kludging HTTP to solve problems like this, you could have written a few more tests or made the software a little bit better for the end user.
Just to clarify, my point isn’t that REST is “bad”. It’s about using the right tool for the job so your time is better spent.