Moving Data to Drupal / Ubercart

Update (August 2011): We’re now recommending that people try the Migrate module (http://drupal.org/project/migrate) first, before attempting a custom-coded import solution. Use it with Migrate UI, and check its list of required modules carefully (they depend on the version of Drupal you’re using). The solution below is for situations that Migrate can’t handle well, or where you need to do imports that tie into custom modules that have their own data tables. We’ve also built a similar importer as a Drupal module (it’s highly customer-specific; if there’s demand we can look at generalizing it), which can work well when you need a quick upload and import inside the admin panel with a well-defined data layout (e.g. a CSV product list).

The following blog entry describes the solution to an issue encountered by Jeremy and Nathan in moving data from an old website to a new one. Based on a brief search for a solution, it looks like many other people may find a use for the (admittedly rather crude) code that we wrote to solve the problem. We hope this helps!

We’ve recently had to convert data from an old custom-built shopping cart to a new website based on Drupal and Ubercart. The old website had approximately 5000 products, with a data structure that was completely unlike the one used by Drupal to store data. We initially tried using the Drupal data loader module, but as other people have discovered, it isn’t necessarily a good fit for loading Ubercart product data, particularly when the products have images or, especially, custom data fields.

You can download a copy of our script here.

In the end, we put together a custom PHP script to pull data directly from one database to another. The script is fairly crude, but it may be useful to somebody trying to import data into Ubercart from another non-standard system.

We are therefore making a copy of the script available. Please bear in mind that a) we accept no responsibility whatsoever for its use (i.e. back up your databases first!), and b) while we have documented our assumptions, this is a crude script designed specifically for our own needs. Even so, you may be able to find a use for it.

Additionally, it was a pain just getting a list of the Drupal and Ubercart tables that needed to be looked at in order to do the data conversion. The list below may be useful to you as well.

How it works:

The old shopping cart had two relevant tables: a category listing (hierarchical), and a products table, which included some customer-specific fields, as well as a single URL to a product photo.

Drupal/Ubercart, on the other hand, uses a large number of tables to represent the same data. This is because products use the same node-based system as the rest of Drupal, in addition to a number of product-specific tables.

In addition, product images are dealt with using Drupal’s file caching system, which is relatively complex. All in all, our data goes from two tables to approximately nine (depending on your specific needs) in Drupal.

The script has some handling – very buggy still – in place for keeping track of the current position in the old database. This means that you can run the data conversion in small chunks, checking at each point to see how well it is doing. You can also set how many records to convert each time. In order for this to work, a very simple table to track the record pointer needs to be added into the old database.
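
The chunked approach can be sketched roughly as follows. This is not our PHP script: it is a minimal Python illustration that uses an in-memory SQLite database as a stand-in for the old MySQL database, and the names `products`, `import_pointer`, `last_id` and `convert_next_batch` are all hypothetical.

```python
import sqlite3

def convert_next_batch(old_db, convert_row, batch_size):
    """Convert up to batch_size old products, resuming after the saved pointer."""
    (last_id,) = old_db.execute("SELECT last_id FROM import_pointer").fetchone()
    rows = old_db.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    for row in rows:
        convert_row(row)
        # Move the pointer only after the row converted, so a failure
        # leaves it at the last record that was successfully handled.
        old_db.execute("UPDATE import_pointer SET last_id = ?", (row[0],))
        old_db.commit()
    return len(rows)

# Stand-in for the old database (hypothetical layout), with the
# one-row pointer table added to it.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
old_db.execute("CREATE TABLE import_pointer (last_id INTEGER NOT NULL)")
old_db.execute("INSERT INTO import_pointer VALUES (0)")
old_db.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"Product {i}") for i in range(1, 6)],
)
old_db.commit()

converted = []  # a real convert_row would write to the new database instead
first = convert_next_batch(old_db, converted.append, batch_size=2)
second = convert_next_batch(old_db, converted.append, batch_size=2)
third = convert_next_batch(old_db, converted.append, batch_size=2)
```

Each call picks up where the previous one stopped, so you can inspect the new site between runs and change the batch size as you go.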

We did not cover the conversion of product categories, since there are typically far fewer categories than products, and the Drupal data loader does a reasonably good job of loading them.

List of tables involved:

  • node – in Drupal, everything is a node, including products. Each new product record will need a corresponding node of the correct type.
  • node_revisions – this table contains much of the actual textual data (e.g. product description text) involved in displaying the node.
  • content_type_product – if you need to create custom fields for your Ubercart products, the data will go here. Our code in this section will probably not map directly to what you’re working on.
  • uc_product – where the product records “live”. There will be one record here for each of the product nodes.
  • uc_product_stock – contains stock-related data (e.g. how many of the product you have on hand) for products.
  • files – each product image needs to have a record in this table, which stores the file names and locations – but does not map these images directly to the product records (you need another table for that!).
  • content_field_image_cache – this is the mechanism that ties a product image to the underlying product. We didn’t really try to work with Drupal’s actual cache mechanism, so after you have loaded the products, you may want to clear the cache.
  • term_data – this is where product category information resides, based on Drupal’s vocabulary / hierarchy system. We hacked the solution in a fairly inelegant way – by doing a category name lookup to retrieve the term id.
  • term_node – a connection table between a term (i.e. a product category) and a node (in this case the products).
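
To make the fan-out concrete, here is a rough sketch of how one old product row might be spread across several of these tables, including the category name lookup. It is a Python illustration against an in-memory SQLite database rather than our PHP script. The table names come from the list above, but the columns are a heavily trimmed and partly assumed subset, and `import_product` and the sample data are hypothetical.

```python
import sqlite3

new_db = sqlite3.connect(":memory:")
# Simplified stand-ins for the real tables; real Drupal tables have many
# more columns than the handful assumed here.
new_db.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
    CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, body TEXT);
    CREATE TABLE uc_product (nid INTEGER, model TEXT, sell_price REAL);
    CREATE TABLE term_data (tid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);
    INSERT INTO term_data (tid, name) VALUES (7, 'Widgets');
""")

def import_product(db, old):
    """Fan one old product row (a dict) out into the new tables; return its nid."""
    nid = db.execute(
        "INSERT INTO node (type, title) VALUES ('product', ?)", (old["name"],)
    ).lastrowid
    db.execute(
        "INSERT INTO node_revisions (nid, body) VALUES (?, ?)",
        (nid, old["description"]),
    )
    db.execute(
        "INSERT INTO uc_product (nid, model, sell_price) VALUES (?, ?, ?)",
        (nid, old["sku"], old["price"]),
    )
    # The inelegant part: find the term id by looking up the category name.
    term = db.execute(
        "SELECT tid FROM term_data WHERE name = ?", (old["category"],)
    ).fetchone()
    if term is None:
        raise LookupError(f"no term named {old['category']!r}")
    db.execute("INSERT INTO term_node (nid, tid) VALUES (?, ?)", (nid, term[0]))
    return nid

old_row = {
    "name": "Blue Widget",
    "description": "A widget, in blue.",
    "sku": "BW-1",
    "price": 9.99,
    "category": "Widgets",
}
# Using the connection as a context manager commits on success and rolls
# back if import_product raises, so a failed row leaves nothing half-written.
with new_db:
    nid = import_product(new_db, old_row)
```

Wrapping each product in its own transaction is a design choice of this sketch, not something our script does; it avoids orphaned node rows when the category lookup fails.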

Base assumptions:

  • We assumed that the old and new websites both have MySQL databases.
  • Both databases will probably have to be on the same server (or at least one of them needs to be externally accessible if that isn’t the case).
  • The script should be uploaded and run from the new website’s location.
  • We assumed that there is only one product image per product – although this would be fairly easy to modify.
  • We manually added our categories to the new website.
  • We manually moved product images across – there are comments in the script where you could automate this though.
  • Error checking is extremely rudimentary; if the script hits a problem it just stops.
  • If you hit data conversion snags, the fastest way to proceed is to restore your backup database, reset the pointer record, and start over.

Microsoft – Twitter Deal

Nathan forwarded me this link from Mashable, with the subject line prefaced with the word “HUGE”.

From what I can tell, it looks like Microsoft is finally starting to put together the pieces of an overall web strategy: determine what Google would like to do and put roadblocks in their way. Hence the previous Yahoo deal.

It’s obviously far too early to see if this helps them out. I’m fairly sure though that it means search engines will be displaying a lot more “current” or trending data pulled from profiles and micro-blogging posts.


Coding Practices at Large Companies

I just had an interesting email exchange with one of my newer staff, a friend from university who worked for [insert name of company] for a number of years. The anonymized company is a Fortune 500 company in the IT industry. I’ve got stuff with their logo on it in my office.

The conversation began when he asked if he could use a goto statement (in PHP code!) for error handling.

Bearing in mind that this is somebody who is extremely familiar with both object-oriented design and good coding practices, I realized that there must be an interesting story underlying this.

His response to my query for more info is informative:

Tease all you want — I’ll lean on the weight of nine years of experience at [big company name], where (gasp) gotos were ubiquitous (almost exclusively in error handling code, but still).

To clarify further: I actually wasn’t aware that PHP had a goto statement (see: php.net/goto – they have the nice xkcd cartoon in the comments). I’ve been coding in PHP for a long time.

There are two methods that I usually use to handle errors in PHP code, in case you’re wondering:

1. Make sure that code that can crash is encapsulated in a nice neat function. Check return values of function calls inside the function. If necessary, stick an “@” before function calls that tend to crash in a messy manner. Then return useful info about the final state from the function itself, and check things out higher up in the stack.

2. Stick try/catch code around code that can crash. If necessary, subclass error classes and put in nice handlers for them.

In both cases, make sure that the error level for reporting is appropriate, and that we don’t output actual error messages back to the end user. Where useful, put in logging, and possibly put in code to email error reports back to admin.
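
The two methods look roughly like this. The sketch is in Python rather than PHP (so the “@” operator has no counterpart here), and every name in it is hypothetical; it only illustrates the shape of each approach, including logging the detail while showing the end user a generic message.

```python
import logging

log = logging.getLogger("shop")

# Method 1: keep the risky work inside one function, check each step's
# result, and return the final state for the caller to inspect.
def find_price(record):
    raw = record.get("price")  # returns None instead of raising
    if raw is None:
        return False, "price missing"
    cleaned = raw.strip().lstrip("$")
    if not (cleaned.isascii() and cleaned.replace(".", "", 1).isdigit()):
        return False, "price not numeric"
    return True, float(cleaned)

# Method 2: raise specific error subclasses and catch them higher up.
class ProductLoadError(Exception):
    """Base class for errors the caller knows how to handle."""

class MissingFieldError(ProductLoadError):
    """A required field was absent from the record."""

def load_product(record):
    if "name" not in record:
        raise MissingFieldError("name")
    return record["name"]

def handle_request(record):
    """Log the detail, but give the end user only a generic message."""
    try:
        return f"Loaded {load_product(record)}"
    except ProductLoadError:
        log.exception("product load failed")
        return "Sorry, something went wrong."

ok, price = find_price({"price": "$12.50"})
message = handle_request({})
```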


Talk To Me – Sort Of!

I just created a “Cyber Twin”. Basically it’s a (slightly snarky) chat bot that is programmed to mildly ape my mannerisms.

You can talk to it here: www.mycybertwin.com/jeremylichtman.

It isn’t well trained yet though, so don’t expect too much.

Their more sophisticated commercial-level AIs are pretty good at holding down the fort while all of the sales people are busy. It’s an interesting idea. It takes a lot of work to train them though.

Managing Multiple Projects

I’ve discussed project management recently with a number of people who work in more “traditional” software development environments, where projects tend to be large and involve many people working for long periods of time. They often give me odd looks when I tell them that my company typically has around 20 projects on the go at any point in time, with an average length of well under a month.

Bear in mind that these are actual projects, not “operational” things like supporting existing software or running an SEO campaign.

I’d be interested in discussing how to manage this sort of situation with other people: what to do when all of the traditional project management tools go right out the window, and how to avoid stressing out staff by making them switch back and forth between many different tasks. What kinds of tools do you use to track large numbers of very short projects? (I usually don’t have hours to set up a file in MS Project or similar tools, so I write quick checklists on a notepad and then wander from desk to desk.) Is anyone using agile techniques (especially controversial things like two people per screen)?