How many times have you needed to import raw data into your database? At the company where I work (Cloudbiz), this used to be a very common scenario. As a company that specializes in CRM systems, we were responsible for launching a new CRM model and also for importing the data from a legacy system (after some data healing). Very often that legacy system was Excel (!) or another system that could only export data to text files, which we then had to process and import into our system. The process also often had to be run over and over rather than just once, so the import looked more like an integration process. Those who are familiar with this kind of job know how tedious it is and how much effort it takes to do it the right way.
Fortunately, many things have changed since then. New tools have emerged that promise the flexibility and the “RAD” factor that was missing from the field all those years. In this post I am going to show how easy it is to implement a basic data import scenario. I will demonstrate how to use Apache Camel to import data from a text file into a REST web service. At the end of the post you will also find the source code of this demo.
Customer X has a text file containing his existing customers. Since the platform that he just bought from you also stores customers, he asks you to import the old data into the new database. Your platform’s API is implemented as a REST web service that handles JSON objects.
Approach A (the bad)
To solve this problem, many people would write a new program that reads the file line by line, marshals each line to JSON, and then sends an HTTP request to the web service. Although this solution might seem simple or optimal at first sight, it is actually the worst, for the following reasons.
– It is not clean. You do this, and after you complete it you feel dirty. Most probably you will mention it to the priest at your next confession.
– It is not reusable. If you had a new customer with a similar problem, the only way to reuse it would be to copy and paste the whole project and modify it accordingly. This will most probably lead to copy-paste errors, and we all know how evil those are.
– The process is bound to the problem. If your next customer, for example, has the old data in some other form, e.g. another database or even another web service, then you must either rewrite half of the code or create an additional program that first exports the old data to text, and run that beforehand.
– You have to reinvent the wheel many times. You have to write the file-reading process from scratch and make the HTTP request yourself, with all the error handling involved.
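To make the complaints above concrete, here is a minimal sketch of what Approach A tends to look like in plain Java (11 or later). Everything specific in it is an assumption made for illustration: the file name customers.txt, the id,name,email column order, and the URL http://localhost:8080/api/customers are all hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

class NaiveImport {
    // Hypothetical endpoint, hardcoded just like the rest of the program.
    static final String CUSTOMERS_URL = "http://localhost:8080/api/customers";

    // Escapes backslashes and double quotes so the value is safe inside a JSON string.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Hand-rolled marshalling; assumes each line is "id,name,email".
    static String toJson(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return String.format("{\"id\":\"%s\",\"name\":\"%s\",\"email\":\"%s\"}",
                escape(fields[0].trim()), escape(fields[1].trim()), escape(fields[2].trim()));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // File reading, marshalling, and HTTP are all welded together in one loop.
        for (String line : Files.readAllLines(Path.of("customers.txt"))) {
            if (line.isBlank()) {
                continue;
            }
            HttpRequest request = HttpRequest.newBuilder(URI.create(CUSTOMERS_URL))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(toJson(line)))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                System.err.println("Failed to import: " + line
                        + " (HTTP " + response.statusCode() + ")");
            }
        }
    }
}
```

Notice that the source (a file), the format (three comma-separated fields), and the target (one URL) are all baked into the same class, which is exactly why it cannot be reused for the next customer.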
Approach B (the expensive)
A well-grounded developer would most probably approach this problem differently. He or she would design the solution with the level of abstraction required to make it reusable for the next customers without having to delete a single line of code. If I had to deal with that, I would do the following. Instead of hardcoding the file reading and the sending to the REST web service, I would write something that reads from endpoint 1 and sends the result to endpoint 2. Both endpoints should implement the same interface and behave similarly. With this approach, if the next project had to read from a REST web service and write to a file, I would only need to implement the extra functionality, leaving my previous code untouched and reusable. The data that the two endpoints exchange should be abstract as well. Although this solution seems more proper, it has its disadvantages.
– You have to reinvent the wheel many times. As with the previous approach, you have to write the file-reading process from scratch and make the HTTP request yourself, with all the error handling involved.
– It is too expensive. Most of the time you are not going to have the time to implement something like this. Also, the customer did not pay you to create the UBER import tool, so the cost of building it will almost certainly fall on your company.
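The core of Approach B can be sketched in a few lines of Java. The names Endpoint, InMemoryEndpoint, and Importer are hypothetical, invented here for illustration only; a real version would also need file and HTTP implementations of the interface, which is where the cost mentioned above comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface: every source or target is just an endpoint.
interface Endpoint<T> {
    List<T> read();        // pull all records out of this endpoint
    void write(T record);  // push one record into this endpoint
}

// Simplest possible implementation, standing in for a file or a web service.
class InMemoryEndpoint<T> implements Endpoint<T> {
    private final List<T> records = new ArrayList<>();

    InMemoryEndpoint(List<T> initial) {
        records.addAll(initial);
    }

    @Override
    public List<T> read() {
        return new ArrayList<>(records); // defensive copy
    }

    @Override
    public void write(T record) {
        records.add(record);
    }
}

// The import logic knows nothing about files, HTTP, or the record type.
class Importer {
    static <T> int transfer(Endpoint<T> from, Endpoint<T> to) {
        int count = 0;
        for (T record : from.read()) {
            to.write(record);
            count++;
        }
        return count;
    }
}
```

Under these assumptions, supporting a new kind of source or target means adding one more Endpoint implementation, while Importer.transfer stays untouched.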
Approach C (the good)
What if someone told you that the program the developer wrote in Approach B already exists and does much more than you need? Wouldn’t that be great? Fortunately, that tool, Apache Camel, truly exists. It is open source, and it is considered the easiest to use among its competitors, Mule and Spring Integration (read this article for more).
With the tool ready, we only need to focus on things like the file name or the web service URL, and can stop caring about how to open the file or when to close it.
To show you how easy it is, here is a sample Apache Camel project that reads records from a CSV file, transforms them to JSON, and sends them directly to a REST web service.