A few weeks back I wrote a blog post about Writing RESTful Services in C, which explains the use of the Axis2/C REST API. Basically, when a request arrives with an HTTP method (GET, POST, PUT or DELETE) and a URL, it is matched against the declared HTTP method and URL pattern of each operation in order to identify the operation and extract the request parameters. For the example in that post, the URL mapping can be summarized like this.
Operation | HTTP Method | URL Pattern | Example Requests |
getSubjects | GET | subjects | GET subjects |
getSubjectInfoPerName | GET | subjects/{name} | GET subjects/maths |
getStudnets | GET | students | GET students |
getStudnetsInfoPerName | GET | students/{name} | GET students/john |
getMarksPerSubjectPerStudent | GET | students/{student}/marks/{subject} | GET students/john/marks/maths |
You can see an application with this URL mapping running live; it is written using WSF/PHP, which in fact runs the Axis2/C algorithms underneath.
Last week I updated this REST mapping algorithm and started a discussion about the changes on the Axis2/C dev list. I thought it would be better to explain the idea on my blog too.
What the earlier algorithm (before my changes) did was search each pattern in the order it was declared and return as soon as a match was found. Sequential searching for a matching pattern degrades performance as the number of operations grows. So my solution was to keep the URL patterns in a multi-level (recursive) structure and match the URL one level at a time.
Here is the C struct (defined in src/core/util/core_utils.c).
/* internal structure to keep the rest map in a multi level hash */
typedef struct {
    /* the structure will keep as many as following fields */

    /* if the mapped value is directly the operation */
    axis2_op_t *op_desc;

    /* if the mapped value is a constant, this keeps a hash map of
       possible constants => corresponding map_internal structure */
    axutil_hash_t *consts_map;

    /* if the mapped value is a param, this keeps a hash map of
       possible param_values => corresponding map_internal structure */
    axutil_hash_t *params_map;

} axutil_core_utils_map_internal_t;
Here is how it looks when the URL pattern set shown in the table above is kept inside this multi-level (recursive) structure.
svc->op_rest_map (hash)
 |
 |-- "GET:students" --- axutil_core_utils_map_internal_t (instance)
 |    |
 |    |-- op_desc (axis2_op_t* for "GET students" op)
 |    |-- consts_map (empty hash)
 |    |-- params_map (hash)
 |         |
 |         |-- "{student_id}" --- axutil_core_utils_map_internal_t (instance)
 |              |
 |              |-- op_desc (axis2_op_t* for "GET students/{student_id}" op)
 |              |-- params_map (empty hash)
 |              |-- consts_map (hash)
 |                   |
 |                   |-- "marks" --- axutil_core_utils_map_internal_t (instance)
 |                        |
 |                        |-- op_desc (NULL)
 |                        |-- consts_map (empty hash)
 |                        |-- params_map (hash)
 |                             |
 |                             |-- "{subject_id}" --- axutil_core_utils_map_internal_t (instance)
 |                                  |
 |                                  |-- op_desc (axis2_op_t* for "GET students/{student_id}/marks/{subject_id}" op)
 |                                  |-- consts_map / params_map (both NULL)
 |
 |-- "GET:subjects" --- axutil_core_utils_map_internal_t (instance)
      |
      |-- op_desc (axis2_op_t* for "GET subjects" op)
      |-- consts_map (empty hash)
      |-- params_map (hash)
           |
           |-- "{name}" --- axutil_core_utils_map_internal_t (instance)
                |
                |-- op_desc (axis2_op_t* for "GET subjects/{name}" op)
                |-- consts_map / params_map (both NULL)
This structure is built when the server initializes the services (from the “axis2_svc_get_rest_op_list_with_method_and_location” function in src/core/description/svc.c).
As each request hits the service, the request HTTP method and URL are matched against the above structure (which we call ‘rest dispatching’) using the following algorithm (defined in the “axis2_rest_disp_find_op” function in src/core/engine/rest_disp.c). Note that the user REST parameters are extracted at the same time, but that is not shown here.
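To make the shape of the structure more concrete, here is a minimal sketch of how a pattern could be inserted into such a multi-level map. It is not the Axis2/C code: node_t, entry_t, get_or_add and add_pattern are hypothetical names, a small linked list stands in for axutil_hash_t, the operation is just a string, and cleanup is left out. The root node's consts list plays the role of svc->op_rest_map, so the first key is "METHOD:first_component".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a linked list replaces axutil_hash_t. */
typedef struct node node_t;
typedef struct entry {
    char *key;              /* URL component, e.g. "marks" or "{subject_id}" */
    node_t *child;          /* next level of the map */
    struct entry *next;
} entry_t;
struct node {
    const char *op;         /* plays the role of op_desc (NULL = no operation) */
    entry_t *consts;        /* plays the role of consts_map */
    entry_t *params;        /* plays the role of params_map */
};

/* Return the child stored under key[0..len), creating it when absent. */
static node_t *get_or_add(entry_t **list, const char *key, size_t len)
{
    entry_t *e;
    for (e = *list; e != NULL; e = e->next)
        if (strlen(e->key) == len && strncmp(e->key, key, len) == 0)
            return e->child;

    e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->key = malloc(len + 1);
    e->child = calloc(1, sizeof *e->child);   /* all fields start as NULL */
    if (e->key == NULL || e->child == NULL) abort();
    memcpy(e->key, key, len);
    e->key[len] = '\0';
    e->next = *list;
    *list = e;
    return e->child;
}

/* Register op under a pattern such as
 * "GET:students/{student_id}/marks/{subject_id}". */
void add_pattern(node_t *root, const char *pattern, const char *op)
{
    node_t *node = root;
    assert(root != NULL && pattern != NULL && op != NULL);

    while (*pattern != '\0') {
        size_t len = strcspn(pattern, "/");   /* length of this component */
        if (len > 0) {
            /* "{...}" components go to the params list, the rest to consts */
            entry_t **list = (pattern[0] == '{') ? &node->params : &node->consts;
            node = get_or_add(list, pattern, len);
        }
        pattern += len;
        if (*pattern == '/') pattern++;
    }
    node->op = op;   /* the last level reached holds the operation */
}
```

Calling add_pattern once per declared operation would produce a tree of the same shape as the one drawn above, with intermediate levels such as "marks" left without an operation.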
1. The request URL is split into URL components at the ‘/’ character. Retrieve the instance of axutil_core_utils_map_internal_t from svc->op_rest_map into the variable ‘mapping_struct’.
2. Check whether any URL components remain, i.e. count(URL components) > 0.
3. If no URL components remain, get the value of mapping_struct->op_desc, where mapping_struct is the current mapping instance of axutil_core_utils_map_internal_t. If mapping_struct->op_desc is not NULL, we have found the operation. If it is NULL, just exit returning NULL.
4. Else, if some URL components remain, look up the first URL component in the mapping_struct->consts_map hash. If mapping_struct->consts_map[‘first_url_component’] is not NULL, assign that value to mapping_struct and repeat from step 2 with the remaining URL components. (Here the hash[‘key’] syntax denotes the value stored for the key ‘key’ in the hash ‘hash’.) If that finds an operation, we are done; if not, continue to step 5.
5. If mapping_struct->consts_map[‘first_url_component’] is NULL, or following it did not lead to an operation, match the first URL component against each key (which is a URL component pattern) in the mapping_struct->params_map hash. (We use the function “axis2_core_utils_match_url_component_with_pattern” in src/core/util/core_utils.c to match a URL component against a URL component pattern.) If a matching pattern is found, assign mapping_struct->params_map[‘key’] to mapping_struct and repeat from step 2 with the remaining URL components. If that finds an operation for some key, it is the matching operation.
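The steps above can be sketched in plain C as follows. This is only an illustration under simplifying assumptions, not the code in rest_disp.c: node_t, entry_t, find_op and dispatch are hypothetical names, NULL-terminated arrays stand in for the hashes, operations are plain strings, and a "{...}" pattern is assumed to match any non-empty component (the real pattern matcher is richer). The students part of the tree is built statically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for axutil_core_utils_map_internal_t. */
typedef struct node node_t;
typedef struct { const char *key; const node_t *child; } entry_t;
struct node {
    const char *op;          /* plays op_desc; NULL when no operation */
    const entry_t *consts;   /* plays consts_map, ends with a NULL key */
    const entry_t *params;   /* plays params_map, ends with a NULL key */
};

static const entry_t none[] = { { NULL, NULL } };

/* The "GET:students" branch of the tree, leaves first. */
static const node_t subject_node = { "GET students/{student_id}/marks/{subject_id}", none, none };
static const entry_t marks_params[] = { { "{subject_id}", &subject_node }, { NULL, NULL } };
static const node_t marks_node = { NULL, none, marks_params };
static const entry_t student_consts[] = { { "marks", &marks_node }, { NULL, NULL } };
static const node_t student_node = { "GET students/{student_id}", student_consts, none };
static const entry_t students_params[] = { { "{student_id}", &student_node }, { NULL, NULL } };
static const node_t students_node = { "GET students", none, students_params };
static const entry_t rest_map[] = { { "GET:students", &students_node }, { NULL, NULL } };

/* Steps 2 to 5: match the remaining path against one level of the map. */
const char *find_op(const node_t *node, const char *path)
{
    const entry_t *e;
    const char *op;
    size_t len;

    assert(node != NULL && path != NULL);
    while (*path == '/') path++;              /* skip separators */
    if (*path == '\0') return node->op;       /* step 3: nothing left */

    len = strcspn(path, "/");                 /* first remaining component */

    for (e = node->consts; e->key != NULL; e++) {          /* step 4 */
        if (strlen(e->key) == len && strncmp(e->key, path, len) == 0) {
            op = find_op(e->child, path + len);
            if (op != NULL) return op;
        }
    }
    for (e = node->params; e->key != NULL; e++) {          /* step 5 */
        op = find_op(e->child, path + len);   /* "{...}" matches any value */
        if (op != NULL) return op;
    }
    return NULL;
}

/* Step 1: pick the top-level entry keyed by "METHOD:first_component". */
const char *dispatch(const char *method, const char *url)
{
    const entry_t *e;
    size_t mlen = strlen(method);
    size_t clen = strcspn(url, "/");

    for (e = rest_map; e->key != NULL; e++) {
        if (strlen(e->key) == mlen + 1 + clen &&
            strncmp(e->key, method, mlen) == 0 && e->key[mlen] == ':' &&
            strncmp(e->key + mlen + 1, url, clen) == 0)
            return find_op(e->child, url + clen);
    }
    return NULL;
}
```

With this tree, dispatch("GET", "students/john/marks/maths") walks four levels and returns the marks operation, while dispatch("GET", "students/john/marks") returns NULL because the "marks" level has no operation of its own.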
Whereas the earlier algorithm can be summarized as:
- Match the request URL against the URL pattern of each operation. This is like calling the function “axis2_core_utils_match_url_component_with_pattern” (mentioned in step 5 of the above algorithm) on the complete URL rather than on a single URL component.
- If the pattern matches, that operation is selected for the request.
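For comparison, the sequential approach could be sketched like this. Again this is an assumption-laden illustration rather than the old Axis2/C code: rest_op_t, url_matches and find_op_sequential are hypothetical names, and a "{...}" component is assumed to match exactly one non-empty URL component. The method and pattern columns are taken from the table at the top of the post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one declared operation. */
typedef struct {
    const char *method;    /* e.g. "GET" */
    const char *pattern;   /* e.g. "students/{name}" */
} rest_op_t;

/* The URL mapping from the table, in declaration order. */
static const rest_op_t ops[] = {
    { "GET", "subjects" },
    { "GET", "subjects/{name}" },
    { "GET", "students" },
    { "GET", "students/{name}" },
    { "GET", "students/{student}/marks/{subject}" },
};

/* Return 1 when the whole url matches the whole pattern. */
static int url_matches(const char *pattern, const char *url)
{
    while (*pattern != '\0' && *url != '\0') {
        size_t plen = strcspn(pattern, "/");   /* pattern component length */
        size_t ulen = strcspn(url, "/");       /* URL component length */

        if (pattern[0] == '{') {
            if (ulen == 0) return 0;           /* a param needs a value */
        } else if (plen != ulen || strncmp(pattern, url, plen) != 0) {
            return 0;                          /* constant mismatch */
        }
        pattern += plen;
        url += ulen;
        if (*pattern == '/' && *url == '/') { pattern++; url++; }
    }
    return *pattern == '\0' && *url == '\0';   /* both fully consumed */
}

/* Try every operation in order and return the first one that matches. */
const rest_op_t *find_op_sequential(const rest_op_t *list, size_t n,
                                    const char *method, const char *url)
{
    size_t i;
    assert(list != NULL && method != NULL && url != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(list[i].method, method) == 0 &&
            url_matches(list[i].pattern, url))
            return &list[i];
    return NULL;
}
```

Every lookup may have to run the full-URL match against all n patterns, which is where the O(n) factor in the analysis comes from.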
I roughly calculated the time complexity of both algorithms.
Here is the time complexity of the latter (the earlier, sequential) algorithm.
Average time complexity of iterating ‘n’ number of operations | n/2 = O(n) |
Time complexity of matching pattern with a URL with the length ‘p’ (complexity of the ‘axis2_core_utils_match_url_component_with_pattern’ function) | O(p^2) |
Complete time complexity of the algorithm | O(n*p^2) |
Here is the time complexity of the former (the new, multi-level) algorithm, which is currently in SVN.
Time Complexity of a Hash Search | O(1) |
Average number of hash searches required. This is the average number of levels in the tree of recursive structures drawn above | log(n)/2 = O(log(n)) |
Time complexity of matching pattern with a URL component with the average length ‘d’, d < p (p = the length of the complete URL) | O(d^2) |
Number of times pattern matching is required = number of param components in the URL = k, k < p/d (p = the length of the URL, d = average length of a URL component) | k = O(k) |
Complete time complexity of the algorithm | O(log(n)*d^2*k) |
Considering that O(log(n)) < O(n), d < p and k < p/d, we can safely conclude
O(log(n)*d^2*k) < O(n*p^2) => The newer algorithm has better (lower) time complexity.
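To get a feel for the size of the gap, here is a worked example with made-up values (these numbers are assumptions for illustration, not measurements): n = 64 operations, a URL of length p = 40, an average component length d = 8 and k = 2 param components, taking the logarithm to base 2 and ignoring all constant factors.

```latex
\begin{aligned}
\text{sequential:}\quad  & n \cdot p^2 = 64 \cdot 40^2 = 102400 \\
\text{multi-level:}\quad & \log_2(n) \cdot d^2 \cdot k = 6 \cdot 8^2 \cdot 2 = 768
\end{aligned}
```

These are operation counts from the rough model only, so they say nothing about the constant overheads.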
However, this comparison of time complexity is only meaningful for large parameter values. For small values the actual time taken by the newer algorithm can be higher, given the constant overhead of the recursion and the hash searches. So in order to judge the performance of the algorithm, we have to run some test cases and measure the actual times. Possibly a task for the weekend 🙂