Data transformation
From Wikipedia, the free encyclopedia
- This article is about data transformation in computer science (metadata). For statistical application, see data transformation (statistics).
In metadata, a data transformation converts data from a source data format into a destination data format.
Data transformation can be divided into two steps:
- data mapping, which maps data elements from the source to the destination and captures any transformation that must occur
- code generation, which creates the actual transformation program
Data element to data element mapping is frequently complicated by complex transformations that require one-to-many and many-to-one transformation rules.
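A minimal sketch of one-to-many and many-to-one rules is given here in Python. The record, its field names, and the rule functions are hypothetical and chosen only for illustration.

```python
# Hypothetical source record; the field names are illustrative only.
source = {"full_name": "Ada Lovelace", "street": "12 High St", "city": "London"}


def split_name(record):
    """One-to-many rule: one source element feeds two destination elements."""
    first, _, last = record["full_name"].partition(" ")
    return {"first_name": first, "last_name": last}


def join_address(record):
    """Many-to-one rule: two source elements feed one destination element."""
    return {"address": f'{record["street"]}, {record["city"]}'}


# The mapping is the ordered list of rules to apply to each source record.
MAPPING_RULES = [split_name, join_address]


def transform(record):
    """Apply every mapping rule and merge the results into one destination record."""
    destination = {}
    for rule in MAPPING_RULES:
        destination.update(rule(record))
    return destination


result = transform(source)
```

Keeping each rule as a separate function makes it visible which source elements feed which destination elements.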
The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also emit the transformation in an easy-to-maintain computer language such as Java or XSLT.
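The code generation step might be sketched as follows, under the assumption that the mapping specification is a table from destination element to an expression over the source record. The specification format and the names `MAPPING_SPEC`, `generate_transform_source` and `compile_transform` are hypothetical, and Python is used as the target language only for brevity.

```python
# Hypothetical mapping specification: destination element -> expression over
# the source record `src`. In this sketch the expressions are plain Python text.
MAPPING_SPEC = {
    "first_name": 'src["full_name"].split(" ", 1)[0]',
    "last_name": 'src["full_name"].split(" ", 1)[1]',
    "address": 'src["street"] + ", " + src["city"]',
}


def generate_transform_source(spec, func_name="transform"):
    """Emit the source text of a transformation function from the specification."""
    lines = [f"def {func_name}(src):", "    return {"]
    for dest, expr in spec.items():
        lines.append(f"        {dest!r}: {expr},")
    lines.append("    }")
    return "\n".join(lines) + "\n"


def compile_transform(source_text, func_name="transform"):
    """Turn generated text into a callable. Only safe for trusted specifications."""
    namespace = {}
    exec(source_text, namespace)
    return namespace[func_name]


generated = generate_transform_source(MAPPING_SPEC)
transform = compile_transform(generated)
record = {"full_name": "Ada Lovelace", "street": "12 High St", "city": "London"}
result = transform(record)
```

The generated text is ordinary source code, so it can be read, reviewed and maintained separately from the generator.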
When the mapping is indirect via a mediating data model, the process is also called data mediation.
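As a rough illustration of mediation, the sketch below maps two hypothetical source formats into one mediating model, and from there into a single destination format. All record layouts and function names are assumptions made for this example.

```python
def crm_to_mediating(record):
    """Map a hypothetical CRM record into the mediating model."""
    return {"name": record["full_name"], "email": record["mail"].lower()}


def billing_to_mediating(record):
    """Map a hypothetical billing record into the mediating model."""
    return {
        "name": f'{record["first"]} {record["last"]}',
        "email": record["email"].lower(),
    }


def mediating_to_destination(person):
    """Map the mediating model into the destination format."""
    return {"displayName": person["name"], "contact": person["email"]}


def mediate(record, to_mediating):
    """Transform indirectly: source -> mediating model -> destination."""
    return mediating_to_destination(to_mediating(record))


from_crm = mediate({"full_name": "Ada Lovelace", "mail": "ADA@example.org"}, crm_to_mediating)
from_billing = mediate(
    {"first": "Ada", "last": "Lovelace", "email": "ada@example.org"},
    billing_to_mediating,
)
```

With this arrangement each format needs only one mapping to or from the mediating model, rather than one mapping for every pair of source and destination formats.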
Transformational Languages
Numerous languages are available for performing data transformation, varying in their accessibility (cost) and general usefulness. Many transformational languages require a grammar to be provided, in many cases structured using something closely resembling Backus–Naur Form (BNF). Examples of such languages include:
- XSLT - the XML transformation language
- TXL - prototyping language-based descriptions using source transformation
Although transformational languages are typically best suited for transformation, something as simple as regular expressions can achieve useful transformations. The TextPad editor, for example, supports regular expressions with arguments, which allows all instances of a particular pattern to be replaced with another pattern using parts of the original pattern. For example:
    foo ("some string", 42, gCommon);
    bar (someObj, anotherObj);
    foo ("another string", 24, gCommon);
    bar (myObj, myOtherObj);
could both be transformed into a more compact form like:
    foobar("some string", 42, someObj, anotherObj);
    foobar("another string", 24, myObj, myOtherObj);
In other words, every invocation of foo with three arguments that is followed by an invocation of bar with two arguments would be replaced with a single function invocation using some or all of the original set of arguments.
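The replacement described above can be sketched with Python's `re` module rather than TextPad. The pattern is an assumption written for this example: it relies on the last argument of foo always being gCommon, and it would need refinement for arguments that themselves contain parentheses.

```python
import re

# Matches a foo(...) call whose last argument is gCommon, followed by a bar(...) call.
# Group 1 captures foo's leading arguments; group 2 captures bar's arguments.
PATTERN = re.compile(r'foo\s*\((.*?),\s*gCommon\);\s*bar\s*\((.*?)\);')

# The replacement reuses the captured parts of the original text.
REPLACEMENT = r'foobar(\1, \2);'


def merge_calls(text):
    """Replace each foo/bar pair with a single foobar invocation."""
    return PATTERN.sub(REPLACEMENT, text)


original = (
    'foo ("some string", 42, gCommon);\n'
    'bar (someObj, anotherObj);\n'
    'foo ("another string", 24, gCommon);\n'
    'bar (myObj, myOtherObj);\n'
)
merged = merge_calls(original)
```

Text that does not match the pattern passes through unchanged, so the substitution can be applied to a whole file.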
Another advantage of regular expressions is that they will not fail the null transform test. To apply the test, take your transformational language of choice, run a sample program through a transformation that makes no changes, and check that the output is identical to the input. Many transformational languages will fail this test.
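The null transform test can be illustrated with a small sketch. Python's `ast` module (with `ast.unparse`, available from Python 3.9) stands in here for a tree-based transformation tool. That substitution is an assumption made for illustration and says nothing about any particular transformational language.

```python
import ast
import re

source = "x = 1   # the answer\n"

# Regular-expression null transform: every match is replaced by itself,
# so the text outside and inside the matches is left untouched.
regex_output = re.sub(r"\w+", lambda match: match.group(0), source)

# Tree-based null transform: parse the program and regenerate it without
# changing the tree. Comments and original spacing are not kept in the tree.
tree_output = ast.unparse(ast.parse(source))

regex_passes = regex_output == source
tree_passes = tree_output == source
```

Here the regular-expression version reproduces the input exactly, while the regenerated program loses the comment, so only the former passes the test.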
Difficult Problems
There are many challenges in data transformation. Probably the most difficult problem to address in C++ is "unstructured preprocessor directives". These are preprocessor directives which do not contain blocks of code with simple grammatical descriptions, for example:
    void MyFunc ()
    {
        if (x>17)
        { printf("test");
    #ifdef FOO
        }
        else
        {
    #endif
            if (gWatch)
                mTest = 42;
        }
    }
A fully general solution to this problem is very hard because such preprocessor directives can essentially edit the underlying language in arbitrary ways. However, because such directives are not, in practice, used in completely arbitrary ways, one can build practical tools for handling preprocessed languages. The DMS Software Reengineering Toolkit is capable of handling structured macros and preprocessor conditionals.
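One rough way to spot such directives is to check whether the braces inside each conditional region are balanced. The sketch below is a heuristic written for this article, not the method used by any particular tool. It assumes that brace balance is an adequate proxy for "structured", ignores braces inside strings and comments, and treats `#else` branches as part of a single region.

```python
EXAMPLE = """void MyFunc ()
{
    if (x>17)
    { printf("test");
#ifdef FOO
    }
    else
    {
#endif
        if (gWatch)
            mTest = 42;
    }
}
"""


def unstructured_conditionals(source):
    """Return 1-based line numbers of #if/#ifdef/#ifndef directives whose
    region closes a brace it did not open or ends at a different nesting depth."""
    depth = 0
    open_regions = []  # stack of [line_number, depth_at_open, lowest_depth_seen]
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#if"):  # also covers #ifdef and #ifndef
            open_regions.append([number, depth, depth])
        elif stripped.startswith("#endif"):
            if open_regions:
                start, opened_at, lowest = open_regions.pop()
                if depth != opened_at or lowest < opened_at:
                    flagged.append(start)
        else:
            for ch in line:
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    # Record the shallowest depth reached inside every open region.
                    for region in open_regions:
                        region[2] = min(region[2], depth)
    return flagged


flagged_lines = unstructured_conditionals(EXAMPLE)
```

For the function shown earlier, the region opened by `#ifdef FOO` closes a brace that was opened outside it, so that directive is reported.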
See also
- data conversion
- data mapping
- data element
- data migration
- metadata
- XSLT
- Model transformation
- ATL
- QVT
- Refinement (contrast)
- FermaT Transformation System
- Identity transform
References
- For further information on data transformation see Chapter 2.4 of [1].