VSTSC's Community Server

Applied Master Data Management

  • A Brief Introduction to MDM

       The new Microsoft MDM site is a good place to start learning about Master Data Management and Microsoft’s efforts in this area.

     

       Master Data is really a pretty common thing for engineers. I learned about it way back in my manufacturing engineering days.

     

       Consider this scenario: Conglomerate C (CC) makes widgets and starts acquiring businesses that also make widgets. CC sells widgets by the pound, but Acquisition A (AA) measures them by counting individual widgets, while Acquisition B (AB) sells them by the case (gross, or 144 ea).

     

       CC now wants all this data in a data warehouse so they can compare apples to apples and know, among other things, how many widgets they’re actually making and selling on a given day.

     

    Note: Instrumentation and measurement are scientific disciplines in their own right. There's a lot more to this, which I hope to cover in this blog.

     

       The Unit of Measure in the existing database, dbCC, is pounds. The Widgets tables from the three companies look like this:

     

    dbCC.dbo.Widgets

    ID   Date       Weight
    1    1/1/2007   2076
    2    1/2/2007   2100
    3    1/3/2007   1977

    dbAA.Product.Widgets

    ProductID                              Date         Count
    F0932E13-218D-458A-BE09-3286AFDE0280   1 Jan 2007   10,265
    F68BF7AC-553E-4A32-B1CB-442DD310194C   2 Jan 2007   13,009
    8C0C7511-1386-4C13-84B8-2351248280E6   3 Jan 2007   17,121

    dbAB.dbo.Widgets

    ID   Date       Cases
    1    20070101   84
    2    20070102   82
    3    20070103   99

     

       MDM is all about standardizing this data. The key to standardization is recognizing the traits of each measurement. For instance, the Cases-to-Count ratio is most likely stable and predictable, so conversion is easily accomplished with multiplication (or division, depending on which direction you standardize). But converting weight to a count (individual or case) depends on other factors. Most notably, do all widgets weigh the same? If not, what’s the acceptable tolerance?

       Dimensional analysis (the multiplication or division you do to convert known quantities) is really a question about measurement granularity (or grain). You will want to store as fine a grain as possible, trust me. Looking at the sample data, you will want to store WidgetCount somewhere. dbAA is already in this format. Yay. dbAB is easy enough: dbAB.dbo.Widgets.Cases * 144 gives you WidgetCount. Again, the math on widget Weight becomes fuzzy.
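       To make the distinction concrete, here is a minimal sketch (mine, not part of the original example) of the two kinds of conversion. The 144 widgets/case figure comes from the scenario above; the widgets-per-pound figure and the 5% tolerance are illustrative assumptions only:

```python
# Grain standardization sketch: exact vs. fuzzy conversions to WidgetCount.

WIDGETS_PER_CASE = 144  # a gross, per the scenario above


def count_from_cases(cases: int) -> int:
    """Exact conversion: the Cases-to-Count ratio is stable and predictable."""
    return cases * WIDGETS_PER_CASE


def count_from_weight(pounds: float, widgets_per_pound: float,
                      tolerance: float = 0.05):
    """Fuzzy conversion: widgets may not all weigh the same, so return
    (estimate, low, high), bounding the estimate by an assumed tolerance."""
    estimate = pounds * widgets_per_pound
    return (round(estimate),
            round(estimate * (1 - tolerance)),
            round(estimate * (1 + tolerance)))


# dbAB day 1: 84 cases converts exactly.
print(count_from_cases(84))        # 12096
# dbCC day 1: 2,076 lb at an assumed 5 widgets/lb, +/- 5%.
print(count_from_weight(2076, 5))
```

The point of the tuple return is that a weight-derived count is an estimate with a range, not a fact — which is exactly where the fuzziness below comes from.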

       This fuzziness will impact the integrity of your data. There are a couple of important measures of data warehouse integrity: data accuracy and signal-to-noise (usually defined by the percentage of “unknowns” in the data).

     

       When I have encountered this scenario in the field, I have informed the customer of the dangers and begged them to collect better metrics at the WidgetWeight station.

     

       There are other issues in these examples: date and ID standardization. Dates are fairly straightforward. The IDs can be a little tricky. To standardize the IDs in this example I would consider a Location and ID compound key on the first pass. I’d create a couple tables in the data warehouse staging database that look like this:

     

    Staging.Products.Widget

    LocationID   ID   Date       Count
    1            1    1/1/2007   10380
    1            2    1/2/2007   10500
    1            3    1/3/2007   9885
    2            1    1/1/2007   10,265
    2            2    1/2/2007   13,009
    2            3    1/3/2007   17,121
    3            1    1/1/2007   12,096
    3            2    1/2/2007   11,808
    3            3    1/3/2007   14,256

    Staging.Products.Location

    LocationID   LocationDescription
    1            dbCC
    2            dbAA
    3            dbAB

     

     

       I’ve assumed (based on customer feedback) I get 5 widgets / pound from dbCC, and I know the math for the rest. I normalized dates and IDs and added a LocationID and Location table to manage my data source / IDs.
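       Outside of T-SQL, the same normalization can be sketched in a few lines of Python. This is my illustration of the transformation, not the load itself: the LocationIDs, the row_number-style surrogate IDs for dbAA's GUID keys, the per-source date formats, and the count conversions are all carried over from the example above (the 5 widgets/pound figure remains a customer estimate):

```python
from datetime import datetime, date

LOCATIONS = {"dbCC": 1, "dbAA": 2, "dbAB": 3}  # the staging Location table


def parse_date(raw) -> date:
    """Each source stores dates differently: dbCC '1/1/2007',
    dbAA '1 Jan 2007', dbAB as an integer 20070101."""
    if isinstance(raw, int):
        return datetime.strptime(str(raw), "%Y%m%d").date()
    for fmt in ("%m/%d/%Y", "%d %b %Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {raw!r}")


def stage(source: str, rows, to_count):
    """Emit (LocationID, ID, Date, Count) tuples; source keys (including
    dbAA's GUIDs) are replaced with a row_number-style surrogate ID."""
    loc = LOCATIONS[source]
    return [(loc, i, parse_date(d), to_count(measure))
            for i, (d, measure) in enumerate(rows, start=1)]


staged = (
    stage("dbCC", [("1/1/2007", 2076), ("1/2/2007", 2100)],
          lambda lb: lb * 5)                              # assumed 5 widgets/lb
    + stage("dbAA", [("1 Jan 2007", 10265)], lambda n: n)  # already a count
    + stage("dbAB", [(20070101, 84)], lambda c: c * 144)   # a gross per case
)
print(staged[0])  # (1, 1, datetime.date(2007, 1, 1), 10380)
```

The compound (LocationID, ID) key falls out naturally: each source keeps its own ID sequence, and the location disambiguates them.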

     

       There are some definite tricks to initial master data loads. To get data into this format in Staging I would execute the following queries:


    -- initial load...

    -- dbCC...

    Insert Into Products.Widget
    (LocationID
    ,ID
    ,Date
    ,[Count])
    Select l.LocationID
    ,s.ID
    ,s.Date
    ,(s.Weight * 5) -- assumed 5 widgets / pound (customer estimate)
    From Products.Location l
    Left Outer Join [dbCC].dbo.Widgets s on s.ID >= l.LocationID or s.ID < l.LocationID
    Where l.LocationDescription = 'dbCC'
    order by s.ID

    -- dbAA...
    declare @tbl table
    (ID int
    ,Date smalldatetime
    ,[Count] int)

    Insert Into @tbl
    (ID
    ,Date
    ,[Count])
    Select
    row_number() over(order by s.ProductID) as 'RowNumber'
    ,s.Date
    ,s.[Count]
    From [dbAA].Product.Widgets s
    order by s.ProductID

    Insert Into Products.Widget
    (LocationID
    ,ID
    ,Date
    ,[Count])
    Select l.LocationID
    ,s.ID
    ,Convert(smalldatetime, s.Date)
    ,s.[Count]
    From Products.Location l
    Left Outer Join @tbl s on s.ID >= l.LocationID or s.ID < l.LocationID
    Where l.LocationDescription = 'dbAA'
    order by s.ID

    -- dbAB...
    Insert Into Products.Widget
    (LocationID
    ,ID
    ,Date
    ,[Count])
    Select l.LocationID
    ,s.ID
    ,Convert(smalldatetime, Left(Convert(varchar,s.Date), 4) + '-' +
    SubString(Convert(varchar,s.Date), 5, 2) + '-' +
    Right(Convert(varchar,s.Date), 2))
    ,(s.Cases * 144)
    From Products.Location l
    Left Outer Join [dbAB].dbo.Widgets s on s.ID >= l.LocationID or s.ID < l.LocationID
    Where l.LocationDescription = 'dbAB'
    order by s.ID

       There are definitely better ways to finesse the data into the initial Staging load than this. The Outer Joins are meaningless; they serve no logical purpose other than to "trick" the source rows onto the same row as the location lookup data. We will examine some better ways to load MDM in future posts.

     

       There’s more to Master Data, but this is the type of business problem folks are trying to solve when they talk about MDM.

     

    :{> Andy

     
