Multi Label Classification on Data Columns in Tables

Discussion in 'Computer Science' started by DPascal, Oct 8, 2018.

  1. DPascal

    DPascal Guest

    I am seeking guidance on a machine learning problem involving the tagging of data columns. Currently, I have a system where users can add multiple tags to a column in a table. However, I want to automate the tagging of new columns using multi-label classification. I have extracted 21 features from each column by analysing the column values. The features include statistical values such as standard deviation, max, min, kurtosis, etc. Am I on the right path in using these features as inputs for a multi-label classification model? Right now I am focusing on numeric values in columns.
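To make the feature extraction concrete, here is a minimal sketch of how a few of these statistics could be computed for one numeric column. The function name `column_features` and the sample values are made up for illustration, and only five of the 21 features are shown.

```python
import statistics


def column_features(values):
    """Return a few summary statistics for one numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation (divides by n, not n - 1).
    std = statistics.pstdev(values)
    if std > 0:
        # Excess kurtosis: fourth standardized moment minus 3.
        fourth_moment = sum((v - mean) ** 4 for v in values) / n
        kurtosis = fourth_moment / std ** 4 - 3.0
    else:
        # Constant column: kurtosis is undefined, 0.0 is only a placeholder.
        kurtosis = 0.0
    return {
        "std": std,
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "kurtosis": kurtosis,
    }


# Hypothetical rainfall readings, not taken from the real data set.
rainfall = [0.0, 2.5, 10.0, 0.0, 7.5]
features = column_features(rainfall)
print(features)
```

Whether to use the population or the sample form of each statistic is a design choice; the important part is to apply the same definition to every column.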

    Example:

    [​IMG]

    As an example, the table above on the left represents an arbitrary table consisting of 3 columns. As a user, I would tag each column with the appropriate tags, so the Rainfall column would have rainfall and precipitation, and the Temperature column would have temperature. The table on the right just shows the tags assigned to each column in table format.

    [​IMG]

    Example data set in the image above

    To automate the tagging with multi-label classification when tables with similar columns are ingested into the system, I need to extract features or properties that describe the already tagged columns and use them as input for the multi-label model. So I did some column analysis and placed a few example features in the table above: standard deviation, maximum, minimum, median and kurtosis. I have about 21 features in total. The output labels for each column are also shown in the image above, where 1 means the label is present and 0 means it is not.
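The 1/0 label layout described above can be sketched as follows. The helper name `build_label_matrix` is hypothetical, and only the Rainfall and Temperature columns with the tags mentioned earlier are used.

```python
def build_label_matrix(column_tags):
    """Turn {column name: set of tags} into label names plus one 0/1 row per column."""
    # Fix a stable label order so every row uses the same positions.
    label_names = sorted({tag for tags in column_tags.values() for tag in tags})
    rows = {
        column: [1 if label in tags else 0 for label in label_names]
        for column, tags in column_tags.items()
    }
    return label_names, rows


# Tags as a user would assign them in the example table.
column_tags = {
    "Rainfall": {"rainfall", "precipitation"},
    "Temperature": {"temperature"},
}
label_names, label_rows = build_label_matrix(column_tags)
print(label_names)
for column, row in label_rows.items():
    print(column, row)
```

Each row is one training target: a column can have several 1s at once, which is what makes this a multi-label rather than a multi-class problem.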

    [​IMG]

    In the end the model will decide which tags are assigned to a newly discovered column based on its features.
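As a rough illustration of that last step, the sketch below assigns tags to a new column by a majority vote among its nearest already tagged columns. This nearest-neighbour vote is only a stand-in chosen for brevity, since no particular model has been picked yet. The feature rows (standard deviation, max, min), the value of `k` and all function names are made up.

```python
import math


def standardize(rows):
    """Scale each feature to zero mean and unit variance; also return the (mean, std) pairs."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def predict_tags(train_x, train_y, label_names, new_x, k=3):
    """Assign each label that more than half of the k nearest tagged columns carry."""
    scaled, stats = standardize(train_x)
    query = [(v - m) / s for v, (m, s) in zip(new_x, stats)]
    order = sorted(range(len(scaled)), key=lambda i: math.dist(scaled[i], query))
    nearest = order[:k]
    tags = []
    for j, label in enumerate(label_names):
        votes = sum(train_y[i][j] for i in nearest)
        if votes * 2 > len(nearest):
            tags.append(label)
    return tags


label_names = ["precipitation", "rainfall", "temperature"]
# Hypothetical features per tagged column: [std, max, min].
train_x = [
    [4.0, 10.0, 0.0], [5.0, 12.0, 0.0], [3.5, 9.0, 0.0],       # rainfall-like
    [6.0, 35.0, 15.0], [5.5, 33.0, 14.0], [6.5, 36.0, 16.0],  # temperature-like
]
train_y = [[1, 1, 0]] * 3 + [[0, 0, 1]] * 3

rain_tags = predict_tags(train_x, train_y, label_names, [4.5, 11.0, 0.0])
temp_tags = predict_tags(train_x, train_y, label_names, [6.0, 34.0, 15.0])
print(rain_tags, temp_tags)
```

The features are standardized first because statistics such as max and kurtosis live on very different scales, and an unscaled distance would be dominated by the largest one.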
