At the outset, you will not be able to anticipate everything you will want to learn from a large body of data. Starting with a well-documented, flexible metadata model for your content and other data assets helps the learning systems understand new information in context and begin identifying unrecognized patterns with less work than starting from completely unstructured data. CIO offers a good basic outline:
There are four distinct metadata categories to look at if you want to ensure that you’re delivering comprehensive, relevant and accurate data to implement AI:
- Technical metadata – includes database tables and column information as well as statistical information about the quality of the data.
- Business metadata – defines the business context of the data as well as the business processes in which it participates.
- Operational metadata – information about software systems and process execution, which, for example, will indicate data freshness.
- Usage metadata – information about user activity including data sets accessed, ratings and comments.
Source: Effective artificial intelligence requires a healthy diet of data
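As a rough illustration of what a flexible metadata model covering these four categories might look like, here is a minimal Python sketch. Every class, field, and value name below (`AssetMetadata`, `null_fraction`, `extra`, the `orders` asset, and so on) is a hypothetical choice made for this example and does not come from the CIO article; a real model would be shaped by your own systems and catalog tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TechnicalMetadata:
    table: str
    columns: dict[str, str]  # column name -> data type
    # One simple data-quality statistic: fraction of null values per column.
    null_fraction: dict[str, float] = field(default_factory=dict)


@dataclass
class BusinessMetadata:
    description: str  # business context of the data
    business_processes: list[str] = field(default_factory=list)


@dataclass
class OperationalMetadata:
    source_system: str
    last_refreshed: datetime  # used below as a data-freshness signal


@dataclass
class UsageMetadata:
    access_count: int = 0
    ratings: list[int] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


@dataclass
class AssetMetadata:
    asset_id: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata
    usage: UsageMetadata = field(default_factory=UsageMetadata)
    # Open-ended slot for attributes nobody anticipated up front.
    extra: dict[str, str] = field(default_factory=dict)

    def age_in_days(self, now: datetime) -> float:
        """Days elapsed since the asset was last refreshed."""
        elapsed = now - self.operational.last_refreshed
        return elapsed.total_seconds() / 86400


# Hypothetical example asset.
orders = AssetMetadata(
    asset_id="orders",
    technical=TechnicalMetadata(
        table="orders",
        columns={"order_id": "int", "placed_at": "timestamp"},
        null_fraction={"placed_at": 0.02},
    ),
    business=BusinessMetadata(
        description="Customer orders placed through the web store",
        business_processes=["order fulfilment", "revenue reporting"],
    ),
    operational=OperationalMetadata(
        source_system="erp",
        last_refreshed=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ),
)

# New attributes can be attached later without changing the schema.
orders.extra["retention_policy"] = "7y"

print(orders.age_in_days(datetime(2024, 1, 3, tzinfo=timezone.utc)))  # 2.0
```

The `extra` dictionary is one way to keep the model flexible: the four documented categories stay stable, while attributes that only become interesting later can be recorded without a schema change.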