While training text classifier models, I have run into three questions that I would appreciate assistance with:
Regarding Google Cloud AutoML Natural Language, the documentation indicates a current limit of 100,000 documents (rows) per dataset for text classification tasks. Are there any plans to increase this limit, and if so, does anyone have insight into when that might happen?
In my BigQuery dataset, I have observed a discrepancy in row counts between the original data (149,000 rows) and the corresponding table in Looker Studio (72,000 rows). Is there a predefined limit on the number of rows that a Looker Studio table can display? If so, are there any plans to increase that limit?
Given the constraints on the number of rows for training data and the inherent complexities of multilingual text analysis, I am deliberating whether to train a separate text classifier model for each language or a single model trained on all languages together. My inclination leans towards distinct models per language, but before proceeding I would greatly appreciate insights on this: would you recommend individual models for each language, or would a single multilingual model suffice?
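For context, the per-language approach would also help each dataset stay under the 100,000-document cap. Here is a minimal sketch of how I would partition the labeled data before training; the `language` field name and the dict-per-row layout are assumptions, not my actual schema:

```python
from collections import defaultdict

def split_by_language(rows, lang_key="language"):
    """Partition labeled examples into per-language training sets.

    `rows` is an iterable of dicts; the `lang_key` field name is an
    assumption -- adjust it to match your own schema.
    """
    buckets = defaultdict(list)
    for row in rows:
        # Group each example under its language tag.
        buckets[row[lang_key]].append(row)
    return dict(buckets)

# Illustrative rows spanning two languages.
rows = [
    {"text": "great product", "label": "positive", "language": "en"},
    {"text": "producto malo", "label": "negative", "language": "es"},
    {"text": "works fine", "label": "neutral", "language": "en"},
]
per_lang = split_by_language(rows)
# Each value of per_lang could then be exported as its own AutoML dataset.
```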
Additionally, if anyone wishes to discuss AutoML further, I will be available for conversation at the 2Gather Sunnyvale: Cloud Optimization Summit on Feb 15.
Your expertise and guidance on these matters would be immensely valuable to me. Thank you in advance for your support.