Dataset Splitting and Joining

Learn how to split datasets by field values and combine multiple datasets for better data organization and analysis.

Meado provides dataset manipulation tools that let you split a large dataset into smaller, more manageable pieces and combine multiple datasets of the same type. These features help you organize your data and prepare it for analysis and optimization tasks.

Accessing Dataset Management Tools

To access the splitting and joining functionality:

  1. Navigate to your project dashboard
  2. Click on the "Datasets" tab in the project navigation
  3. Use the "+ Combine Datasets" or "+ Split Dataset" buttons in the action bar

Figure: Dataset management action buttons, as they appear in the dataset management interface.

Dataset Splitting

Dataset splitting allows you to divide a large dataset into multiple smaller datasets based on the distinct values in a specific field. This is particularly useful for organizing data by categories, regions, or other logical groupings.

When to Use Dataset Splitting

  • Regional Analysis: Split customer or facility data by geographic regions
  • Category Management: Separate orders or shipments by product categories
  • Performance Optimization: Break large datasets into smaller chunks for faster processing
  • Team Collaboration: Assign different dataset segments to different team members

How to Split a Dataset

Follow these steps to split a dataset:

  1. Click the "+ Split Dataset" button in the dataset management interface
  2. Select the base dataset you want to split from the dropdown menu
  3. Choose the field you want to split by (the system will load available fields automatically)
  4. Optionally, provide a name prefix for the new datasets
  5. Click "Split" to create the new datasets

Figure: Split Dataset interface, where you configure how your dataset will be divided.

Split Results

After splitting, you'll receive:

  • Multiple New Datasets: One dataset for each unique value in the split field
  • Automatic Naming: Datasets are named using the prefix (if provided) and the field value
  • Preserved Structure: All original columns and data types are maintained
  • Record Counts: The number of records in each split dataset

Example: If you split a customer dataset by the "Region" field, which has the values "North", "South", "East", and "West", you'll get four new datasets: "Split by Region - North", "Split by Region - South", "Split by Region - East", and "Split by Region - West".
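To make the split behavior concrete, here is a minimal sketch in plain Python. It is not Meado's API or implementation: the function name split_by_field, the record layout (a list of dictionaries), and the default prefix "Split by <field>" are assumptions, with the default prefix inferred only from the example names above.

```python
from collections import defaultdict

def split_by_field(records, field, prefix=None):
    """Group records by the distinct values of `field`.

    Returns a dict mapping each new dataset name to its list of records.
    """
    # Assumed default prefix, inferred from the "Split by Region - North" example.
    if prefix is None:
        prefix = f"Split by {field}"
    groups = defaultdict(list)
    for record in records:
        # One group per distinct field value; columns are left untouched.
        groups[f"{prefix} - {record[field]}"].append(record)
    return dict(groups)

customers = [
    {"Name": "Acme", "Region": "North"},
    {"Name": "Birch", "Region": "South"},
    {"Name": "Cole", "Region": "North"},
]
split = split_by_field(customers, "Region")
for name, rows in split.items():
    print(name, len(rows))
```

Each record lands in exactly one group, so the group sizes add up to the size of the original dataset.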

Dataset Combining (Joining)

Dataset combining allows you to merge multiple datasets of the same type into a single, larger dataset. This is useful for consolidating data from different sources or time periods.

When to Use Dataset Combining

  • Data Consolidation: Merge datasets from different time periods or sources
  • Regional Aggregation: Combine regional datasets into a national dataset
  • Team Collaboration: Merge datasets created by different team members
  • Historical Analysis: Combine monthly or quarterly datasets for yearly analysis
  • Data Recovery: Reconstruct a complete dataset from partial backups

How to Combine Datasets

Follow these steps to combine datasets:

  1. Click the "+ Combine Datasets" button in the dataset management interface
  2. Select a base dataset from the dropdown menu
  3. Choose additional datasets of the same type to combine with the base dataset
  4. Optionally, provide a name for the new combined dataset
  5. Click "Combine" to create the merged dataset

Figure: Combine Datasets interface, where you select multiple datasets to merge. The selector shows example entries "Q2 Orders" and "Q3 Orders" and the note "Only datasets of the same type can be combined."

Combining Requirements

  • Same Dataset Type: All datasets must be of the same type (e.g., all "order" datasets)
  • Compatible Structure: Datasets must have compatible column structures
  • Minimum Two Datasets: You need at least two datasets to combine
  • Duplicate Records: The system handles duplicate records across the selected datasets during the combine

Combine Results

After combining, you'll receive:

  • Single Combined Dataset: One new dataset containing all records from the selected datasets
  • Preserved Data: All original data and columns are maintained
  • Record Count: The total number of records in the combined dataset
  • Automatic Naming: The dataset uses the name you provided, or an auto-generated name if you left it blank
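The combine operation can be pictured as a concatenation guarded by the requirements listed earlier. The sketch below is plain Python under stated assumptions, not Meado's implementation: the dataset layout (a dictionary with "name", "type", and "records" keys), the function name combine_datasets, and the fallback name are all hypothetical. Because duplicate handling is not specified here, the sketch simply keeps every record.

```python
def combine_datasets(datasets, name=None):
    """Concatenate the records of several same-type datasets into a new one."""
    if len(datasets) < 2:
        raise ValueError("select at least two datasets to combine")
    types = {d["type"] for d in datasets}
    if len(types) != 1:
        raise ValueError(f"datasets must share one type, got {sorted(types)}")
    # Hypothetical fallback name; the tool's auto-generated names are not documented here.
    if name is None:
        name = "Combined " + " + ".join(d["name"] for d in datasets)
    # Copy each record so the source datasets stay unchanged.
    records = [dict(r) for d in datasets for r in d["records"]]
    return {"name": name, "type": types.pop(), "records": records}

q2 = {"name": "Q2 Orders", "type": "order",
      "records": [{"OrderId": 1}, {"OrderId": 2}]}
q3 = {"name": "Q3 Orders", "type": "order",
      "records": [{"OrderId": 3}]}
combined = combine_datasets([q2, q3], name="Q2-Q3 Orders")
print(combined["name"], len(combined["records"]))
```

Passing a single dataset, or datasets of different types, raises a ValueError, which mirrors the "Minimum Two Datasets" and "Same Dataset Type" requirements.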

Best Practices

For Dataset Splitting

  • Choose Meaningful Fields: Split by fields that create logical, useful groupings
  • Use Descriptive Prefixes: Provide clear prefixes that indicate the split criteria
  • Consider Data Size: Ensure split datasets are still large enough to be useful
  • Plan for Analysis: Split in ways that support your intended analysis workflows
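If you can export or inspect your records outside the tool, a quick count per field value shows in advance how large each split dataset would be. This is an illustrative sketch in plain Python, not a Meado feature. The function names and the min_records threshold are hypothetical, so pick a threshold that suits your own analysis.

```python
from collections import Counter

def group_sizes(records, field):
    """Count how many records fall under each distinct value of `field`."""
    return Counter(record[field] for record in records)

def undersized_groups(records, field, min_records):
    """Return the field values whose group would hold fewer than `min_records` rows."""
    sizes = group_sizes(records, field)
    return sorted(value for value, count in sizes.items() if count < min_records)

orders = [
    {"OrderId": 1, "Category": "Frozen"},
    {"OrderId": 2, "Category": "Frozen"},
    {"OrderId": 3, "Category": "Dry"},
]
sizes = group_sizes(orders, "Category")
too_small = undersized_groups(orders, "Category", min_records=2)
print(dict(sizes), too_small)
```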

For Dataset Combining

  • Verify Compatibility: Ensure all datasets have the same structure and data types
  • Check for Duplicates: Review your data for potential duplicate records before combining
  • Use Clear Naming: Provide descriptive names that indicate what was combined
  • Back Up Original Data: Keep copies of original datasets before combining
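The compatibility and duplicate checks above can be sketched as a small pre-check run on exported records. This is plain Python under stated assumptions, not Meado's own validation: records are assumed to be dictionaries with string column names and hashable values, and the function names are hypothetical. Two records count as duplicates here only when every column matches exactly.

```python
def columns_of(records):
    """Collect every column name that appears in a list of records."""
    cols = set()
    for record in records:
        cols.update(record)
    return cols

def find_mismatches(base, others):
    """Report columns missing from, or extra in, each dataset relative to the base."""
    base_cols = columns_of(base)
    report = {}
    for name, records in others.items():
        cols = columns_of(records)
        if cols != base_cols:
            report[name] = {"missing": sorted(base_cols - cols),
                            "extra": sorted(cols - base_cols)}
    return report

def find_duplicates(*record_lists):
    """Return records that repeat an earlier record exactly (all columns equal)."""
    seen, duplicates = set(), []
    for records in record_lists:
        for record in records:
            key = tuple(sorted(record.items()))  # requires hashable values
            if key in seen:
                duplicates.append(record)
            else:
                seen.add(key)
    return duplicates

q2 = [{"OrderId": 1, "Total": 10}, {"OrderId": 2, "Total": 20}]
q3 = [{"OrderId": 2, "Total": 20}, {"OrderId": 3, "Total": 30}]
q4 = [{"OrderId": 4, "Amount": 40}]
mismatches = find_mismatches(q2, {"Q3 Orders": q3, "Q4 Orders": q4})
duplicates = find_duplicates(q2, q3)
print(mismatches, duplicates)
```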

Troubleshooting Common Issues

Split Operation Fails

Solution: Ensure the selected field has multiple distinct values, check that the dataset is not empty, and verify you have sufficient permissions to create new datasets.

Combine Operation Fails

Solution: Verify all selected datasets are of the same type, check that datasets have compatible column structures, and ensure you have at least two datasets selected.

Field Not Available for Splitting

Solution: The field may not have enough distinct values, or it may be a system field that cannot be used for splitting. Try selecting a different field with more variety.
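To see which fields have enough variety, you can count the distinct values per field on exported records. The sketch below is plain Python and only an approximation: the function names are hypothetical, the threshold of two distinct values is an assumption based on the "multiple distinct values" guidance above, and the sketch cannot tell whether a field is a system field.

```python
def distinct_counts(records):
    """Map each field name to the number of distinct values it holds."""
    values = {}
    for record in records:
        for field, value in record.items():
            values.setdefault(field, set()).add(value)
    return {field: len(seen) for field, seen in values.items()}

def splittable_fields(records, min_distinct=2):
    """List fields with at least `min_distinct` distinct values (assumed threshold)."""
    counts = distinct_counts(records)
    return [field for field, count in counts.items() if count >= min_distinct]

facilities = [
    {"Name": "Depot A", "Region": "North", "Country": "US"},
    {"Name": "Depot B", "Region": "South", "Country": "US"},
    {"Name": "Depot C", "Region": "North", "Country": "US"},
]
counts = distinct_counts(facilities)
candidates = splittable_fields(facilities)
print(counts, candidates)
```

In this sample, "Country" has a single value, so splitting by it would not divide the data at all.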

Datasets Not Available for Combining

Solution: Only datasets of the same type can be combined. Check that all selected datasets have the same dataset type (e.g., all "order" or all "customer" datasets).

Performance Considerations

  • Large Datasets: Splitting or combining very large datasets may take several minutes
  • System Resources: These operations are performed on the server and may be queued during peak usage
  • Storage Space: Ensure you have sufficient storage space for the new datasets
  • Network Stability: Maintain a stable internet connection during the operation

Success: Once your datasets are split or combined, they will appear in your project's dataset list and be available for all standard dataset operations including analysis, optimization, and export. The original datasets remain unchanged unless you choose to delete them.