We discontinued our cloud-based data annotation platform on October 31, 2024. Contact us for private deployment options.
It is a truth universally acknowledged, that the natural language processing (NLP) landscape is undergoing a seismic shift, largely driven by the emergence of large language models (LLMs). These powerful models have showcased their ability to tackle a wide array of tasks, with Generative AI (Gen AI) built on top of them propelling virtual assistants from the likes of "Eliza" to the realm of "Jarvis."
As the era of large models arrives, data annotation, a crucial cog in the AI machine, is evolving to keep pace. At BasicAI, we've been quick to identify this trend and have proactively incorporated annotation tools and workflows tailored for Generative AI into BasicAI Cloud*'s roadmap. We strive to help our customers achieve AI success in the new wave – as always.
This brings us to the launch of BasicAI Cloud* v1.1. Here is a brief introduction to this update:
Introducing New Annotation Tools for SFT and RLHF Tasks
Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) are two critical stages in the post-training pipeline of large language models. SFT teaches the model from human-written reference answers, while RLHF aligns the model's output with human preferences through a reward signal derived from human scoring or ranking.

BasicAI Cloud* now supports both of these annotation task types. It also enables the annotation of Generative AI data that combines images and text, facilitating the creation of training datasets for large multimodal models.
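To make the two task types concrete, here is a minimal sketch of how SFT and RLHF annotation records are often structured. The field names are illustrative assumptions for this post, not BasicAI Cloud*'s export schema.

```python
import json

# Hypothetical record shapes; field names are illustrative, not BasicAI Cloud*'s schema.

# SFT: the annotator supplies the reference answer the model should learn to imitate.
sft_record = {
    "dialogue": [
        {"role": "User", "content": "Summarize the attached product review in one sentence."},
        {"role": "Bot", "content": "The reviewer praises the battery life but finds the camera disappointing."},
    ],
    "source": "human_written",
}

# RLHF: the annotator scores or ranks candidate model outputs for the same prompt.
rlhf_record = {
    "prompt": "Summarize the attached product review in one sentence.",
    "candidates": [
        {"id": "a", "content": "Great battery, weak camera, says the reviewer.", "score": 4},
        {"id": "b", "content": "The product is reviewed.", "score": 1},
    ],
    "ranking": ["a", "b"],  # best to worst, as judged by the annotator
}

print(json.dumps(sft_record, indent=2))
print(json.dumps(rlhf_record, indent=2))
```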
New Ontology and Tools for Generative AI Data
Accordingly, we've introduced the Generative AI Ontology type, which comes with two dedicated dialogue annotation tools, Dialog Response and Dialog Evaluation, alongside a Classification tool for dialogue-level labels.

Dialog Response Tool: Enables annotators to continue a dialogue with suitable responses from different roles, based on the given context. By providing high-quality, human-written responses, annotators build a rich SFT dataset that serves as a reference for the model during fine-tuning.
Dialog Evaluation Tool: Designed for RLHF tasks. It lets annotators score or rank dialogue content generated by models, providing valuable feedback on the quality and appropriateness of the generated responses. This labeled data is then used to fine-tune the model, reinforcing desirable behaviors and penalizing suboptimal ones, thereby aligning the model's output with human preferences (see the sketch after this list for how such rankings are commonly used).
Classification Tool: Used to assign labels or categories to the entire dialogue, providing valuable metadata for the model, like language, domain, or formality level. This contextual information enables the model to generate more appropriate and targeted responses, enhancing its overall performance and usability.
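As a rough illustration of where ranked Dialog Evaluation output goes next, the sketch below turns an annotator's best-to-worst ranking into pairwise preference examples, the common input format for reward-model training in RLHF. This is a generic preprocessing step using the same illustrative field names as above, not a description of BasicAI Cloud*'s internal pipeline.

```python
from itertools import combinations

def ranking_to_pairs(record):
    """Turn a best-to-worst ranking of candidate responses into
    (chosen, rejected) pairs for reward-model training."""
    by_id = {c["id"]: c["content"] for c in record["candidates"]}
    pairs = []
    for better, worse in combinations(record["ranking"], 2):
        pairs.append({
            "prompt": record["prompt"],
            "chosen": by_id[better],
            "rejected": by_id[worse],
        })
    return pairs

example = {
    "prompt": "Summarize the attached product review in one sentence.",
    "candidates": [
        {"id": "a", "content": "Great battery, weak camera, says the reviewer."},
        {"id": "b", "content": "The product is reviewed."},
        {"id": "c", "content": "Battery life is praised; the camera disappoints."},
    ],
    "ranking": ["c", "a", "b"],  # annotator's order, best first
}

for pair in ranking_to_pairs(example):
    print(pair)
```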

Collaboration System for GenAI Annotation
At the same time, the collaboration system has been updated to support SFT and RLHF tasks, offering features like custom workflows, task distribution, member management, and automatic QA – similar to other task types. The performance statistics logic remains largely unchanged. Users can batch-modify the Ontologies, workflow, basic settings, and QA rules for Generative AI tasks.
Annotation Interface
The image below shows the annotation interface for Generative AI data. Users can upload and process .json, .csv, .xlsx, and .xls files, as well as .zip, .gzip, .tar, and .rar archives containing valid files.

The platform automatically parses the data and renders it as dialogue bubbles, differentiating between User and Bot roles. Similar to text annotation, the canvas supports content search, font size scaling, and toggling between edit / read-only modes. Ontology attributes can be expanded or collapsed. The "Class Inheritance" feature allows quick application of the same Class labels to different dialogue content.
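For reference, a minimal loader for the kind of dialogue file described above might look like the sketch below. The exact schema BasicAI Cloud* expects is not spelled out here, so the field and file names are assumptions.

```python
import json

def load_dialogue_bubbles(path):
    """Read a dialogue file and group turns into (role, text) bubbles.
    Assumes a simple [{"role": "User"|"Bot", "content": "..."}] layout,
    an illustrative format rather than BasicAI Cloud*'s documented schema."""
    with open(path, encoding="utf-8") as f:
        turns = json.load(f)
    bubbles = []
    for turn in turns:
        role = turn.get("role", "User")
        text = turn.get("content", "").strip()
        if text:
            bubbles.append((role, text))
    return bubbles

# Example usage with a hypothetical file name:
# for role, text in load_dialogue_bubbles("dialogue_001.json"):
#     print(f"[{role}] {text}")
```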
We've also introduced a nifty Pin feature that lets users pin up to 4 selected bubbles on the interface.
Start Your LLM Training Journey with BasicAI Cloud*
From computer vision data like images, videos, and point clouds, to NLP data spanning text and speech, and now LLM data, BasicAI Cloud* remains committed to being the go-to annotation platform for algorithm experts and businesses across diverse domains. We firmly believe that data is the bedrock of any model's success, and we aim to accelerate that success by simplifying the building of high-quality training datasets.
Click the button below to create your first GenAI dataset now.
History View & Restore for Annotation Tools
History tracking is a vital feature in tool-based software (e.g., document management, data management) to monitor data changes. We've incorporated this into our platform, enabling users to view and restore point cloud and image annotation tool results. In case of annotation data loss due to human error or system issues, the nearest version can be swiftly restored based on the history.

The annotation modules automatically save every 5 minutes when not in a Paused state; manual operations that alter the data state also trigger a history record. The entry point for history records is at the top of the annotation interface. From there, users can view operation details, preview a record, or restore the annotations to a specific version.
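The restore behavior can be pictured with a small sketch: given the saved history records, pick the version closest to the moment the data was lost. This is only an illustration of the idea, not the platform's implementation.

```python
from datetime import datetime

# Illustrative history records; in practice these come from the 5-minute
# autosave and from manual operations that change the data state.
history = [
    {"version": 1, "saved_at": datetime(2024, 9, 1, 10, 0)},
    {"version": 2, "saved_at": datetime(2024, 9, 1, 10, 5)},
    {"version": 3, "saved_at": datetime(2024, 9, 1, 10, 10)},
]

def nearest_version(records, lost_at):
    """Return the history record whose save time is closest to the incident."""
    return min(records, key=lambda r: abs((r["saved_at"] - lost_at).total_seconds()))

print(nearest_version(history, datetime(2024, 9, 1, 10, 7)))  # -> version 2
```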
If you are a project manager, you can freely configure team members' permissions to view and restore task history records.
Enhanced Upload/Export Progress Tracking with Stage-wise Breakdown
During user research, we learned that the platform's display of total progress during upload and export could sometimes be perplexing, especially when data remained stuck at 40% or 70% for extended periods. To address this, we've introduced a more detailed breakdown, splitting the progress bar into distinct stages to provide users with clearer insights into the current upload/export progress.

For data uploads, the stages may include: Uploading, Pulling, Unzipping, Data Format Conversion, and Parsing. Hovering over the progress bar reveals the time spent on each stage. For batch uploads, the queue status is also displayed, indicating the position of specific data in the queue or whether it's being processed. In case of upload failures, a specific reason is provided, enabling users to make necessary adjustments and re-upload the data. We've also expanded the size limit of uploaded files to 100GB (free for a limited time).
Data exports involve stages such as Standard Format Processing, Script Conversion, Zipping, and Transferring. The export process also showcases the queue, progress, and processing status.
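One way to think about the stage-wise breakdown is as a weighted combination of per-stage progress. The sketch below shows the idea with made-up stage weights; the platform's actual weighting and stage-timing logic are not documented here.

```python
# Hypothetical stage weights for an upload job (fractions of the total bar).
STAGE_WEIGHTS = {
    "Uploading": 0.40,
    "Pulling": 0.10,
    "Unzipping": 0.15,
    "Data Format Conversion": 0.20,
    "Parsing": 0.15,
}

def overall_progress(stage_progress):
    """Combine per-stage completion (0.0 to 1.0) into one overall percentage."""
    total = sum(STAGE_WEIGHTS[s] * stage_progress.get(s, 0.0) for s in STAGE_WEIGHTS)
    return round(100 * total, 1)

# Upload finished, unzipping halfway, later stages not started:
print(overall_progress({"Uploading": 1.0, "Pulling": 1.0, "Unzipping": 0.5}))  # 57.5
```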
Explore BasicAI Cloud* v1.1 Now
These are the three key updates in BasicAI Cloud* v1.1. Additionally, we've made a host of UX optimizations and squashed some bugs. Now, with BasicAI Cloud* v1.1, you can confidently ride the AI wave and make strides toward a future where your bold innovations are accessible to all.
* To further enhance data security, we discontinued the Cloud version of our data annotation platform on October 31, 2024. Please contact us for a customized private deployment plan that meets your data annotation goals while prioritizing data security.
