
Is Data Annotation Obsolete with Meta's Segment Anything?

Meta released Segment Anything on April 5th, ushering in the era of large models for images. Do we still need manual labeling?



In the last decade, machine learning has ignited a new wave of artificial intelligence, culminating in the launch of ChatGPT late last year, which fully showcased the superpowers of large models. Some exclaim that "after ChatGPT, NLP will no longer exist." In the past, natural language experts specialized in their own niches: text classification, information extraction, question answering, or reading comprehension. With the advent of large models, the prompt paradigm from NLP has begun to expand into computer vision (CV), allowing large models to achieve good zero-shot and few-shot results on new datasets simply through prompting. A few weeks ago, everyone was looking forward to the arrival of an "ImageGPT" or a "multimodal GPT."


On April 5th, Meta released Segment Anything [1], announcing the arrival of the large-model era for images. Have the manually annotated "ground truth" labels used in academia and commercial applications become a relic of the past?


A Glimpse at Semantic Segmentation


Semantic segmentation is an advanced image processing technique that classifies each pixel to create semantically meaningful regions. It goes beyond merely outlining objects, providing precise labeling of their specific locations and shapes. Semantic segmentation has widespread applications in CV, medical image processing, and digital art.
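To make this concrete, here is a minimal sketch of per-pixel classification with a pretrained DeepLabV3 model from torchvision. This is our own illustration under stated assumptions (PyTorch with torchvision 0.13+, and a placeholder image file street.jpg), not part of Segment Anything itself:

```python
# Minimal semantic segmentation sketch (assumes torch + torchvision >= 0.13).
# "street.jpg" is a placeholder path for any RGB image.
import torch
from PIL import Image
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT          # pretrained weights
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()                     # matching resize + normalization

image = Image.open("street.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)                # (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)["out"]                      # (1, num_classes, H, W)

# Each pixel is assigned the class with the highest score.
pred = logits.argmax(dim=1).squeeze(0)                # (H, W) map of class indices
print(pred.shape, pred.unique())
```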



Unveiling the Segment Anything Project


Meta's Segment Anything project introduces a new image segmentation task, model, and dataset. The dataset is the most extensive segmentation collection to date, with over 1 billion masks on 11 million licensed, privacy-respecting images. The "promptable" model enables zero-shot transfer to new image tasks, with impressive results that sometimes surpass prior fully supervised ones. Both the Segment Anything Model (SAM) and the corresponding dataset (SA-1B) have been released.
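For a feel of the released model, here is a minimal sketch that segments everything in an image with the automatic mask generator from Meta's open-source segment-anything package. The checkpoint file name follows the repository's README [1]; the image path is a placeholder:

```python
# Sketch: "segment everything" with SAM's automatic mask generator.
# Assumes: pip install git+https://github.com/facebookresearch/segment-anything.git
# and the ViT-H checkpoint downloaded per the repository README.
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # HxWx3 uint8 RGB
masks = mask_generator.generate(image)

# Each entry is a dict with a binary "segmentation" array plus quality metadata.
print(len(masks), sorted(masks[0].keys()))
```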



Creating accurate segmentation models typically "requires highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data." By creating SAM, Meta hopes to democratize this process, reducing the need for specialized training and expertise and promoting further development in computer vision research.


SAM: A Game Changer?


SAM claims to have learned a general notion of what objects are, generating masks for any object in any image or video, even objects not encountered during training. This versatility lends itself to a wide range of use cases, enabling zero-shot transfer to new image domains.
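The "promptable" part is what sets SAM apart: you can pass a click, a box, or a rough mask as a prompt. A minimal point-prompt sketch (the click coordinates are arbitrary placeholders) looks like this:

```python
# Sketch: prompting SAM with a single foreground click.
# Same assumptions as above: the segment-anything package and the ViT-H checkpoint.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)                      # computes the image embedding once

masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),        # (x, y) pixel of a foreground click
    point_labels=np.array([1]),                 # 1 = foreground, 0 = background
    multimask_output=True,                      # return several candidate masks
)
best = masks[scores.argmax()]                   # keep the highest-scoring mask
print(best.shape, scores)
```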


SAM vs. Manual Annotation


Challenges & Opportunities


Similar technologies are already on the market


We've already seen many decent segmentation tools, such as Photoshop's subject selection and iOS's built-in image cutout feature. They generate solid results and, with a good interactive experience, noticeably improve image-editing efficiency.


In iOS 16 and later, you can isolate the subject of a photo from the rest of the photo and then copy or share it [2]


“Ground truth” in open-source datasets is full of errors


In commercial projects, computer vision engineers often ask annotators to re-annotate open-source datasets, incurring substantial costs. This is primarily because the "ground truth" in those datasets is riddled with annotation errors.



For niche research scenarios, existing annotations often fail to meet data quality requirements. For example, a researcher working on traffic light detection discovered glaring annotation errors in COCO [3], and a broader audit surfaced nearly 300,000 errors across the dataset [4]. Similar issues persist in other renowned datasets, such as CIFAR-100 and ImageNet. Data annotation is challenging and error-prone, owing to ambiguous requirement documents and inconsistent human judgment.


Incorrect annotations for traffic lights in the COCO dataset [3]
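You can audit such labels programmatically yourself. Below is a minimal sketch with pycocotools that pulls every "traffic light" annotation from COCO val2017 and flags implausibly small boxes; the annotation path and the 16-pixel area threshold are placeholder assumptions:

```python
# Sketch: flagging suspicious "traffic light" boxes in COCO annotations.
# Assumes pycocotools is installed and instances_val2017.json has been downloaded.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")
cat_ids = coco.getCatIds(catNms=["traffic light"])
anns = coco.loadAnns(coco.getAnnIds(catIds=cat_ids))

# COCO boxes are [x, y, width, height]; a box of a few pixels merits a manual look.
suspicious = [a for a in anns if a["bbox"][2] * a["bbox"][3] < 16]
print(f"{len(suspicious)} of {len(anns)} traffic-light boxes are under 16 px^2")
```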


Not accurate enough in professional fields


When SAM's performance closely matches or even surpasses human annotation on some segmentation tasks, that may say more about the poor quality of open-source datasets than about the model itself. Just as no one would paste ChatGPT's responses into an article verbatim, since we must accept the occasional "nonsense" that distorts facts, a large general-purpose model cannot meet project demands without sufficient professional training data. In rigorous disciplines such as medical diagnosis, autonomous driving, and security, errors are unacceptable.


SAM performs worse on medical data annotation; professionals with medical backgrounds remain necessary for this work


Other Challenges


Other challenges mirror those faced by ChatGPT, such as computing power and data security. In production environments, many teams deploy small online models precisely because the operating costs of large models are excessive.


Opportunities we found


Indeed, as artificial intelligence continues to advance, traditional NLP and CV techniques may gradually become obsolete. Future research should focus on deeper, more abstract frameworks for thinking and exploration.


  • Embrace cutting-edge technology: Revolutionary technologies shouldn't bring despair; we should strive to understand and employ them. New paradigms improve production efficiency and lay a solid foundation for future innovation. Professionals in niche areas must keep digging deep.

  • Open-source software: AI's rapid development benefits from the open-source ethos, enabling everyone to stand on the shoulders of giants. This is the original intention behind our open-source Xtreme1 project.

  • Open-source data: AI requires excellent data to work correctly. We're currently building the world's first multimodal open-source dataset, covering the latest sensor devices with accurate human annotations. Stay tuned!


Lastly, we'd like to share some open-source image segmentation datasets for tech enthusiasts like you:


ADE20K (82.6k, 3.9GB):

https://opendatalab.com/ADE20K_2016



The ADE20K dataset offers benchmark scene-parsing data with object and part segmentations. Each folder houses images sorted by scene category, and the object and part segmentation masks are stored in separate PNG files. Every instance is annotated individually for precision.
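As a quick way to peek at those PNG files, the sketch below decodes the per-pixel object class from a *_seg.png file. The decoding formula follows the convention documented with the full ADE20K release; verify it against the dataset's own loader before relying on it, and note the file name is a placeholder:

```python
# Sketch: decoding object classes from an ADE20K *_seg.png file.
# Assumption: the full-release convention class = (R // 10) * 256 + G;
# check the dataset's bundled loader to confirm.
import numpy as np
from PIL import Image

seg = np.array(Image.open("ADE_train_00000001_seg.png"))      # HxWx3 uint8
obj_class = (seg[:, :, 0].astype(np.int32) // 10) * 256 + seg[:, :, 1].astype(np.int32)
print("classes present:", np.unique(obj_class))
```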


Medical Segmentation Decathlon (4.4k, 354.9GB):

https://opendatalab.com/Medical_Segmentation_Decathlon



MSD is an extensive collection of medical image segmentation datasets, comprising 2,633 three-dimensional images collected across multiple anatomical areas of interest, modalities, and sources. Specifically, it includes data for brain, heart, liver, hippocampus, prostate, lung, pancreas, hepatic vessels, spleen, and colon.
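MSD volumes ship as compressed NIfTI files, so a few lines of nibabel are enough to inspect an image/label pair. The paths below follow the dataset's TaskXX folder layout but are placeholders:

```python
# Sketch: inspecting one Medical Segmentation Decathlon volume with nibabel.
# Assumes nibabel is installed; file paths are placeholders.
import nibabel as nib
import numpy as np

vol = nib.load("Task03_Liver/imagesTr/liver_0.nii.gz").get_fdata()   # 3D intensity volume
seg = nib.load("Task03_Liver/labelsTr/liver_0.nii.gz").get_fdata()   # per-voxel labels

print("volume shape:", vol.shape)          # e.g. (512, 512, num_slices)
print("label values:", np.unique(seg))     # label meanings are listed in each task's dataset.json
```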


Panoptic Agricultural Satellite Time Series (14.6k, 36.8GB):

https://opendatalab.com/PASTIS



PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural plots in satellite time series. It features 2,433 patches within the French metropolitan territory, each with panoptic annotations (an instance index and a semantic label per pixel). Each patch consists of a variable-length Sentinel-2 multispectral image time series.


3D Lane Synthetic Dataset (30k, 17.8GB):

https://opendatalab.com/3D_Lane_Synthetic_Dataset



This synthetic dataset is designed to promote the development and evaluation of 3D lane detection methods. It expands upon the Apollo Synthetic Dataset. For detailed construction strategies and evaluation methods, refer to the paper "Gen-LaneNet: a generalized and scalable approach for 3D lane detection," Y. Guo et al., ECCV 2020.


We hope that these resources will be helpful for researchers and practitioners working on image segmentation and other computer vision tasks. As AI technology continues to advance, it is essential to collaborate and share knowledge to promote further innovation and breakthroughs in the field.


References:


[1] Segment Anything. https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation

[2] Create and share photo cutouts on your iPhone. https://support.apple.com/en-us/HT213459

[3] The Mislabelled Objects in COCO. https://www.neuralception.com/mislabelled-traffic

[4] How I found nearly 300,000 errors in MS COCO. https://medium.com/@jamie_34747/how-i-found-nearly-300-000-errors-in-ms-coco-79d382edf22b

[5] Many thanks to OpenDataLab for providing dataset support. For more datasets, please visit: https://opendatalab.com


Cover image from Segment Anything | Meta. https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation
