The DIY forge - AI content Tutorials and Coding

What can you ask to MiniCPM-V 4.5 on pictures?

Details
Written by: Super User
Category: tests
Published: September 16, 2025
Hits: 67
  • img2txt
  • vlm
  • minicpm


Artistic vision of the test of minicpm45


In my latest tutorial, I provided a corrected installation method for MiniCPM-V 4.5 along with some test scripts, so it's time to use them and see what we can ask MiniCPM-V 4.5.

Today I will use the script to test the pictures.

Download the script above and put it in your "MiniCPM-V" directory, then activate your venv and run it from a command prompt opened in the "MiniCPM-V" directory. (If you are used to Automatic1111 or ComfyUI, you know what I mean, so I won't explain further.) I would also suggest that you first read the "MiniCPM Model License.md" file (open it with Notepad) in the same directory.
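If you want to double-check your setup before launching anything, here is a minimal Python sketch of a pre-flight check. It is not part of the test scripts: the function names `in_virtualenv` and `preflight` are my own, hypothetical helpers, and the only things taken from the steps above are the "MiniCPM-V" directory name and the "MiniCPM Model License.md" file name. It assumes you run it from the parent folder of "MiniCPM-V".

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def preflight(repo_dir: str = "MiniCPM-V") -> list[str]:
    """Collect readable problems found before launching the test script."""
    problems = []
    repo = Path(repo_dir)
    if not repo.is_dir():
        problems.append(f"directory not found: {repo}")
    elif not (repo / "MiniCPM Model License.md").is_file():
        problems.append("license file not found; read it before using the model")
    if not in_virtualenv():
        problems.append("no active venv detected; activate it first")
    return problems


if __name__ == "__main__":
    # Print every problem, or a single "ready" line when the list is empty.
    for line in preflight() or ["ready to run the test script"]:
        print(line)
```

An empty list means the directory and the license file were found and a venv is active; it says nothing about whether the model itself is installed correctly.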

When you start the script, the main commands are:

Read more: What can you ask to MiniCPM-V 4.5 on pictures?

Installation of MiniCPM-V 4.5 and test scripts

Details
Written by: Super User
Category: tutorial
Published: September 14, 2025
Hits: 56
  • img2txt
  • vlm
  • txt2txt
  • minicpm

Artistic view of the VLM minicpm


MiniCPM is a family of compact, efficient, open-source language models developed by OpenBMB.
Its multimodal variants, like MiniCPM-V, can understand both text and images, enabling strong performance on vision-language tasks while being small enough to run on consumer-level hardware.

(Thank you AI, for the definition.)

In summary, MiniCPM-V is a VLM: if you feed it pictures, it can comment on them.
You can feed it many things: one picture, several pictures, videos, PDFs (with Ollama-OCR)...

For its size it is surprisingly smart.
Honestly, wow...

It could even fit on a mobile phone, and there is a test APK project that you can try.

Read more: Installation of MiniCPM-V 4.5 and test scripts

How to install fastvlm from Apple on a windows computer?

Details
Written by: Super User
Category: tutorial
Published: September 09, 2025
Hits: 96
  • Tutorial
  • vlm

Can you install an Apple VLM model on a Windows system?

Humor: the VLM found out

Besides a couple of other promising products such as the AirPods Pro 3 (reportedly with live translation), Apple also released, a little earlier, a series of VLM models on Hugging Face: the FastVLM series.


Problem: its license is extremely restrictive

Second problem: which version of FastVLM are we talking about?

Read more: How to install fastvlm from Apple on a windows computer?

How to install SageAttention2 on a Windows system with ComfyUI?

Details
Written by: Super User
Category: tutorial
Published: September 08, 2025
Hits: 131

Artistic vision of Sage Attention with a wheel and a Sage Attention


As I explained in this recent tutorial on how I optimized a ComfyUI workflow for Wan 2.2, SageAttention is a tool/component that allows you to generate videos faster.

To make things easier, I'll assume that, SINCE YOU HAVE COMFYUI, you already have a Python virtual environment (venv) with everything installed, because this makes the explanation much shorter.

If you are here, I suppose that you are already familiar with this kind of command:

Read more: How to install SageAttention2 on a Windows system with ComfyUI?

Comfyui: Is your workflow optimized for your computer?

Details
Written by: Super User
Category: video generation
Published: September 04, 2025
Hits: 176
  • video generation

Illustrative picture of an AI scientist generated by Gemini


This article, based on my own experiments with Wan 2.2, explains why you should optimize your settings and your nodes in ComfyUI. That means not relying solely on third-party tools...
In the end, I created my own optimized workflow, linked here, for "first frame, last frame" video generation with Wan 2.2.

If you are interested in the workflow, this link will take you to the related Civitai page. Note that I still plan to tune it a bit.
Click on "read more" if you want to know how I did it...

Read more: Comfyui: Is your workflow optimized for your computer?

Official release of Sora

Details
Written by: Super User
Category: Text-to-Video
Published: December 10, 2024
Hits: 1938
  • txt2vid
  • video
  • sora

Click on the picture below, or here, to find out how to use Sora.

Screenshot of the Sora website

How to generate super-resolution versions of images ? - Part 2: Basic txt2img upscaling concepts and upscalers in Automatic1111 (img2img and Extras tab)

Details
Written by: Super User
Category: image to image
Published: November 03, 2024
Hits: 3698
  • txt2img
  • img2img
  • Extras
  • Upscale

Karen from North America

This is the second part of my tutorials on super-resolution.

I want to be able to generate pictures above their normal resolution.
It can be like the famous "zoom and enhance" trick, adding details to pictures.
But it can also be about making generated images larger with better quality, or removing the blur that an upscaled picture would otherwise have. This is what we are going to see here.
This article is more a series of experiments that I ran with Automatic1111 to see how some upscale settings change generated pictures; it's not really a "how to".

Read more: How to generate super-resolution versions of images ? - Part 2: Basic txt2img upscaling concepts...

How to generate super-resolution versions of images ? - Part 1: restore pictures with img2img

Details
Written by: Super User
Category: image to image
Published: October 17, 2024
Hits: 7687
  • super-resolution
  • img2img
  • picture enhancement

Part 1: How to restore blurry/pixelated pictures with basic img2img in Automatic1111?

 


I want to be able to do the famous Hollywood "zoom and enhance" trick at home.
I used various tools to get a result, and that result varies a lot depending on the technique used.

In this article I will cover how to use the img2img tab of Automatic1111, an interface for "stable diffusion" models, to generate enhanced pictures.

Read more: How to generate super-resolution versions of images ? - Part 1: restore pictures with img2img

How to use Stable Diffusion 3 models with Automatic1111 ?

Details
Written by: Super User
Category: Text-to-Image
Published: July 04, 2024
Hits: 11225
  • stable diffusion
  • automatic1111
  • how to
  • stable diffusion 3

Update: I see that this tutorial is now obsolete; there is no longer an sd3 branch of Automatic1111. If you experience any issues, download Forge instead.

 

Let's keep it short: you can't with the normal version, but you can with a special branch of the software.

Image generated with Stable Diffusion 3 with automatic1111

 

Do you want to know more?

Read more: How to use Stable Diffusion 3 models with Automatic1111 ?

Microsoft released some courses on Generative AI

Details
Written by: Super User
Category: Uncategorised
Published: July 04, 2024
Hits: 977
  • lessons

There are, for the moment, 18 lessons available on GitHub.
The GitHub repository features videos and source code.
Have a look at the screenshot or click on this link to go to Microsoft's GitHub.

 
