TOPIC: ComicVine scraper: List of Covers text

ComicVine scraper: List of Covers text 3 years 2 months ago #5753

sandman4sure

Offline

Fresh Boarder

Posts: 19

Karma: 1

When scraping information from ComicVine, the synopsis often includes extra information that should be removed, for instance:
List of covers and their creators:CoverNameCreator(s)Sidebar LocationRegRegular CoverFernando Heinz1TortureTorture CoverRaulo Caceres8ADArt Deco CoverMichael DiPascale9FFFemme Fatale CoverJuan Jose Ryp10HTHomo Tortor CoverFernando Heinz11RitualRitual CoverRaulo Caceres12WrapWraparound CoverRafa Ortiz13MMMegafauna Mayhem Wraparound CoverFernando Heinz14TBTerror Birds Wraparound CoverFernando Heinz15RIRed Crossed Order Incentive CoverFernando Heinz16DlxSaber Tooth Deluxe Collectors Box Set CoverFernando Heinz17RESchool Days Cover, Limited to 350?4RECalgary Art Deco Cover, Limited to 350Michael DiPascale2REChicago Exclusive Art Deco Cover, Limited to 350Michael DiPascale3REDenver Comic Con Exclusive Art Deco Cover, Limited to 350Michael DiPascale7REPhoenix Comic Con Exclusive Art Deco Cover, Limited to 350Michael DiPascale5REVIP Pure Art Cover, Limited to 250Juan Jose Ryp6

I think everything starting with "List of covers..." should be removed from the comic synopsis.
This is probably an error in the ComicVine API, but a string replace on the text shouldn't be too hard.
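
A minimal sketch of the string-replace idea, in Python for brevity. This is not YACReader's code: the function name `strip_cover_list` and the `MARKER` constant are made up for illustration, and the marker text is simply taken from the example above. It assumes the unwanted block always starts with that phrase and runs to the end of the synopsis.

```python
import re

# Assumed marker, copied from the example synopsis above.
MARKER = "List of covers"


def strip_cover_list(synopsis: str, marker: str = MARKER) -> str:
    """Return the synopsis with everything from the marker onward removed."""
    match = re.search(re.escape(marker), synopsis, flags=re.IGNORECASE)
    if match is None:
        # Marker not present: leave the synopsis untouched.
        return synopsis
    return synopsis[:match.start()].rstrip()
```

A synopsis that does not contain the marker passes through unchanged, so the function is safe to apply to every scraped issue.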
The administrator has disabled public write access.

ComicVine scraper: List of Covers text 3 years 2 months ago #5754

Luis Ángel

Offline

Administrator

Posts: 2612

Thank you received: 543

Karma: 21

The problem is that YACReader shouldn't try to be smart about how to interpret the data coming from the API. If the JSON field says synopsis, then all of it is handled as the synopsis.

So maybe it is better to report this to ComicVine?
Contribute to the project by becoming a patron: www.patreon.com/yacreader
You can also donate via PayPal: www.paypal.com/donate?business=5TAMNQCDD...e=Support+YACReader

ComicVine scraper: List of Covers text 3 years 2 months ago #5755

sandman4sure

Offline

Fresh Boarder

Posts: 19

Karma: 1

Hmm, I bought the iOS version and the synopses have a lot of weird text in them. So maybe you can take it up with ComicVine, since it is your program?

Don't get me wrong, I like the program and love that it syncs with a backend (that is the reason I use YACReader over other alternatives). When it was only the open-source backend, I was okay with helping out with these problems, but now that you have added a paid iOS counterpart, we can expect to report bugs and not have to solve them ourselves, right?

ComicVine scraper: List of Covers text 3 years 2 months ago #5756

Luis Ángel

Offline

Administrator

Posts: 2612

Thank you received: 543

Karma: 21

Could you send me a link to a comic having these problems in Comic Vine?

You are a YACReader user but also a Comic Vine user (you need a personal API key to use their service). If the problem is on YACReader's side, then there will be a fix; otherwise there is nothing I can do about it other than hope for a fix on their side.

ComicVine scraper: List of Covers text 3 years 2 months ago #5757

sandman4sure

Offline

Fresh Boarder

Posts: 19

Karma: 1

comicvine.gamespot.com/comic/4000-721070/

It's about that table.
Not sure how the API gives back the synopsis, but in YACReader it is shown as plain text, and that doesn't work with the table.

ComicVine scraper: List of Covers text 3 years 2 months ago #5758

sandman4sure

Offline

Fresh Boarder

Posts: 19

Karma: 1

comicvine.gamespot.com/api/issue/4000-721070/?api_key=<API_KEY>&format=json

They are sending back the whole table. I think YACReader strips the HTML tags.

ComicVine scraper: List of Covers text 3 years 2 months ago #5759

sandman4sure

Offline

Fresh Boarder

Posts: 19

Karma: 1

Sorry for the multiple posts.
Because YACReader strips all the HTML tags, words also get concatenated:
SERIES PREMIERE“TOTAL ECLIPSE OF THE HEART,” Part OneNo matter how...

See, there is no space between "One" and "No".

You can probably use \n here, right?
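
A rough sketch of the `\n` idea, in Python using only the standard library. It is not how YACReader does it: the names `html_to_text`, `BLOCK_TAGS` and `CELL_TAGS` are made up, and the choice of which tags count as block-level is an assumption. Block-level tags become line breaks and table cells become spaces, so neighbouring words no longer run together.

```python
from html.parser import HTMLParser

# Assumed set of tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "br", "div", "h1", "h2", "h3", "h4", "li", "tr", "table"}
# Table cells are separated by a space instead of a line break.
CELL_TAGS = {"td", "th"}


class _TextExtractor(HTMLParser):
    """Collects text content, inserting separators where tags used to be."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in CELL_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags but keep word and line boundaries."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

With made-up markup such as `<h4>Part One</h4><p>No matter how</p>`, this keeps "Part One" and "No matter how" on separate lines instead of producing "OneNo".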

ComicVine scraper: List of Covers text 3 years 2 months ago #5760

Luis Ángel

Offline

Administrator

Posts: 2612

Thank you received: 543

Karma: 21

Yeah, looks like the problem is that they use HTML in that field, but it can also be just text (or it used to be). I will try to see if there is a better key I can use; if not, then better HTML handling is needed.
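
Since the field can hold either HTML or plain text, one possible first step is a cheap check before deciding how to handle it. The sketch below is in Python and only a heuristic: `looks_like_html` and `TAG_PATTERN` are hypothetical names, and the regular expression is an assumption about what counts as a tag, not something taken from the ComicVine API or from YACReader.

```python
import re

# Heuristic: an opening or closing tag such as <p>, </table>, <br/> or <a href="...">.
TAG_PATTERN = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^<>]*)?/?>")


def looks_like_html(text: str) -> bool:
    """Guess whether a synopsis field contains HTML markup."""
    return TAG_PATTERN.search(text) is not None


def synopsis_kind(text: str) -> str:
    """Return "html" or "text" so the caller can pick a handling path."""
    return "html" if looks_like_html(text) else "text"
```

Plain text that merely contains `<` or `>` characters, like "3 < 5", is not flagged, because the pattern requires a letter directly after the opening bracket.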
The following user(s) said Thank You: Drybonz, sandman4sure