Scraping recorded TV-Shows – extend TVDB Scraper – get function calls right

I have already spent days trying to get this working. I have a local PVR that records movies and TV shows. For recorded movies I can already scrape successfully using the information my PVR provides through its XML API. With TV shows I still fail, and I hope somebody can help me.

I thought the task would be simple: make an HTTP call to the PVR API, get the TVDB ID for the file to be scraped, and then continue with the regular TVDB scraper using this ID. So it should mainly be a matter of modifying the function “CreateSearchUrl”, or maybe also “GetSearchResults”.
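
To make the intended flow concrete, here is a minimal Python sketch of the lookup I am trying to reproduce inside the scraper. It is not scraper code: the function names are made up for illustration, the file name pattern is simplified from the one I use in CreateSearchUrl, and it assumes nothing about the PVR response beyond the <t_thetvdbid> element.

```python
import re
from typing import Optional

# Simplified version of the pattern used in CreateSearchUrl: the recording
# ID is the trailing number in the file name, optionally followed by ".ts".
RECORDING_ID = re.compile(r"(?:%20| |_)([0-9]+)(?:\.ts|$)")
TVDB_ID = re.compile(r"<t_thetvdbid>([0-9]+)</t_thetvdbid>")


def pvr_lookup_url(filename: str) -> Optional[str]:
    """Step 1: build the PVR API URL from the recording's file name."""
    match = RECORDING_ID.search(filename)
    if match is None:
        return None
    return f"http://10.0.0.1:8081/record.onexml?id={match.group(1)}"


def tvdb_search_url(pvr_xml: str, language: str) -> Optional[str]:
    """Step 2: take the TVDB ID from the PVR response and build the TVDB URL."""
    match = TVDB_ID.search(pvr_xml)
    if match is None:
        return None
    return (
        "http://thetvdb.com/api/GetEpisode.php"
        f"?id={match.group(1)}&language={language}"
    )
```

The point is that step 2 needs the fetched content of step 1 as its input, which is exactly the hand-over I cannot get right in the scraper.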

The main problem is calling a function in the right way to do this TVDB ID lookup. I tried many approaches; two of them are shown below.

My main questions are:

  1. When I use a function to get an XML file from another site, how do I trigger the actual GET request for that content? At what point during the execution of the scraper are URLs really fetched? My log files suggest that calling a function does not trigger this by itself; the functions mainly just extend the URLs and wrap them in more markup.
  2. Do I need to enclose the results of a function in particular XML tags? Almost all code samples put the results between <details> and </details>, but why? Why “details”, and could I also use “url”, for example? I don’t understand how this is used. I only noticed that if I don’t use any enclosing tags at all, the function simply doesn’t show up as executed in the log file.

So this is my code …

First try – extend CreateSearchUrl to call the PVR API to get the TVDB ID and then pass it on as usual to GetSearchResults:

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<chain function="GetTVDBIdFromEpisode">\1</chain>" dest="5">
        <expression>(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
    <RegExp input="$$5" output="<url>http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]</url>" dest="3">
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$4" output="<details>\1</details>" dest="3">
        <RegExp input="$$1" output="<url function="ParseTVDBIdFromEpisode">http://10.0.0.1:8081/record.onexml?id=\1</url>" dest="4">
            <expression />
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetTVDBIdFromEpisode>

<ParseTVDBIdFromEpisode dest="5">
    <RegExp input="$$1" output="<details>\1</details>" dest="5">
        <expression><t_thetvdbid>([0-9]+)</t_thetvdbid></expression>
    </RegExp>
</ParseTVDBIdFromEpisode>

This does not work: both the API call and the function ParseTVDBIdFromEpisode run correctly (the log shows it returning <details>4818866</details>), but the result is not passed back up to the caller. The ID that CreateSearchUrl puts into the TVDB URL therefore ends up empty.

Code:
23:15:30 T:1826356112   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:15:30 T:1826356112   DEBUG: scraper: CreateSearchUrl returned <url>http://thetvdb.com/api/GetEpisode.php?id=<chain function="GetTVDBIdFromEpisode">2026</chain>&language=de</url>
23:15:30 T:1826356112   DEBUG: scraper: GetTVDBIdFromEpisode returned <details><url function="ParseTVDBIdFromEpisode">http://10.0.0.1:8081/record.onexml?id=2026</url></details>
23:15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://10.0.0.1:8081/record.onexml?id=2026
23:15:30 T:1826356112    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1
23:15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"
23:15:30 T:1826356112   DEBUG: scraper: ParseTVDBIdFromEpisode returned <details>4818866</details>
23:15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://thetvdb.com/api/GetEpisode.php?id=
23:15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id="
23:15:30 T:1826356112   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><results></results>

Second try – extend the function GetSearchResults to make the API-call:

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<url>http://10.0.0.1:8081/record.onexml?id=\1</url>" dest="3">
        <expression noclean="1">(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="8">
    <RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results><entity>\1</entity></results>" dest="8">
        <RegExp input="$$1" output="<title>\1</title>" dest="5">
            <expression><t_caption>([^<]*)</t_caption></expression>
        </RegExp>
        <RegExp input="$$1" output="<url cache="tt\1.xml" function="GetTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]</url>" dest="5+">            
            <expression><t_thetvdbid>([0-9]+)</t_thetvdbid></expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetSearchResults>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$1" output="<url cache="\1-$INFO[language].xml">http://thetvdb.com/api/1D62F2F90030C444/series/\1/all/$INFO[language].zip</url>" dest="3">
        <expression><seriesid>([0-9]+)</seriesid></expression>
    </RegExp>
</GetTVDBIdFromEpisode>

Still not working: the function call is not made while GetSearchResults runs. Instead, the whole <url> element, including its function attribute, is passed on to the function “GetDetails”, and there it fails (the log shows an empty <id> and a 404 for the banners URL).

Code:
23:11:04 T:1825258144   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:11:04 T:1825258144   DEBUG: scraper: CreateSearchUrl returned <url>http://10.0.0.1:8081/record.onexml?id=2026</url>
23:11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://10.0.0.1:8081/record.onexml?id=2026
23:11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1
23:11:04 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"
23:11:04 T:1825258144   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results><entity><title>Bones - Die Knochenj&#xE4;gerin</title><url cache="tt4818866.xml" function="GetTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></entity></results>
23:11:04 T:1825258144   DEBUG: bool ADDON::CScraper::GetVideoDetails(XFILE::CCurlFile&, const CScraperUrl&, bool, CVideoInfoTag&): Reading movie 'http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de' using IPTV PVR TV Series Scraper scraper (file: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de
23:11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://thetvdb.com
23:11:05 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de"
23:11:05 T:1825258144   DEBUG: scraper: GetDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><id></id><chain function="GetArt"></chain><episodeguide><url cache="-.xml">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></episodeguide></details>
23:11:05 T:1825258144   DEBUG: scraper: GetArt returned <details><url function="ParseArt" cache="-de.xml">http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml</url></details>
23:11:05 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml
23:11:05 T:1825258144   ERROR: CCurlFile::Open failed with code 404 for http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml
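
For comparison, the search result that the second attempt is meant to produce can be sketched in Python. Again this is only an illustration outside the scraper: build_search_results is a made-up name, and the code assumes nothing about the PVR response beyond the <t_caption> and <t_thetvdbid> elements that my expressions match.

```python
import re
import xml.etree.ElementTree as ET


def build_search_results(pvr_xml: str, language: str) -> str:
    """Build the <results> document from the PVR response using regexes."""
    caption = re.search(r"<t_caption>([^<]*)</t_caption>", pvr_xml)
    tvdb_id = re.search(r"<t_thetvdbid>([0-9]+)</t_thetvdbid>", pvr_xml)

    results = ET.Element("results")
    if caption is None or tvdb_id is None:
        # Nothing matched: return an empty result list.
        return ET.tostring(results, encoding="unicode")

    entity = ET.SubElement(results, "entity")
    ET.SubElement(entity, "title").text = caption.group(1)

    # The url element carries the follow-up function name as an attribute,
    # as in the GetSearchResults output shown in the log above.
    url = ET.SubElement(
        entity,
        "url",
        {"cache": f"tt{tvdb_id.group(1)}.xml", "function": "GetTVDBIdFromEpisode"},
    )
    url.text = (
        "http://thetvdb.com/api/GetEpisode.php"
        f"?id={tvdb_id.group(1)}&language={language}"
    )
    return ET.tostring(results, encoding="unicode")
```

This reproduces the structure of the GetSearchResults output in the log. What I cannot work out is at which point the scraper engine acts on the function attribute of that url element.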

Thank you very much for your help or your ideas, highly appreciated!
Gerald