Project: leger/legifrance-bot

Commit 0e1cd65d, authored Aug 23, 2016 by Jean-Benoist Leger

abstraction

Parent: 0b72d345

1 changed file (legifrance.py), with 21 additions and 17 deletions.
```diff
@@ -30,6 +30,26 @@ import re
 import time
 import configobj
 
+def get_articles_from_page(link):
+    articles = {}
+    r = requests.get(link)
+    reg = '^.*?<a href="(affichCodeArticle\.do[^"]*idArticle[^"]*)" title="En savoir plus sur l\'article ([^"]+)"'
+    c = r.content
+    while True:
+        a = re.match(reg,c,re.DOTALL)
+        if a is None:
+            break
+        l1 = 'https://www.legifrance.gouv.fr/' + a.groups()[0]
+        l1 = re.sub('&amp;','&',l1)
+        l1 = re.sub(';jsessionid=[^\?]*\?','?',l1)
+        l1 = re.sub('&dateTexte=[^&]*','',l1)
+        articles[a.groups()[1]] = l1
+        c = re.sub('href=','',c,1)
+    return articles
+
+
 def get_code(codename,codeids):
     if not codeids.has_key(codename):
```
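On Python 3, the parsing half of `get_articles_from_page` can be sketched as a pure function over an HTML string, which makes it testable without network access. This is a minimal sketch under stated assumptions: `ARTICLE_RE`, `normalize_link` and `parse_article_links` are hypothetical names, the page is assumed to be a `str` (for example `requests.get(link).text` rather than `.content`, which is `bytes` on Python 3), and it swaps in `html.unescape` for the single `&amp;` replacement and `re.finditer` for the match-then-delete-`href=` loop.

```python
import html
import re

# Base URL prepended to relative article links (same value as in the diff).
BASE_URL = 'https://www.legifrance.gouv.fr/'

# Same capture groups as the commit's regex: group 1 is the relative article
# link, group 2 is the article label taken from the title attribute.
ARTICLE_RE = re.compile(
    r'<a href="(affichCodeArticle\.do[^"]*idArticle[^"]*)" '
    r'title="En savoir plus sur l\'article ([^"]+)"'
)


def normalize_link(relative):
    """Hypothetical helper: build an absolute, session-free article URL."""
    url = BASE_URL + html.unescape(relative)        # '&amp;' -> '&' (and other entities)
    url = re.sub(r';jsessionid=[^?]*\?', '?', url)  # drop the servlet session id
    url = re.sub(r'&dateTexte=[^&]*', '', url)      # drop the date parameter
    return url


def parse_article_links(page):
    """Hypothetical helper: map article label -> URL for one HTML page."""
    return {m.group(2): normalize_link(m.group(1))
            for m in ARTICLE_RE.finditer(page)}
```

Scanning with `re.finditer` walks the page once, whereas the loop in the commit re-runs a lazy `^.*?` match from the start of the page after each `href=` deletion, so its cost grows with the page length times the number of links.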
```diff
@@ -55,23 +75,7 @@ def get_code(codename,codeids):
     articles = {}
     for link in links:
-        r = requests.get(link)
-        reg = '^.*?<a href="(affichCodeArticle\.do[^"]*idArticle[^"]*)" title="En savoir plus sur l\'article ([^"]+)"'
-        c = r.content
-        while True:
-            a = re.match(reg,c,re.DOTALL)
-            if a is None:
-                break
-            l1 = 'https://www.legifrance.gouv.fr/' + a.groups()[0]
-            l1 = re.sub('&amp;','&',l1)
-            l1 = re.sub(';jsessionid=[^\?]*\?','?',l1)
-            l1 = re.sub('&dateTexte=[^&]*','',l1)
-            articles[a.groups()[1]] = l1
-            c = re.sub('href=','',c,1)
+        articles.update(get_articles_from_page(link))
     return articles
 
 class codes:
```
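After this commit, `get_code` only merges the per-page dictionaries returned by `get_articles_from_page`. A minimal sketch of that merge step, assuming a hypothetical `collect_articles` function that receives the page parser as a parameter (the commit calls `get_articles_from_page` directly), so it can be exercised with a fake instead of live HTTP requests:

```python
from typing import Callable, Dict, Iterable

ArticleMap = Dict[str, str]  # article label -> article URL


def collect_articles(links: Iterable[str],
                     get_page_articles: Callable[[str], ArticleMap]) -> ArticleMap:
    """Hypothetical helper: merge the label -> URL maps of several pages.

    Mirrors the refactored loop: one parser call per page, merged with
    dict.update, so a label seen on a later page overwrites an earlier one.
    """
    articles: ArticleMap = {}
    for link in links:
        articles.update(get_page_articles(link))
    return articles
```

Passing the parser in is a design choice for testability, not something the commit does; the overwrite-on-duplicate behavior of `dict.update` is the same in both.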